Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

Computer Science 9618/12

8.1 Database Concepts


1. Flat file

Originally all data were held in files. A typical file would consist of a large
number of records each of which would consist of a number of fields. This led
to very large files that were difficult to process.

Field Name Data Type


Description String
Cost Price Currency
Selling Price Currency
Number in Stock Integer
Reorder Level Integer
Supplier Name String
Supplier Address String

Further ad hoc enquiries are virtually impossible.

This means that a match is very difficult because the data is inconsistent.

Another problem is that each time a new product is added to the database both
the Name and address of the supplier must be entered.

This leads to redundant data or data duplication.

Fig. Shows how data can be proliferated when each department keeps its own files.

File containing Stock


Programs to
Code, Description,
Purchasing place orders
Re-order level, Cost
Department when stocks are
Price, Sale Price
low
Supplier name and
address, etc

File containing Stock


Programs to Code, Description,
Sales record orders Number sold, Sale
Department from customers Price, Customer name
and address, etc.

File containing
Programs to Customer name and
Accounts record accounts address, amount
Department of customers owing, dates of orders,
Fig.1 etc.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

This method of keeping data uses flat files. Flat files have the following
limitations.

 Separation and isolation of data

As data is held in separate files so difficult find related data.

 Duplication of data

Same data is held in more than two one files

 Duplication is wasteful as it costs time and money. Data has to be


entered more than once; therefore, it takes up time and more space.

 Duplication leads to loss of data integrity.

 Data dependence

Data formats are defined in the application programs. If there is a


need to change any of these formats, whole programs have to be
changed.

Different applications may hold the data in different forms, again


causing a problem. Incompatibility of files

 Fixed queries and the proliferation of application programs

File processing was a huge advance on manual processing of


queries. This led to end-users wanting more and more information.
To try to overcome the search problems of sequential files, two types of
database were introduced.

Data redundancy: the same data stored more than once


Flat file

A flat file database is a database that stores data in a plain text file. Each
line of the text file holds one record, with fields separated by delimiters,
such as commas or tabs. While it uses a simple structure, a flat file
database cannot contain multiple tables like a relational database can.

Or

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

A flat file database is a type of database that stores data in a single


table. This is unlike a relational database, which makes use of
multiple tables and relations. Flat file databases are generally in
plain-text form, where each line holds only one record. The fields in
the record are separated using delimiters such as tabs and commas.

Advantages of Using a Relational Database (RDB)

Advantage Notes
Control of data redundancy Flat files have a great deal of data redundancy that
is removed using a RDB.
Consistency of data There is only one copy of the data so there is less
chance of data inconsistencies occurring.
Data sharing The data belongs to the whole organisation, not to
individual departments.
More information Data sharing by departments means that
departments can use other department's data to find
information.
Improved data integrity Data is consistent and valid.

Improved security The database administrator (DBA) can define data


security – who has access to what. This is enforced
by the Database Management System (DBMS).
Enforcement of standards The DBA can set and enforce standards. These
may be departmental, organisational, national or
international.
Economy of scale Centralisation means that it is possible to
economise on size. One very large computer can
be used with dumb terminals or a network of
computers can be used.
Improved data accessibility This is because data is shared.

Increased productivity The DBMS provides file handling processes instead


of each application having to have its own
procedures.
Improved maintenance Changes to the database do not cause applications
to be re-written.
Improved back-up and recovery DBMSs automatically handle back-up and
recovery. There is no need for somebody to
remember to back-up the database each day, week
or month.

Disadvantages are not in the Specification for this Module.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

2. Relational Database
In the relational database model, each item of data is stored in a relation
which is a special type of table. The strange choice of name comes from a
mathematical theory. A relational database is a collection of relational
tables. When a table is created in a relational database it is first given a
name and then the attributes are named. In a database design, a table
would be given a name with the attribute names listed in brackets aft er
the table name. For example

A fundamental principle of a relational database is that a tuple is a set of


‘atomic’ values; each attribute has one value or no value.

Entity
Database entity is a thing, person, place, unit, object or any item about which
the data should be captured or stored in the form of properties or attribute.
Properties or attributes are required (because entity without properties is not
an entity).

Table

A database table is a structure that organises data into rows and columns
– forming a grid

Relation: the special type of table which is used in a relational


database
Attribute: a column in a relation that contains values
Tuple: a row in a relation storing data for one instance of the
relation

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Primary key:

An attribute or a combination of attributes for which there is a value in


each tuple and that value is unique

Candidate key:

A key that could be chosen as the primary key

Secondary key:

A candidate key that has not been chosen as the primary key

Foreign key:

An attribute in one table that refers to the primary key in another table
Primary key

The key used to uniquely identify a record or tuple is called the primary
key.

In some cases, more than one attribute, or group of attributes, could act as
the primary key. Suppose we have the relation

Composite / Compound key

A key may consist of two attributes, in which case it is called a composite


/ compound key.

Example

EMP(EmpID, NINumber, Name, Address)

Clearly, EmpID could act as the primary key. However, NINumber could
also act as the primary key as it is unique for each employee. In this case
we say that EmpID and NINumber are candidate keys.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

The Purpose of Keys


A field which is candidate for primary key other than primary field is called
secondary key.

Secondary key
A field which indexed other than primary key. A Primary is always indexed by
default.

If we choose EmpID as the primary key, then NINumber is called a secondary


key.

USES

To make searches fast

Example

Now look at these two relations that we saw

CINEMA(CID, Cname, Loc, MID)


MANAGER(MID, MName)

We see that MID occurs in CINEMA and is the primary key in MANAGER.
In CINEMA we say that MID is the foreign key.

Foreign key

An attribute is a foreign key in a relation if it is the primary key in


another relation. Foreign key is used to link relations or to implement
one to many relationship between two tables.

Referential integrity:

the use of a foreign key to ensure that a value can only be entered in one table
when the same value already exists in the referenced table

Example
CUSTOMER(Num, CustName, City)
CITY_COUNTRY(City, Country)

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Let’s discuss the use of a foreign key using the database design shown in
Figure above. When the database is being created, the CITY_COUNTRY is
created first. City is chosen as the primary key because unique City name can
be guaranteed. Then the CUSTOMER table is created. Num is defined as the
primary key and the attribute City is identified as a foreign key referencing the
primary key in the CITY_COUNTRY. Once this relationship between primary
and foreign keys has been established, the DBMS will prevent any entry for
City in the CUSTOMER table being made if the corresponding value does not
exist in the CITY_COUNTRY table. This provides referential integrity which
is another reason why the relational database model helps to ensure data
integrity

Data integrity:

A requirement for data to be accurate and up to date

Indexing

Indexing is a data structure technique which allows you to quickly retrieve


records from a database file. An Index is a small table having only two
columns. The first column comprises a copy of the primary or candidate key of
a table. Its second column contains a set of pointers for holding the address of
the disk block where that specific key value stored.

An index -

Takes a search key as input


Efficiently returns a collection of matching records.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Pointer
A special variable holds only the address of a memory location.

Index Table Student Table

3 Types of relationships

We now consider the relationships between the entities.

The statements show two types of relationship. There are in fact four
altogether. These are

one-to-one represented by

one-to-many represented by

many-to-one represented by

many-to-many represented by

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

In relational database tables are linked to each other using common fields
(Primary and foreign keys)

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

4 Relational Databases and Normalisation

Consider the following paper-based delivery note from Easy Fasteners Ltd.

Easy Fasteners Ltd


Old Park, The Square, Berrington, Midshire BN2 5RG

To: Bill Jones No.: 005


London Date: 14/08/01
England

Product No. Description

1 Table
2 Desk
3 Chair

Fig 2

In a relational databases.

 Instead of fields we call the columns attributes and

 The rows are called tuples.

 The files are called relations (or tables).

We write the details of our CUSTOMER in standard notation as

CUSTOMER(Num, CustName, City, Country, (ProdID, Description))

ProdID and Description are put inside parentheses because they form a
repeating group. In tabular form the data may be represented by Fig. below

Repeating Groups

Num CustName City Country ProdID Description


005 Bill Jones London England 1,2,3 Table,
Desk, Chair

This again shows the repeating group. We say that this is in un-normalised form
(UNF). To put it into 1st normal form (1NF) we complete the table and identify a key
that will make each tuple unique. This is shown in Fig. Fig. below

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Num CustName City Country ProdID Description


005 Bill Jones London England 1 Table
2 Desk
3 Chair

Num CustName City Country ProdID Description


005 Bill Jones London England 1 Table
005 Bill Jones London England 2 Desk
005 Bill Jones London England 3 Chair

To make each row unique we need to choose Num together with ProdID as the key.

CUSTOMER(Num, CustName, City, Country, ProdID, Description)

To indicate the key, we simply underline the attributes that make up the key.

Because we have identified a key that uniquely identifies each tuple, we have
removed the repeating group.

Definition of 1NF

A relation with repeating groups removed is said to be in First


Normal Form (1NF). That is, a relation in which the
intersection of each tuple and attribute (row and column)
contains one and only one value.

Let us now see how to move from 1NF to 2NF and on to 3NF.

Definition of 2NF

A relation that is in 1NF and every non-primary key attribute is


fully dependent on the primary key is in Second Normal Form
(2NF). That is, all the incomplete dependencies have been
removed.

We now have three relations.

CUSTOMER(Num, CustName, City, Country)


PRODUCT(ProdID, Description)
CUS_PROD(Num, ProdID)

Note the keys (underlined) for each relation. DEL_PROD needs a compound key

Country depends on City not directly on Num. We need to move on to


3NF.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Definition of 3NF

A relation that is in 1NF and 2NF, and in which no non-primary key


attribute is transitively dependent on the primary key is in 3NF. That is,
all non-key elements are fully dependent on the primary key.

Removing this transitive functional determinacy, we have

CUSTOMER(Num, CustName, City)


CITY_COUNTRY(City, Country)
PRODUCT(ProdID, Description)
CUS_PROD(Num, ProdID)

Let us now use the data above and see what happens to it as the relations are
normalised.

1NF
CUSTOMER
Num CustName City Country ProdID Description
005 Bill Jones London England 1 Table
005 Bill Jones London England 2 Desk
005 Bill Jones London England 3 Chair
008 Mary Hill Paris France 2 Desk
008 Mary Hill Paris France 7 Cupboard
014 Anne Smith New York USA 5 Cabinet
002 Tom Allen London England 7 Cupboard
002 Tom Allen London England 1 Table
002 Tom Allen London England 2 Desk

Convert to
2NF

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

CUSTOMER PRODUCT
Num CustName City Country ProdID Description
005 Bill Jones London England 1 Table
008 Mary Hill Paris France 2 Desk
014 Anne Smith New York USA 3 Chair
002 Tom Allen London England 7 Cupboard
5 Cabinet

CUS_PROD
Num ProdID
005 1
005 2
005 3
008 2
008 7
014 5
002 7
002 1
002 2

Convert to
3NF

CUSTOMER CUS_PROD
Num CustName City Num ProdID
005 Bill Jones London 005 1
008 Mary Hill Paris 005 2
014 Anne Smith New York 005 3
002 Tom Allen London 008 2
008 7
014 5
002 7
002 1
002 2

PRODUCT CITY_COUNTRY
ProdID Description City Country
1 Table London England
2 Desk Paris France
3 Chair New York USA
7 Cupboard
5 Cabinet

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Now we can see that redundancy of data has been removed.

In tabular form we have

UNF

CUSTOMER(Num, CustName, City, Country, (ProdID, Description))

1NF

CUSTOMER(Num, CustName, City, Country, ProdID, Description)

2NF

CUSTOMER(Num, CustName, City, Country)


PRODUCT(ProdID, Description)
CUS_PROD(Num, ProdID)

3NF

CUSTPMER(Num, CustName, City)


CITY_COUNTRY(City, Country)
PRODUCT(ProdID, Description)
CUS_PROD(Num, ProdID)

Example 2

PRODUCT to find ProdID for Desk


CUS_PROD to find Num for this ProdID
DELNOTE to find City corresponding to Num
CITY_COUNTRY to find Country from City

Here is another example of normalisation.

Films are shown at many cinemas, each of which has a manager. A manager may
manage more than one cinema. The takings for each film are recorded for each
cinema at which the film was shown.

The following table is in UNF and uses the attribute names

FID Unique number identifying a film


Title Film title
CID Unique string identifying a cinema
Cname Name of cinema
Loc Location of cinema
MID Unique 2-digit string identifying a manager
MName Manager's name
Takings Takings for a film

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

FID Title CID Cname Loc MID MName Takings


15 Jaws TF Odeon Croyden 01 Smith £350
GH Embassy Osney 01 Smith £180
JK Palace Lye 02 Jones £220
23 Tomb Raider TF Odeon Croyden 01 Smith £430
GH Embassy Osney 01 Smith £200
JK Palace Lye 02 Jones £250
FB Classic Sutton 03 Allen £300
NM Roxy Longden 03 Allen £290
45 Cats & Dogs TF Odeon Croyden 01 Smith £390
LM Odeon Sutton 03 Allen £310
56 Colditz TF Odeon Croyden 01 Smith £310
NM Roxy Longden 03 Allen £250

Converting this to 1NF can be achieved by 'filling in the blanks' to give the relation

FID Title CID Cname Loc MID MName Takings


15 Jaws TF Odeon Croyden 01 Smith £350
15 Jaws GH Embassy Osney 01 Smith £180
15 Jaws JK Palace Lye 02 Jones £220
23 Tomb Raider TF Odeon Croyden 01 Smith £430
23 Tomb Raider GH Embassy Osney 01 Smith £200
23 Tomb Raider JK Palace Lye 02 Jones £250
23 Tomb Raider FB Classic Sutton 03 Allen £300
23 Tomb Raider NM Roxy Longden 03 Allen £290
45 Cats & Dogs TF Odeon Croyden 01 Smith £390
45 Cats & Dogs LM Odeon Sutton 03 Allen £310
56 Colditz TF Odeon Croyden 01 Smith £310
56 Colditz NM Roxy Longden 03 Allen £250

This is the relation

R(FID, Title, CID, Cname, Loc, MID, MName, Takings)

Title is only dependent on FID


Cname, Loc, MID, MName are only dependent on CID
Takings is dependent on both FID and CID

Therefore 2NF is

FILM(FID, Title)
CINEMA(CID, Cname, Loc, MID, MName)
TAKINGS(FID, CID, Takings)

In Cinema, the non-key attribute MName is dependent on MID. This means that it is
transitively dependent on the primary key. So we must move this out to get the 3NF
relations

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

FILM(FID, Title)
CINEMA(CID, Cname, Loc, MID)
TAKINGS(FID, CID, Takings)
MANAGER(MID, MName)

5 Entity-Relationship (E-R) Diagrams

Entity-Relationship (E-R) diagrams can be used to illustrate the


relationships between entities. In the earlier example we had the four
relations

CUSTOMER(Num, CustName, City)


CITY_COUNTRY(City, Country)
PRODUCT(ProdID, Description)
CUS_PROD(Num, ProdID)

In an E-R diagram DELNOTE, CITY_COUNTRY, PRODUCT and DEL_PROD are


called entities. Entities have the same names as relations but we do not usually show
the attributes in E-R diagrams.

We now consider the relationships between the entities.

The statements show two types of relationship. There are in fact four altogether.
These are

one-to-one represented by

one-to-many represented by

many-to-one represented by

many-to-many represented by

Fig. below is the E-R diagram showing the relationships between CUSTOMER,
CITY_COUNTRY, PRODUCT and DEL_PROD.

CUSTOMER

CITY_COUNTRY CUS_PROD

SAHIR MASROOR KHAN 03009543074


PRODUCT
Computer Science 9618/12

If the relations are in 3NF, the E-R diagram will not contain any many-to-many
relationships. If there are any one-to-one relationships, one of the entities can be
removed and its attributes added to the entity that is left.

Let us now look at our solution to the cinema problem which contained the relations

FILM(FID, Title)
CINEMA(CID, Cname, Loc, MID)
TAKINGS(FID, CID, Takings)
MANAGER(MID, MName)
in 3NF.

We have the following relationships.

takes
FILM TAKINGS
is for
connected by FID

takes
CINEMA is for TAKINGS
connected by CID
manages
MANAGER CINEMA
managed
by
connected by MID

These produce the ERD shown in Fig. below

CINEMA

MANAGER TAKINGS

FILM

In this problem we actually have the relationship

CINEMA shows many FILMs

FILM is shown at many CINEMAs


That is

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

CINEMA FILM

But this cannot be normalised to 3NF because it is a many-to-many relationship.


Many-to-many relationships are removed by using a link entity as shown here.

CINEMA LINK_ENTITY FILM

If you now look at Fig. above, you will see that the link entity is TAKINGS.

8.2 Database management system

The database approach It is vital to understand that a database is not


just a collection of data. A database is an implementation according to
the rules of a theoretical model. The basic concept was proposed by
ANSI (American National Standards Institute) in its three-level model.

The three levels are:


• the external level
• the conceptual level
• the internal level.

1 Database Management System (DBMS)

Let us first look at the architecture of a DBMS as shown in Fig. below

EXTERNAL
LEVEL User 1 User 2 User 3
(Individual
users)

CONCEPTUAL
LEVEL Company Level
(Integration of
all user views)

INTERNAL DISK/FILE
LEVEL organisation
(Storage view)

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

External level / view/ schema

This is the individual view of the database. In a multi user database there
will generally be several different external schema representing each
user view according to their needs and access right.

 At the external level there are many different views of the data.
 Each view consists of an abstract representation of part of the total database.
Application programs will use a data manipulation language (DML) to create
these views.

Conceptual level /view /schema

The overall view of the entire database, including entities, attributes and
relationships as designed by the database designer.

 At the conceptual level there is one view of the data.


 This view is an abstract representation of the whole database.

Internal level/view/ schema

This describes how the data will be stored and is concerned with file
organisation and access methods. It is generally transparent to the user.

 The internal view of the data occurs at the internal level.


 This view represents the total database as actually stored.
 It is at this level that the data is organised into random access, indexed and
fully indexed files.
 This is hidden from the user by the DBMS.

Data management system (DBMS):

software that controls access to data in a database

Database administrator (DBA):

A person who uses the DBMS to customise the database to suit user
and programmer requirements

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

The facilities provided by a DBMS

Developer interface:

Gives access to software tools provided by a DBMS for creating tables


and attributes to be defined together with their data types. In addition,
the DBMS provides facilities for a programmer to develop a user
interface called forms to make data entry and viewing fast.

Query processor:

Software tools provided by a DBMS to allow creation and execution of a


query

Query:

The query is the mechanism for extracting and manipulating data from
the database.

or

Used to select data from a database subject to defined conditions

Report generator

The other feature likely to be provided by the DBMS is the capability for
creating a report to present formatted output.

DBMS functions likely to be used by a DBA

There are a number of features that can improve performance.

Data dictionary

An important feature of the DBMS is the data dictionary which is part of


the database that is hidden from view from everyone except the DBA.

It contains metadata about the data. This includes details of all the
definitions of tables, attributes and so on but also of how the physical
storage is organised.

Metadata (Data about database)

Metadata is defined as the data providing information about one or more aspects of
the data; it is used to summarize basic information about data which can make
tracking and working with specific data easier. Some examples include: Means of
creation of the data. Purpose of the data. Time and date of creation.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Index:
A small secondary table used for rapid searching which contains one
attribute from the table being searched and pointers to the tuples in that
table

The index can be on the primary key or on a secondary key. Searching


an index table is much quicker than searching the full table.

Backup facilities

The integrity of the data in the database is a key concern. One potential
cause of problems occurs when a transaction is started but a system
problem prevents its completion. The result would be a database in an
undefined state.

The DBMS should have a built-in feature that prevents this from
happening. As with all systems, regular backup is a requirement. The
DBA will be responsible for backup of the stored data.

Access Rights

Sometimes it must not be possible for a user to access the data in a database.

Different views for different user / Record Locks

In this case the one user will be able to change the contents of the
database while the other will only be allowed to query the database.

Example 1

In a banking system, accounts must be updated with the day's transactions. While this
is taking place users must not be able to access the database. Thus, at certain times of
the day, users will not be able to use a cash point.

Levelled Passwords

Different user is given different password to access the database.


Receptionist password allows her to access certain data while manager
password allows him to access his related data.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Hardware restrictions

A manager can access the record from any terminal while workers can
access from their terminals only.

This is a hardware method of preventing access. All terminals have a unique address
on their network cards. This means that the DBMS can decide which data can be
supplied to a terminal.

8.3 Structured Query Language (SQL)

SQL is the programming language provided by a DBMS to support all of


the operations associated with a relational database. Even when a
database package offers high-level software tools for user interaction,
they create an implementation using SQL

Data Definition Language (DDL)

Data Definition Language (DDL) is the part of SQL provided for creating
or altering tables.

These commands only create the structure. They do not put any data
into the database.
The following are some examples of DDL that could be used in creating
the database

CREATE DATABASE BandBooking;

CREATE TABLE Band ( BandName varchar(25), MembersNum


integer);

ALTER TABLE Band ADD PRIMARY KEY (BandName);

ALTER TABLE Band-Booking ADD FOREIGN KEY (BandName


REFERENCES Band(BandName);

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Data Manipulation Language (DML)

There are three categories of use for Data Manipulation Language (DML)
• The insertion of data into the tables when the database is created
• The modification or removal of data in the database
• The reading of data stored in the database

The following illustrate the two possible ways that SQL can be written to populate a
table with data:

INSERT INTO Band (‘ComputerKidz’, 5);

INSERT INTO Band-Booking (BandName, BookingID) VALUES


(‘ComputerKidz’, ‘2016/023’);

The following are some points to note.

• Parentheses are used in both versions.


• A separate INSERT command has to be used for each tuple in the table.
• There is an order defined for the attributes.
• Although the SQL will have a list of INSERT commands the subsequent use of the
table has no concept of the tuples being ordered.

The main use of DML is to obtain data from a database using a query. A query always
starts with the SELECT command. Two examples are:

The simplest form for a query has the attributes for which values are to be listed as
output identified aft er SELECT and the table name identified aft er FROM. For

example:

SELECT BandName FROM Band;

Note that the components of the query are separated by spaces.


The Band table only has two attributes. To list the values for both there are two
options:
SELECT BandName, MembersNum FROM Band;
or
SELECT * FROM Band;

which uses * to indicate all attributes. Note that in the first example the attributes are
separated by commas but no parentheses are needed.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

It is possible to include instructions in the SQL to control the presentation of the


output. The following uses ORDER BY to ensure that the output is sorted to show the
data with the band names in alphabetical order.

SELECT BandName, Members FROM Band ORDER BY BandName;

In this query there is no question of duplicate entries because BandName is the


primary key of the BandName table. However, in the Band-Booking table an
individual value for BandName will occur many times. If a query were being used to
find which bands already had a booking there would be repeated names in the output.
This can be prevented by the use of GROUP BY as shown here:

SELECT BandName FROM Band-Booking GROUP BY BandName;

An extension of the control of the output from a query is to include a condition to


limit the selected data. This is provided by a WHERE clause. The following are

examples:

SELECT BandName FROM Band-Booking WHERE Headlining = ‘Y’


GROUP BY BandName;

which produces a single output for each band that has headlined. Note how a query
can have several component parts which are best presented on separate lines.

SELECT BandName, MembersNum FROM Band WHERE MembersNum > 2


ORDER BY BandName;

which excludes any two bands.

It is possible to qualify the SELECT statement by using a function. SUM, COUNT


and AVG are examples of functions that work on data held in several tuples for a
particular attribute and return one value. For this reason, these functions are called
aggregate functions. As an example, the following code displays the number of
members in a band:

SELECT Count(*) FROM Band;

This is a special case because there is no need to specify the attribute. An example
using a specific attribute would be:

SELECT AVG(MembersNum) FROM Band;

another example is:

SELECT SUM(MembersNum) FROM Band;

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

A query can be based on a ‘join condition’ between data in two tables. The most
frequently used is an inner join which is illustrated by:

SELECT VenueName, Date FROM Booking


WHERE Band-Booking.BookingID = Booking.BookingID
AND Band-Booking.BandName = ‘ComputerKidz’;

The SQL uses the full definitive name for each attribute with the table name and
attribute name separated by a dot. The query contains two conditions. The way that
the query works is as follows.

• The Band-Booking table is searched for instances where the BandName is


ComputerKidz.
• For each instance the BookingID is noted.

• Then there is a search of the Booking table to find the examples of tuples having this
value for BookingID.
• For each one found the VenueName and Date are presented in the output.

Some versions of SQL require the explicit use of INNER JOIN. The following is a
possible generic syntax:

SELECT table1.column1, table2.column2... FROM table1 INNER JOIN table2


ON table1.common _ field = table2.common _ field;

The other use of DML is to modify the data stored in the database. The UPDATE
command is used to change the data. If the band ComputerKidz recruited an extra
member the following SQL would make the change needed.

UPDATE Band SET MembersNum = 6 WHERE BandName =


‘ComputerKidz’;

Note the use of the WHERE clause. If you forgot to include this the UPDATE
command would change the number of band members to 6 for all of the bands.

The DELETE command is used to remove data from the database. This has to be done
with care. If the ITWizz band decided to disband the following SQL would remove
the name from
the database.

DELETE FROM Band-Booking WHERE BandName = ‘ITWizz’; DELETE


FROM Band WHERE BandName = ‘ITWizz’;

Note that if an attempt was made to carry out the deletion from Band first there would
be an error. This is because BandName is a foreign key in Band-Booking. Any entry
for BandName in Band-Booking must have a corresponding value in Band.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Appendix: Designing Databases

Consider the following problem.

A company employs engineers who service many products. A customer may own
many products but a customer's products are all serviced by the same engineer. When
engineers service products they complete a repair form, one form for each product
repaired. The form contains details of the customer and the product that has been
repaired as well as the Engineer's ID. Each form has a unique reference number.
When a repair is complete, the customer is given a copy of the repair form.

Step 1: highlight all the fields

A company employs engineers who service many products. A customer may own
many products but a customer's products are all serviced by the same engineer. When
engineers service products they complete a repair form, one form for each product
repaired. The form contains details of the customer and the product that has been
repaired as well as the engineer's ID. Each form has a unique reference number.
When an engineer has repaired a product, the customer is given a copy of the repair
form.

This suggests the following entities.

engineer
product
customer
repair form

Step 2: highlight all the verbs between fields

A company employs engineers who service many products. A customer may own
many products but a customer's products are all serviced by the same engineer. When
engineers service products they complete a repair form, one form for each product
repaired. The form contains details of the customer and the product that has been
repaired as well as the Engineer's ID. Each form has a unique reference number.
When a repair is complete, the customer is given a copy of the repair form.

This suggests the following relationships.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Relationship Type Notes


engineer services product many-to-many An engineer services many products
and a product can be serviced by
many engineers. For example,
many engineers service washing
machines.
customer owns product many-to-many A customer may own many products
and a product can be owned by
many customers. For example,
many customers own washing
machines.
engineer completes form one-to-many An engineer completes many forms
but a form is completed by only one
engineer.
customer is given form one-to-many A customer may receive many
forms but a form is given to only
one customer.
product is on form one-to-many Only one product can be on a form
but a product may be on many
different forms.

Which leads to the entity relationship diagram (ERD).

services completes
ENGINEER

serviced completed
by by
PRODUCT FORM

owned by given to

CUSTOMER
owns is given

There are two many-to-many relationships that must be changed to one-to-many


relationships by using link entities. This is shown below.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

services completes
ENGINEER

ENG_PROD

serviced completed
by by
PRODUCT FORM
owned by given to

CUST_PROD

CUSTOMER
owns is given

This suggests the following relations (not all the attributes are given).

ENGINEER(EngineerID, Name, … )
ENG_PROD(EngineerID, ProductID)
PRODUCT(ProductID, Description, … )
CUST_PROD(CustomerID, ProductID)
CUSTOMER(CustomerID, Name, … )
FORM(FormID, CustomerID, EngineerID, … )

These are in 3NF, but you should always check that they are.

SAHIR MASROOR KHAN 03009543074


Computer Science 9618/12

Example Questions

1. A large organisation stores a significant amount of data. Describe the advantages


of storing this data in a database rather than in a series of flat files. (4)
A. -All the data is stored on the same structure and can therefore be accessed
through it.
-The data is not duplicated across different files, consequently…
-there is less danger of data integrity being compromised.
-Data manipulation/input can be achieved more quickly as there is only one
copy of each piece of data.
-Files need to be compatible, this problem does not arise with a database.

2. Explain what is meant by a table being in second normal form.


A. A table is in second normal form if each tuple contains no repeating attributes
(first normal form) and every attribute that is a non primary key is fully
dependent on the primary key.
Notes: Very dependent on the formal definitions, and certainly not expected in
these words in an exam. This is a very difficult idea for a student to convey, any
reasonable attempt that makes sense in plain English must be credited.

3. Every student in a school belongs to a form. Every form has a form tutor and all
the form tutors are members of the teaching body.
Draw an entity relationship diagram to show the relationships between the four
entities STUDENT, FORM, TUTOR, TEACHERS (6)

A.
STUDENT FORM

TEACHERS TUTOR

4. Explain what is meant by a foreign key. (3)


A. -A foreign key is an attribute in one table that is
-a primary key in another.
-It acts as a link between the two tables.

SAHIR MASROOR KHAN 03009543074

You might also like