Database Management Systems Handout


Intermediate Database Management

Table of contents

1. Introduction

1.1 Database Fundamentals


1.1.1 Forms and Levels of Data
1.1.2 Alternative approaches of data handling
1.2 Database Approach
1.2.1 What is a database?
1.2.2 What is DBMS?
1.2.3 DBMS components
1.2.4 Advantages and Disadvantages of DBMS
1.2.5 Users of the database, the DBA
1.3 Cost and Benefit considerations

2. Architecture of a database system

2.1 Conceptual level
2.2 Internal level
2.3 External level
2.4 Other Architectures

3. Database Models

3.1 The relational model


3.2 The non-relational model
3.2.1 Hierarchical model
3.2.2 Network model
3.3 Other Models

4. The relational model

4.1 Basics of relational model


4.2 Relational data objects
4.2.1 Domains and attributes
4.2.2 Relations
4.2.3 Kinds of relations
4.3 Relational data integrity
4.3.1 Entity integrity
4.3.2 Referential integrity
4.4 Applying Database Modeling


5. The Entity-Relationship Model

5.1 Stages in database design


5.2 Designing a Database
5.3 Designing a Relational Database
5.4 Information modeling
5.5 The ER model
5.6 Association
5.7 The ER diagram
5.8 Logical design

6. Normalization

6.1 First Normal Form


6.2 Second Normal Form
6.3 Third Normal Form
6.4 Boyce-Codd Normal Form

7. The Structured Query Language (SQL)

7.1 Data definition


7.2 Data Manipulation
7.3 Manipulating Relational Database Systems
7.4 Using Query to generate summary report
7.5 Using Views


Chapter 1
1. Introduction
1.1 Data, information, information System (Data Processing)
 Data: a collection of raw facts.
 Information: processed data in a form that is meaningful to the user.
 Information system (data processing): a system that:
 Receives data and instructions
 Processes the data according to the instructions
 Produces output
 Stores data/information for future use
1.2 Information System and Organization, database system
 An information system does not exist without organization. That is, data must be
organized when it is voluminous.
 An information system is a support system that helps the organization's activities
achieve a certain goal.
 A database system is basically a computerized record-keeping system. Users of
the database can perform a variety of operations, such as:
 Adding new data to an empty file
 Adding new data to an existing file
 Retrieving data from an existing file
 Modifying data in an existing file
 Deleting data from an existing file
 Searching for target information
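
The operations listed above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the student table, its columns and the sample names are assumptions made for the example, not part of the handout.

```python
import sqlite3

# Hypothetical "student" table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Adding new data to an empty file, then to an existing one
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Abebe')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Sara')")

# Retrieving data from an existing file
all_rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

# Modifying data in an existing file
conn.execute("UPDATE student SET name = 'Sara T.' WHERE id = 2")

# Deleting data from an existing file
conn.execute("DELETE FROM student WHERE id = 1")

# Searching for target information
found = conn.execute("SELECT name FROM student WHERE id = ?", (2,)).fetchone()
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Each statement corresponds to one bullet in the list; the DBMS, not the application, decides how the rows are physically stored.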
1.3 Alternative approaches of data handling
Reasons to study alternative approaches:

I) Understanding the problems in these systems prevents us from repeating them
in database systems.


II) If you need to convert such a system to a database system, understanding how
it works will be extremely useful.

a) Manual

 Data is typed on paper and put in a file cabinet.
 Works well if the number of items to be stored is small.

Disadvantages:

 Data loss: papers may be damaged or impossible to locate.
 Redundancy: multiple copies of the same data within the organization.
 Inconsistency: modifications are not reflected in all copies.
 Prone to error.

b) File based Approach

 File-based approaches were an early attempt to computerize the manual filing
system.
 A file-based system is a collection of application programs that perform services
for the end users.
 Each program defines and manages its own data.
 Each department has its own set of data files.
 Programs were written in general-purpose programming languages.
 Examples of such languages: C++, COBOL, Pascal

1.4 Limitations of file based approach


1. Separation/Isolation of data

When data is isolated in separate files, it is difficult to access data that should be
available, because there is no concept of a relationship between files. To combine data
from several files, we need to create a temporary file from the participating files.

2. Duplication of data (Redundancy)


 Redundancy is the storage of the same information in multiple files.

 Disadvantages of redundancy:

a) It costs time and money to enter the data more than once.
b) It takes up additional storage space (memory space).
c) Inconsistency: a loss of data integrity. For instance, a modification made in the
child table may not be reflected in the parent table.

3. Data Dependence

 Changes to an existing structure are difficult to make.
 Example: changing the size of Student Name (from 20 characters to 30
characters) requires a new program to convert the student file to the new format.
 The new program opens the original student file, opens a temporary file, reads records
from the original student file and writes them to the temporary file, deletes the original
student file and finally renames the temporary file as the student file.
 This is time consuming.
 It is prone to error.
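
The conversion steps described above can be sketched in Python as follows. The record layout is an assumption made for illustration: one record per line, a hypothetical 5-character student ID followed by the name padded to a fixed width.

```python
import os
import tempfile

ID_WIDTH = 5          # hypothetical ID field, assumed for this sketch
OLD_NAME_WIDTH = 20   # name width expected by the old programs
NEW_NAME_WIDTH = 30   # name width required by the new format

def convert_student_file(path):
    """Rewrite a fixed-width student file so that names occupy 30 characters."""
    tmp_path = path + ".tmp"
    # Open the original file and a temporary file
    with open(path) as src, open(tmp_path, "w") as dst:
        # Read each record from the original and write it in the new format
        for line in src:
            record = line.rstrip("\n")
            student_id = record[:ID_WIDTH]
            name = record[ID_WIDTH:ID_WIDTH + OLD_NAME_WIDTH]
            dst.write(student_id + name.ljust(NEW_NAME_WIDTH) + "\n")
    # Delete the original file and rename the temporary file
    os.remove(path)
    os.rename(tmp_path, path)

def demo():
    """Build a one-record file in the old format, convert it, return the result."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "student.dat")
        with open(path, "w") as f:
            f.write("00001" + "Abebe".ljust(OLD_NAME_WIDTH) + "\n")
        convert_student_file(path)
        with open(path) as f:
            return f.read()
```

Every program that reads the student file must also be changed to expect the new width, which is why such a small change is costly in a file-based system.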

4. Incompatible file formats

 The structure of a file depends on the application programs that use it.
 Incompatibility of files makes them difficult to process jointly.
 Example: consider two files within the same enterprise but in different
departments, or in different branches:
If the first file is constructed using COBOL and the second file is written using
C++, their formats may be incompatible, which makes it hard to combine their data.

(1.2) Database Approach

1.2.1 What is a database?


 A database is an organized collection of related data.
 Most of the time, the organization is in tabular form.
 E.g. a book database


Call no   Title                   Author   Publisher        No of copies
QA46      Introduction to dbase   Colony   Addison Wesley   15

The organization of the database becomes necessary when the data is voluminous.
Otherwise, managing data will be very difficult.
E.g. A Manufacturing Company with product data
A Bank with account data
A Hospital with patients
A University with Students
A government with planning data

1.2.2 What is a database system?


 It is a computerized record keeping system, which stores related data in an
organized way.
 The overall purpose of a database system is to store information and to allow
users to add, delete, retrieve, search, query and update that information upon
request.

1.2.3 What is a database management system (DBMS)?

A DBMS is software that enables users to define, create, maintain and control access to
the database.
Examples: MS Access, FoxPro, SQL Server, MySQL, Oracle …

1.2.4 Components of a database system


A database system consists of four major components:

(I) Data: is the core component of the database system.

 The data stored in the database can be thought of as several distinct data files.
 In a multi-user environment, data can be shared.


 A database contains both the operational data & the meta-data (data about data).

(II) Hardware: - The processor, storage media and the memory of the computer system

 Has impact on the overall performance of the database system.


 Hardware can range from a single PC, to a mainframe computer, to a network of
computers.
 The particular hardware depends on the organization’s requirement and the DBMS
used.

(III) Software: The database management system allows the user to interact with the
data.

 It provides facilities to perform different operations,
e.g. creation, insertion, update, retrieval, deletion, etc.
 It shields the data from unexpected operations that could damage it.

(IV) Users: Users can be:

a) Application developers: write programs that use the data.
b) End users: interact with the system using the query language provided by the DBMS.
c) Database administrator: controls the enterprise's data resources.

1.3 Why DBMS?

Advantages of the database approach over the previous data handling approaches:

a) Compactness: no need for voluminous paper files.

b) Speed: searches are fast in computerized systems.

c) Fewer errors: errors can be reduced.

d) Timeliness: accurate and up-to-date information is available at any time.


e) Data redundancy is reduced by integrating the files so that multiple copies of the
same data will not be stored.

f) Data can be shared by all authorized users.

g) Improved data integrity.

Database integrity refers to the validity and consistency of stored data. Integrity is
usually expressed in terms of constraints, which are consistency rules that the
database is not permitted to violate.
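
As a minimal sketch of a constraint acting as a consistency rule, the following Python fragment uses the built-in sqlite3 module. The account table and the rule "a balance may not be negative" are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table: the CHECK clause is the consistency rule.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 500)")      # satisfies the rule

try:
    conn.execute("INSERT INTO account VALUES (2, -50)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True                                      # the DBMS refused the row

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

The rule is declared once, in the database, so every program that uses the table is subject to it.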

h) Standards can be enforced.

Standards may cover data formats (to facilitate exchange of data between systems), naming
conventions, documentation standards, update procedures, and access rules.
E.g. Telephone: (011)312341 or 557796 or 33-29-48
City: Addis Ababa, A.A
I) Program-data independence: the separation of data descriptions (metadata) from
the application programs that use the data is called data independence. E.g. changing the
length of the employee name field from 25 to 30 characters does not affect the application program.

J) Improved security

Protection of the database from unauthorized users.
E.g. usernames and passwords managed by the DBA.
Example: the DBA has access to all the data in the database; a branch manager may
have access to all data that relates to his or her branch office; and a sales assistant
may not have access to sensitive data such as staff salary details.

K) Improved backup and recovery services.

E.g. transaction logs.
E.g. a file-based approach takes a nightly backup of the data. In the event of a failure
during the next day, the backup is restored and the work that has taken place since
the last backup is lost and has to be re-entered. In contrast, modern DBMSs
provide facilities (such as transaction logs) to minimize the amount of processing that
is lost following a failure.


Disadvantages of DBMS:

A) Cost: the DBMS itself, hardware, preparation and conversion.
B) Need for specialized database personnel.
C) Increased vulnerability: since resources are centralized, there is an increase in
security risk.

1.4 Database Administrator (DBA)


 The DBA is the person responsible for designing the database and carrying out most
database system activities.
 The DBA must have knowledge of the DBMS.

Tasks of DBA:

a) Preliminary database planning: preliminary investigation and feasibility study.


b) Identifying user requirements.
c) Designing the logical model.
d) Choosing a DBMS that best fits the environment (HW and SW) and the database
specification.
e) Developing and maintaining a data dictionary.
f) Defining integrity rules.
g) Defining backups and security rules.
h) Developing and enforcing data standards.
i) Training end users.
j) Developing operational procedures, such as: starting the system every morning,
security procedures, authorization procedures, recording HW and SW failures,
shutting down the system at the end of each day.
k) Monitoring the system performance.
l) Suggesting updates to the system.


1.5 Cost and Benefit analysis

Cost and benefit analysis is required; computerization should be carried out if the
benefits outweigh the costs.

A) Tangible costs and benefits

Any cost or benefit which can be measured in monetary terms.

Example: Tangible costs:
 Purchase of HW and SW
 Employment cost/additional manpower
 Additional room costs, etc.
Tangible benefits:
 Increased productivity
 Error-free report generation
 Quick retrieval of information
 On-time completion of the job
 More customers, etc.
B) Intangible costs and benefits
 Any cost or benefit which cannot be measured, or is difficult to measure, in
monetary terms.
 Example: Intangible benefits:
 Customer satisfaction
 Good image of the organization, etc.
Intangible costs:
 Organizational disruption by the new technology
 The need for greater coordination between data processing groups, …
Exercise:

a) Define a data warehouse


b) Write the purpose of a data warehouse
c) Give 3 different examples that can be stored in a data warehouse


Chapter 2
Architecture of a database system
The objective of the three-level architecture is to separate each user's view of the
database from the way the database is physically represented. That is, to provide users
with an abstract view of data by hiding certain details of how data is stored and
manipulated.

2.1 Parts of the Architecture


The following architecture is a standard designed by ANSI and is applicable to most
modern database systems. The architecture is composed of three levels:

a) Internal Level
b) Conceptual Level
c) External Level

[Figure 1: The Three Levels of Data Abstraction. From top to bottom: User View 1,
User View 2 and User View 3; the Conceptual Schema; the Internal Schema.]


Note: Each organization has one physical and one conceptual schema and one or more
user views. This means the three-level architecture defines one database with
multiple views.

For the system to be usable, it must retrieve data efficiently. This concern has led to the
design of complex data structures for the representation of data in the database. Since
many database system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users' interactions with the
system:

a) Internal Level
 It is the one closest to physical storage. That is, it is concerned with the way the
data is physically stored.
 It is the way the DBMS and the Operating system perceive the data.
 It is concerned with how fields are represented, what physical sequence the stored
records are in, …
 The physical schema contains the specifications for how data from the conceptual
schema are stored in a computer's secondary memory.
 It deals with assembly and similar language commands.

b) Conceptual/Logical level
 It is the community view of the database as seen by the DBA.
 It is a representation of the entire information in a particular enterprise.
 This is the next higher level of abstraction, above the internal level. It describes
what data are stored in the database, and what relationships exist among those
data.
 It includes Entity-Relationship modeling, security and integrity constraints.
 Although implementation of the simple structure at the logical level may involve
complex physical-level structures, the user of the logical level does not need to be
aware of this complexity.
 The logical level of abstraction is used by database administrators, who must
decide what information should be kept in the database.


c) External/View level

 The highest level of abstraction describes only some part of the entire database.
 It is concerned with how individual users view the database.
 Despite the use of simpler structures at the logical level, some complexity
remains because of the large size of the database. Many users of the database
system will not be concerned with all this information; instead, such users need
to access only a part of the database. The view level of abstraction is defined so
that their interaction with the system is simplified.
 The system provides many views for the same database.

2.2 Abstract View of Data


A DBMS is a collection of interrelated files and a set of programs that allow users to
access and modify these files. A major purpose of a database system is to provide users
with an abstract view of the data. That is, the system hides certain details of how the data
is stored and maintained.

2.3 Instances and Schemas


Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database. The overall design of the database is called the database schema. Schemas are
not changed frequently.

Analogies to the concepts of data types, variables, and values in programming languages
are useful here. Consider a Supplier record type definition. To declare a variable of this
type in a C++-like language, we write

Supplier supplier1;

Variable supplier1 now corresponds to an area of storage containing a Supplier type
record.
A database schema corresponds to the programming language type definition. A variable
of a given type has a particular value at a given instant. Thus, the value of a variable in
programming languages corresponds to an instance of a database schema.
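
The same analogy can be sketched in Python. The field names of Supplier are hypothetical (the handout does not list them); the class plays the role of the schema, and the value held by supplier1 at a given moment plays the role of an instance.

```python
from dataclasses import dataclass, fields

@dataclass
class Supplier:          # type definition: analogous to the database schema
    supplier_no: int     # hypothetical fields, assumed for this sketch
    name: str
    city: str

# The value of the variable at a given instant is analogous to an instance.
supplier1 = Supplier(1, "Smith", "London")
before = (supplier1.name, supplier1.city)

supplier1.city = "Paris"                     # the value changes over time...
after = (supplier1.name, supplier1.city)

# ...while the type definition stays the same.
field_names = [f.name for f in fields(Supplier)]
```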

Database systems have several schemas, partitioned according to the levels of abstraction
that we discussed. At the lowest level is the physical schema; at the intermediate level is
the logical schema; and at the highest level is a subschema. In general, database systems
support one physical schema, one logical schema and several subschemas.

2.4 Mappings
 The DBMS is responsible for mapping between the 3 types of schema.
 The conceptual schema is related to the internal schema through a
Conceptual/Internal mapping.
 Each External schema is related to the conceptual schema by the
External/Conceptual mapping.

2.5 Data Independence


The ability to modify a schema definition in one level without affecting a schema
definition in the next higher level is called data independence. There are two levels of
data independence:

Physical data Independence: is the ability to modify the physical schema without
causing application programs to be rewritten. Modifications at the physical level are
occasionally necessary to improve performance.

Logical data independence: is the ability to modify the logical schema without causing
application programs to be rewritten. Modifications at the logical level are necessary
whenever the logical structure of the database is altered (for example, when money-
market accounts are added to a banking system).
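
Physical data independence can be illustrated with a small Python sketch using the built-in sqlite3 module. The table layout follows the deposit relation used later in this handout; the sample rows and the index name are assumptions. Adding an index is one possible change at the physical level, and the query text does not need to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deposit (bname TEXT, account_no INTEGER, cname TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO deposit VALUES (?, ?, ?, ?)",
    [("Downtown", 101, "Johnson", 500), ("Redwood", 215, "Smith", 700)],  # sample rows
)

QUERY = "SELECT cname, balance FROM deposit WHERE bname = 'Downtown'"
before = conn.execute(QUERY).fetchall()

# Change at the physical level: add an index to speed up lookups by branch.
conn.execute("CREATE INDEX idx_deposit_bname ON deposit (bname)")

after = conn.execute(QUERY).fetchall()   # same query text, same answer
```

The application's query is untouched by the physical change; only the way the DBMS finds the rows may differ.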


Chapter 3
Database Models
A model is a representation of real-world objects and events and their associations.
Data models can be divided into four types:

1) Hierarchical Model
2) Network Model
3) Relational Model
4) Object oriented Model

1) Hierarchical Model
 Consists of an ordered set of trees arranged in a parent-child fashion.
 Records are represented by rectangles.
 Allows a node to have only one parent.
 It has two data structure concepts:
 Records and PCR (parent child relationship).
 1-1 or 1-M link is allowed.
 Connection between child and its parent is called a Link.

Advantage of Tree Model


 Good for tree type problem (e.g. Family Tree Problem)

Disadvantages of Tree Model

 We must write a program


 Addition, deletion, and search operations are very difficult.
 There is duplication of data.
 Records are ordered.
 Complex programming is required.


2) Network data Model


 Represents data in a database as a collection of records but, unlike the
hierarchical model, allows a node to have any number of parents.
 A group of linked records is called a set.
 It is an extension of the hierarchical model.

Advantage of Network Model

 Easy to show the connection of items.


 Good for network type problem.
 Duplication of data is reduced as compared to hierarchical model.
 Allows M-M relationships.

Disadvantages of Network Model:


 Complexity problem.
 Addition, deletion, search operations are difficult.
 Programming is required.

3) Relational Model
 A relational system is a system in which the data in a database is perceived by the
user as tables.
 It is based on the mathematical theory of relations.
 New tables are generated from existing tables by using data manipulation operations.
 Most current DBMS technologies use the relational model.

Properties of relations

 No duplication of records:
Mathematical sets also do not include duplicate elements.
 Records are unordered:
Sets in mathematics are not ordered.
 Fields are unordered.
 All field values are atomic.


Terminologies in Relational Model

 Relation: a table with columns and rows.


 Attribute: it is a named column of a relation.
 Domain: is the set of allowable values for one or more attributes.
 Tuple: it is a row of a relation.
 Degree: is the number of attributes in a relation.
 Cardinality: is the number of rows in a relation.
 Relational Database: is a collection of normalized relations with distinct relation
names.
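
A small Python sketch may help fix these terms. It models a relation as a set of tuples, which also shows the "no duplicates" and "unordered" properties of relations. The customer rows are invented sample data.

```python
# Hypothetical customer relation: attribute names plus a set of tuples.
attributes = ("cname", "street", "ccity")
customer = {
    ("Johnson", "Alma", "Palo Alto"),
    ("Smith", "North", "Rye"),
    ("Smith", "North", "Rye"),      # duplicate: a set keeps only one copy
}

degree = len(attributes)        # number of attributes (columns)
cardinality = len(customer)     # number of tuples (rows)

# Tuples are unordered: listing them in another order gives the same relation.
same_relation = customer == {
    ("Smith", "North", "Rye"),
    ("Johnson", "Alma", "Palo Alto"),
}
```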

Exercise: What is the difference between relation, table and file?


What is the difference between tuple, row and record?
What is the difference between attribute, column and field?
What is the main reason for choosing the relational model over the tree and
network models?

Implementations of multi-user database management systems

There are three types of implementations:


a) Teleprocessing
b) File Server
c) Client Server

Exercise: a) Describe the above implementations


b) List the advantages and disadvantages of each implementation
c) Finally, compare and contrast the three implementations.


Chapter 4
The Relational Model
Introduction
In the relational model, the data is perceived by the user as tables.

History

The relational model was invented by E.F. (Ted) Codd as a general model of data, and
subsequently maintained and developed by Chris Date and Hugh Darwen.

Structure of Relational Database


1. A relational database consists of a collection of tables, each having a unique
name.
2. A row in a table represents a relationship among a set of values.
3. Thus a table represents a collection of relationships.
4. There is a direct correspondence between the concept of a table and the
mathematical concept of a relation. A substantial theory has been developed for
relational databases.

Basic Structure

1. The following tables show deposit and customer tables for our banking example.

Figure 3.1: The deposit and customer relations.


o The deposit relation has four attributes.


o For each attribute there is a permitted set of values, called the domain of
that attribute.
o E.g. the domain of branchname is the set of all branch names.

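Let $D_1$ denote the domain of branchname, and $D_2$, $D_3$ and $D_4$ the remaining
attributes' domains respectively.

Then, any row of deposit consists of a four-tuple $(v_1, v_2, v_3, v_4)$ where
$v_1 \in D_1$, $v_2 \in D_2$, $v_3 \in D_3$, $v_4 \in D_4$.

In general, deposit contains a subset of the set of all possible rows.

That is, deposit is a subset of $D_1 \times D_2 \times D_3 \times D_4$.

In general, a table of $n$ columns must be a subset of $D_1 \times D_2 \times \cdots \times D_n$.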

2. Mathematicians define a relation to be a subset of a Cartesian product of a list of
domains. You can see the correspondence with our tables.

We will use the terms relation and tuple in place of table and row from now on.

3. Some more formalities:


o let the tuple variable $t$ refer to a tuple of the relation $r$.
o We say $t \in r$ to denote that the tuple $t$ is in relation $r$.
o Then $t$[bname] = $t$[1] = the value of $t$ on the bname attribute.
o So $t$[bname] = $t$[1] = "Downtown",
o and $t$[cname] = $t$[3] = "Johnson".
4. We'll also require that the domains of all attributes be indivisible units.
o A domain is atomic if its elements are indivisible units.
o For example, the set of integers is an atomic domain.
o The set of all sets of integers is not.
o Why? Integers do not have subparts, but sets do - the integers comprising
them.
o We could consider integers non-atomic if we thought of them as ordered
lists of digits.
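
The idea of a relation as a subset of a Cartesian product of domains can be sketched in Python with itertools.product. The domains below are deliberately tiny and hypothetical so that the full product can be enumerated; real domains such as "all strings" cannot.

```python
from itertools import product

# Tiny hypothetical domains, one per attribute of deposit.
D1 = {"Downtown", "Redwood"}      # branch names
D2 = {101, 215}                   # account numbers
D3 = {"Johnson", "Smith"}         # customer names
D4 = {500, 700}                   # balances

all_possible_rows = set(product(D1, D2, D3, D4))   # D1 x D2 x D3 x D4

deposit = {
    ("Downtown", 101, "Johnson", 500),
    ("Redwood", 215, "Smith", 700),
}
is_relation = deposit <= all_possible_rows   # subset test

t = ("Downtown", 101, "Johnson", 500)
t_in_r = t in deposit        # "t is in relation r"
bname_value = t[0]           # t[1] in the 1-based notation used in the text
cname_value = t[2]           # t[3] in the 1-based notation used in the text
```

Note that Python indexes tuples from 0, whereas the notation in the text counts attributes from 1.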


Database Scheme

1. We distinguish between a database scheme (logical design) and a database
instance (data in the database at a point in time).
2. A relation scheme is a list of attributes and their corresponding domains.
3. The text uses the following conventions:

o italics for all names


o lowercase names for relations and attributes
o names beginning with an uppercase for relation schemes

These notes will do the same.

For example, the relation scheme for the deposit relation:


o Deposit-scheme = (branchname, account#, customername, balance)

We may state that deposit is a relation on scheme Deposit-scheme by writing
deposit (Deposit-scheme).

If we wish to specify domains, we can write:


o (branchname: string, account#: integer, customername: string, balance:
integer).
Note that customers are identified by name. In the real world, this would not be
allowed, as two or more customers might share the same name.

Figure 3.2 shows the E-R diagram for a banking enterprise.


4. The relation schemes for the banking example used throughout the text are:

o Branch-scheme = (bname, assets, bcity)


o Customer-scheme = (cname, street, ccity)
o Deposit-scheme = (bname, account#, cname, balance)
o Borrow-scheme = (bname, loan#, cname, amount)

Note: some attributes appear in several relation schemes (e.g. bname, cname).
This is legal, and provides a way of relating tuples of distinct relations.

5. Why not put all attributes in one relation?

Suppose we use one large relation instead of customer and deposit:

o Account-scheme = (bname, account#, cname, balance, street, ccity)

o If a customer has several accounts, we must duplicate her or his address
for each account.
o If a customer has an account but no current address, we cannot build a
tuple, as we have no values for the address.
o We would have to use null values for these fields.
o Null values cause difficulties in the database.
o By using two separate relations, we can record the account without using
null values.
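The two problems listed above can be seen in a small sketch. It builds the single large relation from made-up customer and deposit tuples by matching cname; all data values are hypothetical.

```python
# Hypothetical sample data for Customer-scheme and Deposit-scheme.
customer = {("Johnson", "Alma", "Palo Alto")}    # (cname, street, ccity)
deposit = {
    ("Downtown", 101, "Johnson", 500),           # (bname, account#, cname, balance)
    ("Redwood", 215, "Johnson", 700),
    ("Perryridge", 102, "Hayes", 400),           # Hayes has no known address
}

# One large relation on Account-scheme, built by matching cname.
account = {
    (bname, acct, cname, bal, street, ccity)
    for (bname, acct, cname, bal) in deposit
    for (cname2, street, ccity) in customer
    if cname == cname2
}

# Johnson's address is stored once per account.
address_copies = sum(1 for row in account if row[2] == "Johnson")
# Hayes's account cannot appear at all unless nulls are allowed.
hayes_rows = [row for row in account if row[2] == "Hayes"]
print(address_copies, hayes_rows)
```

With the two separate relations, the address is stored once and the Hayes account is still recorded in deposit.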

Keys

1. The notions of superkey, candidate key and primary key all apply to the
relational model.
2. For example, in Branch-scheme,

o {bname} is a superkey.
o {bname, bcity} is a superkey.
o {bname, bcity} is not a candidate key, as the superkey {bname} is
contained in it.
o {bname} is a candidate key.
o {bcity} is not a superkey, as branches may be in the same city.
o We will use {bname} as our primary key.

3. The primary key for Customer-scheme is {cname}.


4. More formally, if we say that a subset $K$ of $R$ is a superkey for $R$, we are
restricting consideration to relations $r(R)$ in which no two distinct tuples have the
same values on all attributes in $K$. In other words,
o If $t_1$ and $t_2$ are in $r$, and
o $t_1 \neq t_2$,
o Then $t_1[K] \neq t_2[K]$.
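The superkey condition can be checked mechanically against one relation instance, as in the sketch below. The function names and the branch data are hypothetical. Note that a key is a constraint on every legal instance, so a check on sample data can refute a proposed key but cannot prove it.

```python
from itertools import combinations

branch = [  # made-up tuples on Branch-scheme
    {"bname": "Downtown", "assets": 9000000, "bcity": "Brooklyn"},
    {"bname": "Brighton", "assets": 7100000, "bcity": "Brooklyn"},
    {"bname": "Redwood", "assets": 2100000, "bcity": "Palo Alto"},
]

def is_superkey(relation, key):
    """True if no two distinct tuples agree on all attributes in key."""
    seen = set()
    for t in relation:
        projection = tuple(t[a] for a in key)
        if projection in seen:
            return False
        seen.add(projection)
    return True

def is_candidate_key(relation, key):
    """True if key is a superkey and no proper subset of it is a superkey."""
    if not is_superkey(relation, key):
        return False
    return not any(
        is_superkey(relation, subset)
        for size in range(len(key))
        for subset in combinations(key, size)
    )

print(is_superkey(branch, ("bname", "bcity")), is_candidate_key(branch, ("bname", "bcity")))
```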

Query Languages

1. A query language is a language in which a user requests information from a
database. These are typically higher-level than programming languages.

They may be one of:

o Procedural, where the user instructs the system to perform a sequence of
operations on the database. This will compute the desired information.
o Nonprocedural, where the user specifies the information desired without
giving a procedure for obtaining the information.

2. A complete query language also contains facilities to insert and delete tuples as
well as to modify parts of existing tuples.
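To make the procedural and nonprocedural styles concrete, the sketch below answers the same question twice: once with an explicit loop, and once with an SQL statement run through Python's built-in sqlite3 module. The table name, column names, and data are assumptions made for this illustration.

```python
import sqlite3

rows = [("Downtown", 101, "Johnson", 500),
        ("Redwood", 215, "Smith", 700),
        ("Downtown", 305, "Hayes", 1200)]

# Procedural style: spell out the sequence of operations.
procedural = []
for bname, account_no, cname, balance in rows:
    if balance > 600:
        procedural.append(cname)
procedural.sort()

# Nonprocedural style: state what is wanted and let the system decide how to get it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deposit (bname TEXT, account_no INTEGER, cname TEXT, balance INTEGER)")
conn.executemany("INSERT INTO deposit VALUES (?, ?, ?, ?)", rows)
declarative = [r[0] for r in conn.execute(
    "SELECT cname FROM deposit WHERE balance > 600 ORDER BY cname")]
conn.close()

print(procedural == declarative)
```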

Mathematical Foundation

The fundamental assumption of the relational model is that all data are represented as
mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of
n domains. In the mathematical model, reasoning about such data is done in two-valued
predicate logic, meaning there are two possible evaluations for each proposition: either
true or false (and in particular no third value such as unknown, or not applicable, either of
which are often associated with the concept of NULL). Data are operated upon by means
of a relational calculus or algebra, these being equivalent in expressive power.
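A minimal sketch of the definition, assuming two small made-up domains: the Cartesian product contains every possible pair, and a binary relation is any subset of it.

```python
from itertools import product

names = {"Downtown", "Redwood"}        # domain 1 (hypothetical values)
cities = {"Brooklyn", "Palo Alto"}     # domain 2 (hypothetical values)

cartesian = set(product(names, cities))   # every possible (name, city) pair
branch_city = {("Downtown", "Brooklyn"), ("Redwood", "Palo Alto")}  # a binary relation

print(branch_city <= cartesian)  # a relation is a subset of the Cartesian product
```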

The relational model of data permits the database designer to create a consistent, logical
representation of information. Consistency is achieved by including declared constraints
in the database design, which is usually referred to as the logical schema. The theory
includes a process of database normalization whereby a design with certain desirable
properties can be selected from a set of logically equivalent alternatives.


The basic relational building block is the domain or data type, usually abbreviated
nowadays to type. A tuple is an unordered set of attribute values. An attribute is an
ordered pair of attribute name and type name. An attribute value is a specific valid value
for the type of the attribute. This can be either a scalar value or a more complex type.

A relation consists of a heading and a body. A heading is a set of attributes. A body (of
an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of
each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and the relational database
model, a set is an unordered collection of items, although some DBMSs impose an order
on their data. In mathematics, a tuple has an order and allows for duplication.

A table is an accepted visual representation of a relation; a tuple is similar to the concept
of a row, but note that in the database language SQL the columns of a table are ordered,
and a query can return rows in a specified order.

Interpretation

To fully appreciate the relational model of data it is essential to understand the intended
interpretation of a relation.

The body of a relation is sometimes called its extension. This is because it is to be
interpreted as a representation of the extension of some predicate, this being the set of
true propositions that can be formed by replacing each free variable in that predicate by a
name (a term that designates something).

There is a one-to-one correspondence between the free variables of the predicate and the
attribute names of the relation heading. Each tuple of the relation body provides attribute
values to instantiate the predicate by substituting each of its free variables. The result is a
proposition that is deemed, on account of the appearance of the tuple in the relation body,
to be true. Contrariwise, every tuple whose heading conforms to that of the relation but
which does not appear in the body is deemed to be false.
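The interpretation can be sketched as follows, assuming a hypothetical predicate "Branch BNAME is located in city BCITY" and made-up tuples. Each tuple instantiates the predicate, and a proposition counts as true exactly when its tuple is in the body.

```python
# Hypothetical predicate with free variables bname and bcity.
def proposition(tup):
    return f"Branch {tup['bname']} is located in city {tup['bcity']}."

# Relation body: tuples as unordered name -> value mappings (made-up data).
body = [{"bname": "Downtown", "bcity": "Brooklyn"},
        {"bname": "Redwood", "bcity": "Palo Alto"}]

def deemed_true(tup, body):
    """A proposition is deemed true exactly when its tuple appears in the body."""
    return tup in body

true_propositions = [proposition(t) for t in body]
print(true_propositions)
```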


Application to databases

A type in a typical relational database might be an int, a char, a boolean, and so on. A
type name would be the string "int", "char", "boolean", etc. An attribute is part of
declaring a table data structure and specifies a column name and the type of the value that
goes under it. A tuple is basically the same thing as a row, except that in an SQL DBMS
the column values within a row are ordered, whereas the attribute values of a tuple are
not. An attribute name might be "name" or "age". An attribute value is a specific entry
in a specific column and row, such as "John Doe" or "35".

A relation is a specification for a table data structure along with the data in it. A heading
is defined by the declaration of the table data structure. A body is the data that goes into
the table data structure.

SQL standard

SQL, initially pushed as the standard language for relational databases, deviates from the
relational model in several places. The current ISO SQL standard doesn't mention the
relational model or use relational terms or concepts. However, it is possible to create a
database conforming to the relational model using SQL if one does not use certain SQL
features.

The following deviations from the relational model have been noted in SQL. Note that
few database servers implement the entire SQL standard and in particular do not allow
some of these deviations. Whereas NULL is nearly ubiquitous, for example, allowing
duplicate column names within a table or anonymous columns is uncommon.

Duplicate rows

The same row can appear more than once in an SQL table. The same tuple cannot appear
more than once in a relation.
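The difference can be observed with Python's built-in sqlite3 module. This is only a sketch; the table t and its data are made up, and no key is declared on the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cname TEXT, ccity TEXT)")   # no key declared
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("Smith", "Rye"), ("Smith", "Rye")])
sql_rows = conn.execute("SELECT cname, ccity FROM t").fetchall()
conn.close()

relation = set(sql_rows)   # a set keeps each tuple at most once
print(len(sql_rows), len(relation))
```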


Anonymous columns

A column in an SQL table can be unnamed and thus unable to be referenced in
expressions. The relational model requires every attribute to be named and
referenceable.

Duplicate column names

Two or more columns of the same SQL table can have the same name and are thus
unable to be referenced, on account of the obvious ambiguity. The relational model
requires every attribute to be referenceable.

Column order significance


The order of columns in an SQL table is defined and significant, one consequence being
that SQL's implementations of Cartesian product and union are both noncommutative.
The relational model requires there to be no significance to any ordering of the attributes
of a relation.
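One visible effect of column order can be sketched with sqlite3: the two operands of UNION are matched by position, and the result takes its column names from the left operand, so swapping the operands changes the heading. This shows behavior as observed in SQLite; other systems may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def column_names(sql):
    """Return the column names of the result of a query."""
    return [d[0] for d in conn.execute(sql).description]

# The same two operands, combined in either order.
left_first = column_names("SELECT 1 AS a, 2 AS b UNION SELECT 3 AS b, 4 AS a")
right_first = column_names("SELECT 3 AS b, 4 AS a UNION SELECT 1 AS a, 2 AS b")
conn.close()

print(left_first, right_first)
```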

Views without CHECK OPTION


Updates to a view defined without CHECK OPTION can be accepted but the resulting
update to the database does not necessarily have the expressed effect on its target. For
example, an invocation of INSERT can be accepted but the inserted rows might not all
appear in the view, or an invocation of UPDATE can result in rows disappearing from the
view.

Column-less tables unrecognized


SQL requires every table to have at least one column, but there are two relations of
degree zero (of cardinality one and zero) and they are needed to represent extensions of
predicates that contain no free variables.
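The two relations of degree zero are easy to write down with Python sets, as a sketch. The names TABLE_DEE and TABLE_DUM follow C. J. Date's usage and do not appear in this handout.

```python
# A tuple of degree zero has no attribute values: the empty tuple ().
TABLE_DEE = {()}     # degree 0, cardinality 1: the predicate is true
TABLE_DUM = set()    # degree 0, cardinality 0: the predicate is false

print(len(TABLE_DEE), len(TABLE_DUM))
```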

NULL

This special mark can appear instead of a value wherever a value can appear in SQL, in
particular in place of a column value in some row. It is used to indicate values that are
unknown for the time being. The comparison of NULL with something other than itself
does not yield false but instead yields unknown, and the comparison of NULL with itself
likewise yields unknown rather than true. It is because of this behavior in comparisons
that NULL is described as a mark rather than a value.
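The comparison behavior can be checked with sqlite3, as a small sketch. The Python sqlite3 module returns SQL NULL, and therefore the truth value unknown, as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three expressions: NULL compared with a value, NULL compared with NULL,
# and the IS NULL test, which is the supported way to look for NULLs.
null_vs_value, null_vs_null, null_is_null = conn.execute(
    "SELECT NULL = 1, NULL = NULL, NULL IS NULL").fetchone()
conn.close()

print(null_vs_value, null_vs_null, null_is_null)
```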


Chapter 5
The Entity-Relationship Model
Introduction
An entity-relationship model (ERM) provides a high-level description of a conceptual
data model. ER modeling also provides a graphical notation for representing such data
models in the form of entity-relationship diagrams (ERDs).

The whole purpose of ER modeling is to create an accurate reflection of the real world in
a database. The ER model doesn’t actually give us a database description. It gives us an
intermediate step from which it is easy to define a database. Let’s look at an example.

The E-R data model is based on a perception of a real world that consists of a set of basic
objects called entities, and of relationships among these objects. It was developed to
facilitate database design by allowing the specification of an enterprise schema, which
represents the overall logical structure of a database.

The E-R data model is one of several semantic data models; the semantic aspect of the
model lies in the attempt to represent the meaning of the data. The E-R model is
extremely useful in mapping the meanings and interactions of real-world enterprises onto
a conceptual scheme. Because of this utility, many database design tools draw on
concepts from the E-R model.

Entity-Relationship Model (ER Model)

The ER model is a data model in which information stored in the database is viewed as
sets of entities and sets of relationships among entities. There are three basic notions that
the ER model employs: entity sets, relationships, and attributes.


Entity: something that exists and can be distinguished from other entities.
Examples:

o customer entities with unique social security numbers
o account entities with unique account numbers

Entity set: a set of entities of the same type.

o Example: all of the account entities for a bank
o Entity sets need not be disjoint. Example: a person entity could be in both the
customer and employee sets.

Suppose you are presented with the following situation and are told to create a database
for it:

Every department within our company is in only one division. Each division has more
than one department in it. We don’t have an upper limit on the number of departments
that a division can have. For example, the New Business Development department (the
one managed by Mackenzie) and the Higher Education department are both in the
Marketing division.

An entity type is a collection of entities that share a common definition. An entity is a
person, place, concept, or thing about which the business needs data.

For instance, Department is the name of one entity type. The Marketing division is an
instance of the Division entity type. Mackenzie is one instance of the Employee entity
type. Instances of entity types are referred to as entities. Person is an idea (entity type)
while Scott, Nancy, Lindsey, and Mackenzie are touchable (entities). Entity types provide
us with a means for making generalizations about entities.

For example, instead of saying “Every department within our company is in only one
division,” we could have gone down the list of all departments (that is, all entities with
entity type Department) and asserted that each one is, indeed, in one division.

In ER modeling we look for relationships among entity types because it is easier and
more concise to speak of relationships among general entity types rather than the
touchable entities themselves.


Determining the relationships among entity types is another important step in the process
of ER modeling.

A relationship is an association between entity types.

The defining characteristic of a relationship is that several entity types are involved. So
something like a name or birth date would not be a relationship since only one entity is
involved.

Now we have identified three entity types (Employee, Department, Division) and two
relationships among these entity types (manages, contains).

ER models are usually represented graphically. The language we are going to use
represents entity types as rectangles and relationships as diamonds. Below is the
representation of the situation we are working with.

Notice that the “contains” relationship is drawn between the two entity types that it
associates. The same holds for the “manages” relationship. This (simplified) ER model
tells us that:

o Divisions are related to departments through a relationship called contains.
o Departments are related to employees through a relationship called manages.
o Employees are not directly related to divisions.

Consider the relationship between divisions and departments. We know that divisions
have multiple departments and departments can only be contained within one division.
That is, for every one division there can be many departments. In the language of ER
modeling this is called a 1:M (read: “one to many”) relationship.

5.1 Relationships
Relationships define which entity types are directly associated with which other entity
types. In the example in an earlier section, we saw that divisions are directly associated
with departments and departments are directly associated with employees. No direct
association between division and employee was given.


This does not mean that there is no relationship between division and employee. In fact,
the ER diagram tells us that there is a relationship between the two:
Given any one division, there can be many employees managing departments within that
division.
An ER diagram should contain the minimum number of relationships necessary to reflect
the situation.

5.1.1 Cardinality
Once a relationship between entity types has been established, the analyst should
determine its cardinality.

A relationship’s cardinality defines the maximum number of entities of one type that can
be associated with an entity of another type.

For relationships between two entity types, there are three basic cardinalities. Each of the
following descriptions is given in terms of a relationship between entity type X and
entity type Y.

1:1 (one-to-one): One entity of type X can be associated with, at most, one entity of
type Y. One entity of type Y can be associated with, at most, one entity of type X.

1:M (one-to-many): One entity of type X can be associated with many entities of
type Y. One entity of type Y can be associated with, at most, one entity of type X.

M:M (many-to-many): One entity of type X can be associated with many entities of
type Y. One entity of type Y can be associated with many entities of type X.

Consider the cardinality of each of the following relationships:

1. Patient under care of primary care physician
2. Physician performs operation
3. Doctors have specialty in disease
4. Needle injected into patient

It would seem that at any particular time a patient can only have one primary care
physician and that any physician can have many patients (M:1). One physician can
perform many operations and one operation can be performed by many physicians
(M:M). One doctor can have specialties in many diseases and one disease can be the
specialty of many doctors (M:M). One needle can be injected into one patient and one
patient can have many needles injected into him/her (M:1).
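The three basic cardinalities can be illustrated by classifying a set of observed (X, Y) associations, as in the hypothetical helper below. Cardinality is a rule about what the data may contain, so sample data can only show the least restrictive cardinality that is needed, not prove the rule.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify a set of (x, y) associations as '1:1', '1:M', 'M:1' or 'M:M'."""
    ys_per_x, xs_per_y = defaultdict(set), defaultdict(set)
    for x, y in pairs:
        ys_per_x[x].add(y)
        xs_per_y[y].add(x)
    x_has_many = any(len(ys) > 1 for ys in ys_per_x.values())   # some X with many Y
    y_has_many = any(len(xs) > 1 for xs in xs_per_y.values())   # some Y with many X
    if x_has_many and y_has_many:
        return "M:M"
    if x_has_many:
        return "1:M"
    if y_has_many:
        return "M:1"
    return "1:1"

# (patient, physician): two patients share one primary care physician.
print(cardinality({("patient1", "dr_a"), ("patient2", "dr_a")}))
```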


5.2 Entity types

Entity types are the things about which it is important for your company to capture data.
If something is not important, it should not be in the database. In an accounting database
you would expect to find entity types for expenses, assets, liabilities, expenditures,
deposits, etc. You would not expect to find entity types for color of check, quality of
dollar bills received, etc. The database is supposed to reflect reality, but only the part of
reality that is important to the company.

5.3 Attributes
5.3.1 Basics

To this point we have focused on entity types and relationships among them. We have
mentioned, in passing, “facts” about entity types and “attributes” of entity types.
Attributes are the characteristics of an entity type that we are interested in. An attribute
is a descriptor whose values are associated with individual entities of a specific type.

An attribute can have only one value for any single entity at a given time. This
value can change over time. An attribute of an employee might be salary. If you asked
for the salary level of a certain employee at any one time, you should get one
answer. And if someone else asked the same question about that employee at exactly
the same time, they would expect to get the same answer. Of course, if you asked this
question at a later time you might get a different answer.

5.3.2 Identifier

Every entity type has an identifier. The value of the identifier identifies exactly one
entity. If you know the value of the identifier, then you know exactly which entity you
are dealing with. Further, the identifier’s value will never change over time. Thus, if you
know the identifier now, then you can be confident that at any time in the future the
identifier for that entity will not have changed.

Example: Social security number is a possible identifier for a person.


5.4 Degree of a relationship


Relationships can be classified by the number of entity types involved. This is referred to
as the degree of a relationship. To this point we have concerned ourselves with
relationships between two entity types. This is, by far, the most common type of
relationship. The most common degrees of relationships are as follows:

Binary: a relationship between two entity types.
Ternary: a relationship among three entity types.
Recursive: a relationship involving only one entity type.

5.4.1 Ternary
In the real world there are relationships other than those involving two things. For
example, suppose that we want to capture which employees use which skills on which
project. We might try to represent this data in a database as three binary relationships
between skill and project, project and employee, and employee and skill.

The “applies” relationship indicates which employee applies which skill. The “used on”
relationship indicates which skill is used on which project. The “works on” relationship
indicates which employee works on which project. But this is not enough to specify
which employee uses which skill on which project. Suppose you know the following:

Works-on
Lindsey and Mackenzie have worked on projects A and B.
Applies
Lindsey has used skills interface design and database design while Mackenzie
only used her database design skill.
Used-on
Both skills have been used on both projects.

These facts do not tell us, for example, on which project Lindsey used her database
design skill. Only a single ternary relationship among employee, skill, and project can
record that.
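The loss of information can be demonstrated with a short sketch. The two scenarios below are made-up sets of (employee, skill, project) facts that differ from each other, yet both produce exactly the three binary relationships described in the text.

```python
ID, DD = "interface design", "database design"

# Two different hypothetical sets of (employee, skill, project) facts.
scenario_1 = {("Lindsey", ID, "A"), ("Lindsey", ID, "B"), ("Lindsey", DD, "A"),
              ("Mackenzie", DD, "A"), ("Mackenzie", DD, "B")}
scenario_2 = {("Lindsey", ID, "A"), ("Lindsey", ID, "B"), ("Lindsey", DD, "B"),
              ("Mackenzie", DD, "A"), ("Mackenzie", DD, "B")}

def binary_views(ternary):
    """Project a ternary relationship onto the three binary relationships."""
    works_on = {(e, p) for e, s, p in ternary}
    applies = {(e, s) for e, s, p in ternary}
    used_on = {(s, p) for e, s, p in ternary}
    return works_on, applies, used_on

# Different ternary facts, identical binary relationships.
print(scenario_1 != scenario_2, binary_views(scenario_1) == binary_views(scenario_2))
```

Because both scenarios yield the same binary relationships, the binary relationships alone cannot tell us which scenario actually happened.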


Implementing ternary relationships does not mean that you have to get rid of the binary
relationships. You only get rid of the binary relationships if they capture a subset of the
information captured by the ternary relationship. If a binary relationship captures
information that differs from the ternary relationship, then the binary relationship should
be retained if the information is important to your company.

For example, consider the following:

The “used on” relationship stays the same as in the previous ER diagram. The binary
relationships are different.

Have skill

An employee has a certain skill. This is different from “used on” because there are some
skills that an employee has that he or she may not have used on a particular project.

Needed

A project needs a particular skill. This is different from “used on” because there may
be some skills for which employees have not been assigned to the project yet.

Manages

An employee manages a project. This is a completely different dimension from skill, so it
could not be captured by “used on”.


5.4.2 Recursive
The final, and possibly the most difficult, relationship is the recursive relationship. This
is a relationship that an entity type has with itself. But it really doesn’t have to be
difficult if you think about it as you would any ordinary binary relationship. Let’s look
at an example.

Think of an employee who is the manager of other employees.

A manager manages many employees and an employee has exactly one direct manager.
This is pretty straightforward. But, now, realize that a manager is really just another name
for an employee. So, replace managers with employees in this diagram.

Not everyone in the company has a manager. The president will not have a direct
manager. This is handled in the data in the table by indicating that the president’s
manager is the president.
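A recursive relationship can be sketched as a mapping from each employee to another employee. The names and reporting lines below are made up; the president is recorded as her own manager, following the convention described above.

```python
# employee -> direct manager (hypothetical data).
manager_of = {
    "Nancy": "Nancy",         # president: recorded as her own manager
    "Scott": "Nancy",
    "Lindsey": "Scott",
    "Mackenzie": "Scott",
}

def chain_of_command(employee):
    """Follow the manages relationship upward until reaching the president."""
    chain = [employee]
    while manager_of[chain[-1]] != chain[-1]:
        chain.append(manager_of[chain[-1]])
    return chain

def direct_reports(manager):
    """Employees whose direct manager is the given employee."""
    return sorted(e for e, m in manager_of.items() if m == manager and e != manager)

print(chain_of_command("Lindsey"), direct_reports("Scott"))
```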

5.5 Attributes of a relationship


When we examined attributes earlier, the attributes were exclusively attached to entity
types. However, it is also possible for a relationship to have attributes. Consider the “is
member” relationship below.

A person can be a member of many clubs and a club can have many members. A natural
piece of information to store is the date the person joined the club. If the attribute is of the
person entity, then this would indicate when the person joined a club but we would not
know which club. If the attribute is of the club entity, then this would indicate (possibly)
when the club was founded or (possibly) when the most recent member joined the club
but we would not know the dates on which each person joined. The solution is to make
join date an attribute of the “is member” relationship.
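As a sketch of this idea, the join date below is keyed by the (person, club) pair rather than by the person or the club alone. The names, clubs, and dates are made up.

```python
from datetime import date

# The membership relationship: (person, club) -> join date (hypothetical values).
is_member = {
    ("Scott", "Chess"): date(2004, 3, 1),
    ("Scott", "Rowing"): date(2005, 9, 15),
    ("Nancy", "Chess"): date(2003, 1, 20),
}

def joined(person, club):
    """Return the date the person joined the club, or None if not a member."""
    return is_member.get((person, club))

print(joined("Scott", "Rowing"), joined("Nancy", "Rowing"))
```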

5.5.1 Parallel relationships


Two entity types can have more than one type of relationship between them. This can be
represented both in a database and in an ER diagram. Consider the entity types person
and insurance policy and the relationships between them of pays for and is insured under.

Look at these relationships one at a time.

o A person pays for zero or more insurance policies. An insurance policy is paid for
by exactly one person.
o A person is insured by zero or more insurance policies. An insurance policy
insures one or more persons.

These are two distinct relationships. They mean two different things, which is why they
are represented as two separate relationships in the ER diagram.

5.6 Weak entities


Weak entities are entities, but with a difference: weak entities only exist because some
other entity exists. For example, if you were to define two entities employee and
salary-history, then the second would be a weak entity because the record of an
employee’s salary history could only exist if a record of an employee also exists. Joe
Smith’s salary history wouldn’t make much sense if Joe Smith doesn’t exist in the database.

A weak entity is represented by a double border as shown below.
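In a relational implementation, the dependence of a weak entity on its owner could be sketched with a foreign key, as below using Python's built-in sqlite3 module. The table names, column names, and the ON DELETE CASCADE choice are assumptions made for this illustration, not something the handout prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE salary_history (
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        effective_date TEXT NOT NULL,
        salary INTEGER,
        PRIMARY KEY (emp_id, effective_date)
    )""")
conn.execute("INSERT INTO employee VALUES (1, 'Joe Smith')")
conn.execute("INSERT INTO salary_history VALUES (1, '2004-01-01', 30000)")

# A salary history row for a non-existent employee is rejected.
try:
    conn.execute("INSERT INTO salary_history VALUES (2, '2004-01-01', 25000)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Deleting the employee removes the dependent salary history as well.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM salary_history").fetchone()[0]
conn.close()

print(orphan_accepted, remaining)
```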


5.7 Types of attributes


Sometimes it is instructive to classify an attribute by how its value is
determined. Here are the two possibilities.

Basic
These are values provided to the business. These are the types of attributes that we have
been discussing so far. Think of name, address, etc. These values cannot be deduced
from the values of other attributes.

Derived
This is a value that can be calculated from the value of other attributes in the database.
An example might be the age of an employee when the birth date is in the database.
These attributes should, generally, not be stored in the database but should be calculated
when needed.
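A minimal sketch of a derived attribute: age is computed from the stored birth date whenever it is needed instead of being stored. The function name age_on is hypothetical.

```python
from datetime import date

def age_on(birth_date, today):
    """Derive age in whole years from the stored birth date."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)

print(age_on(date(1980, 6, 15), date(2010, 6, 14)))
```

Storing the age itself would require updating it every year, which is one reason derived values are usually calculated on demand.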

5.8 Interpreting ER diagrams

ER diagram for interpretation exercise


Needed

o A skill can be needed by many projects but might not be needed by any.
o A project can need one or more skills.

Manages

o An employee can manage many projects but might not manage any. (Or:
An employee can manage many projects. There are some employees who
don’t manage any projects.)
o A project must be managed by an employee. (Or: A project is managed by
exactly one employee.) (Or: A project is managed by one and only one
employee.)

Has-skill

o An employee may have many skills but might not have any.
o A skill can be possessed by many employees. There are some skills that no
employees possess.

Used-on

The technique for an n-ary (in this case 3-ary, or ternary) relation is different from
that for binary relations, but still straightforward. Hold your hand on n-1 entity types
(in this case 2) and determine whether a 1 or an m goes on the remaining arm of
the relation. Below, in order, are the project, employee, and skill arms.

o An employee uses one skill on many projects.
o Many employees can use a skill on one project.
o An employee can use one skill on a project.

ER exercises

Question 1

What are the cardinality and existence of each of the following relationships in just
the direction given? State any assumptions you have to make.

1. Husband to wife
2. Student to degree
3. Child to parent
4. Player to team
5. Student to course


Question 2

For each of the following pairs of rules, identify two entity types and one
relationship. State the cardinality and existence of the relationship in each case. If
you don’t think enough information is available to define either of these, then
state an assumption that makes it clear. Draw the ER diagram.

1. A department employs many persons. A person is employed by, at most, one
department.
2. A manager manages, at most, one department. A department is managed by, at
most, one manager.
3. An author may write many books. A book may be written by many authors.
4. A team consists of many players. A player plays for only one team.
5. A lecturer teaches, at most, one course. A course is taught by exactly one lecturer.
6. A flight-leg connects two airports. An airport is used by many flight-legs.
7. A purchase order may be for many products. A product may appear on many
purchase orders.
8. A customer may submit many orders. An order is for exactly one customer.

Question 3

Draw an ER diagram for the following. Be sure to indicate the existence and cardinality
for each relationship.

1. A college runs many classes. Each class may be taught by several teachers, and a
teacher may teach several classes. A particular class always uses the same room.
Because classes may meet at different times or on different evenings, it is possible
for different classes to use the same room.

Question 4

Draw an ER diagram for each of the following situations. On the diagram be sure to
identify the cardinality and existence of each relationship.

1. A hospital patient has a patient history. Each patient has one or more history
records (we assume that the initial patient visit is always recorded as an instance
of the history). Each patient history record belongs to exactly one patient.


CHAPTER 6
Normalization
- Normalization is a formal process for deciding which attributes should be grouped
together in a relation.
- It is a technique for analyzing tables based on their primary keys or candidate keys.
- It is the process of decomposing relations with anomalies to produce smaller,
well-structured relations.
- If a relation is not normalized, we can encounter the following problems:
(i) Information redundancy
(ii) Anomalies:
- Insertion anomalies
- Deletion anomalies
- Update anomalies

Normal Forms
- Normalization is often done as a series of steps,
i.e., UNF, 1NF, 2NF, 3NF, BCNF, ...
- Each normal form involves a set of rules that can be tested against each table in
the system.
- If a table violates the rules of some normal form, then we decompose the table
into tables that individually meet the requirements of that normal form.
- Each higher normal form builds on the form prior to it.

UNNORMALIZED FORM (UNF)

- A table that contains one or more repeating groups.
- A repeating group is a field or group of fields that holds multiple values for a
single occurrence of the key field.


EXAMPLE: Consider the following table.

car_owners

Name    Plate_no   Engine_no   Color
Abebe   03-3556    477-789     red
        01-2777    888-889     green
Ahmed   01-3567    660-789     red
Tasew   02-611     311         green
        01-111     456789      white

Repeating group = (Plate_no, Engine_no, Color)

First Normal Form (1NF)

- A relation is in 1NF if it contains no multi-valued attributes and the intersection of
each row and column contains one and only one value.

- Example 1: Consider the above (UNF) relation called "car_owners"; its equivalent first
normal form is the following table.

First normal form of the car_owners table:

Name    Plate_no   Engine_no   Color
Abebe   03-3556    477-789     red
Abebe   01-2777    888-889     green
Ahmed   01-3567    660-789     red
Tasew   02-611     311         green
Tasew   01-111     456789      white
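The step from UNF to 1NF can be sketched in a few lines of Python. This is only an illustration: the nested list below is a hypothetical in-memory stand-in for the car_owners table, and the function name is an assumption, not part of the handout.

```python
# Hypothetical in-memory version of the unnormalized car_owners table:
# each owner carries a repeating group of (plate_no, engine_no, color).
unf_car_owners = [
    {"name": "Abebe", "cars": [("03-3556", "477-789", "red"),
                               ("01-2777", "888-889", "green")]},
    {"name": "Ahmed", "cars": [("01-3567", "660-789", "red")]},
    {"name": "Tasew", "cars": [("02-611", "311", "green"),
                               ("01-111", "456789", "white")]},
]


def to_first_normal_form(owners):
    """Emit one flat row per (owner, car) pair so every cell holds one value."""
    rows = []
    for owner in owners:
        for plate_no, engine_no, color in owner["cars"]:
            rows.append((owner["name"], plate_no, engine_no, color))
    return rows


car_owners_1nf = to_first_normal_form(unf_car_owners)
for row in car_owners_1nf:
    print(row)
```

Each owner name is repeated once per car, which is exactly the redundancy that the later normal forms address.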

Example 2: Consider the following UNF relation.

Employee

emp_id emp_Name tele_1 tele_2 fax_1 fax_2

- Here, the tele and fax fields are multi-valued: an employee can have several
telephone numbers and several fax numbers.
- To change it into a 1NF relation, we need to split the table into three.
- The following tables are the equivalent first normal form of the above Employee table:


Table 1
emp_id emp_Name

Table 2
emp_id emp_tele

Table 3
emp_id fax

SECOND NORMAL FORM (2NF)

- A relation is in second normal form (2NF) if it is in 1NF and every non-key attribute is
fully functionally dependent on the primary key.

Note:
- No non-key attribute is functionally dependent on only part of the primary key.
- A relation that is in first normal form will be in 2NF if any one of the following
conditions applies:

1) The primary key consists of only one attribute.
2) No non-key attributes exist in the relation (that is, all of the attributes in the
relation are components of the PK).
3) Every non-key attribute is functionally dependent on the full set of primary
key attributes.


Functional Dependencies

- A functional dependency is a constraint between two attributes or two sets of
attributes.
- For any relation R, attribute B is functionally dependent on attribute A if, for
every valid instance of A, that value of A uniquely determines the value of B.
- A → B means B is functionally dependent on A.
- An attribute may be functionally dependent on two (or more) attributes, rather
than on a single attribute.

E.g. 1: Consider the relation Emp_course (Empid, Course_title, Date_completed).

Now: Empid, Course_title → Date_completed

This functional dependency implies that the date a course is
completed is completely determined by the identity of the employee and the title of
the course.

E.g. 2: SSN → Name, Address, Birth_date

A person's name, address, and birth date are functionally dependent on that
person's social security number.

E.g. 3: ISBN → Title, First_author_name

The title of a book and the name of the first author are functionally dependent on
the book's International Standard Book Number (ISBN).

- Determinants: The attribute (or set of attributes) on the left-hand side of the arrow in a
functional dependency is called a determinant.
E.g.: SSN, ISBN, and (Empid, Course_title) are determinants.
- Full functional dependency applies to composite determinants.

Note: If A and B are fields (or sets of fields) in a table, then B is fully functionally
dependent on A if B is functionally dependent on A and no proper subset of A determines B.


E.g.: If (A1, A2) → B is a functional dependency, and either

A1 → B or
A2 → B
also holds, then we conclude that
(A1, A2) → B is not a full functional dependency.

- Therefore, if we have a table that violates 2NF, the dependency on part of the
composite PK can be used as a guide for splitting the table.
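The definitions above can be tested against sample data with a short Python sketch. The helper names `fd_holds` and `is_full_fd` and the rows are assumptions made for illustration. Note that a check like this can only show that a dependency is not contradicted by the given rows; whether it holds in general is a design decision about the relation.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key;
        # any later mismatch for the same key breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


# Illustrative rows (made up for this sketch).
rows = [
    {"A1": 1, "A2": "x", "B": 10, "C": "p"},
    {"A1": 1, "A2": "y", "B": 10, "C": "q"},
    {"A1": 2, "A2": "x", "B": 20, "C": "r"},
]

print(fd_holds(rows, ("A1", "A2"), ("B",)))    # the dependency holds
print(is_full_fd(rows, ("A1", "A2"), ("B",)))  # not full: A1 alone determines B
print(is_full_fd(rows, ("A1", "A2"), ("C",)))  # full: neither A1 nor A2 alone determines C
```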

Example: Student_Relative

Stud_id   Relative_id   Relationship   Stud_tele
H1        R1            Mother         112345
H1        R2            Uncle          112345
H1        R3            Uncle          112345
H2        R1            Mother         207780

- The PK of the above table is (Stud_id, Relative_id).
- The Stud_tele field is not fully functionally dependent on the PK, i.e.
(Stud_id, Relative_id) → Stud_tele is not a full functional dependency,
because Stud_id → Stud_tele is itself a valid functional dependency.
- So, to normalize the above table, we need to split it into two, using the functional
dependency Stud_id → Stud_tele as a guide.
- The resulting tables are:

T1
Stud_id Stud_tele


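And

T2
Stud_id Relative_id Relationship

After the split, we need to check that T1 and T2 obey the rules of normalization.
In T1 the primary key is the single attribute Stud_id, and in T2 Relationship depends on
the whole key (Stud_id, Relative_id). Hence, T1 and T2 are in 2NF.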
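As a rough sketch of this split, the following Python script uses the standard sqlite3 module with an in-memory database. Two assumptions are made for illustration: the fourth student is given the id H2 so that the composite key stays unique, and the decomposition is done with CREATE TABLE ... AS SELECT, which is SQLite syntax rather than something the handout prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student_Relative (
        Stud_id TEXT, Relative_id TEXT, Relationship TEXT, Stud_tele TEXT,
        PRIMARY KEY (Stud_id, Relative_id)
    );
    INSERT INTO Student_Relative VALUES
        ('H1', 'R1', 'Mother', '112345'),
        ('H1', 'R2', 'Uncle',  '112345'),
        ('H1', 'R3', 'Uncle',  '112345'),
        ('H2', 'R1', 'Mother', '207780');

    -- T1 holds the partial dependency Stud_id -> Stud_tele, one row per student.
    CREATE TABLE T1 AS SELECT DISTINCT Stud_id, Stud_tele FROM Student_Relative;
    -- T2 holds the attributes that depend on the whole key.
    CREATE TABLE T2 AS SELECT Stud_id, Relative_id, Relationship FROM Student_Relative;
""")

t1_rows = conn.execute("SELECT * FROM T1 ORDER BY Stud_id").fetchall()
rejoined = conn.execute("""
    SELECT T2.Stud_id, T2.Relative_id, T2.Relationship, T1.Stud_tele
    FROM T2 JOIN T1 ON T1.Stud_id = T2.Stud_id
    ORDER BY T2.Stud_id, T2.Relative_id
""").fetchall()
original = conn.execute(
    "SELECT * FROM Student_Relative ORDER BY Stud_id, Relative_id").fetchall()

print(t1_rows)
print(rejoined == original)
```

Joining T1 and T2 on Stud_id reproduces the original rows, so no information is lost, while each telephone number is now stored only once.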

Third Normal Form (3NF)

- A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on
the primary key.
- i.e., if we have the functional dependencies
A → B
and
B → C
then we say that C is transitively dependent on A.
- Consider the following table:

empid empname empsal depid depname depbudjet

Now, PK = empid.
We have the functional dependencies:
empid → depid
depid → depname
depid → depbudjet
so depname and depbudjet are transitively dependent on empid.
Therefore, the above table is not in 3NF. To normalize it, we can use the functional
dependencies
depid → depname
depid → depbudjet
and
empid → depid
so that the resulting tables are the following:

T1
empid empname empsal depid

And
T2
depid depname depbudjet
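A minimal sketch of the two resulting tables, using Python's sqlite3 module, shows why the split helps: a department's name is stored once, so one UPDATE changes it for every employee. The column types and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE T2 (
        depid INTEGER PRIMARY KEY, depname TEXT, depbudjet REAL
    );
    CREATE TABLE T1 (
        empid INTEGER PRIMARY KEY, empname TEXT, empsal REAL,
        depid INTEGER REFERENCES T2(depid)
    );
    INSERT INTO T2 VALUES (10, 'Sales', 50000);
    INSERT INTO T1 VALUES (1, 'Abebe', 3000, 10), (2, 'Ahmed', 3500, 10);

    -- One update in T2 is enough; no employee row has to be touched.
    UPDATE T2 SET depname = 'Marketing' WHERE depid = 10;
""")

dep_names = conn.execute("""
    SELECT T1.empname, T2.depname
    FROM T1 JOIN T2 ON T2.depid = T1.depid
    ORDER BY T1.empid
""").fetchall()
print(dep_names)
```

In the unnormalized table the same change would have to be repeated in every employee row of that department, which is the update anomaly described at the start of the chapter.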

Boyce-Codd Normal Form (BCNF)

- A relation is in Boyce-Codd normal form (BCNF) if and only if every
determinant in the relation is a candidate key.

Description of BCNF:

A relation that is already in 3NF can violate BCNF only when all of the following hold:

(i) There are at least two candidate keys in the table,
(ii) all the candidate keys are composite keys, and
(iii) the candidate keys overlap (they have at least one field in common).

Example: Consider the following relation:

Stud_Course

Sid Sname Course_code Grade

- The candidate keys of this relation are (Sid, Course_code)
and (Sname, Course_code), assuming Sname is unique.


- Course_code is the overlapping field of the two candidate keys (Sid, Course_code)
and (Sname, Course_code).
- Sid → Sname holds, but the determinant Sid is not a candidate key on its own.
- So this table does not satisfy BCNF. To normalize the above table, we need to
decompose (split) it as:

Table 1: (Sid, Course_code, Grade)
and
Table 2: (Sid, Sname)

Note: every relation in BCNF is also in 3NF, but not vice versa.
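The BCNF test can be sketched with the usual attribute-closure computation: a determinant passes if its closure covers every attribute of the relation. The code below is an illustration only; the function names are assumptions, and the dependency Sname → Sid is included because the example assumes Sname is unique.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the dependencies whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(all_attrs)]


attrs = {"Sid", "Sname", "Course_code", "Grade"}
fds = [
    (frozenset({"Sid"}), frozenset({"Sname"})),
    (frozenset({"Sname"}), frozenset({"Sid"})),  # assumes Sname is unique
    (frozenset({"Sid", "Course_code"}), frozenset({"Grade"})),
]

violations = bcnf_violations(attrs, fds)
print(violations)

# After the split, Table 2 (Sid, Sname) keeps only the first two dependencies.
table2_violations = bcnf_violations({"Sid", "Sname"}, fds[:2])
print(table2_violations)
```

For the original relation the determinants Sid and Sname are reported, since neither determines Course_code or Grade; for Table 2 the list is empty.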


Chapter 7
Structured Query Language
Introduction

SQL is a standard computer language for accessing and manipulating databases.

What is SQL?

- SQL stands for Structured Query Language
- SQL allows you to access a database
- SQL is an ANSI standard computer language
- SQL can execute queries against a database
- SQL can retrieve data from a database
- SQL can insert new records in a database
- SQL can delete records from a database
- SQL can update records in a database
- SQL is easy to learn

SQL is a Standard

SQL is an ANSI standard computer language for accessing and manipulating database
systems. SQL statements are used to retrieve and update data in a database. SQL works
with database programs like MS Access, DB2, Informix, MS SQL Server, Oracle,
Sybase, etc.

There are many different versions of the SQL language, but to be in compliance with the
ANSI standard, they must support the same major keywords in a similar manner (such as
SELECT, UPDATE, DELETE, INSERT, WHERE, and others).

Note: Most of the SQL database programs also have their own proprietary extensions in
addition to the SQL standard!

Although we refer to SQL as a "query language", it contains many other
capabilities besides querying a database. It includes features for defining the structure of
the data, for modifying data in the database, and for specifying security constraints.
SQL (Structured Query Language) is the standard language for relational database
products. The query language is based on relational algebra, but also borrows from tuple
relational calculus.


Two early major standards defined for SQL are:

- SQL-89
- SQL-92

SQL-92 defines some additional operations that are not part of SQL-89. Later revisions of
the standard have extended the language further.

SQL Database Tables

A database most often contains one or more tables. Each table is identified by a name
(e.g. "Customers" or "Orders"). Tables contain records (rows) with data.

The following table is the "Persons" table.

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

The table above contains three records (one for each person) and four columns
(LastName, FirstName, Address, and City).

SQL Queries
With SQL, we can query a database and have a result set returned.

A query like this:

SELECT LastName FROM Persons

Gives a result set like this:

LastName
Hansen
Svendson
Pettersen

Note: Some database systems require a semicolon at the end of the SQL statement. The
examples in this chapter do not use the semicolon.


SQL Data Manipulation Language (DML)

SQL (Structured Query Language) provides syntax for executing queries. The language
also includes syntax to update, insert, and delete records.

These query and update commands together form the Data Manipulation Language
(DML) part of SQL:

- SELECT - extracts data from a database table
- UPDATE - updates data in a database table
- DELETE - deletes data from a database table
- INSERT INTO - inserts new data into a database table

SQL Data Definition Language (DDL)

The Data Definition Language (DDL) part of SQL permits database tables to be created
or deleted. We can also define indexes (keys), specify links between tables, and impose
constraints between database tables.

The most important DDL statements in SQL are:

- CREATE TABLE - creates a new database table
- ALTER TABLE - alters (changes) a database table
- DROP TABLE - deletes a database table

The SQL SELECT Statement

The SELECT statement is used to select data from a table. The tabular result is stored in a
result table (called the result-set).

Syntax

SELECT column_name(s)
FROM table_name

Note: SQL keywords are not case sensitive. SELECT is the same as select.

SQL SELECT Example

To select the content of the columns named "LastName" and "FirstName" from the database
table called "Persons", use a SELECT statement like this:

SELECT LastName, FirstName FROM Persons

The result:

LastName    FirstName
Hansen      Ola
Svendson    Tove
Pettersen   Kari
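To try the SELECT statement without installing a database server, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The choice of SQLite and the column types are assumptions for illustration; the handout itself refers to MS Access and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
     ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
     ("Pettersen", "Kari", "Storgt 20", "Stavanger")])

# Keywords are not case sensitive, so both spellings run the same query.
upper = conn.execute("SELECT LastName, FirstName FROM Persons").fetchall()
lower = conn.execute("select LastName, FirstName from Persons").fetchall()

for last_name, first_name in upper:
    print(last_name, first_name)
```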


Select All Columns

To select all columns from the "Persons" table, use a * symbol instead of column names,
like this:

SELECT * FROM Persons

Result

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

The Result Set


The result from a SQL query is stored in a result-set. Most database software systems
allow navigation of the result set with programming functions, like: Move-To-First-
Record, Get-Record-Content, Move-To-Next-Record, etc.

The semicolon is the standard way to separate SQL statements in database systems that
allow more than one SQL statement to be executed in the same call to the server.

Some SQL tutorials end each SQL statement with a semicolon. Is this necessary? With
MS Access and SQL Server 2000 we do not have to put a semicolon after each
SQL statement, but some database programs force you to use it.

The SELECT DISTINCT Statement


The DISTINCT keyword is used to return only distinct (different) values.

The SELECT statement returns information from table columns. But what if we only
want to select distinct elements?

With SQL, all we need to do is to add a DISTINCT keyword to the SELECT statement:

Syntax
SELECT DISTINCT column_name(s)
FROM table_name


Using the DISTINCT keyword


To select ALL values from the column named "Company" we use a SELECT statement
like this:

SELECT Company FROM Orders

"Orders" table
Company OrderNumber
Sega 3412
Toyota 2312
Trio 4678
Toyota 6798

Result
Company
Sega
Toyota
Trio
Toyota

Note that "Toyota" is listed twice in the result-set.

To select only DIFFERENT values from the column named "Company" we use a
SELECT DISTINCT statement like this:

SELECT DISTINCT Company FROM Orders

Result:
Company
Sega
Toyota
Trio

Now “Toyota” is listed only once in the result-set.
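The difference between the two statements can be checked with a small sqlite3 sketch. SQLite and the column types are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Sega", 3412), ("Toyota", 2312), ("Trio", 4678), ("Toyota", 6798)])

# Plain SELECT returns one value per row, duplicates included.
all_companies = [row[0] for row in conn.execute("SELECT Company FROM Orders")]
# SELECT DISTINCT collapses repeated values.
distinct_companies = [
    row[0] for row in conn.execute("SELECT DISTINCT Company FROM Orders")]

print(len(all_companies), sorted(distinct_companies))
```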

The WHERE Clause


A WHERE clause is added to the SELECT statement to conditionally select data from a
table.


Syntax

SELECT column FROM table
WHERE column operator value

With the WHERE clause, the following operators can be used:

Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern

Note: In some versions of SQL the <> operator may be written as !=

Using the WHERE Clause

To select only the persons living in the city "Sandnes", we add a WHERE clause to the
SELECT statement:

SELECT * FROM Persons
WHERE City='Sandnes'

"Persons" table

LastName    FirstName   Address        City        Year
Hansen      Ola         Timoteivn 10   Sandnes     1951
Svendson    Tove        Borgvn 23      Sandnes     1978
Svendson    Stale       Kaivn 18       Sandnes     1980
Pettersen   Kari        Storgt 20      Stavanger   1960

Result

LastName    FirstName   Address        City        Year
Hansen      Ola         Timoteivn 10   Sandnes     1951
Svendson    Tove        Borgvn 23      Sandnes     1978
Svendson    Stale       Kaivn 18       Sandnes     1980
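A possible way to run WHERE conditions from a program is sketched below with Python's sqlite3 module (an assumption; any database driver would do). The `?` placeholder is sqlite3's way of passing a value separately from the SQL text, so the driver takes care of quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons "
    "(LastName TEXT, FirstName TEXT, Address TEXT, City TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [("Hansen", "Ola", "Timoteivn 10", "Sandnes", 1951),
     ("Svendson", "Tove", "Borgvn 23", "Sandnes", 1978),
     ("Svendson", "Stale", "Kaivn 18", "Sandnes", 1980),
     ("Pettersen", "Kari", "Storgt 20", "Stavanger", 1960)])

# Text comparison with the = operator.
in_sandnes = conn.execute(
    "SELECT FirstName FROM Persons WHERE City = ?", ("Sandnes",)).fetchall()
# Numeric comparison with the > operator.
after_1965 = conn.execute(
    "SELECT FirstName FROM Persons WHERE Year > 1965").fetchall()

print(sorted(name for (name,) in in_sandnes))
print(sorted(name for (name,) in after_1965))
```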


Using Quotes

Note that we have used single quotes around the conditional values in the examples.

SQL uses single quotes around text values (some database systems also accept
double quotes, although standard SQL reserves double quotes for identifiers). Numeric
values should not be enclosed in quotes.

For text values:

This is correct:
SELECT * FROM Persons WHERE FirstName='Tove'
This is wrong:
SELECT * FROM Persons WHERE FirstName=Tove

For numeric values:

This is correct:
SELECT * FROM Persons WHERE Year>1965
This is wrong:
SELECT * FROM Persons WHERE Year>'1965'

The LIKE Condition

The LIKE condition is used to specify a search for a pattern in a column.

Syntax
SELECT column FROM table
WHERE column LIKE pattern

A "%" sign can be used to define wildcards (missing letters in the pattern) both before
and after the pattern.

Using LIKE

The following SQL statement will return persons with first names that start with an 'O':

SELECT * FROM Persons
WHERE FirstName LIKE 'O%'

The following SQL statement will return persons with first names that end with an 'a':

SELECT * FROM Persons
WHERE FirstName LIKE '%a'


The following SQL statement will return persons with first names that contain the pattern
'la':

SELECT * FROM Persons
WHERE FirstName LIKE '%la%'
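The three patterns above can be tried with the sketch below, which uses Python's sqlite3 module. The helper `first_names_like` is a hypothetical name introduced here. One SQLite-specific detail worth knowing: its LIKE is case-insensitive for ASCII letters by default, which other systems may handle differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Hansen", "Ola"), ("Svendson", "Tove"), ("Pettersen", "Kari")])


def first_names_like(pattern):
    """Return the sorted first names that match a LIKE pattern."""
    cur = conn.execute(
        "SELECT FirstName FROM Persons WHERE FirstName LIKE ?", (pattern,))
    return sorted(row[0] for row in cur)


starts_with_o = first_names_like("O%")   # % stands for any trailing letters
ends_with_a = first_names_like("%a")     # % stands for any leading letters
contains_la = first_names_like("%la%")   # pattern may appear anywhere
ends_with_e = first_names_like("%e")

print(starts_with_o, ends_with_a, contains_la, ends_with_e)
```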

AND & OR

AND and OR join two or more conditions in a WHERE clause.

The AND operator displays a row if ALL conditions listed are true. The OR operator
displays a row if ANY of the conditions listed are true.

Original Table (used in the examples)

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Svendson    Stephen     Kaivn 18       Sandnes

Example

Use AND to display each person with the first name equal to "Tove", and the last name
equal to "Svendson":

SELECT * FROM Persons
WHERE FirstName='Tove'
AND LastName='Svendson'

Result:

LastName    FirstName   Address        City
Svendson    Tove        Borgvn 23      Sandnes

Example

Use OR to display each person with the first name equal to "Tove", or the last name equal
to "Svendson":

SELECT * FROM Persons
WHERE FirstName='Tove'
OR LastName='Svendson'

Result:

LastName    FirstName   Address        City
Svendson    Tove        Borgvn 23      Sandnes
Svendson    Stephen     Kaivn 18       Sandnes


Example

You can also combine AND and OR (use parentheses to form complex expressions):

SELECT * FROM Persons WHERE
(FirstName='Tove' OR FirstName='Stephen')
AND LastName='Svendson'

Result:

LastName    FirstName   Address        City
Svendson    Tove        Borgvn 23      Sandnes
Svendson    Stephen     Kaivn 18       Sandnes
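Why the parentheses matter can be seen in the following sqlite3 sketch. AND binds more tightly than OR, so leaving the parentheses out changes which rows qualify. The queries in the second half are variations made up for this illustration, not taken from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Hansen", "Ola"), ("Svendson", "Tove"), ("Svendson", "Stephen")])


def first_names(where):
    """Run SELECT with a fixed WHERE clause and return sorted first names."""
    cur = conn.execute("SELECT FirstName FROM Persons WHERE " + where)
    return sorted(row[0] for row in cur)


both = first_names("FirstName='Tove' AND LastName='Svendson'")
either = first_names("FirstName='Tove' OR LastName='Svendson'")

# Without parentheses this reads as: Ola OR (Tove AND Svendson).
unparenthesized = first_names(
    "FirstName='Ola' OR FirstName='Tove' AND LastName='Svendson'")
# With parentheses the OR is evaluated first, then the AND.
parenthesized = first_names(
    "(FirstName='Ola' OR FirstName='Tove') AND LastName='Svendson'")

print(both, either, unparenthesized, parenthesized)
```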

BETWEEN ... AND

The BETWEEN ... AND operator selects a range of data between two values. These
values can be numbers, text, or dates.

SELECT column_name FROM table_name
WHERE column_name
BETWEEN value1 AND value2

Original Table (used in the examples)

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes
Pettersen   Kari        Storgt 20      Stavanger
Svendson    Tove        Borgvn 23      Sandnes

Example 1

To display the persons alphabetically between "Hansen" and "Pettersen", use the
following SQL. In standard SQL the BETWEEN range is inclusive, so both "Hansen" and
"Pettersen" are part of the result:

SELECT * FROM Persons WHERE LastName
BETWEEN 'Hansen' AND 'Pettersen'

Result:

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes
Pettersen   Kari        Storgt 20      Stavanger

Example 2

To display the persons outside the range used in the previous example, use the NOT
operator:

SELECT * FROM Persons WHERE LastName
NOT BETWEEN 'Hansen' AND 'Pettersen'

Result:

LastName    FirstName   Address        City
Svendson    Tove        Borgvn 23      Sandnes
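How the endpoints are treated is easy to check on the system you use. The sketch below does so for SQLite through Python's sqlite3 module (an assumed setup); SQLite treats BETWEEN as an inclusive range, which matches the operator table given earlier for the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Hansen", "Ola"), ("Nordmann", "Anna"),
     ("Pettersen", "Kari"), ("Svendson", "Tove")])

inside = sorted(row[0] for row in conn.execute(
    "SELECT LastName FROM Persons "
    "WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'"))
outside = sorted(row[0] for row in conn.execute(
    "SELECT LastName FROM Persons "
    "WHERE LastName NOT BETWEEN 'Hansen' AND 'Pettersen'"))

print(inside)   # both endpoints are included in SQLite
print(outside)
```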

IN

The IN operator may be used if you know the exact value you want to return for at least
one of the columns.

SELECT column_name FROM table_name
WHERE column_name IN (value1, value2, ...)

Original Table (used in the examples)

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes
Pettersen   Kari        Storgt 20      Stavanger
Svendson    Tove        Borgvn 23      Sandnes

Example 1

To display the persons with LastName equal to "Hansen" or "Pettersen", use the
following SQL:

SELECT * FROM Persons
WHERE LastName IN ('Hansen','Pettersen')

Result:

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Pettersen   Kari        Storgt 20      Stavanger
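When the list of values comes from a program rather than being typed into the statement, one placeholder per value can be generated. This is a sketch using Python's sqlite3 module; the variable `wanted` is a hypothetical name for the list of values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Hansen", "Ola"), ("Nordmann", "Anna"),
     ("Pettersen", "Kari"), ("Svendson", "Tove")])

wanted = ["Hansen", "Pettersen"]
# Build "?, ?" so that each value is passed as a separate parameter.
placeholders = ", ".join("?" for _ in wanted)
query = "SELECT LastName, FirstName FROM Persons WHERE LastName IN (%s)" % placeholders

matches = sorted(conn.execute(query, wanted).fetchall())
print(matches)
```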

With SQL, aliases can be used for column names and table names.


Column Name Alias


The syntax is:

SELECT column AS column_alias FROM table

Table Name Alias

The syntax is:

SELECT column FROM table AS table_alias

Example: Using a Column Alias

This table (Persons):

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

And this SQL:

SELECT LastName AS Family, FirstName AS Name
FROM Persons

Returns this result:

Family      Name
Hansen      Ola
Svendson    Tove
Pettersen   Kari

Example: Using a Table Alias

This table (Persons):

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger


And this SQL:

SELECT LastName, FirstName
FROM Persons AS Employees

Returns this result:

Table Employees:

LastName    FirstName
Hansen      Ola
Svendson    Tove
Pettersen   Kari
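Both kinds of alias can be observed from a program. In the sqlite3 sketch below (SQLite is an assumed choice), the column aliases show up as the column names of the result, and the table alias `e` is a short hypothetical name used to qualify the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
     ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
     ("Pettersen", "Kari", "Storgt 20", "Stavanger")])

# Column aliases rename the columns of the result set.
cur = conn.execute("SELECT LastName AS Family, FirstName AS Name FROM Persons")
column_names = [description[0] for description in cur.description]
print(column_names)

# A table alias gives the table a shorter name inside the statement.
aliased = conn.execute(
    "SELECT e.LastName, e.FirstName FROM Persons AS e "
    "WHERE e.City = 'Sandnes'").fetchall()
print(sorted(aliased))
```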

The ORDER BY keyword is used to sort the result.

Sort the Rows

The ORDER BY clause is used to sort the rows.

Orders:

Company     OrderNumber
Sega        3412
ABC Shop    5678
W3Schools   2312
W3Schools   6798

Example

To display the companies in alphabetical order:

SELECT Company, OrderNumber FROM Orders
ORDER BY Company

Result:

Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   6798
W3Schools   2312


Example

To display the companies in alphabetical order AND the order numbers in numerical
order:

SELECT Company, OrderNumber FROM Orders
ORDER BY Company, OrderNumber

Result:

Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   2312
W3Schools   6798

Example

To display the companies in reverse alphabetical order:

SELECT Company, OrderNumber FROM Orders
ORDER BY Company DESC

Result:

Company     OrderNumber
W3Schools   6798
W3Schools   2312
Sega        3412
ABC Shop    5678

Example

To display the companies in reverse alphabetical order AND the order numbers in
numerical order:

SELECT Company, OrderNumber FROM Orders
ORDER BY Company DESC, OrderNumber ASC

Result:

Company     OrderNumber
W3Schools   2312
W3Schools   6798
Sega        3412
ABC Shop    5678
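The multi-column sorts can be reproduced with the sqlite3 sketch below (SQLite and the column types are assumptions). When only Company is named in ORDER BY, the relative order of the two W3Schools rows is not guaranteed, which is why the second sort key is useful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Sega", 3412), ("ABC Shop", 5678),
     ("W3Schools", 2312), ("W3Schools", 6798)])

ascending = conn.execute(
    "SELECT Company, OrderNumber FROM Orders "
    "ORDER BY Company, OrderNumber").fetchall()
mixed = conn.execute(
    "SELECT Company, OrderNumber FROM Orders "
    "ORDER BY Company DESC, OrderNumber ASC").fetchall()

print(ascending)
print(mixed)
```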


SQL has many built-in functions for counting and calculations.

Function Syntax

The syntax for built-in SQL functions is:

SELECT function(column) FROM table

Types of Functions

There are several basic types and categories of functions in SQL. The basic types of
functions are:

- Aggregate functions
- Scalar functions

Aggregate functions

Aggregate functions operate against a collection of values, but return a single value.

Note: If an aggregate function is used alongside other, non-aggregated columns in the
select list of a SELECT statement, the SELECT must have a GROUP BY clause.

"Persons" table (used in most of the given examples)


Name              Age
Hansen, Ola       34
Svendson, Tove    45
Pettersen, Kari   19

Function                 Description
AVG(column)              Returns the average value of a column
CHECKSUM
COUNT(column)            Returns the number of rows (without a NULL value) of a column
COUNT(*)                 Returns the number of selected rows
COUNT(DISTINCT column)   Returns the number of distinct results
FIRST(column)            Returns the value of the first record in a specified field
                         (not supported in SQLServer2K)
LAST(column)             Returns the value of the last record in a specified field
                         (not supported in SQLServer2K)
MAX(column)              Returns the highest value of a column
MIN(column)              Returns the lowest value of a column
SUM(column)              Returns the total sum of a column
VAR(column)
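A few of these aggregate functions can be tried on the small "Persons" table with the sqlite3 sketch below. SQLite is an assumed choice and supports only part of the list (AVG, COUNT, MAX, MIN, and SUM are used here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [("Hansen, Ola", 34), ("Svendson, Tove", 45), ("Pettersen, Kari", 19)])

# Each aggregate collapses the three Age values into a single value.
average, count, oldest, youngest, total = conn.execute(
    "SELECT AVG(Age), COUNT(*), MAX(Age), MIN(Age), SUM(Age) FROM Persons"
).fetchone()

print(average, count, oldest, youngest, total)
```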


Scalar functions

Scalar functions operate against a single value, and return a single value based on the
input value.

Useful Scalar Functions in MS Access


Aggregate functions (like SUM) often need an added GROUP BY functionality.

GROUP BY...

GROUP BY... was added to SQL because aggregate functions (like SUM) return the
aggregate of all column values every time they are called, and without the GROUP BY
clause it was impossible to find the sum for each individual group of column values.

The syntax for the GROUP BY clause is:

SELECT column, SUM(column) FROM table GROUP BY column

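GROUP BY Example

This is the "Sales" table:

Company     Amount
W3Schools   5500
IBM         4500
W3Schools   7100

And this SQL:

SELECT Company, SUM(Amount) FROM Sales

would at best pair every company with the grand total:

Company     SUM(Amount)
W3Schools   17100
IBM         17100
W3Schools   17100

Without GROUP BY, SUM(Amount) is the total over the whole table
(5500 + 4500 + 7100 = 17100), not a sum per company. The above code is also invalid in
standard SQL because the Company column is neither part of an aggregate nor named in a
GROUP BY clause. A GROUP BY clause will solve this problem:

SELECT Company, SUM(Amount) FROM Sales
GROUP BY Company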


Returns this result:

Company     SUM(Amount)
W3Schools   12600
IBM         4500
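The grouped sum can be verified with the following sqlite3 sketch (SQLite and the column types are assumptions made for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)])

# Without GROUP BY the aggregate covers the whole table.
grand_total = conn.execute("SELECT SUM(Amount) FROM Sales").fetchone()[0]

# With GROUP BY there is one sum per distinct company.
per_company = dict(conn.execute(
    "SELECT Company, SUM(Amount) FROM Sales GROUP BY Company"))

print(grand_total)
print(per_company)
```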

HAVING...

HAVING... was added to SQL because the WHERE keyword could not be used against
aggregate functions (like SUM), and without HAVING... it would be impossible to test
for result conditions.

The syntax for the HAVING function is:

SELECT column,SUM(column) FROM table


GROUP BY column
HAVING SUM (column) condition value

This "Sales" table:

Company     Amount
W3Schools   5500
IBM         4500
W3Schools   7100

This SQL:

SELECT Company, SUM(Amount) FROM Sales
GROUP BY Company
HAVING SUM(Amount) > 10000

Returns this result:

Company SUM(Amount)
W3Schools 12600
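
To see how HAVING differs from WHERE, the sketch below runs both against the same "Sales" rows. It assumes SQLite via Python's sqlite3 module; the WHERE query and its threshold of 6000 are illustrative additions, not part of the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)],
)

# HAVING filters whole groups after the sums have been computed.
having_rows = conn.execute(
    "SELECT Company, SUM(Amount) FROM Sales "
    "GROUP BY Company HAVING SUM(Amount) > 10000"
).fetchall()

# WHERE filters individual rows before they are grouped (illustrative query).
where_rows = conn.execute(
    "SELECT Company, SUM(Amount) FROM Sales "
    "WHERE Amount > 6000 GROUP BY Company"
).fetchall()

print(having_rows)  # [('W3Schools', 12600)]
print(where_rows)   # [('W3Schools', 7100)]
```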

The SELECT INTO Statement

The SELECT INTO statement is most often used to create backup copies of tables or for
archiving records.

Syntax
SELECT column_name(s) INTO newtable [IN externaldatabase]
FROM source
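
The backup idea can be sketched as follows. Note the substitution: SQLite, used here through Python's sqlite3 module, does not support SELECT ... INTO, so the example uses CREATE TABLE ... AS SELECT as a stand-in with the same effect. The table name Persons_backup is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.execute(
    "INSERT INTO Persons VALUES ('Pettersen', 'Kari', 'Storgt 20', 'Stavanger')"
)

# SQLite stand-in for: SELECT * INTO Persons_backup FROM Persons
conn.execute("CREATE TABLE Persons_backup AS SELECT * FROM Persons")

backup_rows = conn.execute("SELECT * FROM Persons_backup").fetchall()
print(backup_rows)
```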


The INSERT INTO Statement


The INSERT INTO statement is used to insert new rows into a table.

Syntax
INSERT INTO table_name
VALUES (value1, value2,....)

You can also specify the columns for which you want to insert data:

INSERT INTO table_name (column1, column2,...)
VALUES (value1, value2,....)

Insert a New Row

This "Persons" table:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger

And this SQL statement:

INSERT INTO Persons
VALUES ('Hetland', 'Camilla', 'Hagabakka 24', 'Sandnes')

Will give this result:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes

Insert Data in Specified Columns

This "Persons" table:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes

And this SQL statement:

INSERT INTO Persons (LastName, Address)
VALUES ('Rasmussen', 'Storgt 67')

Will give the following result:


LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes
Rasmussen               Storgt 67
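
Both forms of INSERT INTO can be tried in a few lines. This sketch assumes SQLite via Python's sqlite3 module; it shows that columns left out of the column list are stored as NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)

# Form 1: a value for every column, in table order.
conn.execute(
    "INSERT INTO Persons VALUES ('Pettersen', 'Kari', 'Storgt 20', 'Stavanger')"
)
conn.execute(
    "INSERT INTO Persons VALUES ('Hetland', 'Camilla', 'Hagabakka 24', 'Sandnes')"
)

# Form 2: only the named columns; FirstName and City are left NULL.
conn.execute(
    "INSERT INTO Persons (LastName, Address) VALUES ('Rasmussen', 'Storgt 67')"
)

rasmussen = conn.execute(
    "SELECT * FROM Persons WHERE LastName = 'Rasmussen'"
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(rasmussen)  # ('Rasmussen', None, 'Storgt 67', None)
```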

The UPDATE Statement

The UPDATE statement is used to modify the data in a table.

Syntax
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value

Person:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen               Storgt 67

Update one Column in a Row

We want to add a first name to the person with a last name of "Rasmussen":

UPDATE Person SET FirstName = 'Nina'
WHERE LastName = 'Rasmussen'

Result:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Storgt 67

Update several Columns in a Row

We want to change the address and add the name of the city:

UPDATE Person
SET Address = 'Stien 12', City = 'Stavanger'
WHERE LastName = 'Rasmussen'


Result:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Stien 12     Stavanger
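
The two updates above can be replayed in a short script. This is a sketch assuming SQLite via Python's sqlite3 module; the ORDER BY is added only to fix the row order for printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [
        ("Nilsen", "Fred", "Kirkegt 56", "Stavanger"),
        ("Rasmussen", None, "Storgt 67", None),
    ],
)

# Update one column in the matching row.
conn.execute("UPDATE Person SET FirstName = 'Nina' WHERE LastName = 'Rasmussen'")

# Update several columns at once; rowcount reports how many rows matched.
cursor = conn.execute(
    "UPDATE Person SET Address = 'Stien 12', City = 'Stavanger' "
    "WHERE LastName = 'Rasmussen'"
)
rows_changed = cursor.rowcount

people = conn.execute("SELECT * FROM Person ORDER BY LastName").fetchall()
print(rows_changed)
print(people)
```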

The DELETE Statement


The DELETE statement is used to delete rows in a table.

Syntax
DELETE FROM table_name
WHERE column_name = some_value

Person:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Stien 12     Stavanger

Delete a Row

"Nina Rasmussen" is going to be deleted:

DELETE FROM Person WHERE LastName = 'Rasmussen'

Result:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger

Delete All Rows

It is possible to delete all rows in a table without deleting the table. This means that the
table structure, attributes, and indexes will be intact:

DELETE FROM table_name

or (MS Access syntax; most other systems accept only the first form):

DELETE * FROM table_name
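
A short script can confirm that deleting rows leaves the table itself in place. This sketch assumes SQLite via Python's sqlite3 module and uses only the plain DELETE FROM form; PRAGMA table_info is a SQLite-specific way to list a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [
        ("Nilsen", "Fred", "Kirkegt 56", "Stavanger"),
        ("Rasmussen", "Nina", "Stien 12", "Stavanger"),
    ],
)

# Delete only the rows that match the WHERE condition.
conn.execute("DELETE FROM Person WHERE LastName = 'Rasmussen'")
after_one_delete = conn.execute("SELECT LastName FROM Person").fetchall()

# Delete every remaining row; the table structure is untouched.
conn.execute("DELETE FROM Person")
remaining = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(Person)")]

print(after_one_delete, remaining, columns)
```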


Joins and Keys


Sometimes we have to select data from two or more tables to make our result complete.
We have to perform a join.

Tables in a database can be related to each other with keys. A primary key is a column
whose value is unique for each row of the table. The purpose is to bind data together,
across tables, without repeating all of the data in every table.

In the "Employees" table below, the "Employee_ID" column is the primary key, meaning
that no two rows can have the same Employee_ID. The Employee_ID distinguishes two
persons even if they have the same name.

When you look at the example tables below, notice that:

- The "Employee_ID" column is the primary key of the "Employees" table
- The "Prod_ID" column is the primary key of the "Orders" table
- The "Employee_ID" column in the "Orders" table is used to refer to the persons in
  the "Employees" table without using their names

Employees:

Employee_ID   Name
01            Hansen, Ola
02            Svendson, Tove
03            Svendson, Stephen
04            Pettersen, Kari

Orders:

Prod_ID   Product   Employee_ID
234       Printer   01
657       Table     03
865       Chair     03
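
The uniqueness rule for a primary key can be observed directly. In this sketch (SQLite via Python's sqlite3 module, an assumption), Employee_ID is declared as TEXT to keep the leading zeros; the second "Hansen, Ola" with ID 05 is a hypothetical row added to show that equal names are fine as long as the key differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee_ID TEXT PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [
        ("01", "Hansen, Ola"),
        ("02", "Svendson, Tove"),
        ("03", "Svendson, Stephen"),
        ("04", "Pettersen, Kari"),
    ],
)

# Reusing an existing Employee_ID violates the primary key.
try:
    conn.execute("INSERT INTO Employees VALUES ('03', 'Svendson, Tove')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A repeated name with a new key is accepted (hypothetical row).
conn.execute("INSERT INTO Employees VALUES ('05', 'Hansen, Ola')")
employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

print(duplicate_rejected, employee_count)
```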

Referring to Two Tables

We can select data from two tables by referring to two tables, like this:

Example

Who has ordered a product, and what did they order?


SELECT Employees.Name, Orders.Product
FROM Employees, Orders
WHERE Employees.Employee_ID=Orders.Employee_ID

Result

Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair

Example

Who ordered a printer?

SELECT Employees.Name
FROM Employees, Orders
WHERE Employees.Employee_ID=Orders.Employee_ID
AND Orders.Product='Printer'

Result

Name
Hansen, Ola
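
The two queries above can be run against the example tables as a check. This is a sketch assuming SQLite via Python's sqlite3 module; the column types and the ORDER BY clause are choices made for the example, not part of the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee_ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Orders (Prod_ID INTEGER PRIMARY KEY, Product TEXT, Employee_ID TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("01", "Hansen, Ola"), ("02", "Svendson, Tove"),
     ("03", "Svendson, Stephen"), ("04", "Pettersen, Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(234, "Printer", "01"), (657, "Table", "03"), (865, "Chair", "03")],
)

# Listing both tables in FROM and matching the keys in WHERE joins them.
who_ordered_what = conn.execute(
    "SELECT Employees.Name, Orders.Product FROM Employees, Orders "
    "WHERE Employees.Employee_ID = Orders.Employee_ID "
    "ORDER BY Orders.Prod_ID"
).fetchall()

printer_buyers = conn.execute(
    "SELECT Employees.Name FROM Employees, Orders "
    "WHERE Employees.Employee_ID = Orders.Employee_ID "
    "AND Orders.Product = 'Printer'"
).fetchall()

print(who_ordered_what)
print(printer_buyers)
```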

Using Joins

Alternatively, we can select data from two tables with the JOIN keyword, like this:

Example INNER JOIN

Syntax

SELECT field1, field2, field3
FROM first_table
INNER JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield

Who has ordered a product, and what did they order?

SELECT Employees.Name, Orders.Product
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID


The INNER JOIN returns all rows from both tables where there is a match. If there are
rows in Employees that do not have matches in Orders, those rows will not be listed.

Result

Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair

Example LEFT JOIN

Syntax

SELECT field1, field2, field3
FROM first_table
LEFT JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield

List all employees, and their orders - if any.

SELECT Employees.Name, Orders.Product
FROM Employees
LEFT JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID

The LEFT JOIN returns all the rows from the first table (Employees), even if there are no
matches in the second table (Orders). If there are rows in Employees that do not have
matches in Orders, those rows also will be listed.

Result

Name                Product
Hansen, Ola         Printer
Svendson, Tove
Svendson, Stephen   Table
Svendson, Stephen   Chair
Pettersen, Kari
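
The difference between INNER JOIN and LEFT JOIN shows up when the two are run side by side. This sketch assumes SQLite via Python's sqlite3 module; the ORDER BY clauses are added for a stable row order. Employees without orders appear only in the LEFT JOIN result, with a NULL product (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee_ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Orders (Prod_ID INTEGER PRIMARY KEY, Product TEXT, Employee_ID TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("01", "Hansen, Ola"), ("02", "Svendson, Tove"),
     ("03", "Svendson, Stephen"), ("04", "Pettersen, Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(234, "Printer", "01"), (657, "Table", "03"), (865, "Chair", "03")],
)

# INNER JOIN: only employees that have at least one matching order.
inner_rows = conn.execute(
    "SELECT Employees.Name, Orders.Product FROM Employees "
    "INNER JOIN Orders ON Employees.Employee_ID = Orders.Employee_ID "
    "ORDER BY Employees.Employee_ID, Orders.Prod_ID"
).fetchall()

# LEFT JOIN: every employee, with NULL where no order matches.
left_rows = conn.execute(
    "SELECT Employees.Name, Orders.Product FROM Employees "
    "LEFT JOIN Orders ON Employees.Employee_ID = Orders.Employee_ID "
    "ORDER BY Employees.Employee_ID, Orders.Prod_ID"
).fetchall()

print(inner_rows)
print(left_rows)
```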

Example RIGHT JOIN


Syntax
SELECT field1, field2, field3
FROM first_table
RIGHT JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield


List all orders, and who placed them - if anyone.

SELECT Employees.Name, Orders.Product
FROM Employees
RIGHT JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID

The RIGHT JOIN returns all the rows from the second table (Orders), even if there are no
matches in the first table (Employees). If there had been any rows in Orders that did not
have matches in Employees, those rows also would have been listed.

Result

Name Product
Hansen, Ola Printer
Svendson, Stephen Table
Svendson, Stephen Chair
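
Because every order in the example has a matching employee, the RIGHT JOIN result looks like the INNER JOIN result. The sketch below adds one hypothetical order (Prod_ID 999, "Lamp", Employee_ID 09) with no matching employee to make the difference visible. It assumes SQLite via Python's sqlite3 module, and since older SQLite versions lack RIGHT JOIN, it swaps in the equivalent LEFT JOIN with the two tables exchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee_ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Orders (Prod_ID INTEGER PRIMARY KEY, Product TEXT, Employee_ID TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("01", "Hansen, Ola"), ("02", "Svendson, Tove"),
     ("03", "Svendson, Stephen"), ("04", "Pettersen, Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(234, "Printer", "01"), (657, "Table", "03"), (865, "Chair", "03"),
     (999, "Lamp", "09")],  # hypothetical order with no matching employee
)

# "Employees RIGHT JOIN Orders" written as "Orders LEFT JOIN Employees":
# every order is kept, with NULL for the name when no employee matches.
all_orders = conn.execute(
    "SELECT Employees.Name, Orders.Product FROM Orders "
    "LEFT JOIN Employees ON Employees.Employee_ID = Orders.Employee_ID "
    "ORDER BY Orders.Prod_ID"
).fetchall()

print(all_orders)
```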

Example

Who ordered a printer?

SELECT Employees.Name
FROM Employees
INNER JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID
WHERE Orders.Product = 'Printer'

Result

Name
Hansen, Ola

Create a Database

To create a database:

CREATE DATABASE database_name

Create a Table

To create a table in a database:

CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
.......
)


Example

This example demonstrates how you can create a table named "Person", with four
columns. The column names will be "LastName", "FirstName", "Address", and "Age":

CREATE TABLE Person
(
LastName varchar,
FirstName varchar,
Address varchar,
Age int
)

This example demonstrates how you can specify a maximum length for some columns:

CREATE TABLE Person
(
LastName varchar(30),
FirstName varchar,
Address varchar,
Age int(3)
)

The data type specifies what type of data the column can hold. The table below contains
the most common data types in SQL:

Data Type          Description

integer(size)      Hold integers only. The maximum number of digits is
int(size)          specified in parentheses.
smallint(size)
tinyint(size)

decimal(size,d)    Hold numbers with fractions. The maximum number of
numeric(size,d)    digits is specified in "size". The maximum number of
                   digits to the right of the decimal point is specified in "d".

char(size)         Holds a fixed-length string (can contain letters, numbers,
                   and special characters). The fixed size is specified in
                   parentheses.

varchar(size)      Holds a variable-length string (can contain letters,
                   numbers, and special characters). The maximum size is
                   specified in parentheses.

date(yyyymmdd)     Holds a date
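
As a quick check of the CREATE TABLE example, the sketch below creates the "Person" table and reads back its column list. It assumes SQLite via Python's sqlite3 module; SQLite records the declared types but does not enforce the declared lengths, so other database systems may behave more strictly. PRAGMA table_info is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Person
    (
    LastName varchar(30),
    FirstName varchar,
    Address varchar,
    Age int(3)
    )
    """
)

# Each table_info row is (cid, name, type, notnull, default, pk).
table_info = conn.execute("PRAGMA table_info(Person)").fetchall()
column_names = [row[1] for row in table_info]

conn.execute("INSERT INTO Person VALUES ('Pettersen', 'Kari', 'Storgt 20', 34)")
stored_age = conn.execute("SELECT Age FROM Person").fetchone()[0]

print(column_names)
print(stored_age)
```

The age value 34 is a hypothetical sample; the handout does not give one.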


ALTER TABLE

The ALTER TABLE statement is used to add or drop columns in an existing table.

ALTER TABLE table_name
ADD column_name datatype

ALTER TABLE table_name
DROP COLUMN column_name

Note: Some database systems don't allow the dropping of a column in a database table
(DROP COLUMN column_name).

Person:

LastName    FirstName   Address
Pettersen   Kari        Storgt 20

Example

To add a column named "City" in the "Person" table:

ALTER TABLE Person ADD City varchar(30)

Result:

LastName    FirstName   Address     City
Pettersen   Kari        Storgt 20
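
Adding a column can be tried as below. This sketch assumes SQLite via Python's sqlite3 module and covers only ADD, since support for DROP COLUMN varies between systems and versions. Existing rows get NULL (None in Python) in the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person "
    "(LastName varchar(30), FirstName varchar(30), Address varchar(30))"
)
conn.execute("INSERT INTO Person VALUES ('Pettersen', 'Kari', 'Storgt 20')")

# Add the new column; rows that already exist have no value for it yet.
conn.execute("ALTER TABLE Person ADD City varchar(30)")

person_row = conn.execute("SELECT * FROM Person").fetchone()
print(person_row)  # ('Pettersen', 'Kari', 'Storgt 20', None)
```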

Example

To drop the "Address" column in the "Person" table:

ALTER TABLE Person DROP COLUMN Address

Result:

LastName    FirstName   City
Pettersen   Kari


Delete a Table or Database

To delete a table (the table structure, attributes, and indexes will also be deleted):

DROP TABLE table_name
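
Unlike DELETE FROM, DROP TABLE removes the table definition as well. A minimal sketch, assuming SQLite via Python's sqlite3 module (sqlite_master is SQLite's own catalog table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LastName TEXT, FirstName TEXT)")
conn.execute("DROP TABLE Person")

# The catalog no longer lists the table.
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Querying the dropped table now fails.
try:
    conn.execute("SELECT * FROM Person")
    table_gone = False
except sqlite3.OperationalError:
    table_gone = True

print(tables_left, table_gone)
```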

To delete a database:

DROP DATABASE database_name

Views

In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement.

A view contains rows and columns, just like a real table. The fields in a view are fields
from one or more real tables in the database. You can add SQL functions, WHERE, and
JOIN statements to a view and present the data as if the data were coming from a single
table.

Note: The database design and structure will NOT be affected by the functions, WHERE, or
JOIN statements in a view.

Syntax
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Note: The database does not store the view data! The database engine recreates the data,
using the view's SELECT statement, every time a user queries a view.
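
That a view holds no data of its own can be seen by changing the base table after the view is created. This sketch assumes SQLite via Python's sqlite3 module; the view name ActiveProducts and the product rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, Discontinued INTEGER)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Widget", 0), (2, "Gizmo", 1)],  # hypothetical rows
)

# The view stores only its SELECT statement, not the rows.
conn.execute(
    "CREATE VIEW ActiveProducts AS "
    "SELECT ProductID, ProductName FROM Products WHERE Discontinued = 0"
)
before = conn.execute("SELECT * FROM ActiveProducts ORDER BY ProductID").fetchall()

# A row added to the base table shows up the next time the view is queried.
conn.execute("INSERT INTO Products VALUES (3, 'Gadget', 0)")
after = conn.execute("SELECT * FROM ActiveProducts ORDER BY ProductID").fetchall()

print(before)  # [(1, 'Widget')]
print(after)   # [(1, 'Widget'), (3, 'Gadget')]
```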

Using Views
A view can be used from inside a query, a stored procedure, or from inside another
view. Adding functions, joins, etc., to a view allows you to present exactly the data
you want to the user.

The sample database Northwind has some views installed by default. The view "Current
Product List" lists all active products (products that are not discontinued) from the
Products table. The view is created with the following SQL:

CREATE VIEW [Current Product List] AS
SELECT ProductID, ProductName
FROM Products
WHERE Discontinued=No


We can query the view above as follows:

SELECT * FROM [Current Product List]

Another view from the Northwind sample database selects every product in the Products
table that has a unit price that is higher than the average unit price:

CREATE VIEW [Products above Average Price] AS
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products)

We can query the view above as follows:

SELECT * FROM [Products above Average Price]

Another example view from the Northwind database calculates the total sale for each
category in 1997. Note that this view selects its data from another view called "Product
Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName, Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName

We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the
category "Beverages":

SELECT * FROM [Category Sales for 1997]
WHERE CategoryName='Beverages'
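
A view built on another view can be sketched as follows. The example assumes SQLite via Python's sqlite3 module; the base table Sales1997, its rows, and the simplified definition of "Product Sales for 1997" are all hypothetical stand-ins, since the real Northwind definitions are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base table standing in for Northwind's order data.
conn.execute(
    "CREATE TABLE Sales1997 (CategoryName TEXT, ProductName TEXT, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales1997 VALUES (?, ?, ?)",
    [("Beverages", "Chai", 100), ("Beverages", "Chai", 50),
     ("Beverages", "Chang", 25), ("Condiments", "Syrup", 40)],
)

# First view: total sales per product (simplified, hypothetical definition).
conn.execute(
    "CREATE VIEW [Product Sales for 1997] AS "
    "SELECT CategoryName, ProductName, SUM(Amount) AS ProductSales "
    "FROM Sales1997 GROUP BY CategoryName, ProductName"
)

# Second view: selects from the first view and sums per category.
conn.execute(
    "CREATE VIEW [Category Sales For 1997] AS "
    "SELECT CategoryName, SUM(ProductSales) AS CategorySales "
    "FROM [Product Sales for 1997] GROUP BY CategoryName"
)

beverages = conn.execute(
    "SELECT * FROM [Category Sales For 1997] WHERE CategoryName = 'Beverages'"
).fetchall()
print(beverages)  # [('Beverages', 175)]
```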
