
MODERN DATABASES

UNIT – 1
Overview of DBMS and SQL: Introduction to DBMS and SQL, SQL Data
Definition and Data Types, Schema change statements in SQL, Specifying basic
constraints in SQL, Basic Queries in SQL, More Complex Queries in SQL.

1. INTRODUCTION TO DBMS

WHAT IS DATA?

Data is a collection of distinct small units of information. It can take a variety
of forms, such as text, numbers, media, or bytes, and it can be stored on paper, in
electronic memory, and so on.

The word 'data' originates from the Latin word 'datum', which means 'a single piece
of information'. 'Data' is the plural of 'datum'.

In computing, Data is information that can be translated into a form for efficient
movement and processing. Data is interchangeable.

WHAT IS A DATABASE?

A database is an organized collection of data, so that it can be easily accessed and
managed.

You can organize data into tables, rows, columns, and index it to make it easier to
find relevant information.

Database designers build a database so that a single set of software programs
provides access to the data for all users.

The main purpose of a database is to handle a large amount of information by
storing, retrieving, and managing data.

Many dynamic websites on the World Wide Web are backed by databases. For example, a
site that checks the availability of rooms in a hotel is a dynamic website that
uses a database.

There are many databases available like MySQL, Sybase, Oracle, MongoDB,
Informix, PostgreSQL, SQL Server, etc.

Modern databases are managed by the database management system (DBMS).

SQL, or Structured Query Language, is used to operate on the data stored in a
database. SQL is based on relational algebra and tuple relational calculus.

In diagrams, a database is conventionally drawn as a cylinder.

DATABASE MANAGEMENT SYSTEM

o A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are popular database systems used in many different
applications.
o A DBMS provides an interface for operations such as creating a database, creating
tables in it, storing data, updating data, and a lot more.
o It provides protection and security for the database. When there are multiple
users, it also maintains data consistency.

A DBMS allows users to perform the following tasks:

o Data Definition: creating, modifying, and removing the definitions that describe
the organization of data in the database.
o Data Updation: inserting, modifying, and deleting the actual data in the
database.
o Data Retrieval: retrieving data from the database so that applications can use it
for various purposes.
o User Administration: registering and monitoring users, maintaining data
integrity, enforcing data security, dealing with concurrency control, monitoring
performance, and recovering information corrupted by unexpected failure.

Characteristics of DBMS

o It uses a digital repository established on a server to store and manage the
information.
o It can provide a clear and logical view of the process that manipulates data.
o A DBMS contains automatic backup and recovery procedures.
o It supports the ACID properties, which keep data in a consistent state in case of
failure.
o It can reduce the complexity of the relationships between data.
o It supports manipulation and processing of data.
o It provides security of data.
o It can present the database from different viewpoints according to the
requirements of the user.

Advantages of DBMS

o Controls data redundancy: It can control data redundancy because all the data is
stored in one central database instead of in separate copies.
o Data sharing: The authorized users of an organization can share the data among
multiple users.
o Easy maintenance: It is easy to maintain because of the centralized nature of the
database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems that automatically back up
data against hardware and software failures and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such as
graphical user interfaces and application program interfaces.

Disadvantages of DBMS

o Cost of hardware and software: Running DBMS software requires a high-speed
processor and a large amount of memory.
o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a large impact because, in most
organizations, all the data is stored in a single database. If that database is
damaged by a power failure or database corruption, the data may be lost forever.

RDBMS (Relational Database Management System)

RDBMS stands for 'Relational Database Management System.'

In an RDBMS, data is represented as tables that contain rows and columns.

RDBMS is based on the Relational model; it was introduced by E. F. Codd.

A relational database contains the following components:

o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys

An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and
consistency of the data.

Applications

o Banking Management
o Business Organization
o Manufacturing
o Education
o Medical Field
o Railways
2. INTRODUCTION TO SQL

SQL
o SQL stands for Structured Query Language. It is used for storing and
managing data in a relational database management system (RDBMS).
o It is a standard language for Relational Database System. It enables a user to
create, read, update and delete relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server
use SQL as their standard database language.
o SQL allows users to query the database in a number of ways, using English-
like statements.

Rules:

SQL follows the following rules:

o SQL keywords are not case sensitive. By convention, they are written in
uppercase.
o SQL statements are not tied to text lines. A single SQL statement can be written
on one line or spread over multiple lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL is based on tuple relational calculus and relational algebra.

SQL process:
o When an SQL command is executed against an RDBMS, the system works out the best
way to carry out the request, and the SQL engine determines how to interpret the
task.
o Various components take part in this process, such as the optimization engine,
the query engine, the query dispatcher, and the classic query engine.
o The classic query engine handles all non-SQL queries, whereas the SQL query
engine does not handle logical files.
Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when
needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.

What Can SQL do?


• SQL can execute queries against a database

• SQL can retrieve data from a database

• SQL can insert records in a database

• SQL can update records in a database

• SQL can delete records from a database

• SQL can create new databases

• SQL can create new tables in a database


• SQL can create stored procedures in a database

• SQL can create views in a database

• SQL can set permissions on tables, procedures, and views


• SQL works with database programs like MS Access, DB2, Informix, MS SQL
Server, Oracle, Sybase, etc.

Advantages of SQL

SQL has the following advantages:

o High speed
o No coding needed
o Well defined standards
o Portability
o Interactive language
o Multiple data view

3. SQL DATA DEFINITION

The set of relations in a database must be specified to the system by means of a
data-definition language (DDL). The SQL DDL allows specification of not only a
set of relations, but also information about each relation, including:

• The schema for each relation.
• The types of values associated with each attribute.
• The integrity constraints.
• The set of indices to be maintained for each relation.
• The security and authorization information for each relation.
• The physical storage structure of each relation on disk.
We discuss here basic schema definition and basic types; we defer discussion of
the other SQL DDL features to later topics.
1. Basic Types
The SQL standard supports a variety of built-in types, including:
• char(n): A fixed-length character string with user-specified length n. The full
form, character, can be used instead.
• varchar(n): A variable-length character string with user-specified maximum
length n. The full form, character varying, is equivalent.
• int: An integer (a finite subset of the integers that is machine dependent). The
full form, integer, is equivalent.
• smallint: A small integer (a machine-dependent subset of the integer type).
• numeric(p, d): A fixed-point number with user-specified precision. The number
consists of p digits (plus a sign), and d of the p digits are to the right of
the decimal point. Thus, numeric(3,1) allows 44.5 to be stored exactly, but
neither 444.5 nor 0.32 can be stored exactly in a field of this type.
• real, double precision: Floating-point and double-precision floating-point
numbers with machine-dependent precision.
• float(n): A floating-point number, with precision of at least n digits.
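
The numeric(p, d) rule above can be checked with a little arithmetic. The following is a minimal sketch in Python; the helper name fits_numeric is hypothetical and is not part of SQL or of any database system. It only tests whether a value is exactly representable; a real DBMS may round or reject a value that does not fit, depending on the product.

```python
from decimal import Decimal

def fits_numeric(value: str, p: int, d: int) -> bool:
    """Return True if `value` can be stored exactly in numeric(p, d).

    Hypothetical helper for illustration only.
    """
    number = Decimal(value)
    # Shift the decimal point d places to the right. A value with at most
    # d digits after the point becomes a whole number.
    scaled = number.scaleb(d)
    if scaled != scaled.to_integral_value():
        return False  # more than d digits after the decimal point
    # The shifted whole number must fit in p digits in total.
    return abs(scaled) < Decimal(10) ** p

checks = {v: fits_numeric(v, 3, 1) for v in ("44.5", "444.5", "0.32")}
print(checks)
```

With numeric(3,1), only 44.5 passes: 444.5 needs four digits in total, and 0.32 needs two digits after the decimal point.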

Each type may include a special value called the null value. A null value
indicates an absent value that may exist but be unknown or that may not exist at
all. In certain cases, we may wish to prohibit null values from being entered, as
we shall see shortly.
The char data type stores fixed length strings. Consider, for example, an
attribute A of type char(10). If we store a string “Avi” in this attribute, 7 spaces
are appended to the string to make it 10 characters long. In contrast, if attribute B
were of type varchar(10), and we store “Avi” in attribute B, no spaces would be
added. When comparing two values of type char, if they are of different lengths
extra spaces are automatically added to the shorter one to make them the same
size, before comparison.
When comparing a char type with a varchar type, one may expect extra spaces
to be added to the varchar type to make the lengths equal, before comparison;
however, this may or may not be done, depending on the database system. As a
result, even if the same value “Avi” is stored in the attributes A and B above, a
comparison A=B may return false. We recommend you always use the varchar
type instead of the char type to avoid these problems.
SQL also provides the nvarchar type to store multilingual data using the
Unicode representation. However, many databases allow Unicode (in the UTF-8
representation) to be stored even in varchar types.
2. Basic Schema Definition
We define an SQL relation by using the create table command. The following
command creates a relation department in the database.

create table department
(dept_name varchar (20),
building varchar (15),
budget numeric (12,2),
primary key (dept_name));

The relation created above has three attributes, dept_name, which is a character
string of maximum length 20, building, which is a character string of maximum
length 15, and budget, which is a number with 12 digits in total, 2 of which are
after the decimal point. The create table command also specifies that
the dept_name attribute is the primary key of the department relation.
The general form of the create table command is:

create table r
(A1 D1,
A2 D2,
...,
An Dn,
{integrity-constraint 1},
...,
{integrity-constraint k });

where r is the name of the relation, each Ai is the name of an attribute in the
schema of relation r, and Di is the domain of attribute Ai; that is, Di specifies the
type of attribute Ai along with optional constraints that restrict the set of allowed
values for Ai.
The semicolon shown at the end of the create table statements, as well as
at the end of other SQL statements later in this chapter, is optional in many SQL
implementations.
SQL supports a number of different integrity constraints. In this section, we
discuss only a few of them:
• primary key (Aj1 , Aj2 ,..., Ajm ): The primary-key specification says that
attributes Aj1 , Aj2 ,..., Ajm form the primary key for the relation. The
primary-key attributes are required to be nonnull and unique; that is, no tuple can
have a null value for a primary-key attribute, and no two tuples in the relation
can be equal on all the primary-key attributes. Although the primary-key
specification is optional, it is generally a good idea to specify a primary key
for each relation.
• foreign key (Ak1 , Ak2 ,..., Akn ) references s: The foreign key specification says
that the values of attributes (Ak1 , Ak2 ,..., Akn ) for any tuple in the relation
must correspond to values of the primary key attributes of some tuple in
relation s.
Figure 3.1 presents a partial SQL DDL definition of the university database we
use in the text. The definition of the course table has a declaration “foreign key
(dept_name) references department”. This foreign-key declaration specifies that
for each course tuple, the department name specified in the tuple must exist
in the primary key attribute (dept_name) of the department relation. Without
this constraint, it is possible for a course to specify a nonexistent department
name. Figure 3.1 also shows foreign key constraints on tables section, instructor
and teaches.
• not null: The not null constraint on an attribute specifies that the null value
is not allowed for that attribute; in other words, the constraint excludes the
null value from the domain of that attribute. For example, in Figure 3.1, the
not null constraint on the name attribute of the instructor relation ensures that
the name of an instructor cannot be null.
SQL prevents any update to the database that violates an integrity constraint.
For example, if a newly inserted or modified tuple in a relation has null values for
any primary-key attribute, or if the tuple has the same value on the primary-key
attributes as does another tuple in the relation, SQL flags an error and prevents the
update. Similarly, an insertion of a course tuple with a dept_name value that does
not appear in the department relation would violate the foreign-key constraint on
course, and SQL prevents such an insertion from taking place.
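
The enforcement described above can be tried out with a small script. This is a sketch under stated assumptions: it uses SQLite through Python's standard sqlite3 module as a stand-in for the database system, the course table is reduced to three columns because Figure 3.1 is not reproduced in these notes, and the rows inserted are illustrative values only. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.execute("""
    create table department
        (dept_name varchar(20),
         building  varchar(15),
         budget    numeric(12,2),
         primary key (dept_name))""")
# Reduced, assumed version of the course relation.
conn.execute("""
    create table course
        (course_id varchar(8),
         title     varchar(50),
         dept_name varchar(20),
         primary key (course_id),
         foreign key (dept_name) references department)""")
conn.execute("insert into department values ('Biology', 'Watson', 90000)")
conn.commit()

def try_insert(sql):
    """Run one insert; report whether the database accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    # Valid: new key, existing department.
    try_insert("insert into course values ('BIO-101', 'Intro. to Biology', 'Biology')"),
    # Violates the primary key: course_id BIO-101 already exists.
    try_insert("insert into course values ('BIO-101', 'Genetics', 'Biology')"),
    # Violates the foreign key: no department named History.
    try_insert("insert into course values ('HIS-351', 'World History', 'History')"),
]
print(results)
```

The first insert is accepted and the other two are rejected, which matches the behaviour described for primary-key and foreign-key constraints.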
A newly created relation is empty initially. We can use the insert command
to load data into the relation. For example, if we wish to insert the fact that there
is an instructor named Smith in the Biology department with instructor_id 10211
and a salary of $66,000, we write:
insert into instructor
values (10211, 'Smith', 'Biology', 66000);
The values are specified in the order in which the corresponding attributes are
listed in the relation schema. The insert command has a number of useful features,
and is covered in more detail later.
We can use the delete command to delete tuples from a relation. The command
delete from student;
would delete all tuples from the student relation. Other forms of the delete
command allow specific tuples to be deleted.
To remove a relation from an SQL database, we use the drop table command.
The drop table command deletes all information about the dropped relation from
the database. The command
drop table r;
is a more drastic action than
delete from r;
The latter retains relation r, but deletes all tuples in r. The former deletes not only
all tuples of r, but also the schema for r. After r is dropped, no tuples can be
inserted into r unless it is re-created with the create table command.
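
The difference between the two commands can be seen in a short experiment. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) as the database; the one-column relation r and its values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (a int)")
conn.execute("insert into r values (1), (2)")
conn.commit()

# delete from r: the tuples go away, the relation stays.
conn.execute("delete from r")
conn.commit()
rows_after_delete = conn.execute("select count(*) from r").fetchall()[0][0]

# drop table r: the schema goes away too, so r can no longer be used.
conn.execute("drop table r")
try:
    conn.execute("insert into r values (3)")
    insert_after_drop = "ok"
except sqlite3.OperationalError:
    insert_after_drop = "no such table"

print(rows_after_delete, insert_after_drop)
```

After delete, a query on r still works and finds zero tuples. After drop table, even an insert fails until r is re-created.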
We use the alter table command to add attributes to an existing relation. All
tuples in the relation are assigned null as the value for the new attribute. The form
of the alter table command is
alter table r add A D;
where r is the name of an existing relation, A is the name of the attribute to be
added, and D is the type of the added attribute. We can drop attributes from a
relation by the command
alter table r drop A;
where r is the name of an existing relation, and A is the name of an attribute of the
relation. Many database systems do not support dropping of attributes, although
they will allow an entire table to be dropped.
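
The statement that existing tuples receive null for a newly added attribute can be checked directly. This is a sketch that assumes SQLite through Python's sqlite3 module; the two-column instructor table is a cut-down, assumed version of the relation used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, reduced instructor relation with one existing tuple.
conn.execute("create table instructor (ID int, name varchar(20))")
conn.execute("insert into instructor values (10211, 'Smith')")
conn.commit()

# alter table r add A D
conn.execute("alter table instructor add salary numeric(8,2)")

rows = conn.execute("select ID, name, salary from instructor").fetchall()
print(rows)  # the salary of the existing tuple is null (None in Python)
```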
4. SQL DATA TYPES

Data types are used to represent the nature of the data that can be stored in a
database table. For example, if we want to store string data in a particular column
of a table, we have to declare a string data type for that column.

SQL data types can be broadly divided into the following categories:

• Numeric data types such as int, tinyint, bigint, float, real, etc.
• Date and Time data types such as Date, Time, Datetime, etc.
• Character and String data types such as char, varchar, text, etc.
• Unicode character string data types, for example nchar, nvarchar, ntext, etc.
• Binary data types such as binary, varbinary, etc.
• Miscellaneous data types - clob, blob, xml, cursor, table, etc.

SQL Data Types important points:


1. Not all data types are supported by every relational database vendor. For
example, Oracle database doesn’t support DATETIME and MySQL doesn’t
support CLOB data type. So while designing database schema and writing
SQL queries, make sure to check if the data types are supported or not.
2. The data types listed here are not exhaustive; they are the most commonly
used ones. Some relational database vendors have their own data types that
are not listed here. For example, Microsoft SQL Server has money and
smallmoney data types, but since they are not supported by other popular
database vendors, they are not listed here.
3. Every relational database vendor has its own maximum size limits for
different data types; you don't need to remember the limits. The idea is to
know which data type to use in a specific scenario.

SQL Numeric Data Types


Numeric data types include:
• Integer numbers of various sizes: INTEGER or INT, and SMALLINT.
• Floating-point numbers of various precision: FLOAT or REAL, and
DOUBLE PRECISION.
• Formatted numbers, declared by using DECIMAL(i, j) or DEC(i, j) or
NUMERIC(i, j), where i is the precision and j is the scale. The default scale
is zero, and the default precision is implementation defined.

SQL Date and Time Data Types


• Date and time
  o The DATE data type has ten positions, and its components are
  YEAR, MONTH, and DAY, typically in the form YYYY-MM-DD.
  o The TIME data type has at least eight positions, with the components
  HOUR, MINUTE, and SECOND, typically in the form HH:MM:SS.
  o The < (less than) comparison can be used with dates or times.
  o Literal values are represented by single-quoted strings preceded by the
  keyword DATE or TIME, for example
  DATE '2002-09-27' or TIME '09:12:47'
• Timestamp
  o A timestamp data type (TIMESTAMP) includes both the DATE and
  TIME fields, plus a maximum number of six positions for decimal
  fractions of seconds and an optional WITH TIME ZONE qualifier.
  o Literal values are specified as follows:
  TIMESTAMP '2002-09-27 09:12:47.648302'
  (at least one space between the date and the time)
• Interval
  o The INTERVAL data type specifies an interval, that is, a relative value
  that can be used to increment or decrement an absolute value of a date,
  time, or timestamp.

SQL Character and String Data Types


Character-string 2 types
• fixed-length
CHAR(n) or CHARACTER(n), where n is the number of characters.
 Varying-length
VARCHAR(n) or CHAR VARYING(n) or CHARACTER
VARYING(n), where n is the maximum number of characters. Literal
string value is placed between single quotation marks and it is case
sensitive.
1. Example ‘SMITH’ and ‘smith’

• Bit-string  2 types
• fixed length
BIT(n), where n is the maximum number of bits.
• varying length
BIT VARYING(n), where n is the maximum number of bits.
The default for n, the length of a character string or bit string,
is one.
• Literal bit strings are placed between single quotes but
precede by a B ; example B’10101’

SQL Unicode Character and String Data Types

SQL Binary Data Types


SQL Miscellaneous Data Types

5. SCHEMA CHANGE STATEMENTS IN SQL

In this section, we give an overview of the schema evolution commands available in
SQL, which can be used to alter a schema by adding or dropping tables, attributes,
constraints, and other schema elements. This can be done while the database is
operational and does not require recompilation of the database schema. Certain
checks must be done by the DBMS to ensure that the changes do not affect the rest of
the database and make it inconsistent.
What is a schema in SQL Server
A schema is a collection of database objects including
tables, views, triggers, stored procedures, indexes, etc. A schema is associated with
a username which is known as the schema owner, who is the owner of the logically
related database objects.
A schema always belongs to one database. On the other hand, a database
may have one or multiple schemas. For example, in our BikeStores sample
database, we have two schemas: sales and production.
An object within a schema is qualified using the
schema_name.object_name format like sales.orders.

Built-in schemas in SQL Server


SQL Server provides us with some pre-defined schemas which have the
same names as the built-in database users and roles, for example: dbo, guest, sys,
and INFORMATION_SCHEMA.
Note that SQL Server reserves the sys and INFORMATION_SCHEMA schemas
for system objects, therefore, you cannot create or drop any objects in these
schemas.
SQL Server CREATE SCHEMA statement:
CREATE SCHEMA schema_name
[AUTHORIZATION owner_name]

A schema can be changed by adding or dropping tables, attributes,
constraints, and other schema elements.

5.4.1 The DROP Command


The DROP command can be used to drop named schema elements, such as tables,
domains, or constraints. One can also drop a schema. For example, if a whole
schema is no longer needed, the DROP SCHEMA command can be used. There
are two drop behavior options: CASCADE and RESTRICT.

The DROP SCHEMA statement allows you to delete a schema from a database.
The following shows the syntax of the DROP SCHEMA statement:
DROP SCHEMA [IF EXISTS] schema_name;
In this syntax:
1. First, specify the name of the schema that you want to drop. If the schema
contains any objects, the statement will fail. Therefore, you must delete all
objects in the schema before removing the schema.
2. Second, use the IF EXISTS option to conditionally remove the schema only
if the schema exists. Attempting to drop a non existing schema without
the IF EXISTS option will result in an error.

First, create a new schema named logistics:


CREATE SCHEMA logistics;
GO
Next, create a new table named deliveries inside the logistics
schema:
CREATE TABLE logistics.deliveries
(
order_id INT PRIMARY KEY,
delivery_date DATE NOT NULL,
delivery_status TINYINT NOT NULL
);

Then, drop the schema logistics:


DROP SCHEMA logistics;
SQL Server issued the following error because the schema is not empty.
Msg 3729, Level 16, State 1, Line 1
Cannot drop schema 'logistics' because it is being referenced by object
'deliveries'.
After that, drop the table logistics.deliveries:
DROP TABLE logistics.deliveries;
Finally, issue the DROP SCHEMA again to drop the logistics schema:
DROP SCHEMA IF EXISTS logistics;
Now, you will find that the logistics schema has been deleted from the database.
MySQL: DROP TABLE Statement
The MySQL DROP TABLE statement allows you to remove or delete a table from
the MySQL database.
Syntax
DROP TABLE table_name;
However, the full syntax for the MySQL DROP TABLE statement is:
DROP [ TEMPORARY ] TABLE [ IF EXISTS ]
table_name1, table_name2, ...
[ RESTRICT | CASCADE ];
Parameters or Arguments
table_name
The name of the table to remove from the database.
table_name1, table_name2
The tables to remove from the database, if removing more than one table in the
DROP TABLE statement.
IF EXISTS
Optional. If specified, the DROP TABLE statement will not raise an error if one of
the tables does not exist.
Note
If you use the MySQL DROP TABLE statement to drop one or more tables
that do not exist, the database will raise an error (unless you specify the IF
EXISTS parameter in the DROP TABLE statement).
Example
Drop One Table
DROP TABLE customers;
Drop Multiple Tables
DROP TABLE customers, suppliers;
This DROP TABLE statement example would delete two tables -
customers and suppliers. If we were worried that one of the tables doesn't exist
and we don't want to raise an error, we could modify our DROP TABLE statement
as follows:
DROP TABLE IF EXISTS customers, suppliers;
This example would delete the customers and suppliers tables and would not raise
an error if one of the tables didn't exist.
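
The effect of IF EXISTS can be observed with a short script. This is a sketch that assumes SQLite (through Python's sqlite3 module) in place of MySQL; SQLite accepts DROP TABLE IF EXISTS for one table per statement, so the multi-table form shown above is not used here. The customer_id column is a made-up placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT)")

conn.execute("DROP TABLE customers")  # the table exists, so this succeeds

try:
    conn.execute("DROP TABLE customers")  # the table is already gone
    plain_drop = "ok"
except sqlite3.OperationalError:
    plain_drop = "error"

# With IF EXISTS, dropping a missing table is not an error.
conn.execute("DROP TABLE IF EXISTS customers")
if_exists_drop = "ok"

print(plain_drop, if_exists_drop)
```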
Example: DROP TABLE DEPENDENT;
– With the CASCADE option, dropping a table removes the table and all its
definitions (including constraints).
– With the RESTRICT option, the table is dropped only if it is not referenced
in any constraints or views. Otherwise, the DROP command is not executed.

5.4.2 The ALTER Command


The definition of a base table or of other named schema elements can be changed by
using the ALTER command. For base tables, the possible alter table actions include
adding or dropping a column (attribute), changing a column definition, and adding
or dropping table constraints.
• The definition of a base table can be changed by using the ALTER TABLE
command.
• ALTER TABLE is used to
1. add a column (attribute)
2. drop a column (attribute)
3. change column definition,
4. add or drop table constraints.
Examples:
1. ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12);
2. ALTER TABLE EMPLOYEE DROP ADDRESS CASCADE;
3. ALTER TABLE DEPARTMENT ALTER MGRSSN DROP DEFAULT;
4. ALTER TABLE DEPARTMENT ALTER MGRSSN SET DEFAULT '333445555';
5. ALTER TABLE COMPANY.EMPLOYEE DROP CONSTRAINT EMPSUPERFK CASCADE;
SQL ALTER TABLE Statement
The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table.
The ALTER TABLE statement is also used to add and drop various constraints on
an existing table.
ALTER TABLE - ADD Column
To add a column to a table, use the following syntax:
ALTER TABLE table_name
ADD column_name datatype;
ALTER TABLE - DROP COLUMN
To delete a column in a table, use the following syntax (notice that some
database systems don't allow deleting a column):
ALTER TABLE table_name
DROP COLUMN column_name;

ALTER TABLE - ALTER/MODIFY COLUMN


To change the data type of a column in a table, use the following syntax:
SQL Server / MS Access:
ALTER TABLE table_name
ALTER COLUMN column_name datatype;
MySQL / Oracle (prior to version 10g):
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
Oracle 10g and later:
ALTER TABLE table_name
MODIFY column_name datatype;

ALTER TABLE - ALTER/MODIFY COLUMN Example


Consider a table named "Persons".

Now we want to add a column named "DateOfBirth" in the "Persons" table. We use
the following SQL statement:

ALTER TABLE Persons ADD DateOfBirth date;


Now we want to change the data type of the column named "DateOfBirth" in the
"Persons" table.

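ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;

Notice that the "DateOfBirth" column is now of type year and is going to hold a
year in a two- or four-digit format. Note that year is a MySQL data type, and MySQL
uses MODIFY COLUMN rather than ALTER COLUMN for this change, as shown in the
syntax above.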

DROP COLUMN Example

Next, we want to delete the column named "DateOfBirth" in the "Persons" table.

ALTER TABLE Persons
DROP COLUMN DateOfBirth;

RENAME COLUMN IN TABLE

Syntax

To rename a column in an existing table, the SQL ALTER TABLE syntax is:

For Oracle:

ALTER TABLE table_name RENAME COLUMN old_name TO new_name;

Example

ALTER TABLE supplier RENAME COLUMN supplier_name TO sname;

For MySQL:
ALTER TABLE table_name CHANGE COLUMN old_name new_name column_definition;

Example:

ALTER TABLE supplier CHANGE COLUMN supplier_name sname VARCHAR(100);

RENAME TABLE

Syntax

To rename a table, the SQL ALTER TABLE syntax is:

For Oracle and MySQL :

ALTER TABLE table_name RENAME TO new_table_name;

Example

ALTER TABLE supplier RENAME TO vendor;
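
The two rename statements can be run end to end as follows. This is a sketch that assumes SQLite through Python's sqlite3 module: SQLite accepts the same RENAME COLUMN ... TO and RENAME TO forms shown above for Oracle (RENAME COLUMN needs SQLite 3.25 or later). The supplier_id column is an assumed extra column, and PRAGMA table_info and sqlite_master are SQLite-specific ways to inspect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# supplier_id is a hypothetical second column for the illustration.
conn.execute("CREATE TABLE supplier (supplier_id INT, supplier_name VARCHAR(100))")

# Rename a column, then rename the table itself.
conn.execute("ALTER TABLE supplier RENAME COLUMN supplier_name TO sname")
conn.execute("ALTER TABLE supplier RENAME TO vendor")

# SQLite-specific catalog queries to confirm the new names.
columns = [row[1] for row in conn.execute("PRAGMA table_info(vendor)").fetchall()]
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]

print(tables, columns)
```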

6. CONSTRAINTS IN SQL

Constraints in SQL are conditions or restrictions applied to the data in the
database. Before data is inserted into a table, it is checked against these
conditions, and it is inserted only if it satisfies them.
Constraints in SQL can be categorized into two types:
1. Column Level Constraint:
Column Level Constraint is used to apply a constraint on a single column.
2. Table Level Constraint:
Table Level Constraint is used to apply a constraint on multiple columns.

Some of the real-life examples of constraints are as follows:


1. Every person has a unique email id. This is because while creating an email
account for any user, the email providing services such as Gmail, Yahoo or
any other email providing service will always check for the availability of
the email id that the user wants for himself. If some other user already takes
the email id that the user wants, then that id cannot be assigned to another
user. This simply means that no two users can have the same email ids on the
same email providing service. So, here the email id is the constraint on the
database of email providing services.
2. Whenever we set a password for any system, there are certain constraints
that are to be followed. These constraints may include the following:
o There must be one uppercase character in the password.
o Password must be of at least eight characters in length.
o Password must contain at least one special symbol.

Constraints available in SQL are:


1. NOT NULL
2. UNIQUE
3. PRIMARY KEY
4. FOREIGN KEY
5. CHECK
6. DEFAULT
7. CREATE INDEX

Now let us try to understand the different constraints available in SQL in more detail
with the help of examples. We will use MySQL database for writing all the queries.

1. NOT NULL
o NULL means empty, i.e., the value is not available.
o Whenever a table's column is declared as NOT NULL, then the value for that
column cannot be empty for any of the table's records.
o There must exist a value in the column to which the NOT NULL constraint
is applied.

NOTE: NULL does not mean zero. NULL means empty column, not even zero.

Syntax to apply the NOT NULL constraint during table creation:

CREATE TABLE TableName (ColumnName1 datatype NOT NULL, ColumnName2 datatype,
…., ColumnNameN datatype);

Example:

Create a student table and apply a NOT NULL constraint on one of the table's
column while creating a table.

CREATE TABLE student(StudentID INT NOT NULL, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40));

To verify that the not null constraint is applied to the table's column and the student
table is created successfully, we will execute the following query:

mysql> DESC student;
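
The effect of the NOT NULL constraint, and the point that NULL is not the same as zero, can be seen by trying a few inserts. This is a sketch that assumes SQLite through Python's sqlite3 module rather than MySQL; the student names are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student(StudentID INT NOT NULL, Student_FirstName VARCHAR(20),
    Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
    Student_Email_ID VARCHAR(40))""")

# Zero is an ordinary value, so it satisfies NOT NULL.
conn.execute("INSERT INTO student (StudentID, Student_FirstName) VALUES (0, 'Asha')")

# NULL means no value at all, so the constraint rejects it.
try:
    conn.execute("INSERT INTO student (StudentID, Student_FirstName) VALUES (NULL, 'Ravi')")
    null_insert = "ok"
except sqlite3.IntegrityError:
    null_insert = "rejected"

# Comparing NULL with zero yields NULL (None in Python), not true.
null_equals_zero = conn.execute("SELECT NULL = 0").fetchall()[0][0]

print(null_insert, null_equals_zero)
```

Columns without the constraint, such as Student_LastName here, are simply left as NULL when no value is supplied.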

Syntax to apply the NOT NULL constraint on an existing table's column:


ALTER TABLE TableName CHANGE Old_ColumnName New_ColumnName Datatype NOT NULL;

Example:

Consider we have an existing table student, without any constraints applied to it.
Later, we decided to apply a NOT NULL constraint to one of the table's column.
Then we will execute the following query:

mysql> ALTER TABLE student CHANGE StudentID StudentID INT NOT NULL;

To verify that the not null constraint is applied to the student table's column, we will
execute the following query:

mysql> DESC student;
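
The effect of the NOT NULL constraint described above can be tried out with a small
script. This is only a sketch under stated assumptions: it uses Python's built-in
sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, a shortened
version of the student table, and made-up names as sample data.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT NOT NULL,"
    " Student_FirstName VARCHAR(20))"
)

# A row that supplies a StudentID is accepted.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# A row with NULL in the NOT NULL column is rejected by the database.
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Ravi')")
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(not_null_rejected, row_count)
```

Only the first row is stored; the second insert fails because StudentID may not be
NULL.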

2. UNIQUE
o Duplicate values are not allowed in the columns to which the UNIQUE
constraint is applied.
o The column with the unique constraint will always contain a unique value.
o This constraint can be applied to one or more than one column of a table,
which means more than one unique constraint can exist on a single table.
o A UNIQUE constraint can also be added to a table that already exists.

Syntax to apply the UNIQUE constraint on a single column:

CREATE TABLE TableName (ColumnName1 datatype UNIQUE, ColumnName2 datatype,
…, ColumnNameN datatype);

Example:

Create a student table and apply a UNIQUE constraint on one of the table's columns
while creating the table.

mysql> CREATE TABLE student(StudentID INT UNIQUE, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40));

To verify that the unique constraint is applied to the table's column and the student
table is created successfully, we will execute the following query:

mysql> DESC student;

Syntax to apply the UNIQUE constraint on more than one column:


CREATE TABLE TableName (ColumnName1 datatype, ColumnName2 datatype,
…, ColumnNameN datatype, UNIQUE (ColumnName1, ColumnName2));

Example:

Create a student table and apply a UNIQUE constraint on more than one of the table's
columns while creating the table.

mysql> CREATE TABLE student(StudentID INT, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40), UNIQUE(StudentID, Student_PhoneNumber));

To verify that the unique constraint is applied to more than one table's column and
the student table is created successfully, we will execute the following query:

mysql> DESC student;

Syntax to apply the UNIQUE constraint on an existing table's column:

ALTER TABLE TableName ADD UNIQUE (ColumnName);

Example:
Consider we have an existing table student, without any constraints applied to it.
Later, we decided to apply a UNIQUE constraint to one of the table's columns. Then
we will execute the following query:

mysql> ALTER TABLE student ADD UNIQUE (StudentID);

To verify that the unique constraint is applied to the table's column, we will
execute the following query:

mysql> DESC student;
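
The behaviour of the UNIQUE constraint can be checked in the same way. The sketch
below is an illustration only, again assuming Python's sqlite3 module with an
in-memory SQLite database in place of MySQL; the phone numbers are invented sample
values.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT UNIQUE,"
    " Student_PhoneNumber VARCHAR(20))"
)
conn.execute("INSERT INTO student VALUES (1, '555-0100')")

# A second row with the same StudentID violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO student VALUES (1, '555-0101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# UNIQUE alone does not forbid NULL: several rows may leave the column empty.
conn.execute("INSERT INTO student VALUES (NULL, '555-0102')")
conn.execute("INSERT INTO student VALUES (NULL, '555-0103')")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM student WHERE StudentID IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_rows)
```

The second part shows why a primary key needs NOT NULL in addition to UNIQUE.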

3. PRIMARY KEY
o The PRIMARY KEY constraint is a combination of the NOT NULL and UNIQUE
constraints.
o The column to which the primary key constraint is applied will always
contain a unique value and will not allow NULL values.

Syntax of primary key constraint during table creation:


CREATE TABLE TableName (ColumnName1 datatype PRIMARY KEY, ColumnName2 datatype,
…, ColumnNameN datatype);

Example:

Create a student table and apply the PRIMARY KEY constraint while creating a
table.

mysql> CREATE TABLE student(StudentID INT PRIMARY KEY, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40));

To verify that the primary key constraint is applied to the table's column and the
student table is created successfully, we will execute the following query:

mysql> DESC student;

Syntax to apply the primary key constraint on an existing table's column:

ALTER TABLE TableName ADD PRIMARY KEY (ColumnName);

Example:
Consider we have an existing table student, without any constraints applied to it.
Later, we decided to apply the PRIMARY KEY constraint to the table's column.
Then we will execute the following query:

mysql> ALTER TABLE student ADD PRIMARY KEY (StudentID);

To verify that the primary key constraint is applied to the student table's column, we
will execute the following query:

mysql> DESC student;

4. FOREIGN KEY
o A foreign key is used to enforce referential integrity.
o When one table refers to another, the same column is present in both tables.
In the referenced (parent) table that column acts as the primary key, and in
the referencing (child) table it acts as a foreign key.

Syntax to apply a foreign key constraint during table creation:

CREATE TABLE tablename(ColumnName1 Datatype(SIZE) PRIMARY KEY,
ColumnNameN Datatype(SIZE), FOREIGN KEY (ColumnName) REFERENCES
PARENT_TABLE_NAME(Primary_Key_ColumnName));

Example:
Create an employee table and a department table, and apply the FOREIGN KEY
constraint while creating the department table.

To create a foreign key on a table, we first need a primary key on the table that
it refers to.

mysql> CREATE TABLE employee (Emp_ID INT NOT NULL PRIMARY KEY,
Emp_Name VARCHAR(40), Emp_Salary VARCHAR(40));

To verify that the primary key constraint is applied to the employee table's column,
we will execute the following query:

mysql> DESC employee;

Now, we will write a query to apply a foreign key on the department table referring
to the primary key of the employee table, i.e., Emp_ID.

mysql> CREATE TABLE department(Dept_ID INT NOT NULL PRIMARY KEY,
Dept_Name VARCHAR(40), Emp_ID INT NOT NULL,
FOREIGN KEY(Emp_ID) REFERENCES employee(Emp_ID));

To verify that the foreign key constraint is applied to the department table's column,
we will execute the following query:

mysql> DESC department;

Syntax to apply the foreign key constraint with constraint name:

CREATE TABLE tablename(ColumnName1 Datatype PRIMARY KEY,
ColumnNameN Datatype(SIZE), CONSTRAINT ConstraintName FOREIGN KEY (ColumnName)
REFERENCES PARENT_TABLE_NAME(Primary_Key_ColumnName));

Example:

Create an employee table and a department table, and apply the FOREIGN KEY
constraint with a constraint name while creating the department table.

To create a foreign key on a table, we first need a primary key on the table that
it refers to.

mysql> CREATE TABLE employee (Emp_ID INT NOT NULL PRIMARY KEY,
Emp_Name VARCHAR(40), Emp_Salary VARCHAR(40));

To verify that the primary key constraint is applied to the employee table's column,
we will execute the following query:

mysql> DESC employee;


Now, we will write a query to apply a foreign key with a constraint name on the
department table referring to the primary key of the employee table, i.e., Emp_ID.

mysql> CREATE TABLE department(Dept_ID INT NOT NULL PRIMARY KEY,
Dept_Name VARCHAR(40), Emp_ID INT NOT NULL,
CONSTRAINT emp_id_fk FOREIGN KEY(Emp_ID) REFERENCES employee(Emp_ID));

To verify that the foreign key constraint is applied to the department table's column,
we will execute the following query:

mysql> DESC department;

Syntax to apply the foreign key constraint on an existing table's column:


ALTER TABLE Child_TableName ADD FOREIGN KEY (ColumnName)
REFERENCES Parent_TableName (ColumnName);

Example:

Consider we have two existing tables, employee and department. Later, we decided to
apply a FOREIGN KEY constraint to the department table's column. Then we will
execute the following query:

mysql> DESC employee;

mysql> ALTER TABLE department ADD FOREIGN KEY (Emp_ID)
REFERENCES employee (Emp_ID);

To verify that the foreign key constraint is applied to the department table's column,
we will execute the following query:

mysql> DESC department;
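
What referential integrity means in practice can be seen by trying to break it. The
following sketch is an illustration under stated assumptions: it uses Python's
sqlite3 module with an in-memory SQLite database rather than MySQL, shortened
employee and department tables, and invented rows. The PRAGMA line is specific to
SQLite, which checks foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are enforced only after this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE employee ("
    " Emp_ID INT NOT NULL PRIMARY KEY,"
    " Emp_Name VARCHAR(40))"
)
conn.execute(
    "CREATE TABLE department ("
    " Dept_ID INT NOT NULL PRIMARY KEY,"
    " Dept_Name VARCHAR(40),"
    " Emp_ID INT NOT NULL,"
    " CONSTRAINT emp_id_fk FOREIGN KEY (Emp_ID) REFERENCES employee (Emp_ID))"
)

conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO department VALUES (10, 'Research', 1)")

# A department row that points at a non-existent employee is rejected.
try:
    conn.execute("INSERT INTO department VALUES (20, 'Sales', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# An employee that is still referenced by a department cannot be deleted.
try:
    conn.execute("DELETE FROM employee WHERE Emp_ID = 1")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True

print(orphan_rejected, parent_delete_rejected)
```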


5. CHECK
o When a CHECK constraint is applied to a table's column, every value that a
user tries to insert into that column is first tested against the specified
condition, and it is stored only if the condition holds.
o For example, without any constraint an age column accepts whatever the user
enters, including a negative or otherwise invalid value. If a CHECK
constraint with the condition age greater than 18 is applied to the age
column, an attempt to insert zero, or any other value that is not greater
than 18, is rejected.

NOTE: MySQL enforces CHECK constraints from version 8.0.16 onward. Earlier versions
parse the CHECK clause but ignore it.

Syntax to apply check constraint on a single column:

CREATE TABLE TableName (ColumnName1 datatype CHECK (ColumnName1 Condition),
ColumnName2 datatype, …, ColumnNameN datatype);

Example:

Create a student table and apply CHECK constraint to check for the age less than or
equal to 15 while creating a table.

mysql> CREATE TABLE student(StudentID INT, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40), Age INT CHECK( Age <= 15));

To verify that the check constraint is applied to the student table's column, we will
execute the following query:

mysql> DESC student;

Syntax to apply check constraint on multiple columns:

CREATE TABLE TableName (ColumnName1 datatype, ColumnName2 datatype,
…, ColumnNameN datatype,
CHECK (ColumnName1 Condition AND ColumnName2 Condition));

Example:

Create a student table and apply CHECK constraint to check for the age less than or
equal to 15 and a percentage greater than 85 while creating a table.

mysql> CREATE TABLE student(StudentID INT, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40), Age INT, Percentage INT,
CHECK( Age <= 15 AND Percentage > 85));

To verify that the check constraint is applied to the age and percentage column, we
will execute the following query:

mysql> DESC student;


Syntax to apply check constraint on an existing table's column:

ALTER TABLE TableName ADD CHECK (ColumnName Condition);

Example:

Consider we have an existing table student. Later, we decided to apply the CHECK
constraint on the student table's column. Then we will execute the following query:

mysql> ALTER TABLE student ADD CHECK ( Age <=15 );

To verify that the check constraint is applied to the student table's column, we will
execute the following query:

mysql> DESC student;
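
A CHECK constraint can be exercised by inserting rows that satisfy and violate its
condition. The sketch below is illustrative only: it assumes Python's sqlite3 module
with an in-memory SQLite database instead of MySQL, keeps only the columns needed
for the check, and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT, Age INT, Percentage INT,"
    " CHECK (Age <= 15 AND Percentage > 85))"
)

# This row satisfies both parts of the condition.
conn.execute("INSERT INTO student VALUES (1, 14, 90)")

# Student 2 is too old and student 3 has too low a percentage.
rejected = []
for row in [(2, 16, 90), (3, 14, 80)]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

accepted = [r[0] for r in conn.execute("SELECT StudentID FROM student")]
print(accepted, rejected)
```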


6. DEFAULT

Whenever a default constraint is applied to the table's column, and the user has not
specified the value to be inserted in it, then the default value which was specified
while applying the default constraint will be inserted into that particular column.

Syntax to apply default constraint during table creation:

CREATE TABLE TableName (ColumnName1 datatype DEFAULT Value, ColumnName2 datatype,
…, ColumnNameN datatype);

Example:

Create a student table and apply the default constraint while creating a table.

mysql> CREATE TABLE student(StudentID INT, Student_FirstName VARCHAR(20),
Student_LastName VARCHAR(20), Student_PhoneNumber VARCHAR(20),
Student_Email_ID VARCHAR(40) DEFAULT "anuja.k8@gmail.com");

To verify that the default constraint is applied to the student table's column, we will
execute the following query:

mysql> DESC student;


Syntax to apply default constraint on an existing table's column:

ALTER TABLE TableName ALTER ColumnName SET DEFAULT Value;

Example:

Consider we have an existing table student. Later, we decided to apply the
DEFAULT constraint on the student table's column. Then we will execute the
following query:

mysql> ALTER TABLE student ALTER Student_Email_ID SET DEFAULT "anuja.k8@gmail.com";

To verify that the default constraint is applied to the student table's column, we will
execute the following query:

mysql> DESC student;
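
The DEFAULT constraint only takes effect when an INSERT leaves the column out. The
following sketch shows this under stated assumptions: Python's sqlite3 module with
an in-memory SQLite database stands in for MySQL, the table is shortened, and the
example.com addresses are placeholders rather than values from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT,"
    " Student_Email_ID VARCHAR(40) DEFAULT 'unknown@example.com')"
)

# No e-mail given: the default value is stored.
conn.execute("INSERT INTO student (StudentID) VALUES (1)")
# E-mail given explicitly: the default is not used.
conn.execute("INSERT INTO student VALUES (2, 'asha@example.com')")

emails = [
    r[0]
    for r in conn.execute(
        "SELECT Student_Email_ID FROM student ORDER BY StudentID"
    )
]
print(emails)
```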


7. CREATE INDEX

CREATE INDEX is used to create an index on a table. Strictly speaking it is a
statement rather than a constraint, but it is usually covered alongside the
constraints. Indexes are not visible to the user, but they speed up searching and
retrieval of data from the database.

Syntax to create an index on single column:

CREATE INDEX IndexName ON TableName (ColumnName1);

Example:

Create an index on the StudentID column of the existing student table.

mysql> CREATE INDEX idx_StudentID ON student (StudentID);

To verify that the index is created on the student table's column, we
will execute the following query:

mysql> DESC student;


Syntax to create an index on multiple columns:

CREATE INDEX IndexName ON TableName (ColumnName1, ColumnName2, …, ColumnNameN);

Example:

mysql> CREATE INDEX idx_Student ON student (StudentID, Student_PhoneNumber);

To verify that the index is created on the student table's columns, we
will execute the following query:

mysql> DESC student;


Syntax to create an index on an existing table:

ALTER TABLE TableName ADD INDEX (ColumnName);

Example:

Consider we have an existing table student. Later, we decided to create an index
on one of the student table's columns. Then we will execute the following query:

mysql> ALTER TABLE student ADD INDEX (StudentID);

To verify that the index is created on the student table's column, we
will execute the following query:

mysql> DESC student;

7. BASIC SQL COMMANDS


o SQL commands are instructions that are used to communicate with the
database and to perform specific tasks, functions, and queries on the data.
o SQL can perform various tasks such as creating a table, adding data to tables,
dropping a table, modifying a table, and setting permissions for users.

Types of SQL Commands

There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.

1. Data Definition Language (DDL)


o DDL changes the structure of a table, for example by creating, deleting, or
altering it.
o All DDL commands are auto-committed, which means that they permanently
save all their changes in the database.

Here are some commands that come under DDL:

o CREATE
o ALTER
o DROP
o TRUNCATE

a. CREATE: It is used to create a new table in the database.


Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);

b. DROP: It is used to delete both the structure and record stored in the table.

Syntax

DROP TABLE table_name;

Example

DROP TABLE EMPLOYEE;

c. ALTER: It is used to alter the structure of the database. The change can either
modify the characteristics of an existing attribute or add a new attribute.

Syntax:

To add a new column in the table

ALTER TABLE table_name ADD column_name column_definition;

To modify existing column in the table:

ALTER TABLE table_name MODIFY(column_definitions....);

EXAMPLE

ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));


ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));

d. TRUNCATE: It is used to delete all the rows from the table and free the space
that those rows occupied. The table structure itself is kept.

Syntax:
TRUNCATE TABLE table_name;

Example:

TRUNCATE TABLE EMPLOYEE;

2. Data Manipulation Language


o DML commands are used to modify the data in the database. They are
responsible for all forms of change to the stored data.
o DML commands are not auto-committed, which means that their changes are not
saved permanently until they are committed. Until then, they can be rolled back.

Here are some commands that come under DML:

o INSERT
o UPDATE
o DELETE

a. INSERT: The INSERT statement is used to insert rows of data into a table.

Syntax:

INSERT INTO TABLE_NAME
(col1, col2, col3, .... colN)
VALUES (value1, value2, value3, .... valueN);

Or

INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);

For example:

INSERT INTO javatpoint (Author, Subject) VALUES ("Sonoo", "DBMS");

b. UPDATE: This command is used to update or modify the value of a column in
the table.

Syntax:

UPDATE table_name SET [column_name1 = value1, ... column_nameN = valueN]
[WHERE CONDITION]

For example:

UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';

c. DELETE: It is used to remove one or more rows from a table.

Syntax:

DELETE FROM table_name [WHERE condition];

For example:

DELETE FROM javatpoint
WHERE Author="Sonoo";

3. Data Control Language

DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

o Grant
o Revoke

a. Grant: It is used to give user access privileges to a database.

Example

GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

b. Revoke: It is used to take back permissions from the user.

Example

REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;

4. Transaction Control Language

TCL commands can be used only with DML commands such as INSERT, DELETE and
UPDATE.

DDL operations such as creating or dropping tables are automatically committed in
the database, which is why TCL commands cannot be used with them.

Here are some commands that come under TCL:

o COMMIT
o ROLLBACK
o SAVEPOINT

a. Commit: Commit command is used to save all the transactions to the database.

Syntax:

COMMIT;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;

b. Rollback: Rollback command is used to undo transactions that have not already
been saved to the database.

Syntax:

ROLLBACK;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;

c. SAVEPOINT: It is used to roll the transaction back to a certain point without
rolling back the entire transaction.

Syntax:

SAVEPOINT SAVEPOINT_NAME;
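
How COMMIT, ROLLBACK and SAVEPOINT interact is easiest to see in a short run. The
sketch below is only an illustration: it assumes Python's sqlite3 module with an
in-memory SQLite database, a two-column CUSTOMERS table with invented rows, and a
savepoint name (before_second_delete) chosen for this example.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN, SAVEPOINT, ROLLBACK and COMMIT statements issued below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE CUSTOMERS (ID INT, AGE INT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)", [(1, 25), (2, 30), (3, 25)]
)


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]


# ROLLBACK undoes the uncommitted DELETE.
conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE AGE = 25")
after_delete = count_rows()
conn.execute("ROLLBACK")
after_rollback = count_rows()

# ROLLBACK TO a savepoint undoes only the work done after the savepoint.
conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 1")
conn.execute("SAVEPOINT before_second_delete")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 2")
conn.execute("ROLLBACK TO before_second_delete")
conn.execute("COMMIT")
after_savepoint = count_rows()

print(after_delete, after_rollback, after_savepoint)
```

The first delete of the second transaction is committed, while the delete issued
after the savepoint is undone.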

5. Data Query Language

DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: The SELECT list plays the role of the projection operation of relational
algebra: it chooses the attributes to be returned, while the WHERE clause specifies
the condition that the returned rows must satisfy.

Syntax:

SELECT expressions
FROM TABLES
WHERE conditions;

For example:

SELECT emp_name
FROM employee
WHERE age > 20;

SELECT-FROM-WHERE : Example queries on company relational schema

Query 1: Retrieve the name and address of all employees who work for the
'Research' department.
Q1: SELECT NAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNUMBER=DNO AND DNAME='Research';

• Similar to a SELECT-PROJECT-JOIN sequence of relational algebra operations
• (DNAME='Research') is a selection condition (corresponds to a
SELECT operation in relational algebra)
• (DNUMBER=DNO) is a join condition (corresponds to a JOIN
operation in relational algebra)
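
Query 1 can be run against a tiny data set to see the selection and join conditions
at work. This is a sketch under stated assumptions: it uses Python's sqlite3 module
with an in-memory SQLite database, only the columns that Q1 needs, and invented
sample rows; an ORDER BY is added so that the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (DNAME VARCHAR(20), DNUMBER INT PRIMARY KEY);
    CREATE TABLE EMPLOYEE (NAME VARCHAR(20), ADDRESS VARCHAR(40), DNO INT);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 'Houston', 5),
        ('Wong', 'Houston', 5),
        ('Zelaya', 'Spring', 4);
    """
)

# DNUMBER = DNO is the join condition; DNAME = 'Research' is the selection.
research_staff = conn.execute(
    "SELECT NAME, ADDRESS "
    "FROM EMPLOYEE, DEPARTMENT "
    "WHERE DNUMBER = DNO AND DNAME = 'Research' "
    "ORDER BY NAME"
).fetchall()
print(research_staff)
```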

Query 2: For every project located in 'blore', list the project number, the controlling
department number, and the department manager's name, address, and birthdate.
SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND
PLOCATION='blore';

In Q2, there are two join conditions and one select condition
• The join condition DNUM=DNUMBER relates a project to its
controlling department.
• The join condition MGRSSN=SSN relates the department to the
employee who manages that department.
• PLOCATION='blore' is the selection condition for the PROJECT table

Ambiguous attribute names, Aliasing and Tuple Variables

• A query that refers to two or more attributes with the same name must
qualify the attribute name with the relation name, by prefixing the relation
name to the attribute name using dot (.) notation, to prevent ambiguity.
Example: Suppose that the NAME and DNO attributes of the EMPLOYEE table
were called NAME and DNUMBER, and the DNAME attribute of the DEPARTMENT
table was also called NAME. Then Query 1 must be restated as follows.

SELECT EMPLOYEE.NAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.NAME='Research' AND
EMPLOYEE.DNUMBER=DEPARTMENT.DNUMBER;

SQL - Alias Syntax


You can rename a table or a column temporarily by giving another name known
as Alias. The use of table aliases is to rename a table in a specific SQL statement.
The renaming is a temporary change and the actual table name does not change in
the database. The column aliases are used to rename a table's columns for the
purpose of a particular SQL query.
Syntax
The basic syntax of a table alias is as follows.
SELECT column1, column2.... FROM table_name AS alias_name WHERE
[condition];
The basic syntax of a column alias is as follows.
SELECT column_name AS alias_name FROM table_name WHERE
[condition];

Example
The examples below use two tables, CUSTOMERS (Table 1) and ORDERS (Table 2).

The following query uses table aliases.
SQL> SELECT C.ID, C.NAME, C.AGE, O.AMOUNT FROM CUSTOMERS AS
C, ORDERS AS O WHERE C.ID = O.CUSTOMER_ID;

The following query uses column aliases.
SQL> SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME FROM
CUSTOMERS WHERE SALARY IS NOT NULL;

How to Solve the “Ambiguous Name Column” Error in SQL


At times you may want to join two tables in SQL that contain columns with the
same name.
In this case, if you join the two tables and run the query without
differentiating the names of the columns that are the same, the error “Ambiguous
name column” is raised.
For instance, you want to join two tables named TABLE1 and TABLE2.
TABLE1 contains these columns: EmployeeID, Name, Salary. TABLE2 has these
columns: EmployeeID, Name, Age.
First, create the tables.
CREATE TABLE TABLE1 (EmployeeID INT, Name VARCHAR(20), Salary INT);
CREATE TABLE TABLE2 (EmployeeID INT, Name VARCHAR(20), Age INT);

Note that the two tables have a “Name” column in common, apart from the
EmployeeID, which is always a number.
SELECT [Name], [Salary], [Name], [Age] FROM TABLE1 A INNER JOIN
TABLE2 B ON A.EmployeeID = B.EmployeeID
If you run the above query, you will get this error: “Ambiguous name
column”.
This means two columns have the same column name, that is, the “Name”
column. The SQL engine cannot tell which “Name” out of the two tables
you are referring to. It is ambiguous, that is, not clear.
To clarify this, add the alias of either or both TABLE1 or TABLE2 to the columns
having the same name. You will notice above, the alias of TABLE1 is A while that
of TABLE2 is B.
So, let’s fix the bug.
SELECT A.[Name], [Salary], B.[Name], [Age] FROM TABLE1 A INNER JOIN
TABLE2 B ON A.EmployeeID = B.EmployeeID
Run the query. No error!

• Ambiguity also arises if some queries need to refer to the same relation twice.
In this case, aliases are given to the relation name.
Query 8: For each employee, retrieve the employee's name, and the
name of his or her immediate supervisor.
Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.SUPERSSN=S.SSN;
• In Q8, the alternate relation names E and S are called aliases or
tuple variables for the EMPLOYEE relation.
• We can think of E and S as two different copies of EMPLOYEE; E
represents employees in the role of supervisees and S represents
employees in the role of supervisors.
• Use of AS is optional, e.g., use EMPLOYEE E instead of
EMPLOYEE AS E.
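
The two roles played by the EMPLOYEE relation in Q8 can be made concrete with three
rows. The following is a sketch only, assuming Python's sqlite3 module with an
in-memory SQLite database, a reduced EMPLOYEE table, and invented SSN values; an
ORDER BY is added to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " FNAME VARCHAR(15), LNAME VARCHAR(15),"
    " SSN CHAR(9), SUPERSSN CHAR(9))"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "1", "2"),     # supervised by Wong
        ("Franklin", "Wong", "2", "3"),  # supervised by Borg
        ("James", "Borg", "3", None),    # has no supervisor
    ],
)

# E ranges over supervisees and S over supervisors of the same table.
pairs = conn.execute(
    "SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME "
    "FROM EMPLOYEE AS E, EMPLOYEE AS S "
    "WHERE E.SUPERSSN = S.SSN "
    "ORDER BY E.SSN"
).fetchall()
print(pairs)
```

The employee whose SUPERSSN is NULL does not appear, because NULL never equals any
SSN in the join condition.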

Unspecified WHERE-clause and use of Asterisk (*)

• A missing WHERE-clause indicates no condition; hence, all tuples of the
relations in the FROM-clause are selected.

• This is equivalent to the condition WHERE TRUE

Query 9 and Query 10: Retrieve the SSN values for all employees (Q9) and all
combinations of EMPLOYEE SSN and DEPARTMENT DNAME (Q10).

Q9: SELECT SSN
FROM EMPLOYEE;

• If more than one relation is specified in the FROM-clause and there is no
join condition, then the CARTESIAN PRODUCT of tuples is selected.

Q10: SELECT SSN, DNAME
FROM EMPLOYEE, DEPARTMENT;

• An asterisk (*) is used in the SELECT clause to select all the attributes.

Q1C : SELECT *
FROM EMPLOYEE
WHERE DNO=5;
Q1D : SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME='Research' AND DNO=DNUMBER;
Q10A : SELECT *
FROM EMPLOYEE, DEPARTMENT;

Q10A specifies the CROSS PRODUCT of EMPLOYEE and DEPARTMENT
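
The size of a Cartesian product is the product of the table sizes, which a quick
count makes visible. The sketch below is an illustration under stated assumptions:
Python's sqlite3 module with an in-memory SQLite database, reduced EMPLOYEE and
DEPARTMENT tables, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (SSN CHAR(9), DNO INT);
    CREATE TABLE DEPARTMENT (DNAME VARCHAR(20), DNUMBER INT);
    INSERT INTO EMPLOYEE VALUES ('111', 5), ('222', 5), ('333', 4);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    """
)

# No join condition: every employee is paired with every department.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE, DEPARTMENT"
).fetchone()[0]

# With a join condition: each employee is paired only with its own department.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER"
).fetchone()[0]
print(cross_count, joined_count)
```

With 3 employees and 2 departments the cross product has 6 rows, while the join
returns one row per employee.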

The keyword DISTINCT


The SELECT DISTINCT statement is used to return only distinct (different)
values.
Inside a table, a column often contains many duplicate values; and
sometimes you only want to list the different (distinct) values.
SELECT DISTINCT Syntax
SELECT DISTINCT column1, column2, ...
FROM table_name;

SELECT DISTINCT Examples


The following SQL statement selects only the DISTINCT values from the
"Country" column in the "Customers" table:
Example
SELECT DISTINCT Country FROM Customers;
The following SQL statement lists the number of different (distinct) customer
countries:
Example
SELECT COUNT(DISTINCT Country) FROM Customers;

• SQL does not treat a relation as a set; duplicate tuples can appear
• To eliminate duplicate tuples in a query result, the keyword DISTINCT
is used.
Query 11 : Retrieve the salary of every employee (Q11) and all distinct salary
values (Q11A)
Q11 : SELECT SALARY FROM EMPLOYEE;

Q11A : SELECT DISTINCT SALARY FROM EMPLOYEE;

Set theoretic Operations

• For union , the keyword is UNION


• For set difference the keyword is EXCEPT
• For set intersection the keyword is INTERSECT
• The relations resulting from these operations are sets of tuples;
duplicate tuples are eliminated from the result.
• The relations must be type compatible

The SQL set operations are used to combine two or more SQL SELECT
statements.
Types of Set Operation
1. Union
2. Union All
3. Intersect
4. Minus
1. Union
• The SQL Union operation is used to combine the result of two or more SQL
SELECT queries.
• In the union operation, the number of columns and their data types must be
the same in both the tables on which the UNION operation is being applied.
• The union operation eliminates the duplicate rows from its resultset.
Syntax(UNION)
SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
Example, using two tables named First and Second:

SELECT * FROM First
UNION
SELECT * FROM Second;

2. Union All
The Union All operation is similar to the Union operation, except that it returns
the combined rows without removing duplicates and without sorting the data.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;

Example, using two tables named First and Second:

SELECT * FROM First
UNION ALL
SELECT * FROM Second;

3. INTERSECT
• It is used to combine two SELECT statements. The Intersect operation
returns the common rows from both the SELECT statements.
• In the Intersect operation, the number of columns and their data types must
be the same in both SELECT statements.
• The result has no duplicates. Some systems return it in ascending order, but
the order is guaranteed only when an ORDER BY clause is specified.
• Syntax

SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;

4. Minus
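It combines the result of two SELECT statements. The Minus operator is used to
display the rows which are present in the first query but absent in the second query.
MINUS is the keyword used by Oracle; the SQL standard calls the same operation
EXCEPT. The result has no duplicates. Some systems return it in ascending order,
but the order is guaranteed only when an ORDER BY clause is specified.
Syntax
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;

Example, using two tables named First and Second:

SELECT * FROM First
MINUS
SELECT * FROM Second;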

EXCEPT:
1. In SQL, EXCEPT returns those tuples that are returned by the first SELECT
operation, and not returned by the second SELECT operation.
2. This is the same as using a subtract operator in relational algebra.
Example:
Say we have two relations, Students and TA (Teaching Assistant). We want to
return all those students who are not teaching assistants. The query can be
formulated as:

SELECT Name
FROM Students
EXCEPT
SELECT NAME
FROM TA;
Output:
1. Rohan
2. Mansi
3. Megha

Difference between EXCEPT and NOT IN Clause


EXCEPT automatically removes all duplicates in the final result, whereas NOT IN
retains duplicate tuples. It is also important to note that MySQL versions before
8.0.31 do not support EXCEPT.
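
The set operations described above can be compared side by side on two small
tables. This is a sketch under stated assumptions: it uses Python's sqlite3 module
with an in-memory SQLite database, which spells set difference EXCEPT rather than
MINUS; the table names First_Table and Second_Table, the helper ids, and all rows
are hypothetical and chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE First_Table (ID INT, NAME VARCHAR(20));
    CREATE TABLE Second_Table (ID INT, NAME VARCHAR(20));
    INSERT INTO First_Table VALUES (1, 'Jack'), (2, 'Harry'), (3, 'Jackson');
    INSERT INTO Second_Table VALUES (3, 'Jackson'), (4, 'Stephan'), (5, 'David');
    """
)


def ids(operator):
    """Combine the ID columns of both tables with the given set operator."""
    sql = (
        "SELECT ID FROM First_Table "
        + operator
        + " SELECT ID FROM Second_Table ORDER BY ID"
    )
    return [row[0] for row in conn.execute(sql)]


union_ids = ids("UNION")          # duplicates removed
union_all_ids = ids("UNION ALL")  # duplicates kept
intersect_ids = ids("INTERSECT")  # rows present in both tables
except_ids = ids("EXCEPT")        # rows only in the first table
print(union_ids, union_all_ids, intersect_ids, except_ids)
```

The explicit ORDER BY is what fixes the output order here; the set operators
themselves are not relied on for sorting.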

Example for UNION:


QUERY 4 : Make a list of all project numbers for projects that involve an employee
whose last name is 'Smith', either as a worker or as a manager of the department that
controls the project.
Q4: (SELECT DISTINCT PNUMBER
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
UNION
(SELECT DISTINCT PNUMBER
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE PNUMBER=PNO AND ESSN=SSN AND LNAME='Smith');

3 multi set operations:

1. UNION ALL
2. EXCEPT ALL
3. INTERSECT ALL

Their results are multisets (duplicates are not removed). Basically, each tuple,
whether it is a duplicate or not, is considered as a different tuple when applying
these operations.

UNION ALL
The UNION ALL command combines the result set of two or more SELECT
statements (allows duplicate values).
When comparing UNION vs. UNION ALL, there is one major difference:
1. UNION returns only unique records.
2. UNION ALL returns all records, including duplicates.
UNION ALL Syntax
SELECT column_1, column_2
FROM table_1
[WHERE condition]
UNION ALL
SELECT column_1, column_2
FROM table_2
[WHERE condition];

EXCEPT ALL
EXCEPT removes duplicates from its result. To retain duplicates, we must
explicitly write EXCEPT ALL instead of EXCEPT.
INTERSECT
Intersect returns the common rows of two or more tables. Intersect removes
the duplicates after combining.
INTERSECT ALL
Intersect All does not remove duplicates.
Note :
Both INTERSECT and INTERSECT ALL return the common rows of two
different SQL queries. They differ in the way they handle duplicates.

Substring Pattern Matching and Arithmetic Operators
• The LIKE comparison operator can be used for string pattern
matching.
• Partial strings are specified using two reserved characters:
1. % replaces an arbitrary number of zero or more characters
2. the underscore (_) replaces a single character.
QUERY 12 : Retrieve all employees whose address is in Houston.

Q12: SELECT FNAME, LNAME FROM EMPLOYEE

WHERE ADDRESS LIKE '%Houston%';

QUERY 12A : Find all employees who were born during the 1950s.

Q12A: SELECT FNAME, LNAME FROM EMPLOYEE

WHERE BDATE LIKE '195_______';

Here BDATE is assumed to have the format YYYY-MM-DD, so the pattern is 195
followed by seven underscores.

LIKE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
Here are some examples showing different LIKE operators with '%' and '_'
wildcards:

Examples of LIKE operators


Consider the following table on which we will apply various operations of the LIKE
operator.

Q1. Select all students starting with “a”


SELECT * FROM students
WHERE studentname LIKE 'a%';

Q2. Select all students with a studentname ending with “i”

SELECT * FROM students
WHERE studentname LIKE '%i';

Q3. Select all students with a studentname that have “li” in any position

SELECT * FROM students
WHERE studentname LIKE '%li%';

Q4. Select all students with a studentname that have “o” in the second position:

SELECT * FROM students
WHERE studentname LIKE '_o%';

Q5. Select all students with a studentname that start with “a” and are at least 5
characters in length

SELECT * FROM students
WHERE studentname LIKE 'a____%';

Q6. Select all students with a studentname that start with “s” and end with “y”
SELECT * FROM students
WHERE studentname LIKE 's%y';
• If an underscore or % is needed as a literal character in the string, the character
should be preceded by an escape character, which is specified after the string
using the keyword ESCAPE.
For example:
'AB\_CD\%EF' ESCAPE '\' represents the literal string 'AB_CD%EF',
because \ is specified as the escape character.
• If an apostrophe (') is needed, it is represented as two consecutive
apostrophes ('') so that it will not be interpreted as ending the string.
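
The LIKE patterns above can be checked against a handful of names. The following
sketch is an illustration only: it assumes Python's sqlite3 module with an in-memory
SQLite database, a one-column students table, a hypothetical helper named matching,
and invented lower-case names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (studentname VARCHAR(20))")
conn.executemany(
    "INSERT INTO students VALUES (?)",
    [("akash",), ("amit",), ("nikhil",), ("rohini",), ("sunny",), ("julia",)],
)


def matching(pattern):
    """Return the names that match the given LIKE pattern, sorted by name."""
    sql = (
        "SELECT studentname FROM students "
        "WHERE studentname LIKE ? ORDER BY studentname"
    )
    return [row[0] for row in conn.execute(sql, (pattern,))]


starts_with_a = matching("a%")       # starts with "a"
ends_with_i = matching("%i")         # ends with "i"
contains_li = matching("%li%")       # "li" in any position
o_in_second = matching("_o%")        # "o" in the second position
a_at_least_5 = matching("a____%")    # starts with "a", five or more characters
s_to_y = matching("s%y")             # starts with "s" and ends with "y"

# With ESCAPE, _ and % are matched literally instead of acting as wildcards.
literal_match = conn.execute(
    r"SELECT 'AB_CD%EF' LIKE 'AB\_CD\%EF' ESCAPE '\'"
).fetchone()[0]
wildcard_blocked = conn.execute(
    r"SELECT 'ABxCDyEF' LIKE 'AB\_CD\%EF' ESCAPE '\'"
).fetchone()[0]
print(starts_with_a, a_at_least_5, literal_match, wildcard_blocked)
```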

The standard arithmetic operators for addition (+), subtraction (-), multiplication (*),
and division (/) can be applied to numeric values or attributes with numeric
domains.
Example :
QUERY 13 : Show the resulting salaries if every employee working on the
'ProductX' project is given a 10 percent raise.
Q13: SELECT FNAME, LNAME, 1.1 * SALARY AS INCREASED_SAL
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE SSN=ESSN AND PNO=PNUMBER AND
PNAME='ProductX';

Another comparison operator that can be used for convenience is BETWEEN,
which is illustrated in Query 14.
QUERY 14 : Retrieve all employees in department 5 whose salary is between
$30,000 and $40,000.
Q14: SELECT * FROM EMPLOYEE
WHERE (SALARY BETWEEN 30000 AND 40000) AND DNO =5;
Note : The condition (SALARY BETWEEN 30000 AND 40000) in Q14 is
equivalent to the condition ((SALARY >= 30000) AND (SALARY <= 40000)).
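
That BETWEEN includes both end points can be confirmed by running Q14 next to its
expanded form. This is a sketch under stated assumptions: Python's sqlite3 module
with an in-memory SQLite database, a reduced EMPLOYEE table, and invented salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME VARCHAR(15), SALARY INT, DNO INT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        ("Smith", 30000, 5),    # lower bound, included
        ("Wong", 40000, 5),     # upper bound, included
        ("Narayan", 38000, 5),
        ("English", 25000, 5),  # below the range
        ("Wallace", 35000, 4),  # in range but in another department
    ],
)

with_between = conn.execute(
    "SELECT LNAME FROM EMPLOYEE "
    "WHERE (SALARY BETWEEN 30000 AND 40000) AND DNO = 5 ORDER BY LNAME"
).fetchall()

with_comparisons = conn.execute(
    "SELECT LNAME FROM EMPLOYEE "
    "WHERE ((SALARY >= 30000) AND (SALARY <= 40000)) AND DNO = 5 ORDER BY LNAME"
).fetchall()
print(with_between, with_between == with_comparisons)
```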

Ordering of Query Results


The SQL ORDER BY Keyword
1. The ORDER BY keyword is used to sort the result-set in ascending or
descending order.
2. The ORDER BY keyword sorts the records in ascending order by default. To
sort the records in descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

ORDER BY Example
The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" column:
Example
SELECT * FROM Customers
ORDER BY Country;
ORDER BY DESC Example
The following SQL statement selects all customers from the "Customers" table,
sorted DESCENDING by the "Country" column:
SELECT * FROM Customers
ORDER BY Country DESC;

ORDER BY Several Columns Example


The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" and the "CustomerName" column. This means that it orders
by Country, but if some rows have the same Country, it orders them by
CustomerName:
Example
SELECT * FROM Customers
ORDER BY Country, CustomerName;
ORDER BY Several Columns Example 2
The following SQL statement selects all customers from the "Customers" table,
sorted ascending by the "Country" and descending by the "CustomerName" column:
Example
SELECT * FROM Customers
ORDER BY Country ASC, CustomerName DESC;

• SQL allows the user to order the tuples in the result of a query by the values
of one or more attributes, using the ORDER BY clause.
QUERY 15 : Retrieve a list of employees and the projects they are working on,
ordered by department and, within each department, ordered alphabetically by last
name, first name.
Q15: SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER=DNO AND SSN=ESSN AND PNO=PNUMBER
ORDER BY DNAME, LNAME, FNAME;
Note : 1. The default order is in ascending order of values.
2. We can specify the keyword DESC if we want to see the result in a
descending order of values.
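As a rough illustration of ordering by several columns, the sketch below assumes Python's built-in sqlite3 module and a made-up Customers table with only two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Cruz", "Mexico"), ("Berg", "Sweden"), ("Ana", "Mexico"), ("Dahl", "Sweden")],
)

# Ascending on both columns (ASC is the default).
both_ascending = conn.execute(
    "SELECT CustomerName, Country FROM Customers "
    "ORDER BY Country, CustomerName"
).fetchall()

# Ascending by Country, descending by CustomerName within each country.
mixed = conn.execute(
    "SELECT CustomerName, Country FROM Customers "
    "ORDER BY Country ASC, CustomerName DESC"
).fetchall()

print(both_ascending)
print(mixed)
```

The second sort key only matters for rows that tie on the first one, and ASC or DESC applies to each column separately.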

8. MORE SQL: COMPLEX QUERIES, TRIGGERS, VIEWS, AND SCHEMA MODIFICATION

This chapter describes more advanced features of the SQL language standard for
relational databases. We start in Section 5.1 by presenting more complex features of
SQL retrieval queries, such as nested queries, joined tables, outer joins, aggregate
functions, and grouping. In Section 5.2, we describe the CREATE ASSERTION
statement, which allows the specification of more general constraints on the
database. We also introduce the concept of triggers and the CREATE TRIGGER
statement, which will be presented in more detail in Section 26.1 when we present
the principles of active databases. Then, in Section 5.3, we describe the SQL facility
for defining views on the database. Views are also called virtual or derived tables
because they present the user with what appear to be tables; however, the
information in those tables is derived from previously defined tables. Section 5.4
introduces the SQL ALTER TABLE statement, which is used for modifying the
database tables and constraints. Section 5.5 is the chapter summary.
This chapter is a continuation of Chapter 4. The instructor may skip parts of this
chapter if a less detailed introduction to SQL is intended.

More Complex SQL Retrieval Queries


In Section 4.3, we described some basic types of retrieval queries in SQL. Because of
the generality and expressive power of the language, there are many additional
features that allow users to specify more complex retrievals from the database. We
discuss several of these features in this section.
Comparisons Involving NULL and Three-Valued Logic
SQL has various rules for dealing with NULL values. Recall from Section 3.1.2
that NULL is used to represent a missing value, but that it usually has one of three
different interpretations—value unknown (exists but is not known), value not available
(exists but is purposely withheld), or value not applicable (the attribute is undefined for
this tuple). Consider the following examples to illustrate each of the meanings of
NULL.
1. Unknown value. A person’s date of birth is not known, so it is represented by
NULL in the database.
2. Unavailable or withheld value. A person has a home phone but does not want it
to be listed, so it is withheld and represented as NULL in the database.
3. Not applicable attribute. An attribute LastCollegeDegree would be NULL for a
person who has no college degrees because it does not apply to that person.
It is often not possible to determine which of the meanings is intended; for example, a
NULL for the home phone of a person can have any of the three meanings. Hence,
SQL does not distinguish between the different meanings of NULL.
In general, each individual NULL value is considered to be different from every other
NULL value in the various database records. When a NULL is involved in a
comparison operation, the result is considered to be UNKNOWN (it may be
TRUE or it may be FALSE). Hence, SQL uses a three-valued logic with values
TRUE, FALSE, and UNKNOWN instead of the standard two-valued (Boolean)
logic with values TRUE or FALSE. It is therefore necessary to define the results (or
truth values) of three-valued logical expressions when the logical connectives AND,
OR, and NOT are used. Table 5.1 shows the resulting values.

Table 5.1 Logical Connectives in Three-Valued Logic

(a) AND       TRUE      FALSE     UNKNOWN
    TRUE      TRUE      FALSE     UNKNOWN
    FALSE     FALSE     FALSE     FALSE
    UNKNOWN   UNKNOWN   FALSE     UNKNOWN

(b) OR        TRUE      FALSE     UNKNOWN
    TRUE      TRUE      TRUE      TRUE
    FALSE     TRUE      FALSE     UNKNOWN
    UNKNOWN   TRUE      UNKNOWN   UNKNOWN

(c) NOT
    TRUE      FALSE
    FALSE     TRUE
    UNKNOWN   UNKNOWN

In Tables 5.1(a) and 5.1(b), the rows and columns represent the values of the results of
comparison conditions, which would typically appear in the WHERE clause of an
SQL query. Each expression result would have a value of TRUE, FALSE, or
UNKNOWN. The result of combining the two values using the AND logical
connective is shown by the entries in Table 5.1(a). Table 5.1(b) shows the result of
using the OR logical connective. For example, the result of (FALSE AND
UNKNOWN) is FALSE, whereas the result of (FALSE OR UNKNOWN) is
UNKNOWN. Table 5.1(c) shows the result of the NOT logical operation. Notice
that in standard Boolean logic, only TRUE or FALSE values are permitted; there is
no UNKNOWN value.
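A few entries of Table 5.1 can be reproduced with a minimal sketch. It assumes Python's built-in sqlite3 module, where SQLite encodes TRUE as 1, FALSE as 0, and UNKNOWN as NULL (returned to Python as None); the helper name evaluate is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

# NULL plays the role of UNKNOWN; 1 and 0 stand for TRUE and FALSE.
expressions = ["NULL = NULL", "0 AND NULL", "1 AND NULL",
               "0 OR NULL", "1 OR NULL", "NOT NULL"]
results = {expr: evaluate(expr) for expr in expressions}

for expr, value in results.items():
    print(f"{expr:12} -> {value}")
```

FALSE AND UNKNOWN collapses to FALSE and TRUE OR UNKNOWN collapses to TRUE, while every other combination with UNKNOWN stays UNKNOWN.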
In select-project-join queries, the general rule is that only those combinations of
tuples that evaluate the logical expression in the WHERE clause of the query to
TRUE are selected. Tuple combinations that evaluate to FALSE or UNKNOWN are
not selected. However, there are exceptions to that rule for certain operations, such as
outer joins, as we shall see in Section 5.1.6.
SQL allows queries that check whether an attribute value is NULL. Rather than using
= or <> to compare an attribute value to NULL, SQL uses the comparison operators
IS or IS NOT. This is because SQL considers each NULL value as being distinct
from every other NULL value, so equality comparison is not appropriate. It follows
that when a join condition is specified, tuples with NULL values for the join attributes
are not included in the result (unless it is an OUTER JOIN; see Section 5.1.6).
Query 18 illustrates this.
Query 18. Retrieve the names of all employees who do not have supervisors.
Q18: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Super_ssn IS NULL;

Comparisons Involving NULL and Three-Valued Logic


• SQL allows queries that check whether an attribute value is NULL.
• SQL uses IS or IS NOT rather than using = or <> to compare an attribute
value to NULL. This is because SQL considers each NULL value as being
distinct from every other NULL value, so equality comparison is not
appropriate.
QUERY 18 : Retrieve the names of all employees who do not have supervisors.
Q18: SELECT FNAME, LNAME FROM EMPLOYEE
WHERE SUPERSSN IS NULL;
1. The SQL NULL is the term used to represent a missing value. A NULL
value in a table is a value in a field that appears to be blank.
2. A field with a NULL value is a field with no value. It is very important to
understand that a NULL value is different than a zero value or a field that
contains spaces.
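Syntax
The basic syntax for declaring columns as NOT NULL when creating a table is as follows.
Here ADDRESS and SALARY are not declared NOT NULL, so they may hold NULL values.
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);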

Example
The NULL value can cause problems when selecting data, because comparing an
unknown value to any other value always yields unknown, and such rows are not
included in the results. You must use the IS NULL or IS NOT NULL operators to
check for a NULL value.

IS NULL operator.
SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY FROM
CUSTOMERS WHERE SALARY IS NULL;
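The difference between = NULL and IS NULL can be seen in a short sketch, assuming Python's built-in sqlite3 module and the CUSTOMERS schema above; the three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS (
        ID INT NOT NULL,
        NAME VARCHAR(20) NOT NULL,
        AGE INT NOT NULL,
        ADDRESS CHAR(25),
        SALARY DECIMAL(18, 2),
        PRIMARY KEY (ID)
    )
""")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [(1, "Asha", 32, "Delhi", 2000), (2, "Ben", 25, "Pune", None),
     (3, "Chen", 23, None, None)],  # None is stored as NULL
)

# SALARY = NULL evaluates to UNKNOWN for every row, so nothing is returned.
equals_null = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY = NULL").fetchall()

# IS NULL and IS NOT NULL are the correct tests.
is_null = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY IS NULL ORDER BY ID").fetchall()
is_not_null = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY IS NOT NULL ORDER BY ID").fetchall()

print(equals_null, is_null, is_not_null)
```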

NESTED QUERIES, TUPLES, AND SET/MULTISET COMPARISONS


Some queries require that existing values in the database be fetched and then used in a
comparison condition. Such queries can be conveniently formulated by using
nested queries, which are complete select-from-where blocks within the WHERE
clause of another query. That other query is called the outer query. Query 4 is
formulated in Q4 without a nested query, but it can be rephrased to use nested queries as
shown in Q4A. Q4A introduces the comparison operator IN, which compares a
value v with a set (or multiset) of values V and evaluates to TRUE if v is one of the
elements in V.
The first nested query selects the project numbers of projects that have an employee
with last name ‘Smith’ involved as manager, while the second nested query selects
the project numbers of projects that have an employee with last name ‘Smith’
involved as worker. In the outer query, we use the OR logical connective to retrieve a
PROJECT tuple if the PNUMBER value of that tuple is in the result of either nested
query.
Q4A: SELECT DISTINCT Pnumber
FROM PROJECT
WHERE Pnumber IN
( SELECT Pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND
Mgr_ssn=Ssn AND Lname='Smith' )
OR
Pnumber IN
( SELECT Pno
FROM WORKS_ON, EMPLOYEE
WHERE Essn=Ssn AND Lname='Smith' );
If a nested query returns a single attribute and a single tuple, the query result will be a
single (scalar) value. In such cases, it is permissible to use = instead of IN for the
comparison operator. In general, the nested query will return a table (relation),
which is a set or multiset of tuples.
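Q4A can be run end to end with a minimal sketch. It assumes Python's built-in sqlite3 module and a cut-down version of the schema that keeps only the columns the query touches; all rows are made up, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT);
CREATE TABLE DEPARTMENT (Dnumber INTEGER, Mgr_ssn TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER, Dnum INTEGER);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
INSERT INTO EMPLOYEE VALUES ('111', 'Smith'), ('222', 'Wong'), ('333', 'Smith');
INSERT INTO DEPARTMENT VALUES (5, '222'), (4, '333');
INSERT INTO PROJECT VALUES (1, 5), (2, 5), (30, 4);
INSERT INTO WORKS_ON VALUES ('111', 1), ('222', 2);
""")

q4a = conn.execute("""
    SELECT DISTINCT Pnumber
    FROM PROJECT
    WHERE Pnumber IN
          ( SELECT Pnumber                 -- projects whose department
            FROM PROJECT, DEPARTMENT, EMPLOYEE  -- is managed by a Smith
            WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Lname = 'Smith' )
       OR Pnumber IN
          ( SELECT Pno                     -- projects a Smith works on
            FROM WORKS_ON, EMPLOYEE
            WHERE Essn = Ssn AND Lname = 'Smith' )
    ORDER BY Pnumber
""").fetchall()

print(q4a)
```

With this sample data, project 30 qualifies through the manager branch and project 1 through the worker branch.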
SQL allows the use of tuples of values in comparisons by placing them within
parentheses. To illustrate this, consider the following query:

SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
FROM WORKS_ON
WHERE Essn='123456789' );
This query will select the Essns of all employees who work the same (project, hours)
combination on some project that employee ‘John Smith’ (whose Ssn =
‘123456789’) works on. In this example, the IN operator compares the subtuple of
values in parentheses (Pno, Hours) within each tuple in WORKS_ON with the set
of type-compatible tuples produced by the nested query.
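The tuple comparison can be sketched as follows, assuming Python's built-in sqlite3 module linked against SQLite 3.15 or later (the first release with row-value support). The WORKS_ON rows are made up, with whole-number hours to keep the comparison exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours INTEGER)")
conn.executemany(
    "INSERT INTO WORKS_ON VALUES (?, ?, ?)",
    [("123456789", 1, 32), ("123456789", 2, 8),
     ("999000111", 2, 8),    # same (project, hours) pair as employee 123456789
     ("999000222", 1, 10)],  # same project but different hours
)

same_combination = conn.execute("""
    SELECT DISTINCT Essn
    FROM WORKS_ON
    WHERE (Pno, Hours) IN ( SELECT Pno, Hours
                            FROM WORKS_ON
                            WHERE Essn = '123456789' )
    ORDER BY Essn
""").fetchall()

print(same_combination)
```

The result also contains '123456789' itself, since that employee's own rows trivially match the nested query.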
In addition to the IN operator, a number of other comparison operators can be used to
compare a single value v (typically an attribute name) to a set or multiset V
(typically a nested query). The = ANY (or = SOME) operator returns TRUE if the
value v is equal to some value in the set V and is hence equivalent to IN. The two
keywords ANY and SOME have the same effect. Other operators that can be
combined with ANY (or SOME) include >, >=, <, <=, and <>. The keyword ALL
can also be combined with each of these operators. For example, the comparison
condition (v > ALL V) returns TRUE if the value v is greater than all the values in the
set (or multiset) V. An example is the following query, which returns the names of
employees whose salary is greater than the salary of all the employees in
department 5:
SELECT Lname,Fname
FROM EMPLOYEE
WHERE Salary > ALL ( SELECT Salary
FROM EMPLOYEE
WHERE Dno=5 );
Notice that this query can also be specified using the MAX aggregate function (see
Section 5.1.7).
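The MAX formulation can be sketched as follows, assuming Python's built-in sqlite3 module. SQLite does not implement the ALL, ANY, and SOME quantifiers, so only the MAX form is run here; the employee rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("Adams", "Ann", 30000, 5), ("Baker", "Bob", 40000, 5),
     ("Clark", "Cy", 43000, 4), ("Davis", "Di", 25000, 4),
     ("Evans", "Eve", 55000, 1)],
)

def earns_more_than_department(dno):
    """Employees whose salary exceeds the highest salary in department dno."""
    return conn.execute(
        "SELECT Lname, Fname FROM EMPLOYEE "
        "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = ?) "
        "ORDER BY Lname",
        (dno,),
    ).fetchall()

above_dept_5 = earns_more_than_department(5)
above_empty_dept = earns_more_than_department(99)  # no such department

print(above_dept_5, above_empty_dept)
```

The two formulations differ in one corner case: if the nested query returns no rows, > ALL is TRUE for every employee, whereas MAX yields NULL, the comparison is UNKNOWN, and no employee is returned.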
In general, we can have several levels of nested queries. We can once again be faced
with possible ambiguity among attribute names if attributes of the same name exist—
one in a relation in the FROM clause of the outer query, and another in a relation in the
FROM clause of the nested query. The rule is that a reference to an unqualified
attribute refers to the relation declared in the innermost nested query. For example, in
the SELECT clause and WHERE clause of the first nested query of Q4A, a reference
to any unqualified attribute of the PROJECT relation refers to the PROJECT relation
specified in the FROM clause of the nested query. To refer to an attribute of the
PROJECT relation specified in the outer query, we specify and refer to an alias (tuple
variable) for that relation. These rules are similar to scope rules for program variables in
most programming languages that allow nested procedures and functions. To
illustrate the potential ambiguity of attribute names in nested queries, consider Query
16.
Query 16. Retrieve the name of each employee who has a dependent
with the same first name and is the same sex as the employee.
Q16: SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE E.Ssn IN ( SELECT Essn
FROM DEPENDENT AS D
WHERE E.Fname=D.Dependent_name
AND E.Sex=D.Sex );
In the nested query of Q16, we must qualify E.Sex because it refers to the Sex attribute
of EMPLOYEE from the outer query, and DEPENDENT also has an attribute called
Sex. If there were any unqualified references to Sex in the nested query, they would
refer to the Sex attribute of DEPENDENT. However, we would not have to qualify
the attributes Fname and Ssn of EMPLOYEE if they appeared in the nested query
because the DEPENDENT relation does not have attributes called Fname and Ssn, so
there is no ambiguity.
It is generally advisable to create tuple variables (aliases) for all the tables referenced in
an SQL query to avoid potential errors and ambiguities, as illustrated in Q16.
1. Subqueries with the SELECT Statement
Subqueries are most frequently used with the SELECT statement.
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])

Example
Consider the CUSTOMERS table defined earlier. The following query selects the
customers whose SALARY is greater than 4500.
SQL> SELECT * FROM CUSTOMERS WHERE ID IN (SELECT ID FROM
CUSTOMERS WHERE SALARY > 4500) ;

2. Subqueries with the INSERT Statement

Subqueries also can be used with INSERT statements. The INSERT statement
uses the data returned from the subquery to insert into another table. The
selected data in the subquery can be modified with any of the character, date
or number functions.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]

Example
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table.
Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can
use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN
(SELECT ID FROM CUSTOMERS) ;
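The copy can be sketched as below, assuming Python's built-in sqlite3 module. The column list is shortened and the rows are made up; CUSTOMERS_BKP is created empty with the same structure before the INSERT runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, SALARY INT);
CREATE TABLE CUSTOMERS_BKP (ID INT PRIMARY KEY, NAME TEXT, AGE INT, SALARY INT);
INSERT INTO CUSTOMERS VALUES (1, 'Asha', 32, 2000), (2, 'Ben', 25, 6500),
                             (3, 'Chen', 23, 8500);
""")

# Copy every row returned by the SELECT into the backup table.
conn.execute("""
    INSERT INTO CUSTOMERS_BKP
    SELECT * FROM CUSTOMERS
    WHERE ID IN (SELECT ID FROM CUSTOMERS)
""")

original = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID").fetchall()
copied = conn.execute("SELECT * FROM CUSTOMERS_BKP ORDER BY ID").fetchall()
print(copied)
```

Because the subquery returns every ID, the WHERE clause filters nothing here; a narrower subquery would copy only a subset of the rows.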

3. Subqueries with the UPDATE Statement


The subquery can be used in conjunction with the UPDATE statement. Either single
or multiple columns in a table can be updated when using a subquery with the UPDATE
statement.
The basic syntax is as follows.
UPDATE table
SET column_name = new_value
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
Example
Assume we have a CUSTOMERS_BKP table available, which is a backup of the
CUSTOMERS table. The following example multiplies SALARY by 0.25 in the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
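The effect of this UPDATE can be checked with a small sketch, assuming Python's built-in sqlite3 module. The tables are reduced to three columns and the rows are made up; the backup is created here with CREATE TABLE ... AS SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, AGE INT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES (1, 25, 2000.0), (2, 27, 4000.0), (3, 32, 8000.0);
CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS;
""")

# Only customers whose AGE appears in the subquery result are updated.
conn.execute("""
    UPDATE CUSTOMERS
    SET SALARY = SALARY * 0.25
    WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                  WHERE AGE >= 27)
""")

after_update = conn.execute(
    "SELECT ID, SALARY FROM CUSTOMERS ORDER BY ID").fetchall()
print(after_update)
```

The customer aged 25 keeps the original salary, while the two customers aged 27 and 32 end up with a quarter of theirs.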

4. Subqueries with the DELETE Statement


The subquery can be used in conjunction with the DELETE statement like
with any other statements mentioned above.
The basic syntax is as follows.
DELETE FROM table_name
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]

Example
Assuming, we have a CUSTOMERS_BKP table available which is a backup
of the CUSTOMERS table. The following example deletes the records from
the CUSTOMERS table for all the customers whose AGE is greater than or
equal to 27.
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
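A matching sketch for the DELETE case, again assuming Python's built-in sqlite3 module, reduced tables, and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, AGE INT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES (1, 25, 2000.0), (2, 27, 4000.0), (3, 32, 8000.0);
CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS;
""")

# Rows whose AGE is returned by the subquery are removed from CUSTOMERS.
conn.execute("""
    DELETE FROM CUSTOMERS
    WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
                  WHERE AGE >= 27)
""")

remaining = conn.execute("SELECT ID, AGE FROM CUSTOMERS ORDER BY ID").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS_BKP").fetchone()[0]
print(remaining, backup_count)
```

Only CUSTOMERS is changed; the backup table that feeds the subquery still holds all three rows.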

Example :
Query 4 can be rephrased to use nested queries as shown in Q4A.
(Make a list of all project numbers for projects that involve an employee whose
last name is 'Smith', either as a worker or as a manager of the department that
controls the project.)
Q4A: SELECT DISTINCT PNUMBER FROM PROJECT
WHERE PNUMBER IN
(SELECT PNUMBER FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
OR
PNUMBER IN (SELECT PNO FROM WORKS_ON, EMPLOYEE
WHERE ESSN=SSN AND LNAME='Smith');

The IN operator can also compare a tuple of values in parentheses with a set or
multiset of union-compatible tuples.
SELECT DISTINCT ESSN FROM WORKS_ON
WHERE (PNO, HOURS) IN
(SELECT PNO, HOURS FROM WORKS_ON WHERE ESSN='123456789');

• This query will select the social security numbers of all employees who work the
same (project, hours) combination on some project that employee 'John Smith'
(whose SSN ='123456789') works on.

• In this example, the IN operator compares the subtuple of values in parentheses
(PNO, HOURS) for each tuple in WORKS_ON with the set of union-compatible
tuples produced by the nested query.

• In addition to the IN operator, ANY , SOME and ALL comparison operators can
be used to compare a single value v (typically an attribute name) to a set or multiset
V (typically a nested query).

• ANY, SOME, and ALL can be used with =, >, >=, <, <=, and <>.

Example : Retrieve names of employees whose salary is greater than the salary of all the
employees in department 5:

SELECT LNAME, FNAME FROM EMPLOYEE
WHERE SALARY > ALL (SELECT SALARY FROM EMPLOYEE WHERE DNO=5);

• The = ANY (or = SOME) operator returns TRUE if the value v is equal to some
value in the set V and is hence equivalent to IN.

• In general, we can have several levels of nested queries. We can once again be faced
with possible ambiguity among attribute names if attributes of the same name exist,
one in a relation in the FROM clause of the outer query, and another in a relation in
the FROM clause of the nested query.

• The rule is that a reference to an unqualified attribute refers to the relation declared
in the innermost nested query.

• To refer to an attribute of the PROJECT relation specified in the outer query, we can
specify and refer to an alias (tuple variable) for that relation.

To illustrate the potential ambiguity of attribute names in nested queries, consider Query 16.

QUERY 16 : Retrieve the name of each employee who has a dependent with the same first
name and same sex as the employee.

Q16: SELECT E.FNAME, E.LNAME FROM EMPLOYEE AS E
WHERE E.SSN IN (SELECT ESSN FROM DEPENDENT WHERE
E.FNAME=DEPENDENT_NAME AND E.SEX=SEX);

• In the nested query of Q16, we must qualify E.SEX because it refers to the SEX
attribute of EMPLOYEE from the outer query, and DEPENDENT also has an
attribute called SEX.

• All unqualified references to SEX in the nested query refer to SEX of
DEPENDENT.

• However, we do not have to qualify FNAME and SSN because the DEPENDENT
relation does not have attributes called FNAME and SSN, so there is no ambiguity.

CORRELATED NESTED QUERIES


Whenever a condition in the WHERE clause of a nested query references some attribute
of a relation declared in the outer query, the two queries are said to be correlated. We can
understand a correlated query better by considering that the nested query is evaluated
once for each tuple (or combination of tuples) in the outer query. For example, we can
think of Q16 as follows: For each EMPLOYEE tuple, evaluate the nested query,
which retrieves the Essn values for all DEPENDENT tuples with the same sex and
name as that EMPLOYEE tuple; if the Ssn value of the EMPLOYEE tuple is in the
result of the nested query, then select that EMPLOYEE tuple.
In general, a query written with nested select-from-where blocks and using the = or IN
comparison operators can always be expressed as a single block query. For example,
Q16 may be written as in Q16A:
Q16A: SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E, DEPENDENT AS D
WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name;
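Q16 and Q16A can be compared side by side in a minimal sketch, assuming Python's built-in sqlite3 module; the tables keep only the attributes the queries use and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT, Sex TEXT);
CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT, Sex TEXT);
INSERT INTO EMPLOYEE VALUES ('Sam', 'Reed', '1', 'M'), ('Alex', 'Cole', '2', 'F');
INSERT INTO DEPENDENT VALUES
    ('1', 'Sam', 'M'),    -- same name and sex as employee 1: a match
    ('1', 'Tina', 'F'),
    ('2', 'Alex', 'M');   -- same name but different sex: no match
""")

# Q16: correlated nested query, evaluated once per EMPLOYEE tuple.
q16 = conn.execute("""
    SELECT E.Fname, E.Lname
    FROM EMPLOYEE AS E
    WHERE E.Ssn IN ( SELECT D.Essn
                     FROM DEPENDENT AS D
                     WHERE E.Fname = D.Dependent_name AND E.Sex = D.Sex )
""").fetchall()

# Q16A: the same request as a single-block join.
q16a = conn.execute("""
    SELECT E.Fname, E.Lname
    FROM EMPLOYEE AS E, DEPENDENT AS D
    WHERE E.Ssn = D.Essn AND E.Sex = D.Sex
      AND E.Fname = D.Dependent_name
""").fetchall()

print(q16, q16a)
```

The two forms agree on this data. If an employee had several matching dependents, the join form Q16A would list that employee once per match, while Q16 would list the employee once.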

• Whenever a condition in the WHERE clause of a nested query references
some attribute of a relation declared in the outer query, the two queries are
said to be correlated.
• We can understand a correlated query better by considering that the nested
query is evaluated once for each tuple (or combination of tuples) in the outer
query.
• For example, we can think of Q16 as follows:
For each EMPLOYEE tuple, evaluate the nested query, which retrieves the ESSN
values for all DEPENDENT tuples with the same sex and name as that
EMPLOYEE tuple; if the SSN value of the EMPLOYEE tuple is in the result of
the nested query, then select that EMPLOYEE tuple.
• In general, a query written with nested select-from-where blocks and using
the = or IN comparison operators can always be expressed as a single block
query. For example, Q16 may be written as in Q16A:
Q16A: SELECT E.FNAME, E.LNAME
FROM EMPLOYEE AS E, DEPENDENT AS D
WHERE E.SSN=D.ESSN AND E.SEX=D.SEX AND
E.FNAME=D.DEPENDENT_NAME;

A correlated subquery is a subquery that uses values from the outer query.
For contrast, first consider a subquery that does not depend on the outer query.
The following query finds employees whose salary is greater than the average
salary of all employees:
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary >
(SELECT AVG(salary) FROM employees);

In this example, the subquery is used in the WHERE clause.

First, you can execute the subquery that returns the average salary of all employees
independently.

SELECT AVG(salary) FROM employees;

Second, the database system needs to evaluate the subquery only once.

Third, the outer query makes use of the result returned from the subquery. The outer query
depends on the subquery for its value. However, the subquery does not depend on the outer
query. Such a subquery is sometimes called a plain subquery.

A correlated subquery, in contrast, uses values from the outer query, so it cannot
be executed independently of the outer query.

Syntax:
SELECT column1, column2, ...
FROM table_name T1
WHERE column IN | NOT IN
(SELECT column1 FROM table_name T2
WHERE T1.column = T2.column)

For example, the following query lists each employee whose department_id appears in
the departments table:
SELECT employee_id, first_name FROM employees e
WHERE department_id IN
(SELECT department_id FROM departments d
WHERE d.department_id = e.department_id);

The EXISTS and UNIQUE Functions in SQL


The EXISTS function in SQL is used to check whether the result of a correlated
nested query is empty (contains no tuples) or not. The result of EXISTS is a Boolean
value TRUE if the nested query result contains at least one tuple, or FALSE if the
nested query result contains no tuples. We illustrate the use of EXISTS—and NOT
EXISTS—with some examples. First, we formulate Query 16 in an alternative form
that uses EXISTS as in Q16B:
Q16B: SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE EXISTS ( SELECT *
FROM DEPENDENT AS D
WHERE E.Ssn= D.Essn AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name);

EXISTS and NOT EXISTS are typically used in conjunction with a correlated
nested query. In Q16B, the nested query references the Ssn, Fname, and Sex attributes
of the EMPLOYEE relation from the outer query. We can think of Q16B as follows:
For each EMPLOYEE tuple, evaluate the nested query, which retrieves all
DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the
EMPLOYEE tuple; if at least one tuple EXISTS in the result of the nested query, then
select that EMPLOYEE tuple. In general, EXISTS(Q) returns TRUE if there is at
least one tuple in the result of the nested query Q, and it returns FALSE otherwise.
On the other hand, NOT EXISTS(Q) returns TRUE if there are no tuples in the
result of nested query Q, and it returns FALSE otherwise. Next, we illustrate the use
of NOT EXISTS.
Query 6. Retrieve the names of employees who have no dependents.
Q6: SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn );
In Q6, the correlated nested query retrieves all DEPENDENT tuples related to a
particular EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected
because the WHERE-clause condition will evaluate to TRUE in this case. We can
explain Q6 as follows: For each EMPLOYEE tuple, the correlated nested query
selects all DEPENDENT tuples whose Essn value matches the EMPLOYEE Ssn;
if the result is empty, no dependents are related to the employee, so we select that
EMPLOYEE tuple and retrieve its Fname and Lname.
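Q6 and its EXISTS counterpart can be sketched as follows, assuming Python's built-in sqlite3 module; the tables are reduced to the attributes used and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT);
CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT);
INSERT INTO EMPLOYEE VALUES ('Sam', 'Reed', '1'), ('Alex', 'Cole', '2');
INSERT INTO DEPENDENT VALUES ('1', 'Tina');
""")

# Q6: employees for whom the correlated nested query returns no tuples.
no_dependents = conn.execute("""
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS ( SELECT *
                       FROM DEPENDENT
                       WHERE Ssn = Essn )
""").fetchall()

# The complementary query: at least one dependent exists.
has_dependents = conn.execute("""
    SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE EXISTS ( SELECT *
                   FROM DEPENDENT
                   WHERE Ssn = Essn )
""").fetchall()

print(no_dependents, has_dependents)
```

Every employee falls into exactly one of the two results, because EXISTS never evaluates to UNKNOWN.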
Query 7. List the names of managers who have at least one dependent.
Q7: SELECT Fname, Lname
FROM EMPLOYEE
WHERE EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn )
AND
EXISTS ( SELECT *
FROM DEPARTMENT
WHERE Ssn=Mgr_ssn );
One way to write this query is shown in Q7, where we specify two nested correlated
queries; the first selects all DEPENDENT tuples related to an EMPLOYEE, and the
second selects all DEPARTMENT tuples managed by the EMPLOYEE. If at least one
of the first and at least one of the second exists, we select the EMPLOYEE tuple. Can
you rewrite this query using only a single nested query or no nested queries?
The query Q3: Retrieve the name of each employee who works on all the projects
controlled by department number 5 can be written using EXISTS and NOT EXISTS in
SQL systems. We show two ways of specifying this query Q3 in SQL as Q3A and
Q3B. This is an example of certain types of queries that require universal quantification,
as we will discuss in Section 6.6.7. One way to write this query is to use the construct
(S2 EXCEPT S1) as explained next, and checking whether the result is empty. This
option is shown as Q3A.
Q3A: SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5)
EXCEPT ( SELECT Pno
FROM WORKS_ON
WHERE Ssn=Essn) );

In Q3A, the first subquery (which is not correlated with the outer query) selects all
projects controlled by department 5, and the second subquery (which is correlated)
selects all projects that the particular employee being considered works on. If the set
difference of the first subquery result MINUS (EXCEPT) the second subquery result
is empty, it means that the employee works on all the projects and is therefore selected.
The second option is shown as Q3B. Notice that we need two-level nesting in Q3B
and that this formulation is quite a bit more complex than Q3A, which uses NOT
EXISTS and EXCEPT.
Q3B: SELECT Lname, Fname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM WORKS_ON B
WHERE ( B.Pno IN ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5 )
AND
NOT EXISTS ( SELECT *
FROM WORKS_ON C
WHERE C.Essn=Ssn
AND C.Pno=B.Pno )));
In Q3B, the outer nested query selects any WORKS_ON (B) tuples whose Pno is of
a project controlled by department 5, if there is not a WORKS_ON (C) tuple with
the same Pno and the same Ssn as that of the EMPLOYEE tuple under
consideration in the outer query. If no such tuple exists, we select the EMPLOYEE
tuple. The form of Q3B matches the following rephrasing of Query 3: Select each
employee such that there does not exist a project controlled by department 5 that the
employee does not work on. It corresponds to the way we will write this query in tuple
relational calculus (see Section 6.6.7).
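The double NOT EXISTS of Q3B can be traced on a tiny data set with the sketch below. It assumes Python's built-in sqlite3 module, tables reduced to the attributes used, and made-up rows in which department 5 controls projects 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER, Dnum INTEGER);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
INSERT INTO EMPLOYEE VALUES ('Sam', 'Reed', '1'), ('Alex', 'Cole', '2');
INSERT INTO PROJECT VALUES (1, 5), (2, 5), (3, 4);
INSERT INTO WORKS_ON VALUES ('1', 1), ('1', 2),   -- Reed: both dept-5 projects
                            ('2', 1), ('2', 3);   -- Cole: misses project 2
""")

q3b = conn.execute("""
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE NOT EXISTS ( SELECT *
                       FROM WORKS_ON B
                       WHERE B.Pno IN ( SELECT Pnumber
                                        FROM PROJECT
                                        WHERE Dnum = 5 )
                         AND NOT EXISTS ( SELECT *
                                          FROM WORKS_ON C
                                          WHERE C.Essn = Ssn
                                            AND C.Pno = B.Pno ) )
""").fetchall()

print(q3b)
```

For Cole, the WORKS_ON row for project 2 has no counterpart with Cole's Ssn, so the inner NOT EXISTS is TRUE for that row and the outer NOT EXISTS fails. Note that this formulation only sees department 5 projects that appear in WORKS_ON at least once.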
There is another SQL function, UNIQUE(Q), which returns TRUE if there are no
duplicate tuples in the result of query Q; otherwise, it returns FALSE. This can be
used to test whether the result of a nested query is a set or a multiset.

EXPLICIT SETS AND RENAMING OF ATTRIBUTES IN SQL


We have seen several queries with a nested query in the WHERE clause. It is also
possible to use an explicit set of values in the WHERE clause, rather than a nested
query. Such a set is enclosed in parentheses in SQL.
Query 17. Retrieve the Social Security numbers of all employees who work on project
numbers 1, 2, or 3.
Q17: SELECT DISTINCT Essn
FROM WORKS_ON
WHERE Pno IN (1, 2, 3);
In SQL, it is possible to rename any attribute that appears in the result of a query by
adding the qualifier AS followed by the desired new name. Hence, the AS construct
can be used to alias both attribute and relation names, and it can be used in both the
SELECT and FROM clauses. For example, Q8A shows how query Q8 from Section
4.3.2 can be slightly changed to retrieve the last name of each employee and his or her
supervisor, while renaming the resulting attribute names as Employee_name and
Supervisor_name. The new names will appear as column headers in the query result.

Q8A: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
JOINED TABLES IN SQL AND OUTER JOINS
The concept of a joined table (or joined relation) was incorporated into SQL to
permit users to specify a table resulting from a join operation in the FROM clause of a
query. This construct may be easier to comprehend than mixing together all the select
and join conditions in the WHERE clause. For example, consider query Q1, which
retrieves the name and address of every employee who works for the ‘Research’
department. It may be easier to specify the join of the EMPLOYEE and
DEPARTMENT relations first, and then to select the desired tuples and attributes.
This can be written in SQL as in Q1A:

Q1A: SELECT Fname, Lname, Address
FROM (EMPLOYEE JOIN DEPARTMENT ON
Dno=Dnumber)
WHERE Dname='Research';
The FROM clause in Q1A contains a single joined table. The attributes of such a table
are all the attributes of the first table, EMPLOYEE, followed by all the attributes of
the second table, DEPARTMENT. The concept of a joined table also allows the user to
specify different types of join, such as NATURAL JOIN and various types of
OUTER JOIN. In a NATURAL JOIN on two relations R and S, no join condition is
specified; an implicit EQUIJOIN condition for each pair of attributes with the same
name from R and S is created. Each such pair of attributes is included only once in
the resulting relation (see Section 6.3.2 and 6.4.4 for more details on the various
types of join operations in relational algebra).
If the names of the join attributes are not the same in the base relations, it is possible
to rename the attributes so that they match, and then to apply NATURAL JOIN. In this
case, the AS construct can be used to rename a relation and all its attributes in the
FROM clause. This is illustrated in Q1B, where the DEPARTMENT relation is
renamed as DEPT and its attributes are renamed as Dname, Dno (to match the name
of the desired join attribute Dno in the EMPLOYEE table), Mssn, and Msdate.
The implied join condition for this NATURAL JOIN is
EMPLOYEE.Dno=DEPT.Dno, because this is the only pair of attributes with the
same name after renaming:
Q1B: SELECT Fname, Lname, Address
FROM (EMPLOYEE NATURAL JOIN
(DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
WHERE Dname='Research';
The default type of join in a joined table is called an inner join, where a tuple is
included in the result only if a matching tuple exists in the other relation. For example,
in query Q8A, only employees who have a supervisor are included in the result; an
EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the user
requires that all employees be included, an OUTER JOIN must be used explicitly (see
Section 6.4.4 for the definition of OUTER JOIN). In SQL, this is handled by
explicitly specifying the keyword OUTER JOIN in a joined table, as illustrated in
Q8B:

Q8B: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM (EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S
ON E.Super_ssn=S.Ssn);
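To see the difference between the inner and the outer join concretely, the sketch below runs both forms of the supervisor query through Python's built-in sqlite3 module. The three EMPLOYEE rows and their Ssn values are invented for illustration; only 'Borg' has a NULL Super_ssn.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Super_ssn TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('1', 'Borg', NULL),
        ('2', 'Wong', '1'),
        ('3', 'Smith', '2');
""")

# Inner join: an employee without a supervisor finds no match and is dropped.
inner_rows = conn.execute("""
    SELECT E.Lname, S.Lname
    FROM EMPLOYEE AS E JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()

# Left outer join: every employee appears; a missing supervisor becomes NULL.
outer_rows = conn.execute("""
    SELECT E.Lname, S.Lname
    FROM EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()

print(inner_rows)  # two rows, 'Borg' is missing
print(outer_rows)  # three rows, 'Borg' is paired with None (NULL)
```

The only difference between the two results is the extra row for 'Borg', whose supervisor column is padded with NULL.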
There are a variety of outer join operations, which we shall discuss in more detail in
Section 6.4.4. In SQL, the options available for specifying joined tables include
INNER JOIN (only pairs of tuples that match the join condition are retrieved, same
as JOIN), LEFT OUTER JOIN (every tuple in the left table must appear in the result;
if it does not have a matching tuple, it is padded with NULL values for the attributes of
the right table), RIGHT OUTER JOIN (every tuple in the right table must appear in
the result; if it does not have a matching tuple, it is padded with NULL values for the
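attributes of the left table), and FULL OUTER JOIN (every tuple in both tables must
appear in the result, padded with NULL values where no match exists). In the last three options, the
keyword OUTER may be omitted. If the join attributes have the same name, one can also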
specify the natural join variation of outer joins by using the keyword NATURAL
before the operation (for example, NATURAL LEFT OUTER JOIN). The
keyword CROSS JOIN is used to specify the CARTESIAN PRODUCT
operation (see Section 6.2.2), although this should be used only with the utmost care
because it generates all possible tuple combinations.
It is also possible to nest join specifications; that is, one of the tables in a join may
itself be a joined table. This allows the specification of the join of three or more
tables as a single joined table, which is called a multiway join. For example, Q2A is a
different way of specifying query Q2 from Section 4.3.1 using the concept of a
joined table:
Q2A: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM ((PROJECT JOIN DEPARTMENT ON Dnum=Dnumber)
JOIN EMPLOYEE ON Mgr_ssn=Ssn)
WHERE Plocation='Stafford';
Not all SQL implementations supported the joined-table syntax when it was introduced.
Some systems instead used a vendor-specific notation inside the join condition to
request an outer join. Oracle, for example, uses the marker (+), which is placed after
the attribute of the table whose tuples are optional, that is, the table whose
attributes are padded with NULL values when no matching tuple exists.
To specify the left outer join in Q8B using this syntax, we could write the query Q8C
as follows:
Q8C: SELECT E.Lname, S.Lname
FROM EMPLOYEE E, EMPLOYEE S
WHERE E.Super_ssn = S.Ssn (+);
AGGREGATE FUNCTIONS IN SQL
In Section 6.4.2, we will introduce the concept of an aggregate function as a relational
algebra operation. Aggregate functions are used to summarize information from
multiple tuples into a single-tuple summary. Grouping is used to create subgroups
of tuples before summarization. Grouping and aggregation are required in many
database applications, and we will introduce their use in SQL through examples. A
number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and
AVG. The COUNT function returns the number of tuples or values as specified in
a query. The functions SUM, MAX, MIN, and AVG can be applied to a set or
multiset of numeric values and return, respectively, the sum, maximum value, minimum
value, and average (mean) of those values. These functions can be used in the
SELECT clause or in a HAVING clause (which we introduce later). The functions
MAX and MIN can also be used with attributes that have nonnumeric domains if
the domain values have a total ordering among one another. We illustrate the use of
these functions with sample queries.
Query 19. Find the sum of the salaries of all employees, the maximum
salary, the minimum salary, and the average salary.
Q19: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;
If we want to get the preceding function values for employees of a specific department
(say, the ‘Research’ department), we can write Query 20, where the
EMPLOYEE tuples are restricted by the WHERE clause to those employees who
work for the ‘Research’ department.
Query 20. Find the sum of the salaries of all employees of the ‘Research’
department, as well as the maximum salary, the minimum salary, and the average
salary in this department.

Q20: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname='Research';
Queries 21 and 22. Retrieve the total number of employees in the
company (Q21) and the number of employees in the ‘Research’
department (Q22).

Q21: SELECT COUNT (*)
FROM EMPLOYEE;
Q22: SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE Dno=Dnumber AND Dname='Research';

Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of
rows in the result of the query. We may also use the COUNT function to count values
in a column rather than tuples, as in the next example.
Query 23. Count the number of distinct salary values in the database.
Q23: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;
If we write COUNT(Salary) instead of COUNT(DISTINCT Salary) in
Q23, then duplicate values will not be eliminated. However, any tuples with NULL
for Salary will not be counted. In general, NULL values are discarded when
aggregate functions are applied to a particular column (attribute).
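The three forms of COUNT can be compared side by side. The sketch below uses Python's built-in sqlite3 module with four invented EMPLOYEE rows (an assumption for illustration): two share the same salary and one has a NULL salary.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000), ('Wong', 40000), ('Zelaya', 30000), ('Borg', NULL);
""")

# COUNT(*) counts rows; COUNT(Salary) skips NULLs but keeps duplicates;
# COUNT(DISTINCT Salary) skips NULLs and duplicates; AVG also ignores NULLs.
count_rows, count_values, count_distinct, avg_salary = conn.execute("""
    SELECT COUNT(*), COUNT(Salary), COUNT(DISTINCT Salary), AVG(Salary)
    FROM EMPLOYEE
""").fetchone()

print(count_rows, count_values, count_distinct, avg_salary)
```

With this data the row count is 4, the count of salary values is 3, and the count of distinct salary values is 2. The average is taken over the three non-NULL salaries only, not over all four rows.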
The preceding examples summarize a whole relation (Q19, Q21, Q23) or a selected
subset of tuples (Q20, Q22), and hence all produce single tuples or single values.
They illustrate how functions are applied to retrieve a summary value or summary
tuple from the database. These functions can also be used in selection conditions
involving nested queries. We can specify a correlated nested query with an aggregate
function, and then use the nested query in the WHERE clause of an outer query. For
example, to retrieve the names of all employees who have two or more dependents
(Query 5), we can write the following:
Q5: SELECT Lname, Fname
FROM EMPLOYEE
WHERE ( SELECT COUNT (*)
FROM DEPENDENT
WHERE Ssn=Essn ) >= 2;

The correlated nested query counts the number of dependents that each employee has; if
this is greater than or equal to two, the employee tuple is selected.
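A small run of this query shows the correlation at work. The sketch uses Python's built-in sqlite3 module; the employees, their Ssn values, and the dependents are invented for illustration, and the tables carry only the attributes the query needs.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Fname TEXT, Lname TEXT);
    CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('1', 'John', 'Smith'), ('2', 'Franklin', 'Wong'), ('3', 'Joyce', 'English');
    INSERT INTO DEPENDENT VALUES ('1', 'Alice'), ('1', 'Michael'), ('2', 'Joy');
""")

# The nested query is re-evaluated for each EMPLOYEE tuple: Ssn refers to the
# outer tuple, Essn to the DEPENDENT tuples being counted.
per_employee = conn.execute("""
    SELECT Lname, (SELECT COUNT(*) FROM DEPENDENT WHERE Ssn = Essn)
    FROM EMPLOYEE
    ORDER BY Ssn
""").fetchall()

# The same nested query used as a selection condition, as in Q5.
two_or_more = conn.execute("""
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE (SELECT COUNT(*) FROM DEPENDENT WHERE Ssn = Essn) >= 2
""").fetchall()

print(per_employee)
print(two_or_more)
```

Only 'Smith' has two dependents in this data, so only that tuple satisfies the condition. An employee with no dependents gets a count of 0 rather than NULL.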

GROUPING: THE GROUP BY AND HAVING CLAUSES


In many cases we want to apply the aggregate functions to subgroups of tuples in a
relation, where the subgroups are based on some attribute values. For example, we may
want to find the average salary of employees in each department or the number of
employees who work on each project. In these cases we need to partition the relation
into nonoverlapping subsets (or groups) of tuples. Each group (partition) will
consist of the tuples that have the same value of some attribute(s), called the
grouping attribute(s). We can then apply the function to each such group
independently to produce summary information about each group. SQL has a GROUP
BY clause for this purpose. The GROUP BY clause specifies the grouping
attributes, which should also appear in the SELECT clause, so that the value
resulting from applying each aggregate function to a group of tuples appears along
with the value of the grouping attribute(s).
Query 24. For each department, retrieve the department number, the number of
employees in the department, and their average salary.
Q24: SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;

In Q24, the EMPLOYEE tuples are partitioned into groups—each group having
the same value for the grouping attribute Dno. Hence, each group contains the
employees who work in the same department. The COUNT and AVG functions are
applied to each such group of tuples. Notice that the SELECT clause includes only the
grouping attribute and the aggregate functions to be applied on each group of tuples.
Figure 5.1(a) illustrates how grouping works on Q24; it also shows the result of Q24.
Figure 5.1
Results of GROUP BY and HAVING. (a) Q24. (b) Q26.
If NULLs exist in the grouping attribute, then a separate group is created for all
tuples with a NULL value in the grouping attribute. For example, if the
EMPLOYEE table had some tuples that had NULL for the grouping attribute Dno,
there would be a separate group for those tuples in the result of Q24.
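The behavior of NULL in the grouping attribute can be checked with a short run of Q24. The sketch uses Python's built-in sqlite3 module; the five EMPLOYEE rows, including two with no department, are invented for illustration.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Zelaya', 25000, 4),
        ('Temp', 20000, NULL), ('Intern', 10000, NULL);
""")

# Q24: one result row per distinct Dno value; all NULLs form a single group.
rows = conn.execute("""
    SELECT Dno, COUNT(*), AVG(Salary)
    FROM EMPLOYEE
    GROUP BY Dno
""").fetchall()

# Index the result by department number; None stands for the NULL group.
groups = {dno: (count, average) for dno, count, average in rows}
print(groups)
```

The result has three groups: department 5 with two employees, department 4 with one, and a third group (keyed by None in Python) holding both employees whose Dno is NULL.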
Query 25. For each project, retrieve the project number, the project name, and the
number of employees who work on that project.
Q25: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname;

Q25 shows how we can use a join condition in conjunction with GROUP BY. In
this case, the grouping and functions are applied after the joining of the two
relations. Sometimes we want to retrieve the values of these functions only for groups
that satisfy certain conditions. For example, suppose that we want to modify Query
25 so that only projects with more than two employees appear in the result. SQL
provides a HAVING clause, which can appear in conjunction with a GROUP BY
clause, for this purpose. HAVING provides a condition on the summary information
regarding the group of tuples associated with each value of the grouping attributes.
Only the groups that satisfy the condition are retrieved in the result of the query. This is
illustrated by Query 26.
Query 26. For each project on which more than two employees work, retrieve the project number, the
project name, and the number of employees who work on the project.
Q26: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
Notice that while selection conditions in the WHERE clause limit the tuples to which functions are applied,
the HAVING clause serves to choose whole groups. Figure 5.1(b) illustrates the use of HAVING and
displays the result of Q26.
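Running Q25 and Q26 on the same data makes the effect of HAVING visible. The sketch below uses Python's built-in sqlite3 module; the three projects and the WORKS_ON rows are invented for illustration and are not the contents of Figure 5.1.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (Pnumber INTEGER, Pname TEXT);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
    INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY'), (3, 'ProductZ');
    INSERT INTO WORKS_ON VALUES
        ('1', 1), ('2', 1),
        ('1', 2), ('2', 2), ('3', 2),
        ('3', 3);
""")

# Q25: join first, then group and count the employees on every project.
all_groups = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    ORDER BY Pnumber
""").fetchall()

# Q26: the same groups, but HAVING keeps only those with more than two members.
big_groups = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    HAVING COUNT(*) > 2
    ORDER BY Pnumber
""").fetchall()

print(all_groups)
print(big_groups)
```

Q25 returns one row per project, while Q26 discards the groups for 'ProductX' and 'ProductZ' as a whole because they have two and one members respectively.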
Query 27. For each project, retrieve the project number, the project name, and the number of employees
from department 5 who work on the project.
Q27: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Ssn=Essn AND Dno=5
GROUP BY Pnumber, Pname;
Here we restrict the tuples in the relation (and hence the tuples in each group) to those that satisfy the
condition specified in the WHERE clause—namely, that they work in department number 5. Notice that
we must be extra careful when two different conditions apply (one to the aggregate function in the
SELECT clause and another to the function in the HAVING clause). For example, suppose that we want
to count the total number of employees whose salaries exceed $40,000 in each department, but only for
departments where more than five employees work. Here, the condition (SALARY > 40000) applies only to
the COUNT function in the SELECT clause. Suppose that we write the following incorrect query:
SELECT Dname, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000
GROUP BY Dname
HAVING COUNT (*) > 5;

This is incorrect because it will select only departments that have more than five employees who each
earn more than $40,000. The rule is that the WHERE clause is executed first, to select individual tuples
or joined tuples; the HAVING clause is applied later, to select individual groups of tuples. Hence, the
tuples are already restricted to employees who earn more than $40,000 before the function in the
HAVING clause is applied. One way to write this query correctly is to use a nested query, as shown in
Query 28.
Query 28. For each department that has more than five employees, retrieve the department number
and the number of its employees who are making more than $40,000.
Q28: SELECT Dnumber, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000 AND
Dno IN ( SELECT Dno
FROM EMPLOYEE
GROUP BY Dno
HAVING COUNT (*) > 5)
GROUP BY Dnumber;
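The difference between the two formulations shows up as soon as a department has more than five employees but fewer than six high earners. The sketch below uses Python's built-in sqlite3 module with invented data: department 5 has six employees, two of whom earn more than $40,000, and department 4 has two employees. In this sketch the nested query is attached with Dno IN (...) and the outer query is grouped by Dnumber, which is one way to make the nested-query form complete.

```python
import sqlite3

# In-memory database; departments, Ssn values and salaries are invented
# sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER);
    CREATE TABLE EMPLOYEE (Ssn INTEGER, Salary INTEGER, Dno INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES
        (1, 30000, 5), (2, 38000, 5), (3, 25000, 5),
        (4, 45000, 5), (5, 50000, 5), (6, 32000, 5),
        (7, 43000, 4), (8, 55000, 4);
""")

# Incorrect: WHERE removes the low earners first, so HAVING counts only
# the high earners and no department reaches six of them.
incorrect = conn.execute("""
    SELECT Dname, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE Dnumber = Dno AND Salary > 40000
    GROUP BY Dname
    HAVING COUNT(*) > 5
""").fetchall()

# Correct: the nested query picks departments by their total head count,
# and the outer query counts only the high earners within them.
correct = conn.execute("""
    SELECT Dnumber, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE Dnumber = Dno AND Salary > 40000 AND
          Dno IN (SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 5)
    GROUP BY Dnumber
""").fetchall()

print(incorrect)
print(correct)
```

The first query returns nothing for this data, whereas the second reports department 5 with its two high earners. Department 4 is excluded in both cases because it has only two employees.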

DISCUSSION AND SUMMARY OF SQL QUERIES


A retrieval query in SQL can consist of up to six clauses, but only the first two, SELECT and
FROM, are mandatory. The query can span several lines, and is ended by a semicolon. Query terms are
separated by spaces, and parentheses can be used to group relevant parts of a query in the standard way. The
clauses are specified in the following order, with the clauses between square brackets [ ... ] being optional:
SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause specifies all
relations (tables) needed in the query, including joined relations, but not those in nested queries. The
WHERE clause specifies the conditions for selecting the tuples from these relations, including join
conditions if needed. GROUP BY specifies grouping attributes, whereas HAVING specifies a
condition on the groups being selected rather than on the individual tuples. The built-in aggregate functions
COUNT, SUM, MIN, MAX, and AVG are used in conjunction with grouping, but they can also be
applied to all the selected tuples in a query without a GROUP BY clause. Finally, ORDER BY specifies
an order for displaying the result of a query.
In order to formulate queries correctly, it is useful to consider the steps that define the meaning or
semantics of each query. A query is evaluated conceptually by first applying the FROM clause (to identify
all tables involved in the query or to materialize any joined tables), followed by the WHERE clause to
select and join tuples, and then by GROUP BY and HAVING. Conceptually, ORDER BY is applied at
the end to sort the query result. If none of the last three clauses (GROUP BY, HAVING, and ORDER
BY) is specified, we can think conceptually of a query as being executed as follows: For each
combination of tuples (one from each of the relations specified in the FROM clause), evaluate the
WHERE clause; if it evaluates to TRUE, place the values of the attributes specified in the SELECT
clause from this tuple combination in the result of the query. Of course, this is not an efficient way to
implement the query in a real system, and each DBMS has special query optimization routines to decide
on an execution plan that is efficient to execute. We discuss query processing and optimization in
Chapter 19.
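The conceptual evaluation described above can be written out directly. The sketch below is plain Python rather than SQL: relations are lists of dictionaries, and the function name conceptual_select, its parameters, and the sample rows are all invented for illustration. It is meant only to mirror the FROM, WHERE, SELECT order of evaluation, not to show how a DBMS actually executes a query.

```python
from itertools import product

# Two tiny relations as lists of tuples (dictionaries); invented sample data.
EMPLOYEE = [
    {"Lname": "Smith", "Dno": 5},
    {"Lname": "Wong", "Dno": 5},
    {"Lname": "Zelaya", "Dno": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Administration", "Dnumber": 4},
]


def conceptual_select(relations, where, select):
    """Evaluate SELECT ... FROM ... WHERE in the naive, conceptual order."""
    result = []
    # FROM: form every combination of one tuple from each relation.
    for combination in product(*relations):
        # Merge the chosen tuples into one row (attribute names assumed unique).
        row = {}
        for tup in combination:
            row.update(tup)
        # WHERE: keep the combination only if the condition evaluates to true.
        if where(row):
            # SELECT: project the requested attributes into the result.
            result.append(tuple(row[attr] for attr in select))
    return result


# SELECT Lname FROM EMPLOYEE, DEPARTMENT
# WHERE Dno = Dnumber AND Dname = 'Research'
names = conceptual_select(
    [EMPLOYEE, DEPARTMENT],
    where=lambda r: r["Dno"] == r["Dnumber"] and r["Dname"] == "Research",
    select=["Lname"],
)
print(names)
```

The loop visits all six tuple combinations even though only two qualify, which is exactly why a real system relies on query optimization instead of this literal procedure.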
In general, there are numerous ways to specify the same query in SQL. This flexibility in specifying
queries has advantages and disadvantages. The main advantage is that users can choose the technique
with which they are most comfortable when specifying a query. For example, many queries may be
specified with join conditions in the WHERE clause, or by using joined relations in the FROM clause,
or with some form of nested queries and the IN comparison operator. Some users may be more
comfortable with one approach, whereas others may be more comfortable with another. From the
programmer’s and the system’s point of view regarding query optimization, it is generally preferable to
write a query with as little nesting and implied ordering as possible.
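As an illustration of this flexibility, the sketch below states one query in the three ways just mentioned and checks that the answers agree. It uses Python's built-in sqlite3 module; the tables are cut down to the attributes needed and the rows are invented for illustration.

```python
import sqlite3

# In-memory database; the rows are invented sample data (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Dno INTEGER);
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('John', 'Smith', 5), ('Franklin', 'Wong', 5), ('Alicia', 'Zelaya', 4);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
""")

# Three ways to ask for the employees of the 'Research' department.
formulations = {
    "join condition in WHERE": """
        SELECT Fname, Lname FROM EMPLOYEE, DEPARTMENT
        WHERE Dno = Dnumber AND Dname = 'Research'
        ORDER BY Lname""",
    "joined table in FROM": """
        SELECT Fname, Lname
        FROM EMPLOYEE JOIN DEPARTMENT ON Dno = Dnumber
        WHERE Dname = 'Research'
        ORDER BY Lname""",
    "nested query with IN": """
        SELECT Fname, Lname FROM EMPLOYEE
        WHERE Dno IN (SELECT Dnumber FROM DEPARTMENT WHERE Dname = 'Research')
        ORDER BY Lname""",
}

results = {label: conn.execute(sql).fetchall() for label, sql in formulations.items()}
for label, rows in results.items():
    print(label, rows)
```

All three formulations return the same two employees here. Whether they are also executed with the same efficiency depends on the particular DBMS.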
The disadvantage of having numerous ways of specifying the same query is that this may confuse the user,
who may not know which technique to use to specify particular types of queries. Another problem is that
it may be more efficient to execute a query specified in one way than the same query specified in an
alternative way. Ideally, this should not be the case: The DBMS should process the same query in the same
way regardless of how the query is specified. But this is quite difficult in practice, since each DBMS has
different methods for processing queries specified in different ways. Thus, an additional burden on the
user is to determine which of the alternative specifications is the most efficient to execute. Ideally, the
user should worry only about specifying the query correctly, whereas the DBMS would determine
how to execute the query efficiently. In practice, however, it helps if the user is aware of which types of
constructs in a query are more expensive to process than others.
