
2021-2022

Database
From the lectures
of Dr. Ivanne Khoury

Tronc Commun

SEM 4
The Student Committee is a group of students dedicated
to working for the student body. As the book committee,
our job is to provide students with all the materials
they need in a specific semester, from the reference
books to previous years' exams with their solutions,
and most importantly the course notes of each course,
whether they are slides, notes handwritten by the
teacher of the course, or typed notes based on last
year's lectures. Every single part of the package is
reviewed yearly to add the latest exercises and their
solutions, the latest notions, and tips and tricks to ace
the course. And of course, all of this is done in
collaboration with the doctors giving the lectures.

For any questions regarding the contents of the package,
don't hesitate to contact your class delegate.
Table of Contents
Chapter 1 ..................................... 1
Chapter 2 ..................................... 15
Chapter 3 ..................................... 27
Chapter 4 ..................................... 33
Chapter 5 ..................................... 45
Chapter 6 (version 1) ......................... 55
Chapter 6 (version 2) ......................... 63
Chapter 7 ..................................... 71
Chapter 8 ..................................... 89
Chapter 9 ..................................... 97
Chapter 10 .................................... 105
CHAPTER 1
1. Relational databases
This chapter presents the basic ideas and motivations which lie behind the concept of relational
database.
A relational database is defined as a collection of tables connected via relations. It is always a good
idea to have these tables organized in a structured way, called normal form.

1.1. Database in Normal Form


The easiest form of database, which can be handled even by Microsoft Excel, is a single table. To be
a database in normal form, the table must satisfy some requirements:
1. The first line contains the headers of the columns, which unambiguously define the content of the
column. For example:

2. Each column contains only what is indicated in its header. For example, in a column with the header
“Telephone number” we may not put two numbers or an indication of the preferred calling time, such
as in the second row of this table:
Student number Name Surname

3. Each row refers to a single object. For example, there may not be a row with information on
several objects or on a group of objects, such as in the second row of this table:
Name Surname Degree course

4. Rows are independent, i.e. no cell has references to other rows, such as in the second row of this
table:

5. Rows and columns are unordered, i.e. their order is not important. For example, these four tables
are the same one:

6. Cells do not contain values which can be directly calculated from cells of the same row, such as in
the last column of this table:

Database rows are called records and database columns are called fields.
Single table databases can be easily handled by many programs and by human beings, even when
the table is very long or has many fields. There are, however, situations in which a single table is not
an efficient way to handle the information.

1.1.1. Primary key


Each table should have a primary key, which means a field whose value is different for every record.
Often the primary key has a natural candidate, for example the student number for a students'
table, the tax code for a citizens table, or the telephone number for a telephones table. Other times a good
primary key candidate is difficult to find: for example, in a cars' table the car name is not a primary
key, since there are different series and different motor types of the same car. In these cases it is
possible to add an extra field, called ID or surrogate key, with a progressive number, to be used as the
primary key. In many database programs this progressive number is handled directly by the program
itself.
It is also possible to define several fields together as the primary key: for example, in a people table
the first name, the last name, and the place and date of birth together form a unique combination for
every person. In this case the primary key is also called a composite key or compound key. On some
database management programs, however, handling a composite key can create problems, and it is
therefore a better idea to use an ID in this case.
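
As an illustrative sketch only (the table and column names are invented, not taken from these notes), this is how a surrogate key with a progressive number could be declared in SQL Server, the system used in the later chapters:

CREATE TABLE Cars
(
    -- Surrogate key: IDENTITY makes the database itself generate a
    -- progressive number for every new record.
    CarID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name   nvarchar(50) NOT NULL,
    Series nvarchar(20) NULL,
    Motor  nvarchar(20) NULL
);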

1.2. Relations
1.2.1. Information redundancy
In some situations trying to put the information we need in a single table database causes a
duplication of identical data which can be called information redundancy. For example, if we add to
our students’ table the information on who is the reference secretary for each student, together
with other secretary’s information such as office telephone number, office room and timetables, we
get this table:

Information redundancy is not a problem by itself, but:
 storing the same information several times is a waste of computer space (hard disk and memory),
which, for a very large table, has a bad impact on the size of the file and on the speed of every search
or sorting operation;
 whenever we need to update a repeated piece of information (e.g. the secretary changes office), we
need to do a lot of changes;
 manually inserting the same information several times can lead to typing (or copying and pasting)
mistakes, which decrease the quality of the database.

In order to avoid this situation, it is a common procedure to split the table into two distinct tables,
one for the students and another one for the secretaries. To each secretary we assign a unique code
and to each student we indicate the secretary’s code.

In this way the information on each secretary is written and stored only once and can be updated
very easily. The price for this is that every time we need to know who is a student's secretary we have to
look at the student's secretary code and find the corresponding code in the Secretaries table: this can be
a long and frustrating procedure for a human being when the Secretaries table has many records, but it
is a very fast task for a computer program which is designed to quickly search through tables.

1.2.2. Empty fields
Another typical problem which arises with single table databases is the case of many empty fields.
For
example, if we want to build an address book with the telephone numbers of all the people, we will
have somebody with no telephone numbers, many people with a few telephone numbers, and some
people with a lot of telephone numbers. Moreover, we must also take into consideration that new
numbers will probably be added for anybody in the future.
If we reserve a field for every telephone, the table looks like this:

Clearly, if we reserve several fields for the telephone numbers, a lot of cells are empty. The
problems of empty cells are:
 an empty cell is a waste of computer space;
 there is a fixed limit on the number of fields which may be used. If a record needs another field (for
example, Elena Burger gets another telephone number) the entire structure of the table must be changed;
 since all these fields contain the same type of information, it is difficult to search whether a piece of
information is present, since it must be looked for in every field, including the cells which are empty.
In order to avoid this situation, we again split the table into two distinct tables, one for the people
and
another one for their telephone numbers. This time, however, we assign a unique code to each
person and we build the second table with combinations of person‐telephone.

Even though it seems strange, each person's code appears several times in the Telephones table.
This is correct, since the Telephones table uses exactly the number of records needed to avoid having
empty cells: people appear as many times as they have telephones, and people with no telephone do not
appear at all. The drawback is that every time we want to know someone's telephone numbers we have
to go through the entire Telephones table searching for that person's code, but again this procedure is
very fast for an appropriate computer program.

1.2.3. Foreign key


When a field which is not the primary key is used in a relation with another table, this field is called a
foreign key. This field is important for the database management program, such as Access, when it
has to check referential integrity (see section 1.6).
For example, in the previous examples Owner is a foreign key for the Telephones table and Secretary is a
foreign key for the Students table.
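
As a sketch of how such a relation could be declared in SQL Server (the column names and sizes are assumptions made for the example, not definitions taken from these notes):

CREATE TABLE Secretaries
(
    SecretaryCode int          NOT NULL PRIMARY KEY,
    Name          nvarchar(50) NOT NULL,
    OfficeRoom    nvarchar(10) NULL,
    OfficePhone   nvarchar(30) NULL
);

CREATE TABLE Students
(
    StudentNumber int          NOT NULL PRIMARY KEY,
    Name          nvarchar(50) NOT NULL,
    Surname       nvarchar(50) NOT NULL,
    -- Secretary is a foreign key: it refers to the primary key of the
    -- Secretaries table, which lets the program check referential integrity.
    Secretary     int          NOT NULL REFERENCES Secretaries (SecretaryCode)
);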

1.3. One‐to‐many relation

A relation is a connection between a field of table A (which becomes a foreign key) and the primary
key of table B: on the B side the relation is “1”, meaning that for each record of table A there is one
and only one corresponding record of table B, while on the A side the relation is “many” (indicated
with the
mathematical symbol ∞) meaning that for each record of table B there can be none, one or more
corresponding records in table A.
For the example of section 1.2.1, the tables are indicated in this way, meaning that for each student
there is exactly one secretary and for each secretary there are many students. This relation is called
a many‐to‐one relation.

For the example of section 1.2.2, the tables are instead indicated in this way, meaning that for each
person there can be none, one or several telephone numbers and for each number there is only one
corresponding owner. This relation is called a one‐to‐many relation.

Clearly one‐to‐many and many‐to‐one are the same relation, the only difference being the order of
drawn tables.
It is however very important to correctly identify the “1” side, since it has several implications on the
correct working of the database. For example, in the previous example putting the “1” side on the
Telephones table means that for each person there is only one telephone and that for each
telephone
there are many people, a situation which was common up to the 1990s, when there was only one
telephone for a whole family, used by all its members, but which is not what we want to describe
with a current 21st-century database. Moreover, reversing the relation also requires a small change
to the structure of the tables, putting the foreign key Telephone in the People table instead of the
foreign key Person in the Telephones table, such as

1.4. One‐to‐one relation

A one‐to‐one relation is a direct connection between two primary keys. Each record of the first table
has
exactly one corresponding record in the second table and vice versa. An example can be countries
and
national flags. This relation can sometimes be useful to separate two conceptually different objects
with a lot of fields into two tables, but it should be avoided, since the two tables can be easily
joined together in a single table.

1.5. Many-to-many relation

1.5.1. Details table

1.6. Foreign Key with several relations

1.7. Referential Integrity

1.8. Temporal versus static database

1.9. Non‐relational structures

1.9.1. Hierarchical structure

1.9.2. Process

1.10. Entity‐relationship model

1.10.1. Enhanced entity‐relationship model

CHAPTER 2
SQL Server Programming Tutorial Database

1- The Tutorial
Throughout this tutorial we will create and use a database to demonstrate a
possible real-world application of the Microsoft SQL Server RDBMS. This database
will support a job booking system that permits the storage and processing of
engineer visits for "DisasterFix", a company that provides services to businesses
and homeowners who have experienced a gas, electricity or plumbing emergency
requiring urgent repair.
NB: DisasterFix is a fictitious company. Any resemblance to any real company is
completely coincidental.
Although a user interface will not be created, the entire data structure and some
business logic elements will be created within the new database. This could be
extended to provide a complete software package using Microsoft .NET
technologies or any other development environment that permits communication
with SQL Server databases.
The following sections describe the company requiring the database and the basic
requirements for their new software.

1-1 DisasterFix
DisasterFix are a fictitious company based in the United Kingdom. They provide
services to businesses and homeowners who have experienced an emergency
with a utility in their property. This includes gas, electricity, plumbing, heating,
etc. When an emergency occurs, DisasterFix sends a suitably skilled engineer to
examine the problem and perform any required repairs.
DisasterFix's customers pay for services in one of two ways. For new customers
with an immediate problem, jobs are charged individually. A set fee is applied
dependent upon the type of job to be undertaken. Customers may also purchase
an insurance policy from DisasterFix. This is an annual contract that allows any
number of jobs to be provided to an address with no additional charges.
DisasterFix does not employ its own engineers. Instead, any service-providing
company or individual may apply to become a DisasterFix partner. Following
suitable training, engineers are approved to provide services on behalf of
DisasterFix and will automatically be assigned work in their designated areas.
All approved engineers are provided with a stock of standard parts that they may
use for jobs assigned by DisasterFix. When a job is completed, the engineer sends
a report back to the DisasterFix head office indicating the parts used and the time
taken. The parts can then be replenished and the cost of the job calculated.

In addition to emergency responses, DisasterFix provides non-emergency work.
Such jobs are handled in exactly the same way as emergencies except that they
are always charged individually and are given a lower priority.

1-2 The Job Booking System


The new Job Booking System (JoBS) will replace the existing systems and improve
the efficiency of the DisasterFix operations team by meeting the following
requirements:

 The system will permit the creation of customers with multiple addresses.

 It will be possible to create contracts for annual renewal or for single jobs.
Each contract will be linked to a customer address.

 It will be necessary to hold standard pricing for emergency and non-emergency
work.

 The system will hold a list of approved engineers and the skills that they
have. Skills include "gas", "electricity", "plumbing", etc. Each engineer will be
assigned one or more geographical areas in which they operate.

 When a customer requires work to be undertaken, the system will generate
a job and automatically assign it to an appropriate engineer. The engineer
selection will be dependent upon the customer's address, the skill set and
workload of the engineer, the hourly rate charged by the engineer and the
number of complaints the engineer has received in the previous six months.

 If a customer complains about a job and an engineer is deemed to be at
fault, this will be recorded against the engineer.

 The system must integrate with the existing stock system that controls the
replenishment of the engineers' stock. This requires a daily export of used
parts to the external system.

 The system must integrate with the existing billing system that controls
invoicing and payments.

2- SQL Server Management Studio
The Microsoft SQL Server relational database management system executes on a
Windows-based computer as a group of Windows services. These background
processes run continuously and process any database activities that are required.
However, as these are services, they have no user interface that permits their
configuration.
SQL Server Management Studio (SSMS) is a tool that is distributed with SQL
Server. This powerful utility provides a graphical user interface that connects to
one or more local or remote SQL Server installations and allows the user to
perform configuration and design tasks. These include the creation of new
databases and their constituent parts, and the execution of queries and other SQL
statements.
NB: SSMS can be installed using the original SQL Server CD-ROM or DVD-ROM. If
you are using SQL Server Express Edition, SQL Server Management Studio Express
edition can be utilised instead. If this tool is not installed, you can download it
from the Microsoft SQL Server web site.
Using SSMS, databases can be created in two ways. Firstly, SSMS provides screens
that guide you through the process. Secondly, SSMS can execute commands to
create a database in a textual format using a query window. In this article, we will
use SSMS's graphical user interface tools to create a new database for the JoBS
project. We will look at the command-based approach later.
This article assumes that you have access to a SQL Server installation and have the
details and login credentials required to connect to it and create databases.

a. Starting SQL Server Management Studio


By default, SQL Server Management Studio is installed into a program group
named "Microsoft SQL Server 2005" in the Start menu. To start SSMS, simply click
the associated shortcut. After a few seconds, the "Connect to Server" dialog box is
displayed.

b. Connecting to a SQL Server Instance
The Connect to Server dialog box allows you to specify the details of the SQL
Server that you wish to connect to and the credentials that you wish to use to
authenticate with that server. The information that must be completed is as
follows:

 Server Type. SQL Server includes various options for server types, including
Database Engine for core database functionality, Analysis Services for
analytical data warehouses, Reporting Services for reporting purposes and
Integration Services for integration work. Other options may also be
available depending upon your particular workstation configuration.

 Server Name. This allows you to specify the name of a SQL Server to connect
to. Each SQL Server on your network will have one or more instances. An
instance is a named installation of the SQL Server product. If the server
instance you wish to connect to is the default instance, you simply provide
the server's name. If a specific, named instance is to be used, this is provided
in the format <server-name>\<instance-name>. For example, the figure
above shows connection to the VIRTUALXP server's default instance. An
alternative option may be VIRTUALXP\Instance2.

 Authentication. SQL Server's database engine provides two forms of
authentication. Windows authentication uses your current login name to
determine your level of access. SQL Server authentication requires that you
provide a user name and password.

To connect to SQL Server in readiness for creating the JoBS database, select the
"Database Engine" server type and provide the details of the server name, and
optional instance name, and your authentication details. Click the Connect button
to make the connection. Once connected, the SSMS main screen will be
populated.

c. The SQL Server Management Studio Main Screen


The SSMS main screen includes some standard elements, such as a menu bar and
toolbar at the top of the screen and a status bar at the bottom. These are used to
perform actions within the tool. The main area of the window is divided into two
sections. To the left is the Object Explorer. This displays an expandable tree
containing the hierarchy of objects from the SQL Server instance. To the right, the
Object Explorer Details section shows the contents or configuration of the
selected item in the Object Explorer.

3- System Databases
On starting SSMS, you will find that some databases already exist. These will be
separated into two distinct areas that are accessible by expanding the Databases
section of the Object Explorer tree. Within this branch of the tree is a further,
expandable node named "System Databases". Databases created by the system
appear here whilst user-created databases are displayed at the higher level.
All SQL Server instances include four system databases. These are named master,
model, msdb and tempdb.
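
As a quick check (a sketch, not part of the original text), the databases present on an instance, including the system databases, can be listed from a query window using the sys.databases catalog view:

SELECT name, database_id
FROM sys.databases;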

a. The "master" Database


The master database contains a series of system tables, stored procedures and
other database objects. This database is critical to the correct operation of the
entire SQL Server instance. It includes all security information, configuration
information for all databases and many other system settings. In some rare
situations it can be useful to query this database. However, in general your only
interaction with the master database should be ensuring that it is backed up.
Incorrect changes made to this database or corruption of its data can potentially
render the entire system unusable.
b. The "model" Database
The model database is a template for all databases that you create. When you add
a new database to a SQL Server instance, a copy of the model is made under the
name that you provide. This means that any changes that you make to model will
be duplicated in all new databases.
c. The "msdb" Database
One of the Windows Services that forms part of Microsoft SQL Server is the SQL
Server Agent. This service permits the scheduling of tasks such as backups and
maintenance jobs. When a SQL Server Agent task is created, the configuration of
the job is held in the msdb database.
d. The "tempdb" Database
The tempdb database is a special system database that is not persisted when the
SQL Server service is inactive. Each time the service is started, tempdb is rebuilt.
This database is used to hold information temporarily during complex queries,
either because SQL Server requires it or because a procedure or query that you
write creates a temporary table.
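
A minimal sketch of the second case: a temporary table (its name starts with #) is created in tempdb and disappears automatically when the connection that created it is closed. The table and column names below are invented for the example.

CREATE TABLE #RecentJobs
(
    JobId    int      NOT NULL,
    BookedOn datetime NOT NULL
);

-- Insert one sample row and read it back; the data lives in tempdb.
INSERT INTO #RecentJobs (JobId, BookedOn) VALUES (1, GETDATE());

SELECT * FROM #RecentJobs;

DROP TABLE #RecentJobs;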

4- Creating a Database
Creating a new database is a relatively simple task. However, there are a number
of options that can be set during creation to alter the behaviour of the new
database. In the remainder of this article we will create the JoBS database whilst
examining the most relevant configuration items.
NB: In this tutorial we will be creating a database named JoBS. In the event that
your SQL Server instance already has a JoBS database, select a different name and
use this new name in place of JoBS for the rest of the tutorial.

a. The New Database Dialog Box


The simplest manner in which to begin the creation of a new database is to right-
click the "Databases" tree node in the Object Explorer. This displays a context-
sensitive menu. From here, select "New Database..." The New Database dialog
box will appear:

b. Database Name
The first page of the New Database dialog box allows the new database to be
named. In the simplest scenarios, this is actually the only piece of information
that must be provided. All other configuration options can be left at their default
settings.

The name of a database can contain letters, numbers, spaces and many symbols.
This gives a great deal of flexibility to the name but can also lead to difficult-to-
read names and names that cannot be used without adding delimiters. It is usual,
therefore, to use only letters and numbers and no spaces. A good convention is to
use Pascal case when joining multiple words and capitalisation for abbreviations,
as in the example database.
To set the name of the database, type JoBS into the Database name text box.
c. Database Owner
All databases are assigned an owner. This is a user that has full rights to modify
and maintain the new database and may be any Windows account or SQL Server
login with the rights to create databases. For new databases, the Owner textbox
shows <default>. This indicates that the owner will be the current user, as used to
log in to the database. The owner can be changed by typing a new name or
searching for a suitable login using the ellipsis button.
For the JoBS database, use the default option.
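
For reference only, the owner can also be changed later from a query window. A sketch, assuming a login named JoBSAdmin exists on the server (JoBSAdmin is a placeholder, not part of the tutorial):

-- Transfers ownership of the JoBS database to the JoBSAdmin login.
ALTER AUTHORIZATION ON DATABASE::JoBS TO [JoBSAdmin];
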
d. Full-Text Indexing
SQL Server includes a facility known as full-text indexing. This allows the
construction of special indexes that permit searching of textual data. This can be
useful when holding large sections of text that need to be read by a search engine
similar to Google, for example.
In this tutorial we will not be examining full-text indexing so leave this checkbox
blank.
e. Database Files
All databases use two files by default. These are the primary data file and the log
file. The primary data file keeps track of all other data files and holds some or all
of the information that is stored in the database. The log file contains details of
transactions that have occurred against the database and allows those
transactions to be recovered in the event of a disaster.
Each file is listed in a table in the dialog box. The table shows seven columns:

 Logical Name. This is the name of the file that SQL Server will use. By
default, the name is based upon the database name.

 File Type. The file type is either "Data" or "Log", depending upon its purpose.

 Filegroup. The name of the filegroup that the file will be created within. See
below for more information on filegroups.

 Initial Size. SQL Server files do not grow continuously in the same manner as
files in applications such as those in Microsoft Office. Instead, the files are
created with a specific size and grow in increments when the available space
is exhausted. This provides improved performance, which is critical in
database systems. The Initial Size column shows the starting size of each file
in megabytes.

 Autogrowth. When a data or log file becomes full, the autogrowth settings
determine the action that is taken by the database engine. If autogrowth is
disabled, the database simply raises an error to indicate that the file is full.
When enabled, the data file can be set to increase by a specific size in
megabytes or by a percentage of the current file size. Each file may be set to
grow indefinitely or can be limited to a specific maximum size.

 Path. This column shows the path to the folder in which the file will be
created.

 File. The last column in the table shows the name of the database file. For
new databases, this appears blank.
Additional files can be added to a database. These may be secondary data files or
additional log files. When extra data files are added, the information held in the
database is distributed amongst all of the available files. This allows files to be
spread over a series of disks for maximum performance.
For the JoBS database, use the default settings for the data and log files.
f. Filegroup Options
Filegroups are simply administrative containers for database files. Every data file
created is allocated to a single filegroup. In many databases, the default
"PRIMARY" filegroup is the only one in use.
Filegroups are useful because individual tables and indexes within a database can
be allocated to a particular filegroup. Using this facility, the disk location of groups
of tables can be determined by the database administrator (DBA). This allows
configurations where highly utilised data can be placed in a group of files that is
using very fast disks whilst archive data is held in a filegroup using slower
hardware.
New filegroups can be added to a database by clicking on the Filegroups option at
the left of the dialog box. Additional filegroups can be created by clicking the Add
button and providing a name. One filegroup can be marked as the default. This is
the one that will be used by new tables and indexes where a filegroup is not
specified.
For the JoBS database, we will use the single, default filegroup.
g. Database Options
The final set of database options can be viewed and modified by clicking
"Options" at the left of the dialog box. This page shows various general
configuration options that may be applied to the new database.
h. Collation

The collation setting determines how information in the database is compared
and sorted by SQL Server. This is important as different languages and cultures
use different rules for comparing and sorting text. The collation also determines
whether capital letters and lower-case letters are considered to be equivalent.
When installing SQL Server, a default collation is required. If a collation is not
specified for each new database, the default collation is used instead.
For the JoBS database, use the default collation for the SQL Server instance.
NB: The JoBS database in the tutorial examples uses the Latin1_General_CI_AS
collation. If your collation is set differently, the results of comparisons and sort
operations may vary from those shown.

i. Recovery Model
SQL Server provides three recovery models that determine how a database's log
file is used. Each model has advantages and disadvantages, as explained below:

 Full. The full recovery model is used to minimise the chance of data loss in a
disaster situation. In this model, all transactions are logged to the log file in
detail. The log file must be backed up along with all data files. In a recovery
situation, the data files and log files can be restored to a specific point in
time and usually no data is lost. It is possible that the log file can be
damaged in which case operations since the last log file backup must be re-
run. This model can lead to very large log files if they are not backed up and
truncated often. The full recovery model is useful for business-critical
systems.

 Simple. In the simple recovery model the log file is automatically cleared by
the system to keep it small. There is no requirement, or benefit, to backing
up the log file. Instead, only the data files are backed up. This limits recovery
operations to restoring to the point at which the database was last backed
up. The simple model is useful for development and test databases, or for
non-critical live applications.

 Bulk Logged. The bulk-logged model is similar to the full model except that
most bulk operations are logged in minimal detail. This lowers the size of the
log files at the cost of not being able to recover to a specific point in time.
The transaction logs must be backed up when using this option. The bulk-
logged model is useful for small periods of bulk transactions. Following these
periods, the database can be backed up and switched back to the full model.
As the JoBS database is to be used for example purposes only, choose the simple
recovery model.
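
The recovery model can also be set after creation from a query window. A sketch of the T-SQL equivalent of the choice above:

-- Switch the JoBS database to the simple recovery model.
-- FULL and BULK_LOGGED are the other valid options.
ALTER DATABASE JoBS SET RECOVERY SIMPLE;
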
j. Compatibility Level
The Management Studio can be used to create databases for earlier versions of
SQL Server. If an earlier version is to be targeted, an alternative option can be
selected from the Compatibility Level drop-down list. Choosing an earlier version
will mean that less functionality is available.
For the JoBS database, ensure that the SQL Server 2005 option is selected.
k. Other Options
The final area of the New Database dialog box contains a set of further
configuration options that can be adjusted as required. These are beyond the
scope of this article and should be left at their default settings for the JoBS
database.
l. Creating the New Database
Now that the entire configuration for the new database is complete, it can be
generated. Click the OK button to complete the process. You will notice that the
status at the bottom-left of the dialog box changes briefly to "Executing" whilst
the model database is duplicated and the configuration settings are applied. Once
completed, the dialog box will close and the Object Explorer will have a new tree
node named "JoBS" beneath the Databases branch.
Congratulations, you have now created the new database!

CHAPTER 3
Creating SQL Server Databases Part 2

1- Transact-SQL
As we have seen in the previous instalment of this tutorial, the SQL Server
Management Studio (SSMS) graphical user interface can be used to create and
configure new databases. This use of SSMS is ideal for development scenarios
where full access to the server is available. However, in many real-life situations
you may have no direct access to a SQL Server. For example, with off-the-shelf
software you may have to rely upon executing commands from your script or
program against a SQL Server installation that you have never seen.
To allow commands to be executed against a SQL Server instance or database, the
Transact-SQL (T-SQL) language is used. This is Microsoft's variant of the structured
query language (SQL). It contains textual commands, or statements, that are used
to create databases and their constituent parts, to query tables and views and to
manipulate data. All tasks that can be undertaken with SQL Server are controlled
using T-SQL.
NB: When creating a database using a graphical user interface such as SSMS, the
selections made are converted into an appropriate T-SQL statement for execution.
In many cases, you will find a "Script" button on the screen that allows you to view
the underlying T-SQL code.

2- Executing T-SQL in SQL Server Management Studio


There are many commercial and freeware tools available for purchase or
download that can be used to issue T-SQL commands to a SQL Server instance. T-
SQL statements may also be executed from program code, such as .NET
framework-based software. However, for the purposes of this article and tutorial
we will use either SQL Server Management Studio or the free SQL Server
Management Studio Express software.
To run T-SQL commands in SSMS, a query window must be opened and
connected to a SQL server instance and database. If you have not already done so,
open SSMS and enter the details of the SQL Server that you are using for the
tutorial's examples. You can now open a new query window by opening the File
menu, selecting the New submenu and clicking the option, "Query with Current
Connection". You can also open such a window by clicking the "New Query"
toolbar button or by pressing Ctrl-N.

Windows in SSMS appear in a tabbed layout by default, allowing you to click
between tabs to show the other open windows. The tab at the top of the query
window shows the name of the current SQL server and the name of the database
that has been selected automatically. The database chosen will vary according to
your security privileges.
NB: It is very important to ensure that the database selected is the one that you
want to execute commands against. If the database is incorrect, you can quickly
change it using the drop-down list of databases in the toolbar.

a. Executing a Simple Command


Query windows allow T-SQL statements to be typed and executed. We can test
this with a simple command that outputs some text. Type the following command
into the new query window. Note the use of apostrophes around the literal text.
This is T-SQL's way of declaring string data.

PRINT 'Hello world'

To execute the command, ensure that no text is selected and then choose
Execute from the Query menu. You can also click the Execute button or press the
F5 key to run a command. On completion, the query window will update to show
the output of the print statement.

b. Executing a Multiple Line Command


T-SQL commands do not need to be run individually. A script of multiple
statements can be created, simply by adding more commands to the query
window. Try changing the query as follows and then hitting F5 to see the results.
Again, ensure that no text is selected in the query window before executing.

PRINT 'Hello'

PRINT 'world'

c. Executing a Selection
In the last two examples no selection was made when the Execute command was
given. When no text is selected, the entire content of the query window forms the
script to be run. One very useful aspect of SSMS is the ability to select one or
more lines, or even parts of lines, from a query window and only run that part.
You can try this by selecting either of the two print statements and pressing F5.
This is advantageous when you have multiple statements in a single window but
want to run them individually.

3- The CREATE DATABASE Command
Now that we can run statements, we can investigate the CREATE DATABASE
command. This command allows a new database to be generated. In its simplest
form, only a name for the new database is required with all settings and file
locations automatically using their defaults.
To create a new database for the JoBS example, try executing the following
statement. After execution, right-click the "Databases" branch of the object
explorer tree and choose "Refresh" from the context-sensitive menu that appears
to see the new database in the list.
NB: In this tutorial we will be creating a database named JoBS. In the event that
your SQL Server instance already has a JoBS database, select a different name and
use this new name in place of JoBS for the entire tutorial.

CREATE DATABASE JoBS

If you have been following the tutorial examples so far, the above command will
fail with an error explaining that the database already exists. A database can be
deleted, or dropped, using the DROP DATABASE command. To drop the JoBS
database, execute the following statement.
NB: If you are using a different name than "JoBS", replace this in the drop
statement. Check this carefully to ensure that you do not delete the wrong
database!

DROP DATABASE JoBS

You may receive an error when trying to drop a database if the database is in use.
Any connection to the database, including having the database selected as
current for any open query window, will prevent it from being deleted.
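
A common way to avoid the "already exists" error is to test whether the database is present before creating or dropping it. A sketch (DB_ID returns NULL when no database with the given name exists):

-- Drop the JoBS database only if it actually exists.
IF DB_ID('JoBS') IS NOT NULL
    DROP DATABASE JoBS;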

4- Files and Filegroups


In some cases creating a database using the default file names and locations is
acceptable. However, often it is not and you must provide the names of the
filegroups and files that you wish the database to use. This is achieved by
providing file specifications and filegroups to the CREATE DATABASE command.
A file specification is defined with the minimum of a name and filename,
surrounded by parentheses (). The following file specification represents a file
named "DatabaseFile" located in the root of the C: drive. It is usual to use the
MDF extension for data files and LDF for log files.

(NAME='DatabaseFile', FILENAME='C:\NewDatabase.mdf')

To provide names and file details for both the data file and the log file for a new
database, two file specifications must be provided. The following sample shows
this with the first file specification provided for the primary data file and the
second for the log file. Executing this command will create both files in the root of
C: and use non-default naming.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf')

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf')

As described in the first article regarding database creation, multiple files can be
created for both the data and log files. The information in the database or log is
then spread amongst the available files. To add more files for either the data or
the log, additional file specifications are included in the CREATE DATABASE
command, each separated by a comma. The following example demonstrates this
by using two data files and two log files. The files are located on different disks
for a performance improvement.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf'),

(NAME='JoBSData2', FILENAME='d:\JoBS2.mdf')

LOG ON (NAME='JoBSLog', FILENAME='e:\JoBS.ldf'),

(NAME='JoBSLog2', FILENAME='f:\JoBS2.ldf')

The data files used for a new database may be arranged in filegroups. These are
administrative containers for files that can aid performance as individual tables
and indexes can be assigned to a filegroup, and therefore can be targeted at a
specific disk drive or set of drives. A filegroup can be specified within the comma-
separated list of data files, with subsequent files in the list linked to that group.
Filegroups may not be specified for log files.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf'),

FILEGROUP SECONDARY

(NAME='JoBSData2', FILENAME='c:\JoBS2.mdf')

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf')

When a new table or index is added to a database, the filegroup that it will be
stored in is set to the default filegroup unless otherwise specified. The default
filegroup is usually the primary group. However, this can be changed by adding
the DEFAULT keyword when creating a filegroup, as shown below.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf'),

FILEGROUP SECONDARY DEFAULT

(NAME='JoBSData2', FILENAME='c:\JoBS2.mdf')

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf')

5- Setting File Sizes and Growth Options


The files created by the above scripts use the default settings for their original size
and the manner in which they grow. You can determine an initial size for each file
using the SIZE clause within the file specification. The size must be specified in
kilobytes, megabytes, gigabytes or terabytes by providing an integer with a suffix
of KB, MB, GB or TB respectively.
The following statement creates the JoBS database with a data file of one
gigabyte and a log file of 512 megabytes. If you use Windows Explorer to view the
generated files' properties you will be able to see that these file sizes have been
pre-allocated.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf', SIZE=1GB)

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf', SIZE=512MB)

A maximum file size may be set within a file specification. This uses a similar
syntax to assign a value to the MAXSIZE option. If the value is set to UNLIMITED,
the file will grow as required until the disk is full.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf', SIZE=1GB, MAXSIZE=UNLIMITED)

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf', SIZE=512MB, MAXSIZE=1GB)

Finally, the growth options can be changed for each file specification individually.
The FILEGROWTH option determines the automatic size increase of a file when it
becomes full. The value can be set as a specific increment using a suffix of KB, MB,
GB or TB. Alternatively, the file can be configured to increase by a percentage
value when required. The following example shows a percentage increment for
the data file and a specific value for the log file.

CREATE DATABASE JoBS

ON PRIMARY

(NAME='JoBSData', FILENAME='c:\JoBS.mdf', SIZE=1GB, MAXSIZE=4GB, FILEGROWTH=15%)

LOG ON

(NAME='JoBSLog', FILENAME='c:\JoBS.ldf', SIZE=512MB, MAXSIZE=1GB, FILEGROWTH=64MB)

6- Setting a Collation
The final element of the CREATE DATABASE command that we will consider in this
article is the ability to set the collation. This is achieved using the COLLATE clause.
This clause is appended to the command with the name of a valid SQL collation or
Windows collation. The selected collation name determines how data is ordered
and compared. The following example uses the "Latin1_General_CI_AS" collation,
which is case-insensitive and accent-sensitive.

CREATE DATABASE JoBS

ON PRIMARY (NAME='JoBSData', FILENAME='c:\JoBS.mdf')

LOG ON (NAME='JoBSLog', FILENAME='c:\JoBS.ldf')

COLLATE Latin1_General_CI_AS

CHAPTER 4
Creating SQL Server Tables Part 1

1- What is a Table?
The table is the most important part of a relational database. Tables are the
storage location for all information within a database. Each table holds a single
set of structured information. For example, you may have a database that stores
customers, order headers and order lines. Three tables could be used in this
situation, one for each key entity.

Columns and Rows


A table is organised into columns and rows. One named column is defined in a
table for each individual piece of data that may be stored. In a table of customers,
columns may be created for first name, last name, etc. The columns define a set
of rules that restrict the information held in the table. Rows contain the actual
information stored in the table. Each row includes a value for every column
according to the rules that they enforce. In the case of customers, this would be
one row per customer.

2- Types of Data
All columns have a data type. The data type enforces a rule that restricts the kinds
of information that can be held in the table. Generally speaking, the data types
can be classified into five groups: numeric, character, date and time, large objects
and other miscellaneous types. The various data types are described in the
following sections.
When choosing the type of data for numeric columns, the type that uses the
minimum amount of storage space, whilst still allowing for the full range of
possible values, should be selected. If in the future the requirements for a column
change, it is possible to modify a column's data type to allow for a larger range.

Numeric Data Types


The numeric data types are used for columns that hold numeric data only. This
includes integers, floating-point numbers and fixed-point values. There are eleven
numeric data types.

a. Bit Data Type
The bit data type holds a Boolean value. Though technically this may not be
considered a numeric data type, it is included in this section as often the values it
contains are referred to as either zero or one.

b. Int Data Type


Int data is used to store 32-bit integers that can hold values between -
2,147,483,648 and 2,147,483,647. Columns of this type require four bytes of
storage per row of data.

c. BigInt Data Type


BigInt columns can store 64-bit integer values. They can hold values between -
9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. Columns of this type
require eight bytes of storage per row of data.

d. SmallInt Data Type


A smallint is a 16-bit integer that can hold values between -32,768 and 32,767.
Columns of this type require two bytes of storage per row.

e. TinyInt Data Type


The TinyInt data type is the smallest integer type. It holds an unsigned 8-bit
integer value containing values between 0 and 255. Columns of this type require
one byte of storage per row.

f. Decimal Data Type


The decimal data type is used to store fixed-point numbers with up to 38 digits.
The limitations of the number are described using precision and scale values. The
precision determines the total number of digits that can appear in the number,
including digits to the left and right of the decimal point. The scale value
determines how many of these digits appear to the right of the decimal point. For
example, a precision of five and a scale of two, described as (5,2), allows values
with up to three digits in the integer part and two digits in the fractional part,
such as 123.45.
The amount of storage space used by a decimal value varies according to the
precision value specified:

Precision    Storage Bytes
1 - 9        5
10 - 19      9
20 - 28      13
29 - 38      17
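
As an illustrative sketch (the table and column names are invented for the example), a price column with precision 5 and scale 2 would be declared like this:

CREATE TABLE PriceExample
(
    -- decimal(5,2): up to three digits before the decimal point and
    -- exactly two digits after it, e.g. 123.45
    UnitPrice decimal(5,2) NOT NULL
);
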
g. Numeric Data Type
The numeric data type is functionally equivalent to the decimal data type.

h. Float Data Type


The float data type stores floating-point values. Unlike decimals and numerics, the
float data type stores approximate values only and its use is subject to rounding
errors.
The float data type requires the specification of a value that determines the
accuracy of the stored value. The value can be between one and fifty-three and
specifies the number of bits used to hold the mantissa of the floating-point
number when described in scientific notation. The overall precision and the
storage space required for floats are also determined by this value.

Mantissa Bits    Precision    Storage Bytes
1 - 24           7 digits     4
25 - 53          15 digits    8

i. Real Data Type


The real data type is functionally equivalent to float(24), i.e. a float data type with
24 bits for the storage of the mantissa. A real value uses four bytes of data
storage.

j. Money Data Type


The money data type is used to hold currency values. The money data type uses
eight bytes of storage to hold a fixed-point value with four decimal places. The
value may be between -922,337,203,685,477.5808 and
922,337,203,685,477.5807.

k. SmallMoney Data Type


The smallmoney data type is similar to the money type but uses only four bytes of
storage. The range of values that it may hold is from -214,748.3648 to
214,748.3647.

Character Data Types

Character data types are used to store textual information. Depending upon the
type selected, the data is of fixed or variable length. Fixed length columns hold a
specific number of characters. If the string stored in the column is larger than that
available, the text is truncated. If the text is shorter than the fixed size, the data is
padded. Variable length data types have a maximum size defined and truncate
strings that are oversized.
Character data types can be used to store either Unicode or non-Unicode
information. Unicode strings use two bytes per character and allow letters,
numbers and symbols from many international languages to be stored. Non-
Unicode data types cannot store multilingual character sets but generally only use
a single byte per character.
The selection of a textual data type is not as simple as choosing a numeric type.
Firstly, consider whether the data to be stored will contain information in multiple
languages. If it will, select a Unicode type. If not, select the smaller non-Unicode
equivalent. If the size of the information stored in a character-based column will
generally be approximately the same size for all rows in a table, select a fixed size.
If the text will vary in length considerably, select a variable size column type.
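
As a sketch that brings together the four character types described in the sections below (the column names and sizes are assumptions for illustration only):

CREATE TABLE CharacterTypeExamples
(
    CountryCode char(2)       NOT NULL,  -- fixed length, non-Unicode
    Initials    nchar(3)      NULL,      -- fixed length, Unicode
    Email       varchar(200)  NULL,      -- variable length, non-Unicode
    FullName    nvarchar(100) NOT NULL   -- variable length, Unicode
);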

a. Char Data Type


The char data type is used to hold a fixed length, non-Unicode text string of up to
8,000 characters. A size is specified that determines the number of bytes that will
be used to hold the text. For English collations, the number represents the
number of characters that may be held. If the collation specified for the database
or the column uses double-byte characters, the maximum length of the string will
be less.

b. NChar Data Type


The nchar, or national character, data type stores fixed length Unicode text. The
size of the data type is specified in characters and may be between one and four
thousand. The storage size for an nchar column is always double the maximum
number of characters.

c. VarChar Data Type


The varchar data type is similar to char but stores variable length text. A
maximum size for varchar columns is specified in bytes and may be between one
and eight thousand. The amount of storage space used by a varchar is dependent
upon the length of the string stored, not the maximum length specified.

d. NVarChar Data Type


The nvarchar data type is the Unicode equivalent of the varchar type. The
maximum size of the column is specified in characters and the storage space used
is double the length of the data actually stored. The maximum size may be
between one and four thousand characters.

Date and Time Data Types


The date and time data types are used to store dates and times. These two data
types differ in the range of dates and in the accuracy of the information that can
be held. You should choose the smalldatetime data type if the accuracy and date
range are acceptable. If not, use the datetime type.

a. DateTime Data Type


The datetime data type is the more accurate of the two date and time data types.
Columns of this type can hold dates between 1 January 1753 and 31 December
9999. The time element of the data type is accurate to one three-hundredth of a
second. Columns of this type require eight bytes of storage per row of data.

b. SmallDateTime Data Type


The smalldatetime is similar to datetime but uses only four bytes of storage per
data row. Accordingly, the range of dates permitted is reduced to 1 January 1900
to 6 June 2079. The time element of the data type has a lower accuracy, only
storing hours and minutes.

Miscellaneous Data Types


These data types are used to store information that cannot be classified as simply
numeric or textual.

a. UniqueIdentifier Data Type


The uniqueidentifier data type is used to hold globally unique identifiers (GUIDs).
A GUID is a large hexadecimal number that can be generated with a very small
chance of ever being duplicated. Each GUID uses sixteen bytes of storage space.

b. Timestamp
A timestamp column simply holds an eight-byte number that is unique to the
database. Timestamps change each time their containing row in the database is
modified. This makes timestamps ideal for determining a version number for a
row and determining if a row has changed between two points in time. It is
important to understand that, despite the name, timestamps do not store a date
or time.

c. Binary Data Type

The binary data type is used to hold fixed length binary-formatted data. This is
useful for storing information that must be encrypted, such as passwords. The
size of binary columns is specified as between one and eight thousand bytes.

d. VarBinary Data Type


The varbinary data type is a variable length version of the binary type. The data
size is between one and eight thousand bytes but the amount of storage space
used is determined by the actual length of information stored.

e. SQL_Variant Data Type


The sql_variant data type can be used to store information of varying types in a
single table column. An sql_variant can contain most other data types with the
exception of large objects and user defined types.

Large Object Data Types


Large object data types are used to hold information that cannot fit in the limited
size of the previously mentioned types. They permit the storage of binary data in
addition to standard Unicode and non-Unicode text. The information in a large
object is held separately to standard row data, potentially in a separate filegroup.

a. VarChar(MAX) Data Type


The varchar(MAX) data type is used to hold non-Unicode text and is similar to the
standard varchar type. The MAX data size indicates that columns of this type may
hold up to almost two gigabytes of information with an enforced maximum size of
2,147,483,647 bytes.
This data type was introduced in SQL Server 2005 and should be considered a
replacement to the text data type. Text data is still permitted for legacy and
compatibility purposes. It may be deprecated in a future release of SQL Server.

b. NVarChar(Max) Data Type


The nvarchar(MAX) data type is equivalent to varchar(MAX) except that it stores
Unicode characters. This data type is a replacement for the ntext data type, which
may be deprecated in future SQL Server versions.

c. Text Data Type


The text data type provides a method for storing large non-Unicode character
strings of up to just less than two gigabytes. In SQL Server 2005, the text data type
is provided for legacy and compatibility purposes only. The varchar(MAX) data
type should be used instead for new databases.

d. NText Data Type
The ntext data type is the Unicode equivalent of text. For new databases, use
nvarchar(MAX) instead of ntext.

e. XML Data Type


The XML data type is used to hold XML documents and fragments. A fragment is
an XML document that does not include a single top-level element. An XML
schema may be associated with an XML data column, preventing information that
does not conform to the schema from being stored. This type of column is said to
hold typed XML. The maximum size of XML data is 2,147,483,647 bytes.

f. VarBinary(MAX) Data Type


The varbinary(MAX) data type is similar to the varbinary data type. This large
object version may hold up to 2,147,483,647 bytes of information in a single
column. It is often used for holding graphical information or information that
would normally be held in a file. The data type is a replacement for the image
data type, which may be deprecated in the future.

g. Image Data Type


The final data type in SQL Server 2005 is the image. This is a legacy data type that
is provided for compatibility with older versions of SQL Server. For new databases,
the varbinary(MAX) data type should be used instead.

Nullable Columns
In addition to the type of data held in a column, all table columns can be marked
as either nullable or not nullable. A column that is not nullable requires that a
value is present in every row. Nullable columns can be assigned the special value
of NULL. A null value in a row indicates that the value has not been set.
The null value gives an additional option to the range of values that a column can
contain. This is beneficial when all of the possible values that a data type gives
could potentially be used in a column but an additional, "undefined", option is
required. For example, consider a database that holds the resultant data from a
survey where interviewees answer yes or no to a series of questions. The answers
would be stored in nullable bit columns with a 0 indicating that the user answered
"no", a 1 for "false" and null to indicate that the question remains to be
answered.
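A minimal sketch of that idea, assuming a hypothetical SurveyAnswers table (table creation itself is covered in the next section and the following chapter):

CREATE TABLE SurveyAnswers
(
IntervieweeId INT NOT NULL,
QuestionNumber INT NOT NULL,
Answer BIT NULL
)

Here a 0 in the Answer column means "no", a 1 means "yes" and NULL means the question has not yet been answered.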

3- Creating Tables
In this tutorial we will consider two methods for creating database tables. The
first is using the graphical user interface aspects of SQL Server Management

Studio and is examined in this article. The second, using Transact-SQL (T-SQL)
commands and scripts will be described in the next tutorial instalment. We will
use these technologies to create tables within the sample JoBS database. If you
have not done so already, create the empty JoBS database using the techniques
described in the earlier article, Creating SQL Server Databases Part 1.

3.1 Creating Tables Using SQL Server Management Studio


In this section we will use SQL Server Management Studio (SSMS) to create a
new table in the JoBS database. This table will hold the list of customers that
receive services from DisasterFix. The usual naming convention for database
tables is a name that describes the table's contents and is specified in Pascal case.
In this case, we will name the table "Customers".

To create the new table, expand the tree in the Object Explorer so that the Tables
branch of the JoBS database is visible. Right-click this branch and select "New
Table..." from the context-sensitive menu that appears. A new design window will
be opened in the main area of SSMS and the Properties area will update to allow
modification of the new table's options. At this point, the new table has not yet
been created in the database.

Setting Table Properties

There are several properties that can be set for a new table. The key properties
that are of interest to this article are as follows:

 Name. The name of the table. A new name for the table can be entered here
to replace the default name. If the name is not changed, a new name will be
requested when the table is saved.

 Description. A simple, human-readable description for the table.

 Schema. Tables may be grouped into schemas using this option. Schemas
are beyond the scope of this article.

 Regular Data Space Specification. This setting can be expanded and, using
the options within, you can set the filegroup for storage of the table's row
data. This is the storage location for normal column data, excluding large
objects.

 Text/Image Filegroup. If desired, large object data can be held in a different


filegroup to normal row data. The target filegroup is specified using this
property.
The only property that requires modification for this first table is the name.
Change the name property of the table to "Customers".

Adding Columns
The Customers table will be used to hold various elements of data for each
customer row. These are as follows:

 Customer Number. Each customer will have a unique reference number.


This will be used to identify the customer and will appear on
correspondence. A simple, non-nullable integer will suffice for this column. It

will be named "CustomerNumber" to match the Pascal case standard. Note
that the space has been removed from between the words in the column
name. This is not strictly necessary but makes the column easier to work
with.

 Contact Name. All customers will require a name. For business customers
this will be the primary contact for the business. For homeowners this will be
the name of the person who purchased a service or contract from
DisasterFix. The name will be held in two columns named "FirstName" and
"LastName". As these will be quite variable in size we will use a varchar
column for each. The maximum size for each will be 25 characters.

 Business Name. For business customers, the name of the company will be
held. This will be stored as a variable length character string with a
maximum size of one hundred bytes. As homeowners will not have a
business name, the "BusinessName" column will be nullable.

 Creation Date. The final column will hold the date of creation of the
customer. This will be used for auditing purposes with only the date being
required. We will use a non-nullable, smalldatetime column named
"CreationDate".

To create the table's columns, you use the grid in the main area of SSMS. The first
column in the grid determines the name of the new column. The second column
in the grid permits the selection of the data type for the new table column. Where
the data type requires a size to be specified, this is appended to the data type
name within parentheses (). The last column shows a checkbox for each table
column. If checked, the new column will allow null values to be stored. If not
checked, a value will always be required.
Add the five column definitions to the table designer grid. The final result should
look similar to the diagram below. If you make an error whilst adding the
columns, you can modify values after clicking on them. You can also insert and
delete items by right-clicking the small square at the left of the grid row and
choosing the appropriate option from the context-sensitive menu.

Once your design matches that shown, the table must be saved. To do so, select
"Save Customers" from the File menu, click the Save button on the toolbar or

press Ctrl-S. The table will be committed to the database and will appear beneath
the Tables branch of the Object Explorer. NB: If the table does not appear, right-
click the Tables branch and select "Refresh".
The table designer may now be closed by clicking the X at the top-right of the
tabbed window. If you need to edit the table again at a later date, the designer
can be reopened by right-clicking the table name in the Object Explorer and
selecting the "Design" option.
Congratulations, you have now created the first JoBS database table!
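For reference, the table designed above corresponds roughly to the following T-SQL sketch, with nullability taken from the column descriptions earlier in this section (scripted table creation is covered properly in the next chapter, and the script that SSMS generates for you may differ slightly):

CREATE TABLE Customers
(
CustomerNumber INT NOT NULL,
FirstName VARCHAR(25) NOT NULL,
LastName VARCHAR(25) NOT NULL,
BusinessName VARCHAR(100) NULL,
CreationDate SMALLDATETIME NOT NULL
)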

Limitations
There are some limitations that you should be aware of when creating database
tables. The first of these is that there is a limit of two billion tables permitted per
database. However, a database containing such a large number of tables would
be unwieldy to say the least!
A single table may contain up to 1,024 columns. The total size of these columns
must not exceed 8,060 bytes, excluding large object data. This is a limitation on
the size of the actual information held in a row. It is, therefore, possible to create
a table containing variable length columns with a maximum size greater than
8,060 bytes. However, if you try to insert a row with combined data exceeding
this limit you will receive an error.

CHAPTER 5
Creating SQL Server Tables Part 2

The CREATE TABLE Command


In the previous article we created database tables using the graphical user
interface tools of SQL Server Management Studio (SSMS). This method is simple
and ideal for development work. However, when deploying an application it is
often necessary to script the table creation as Transact-SQL (T-SQL) commands. In
this article we will describe the use of the CREATE TABLE command, which allows
the generation of new database tables and their constituent columns.
In this instalment of the tutorial, we will create all of the remaining tables
required by the JoBS database.

Command Syntax
The simplest way to use the CREATE TABLE command is to provide a table name
and a comma-separated list of column specifications. This is shown in the syntax
below:
CREATE TABLE table-name
(
column-specification1,
column-specification2,
...
column-specificationN
)
Each column specification describes a column that will be created within the
table. Here we can provide a name for the column and specify the column type
and size to use. We can also determine if the column should permit the storage of
null values by appending either "NULL" or "NOT NULL" to the specification. These
three items are separated by space characters, making the syntax for a column
specification as follows:
column-name data-type [NULL | NOT NULL]
NB: The use of "NULL" is optional. If omitted, a column will be nullable by default.

Creating a Simple Table


Creating a table using T-SQL is a relatively simple operation, though the
statements can quickly become rather long if the number of columns is high. We
will start by creating a simple table with just two columns in the JoBS database. If

you have not already done so, start SQL Server Management Studio and open a
new query window.
We need to ensure that we are working against the correct database before
executing any of the commands below. In an earlier article I explained how to
change the active database using the drop-down menu in SSMS. You can also
change databases using T-SQL with the USE command. If your query window is
not showing JoBS as its active database, run the following script. You should see
the title of the query window and the database selection drop-down list update
accordingly. NB: If your tutorial database is not named "JoBS", substitute the
correct name in the USE command.

USE JoBS

We can now create our new table. This table will hold a list of all of the
geographical areas that are serviced by DisasterFix engineers. Each row will have
a three-letter area code and the name of the geographical area, with both columns
being mandatory. Later in the tutorial we will use this information to link a
customer's address to a specific area so that customers can be matched to
appropriate engineers. Let's create the table using the script below:

CREATE TABLE GeographicalAreas
(
AreaCode CHAR(3) NOT NULL,
AreaName VARCHAR(25) NOT NULL
)

After executing the statement, right-click the "Tables" branch in the Object
Explorer and choose "Refresh". The new table should now be visible. If you right-
click the table name, you can select "Design" to see the table's structure in the
designer, as you did when creating the Customers table in the previous article in
this tutorial.

Setting the Main Filegroup


The previous example table makes no mention of a filegroup to use for storing the
table's data. If you are building a database that includes multiple filegroups you
may want to specify the location of each table individually. You can do this by
adding the suffix "ON" to the CREATE TABLE statement, following the keyword
with the name of the filegroup that you wish to use. If you do not specify a
filegroup, the primary filegroup is used by default.

The JoBS database includes only a single filegroup. However, if it had a second
filegroup named "METADATA", we could target the GeographicalAreas table at
this location by modifying the T-SQL command as follows:

CREATE TABLE GeographicalAreas
(
AreaCode CHAR(3) NOT NULL,
AreaName VARCHAR(25) NOT NULL
)
ON METADATA

Setting a Text / Image Filegroup


Some tables contain one or more columns that use the large object data types,
such as VarChar(MAX), XML, etc. If desired, the large object data can be held in a
separate filegroup to the information in the other columns of the table. This is
useful when using multiple files and filegroups in order to give the best
performance.
In the JoBS database we require a table that will hold complaints. Each complaint
will include a column that holds details of the complaint made with the text
potentially being very long. This will therefore be held in a VarChar(MAX) data
type. The complaint can be linked to a customer by holding the customer's unique
reference number and, potentially, to an engineer who will also have a reference
number.
We can create this table using the following statement:

CREATE TABLE Complaints
(
ComplaintReference INT NOT NULL,
CustomerNumber INT NOT NULL,
EngineerId INT,
ComplaintTime DATETIME,
Complaint VARCHAR(MAX) NOT NULL
)

If the JoBS database contained multiple filegroups, we may decide that the
information would generally be stored in the filegroup named CORE whilst the
large object data would be held in a separate filegroup named LARGE. You have
already seen how to specify the location of the regular data. To set the large
object filegroup, you use the TEXTIMAGE_ON command as shown below:

CREATE TABLE Complaints
(
ComplaintReference INT NOT NULL,
CustomerNumber INT NOT NULL,
EngineerId INT,
ComplaintTime DATETIME,
Complaint VARCHAR(MAX) NOT NULL
)
ON CORE
TEXTIMAGE_ON LARGE

Creating the JoBS Database Tables


The JoBS database now includes three tables, permitting the storage of
customers, geographical areas and complaints. To provide for all of the
functionality required in the application we need to add a further twelve tables.
These tables are described in the following sections. As a learning exercise, you
may want to create these tables yourself using the SSMS graphical user interface
or with T-SQL commands. Alternatively, you can download the script using the
link at the top of this article and run it to create the database and its fifteen
tables.

CustomerAddresses Table
The CustomerAddresses table holds details of addresses. By holding the addresses
in a separate table to the customers we can have multiple addresses for a single
customer without unnecessary repetition of data. This is known as a "one-to-
many" relationship.
Each address includes a reference to the customer number so that the two tables
can later be linked together. The address also includes a region code that will link
to the GeographicalAreas table. Each address includes a reference number.

Column Name Data Type Nullable?
AddressId Int No
CustomerNumber Int No
Address1 VarChar(50) No
Address2 VarChar(50) Yes
Address3 VarChar(50) Yes
TownOrCity VarChar(30) No
AreaCode Char(3) No
Postcode Char(8) No
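As a sketch of the exercise mentioned above, the CustomerAddresses specification could be scripted as follows (nullability is taken from the table above; the downloadable script may differ in detail):

CREATE TABLE CustomerAddresses
(
AddressId INT NOT NULL,
CustomerNumber INT NOT NULL,
Address1 VARCHAR(50) NOT NULL,
Address2 VARCHAR(50) NULL,
Address3 VARCHAR(50) NULL,
TownOrCity VARCHAR(30) NOT NULL,
AreaCode CHAR(3) NOT NULL,
Postcode CHAR(8) NOT NULL
)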

Contracts Table
The Contracts table holds information relating to single jobs and annual contracts.
Each contract is linked to a customer address; if a customer wishes to obtain an
insurance policy for multiple premises, they will actually create several contracts.
Recurring contracts include a renewal date and a flag that indicates if the
customer wishes to automatically renew or whether contact must be made
beforehand. Single jobs will hold a null value in the renewal columns. Both single
jobs and contracts include a monetary value.
NB: The CustomerAddressId column will be used to provide a link to the
CustomerAddresses table. Note that the names of the two matching columns need
not be identical, but their data types should be.

Column Name Data Type Nullable?


ContractNumber Int No
CustomerAddressId Int No
RenewalDate SmallDateTime Yes
RenewAutomatically Bit Yes
ContractValue Money No

Skills Table
The Skills table will be used to store the various skills that may be required of an
engineer. A unique code and a name for each skill will be held.

Column Name Data Type Nullable?


SkillCode Char(3) No
SkillName VarChar(30) No
StandardJobs Table

The JoBS system will hold a list of standard jobs that may be undertaken by
engineers. For simplicity, custom jobs are not permitted; if an unexpected job
must be undertaken it will be added to this table first. A standard job includes a
unique reference number, a name for the job and a standard price to charge.

Column Name Data Type Nullable?


StandardJobId Int No
JobName VarChar(100) No
StandardPrice Money No

RequiredSkills Table
Each standard job will require one or more skills to complete successfully. In order
that the correct engineer is despatched to do the work, the RequiredSkills table
holds links between the StandardJobs and Skills tables, determining which skills
are required. In a later article, we will create links between the tables to cement
this "many-to-many" relationship.

Column Name Data Type Nullable?


StandardJobId Int No
SkillCode Char(3) No

Engineers Table
We need a table to hold a list of people that may be employed to undertake work
on behalf of DisasterFix. This information will be stored in the Engineers table.
The table will hold an optional photograph to be used on engineer ID cards.

Column Name Data Type Nullable?


EngineerId Int No
EngineerName VarChar(50) No
HourlyRate Money No
Photograph VarBinary(MAX) Yes
NB: You would probably need to hold more engineer information than this in a
real-world solution. In this case the table has been simplified for the purposes of
the tutorial.

EngineerWorkingAreas Table

The EngineerWorkingAreas table will provide a many-to-many link between
engineers and geographical areas. This information will determine which areas
the engineer is willing to work in. This data is necessary when matching customer
work to engineers.

Column Name Data Type Nullable?


EngineerId Int No
AreaCode Char(3) No

EngineerSkills Table
In a similar manner to the EngineerWorkingAreas table, this table will provide a
link between the engineers and the skills they have. This will be used to ensure
that the engineer sent to a job is equipped to undertake the work.

Column Name Data Type Nullable?


EngineerId Int No
SkillCode Char(3) No

Jobs Table
The Jobs table will hold information relating to all jobs that have been performed
or that are booked as future work. This table is central to the JoBS application.
Each job will be given a unique identifier. There will be links to the standard job
type and the contract under which the work will take place. The contract number
will be used to find the customer's address and other details.
Once a job has been booked with a customer, the engineer and the date of visit
will be recorded. These columns are nullable because they may not be known
when the job has been requested but not yet scheduled.
On completion of a job, the number of hours taken by the engineer will be
recorded along with the charge rate for the time. This information will be copied
from the engineer's hourly rate but will be stored in this table for historical
purposes, as hourly rates may change.
Sometimes a job will not be completed in a single visit. This may be due to a lack
of parts or because the job takes longer than anticipated. In these cases the job
will be marked as requiring a follow up. Once a follow up visit is booked as a new
Job row in the table, a reference to the new JobId will be created so that the
history of a job may be tracked.

Column Name Data Type Nullable?


JobId UniqueIdentifier No

StandardJobId Int No
ContractNumber Int No
EngineerId Int Yes
VisitDate DateTime Yes
HoursTaken Real Yes
HourlyRate Money Yes
FollowUpRequired Bit Yes
FollowUpJobId UniqueIdentifier Yes
Notes VarChar(MAX) Yes

Parts Table
The Parts table will hold the list of standard parts that may be used by an
engineer. Each part has a cost that will be used when calculating the total cost of
a job and the profit made. Parts that are not included in the list are not
reimbursed by DisasterFix to the engineer.

Column Name Data Type Nullable?


PartNumber Char(8) No
PartName VarChar(50) No
Cost Money No

UsedParts Table
In order for DisasterFix to determine their costs, the parts used during each job
must be recorded. As multiple parts may be used for a single job, the UsedParts
table will be used in a one-to-many relationship. Each row in this table links a part
number to a job and records the unit cost of the part number and the number of
units used in each case. The unit cost is recorded at the time of the job so that the
historical data is not lost if the current cost changes. The number of units used is
held as a fixed-point number to allow the engineer to specify that they used a
fraction of a unit, for example 1.2 metres of copper piping.

Column Name Data Type Nullable?


JobId UniqueIdentifier No

PartNumber Char(8) No
UnitsUsed Decimal(6,2) No
CostPerUnit Money No

EngineerStock Table
Each engineer is expected to hold a stock of parts that they will use during their
day-to-day work. The current stock level for each part held will be stored in the
final table so that replenishment can be made as required. This will also be used
in a report that details the value of stock that an engineer has in their possession.

Column Name Data Type Nullable?


EngineerId Int No
PartNumber Char(8) No
UnitsHeld Decimal(6,2) No

CHAPTER 6
Basic SQL Server Data Manipulation

JoBS Database Scripts


In the first six instalments of this tutorial we have created a SQL Server database
named "JoBS". This database gives an example of the data storage that may be
required for a simple, real-world job processing application.
If you have not followed the previous articles in the tutorial, you can download a
script to create the tutorial database using the link at the top of this page. Executing
the "CreateJoBS.sql" script will create a new, empty database. If you already have
the database, you will not need to follow this step. However, you may decide to do
so if you have modified the JoBS database and wish to refresh it.

Creating Data
As the JoBS tutorial database currently holds no data, the first task is to generate
some. As with previous SQL Server processes that have been described in this
tutorial, we can add data using SQL Server Management Studio (SSMS). Furthermore,
we can create information using either the graphical interface tools or with Transact-
SQL (T-SQL) commands.

SQL Server Management Studio Data Entry Tools


The simplest way to manually enter small amounts of information is to use the SSMS
data entry tools. To try this, open the "JoBS" database branch of the Object Explorer
tree in SSMS. Expand the "Tables" node to see the list of tables that have been
created. You can now select the table for which you wish to add data rows.
Right-click the "Skills" table and choose "Open Table" from the menu. A grid appears
with one column for each column in the table. You can now simply add and edit rows
in the table by typing.
Add the rows shown below to the table to set up the basic skills that may be
assigned to engineers and jobs. As you tab across the table cells, notice that entering
invalid information, such as a SkillCode with more than three characters, is not
permitted. As you leave each row, the data is committed to the database. If you

make an error, you can simply modify the information to update a data row or
highlight the entire row and press the delete key to remove it completely.
Once you have added all four rows, your grid should look similar to the image below.
Click the close button at the top-right of the table's window to close the grid. You can
also select "Close" from the File menu.

NB: The final row showing two NULL values appears in readiness for a fifth data row
to be added. This empty row is not stored in the table.

The INSERT Statement


When developing software, the above graphical user interface-based method of
creating information is useful for test data. However, in the software itself you are
far more likely to want to generate T-SQL commands that create information in
response to user actions. To do so, you can use the INSERT statement.
The simplest form of the INSERT statement is used when adding a full row of data to
a table. The syntax for the command is as follows:
INSERT INTO table-name VALUES (value1, value2, ..., valueN)
The table-name specifies the table that you wish to add a row to. The data to be
created is then provided in a comma-separated list. The list of values must be in the
same order as the columns within the table so that information is not added in the
wrong columns.
We can use this syntax to add a row to the Parts table in the database. This table has
three columns to hold an alphanumeric part number, a description of the part and
the unit cost. Using a new query window in SSMS, run the following command. Note
the use of apostrophes around the data for the first two columns. These delimiters
are required for character-based information.

INSERT INTO Parts VALUES ('15COPIPE', '15mm Copper Pipe', 1.52)

If the command executes correctly, you will see a response that indicates that one
row was affected.
(1 row(s) affected)
You can see the new row in the table by opening the table in the data entry mode
described earlier. Alternatively, you can list the entire contents of the table using a

SELECT statement. Try executing the following to see the data in the query window.
NB: The SELECT statement is described in detail in the next article in this series.

SELECT * FROM Parts

Using Named Columns


The simple form of inserting data using the INSERT statement can be problematic. As
the data being stored is not explicitly applied to a column, a change to the structure
of the table could leave existing statements invalid. If, for example, a column is
added or the column order is changed, statements embedded in software would
become incorrect and have to be changed. To avoid this, the column names should
also be included in the command.
INSERT INTO table-name
(column1, column2, ..., columnN)
VALUES
(value1, value2, ..., valueN)
This syntax is more resilient to change than the previous version. An additional
improvement is that not every column must be specified in the statement. If a
column name is excluded, a default value or a null value is stored when creating the
new row. If the column does not permit null values and has no default value an error
will occur.
Let's add another row to the Parts table using the improved syntax:

INSERT INTO Parts

(PartNumber, PartName, Cost)

VALUES

('22COPIPE', '22mm Copper Pipe', 2.97)

Inserting Different Data Types


As we have seen, when inserting character data the information must be provided
within apostrophes whereas numeric information does not require delimiters. Each
data type has different requirements, described below:

Date and Time Data


Date information can be provided in many formats, including versions with named
months, such as '27 July 2008' or numeric formats such as '27/07/2008'. Numeric

formats should be used with care, particularly where the application may be used in
more than one country. Depending upon the local settings for the SQL Server, the
format may be incorrectly interpreted. For example, '06/07/08' could be 6 July 2008,
7 June 2008 or 8 July 2006, depending upon the local interpretation. A safer option is
the ISO 8601 format. This specifies that dates be provided as YYYY-MM-DD, eg.
'2008-07-27' for 27 July 2008.
To comply with the ISO standard, the time portion of date and time information
should be supplied in 24-hour format with the hours, minutes and seconds separated
by colons, eg. '16:20:25'. If fractions of seconds are required, these can be included.
You may also decide to omit the seconds or the entire time portion if these are not
required.
To demonstrate the creation of date and time information try the following
statements, which will create complaints. Note the use of apostrophes to delimit the
dates.

INSERT INTO Complaints

(ComplaintReference, CustomerNumber, ComplaintTime, Complaint)

VALUES

(1, 23, '2008-07-23', 'Engineer did not arrive.')

INSERT INTO Complaints

(ComplaintReference, CustomerNumber, ComplaintTime, Complaint)

VALUES

(2, 19, '2008-07-26 14:02:25', 'Leak in radiator still present.')

UniqueIdentifier Data
Globally unique identifiers (GUIDs) are thirty-two digit hexadecimal numbers with
digits organised into five groups of eight, four, four, four and twelve digits
respectively. To insert a specific GUID into a uniqueidentifier column, these groups
must be separated using hyphens (-) and the entire value must be delimited using
apostrophes. For example:

INSERT INTO UsedParts

(JobId, PartNumber, UnitsUsed, CostPerUnit)

VALUES

('12345678-1234-1234-1234-123456789ABC', '22COPIPE', 3, 2.97)

You can also generate GUIDs automatically when you wish to create new unique
values.
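The built-in NEWID() function returns a freshly generated GUID each time it is called, so it can be used in place of a literal value. A brief sketch (the part number and other values are purely illustrative):

INSERT INTO UsedParts
(JobId, PartNumber, UnitsUsed, CostPerUnit)
VALUES
(NEWID(), '15COPIPE', 2.5, 1.52)

Running SELECT NEWID() on its own is an easy way to see the format of the values produced.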
Binary Data
It is usual to insert binary data into a database using a graphical user interface tool or
application that generates the binary information and appropriate scripts in the
background. However, it is possible to provide binary data directly in T-SQL scripts in
SSMS. To do so, the binary is represented as a continuous set of two-digit
hexadecimal bytes. To indicate that this format is being used, the string of digits is
prefixed with "0x", as with the photograph column in the following example:

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(95, 'James Fields', 28.50, 0x1234567890ABCDEF)

Inserting Timestamp Data


If a table has a timestamp column it will be set automatically when a row is inserted
in the table. It will also be updated automatically whenever the row is modified. The
data in this type of column is completely controlled by SQL Server and may not be
explicitly specified in an INSERT statement. If the column is included in the
command, the value applied must be null and will be replaced with a new timestamp
value.
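As a minimal sketch, assuming a hypothetical JobAudit table that includes a timestamp column named RowStamp (no JoBS table currently has one), the timestamp column is simply left out of the column list and SQL Server supplies the value:

CREATE TABLE JobAudit
(
JobReference INT NOT NULL,
Notes VARCHAR(100),
RowStamp TIMESTAMP
)

INSERT INTO JobAudit
(JobReference, Notes)
VALUES
(1001, 'Job booked')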

Inserting Null Data


When you wish to indicate that the information in a column is undefined, the column
value should be set to null. This is achieved using the NULL keyword, as follows. NB:
In this case the column would be NULL even if it were excluded from the INSERT
statement as it has no default value.

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(96, 'Rachel Moran', 28.50, NULL)

Inserting Apostrophes in Character Data


The final item to be considered in this section returns to character types. As the data
to be inserted is delimited using apostrophes, an apostrophe cannot simply be
included in the text being stored. Instead, the apostrophe to be added to the row
should be included twice in the statement. SQL Server automatically converts the
two apostrophes into a single character in the row. Try executing the following and
then reviewing the table's data to demonstrate this.

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(97, 'Jack O''Leary', 28.25, NULL)

Updating Data
Once you have information in your database you can modify it using the UPDATE
statement. This command requires that you specify the table to be modified
followed by a comma-separated list of the changes that you wish to make to
individual columns. Columns that are to remain unchanged are not included in the
list. The basic syntax of the statement is as follows:
UPDATE table-name
SET column1 = value1,
column2 = value2,
...,
columnN = valueN
Using this syntax, every row in the specified table is updated. For example, to
increase the hourly rate of every engineer by one pound, execute the following
command. Note the use of the column name in a calculation to the right of the
assignment operator (=). The new value for the column can be a literal value or can
be calculated.

UPDATE Engineers

SET HourlyRate = HourlyRate + 1

The WHERE Clause


Although the above syntax for updating information is useful, more often you will
want to update a single item or a set of rows that meet a given criteria. This is
achieved by adding a WHERE clause to the statement. The WHERE clause contains a
condition that is checked for every row in the table. If the condition is met, the row is
updated. If not, the row remains unchanged.
To demonstrate the WHERE clause, execute the update statement below. This
statement increases the hourly rate for all engineers that currently have a rate of
£29.50 or greater. If the engineer is paid less than £29.50, their rate remains the
same.

UPDATE Engineers

SET HourlyRate = HourlyRate + 0.25

WHERE HourlyRate >= 29.50

The WHERE clause will be considered in greater detail in the next article in this
series.

Deleting Data
The last T-SQL data manipulation command to consider is DELETE. This command
simply removes rows from a table. As with UPDATE, a DELETE statement can be
executed without a WHERE clause to empty a table, or with a WHERE clause to
delete rows that match specified criteria.
To demonstrate, we will delete all engineers with an hourly rate over £29.50:

DELETE FROM Engineers WHERE HourlyRate > 29.50

NB: Note the use of the "FROM" keyword. This is optional but some SQL developers
prefer to include it.

CHAPTER 6
BASIC SQL SERVER DATA MANIPULATION

In this chapter we will create, update and delete information in SQL Server
tables.

JoBS Database Scripts


In the first six instalments of this tutorial we have created a SQL Server
database named "JoBS". This database gives an example of the data storage
that may be required for a simple, real-world job processing application.
If you have not followed the previous articles in the tutorial, you can download
a script to create the tutorial database using the link at the top of this page.
Executing the "CreateJoBS.sql" script will create a new, empty database. If
you already have the database, you will not need to follow this step. However,
you may decide to do so if you have modified the JoBS database and wish to
refresh it.
Creating Data
As the JoBS tutorial database currently holds no data, the first task is to
generate some. As with previous SQL Server processes that have been
described in this tutorial, we can add data using SQL Server Management
Studio (SSMS). Furthermore, we can create information using either the
graphical interface tools or with Transact-SQL (T-SQL) commands.
SQL Server Management Studio Data Entry Tools
The simplest way to manually enter small amounts of information is to use the
SSMS data entry tools. To try this, open the "JoBS" database branch of the
Object Explorer tree in SSMS. Expand the "Tables" node to see the list of
tables that have been created. You can now select the table for which you
wish to add data rows.
Right-click the "Skills" table and choose "Open Table" from the menu. A grid
appears with one column for each column in the table. You can now simply
add and edit rows in the table by typing.
Add the rows shown below to the table to set up the basic skills that may be
assigned to engineers and jobs. As you tab across the table cells, notice that
entering invalid information, such as a SkillCode with more than three
characters, is not permitted. As you leave each row, the data is committed to
the database. If you make an error, you can simply modify the information to
update a data row or highlight the entire row and press the delete key to
remove it completely.

Once you have added all four rows, your grid should look similar to the image
below. Click the close button at the top-right of the table's window to close the
grid. You can also select "Close" from the File menu.

NB: The final row showing two NULL values appears in readiness for a fifth
data row to be added. This empty row is not stored in the table.
The INSERT Statement
When developing software, the above graphical user interface-based method
of creating information is useful for test data. However, in the software itself
you are far more likely to want to generate T-SQL commands that create
information in response to user actions. To do so, you can use the INSERT
statement.
The simplest form of the INSERT statement is used when adding a full row of
data to a table. The syntax for the command is as follows:
INSERT INTO table-name VALUES (value1, value2, ..., valueN)
The table-name specifies the table that you wish to add a row to. The data to
be created is then provided in a comma-separated list. The list of values must
be in the same order as the columns within the table so that information is not
added in the wrong columns.
We can use this syntax to add a row to the Parts table in the database. This
table has three columns to hold an alphanumeric part number, a description of
the part and the unit cost. Using a new query window in SSMS, run the
following command. Note the use of apostrophes around the data for the first
two columns. These delimiters are required for character-based information.

INSERT INTO Parts VALUES ('15COPIPE', '15mm Copper Pipe', 1.52)


If the command executes correctly, you will see a response that indicates that
one row was affected.
(1 row(s) affected)
You can see the new row in the table by opening the table in the data entry
mode described earlier. Alternatively, you can list the entire contents of the
table using a SELECT statement. Try executing the following to see the data
in the query window. NB: The SELECT statement is described in detail in the
next article in this series.

SELECT * FROM Parts

Using Named Columns


The simple form of inserting data using the INSERT statement can be
problematic. As the data being stored is not explicitly applied to a column, a
change to the structure of the table could leave existing statements invalid. If,
for example, a column is added or the column order is changed, statements
embedded in software would become incorrect and have to be changed. To
avoid this, the column names should also be included in the command.
INSERT INTO table-name
(column1, column2, ..., columnN)
VALUES
(value1, value2, ..., valueN)
This syntax is more resilient to change than the previous version. An
additional improvement is that not every column must be specified in the
statement. If a column name is excluded, a default value or a null value is
stored when creating the new row. If the column does not permit null values
and has no default value an error will occur.
Let's add another row to the Parts table using the improved syntax:

INSERT INTO Parts

(PartNumber, PartName, Cost)

VALUES

('22COPIPE', '22mm Copper Pipe', 2.97)

Inserting Different Data Types


As we have seen, when inserting character data the information must be provided
within apostrophes whereas numeric information does not require delimiters. Each
data type has different requirements, described below:
Date and Time Data
Date information can be provided in many formats, including versions with named
months, such as '27 July 2008' or numeric formats such as '27/07/2008'. Numeric
formats should be used with care, particularly where the application may be used in
more than one country. Depending upon the local settings for the SQL Server, the
format may be incorrectly interpreted. For example, '06/07/08' could be 6 July 2008, 7
June 2008 or 8 July 2006, depending upon the local interpretation. A safer option is
the ISO 8601 format. This specifies that dates be provided as YYYY-MM-DD, eg.
'2008-07-27' for 27 July 2008.

To comply with the ISO standard, the time portion of date and time information
should be supplied in 24-hour format with the hours, minutes and seconds separated
by colons, eg. '16:20:25'. If fractions of seconds are required, these can be included.
You may also decide to omit the seconds or the entire time portion if these are not
required.
To demonstrate the creation of date and time information try the following statements,
which will create complaints. Note the use of apostrophes to delimit the dates.

INSERT INTO Complaints

(ComplaintReference, CustomerNumber, ComplaintTime, Complaint)

VALUES

(1, 23, '2008-07-23', 'Engineer did not arrive.')

INSERT INTO Complaints

(ComplaintReference, CustomerNumber, ComplaintTime, Complaint)

VALUES

(2, 19, '2008-07-26 14:02:25', 'Leak in radiator still present.')

UniqueIdentifier Data
Globally unique identifiers (GUIDs) are thirty-two digit hexadecimal numbers with
digits organised into five groups of eight, four, four, four and twelve digits
respectively. To insert a specific GUID into a uniqueidentifier column, these groups
must be separated using hyphens (-) and the entire value must be delimited using
apostrophes. For example:

INSERT INTO UsedParts

(JobId, PartNumber, UnitsUsed, CostPerUnit)

VALUES

('12345678-1234-1234-1234-123456789ABC', '22COPIPE', 3, 2.97)


You can also generate GUIDs automatically when you wish to create new unique
values.
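The built-in NEWID() function returns a freshly generated GUID each time it is called, so it can be used in place of a literal value. A brief sketch (the part number and other values are purely illustrative):

INSERT INTO UsedParts
(JobId, PartNumber, UnitsUsed, CostPerUnit)
VALUES
(NEWID(), '15COPIPE', 2.5, 1.52)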
Binary Data
It is usual to insert binary data into a database using a graphical user interface tool or
application that generates the binary information and appropriate scripts in the

background. However, it is possible to provide binary data directly in T-SQL scripts
in SSMS. To do so, the binary is represented as a continuous set of two-digit
hexadecimal bytes. To indicate that this format is being used, the string of digits is
prefixed with "0x", as with the photograph column in the following example:

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(95, 'James Fields', 28.50, 0x1234567890ABCDEF)

Inserting Timestamp Data


If a table has a timestamp column it will be set automatically when a row is inserted in
the table. It will also be updated automatically whenever the row is modified. The data
in this type of column is completely controlled by SQL Server and may not be
explicitly specified in an INSERT statement. If the column is included in the
command, the value applied must be null and will be replaced with a new timestamp
value.
Inserting Null Data
When you wish to indicate that the information in a column is undefined, the column
value should be set to null. This is achieved using the NULL keyword, as follows.
NB: In this case the column would be NULL even if it were excluded from the INSERT
statement as it has no default value.

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(96, 'Rachel Moran', 28.50, NULL)

Inserting Apostrophes in Character Data


The final item to be considered in this section returns to character types. As the data to
be inserted is delimited using apostrophes, an apostrophe cannot simply be included in
the text being stored. Instead, the apostrophe to be added to the row should be
included twice in the statement. SQL Server automatically converts the two
apostrophes into a single character in the row. Try executing the following and then
reviewing the table's data to demonstrate this.

INSERT INTO Engineers

(EngineerId, EngineerName, HourlyRate, Photograph)

VALUES

(97, 'Jack O''Leary', 28.25, NULL)

Updating Data
Once you have information in your database you can modify it using the UPDATE
statement. This command requires that you specify the table to be modified followed
by a comma-separated list of the changes that you wish to make to individual
columns. Columns that are to remain unchanged are not included in the list. The basic
syntax of the statement is as follows:
UPDATE table-name
SET column1 = value1,
column2 = value2,
...,
columnN = valueN
Using this syntax, every row in the specified table is updated. For example, to
increase the hourly rate of every engineer by one pound, execute the following
command. Note the use of the column name in a calculation to the right of the
assignment operator (=). The new value for the column can be a literal value or can be
calculated.

UPDATE Engineers

SET HourlyRate = HourlyRate + 1

The WHERE Clause


Although the above syntax for updating information is useful, more often you will
want to update a single item or a set of rows that meet a given criteria. This is
achieved by adding a WHERE clause to the statement. The WHERE clause contains a
condition that is checked for every row in the table. If the condition is met, the row is
updated. If not, the row remains unchanged.
To demonstrate the WHERE clause, execute the update statement below. This
statement increases the hourly rate for all engineers that currently have a rate of
£29.50 or greater. If the engineer is paid less than £29.50, their rate remains the same.

UPDATE Engineers

SET HourlyRate = HourlyRate + 0.25

WHERE HourlyRate >= 29.50


The WHERE clause will be considered in greater detail in the next article in this
series.
Deleting Data
The last T-SQL data manipulation command to consider is DELETE. This command
simply removes rows from a table. As with UPDATE, a DELETE statement can be
executed without a WHERE clause to empty a table, or with a WHERE clause to
delete rows that match specified criteria.
To demonstrate, we will delete all engineers with an hourly rate over £29.50:

DELETE FROM Engineers WHERE HourlyRate > 29.50


NB: Note the use of the "FROM" keyword. This is optional but some SQL developers
prefer to include it.

CHAPTER 7:
Basic T-SQL Queries
This chapter describes the use of the SELECT statement of the structured query
language. This command allows the creation of database queries that return table
rows that meet specified criteria.

Querying Data
The power of a database management system such as SQL Server is the ability to
create both simple and complex queries. A query interrogates one or more tables,
finding all of the rows that match the specified criteria. The rows are gathered into a
set of results, formatted into rows and columns, which are returned to the process that
performed the query.
Querying of data tables in SQL Server is achieved using the SELECT statement. The
command appears deceptively simple, allowing basic queries to be generated very
quickly. However, its flexibility permits the creation of very complex queries that
combine information from many tables that match detailed criteria. It allows
calculations to be performed upon the data as it is gathered, including row-by-row
calculations and aggregation of information based upon groups of rows.
In this article we will concentrate on simple querying of a single table. More advanced
querying techniques that join the results of multiple tables, calculate aggregate
information and perform grouping will be described in later articles.
This article's examples require that you are using the JoBS tutorial database and that
the database is correctly populated with information. If you have not already created
this database, download the script from the link at the top of the page and use it to
create the database and sample data. If you have used the tutorial database, you should
drop the old version and replace it using the downloaded script to ensure that the data
is correct.
The SELECT Statement
The SELECT statement can include a variety of clauses that control the behaviour of
the command and the results that are returned. The most basic form of the statement
simply returns literal values that are not sourced from a table. These values are
provided after the command as a comma-separated list.
Try executing the following command in a SQL Server Management Studio query
window:

SELECT 1, 2, 3.4, 'Hello'


The results area shows a row of data containing the four values specified. The value of
such a statement may not immediately be apparent. However, there are various
reasons that you may wish to select a fixed row of data or include literal values within

other styles of query. For example, when combining the results of multiple queries
you may want to add a predefined row or a column categorising the results according
to the query from which they originated.
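For instance, a literal value can appear alongside table columns in the same list, labelling every returned row with a fixed category (the 'Part' label here is arbitrary, and the FROM clause is introduced in the next section):

SELECT 'Part', PartNumber, PartName FROM Parts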
Selecting Data From a Table
When selecting information from a table, the name of the table must be provided. This
is specified using the FROM clause. You must also provide details of which columns
to return from the database. If you wish to return data from every column in the table,
you can use an asterisk (*) as shown below. This query retrieves every column of each
of the twenty-five rows in the Engineers table.

SELECT * FROM Engineers


NB: Note that the results are returned as rows and columns, in a similar format to the
storage of information in a table.
Specifying a Column List
Although using the previous syntax to retrieve all of the columns for a query is useful
for testing and maintenance tasks, it is not best practice for production systems. The
main problem is that changes to the underlying structure of the table mean that the
results may change. If the software using the results assumes the position of the
columns, this could lead to incorrect data being used. A second issue is that the query
may return many columns that are not required by the calling function. In this case,
unnecessary network traffic is generated and the query's performance is impacted.
A more effective and efficient use of the SELECT statement is with the inclusion of a
comma-separated column list in place of the asterisk. In this case, only the columns
that are required are retrieved from the database and passed back to the function that
initiated the query.
The following statement retrieves only the unique identifier, name and hourly rate for
engineers. As the other column in the table holds binary data for a photograph of the
engineer, omitting it from the list when it is not needed could vastly reduce the
network traffic and time required to return the data.

SELECT EngineerId, EngineerName, HourlyRate FROM Engineers


When using a column list, the information does not need to be ordered in the same
manner as the underlying table. If you change the column list, the results are modified
accordingly.

SELECT HourlyRate, EngineerName, EngineerId FROM Engineers

The JoBS tutorial database does not include any columns that include spaces in their
names. If you are using a database with columns that have names that include spaces,
the name in the column list must be delimited using square brackets. For example:

SELECT [Engineer Name] FROM Engineers


NB: This notation can be used elsewhere in SQL Server where names include spaces.
The brackets can even be included where spaces are not present.
Using Column Aliases
Sometimes you will want to change the names of the columns in the set of results. To
do so, a column alias can be provided in the column list using the AS keyword. The
AS keyword appears after the real column name and before the alias. In the following
example, all three columns have aliases, one including a space.

SELECT

EngineerId AS Id,

EngineerName AS [Engineer Name],

HourlyRate AS Cost

FROM

Engineers

Limiting the Result Set Size


If a query will return a very large number of results, you may decide to limit the result
set size. By adding the TOP clause immediately after the SELECT keyword you can
specify the maximum number of rows to return. This is particularly useful when
working with ordered results as the sorting occurs before the result set is limited. The
sample below limits the number of engineers returned to ten.
NB: The parentheses around the number 10 are optional for backward compatibility.
Microsoft advises that these parentheses are included.

SELECT TOP (10) EngineerId, EngineerName, HourlyRate FROM Engineers


The row limitation can be expressed as a percentage, rather than a discrete value, by
adding the PERCENT clause. If the exact percentage is not an integer, the number of
rows will be rounded upwards. For example, in the following statement ten percent of
the results would be 2.5 rows. This is rounded and three rows are returned.

SELECT TOP (10) PERCENT EngineerId, EngineerName, HourlyRate FROM Engineers

Eliminating Duplicates
In all of the above examples, each use of the SELECT statement has resulted in the
return of every row. Had there been duplicate rows in the tables, you would have seen
duplicate rows in the results. We can demonstrate this by switching to a different
table. In this case, we will use the EngineerWorkingAreas table to find the list of areas
that are serviced by DisasterFix.

SELECT AreaCode FROM EngineerWorkingAreas


As the table simply links engineers to their working areas, the results include
duplicates. For example, "CEN" appears four times in the output as several engineers
work in this area. If you want to eliminate these duplicates from the results, you can
use the DISTINCT clause before the column list. The previous statement returned all
sixty-two rows of data; the modified version below only returns the forty-eight unique
rows.

SELECT DISTINCT AreaCode FROM EngineerWorkingAreas


The DISTINCT clause operates on the columns that are included in the column list. If
multiple columns are provided, the combination of data must be unique. When using
the clause to remove duplicates, null values are considered equal.
Using Expressions
As we have seen, literal values can be present in the column list of a query. In
addition, expressions can be included to perform calculations and various operations
using literal values and column names as their operands. A review of all of the
operators and functions available is beyond the scope of this article; some of the more
important functions will be described later in the tutorial. In this article we will review
some simple arithmetic and concatenation operators.
To perform a calculation using the arithmetic operators (+ - * /), the expression is
simply included in the column list. Usually you will specify an alias for the calculated
column, as no name will be provided by default. The statement below calculates the
total cost of the parts used on all jobs by multiplying the unit cost by the number of
parts used.

SELECT PartNumber, UnitsUsed, CostPerUnit, UnitsUsed * CostPerUnit AS ExtendedCost
FROM UsedParts
Another useful operation is the concatenation of character values. This is achieved
using the concatenation operator (+). The text to be combined can be sourced from

columns or can include literal strings surrounded by apostrophes, as in the next
sample, which combines the part names and numbers from the Parts table.

SELECT PartName + ' (' + PartNumber + ')' FROM Parts

Type Casting
Operations may require that the values involved be of specific types. If the column
values are not of the correct type, they can be changed using the CAST function. This
function has two parameters. The first is the value to be converted and the second is
the new type, including a size where appropriate.
A simple example of the function is when concatenating text. For this combination to
work correctly, the two operands must be character types. Concatenating numeric
values is not permitted unless they are first cast to fixed or variable length character
fields. The following statement generates a full parts list including a formatted price.
The price is held in a money column that must be cast before concatenation with the
part name.

SELECT PartName + ' (£' + CAST(Cost AS VARCHAR(7)) + ')' FROM Parts


NB: The current and desired types must be compatible in order for the cast operation
to be successful. A full matrix of the possible conversions can be found on the Cast
and Convert page of the MSDN web site.
Filtering Result Sets
The real power and flexibility of basic queries is provided by the ability to filter the
result set, only retrieving rows that match a given set of criteria. To filter the results,
one or more predicates must be provided in a WHERE clause. A predicate is an
expression that can be evaluated to true, false or unknown. When the combined value
of predicates is true for a row, the row is included in the returned results. If the
evaluation is false or unknown, the row is omitted from the results.
Comparison Operators
Predicates can be created using comparison operators that require two operands. The
six basic operators are as follows:

Operator True Condition


= The two operands are equal.
!= The two operands are not equal.
> The first operand is greater than the second.
>= The first operand is greater than or equal to the second.
< The first operand is less than the second.
<= The first operand is less than or equal to the second.

Let's try some of these operators using the Engineers table. Firstly, we can find all of
the engineers that have an hourly rate of exactly £20.00. There should be five such
engineers.

SELECT

EngineerId AS Id,

EngineerName AS [Engineer Name],

HourlyRate AS Cost

FROM

Engineers

WHERE

HourlyRate = 20

To find all of the engineers who earn £20.00 per hour or less, try this query:

SELECT

EngineerId AS Id,

EngineerName AS [Engineer Name],

HourlyRate AS Cost

FROM

Engineers

WHERE

HourlyRate <= 20

Finding Values in a Range


Another interesting method for building a predicate is using the BETWEEN clause.
This allows an inclusive range of values to be specified. When evaluated for a row, if
the value appears within the range then the row is included in the results. The
following statement retrieves all of the customers that were created within a specific
date range.

SELECT
    FirstName,
    LastName,
    CreatedDate
FROM
    Customers
WHERE
    CreatedDate BETWEEN '2007-12-08' AND '2008-01-15'


NB: When using dates without specifying the time element, the time is assumed to be
12:00am. This means that in the previous query, if a customer were created on 15
January 2008 at any time other than midnight the row would be excluded from the
results. To include such a row the end of the range should be '2008-01-15
23:59:59.997' or comparison operators should be used instead of BETWEEN.
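To illustrate the second suggestion, the following sketch replaces BETWEEN with a half-open range built from comparison operators. A customer created at any time on 15 January 2008 is included because the upper bound is the start of the following day:

SELECT
    FirstName,
    LastName,
    CreatedDate
FROM
    Customers
WHERE
    CreatedDate >= '2007-12-08'
AND
    CreatedDate < '2008-01-16'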
Comparing Values Against a List
Sometimes you will want to determine if a value matches an item from a list. This is
possible using the IN clause and a comma-separated list of values in parentheses. If
we want to retrieve a list of the IDs of engineers that service the Humberside or
Yorkshire areas, we can use the information in the EngineerWorkingAreas table and
the following query:

SELECT DISTINCT
    EngineerId
FROM
    EngineerWorkingAreas
WHERE
    AreaCode IN ('HUM', 'NYO', 'SYO', 'WYO')

Performing Wildcard Searches


When working with character columns it is possible to perform pattern-matching
queries using the LIKE keyword and a pattern containing wildcard characters and
escape sequences. The column or expression to be evaluated is positioned to the left
of the LIKE keyword and the pattern to be compared is placed to the right.

There are several wildcards that can be included in the pattern. The first is a
percentage sign (%), which indicates that the value can include any number of any
characters in place of the wildcard, including no characters at all. This is useful for
"begins with", "ends with" or "contains" queries. For example, executing the
following statement retrieves all customers with a last name that starts with the letter
"M".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    LastName LIKE 'M%'


The underscore (_) wildcard in a pattern is used to represent any single character. In
the following example, four underscores are used to return customers with a last name
that is exactly five characters long and ends with an "s".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    LastName LIKE '____s'


Sometimes checking for any character in the pattern is not desired. Instead, you may
wish to check that a particular character position contains a single character from a
list. In these cases, you can use a pair of square brackets containing each acceptable
character. When the predicate is evaluated it returns true only if the character at the
position exists within the brackets. The following statement demonstrates this by
finding customers with a first name of "Dita" or "Rita".

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    FirstName LIKE '[RD]ita'


The list of characters within the square brackets can be expressed as a range, rather
than a discrete set. This is achieved by specifying the start and end of the range with a
hyphen between the two extremes.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    FirstName LIKE '[D-R]ita'


The list or range of values can also be negated using a caret symbol (^). In this case,
the wildcard specifies that the character at the given position must not be within the
stated list or range. In the following example engineers are included in the result set
only if their name does not begin with any letter between "A" and "M".

SELECT
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    EngineerName LIKE '[^A-M]%'


One final use of the square bracket syntax is when you wish to find patterns that
include one of the wildcard characters. To search for a string that includes a
percentage symbol, for example, you would use a pattern such as "%[%]%". The outer
percentage signs are wildcards but the inner one is interpreted as a literal character.
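As a short illustration of this escaping technique, the sketch below assumes that you want to find customers whose business name contains a literal percentage sign; the bracketed wildcard in the middle of the pattern is treated as an ordinary character:

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName LIKE '%[%]%'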

The NOT Operator


The NOT operator is a useful keyword that negates the value of a predicate. This can
make it much simpler to express some conditions in a WHERE clause, allowing the
SELECT statement to be more readable. The operator is placed to the left of a
predicate to reverse its meaning. The predicate can, if preferred, be contained within
parentheses.
The statement below provides the opposite result to an earlier example. The predicate
alone would cause the command to return all of the engineers earning £20 or less per
hour. With the introduction of the NOT keyword, the statement returns only those
engineers earning more than this amount.

SELECT
    EngineerId AS Id,
    EngineerName AS [Engineer Name],
    HourlyRate AS Cost
FROM
    Engineers
WHERE
    NOT HourlyRate <= 20


The NOT keyword is sometimes included in an operator directly. An example of this
is with the IN keyword. This can be negated using the NOT IN variant, as shown
below.

SELECT DISTINCT
    EngineerId
FROM
    EngineerWorkingAreas
WHERE
    AreaCode NOT IN ('HUM', 'NYO', 'SYO', 'WYO')


I mentioned earlier that predicates can return one of three possible values, these being
true, false and unknown. When using NOT, an expression that evaluates to true
becomes false. A false value becomes true. We will shortly see predicates that return
unknown. It is important to understand that when the NOT operator is used with an
unknown value, the result remains unknown.
Checking for Null Values
When a column value or an expression evaluates as NULL, this indicates that the
value is undefined or empty. This is different from a value of zero or an empty string
as every null value should be considered to be a different value. Any comparison of a
value with NULL or of two NULL values returns a result of "unknown".
The result of this type of comparison has some effects that are unusual but that must
be carefully anticipated. We can query the Customers table to demonstrate the
potential pitfalls. This table contains twenty rows. Of these, only three represent
business customers, so the BusinessName column in the other seventeen rows
contains NULL.
If we execute a query that returns all of the rows where the business name is "Blue-
Green ltd", we find that only one row matches the predicate.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName = 'Blue-Green ltd'

Developers who are new to SQL Server may expect that changing this query to use
the "not equal" operator, as shown below, would reverse the results. They may expect
to see the other nineteen rows. However, as the comparison returns unknown for all
comparisons with NULL, only two rows are included in the result set. This means that
two apparently opposite queries do not return every row when combined.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName != 'Blue-Green ltd'


The same effect would be seen when using the NOT operator. If the predicate returns
unknown because of a comparison with NULL, negating the result still returns an
unknown value so the row is excluded from the results.
IS NULL and IS NOT NULL
If you wish to test if a column or expression is null, you can use the IS NULL
keyword, rather than another comparison operator. To retrieve all of the customers
that have no business name specified, execute the following statement:

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName IS NULL
The opposite of "IS NULL" is "IS NOT NULL". This evaluates to true only when the
column value does not contain a null value. Therefore, you can find all of the
customers that are associated with businesses by executing the following.

SELECT
    FirstName,
    LastName,
    BusinessName
FROM
    Customers
WHERE
    BusinessName IS NOT NULL

Using Multiple Criteria


So far in this article we have only considered WHERE clauses with a single predicate.
To expand the flexibility of the criteria in a query, you can add as many predicates as
required to obtain the desired results. To do so, you use the AND and the OR
operators.
AND Operator
The AND operator combines two predicates using a logical AND operation. However,
as the predicates in a query can be evaluated as true, false or unknown, the results are
not as simple as for pure Boolean logic. The table below shows the possible outcomes
of combining these values. Remember that rows are returned if the outcome is true
and omitted from the result set if false or unknown.
Operand 1 Operand 2 Result
true true true
true false false
false true false
false false false
true unknown unknown
false unknown false
unknown true unknown
unknown false false
unknown unknown unknown
Let's use the AND keyword to perform a SELECT that requires that two predicates
return true. In this case, we will obtain all of the rows in the Customers table where
the customer is not linked to a business and was created on or after 1 January 2008.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NULL
AND
    CreatedDate >= '2008-01-01'

OR Operator
The OR operator combines predicates using a logical OR operation. This allows you
to build a result set containing results that meet at least one of several criteria. Again,
consideration must be given to the effects of a predicate being unknown.
Operand 1 Operand 2 Result
true true true
true false true
false true true
false false false
true unknown true
false unknown unknown
unknown true true
unknown false unknown
unknown unknown unknown
We can modify the previous query to use the OR operator instead of AND. This time,
we will obtain all of the rows in the Customers table where the customer is not linked
to a business or where the customer was created on or after 1 January 2008.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NULL
OR
    CreatedDate >= '2008-01-01'

Combining AND and OR


Although the examples shown above only use two predicates in their criteria, you can
add as many predicates as required. This may include the use of AND and OR
operators in the same WHERE clause. In these cases, it is important to know the order
in which these operators will be considered. SQL Server provides a simple rule to
determine the order. Firstly, all of the AND operations are evaluated. Afterwards, the
OR operations are processed to obtain the final result. If you wish to modify this
order, you can include parentheses:

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NOT NULL
AND
    (FirstName LIKE 'A%' OR CreatedDate >= '2008-01-01')
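For comparison, removing the parentheses changes the meaning of the query. Because the AND operation is evaluated first, the version below returns business customers whose first name begins with "A", together with every customer, business or not, created on or after 1 January 2008:

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
WHERE
    BusinessName IS NOT NULL
AND
    FirstName LIKE 'A%'
OR
    CreatedDate >= '2008-01-01'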

Sorting Data
The final clause to be considered in this article is ORDER BY. When added to the end
of a SELECT statement, the clause specifies the columns that should be used to sort
the results of the query. The results may be sorted by a single column or by multiple
columns specified in a comma-separated list. For example, we can sort the customers
by business name and then last name using the following query. If the business names
match, the matching rows are sorted by last name. NB: Note that NULL values appear
first in the list.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    BusinessName,
    LastName
Adding the DESC keyword to one or more of the columns reverses the sort order.
This time, we will sort the customers by business name in descending order. For
duplicate business names, we will sort by last name in ascending order. NB: Note the
use of the ASC keyword for ascending order sorting. This is optional as ascending
order is the default.

SELECT
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    BusinessName DESC,
    LastName ASC

Using TOP and ORDER BY


A very useful technique is to combine the use of ORDER BY with the use of the TOP
clause. This allows the creation of queries such as the following, which retrieves the
newest nine customers based on the CreatedDate column value.

SELECT TOP 9
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    CreatedDate DESC
This does cause an interesting effect. In the case of the above query, the ninth and
tenth newest customers actually have the same CreatedDate value. In the query, SQL
Server arbitrarily selects one of these for inclusion in the results and discards the
other. This may be acceptable in some cases but often you will want to retrieve
additional results where such duplication exists. This is possible if you add the WITH
TIES clause to the TOP keyword. Now if there are duplicates that would normally be
discarded, they are included in the result set.

The final sample statement shows the WITH TIES clause. Although the top nine
results are requested, a duplicate means that the query returns ten rows.

SELECT TOP 9 WITH TIES
    FirstName,
    LastName,
    BusinessName,
    CreatedDate
FROM
    Customers
ORDER BY
    CreatedDate DESC

CHAPTER 8
SQL Server Primary Keys
This chapter explains the use and creation of primary keys. A primary key is a
column, or group of columns, within a database table that provides a unique
reference for each row added to that table.

What is a Primary Key?


A primary key provides a unique reference for every row in a table. A primary key is
defined in terms of a single column or a group of columns in the table. A primary key
with more than one column is known as a composite key. Once created, every row
within the table must have a different value in the column, or a different combination
of values across composite key columns.
A primary key is a special type of unique index or unique constraint. A unique index
indicates that a single column, or a combination of columns, in a table must contain
only unique values. You may create several unique constraints in a table to ensure that
different elements of the data have uniqueness. However, you may only define one
primary key for a table. Another important difference is that whilst a unique index
may contain nullable columns, a primary key cannot.
In addition to defining a unique reference for every row, a primary key adds a
clustered index to the table. This special type of index controls the physical ordering
of information within the table, so that the data is sorted according to the values in the
primary key columns. This vastly improves the performance of queries that select data
according to the primary key column values. NB: Although the primary key index is
clustered by default, the clustering can be removed and applied to another index. This
will be described in a later article in this tutorial.
An important use of primary keys in relational databases is for the creation of "one-to-
many" relationships between tables. In most cases, the primary key is used as the
unique reference that is specified in the linked table, by means of a foreign key.

Natural Keys
A natural key is a primary key that uses naturally unique information from the table in
its constituent columns. Many database developers prefer this type of key as the use of
naturally unique data makes reading the raw data in linked tables easier. In the Parts
table below, the PartNumber column provides a naturally unique value that is an ideal
candidate for a primary key.

PartNumber PartName UnitCost

15COPIPE 15mm Copper Pipe 1.52
22COPIPE 22mm Copper Pipe 2.97
10MICCOP 10mm Microbore Copper Tube 1.17
25STPTAP 25x25mm Stop Tap 5.99
This table defines a set of parts that engineers may use when performing jobs for
customers. With the part number being used as the primary key, a related table can be
defined that holds details of the stock that engineers have in their possession. In the
EngineerStock table below, the PartNumber column provides the link to the Parts
table and is easily readable by a developer who understands the data. For example, we
can see immediately that the engineer with an EngineerId of "1" has 18 units of 15mm
copper pipe.

EngineerId PartNumber UnitsHeld


1 15COPIPE 18
1 22COPIPE 15
2 22COPIPE 9
2 25STPTAP 1
Selection Problems
There can be problems when trying to select naturally unique values for a primary
key. These occur when the unique information in the table does not meet the
following guidelines and rules:

 Primary key data should be immutable. This means that once set, the information
should never be changed. Although it is possible to modify the information in a
primary key column, it is inadvisable. This is because the change would need to
be propagated to every linked table to maintain referential integrity. This usually
prevents the use of data such as a car's registration number as the primary key; if
a car's registration number is changed, the primary key information would need
to be updated.

 The natural key must be unique for every row in the table and must not include
nullable information. This would prevent the use of a person's name as a primary
key as names are not unique. It is also possible that a person's name changes
through marriage or a legal name change, breaking the first guideline.

 The information within the primary key columns should be as compact as
possible. If the column sizes in the primary key are large, the number of index
entries that can be held in memory simultaneously is reduced. Large keys cause
inefficiencies in index access and input/output, reducing the performance of your
database. In a flight school database, the combination of pilot reference number,
co-pilot reference number, aircraft registration and departure date and time may
be unique but could be considered too large for use in a primary key.

 SQL Server does not permit the use of large object columns in primary keys. If
you are holding binary file data in a database it may be unique but it will not be
possible to use it in a primary key.

Surrogate Keys
When no suitable natural key exists for a table, a surrogate key can be used instead. A
surrogate key is generally a single-column primary key containing a unique value.
Often the value is generated automatically. The use of such a key guarantees a unique
and immutable value for every row in the table. However, it can be more difficult to
read the information in a table when used in a foreign key.
If the previously described Parts table were updated to include an integer-based
surrogate key, the data may appear as follows:
PartsKey PartNumber PartName UnitCost
1 15COPIPE 15mm Copper Pipe 1.52
2 22COPIPE 22mm Copper Pipe 2.97
3 10MICCOP 10mm Microbore Copper Tube 1.17
4 25STPTAP 25x25mm Stop Tap 5.99
If the new surrogate key is used in relationship in the EngineerStock table, the stock
data is more difficult to read. To determine the number of units of 15mm copper pipe,
the data in both tables must be compared.
EngineerId PartsKey UnitsHeld
1 1 18
1 2 15
2 2 9
2 4 1
Surrogate keys add a small number of bytes to the storage requirement for every row
in the table in which they are defined. This can increase the overall size of the
database. However, if the size of the surrogate key is small, this size increase may be
offset by the decrease in size of related tables, when compared to the use of larger
natural keys.
Surrogate keys are often defined as integers or globally unique identifiers (GUIDs). In
the case of integers, the size of integer column selected is dependent upon the
anticipated number of rows that will be required in the table. The number usually
starts at one and increments by one for each row added to the table. This is ideal for
databases that are only used in on-line scenarios. If a database includes
synchronisation for offline use, a GUID may be more appropriate to reduce the risk of
duplicated values being created during disconnected use.
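As a hedged illustration only, and not necessarily how the JoBS script defines its keys, the two common styles of surrogate key column could be declared in T-SQL as follows:

-- integer surrogate key that starts at one and increments automatically
PartsKey INT IDENTITY(1,1) NOT NULL

-- GUID surrogate key with a generated default, better suited to offline synchronisation
PartsKey UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()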
Creating a Primary Key
Primary keys can be added to tables using various methods. In this article we will
concentrate on two options: adding a primary key using the SQL Server Management
Studio graphical user interface tools and creating a key using Transact-SQL
statements.
The examples below add primary keys to all of the tables in the JoBS database that we
have created over the course of the tutorial. If you have not followed the tutorial,
download the script using the link at the top of this page and use it to create the
database and some sample data. You should also use this script to recreate the
database if you have followed the tutorial so far, as is includes modifications to the
schema and data not described in the previous article.
Creating a Primary Key Using the SQL Server Management Studio
During development of a new database, SQL Server Management Studio (SSMS)
tools can be used to create primary keys. To demonstrate, we will add a primary key
to the Engineers table. To begin, open SSMS and expand the Object Explorer so that
you can see the Engineers table. Right-click the table name and choose "Design" to
open the table designer.
To add a single-column primary key, you simply select the column in question and
choose "Set Primary Key" from the Table Designer menu. You can also use the Set
Primary Key toolbar button. To try this, ensure that the EngineerId column is selected
by clicking the first row selection button. Use the menu option or toolbar button to set
the primary key. You will see a small key icon alongside the column definition.
NB: The EngineerId column provides a surrogate key as this information has been
added to give a unique reference.

If you mistakenly add a primary key, you can remove it using the same menu option
or toolbar button. You will notice that the description of the menu item, or the tool tip
for the button, is updated to show that they will remove the primary key. Once the key
is correct, save the changes to the table design and close the table designer window.
Now that the Engineers table has a primary key, it is impossible to create rows with
duplicates in the EngineerId column. To see what happens when you try, open a new
query window for the JoBS database and execute the following T-SQL statement:

INSERT INTO Engineers VALUES (1, 'Bill Jones', 19.85, null)
The primary key prevents the new row being created and reports an error. NB: Note
the default name of PK_Engineers that has been applied to the new primary key
constraint, which appears in the error message below.
Violation of PRIMARY KEY constraint 'PK_Engineers'. Cannot insert duplicate key in object 'dbo.Engineers'.
The statement has been terminated.
If you review the contents of the table you will see that the new row was not stored.
Creating a Composite Key Using the Table Designer
A composite key is a primary key containing more than one column. Once defined,
the combined value of all of the primary key columns must be unique but the
individual columns may include duplicates. This is the type of key that is required for
the EngineerStock table. In this table the EngineerId may be repeated, as may the
values in the PartNumber column. However, we want to restrict the data so that the
combination of these columns is always unique. This will ensure that there is only one
stock figure per engineer and part.
In the table designer for the EngineerStock table, select both the EngineerId and
PartNumber columns. To do this, select the EngineerId column first and then click the
row selector button for the PartNumber column whilst holding the Ctrl key. Once both
columns are highlighted, click the Set Primary Key toolbar button or menu item. A
key icon will appear alongside both columns to indicate that the primary key has been
set. You can now save the table and close the designer window.
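If you prefer not to use the designer, an equivalent composite key can be created with the ALTER TABLE syntax covered later in this chapter. The constraint name PK_EngineerStock below is an assumption that simply follows the naming pattern seen earlier:

ALTER TABLE EngineerStock ADD CONSTRAINT PK_EngineerStock PRIMARY KEY CLUSTERED (EngineerId, PartNumber)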

Creating a Primary Key Using Transact-SQL


Primary keys can be generated using T-SQL commands, either at the time of creation
of the table or at a later date. To create a constraint when creating a table, the
CONSTRAINT clause is used within the parentheses containing the column
specifications. To create a simple primary key with a clustered index, use the
following syntax:
CONSTRAINT constraint-name PRIMARY KEY CLUSTERED (column1, column2, ..., columnN)
The constraint-name provides a unique name for the primary key. This name must be
a unique name for the entire set of database objects rather than just being unique
amongst the primary keys. The comma-separated column list specifies which columns
make up the primary key.

NB: This is a simplified syntax for the creation of keys with default settings.
Additional settings related to the primary key's index exist and will be examined in a
later article describing indexes.
To use the above syntax within the creation of the Engineers table, you could execute
the following statement:

CREATE TABLE Engineers
(
    EngineerId INT NOT NULL,
    EngineerName VARCHAR(50) NOT NULL,
    HourlyRate MONEY NOT NULL,
    Photograph VARBINARY(MAX),
    CONSTRAINT PK_Engineers PRIMARY KEY CLUSTERED (EngineerId)
)
NB: The above statement will fail as the Engineers table already exists.
Altering an Existing Table
A primary key can be added to an existing table using the ALTER TABLE statement
with the ADD clause. The syntax for the constraint element is the same as for when
creating tables:
ALTER TABLE table-name ADD
CONSTRAINT constraint-name PRIMARY KEY CLUSTERED (column1, column2, ...,
columnN)
We can test this syntax by adding a primary key to the Parts table. This table has a
natural key in the unique part number. To set this column to be the primary key,
execute the following statement:

ALTER TABLE Parts ADD CONSTRAINT PK_Parts PRIMARY KEY CLUSTERED (PartNumber)

JoBS Database Primary Keys

The remainder of the primary keys can now be added to the JoBS tutorial database.
Use a combination of the graphical user interface tools and Transact-SQL to add the
following primary keys.
Table Column(s)
Contracts ContractNumber
Customers CustomerNumber
CustomerAddresses AddressId
CustomerComplaints ComplaintReference
EngineerSkills EngineerId, SkillCode
EngineerWorkingAreas EngineerId, AreaCode
GeographicalAreas AreaCode
Jobs JobId
RequiredSkills StandardJobId, SkillCode
Skills SkillCode
StandardJobs StandardJobId
UsedParts JobId, PartNumber
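For example, the composite key for the UsedParts table in the list above could be added with a statement along the following lines; the constraint name PK_UsedParts is an assumed choice that follows the earlier naming pattern:

ALTER TABLE UsedParts ADD CONSTRAINT PK_UsedParts PRIMARY KEY CLUSTERED (JobId, PartNumber)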

Chapter 9 

SQL Server Foreign Keys


What is a Foreign Key?
When working with a relational database that has been normalised there are, in all
except the simplest cases, multiple tables containing related information. As we have
seen in an earlier article, the normalisation process prevents data anomalies caused by
duplication. However, simply extracting information into multiple tables to avoid this
replication presents the potential of referential integrity problems.
A referential integrity problem occurs when two tables are logically linked but where
one table refers to data in another that does not exist, either because it was never
created or has been deleted. To enforce correct referential integrity, you can create
one or more foreign key relationships.
Each foreign key specifies a link between a parent table and a child table based upon
one or more compatible columns. The parent table's part of the relationship must be a
candidate key. This can either be its primary key or a unique constraint. The child
table's part of the relationship is known as the foreign key. Once created, the
relationship enforces the referential integrity of the link in two ways. Firstly, it
prevents rows in the parent table being deleted if they are referenced in the foreign
key of the child table. Secondly, non-null values cannot be stored in the child table's
foreign key columns unless matching values exist in the parent. Null values can be
held in a foreign key if the columns permit this.
In most cases, the parent and child tables of a foreign key relationship are separate
tables. However, it is quite acceptable to create a relationship where the candidate key
and the foreign key are within the same table. This is known as a self-referencing or
recursive key and is ideal for creating hierarchical relationships between entities of the
same type.
In SQL Server databases, you can create multiple foreign key relationships for the
same table. Each foreign key can reference the same candidate key in the same parent
table if desired, or you can link to several different tables from the same child table.
Although there is no enforced limit on foreign keys, Microsoft recommends that you
create no more than two hundred and fifty-three foreign keys for a single child table
and reference a single parent table from no more than two hundred and fifty-three
children.
Foreign key columns tend to be used heavily in queries, as they are often the linking
columns of joins. However, by default SQL Server will not create an index on the
foreign key columns of the child table. It is usually beneficial to manually add such an
index.
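As a sketch of this recommendation, an index on the foreign key column that will later be added to the Jobs table might be created as shown below. The index name IX_Jobs_EngineerId is an assumed convention rather than part of the tutorial database:

CREATE NONCLUSTERED INDEX IX_Jobs_EngineerId ON Jobs (EngineerId)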
Many-to-Many Relationships


 
Foreign keys create one-to-many relationships. These are so-named because one row
in the parent table can be linked to many rows in the child table. However, each entry
in the child table may only be linked to a single row in the parent. Another type of
relationship is the many-to-many link. In this scenario, many rows in either the parent
or child table may link to many rows in the other table.
Many-to-many relationships cannot be created directly in most relational database
management systems, including SQL Server. To create this type of link, a third table
is required. This table, known as a junction table, includes a foreign key to each of the
tables that you wish to link. The combination of two one-to-many relationships has
the effect of a single many-to-many link.
You can see how a many-to-many link operates below. In this case the Employees
table links to the Skills table via a junction table. This allows each employee to have
multiple skills and each skill to be assigned to any number of employees. In the
example, Chris Green is proficient in both C# and Visual Basic. The C# skill can be
provided by two employees: Arthur Brown and Chris Green. Note also that it is
possible for a row to have no related items, as in the case of the SQL Server skill.

Employees Table
EmployeeID Name
AB Arthur Brown
CG Chris Green
LB Louise Black

Junction Table
EmployeeID SkillID
AB CS
CG CS
CG VB
LB VB

Skills Table
SkillID Skill
CS C#
VB Visual Basic
SQL SQL Server
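A hypothetical definition of the junction table used in this example is sketched below. The table, column and constraint names and the data types are assumptions for illustration; they are not objects in the JoBS database:

CREATE TABLE EmployeeSkills
(
    EmployeeID VARCHAR(2) NOT NULL,
    SkillID VARCHAR(3) NOT NULL,
    -- composite primary key keeps each employee and skill pairing unique
    CONSTRAINT PK_EmployeeSkills PRIMARY KEY CLUSTERED (EmployeeID, SkillID),
    -- one foreign key to each of the tables being linked
    CONSTRAINT FK_EmployeeSkills_Employees FOREIGN KEY (EmployeeID) REFERENCES Employees (EmployeeID),
    CONSTRAINT FK_EmployeeSkills_Skills FOREIGN KEY (SkillID) REFERENCES Skills (SkillID)
)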
Cascading Updates and Deletes
If you try to update the values in the primary key or unique constraint that is used in
the parent table of a foreign key relationship, and the child table has rows that
reference the values being changed, the default response is an error. Similarly, if you
try to delete a row from the parent table that is referenced by one or more rows in the
child table, an error occurs. You can modify this behaviour with cascading updates
and deletes.
When a foreign key includes cascading updates, changes to the parent table's
candidate key do not automatically cause an error. Instead, all of the linked rows in
the child table are updated. The new values for the child row's foreign key columns
vary according to the cascading update setting:

 No Action. This is the default setting, which does not allow cascading updates.

 Cascade. When updates are set to cascade, a change in the candidate key of the
parent table causes all matching rows in the child table to be updated with the
same value, correctly retaining all links.


 
 Set NULL. With this setting, any change to a parent's key changes matching
foreign key values to NULL.

 Set Default. When using this option, changes to a parent's key causes matching
foreign key values to be changed to their default values.
Cascading deletes are similar to cascading updates. When configured, deleting a row
in the parent table causes matching rows in the child table to be deleted, or matching
foreign key columns to be set to NULL or default values. The actual action depends
upon the setting selected.
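In Transact-SQL these behaviours are selected with ON DELETE and ON UPDATE clauses on the foreign key definition. The sketch below is a generic illustration rather than one of the constraints created later in this chapter; the table and column names are placeholders:

ALTER TABLE ChildTable
ADD CONSTRAINT FK_ChildTable_ParentTable
FOREIGN KEY (ParentKey)
REFERENCES ParentTable (ParentKey)
ON DELETE SET NULL   -- each clause accepts NO ACTION (the default), CASCADE, SET NULL or SET DEFAULT
ON UPDATE CASCADE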

Adding a Foreign Key to a Database


In this article we will consider two methods of adding foreign keys to a database.
These will be using the SQL Server Management Studio (SSMS) and Transact-SQL
(T-SQL) commands.
We will add the foreign keys to tables in the JoBS tutorial database. This has been
created over the course of the tutorial so far. To prepare the tutorial database,
download and run the script via the link at the start of this page. This will create the
database and some sample data.
Adding a Foreign Key Using SQL Server Management Studio
Foreign keys can be created, edited and deleted using the graphical user interface of
SQL Server Management Studio (SSMS). We will start by adding a new relationship
between customers and their addresses. The CustomerAddresses table already
includes a column named CustomerNumber that defines which customer each address
belongs to. This column will be the foreign key that is related to the Customers table's
primary key. To begin, right-click the CustomerAddresses table in the Object
Explorer and choose "Design" from the menu that appears. The table designer will be
displayed.
To create a new foreign key, choose "Relationships..." from the Table Designer menu.
You can also click the Relationships toolbar button. The "Foreign Key Relationships"
dialog box will be displayed. Initially this will contain no foreign keys as none have
yet been created. To add the first, click the Add button.


 
To create the new foreign key relationship we will first specify the linking columns.
To do so, click the Tables and Columns Specification option. An ellipsis button will
appear in the property box. Click the button to display the Tables and Columns
selection dialog box.

The Tables and Columns dialog box is used to select the parent and child tables and
the candidate key and foreign key columns to be linked. The foreign key table cannot
be changed, as it is the table that is currently being edited. From the primary key table
drop-down box, select "Customers".
With the two tables selected, you must select one or more columns from the parent
table in the left column of the grid and then select the matching foreign key columns
in the right column. In this case, select CustomerNumber in both columns, as shown
in the figure above. Click the OK button to store the selections and return to the
previous dialog box.
The remaining options for the new foreign key relationship are as follows:

 Check Existing Data on Creation or Re-enabling. This option specifies


whether the new foreign key should be enforced for data that already exists within
the table. If set to No, any data that breaks the new referential integrity rule will
remain in the table. If set to Yes, the creation of the new foreign key will not be
permitted if conflicting data exists. You should leave this option set to Yes.

 Name. The Name property allows you to create a unique name for the foreign
key. Replace the provided name with "FK_CustomerAddresses_Customers".

 Description. The Description property allows you to add some comments to the
relationship. This property does not modify the behaviour of the relationship but
can be useful when other developers or database administrators are viewing the
foreign key definition.

 Enforce for Replication. This option determines whether the foreign key rules
should be enforced when changes to the data are made by SQL Server replication
processes. Replication is beyond the scope of this tutorial. Set this option to Yes.

 Enforce Foreign Key Constraint. This option enables or disables the


relationship rules for manual changes to the data. Set this option to Yes.

 INSERT and UPDATE Specification. This property contains two settings that

determine the cascade options when updating and deleting information in the
parent table. Set both of these options to "No Action".
To accept the settings for the new foreign key, click the Close button. Next, save the
table definition and close the table designer. You may be asked to confirm the
changes to the two tables. If so, click the Yes button to complete the save.
We can test the foreign key by attempting to create invalid data or by trying to delete
rows from the parent table that are referenced in the child table. As a first test, try
executing the following insert statement. This references a customer with a customer
number of 99. As there is no such customer, the INSERT fails.

INSERT INTO CustomerAddresses
    (CustomerNumber, Address1, TownOrCity, AreaCode, Postcode)
VALUES
    (99, '1 Short Street', 'Leeds', 'WYO', 'LS1 1SL')


As a second test, we can try to delete customer number one. This customer has a
related address, which would be orphaned, so the DELETE fails with an error.

DELETE FROM Customers WHERE CustomerNumber = 1

Adding a Foreign Key Using Transact-SQL


Foreign keys can be added using the T-SQL ALTER TABLE command with the
ADD CONSTRAINT clause. Two further items are required. The first is the
FOREIGN KEY clause followed by a comma-separated list of the child table's
foreign key columns in parentheses. The second additional item is a REFERENCES
clause, which specifies the parent table and its candidate key columns.
To create a relationship between the ContractNumber columns of the Jobs and Contracts tables,
execute the following statement:

ALTER TABLE Jobs
ADD CONSTRAINT FK_Jobs_Contracts
FOREIGN KEY (ContractNumber)
REFERENCES Contracts (ContractNumber)

JoBS Database Foreign Keys


The remainder of the foreign key constraints can now be added to the JoBS tutorial
database. Use either the graphical user interface tools or T-SQL to add the following
items.

Parent Table Parent Key Child Table Foreign Key


CustomerAddresses AddressId Contracts CustomerAddressId
Customers CustomerNumber CustomerComplaints CustomerNumber
Engineers EngineerId CustomerComplaints EngineerId
Engineers EngineerId EngineerSkills EngineerId
Engineers EngineerId EngineerStock EngineerId
Engineers EngineerId EngineerWorkingAreas EngineerId

Engineers EngineerId Jobs EngineerId
GeographicalAreas AreaCode CustomerAddresses AreaCode
GeographicalAreas AreaCode EngineerWorkingAreas AreaCode
Jobs JobId Jobs FollowUpJobId
Jobs JobId UsedParts JobId
Parts PartNumber EngineerStock PartNumber
Parts PartNumber UsedParts PartNumber
Skills SkillCode EngineerSkills SkillCode
Skills SkillCode RequiredSkills SkillCode
StandardJobs StandardJobId Jobs StandardJobId
StandardJobs StandardJobId RequiredSkills StandardJobId
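As one worked example from the list above, the self-referencing relationship on the Jobs table (linking FollowUpJobId back to JobId) could be added with the statement below; the constraint name FK_Jobs_Jobs is an assumed naming choice:

ALTER TABLE Jobs
ADD CONSTRAINT FK_Jobs_Jobs
FOREIGN KEY (FollowUpJobId)
REFERENCES Jobs (JobId)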

Chapter 10 

Transact-SQL Joins
Joins
In the previous two articles we have examined the process of normalising a database
and creating foreign key relationships between tables that are logically linked. When
you are working with a normalised relational database, it is essential that you can
perform queries that gather data from several linked tables but return the matching
data in a single set of results. This is achieved with the use of joins.
A join is a clause within a SELECT statement that specifies that the results will be
obtained from two tables. It usually includes a join predicate, which is a conditional
statement that determines exactly which rows from each of the tables will be joined by
the query. Usually the join will be based upon a foreign key relationship and will only
return combined results from the two tables when the key values in both tables match.
However, it is possible to join tables based upon non-key values or even perform a
cross join that has no predicate and returns all possible combinations of values from
the two tables.
Usually normalisation results in a database with many related tables. You may
therefore want to join more than two tables for a single set of results. In this case, you
can include multiple joins in a SELECT statement. The first join combines the first
two tables. The second join combines the results of the first join with a third table, and
so on.
JoBS Database
The examples in this article use the JoBS tutorial database. If you do not have a copy
of the JoBS database available, download the script using the link at the top of this
page. You can execute the script to create and populate a new database.
Performing Queries with Joins
Inner Joins
Possibly the most common form of join that you will use is the inner join. With this
type of join, the two tables are combined based upon a join predicate. Wherever a row
in one table matches a row in the other, the two rows are combined and added to the
outputted results. If a row in either table matches several in the other, each
combination will be included in the results. If a row in either table does not match any
in the other table, it will be excluded from the results altogether.
The join clause, second table name and join predicate are included in the SELECT
statement immediately after the name of the first table. The join uses the INNER
JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT
statement is as follows:

SELECT columns FROM table-1 INNER JOIN table-2 ON predicate
As an example, we can join the Jobs and Engineers tables. The following query
returns a list of jobs and the details of the engineer that performed the work. In this
case, the predicate specifies that the tables will be joined only where the EngineerId in
both tables is a match. These columns are used in a foreign key relationship definition
for the two tables, although this is not a requirement for the join to be executed.
However, if non-key columns are frequently used in join operations, you should
consider adding appropriate indexes to improve query performance.

SELECT * FROM Jobs INNER JOIN Engineers ON Jobs.EngineerId = Engineers.EngineerId
The query returns four results as each of the four jobs in the database has an engineer
associated with it. However, not all of the engineers have been assigned work so the
twenty-one that have not undertaken a job are not included in the results.
The example query returns every column from both tables. For larger tables, or when
using multiple joins, this may include a lot of data that you do not require. In addition
to making the results more difficult to read, it can also lengthen the processing time
for the query and increase the amount of network traffic generated. You should
therefore only return the columns that you require. This can be achieved by specifying
a column list as usual. However, if the two tables include columns with the same
name, the ambiguity can cause an error. Try executing the following query:

SELECT
    EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId

This query fails because the EngineerId column appears in both tables. You must
therefore specify which table's EngineerId column you require by prefixing it with the
table name and a full-stop (period) character. The following query resolves the
problem:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId
When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As
with the column selection, you must specify the table name for any columns with
ambiguous names. For example:

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs
INNER JOIN
    Engineers
ON
    Jobs.EngineerId = Engineers.EngineerId
WHERE
    Jobs.EngineerId = 4
OR
    Jobs.EngineerId = 8
ORDER BY
    Jobs.EngineerId

Implicit Inner Joins


A second syntax can be used for creating inner joins without providing a join clause.
This syntax is known as an implicit inner join. In this case, the names of the tables to
be joined are provided in a comma-delimited list. The join predicate becomes part of
the WHERE clause.

SELECT
    Jobs.EngineerId,
    JobId,
    VisitDate,
    EngineerName
FROM
    Jobs, Engineers
WHERE
    Jobs.EngineerId = Engineers.EngineerId
Generally speaking, the syntax that you use for inner joins is a matter of personal
preference or will be prescribed by your organisation's coding standards. Many people
prefer the explicit syntax as the join predicates are separated from those in the
WHERE clause and are kept closer to the names of the tables that they are acting
upon. It is also easier to change the type of join with the explicit syntax. It is useful to
understand both variations, as you are likely to encounter each syntax in real-world
scenarios.
Using Table Aliases
Providing the full name of the table as a prefix to column names can lead to long
query statements. This is especially true if, to make the query more easily understood,
you prefer to prefix every column name rather than just those that are ambiguous. In
such circumstances, it is useful to replace the table names with table aliases.
A table alias provides a short code, usually of one or two letters, that can be used in
place of a table name. Each alias is defined in the query after the full table name. In
the next example, the "J" alias represents the Jobs table and the Engineers table has
the alias "E". This makes the query much more readable.

SELECT
    J.EngineerId,
    J.JobId,
    J.VisitDate,
    E.EngineerName
FROM
    Jobs J
INNER JOIN
    Engineers E
ON
    J.EngineerId = E.EngineerId

Self-Referencing Joins
Table aliases are always required when creating self-referencing joins, where the two
tables that are being joined are actually the same table. In the following example, the
Jobs table is being joined to itself so that the initial job and follow up job can be
combined in a single row in the results. The initial job's table has an alias of "J",
whilst the follow up job uses the "JF" alias.

SELECT
    J.JobId,
    J.StandardJobId,
    J.EngineerId,
    JF.JobId AS FollowUpJobId,
    JF.StandardJobId AS FollowUpStandardJobId,
    JF.EngineerId AS FollowUpEngineerId
FROM
    Jobs J
INNER JOIN
    Jobs JF
ON
    J.FollowUpJobId = JF.JobId

Joining Multiple Tables


When you wish to join more than two tables, you can simply add further INNER
JOIN and ON clauses to the query. For example, in the JoBS database there is a
many-to-many link between Engineers and their Skills via a junction table named
"EngineerSkills". We can join the three tables to generate a list that contains all
engineers that have skills with one row for every engineer and skill combination. The
following query returns these results in order of skill name. Note that some engineers
appear more than once in the list because they have multiple skills.

SELECT
    E.EngineerName,
    E.HourlyRate,
    E.OvertimeRate,
    S.SkillName
FROM
    Engineers E
INNER JOIN
    EngineerSkills ES
ON
    E.EngineerId = ES.EngineerId
INNER JOIN
    Skills S
ON
    ES.SkillCode = S.SkillCode
ORDER BY
    S.SkillName

Outer Joins
Outer joins use a similar syntax to explicit inner joins but provide different results in
some cases. The key difference is that outer joins include all of the rows from one or
both tables, even if there are no matching rows defined by the join predicate. Where
information is missing, NULL values are substituted in the columns of the returned
results. The easiest way to explain the difference is with an example.
Firstly, let's consider an inner join. The query below joins the CustomerComplaints
table to the Engineers table so that each complaint can be shown alongside the
engineer that was at fault. Although there are three complaints in the database, only
one row is returned by the query. This is because the other two rows define
complaints that were deemed not to be the engineer's fault. In these cases the
EngineerId column in the CustomerComplaints table is set to NULL.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
INNER JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId

EngineerName Complaint
Joey Ohara   Engineer did not have appropriate parts and was rude.
This may not be the information that we require from the query. If we actually want to
list every complaint in the database, showing the engineer's name only where an
engineer is associated with the complaint, we must use an outer join. In this case, we
will use a left outer join. This indicates that all of the values from the table to the left
of the join clause will be returned. If there is no associated row from the table on the
right of the join, the columns from that table will be included but will contain NULL
values.
To demonstrate, execute the following command, noting that the join clause has been
modified to LEFT JOIN:

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
LEFT JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId
The results from this query contain all three complaints. For the two complaints not
associated with an engineer, the EngineerName column's value is NULL.

EngineerName Complaint
NULL Customer has received an incorrect charge on their direct debit account.
NULL Customer does not wish to receive direct marketing information.
Joey Ohara Engineer did not have appropriate parts and was rude.
The opposite of a left outer join is a right outer join. As you may imagine, this returns
all of the rows from the table on the right of the join clause with null values for
columns from the table to the left of the clause when no matching row exists. If you
alter the previous query to use a right join, all twenty-five engineers will be returned.
One of these engineers, Joey Ohara, will have a complaint listed, whilst all of the
others will have NULL in the Complaint column.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
RIGHT JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId
The third style of outer join is the full outer join. This type of join ensures that every
row from both tables is included in the final results. If any row in either table does not
have a matching partner, the row is populated with NULL values for the missing data.
Try modifying the query to use a full join as shown below. This time you will see
twenty-seven returned rows. One will be a complaint with a matching engineer, two
will be complaints without engineers and twenty-four will be for engineers with no
complaints made against them.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
FULL JOIN
    Engineers E
ON
    C.EngineerId = E.EngineerId

Cross Joins
Cross joins are the simplest form of join and possibly the least used. A cross join does
not define a join predicate. This means that the results contain every possible
combination from the two tables. This is the Cartesian product of the table so this type
of join is often called a Cartesian join.
If we modify the outer join query to provide a cross join, all combinations of data are
returned. As there are twenty-five rows in the Engineers table and three rows in the
CustomerComplaints table, the resultant list contains seventy-five rows.

SELECT
    E.EngineerName,
    C.Complaint
FROM
    CustomerComplaints C
CROSS JOIN
    Engineers E
NB: Cross joins should be used with care, particularly with large tables as the
number of results can grow very quickly.
Implicit Cross Joins
As with inner joins, cross joins have an implicit syntax variation. This is the same
syntax as for inner joins but with no join predicate in the WHERE clause:

SELECT E.EngineerName, C.Complaint FROM CustomerComplaints C, Engineers E
