Professional Documents
Culture Documents
DB Course
DB Course
Database
From the conferences
of Dr. Ivanne Khoury
Tronc Commun
SEM 4
The Student Committee is a group of students dedicated
to working for the student body. As the book committee,
our job is to provide for the students all the materials
they need in a specific semester, starting from the
reference books till the exams of the previous years with
their solutions, and most importantly the course notes of
each course, whether they are slides or handwritten by
the teacher of the course or typed notes based on last
year’s lectures. Every single part of the package is
reviewed yearly to add the latest exercises and their
solutions, the latest notions and tips and tricks to ace
the course. And of course, all of this is made in
collaboration with the doctors giving their lectures.
2. Each column contains only what is indicated in its header. For example, in a column with header
“Telephone number” we may not put two numbers or indication on the preferred calling time, such
as in the second row of this table:
Student number Name Surname
3. each row refers to a single object. For example, there may not be a row with information on
several
objects or on a group of objects, such as in the second row of this table: Name Surname Degree
course
4. rows are independent, i.e. no cell has references to other rows, such as in the second row of this
table:
Page 1
5. rows and columns are disordered, i.e. their order is not important. For example, these four tables
are the same one:
6. cells do not contain values which can be directly calculated from cells of the same row, such as in
the last column of this table:
Database rows are called records and database columns are called fields.
Single table databases can be easily handled by many programs and by human beings, even when
the tableis very long or with many fields. There are however situations in which a single table is not
an efficient way to handle the information.
1.2. Relations
1.2.1. Information redundancy
In some situations trying to put the information we need in a single table database causes a
duplication of identical data which can be called information redundancy. For example, if we add to
our students’ table the information on who is the reference secretary for each student, together
with other secretary’s information such as office telephone number, office room and timetables, we
get this table:
Page 2
Information redundancy is not a problem by itself, but:
storing several times the same information is a waste of computer space (hard disk and memory),
which for a very large table, has a bad impact on the size of the file and on the speed of every search
or sorting operation;
whenever we need to update a repeated information (e.g. the secretary changes office), we need
to
do a lot of changes;
manually inserting the same information several times can lead to typing (or copying&pasting)
mistakes, which decrease the quality of the database.
In order to avoid this situation, it is a common procedure to split the table into two distinct tables,
one for the students and another one for the secretaries. To each secretary we assign a unique code
and to each student we indicate the secretary’s code.
In this way the information on each secretary is written and stored only once and can be updated
very
easily. The price for this is that every time we need to know who is a student’s secretary we have to
look at its secretary code and find the corresponding code in the Secretaries table: this can be a long
and
frustrating procedure for a human being when the Secretaries table has many records, but is very
fast task for a computer program which is designed to quickly search through tables.
Page 3
1.2.2. Empty fields
Another typical problem which arises with single table databases is the case of many empty fields.
For
example, if we want to build an address book with the telephone numbers of all the people, we will
have somebody with no telephone numbers, many people with a few telephone numbers, and some
people with a lot of telephone numbers. Moreover, we must also take into consideration that new
numbers will probably be added in the future to anybody.
If we reserve a field for every telephone, the table looks like this:
As it is clear, if we reserve several fields for the telephone numbers, a lot of cells are empty. The
problems
of empty cells are:
an empty cell is a waste of computer space;
there is a fixed limit of fields which may be used. If a record needs another field (for example,
Elena
Burger gets another telephone number) the entire structure of the table must be changed;
since all these fields contain the same type of information, it is difficult to search whether an
information is present since it must be looked for in every field, including the cells which are empty.
In order to avoid this situation, we again split the table into two distinct tables, one for the people
and
another one for their telephone numbers. This time, however, we assign a unique code to each
person and we build the second table with combinations of person‐telephone.
Page 4
Even though it seems strange, each person’s code appears several times in the Telephones table.
This is
correct, since Telephones table uses the exact amount of records to avoid having empty cells: people
appear as many times as many telephones they have, and people with no telephone do not appear
at all. The drawback is that every time we want to get to know telephone numbers we have to go
through the entire Telephones table searching for the person’s code, but again this procedure is very
fast for an
appropriate computer program.
Page 5
1.3. One‐to‐many relation
A relation is a connection between a field of table A (which becomes a foreign key) and the primary
key of table B: on the B side the relation is “1”, meaning that for each record of table A there is one
and only one corresponding record of table B, while on the A side the relation is “many” (indicated
with the
mathematical symbol ∞) meaning that for each record of table B there can be none, one or more
corresponding records in table A.
For the example of section 1.2.1, the tables are indicated in this way, meaning that for each student
there is exactly one secretary and for each secretary there are many students. This relation is called
many‐to‐one relation.
For the example of section 1.2.2, the tables are instead indicated in this way, meaning that for each
person there can be none, one or several telephone numbers and for each number there is only one
corresponding owner. This relation is called one‐to‐many relation.
Clearly one‐to‐many and many‐to‐one are the same relation, the only difference being the order of
drawn tables.
It is however very important to correctly identify the “1” side, since it has several implications on the
correct working of the database. For example, in the previous example putting the “1” side on the
Telephones table means that for each person there is only one telephone and that for each
telephone
there are many people, a situation which is possible up to the 90s, when there was only one
telephone for a whole family used by all its components, but which is not what we want to describe
with the current 21st century’s database. Moreover, reversing the relation also need to change a little
the structure of the tables, putting the foreign key Telephone in the People table instead of the
foreign key Person in the Telephones table, such as
Page 6
1.4. One‐to‐one relation
A one‐to‐one relation is a direct connection between two primary keys. Each record of the first table
has
exactly one corresponding record in the second table and vice versa. An example can be countries
and
national flags. This relation can sometimes be useful to separate in two tables two conceptually
different objects with a lot of fields, but it should be avoided, since the two tables can be easily
joined together in a single table.
Page 7
1.5.1. Details table
Page 8
1.6. Foreign Key with several relations
Page 9
Page 10
1.7. Refrential Integrity
Page 11
1.8. Temporal versus static database
1.9.2. Process
Page 12
1.10. Entity‐relationship model
Page 13
1.10.1. Enhanced entity‐relationship model
Page 14
CHAPTER 2
SQL Server Programming Tutorial Database
1- The Tutorial
Throughout this tutorial we will create and use a database to demonstrate a
possible real-world application of the Microsoft SQL Server RDBMS. This database
will support a job booking system that permits the storage and processing of
engineer visits for "DisasterFix", a company that provides services to businesses
and homeowners who have experienced a gas, electricity or plumbing emergency
requiring urgent repair.
NB: DisasterFix is a fictitious company. Any resemblance to any real company is
completely coincidental.
Although a user interface will not be created, the entire data structure and some
business logic elements will be created within the new database. This could be
extended to provide a complete software package using Microsoft .NET
technologies or any other development environment that permits communication
with SQL Server databases.
The following sections describe the company requiring the database and the basic
requirements for their new software.
1-1 DisasterFix
DisasterFix are a fictitious company based in the United Kingdom. They provide
services to businesses and homeowners who have experienced an emergency
with a utility in their property. This includes gas, electricity, plumbing, heating,
etc. When an emergency occurs, DisasterFix sends a suitably skilled engineer to
examine the problem and perform any required repairs.
DisasterFix's customers pay for services in one of two ways. For new customers
with an immediate problem, jobs are charged individually. A set fee is applied
dependent upon the type of job to be undertaken. Customers may also purchase
an insurance policy from DisasterFix. This is an annual contract that allows any
number of jobs to be provided to an address with no additional charges.
DisasterFix does not employ its own engineers. Instead, any service-providing
company or individual may apply to become a DisasterFix partner. Following
suitable training, engineers are approved to provide services on behalf of
DisasterFix and will automatically be assigned work in their designated areas.
All approved engineers are provided with a stock of standard parts that they may
use for jobs assigned by DisasterFix. When a job is completed, the engineer sends
a report back to the DisasterFix head office indicating the parts used and the time
taken. The parts can then be replenished and the cost of the job calculated.
Page 15
In addition to emergency responses, DisasterFix provides non-emergency work.
Such jobs are handled in exactly the same way as emergencies except that they
are always charged individually and are given a lower priority.
The system will permit the creation of customers with multiple addresses.
It will be possible to create contracts for annual renewal or for single jobs.
Each contract will be linked to a customer address.
The system will hold a list of approved engineers and the skills that they
have. Skills include "gas", "electricity", "plumbing", etc. Each engineer will be
assigned one of more geographical areas in which they operate.
The system must integrate with the existing stock system that controls the
replenishment of the engineers' stock. This requires a daily export of used
parts to the external system.
The system must integrate with the existing billing system that controls
invoicing and payments.
Page 16
2- SQL Server Management Studio
The Microsoft SQL Server relational database management system executes on a
Windows-based computer as a group of Windows services. These background
processes run continuously and process any database activities that are required.
However, as these are services, they have no user interface that permits their
configuration.
SQL Server Management Studio (SSMS) is a tool that is distributed with SQL
Server. This powerful utility provides a graphical user interface that connects to
one or more local or remote SQL Server installations and allows the user to
perform configuration and design tasks. These include the creation of new
databases and their constituent parts, and the execution of queries and other SQL
statements.
NB: SSMS can be installed using the original SQL Server CD-ROM or DVD-ROM. If
you are using SQL Server Express Edition, SQL Server Management Studio Express
edition can be utilised instead. If this tool is not installed, you can download it
from the Microsoft SQL Server web site.
Using SSMS, databases can be created in two ways. Firstly, SSMS provides screens
that guide you through the process. Secondly, SSMS can execute commands to
create a database in a textual format using a query window. In this article, we will
use SSMS's graphical user interface tools to create a new database for the JoBS
project. We will look at the command-based approach later.
This article assumes that you have access to a SQL Server installation and have the
details and login credentials required connecting to it and creating databases.
Page 17
b. Connecting to a SQL Server Instance
The Connect to Server dialog box allows you to specify the details of the SQL
Server that you wish to connect to and the credentials that you wish to use to
authenticate with that server. The information that must be completed is as
follows:
Server Type. SQL Server includes various options for server types, including
Database Engine for core database functionality, Analysis Services for
analytical data warehouses, Reporting Services for reporting purposes and
Integration Services for integration work. Other options may also be
available depending upon your particular workstation configuration.
Server Name. This allows you to specify the name of a SQL Server to connect
to. Each SQL Server on your network will have one or more instances. An
instance is a named installation of the SQL Server product. If the server
instance you wish to connect to is the default instance, you simply provide
the server's name. If a specific, named instance is to be used, this is provided
in the format <server-name>\<instance-name>. For example, the figure
above shows connection to the VIRTUALXP server's default instance. An
alternative option may be VIRTUALXP\Instance2.
Page 18
To connect to SQL Server in readiness for creating the JoBS database, select the
"Database Engine" server type and provide the details of the server name, and
optional instance name, and your authentication details. Click the Connect button
to make the connection. Once connected, the SSMS main screen will be
populated.
Page 19
3- System Databases
On starting SSMS, you will find that some databases already exist. These will be
separated into two distinct areas that are accessible by expanding the Databases
section of the Object Explorer tree. Within this branch of the tree is a further,
expandable node named "System Databases". Databases created by the system
appear here whilst user-created databases are displayed at the higher level.
All SQL Server instances include four system databases. These are named master,
model, msdb and tempdb.
4- Creating a Database
Creating a new database is a relatively simple task. However, there are a number
of options that can be set during creation to alter the behaviour of the new
Page 20
database. In the remainder of this article we will create the JoBS database whilst
examining the most relevant configuration items.
NB: In this tutorial we will be creating a database named JoBS. In the event that
your SQL Server instance already has a JoBS database, select a different name and
use this new name in place of JoBS for the rest of the tutorial.
b. Database Name
The first page of the New Database dialog box allows the new database to be
named. In the simplest scenarios, this is actually the only piece of information
that must be provided. All other configuration options can be left at their default
settings.
Page 21
The name of a database can contain letters, numbers, spaces and many symbols.
This gives a great deal of flexibility to the name but can also lead to difficult-to-
read names and names that cannot be used without adding delimiters. It is usual,
therefore, to use only letters and numbers and no spaces. A good convention is to
use Pascal case when joining multiple words and capitalisation for abbreviations,
as in the example database.
To set the name of the database, type JoBS into the Database name text box.
c. Database Owner
All databases are assigned an owner. This is a user that has full rights to modify
and maintain the new database and may be any Windows account or SQL Server
login with the rights to create databases. For new databases, the Owner textbox
shows <default>. This indicates that the owner will be the current user, as used to
log in to the database. The owner can be changed by typing a new name or
searching for a suitable login using the ellipsis button.
For the JoBS database, use the default option.
d. Full-Text Indexing
SQL Server includes a facility known as full-text indexing. This allows the
construction of special indexes that permit searching of textual data. This can be
useful when holding large sections of text that need to be read by a search engine
similar to Google, for example.
In this tutorial we will not be examining full-text indexing so leave this checkbox
blank.
e. Database Files
All databases use two files by default. These are the primary data file and the log
file. The primary data file keeps track of all other data files and holds some or all
of the information that is stored in the database. The log file contains details of
transactions that have occurred against the database and allows those
transactions to be recovered in the event of a disaster.
Each file is listed in a table in the dialog box. The table shows seven columns:
Logical Name. This is the name of the file that SQL Server will use. By
default, the name is based upon the database name.
File Type. The file type is either "Data" or "Log", depending upon its purpose.
Filegroup. The name of the filegroup that the file will be created within. See
below for more information on filegroups.
Initial Size. SQL Server files do not grow continuously in the same manner as
files in applications such as those in Microsoft Office. Instead, the files are
created with a specific size and grow in increments when the available space
is exhausted. This provides improved performance, which is critical in
Page 22
database systems. The Initial Size column shows the starting size of each file
in megabytes.
Autogrowth. When a data or log file becomes full, the autogrowth settings
determine the action that is taken by the database engine. If autogrowth is
disabled, the database simply raises an error to indicate that the file is full.
When enabled, the data file can be set to increase by a specific size in
megabytes or by a percentage of the current file size. Each file may be set to
grow indefinitely or can be limited to a specific maximum size.
Path. This column shows the path to the folder in which the file will be
created.
File. The last column in the table shows the name of the database file. For
new databases, this appears blank.
Additional files can be added to a database. These may be secondary data files or
additional log files. When extra data files are added, the information held in the
database is distributed amongst all of the available files. This allows files to be
spread over a series of disks for maximum performance.
For the JoBS database, use the default settings for the data and log files.
f. Filegroup Options
Filegroups are simply administrative containers for database files. Every data file
created is allocated to a single filegroup. In many databases, the default
"PRIMARY" filegroup is the only one in use.
Filegroups are useful because individual tables and indexes within a database can
be allocated to a particular filegroup. Using this facility, the disk location of groups
of tables can be determined by the database administrator (DBA). This allows
configurations where highly utilised data can be placed in a group of files that is
using very fast disks whilst archive data is held in a filegroup using slower
hardware.
New filegroups can be added to a database by clicking on the Filegroups option at
the left of the dialog box. Additional filegroups can be created by clicking the Add
button and providing a name. One filegroup can be marked as the default. This is
the one that will be used by new tables and indexes where a filegroup is not
specified.
For the JoBS database, we will use the single, default filegroup.
g. Database Options
The final set of database options can be viewed and modified by clicking
"Options" at the left of the dialog box. This page shows various general
configuration options that may be applied to the new database.
h. Collation
Page 23
The collation setting determines how information in the database is compared
and sorted by SQL Server. This is important as different languages and cultures
use different rules for comparing and sorting text. The collation also determines
whether capital letters and lower-case letters are considered to be equivalent.
When installing SQL Server, a default collation is required. If a collation is not
specified for each new database, the default collation is used instead.
For the JoBS database, use the default collation for the SQL Server instance.
NB: The JoBS database in the tutorial examples uses the Latin1_General_CI_AS
collation. If your collation is set differently, the results of comparisons and sort
operations may vary from those shown.
i. Recovery Model
SQL Server provides three recovery models that determine how a database's log
file is used. Each model has advantages and disadvantages, as explained below:
Full. The full recovery model is used to minimise the chance of data loss in a
disaster situation. In this model, all transactions are logged to the log file in
detail. The log file must be backed up along with all data files. In a recovery
situation, the data files and log files can be restored to a specific point in
time and usually no data is lost. It is possible that the log file can be
damaged in which case operations since the last log file backup must be re-
run. This model can lead to very large log files if they are not backed up and
truncated often. The full recovery model is useful for business-critical
systems.
Simple. In the simple recovery model the log file is automatically cleared by
the system to keep it small. There is no requirement, or benefit, to backing
up the log file. Instead, only the data files are backed up. This limits recovery
operations to restoring to the point at which the database was last backed
up. The simple model is useful for development and test databases, or for
non-critical live applications.
Bulk Logged. The bulk-logged model is similar to the full model except that
most bulk operations are logged in minimal detail. The lowers the size of the
log files at the cost of not being able to recover to a specific point in time.
The transaction logs must be backed up when using this option. The bulk-
logged model is useful for small periods of bulk transactions. Following these
periods, the database can be backed up and switched back to the full model.
As the JoBS database is to be used for example purposes only, choose the simple
recovery model.
j. Compatibility Level
The Management Studio can be used to create databases for earlier versions of
SQL Server. If an earlier version is to be targeted, an alternative option can be
Page 24
selected from the Compatibility Level drop-down list. Choosing an earlier version
will mean that less functionality is available.
For the JoBS database, ensure that the SQL Server 2005 option is selected.
k. Other Options
The final area of the New Database dialog box contains a set of further
configuration options that can be adjusted as required. These are beyond the
scope of this article and should be left at their default settings for the JoBS
database.
l. Creating the New Database
Now that the entire configuration for the new database is complete, it can be
generated. Click the OK button to complete the process. You will notice that the
status at the bottom-left of the dialog box changes briefly to "Executing" whilst
the model database is duplicated and the configuration settings are applied. Once
completed, the dialog box will close and the Object Browser will have a new tree
node named "JoBS" beneath the Databases branch.
Congratulations, you have now created the new database!
Page 25
Page 26
CHAPTER 3
Creating SQL Server Databases Part 2
1- Transact-SQL
As we have seen in the previous instalment of this tutorial, the SQL Server
Management Studio (SSMS) graphical user interface can be used to create and
configure new databases. This use of SSMS is ideal for development scenarios
where full access to the server is available. However, in many real-life situations
you may have no direct access to a SQL Server. For example, with off-the-shelf
software you may have to rely upon executing commands from your script or
program against a SQL Server installation that you have never seen.
To allow commands to be executed against a SQL Server instance or database, the
Transact-SQL (T-SQL) language is used. This is Microsoft's variant of the structured
query language (SQL). It contains textual commands, or statements, that are used
to create databases and their constituent parts, to query tables and views and to
manipulate data. All tasks that can be undertaken with SQL Server are controlled
using T-SQL.
NB: When creating a database using a graphical user interface such as SSMS, the
selections made are converted into an appropriate T-SQL statement for execution.
In many cases, you will find a "Script" button on the screen that allows you to view
the underlying T-SQL code.
Page 27
Windows in SSMS appear in a tabbed layout by default, allowing you to click
between tabs to show the other open windows. The tab at the top of the query
window shows the name of the current SQL server and the name of the database
that has been selected automatically. The database chosen will vary according to
your security privileges.
NB: It is very important to ensure that the database selected is the one that you
want to execute commands against. If the database is incorrect, you can quickly
change it using the drop-down list of databases in the toolbar.
To execute the command, ensure that no text is selected and then choose
Execute from the Query menu. You can also click the Execute button or press the
F5 key to run a command. On completion, the query window will update to show
the output of the print statement.
PRINT 'Hello'
PRINT 'world'
c. Executing a Selection
In the last two examples no selection was made when the Execute command was
given. When no text is selected, the entire content of the query window forms the
script to be run. One very useful aspect of SSMS is the ability to select one or
more lines, or even parts of lines, from a query window and only run that part.
You can try this by selecting either of the two print statements and pressing F5.
This is advantageous when you have multiple statements in a single window but
want to run them individually.
Page 28
3- The CREATE DATABASE Command
Now that we can run statements, we can investigate the CREATE DATABASE
command. This command allows a new database to be generated. In its simplest
form, only a name for the new database is required with all settings and file
locations automatically using their defaults.
To create a new database for the JoBS example, try executing the following
statement. After execution, right-click the "Databases" branch of the object
explorer tree and choose "Refresh" from the context-sensitive menu that appears
to see the new database in the list.
NB: In this tutorial we will be creating a database named JoBS. In the event that
your SQL Server instance already has a JoBS database, select a different name and
use this new name in place of JoBS for the entire tutorial.
If you have been following the tutorial examples so far, the above command will
fail with an error explaining that the database already exists. A database can be
deleted, or dropped, using the DROP DATABASE command. To drop the JoBS
database, execute the following statement.
NB: If you are using a different name than "JoBS", replace this in the drop
statement. Check this carefully to ensure that you do not delete the wrong
database!
You may receive an error when trying to drop a database if the database is in use.
Any connection to the database, including having the database selected as
current for any open query window, will prevent it from being deleted.
(NAME='DatabaseFile', FILENAME='C:\NewDatabase.mdf')
Page 29
To provide names and file details for both the data file and the log file for a new
database, two file specifications must be provided. The following sample shows
this with the first file specification provided for the primary data file and the
second for the log file. Executing this command will create both files in the root of
C: and use non-default naming.
As described in the first article regarding database creation, multiple files can be
created for both the data and log files. The information in the database or log is
then spread amongst the available files. To add more files for either the data or
the log, additional file specifications are included in the CREATE DATABASE
command, each separated by a comma. The following example demonstrates this
by using two data files and two logs files. The files are located on different disks
for a performance improvement.
(NAME='JoBSData2', FILENAME='d:\JoBS2.mdf')
(NAME='JoBSLog2', FILENAME='f:\JoBS2.ldf')
The data files used for a new database may be arranged in filegroups. These are
administrative containers for files that can aid performance as individual tables
and indexes can be assigned to a filegroup, and therefore can be targeted at a
specific disk drive or set of drives. A filegroup can be specified within the comma-
separated of data files with subsequent files in the list linked to that group.
Filegroups may not be specified for log files.
FILEGROUP SECONDARY
(NAME='JoBSData2', FILENAME='c:\JoBS2.mdf')
Page 30
When a new table or index is added to a database, the filegroup that it will be
stored in is set to the default filegroup unless otherwise specified. The default
filegroup is usually the primary group. However, this can be changed by adding
the DEFAULT keyword when creating a filegroup, as shown below.
(NAME='JoBSData2', FILENAME='c:\JoBS2.mdf')
A maximum file size may be set within a file specification. This uses a similar
syntax to assign a value to the MAXSIZE option. If the value is set to UNLIMITED,
the file will grow as required until the disk is full.
Finally, the growth options can be changed for each file specification individually.
The FILEGROWTH option determines the automatic size increase of a file when it
becomes full. The value can be set as a specific increment using a suffix of KB, MB,
Page 31
GB or TB. Alternatively, the file can be configured to increase by a percentage
value when required. The following example shows a percentage increment for
the data file and a specific value for the log file.
ON PRIMARY
LOG ON
6- Setting a Collation
The final element of the CREATE DATABASE command that we will consider in this
article is the ability to set the collation. This is achieved using the COLLATE clause.
This clause is appended to the command with the name of a valid SQL collation or
Windows collation. The selected collation name determines how data is ordered
and compared. The following example uses the "Latin1_General_CI_AS" collation,
which is case-insensitive and accent-sensitive.
COLLATE Latin1_General_CI_AS
Page 32
CHAPTER 4
Creating SQL Server Tables Part 1
1- What is a Table?
The table is the most important part of a relational database. Tables are the
storage location for all information within a database. Each table holds a single
set of structured information. For example, you may have a database that stores
customers, order headers and order lines. Three tables could be used in this
situation, one for each key entity.
2- Types of Data
All columns have a data type. The data type enforces a rule that restricts the kinds
of information that can be held in the table. Generally speaking, the data types
can be classified into five groups: numeric, character, date and time, large objects
and other miscellaneous types. The various data types are described in the
following sections.
When choosing the type of data for numeric columns, the type that uses the
minimum amount of storage space, whilst still allowing for the full range of
possible values, should be selected. If in the future the requirements for a column
change, it is possible to modify a column's data type to allow for a larger range.
Page 33
a. Bit Data Type
The bit data type holds a Boolean value. Though technically this may not be
considered a numeric data type, it is included in this section as often the values it
contains are referred to as either zero or one.
Page 34
1-9 5
10 - 19 9
20 - 28 13
29 - 38 17
g. Numeric Data Type
The numeric data type is functionally equivalent to the decimal data type.
Page 35
Character data types are used to store textual information. Depending upon the
type selected, the data is of fixed or variable length. Fixed length columns hold a
specific number of characters. If the string stored in the column is larger than that
available, the text is truncated. If the text is shorter than the fixed size, the data is
padded. Variable length data types have a maximum size defined and truncate
strings that are oversized.
Character data types can be used to store either Unicode or non-Unicode
information. Unicode strings use two bytes per character and allow letters,
numbers and symbols from many international languages to be stored. Non-
Unicode data types cannot store multilingual character sets but generally only use
a single byte per character.
The selection of textual data type is not at simple as choosing a numeric type.
Firstly, consider whether the data to be stored will contain information in multiple
languages. If it will, select a Unicode type. If not, select the smaller non-Unicode
equivalent. If the size of the information stored in a character-based column will
generally be approximately the same size for all rows in a table, select a fixed size.
If the text will vary in length considerably, select a variable size column type.
Page 36
is double the length of the data actually stored. The maximum size may be
between one and four thousand characters.
b. Timestamp
A timestamp column simply holds an eight-byte number that is unique to the
database. Timestamps change each time their containing row in the database is
modified. This makes timestamps ideal for determining a version number for a
row and determining if a row has changed between two points in time. It is
important to understand that, despite the name, timestamps do not store a date
or time.
Page 37
The binary data type is used to hold fixed length binary-formatted data. This is
useful for storing information that must be encrypted, such as passwords. The
size of binary columns is specified as between one and eight thousand bytes.
Page 38
d. NText Data Type
The ntext data type is the Unicode equivalent of text. For new database, use
nvarchar(MAX) instead of ntext.
Nullable Columns
In addition to the type of data held in a column, all table columns can be marked
as either nullable or not nullable. A column that is not nullable requires that a
value is present in every row. Nullable columns can be assigned the special value
of NULL. A null value in a row indicates that the value has not been set.
The null value gives an additional option to the range of values that a column can
contain. This is beneficial when all of the possible values that a data type gives
could potentially be used in a column but an additional, "undefined", option is
required. For example, consider a database that holds the resultant data from a
survey where interviewees answer yes or no to a series of questions. The answers
would be stored in nullable bit columns with a 0 indicating that the user answered
"no", a 1 for "false" and null to indicate that the question remains to be
answered.
3- Creating Tables
In this tutorial we will consider two methods for creating database tables. The
first is using the graphical user interface aspects of SQL Server Management
Page 39
Studio and is examined in this article. The second, using Transact-SQL (T-SQL)
commands and scripts will be described in the next tutorial instalment. We will
use these technologies to create tables within the sample JoBS database. If you
have not done so already, create the empty JoBS database using the techniques
described in the earlier article, Creating SQL Server Databases Part 1.
To create the new table, expand the tree in the Object Explorer so that the Tables
branch of the JoBS database is visible. Right-click this branch and select "New
Table..." from the context-sensitive menu that appears. A new design window will
be opened in the main area of SSMS and the Properties area will update to allow
modification of the new table's options. At this point, the new table has not yet
been created in the database.
Page 40
There are several properties that can be set for a new table. The key properties
that are of interest to this article are as follows:
Name. The name of the table. A new name for the table can be entered here
to replace the default name. If the name is not changed, a new name will be
requested when the table is saved.
Schema. Tables may be grouped into schemas using this option. Schemas
are beyond the scope of this article.
Regular Data Space Specification. This setting can be expanded and, using
the options within, you can set the filegroup for storage of the table's row
data. This is the storage location for normal column data, excluding large
objects.
Adding Columns
The Customers table will be used to hold various elements of data for each
customer row. These are as follows:
Page 41
will be named "CustomerNumber" to match the Pascal case standard. Note
that the space has been removed from between the words in the column
name. This is not strictly necessary but makes the column easier to work
with.
Contact Name. All customers will require a name. For business customers
this will be the primary contact for the business. For homeowners this will be
the name of the person who purchased a service or contract from
DisasterFix. The name will be held in two columns named "FirstName" and
"LastName". As these will be quite variable in size we will use a varchar
column for each. The maximum size for each will be 25 characters.
Business Name. For business customers, the name of the company will be
held. This will be stored as a variable length character string with a
maximum size of one hundred bytes. As homeowners will not have a
business name, the "BusinessName" column will be nullable.
Creation Date. The final column will hold the date of creation of the
customer. This will be used for auditing purposes with only the date being
required. We will use a non-nullable, smalldatetime column named
"CreationDate".
To create the table's columns, you use the grid in the main area of SSMS. The first
column in the grid determines the name of the new column. The second column
in the grid permits the selection of the data type for the new table column. Where
the data type requires a size to be specified, this is appended to the data type
name within parentheses (). The last column shows a checkbox for each table
column. If checked, the new column will allow null values to be stored. If not
checked, a value will always be required.
Add the five column definitions to the table designer grid. The final result should
look similar to the diagram below. If you make an error whilst adding the
columns, you can modify values after clicking on them. You can also insert and
delete items by right-clicking the small square at the left of the grid row and
choosing the appropriate option from the context-sensitive menu.
Once your design matches that shown, the table must be saved. To do so, select
"Save Customers" from the File menu, click the Save button on the toolbar or
Page 42
press Ctrl-S. The table will be committed to the database and will appear beneath
the Tables branch of the Object Explorer. NB: If the table does not appear, right-
click the Tables branch and select "Refresh".
The table designer may now be closed by clicking the X at the top-right of the
tabbed window. If you need to edit the table again at a later date, the designer
can be reopened by right-clicking the table name in the Object Explorer and
selecting the "Design" option.
Congratulations, you have now created the first JoBS database table!
Limitations
There are some limitations that you should be aware of when creating database
tables. The first of these is that there is a limit of two billion tables permitted per
database. However, a database containing such a large number of tables would
be unwieldy to say the least!
A single table may contain up to 1,024 columns. The total size of these columns
must not exceed 8,060 bytes, excluding large object data. This is a limitation on
the size of the actual information held in a row. It is, therefore, possible to create
a table containing variable length columns with a maximum size greater than
8,060 bytes. However, if you try to insert a row with combined data exceeding
this limit you will receive an error.
Page 43
Page 44
CHAPTER 5
Creating SQL Server Tables Part 2
Command Syntax
The simplest way to use the CREATE TABLE command is to provide a table name
and a comma-separated list of column specifications. This is shown in the syntax
below:
CREATE TABLE table-name
(
column-specification1,
column-specification2,
...
column-specificationN
)
Each column specification describes a column that will be created within the
table. Here we can provide a name for the column and specify the column type
and size to use. We can also determine if the column should permit the storage of
null values by appending either "NULL" or "NOT NULL" to the specification. These
three items are separated by space characters, making the syntax for a column
specification as follows:
column-name data-type [NULL | NOT NULL]
NB: The use of "NULL" is optional. If omitted, a column will be nullable by default.
Page 45
you have not already done so, start SQL Server Management Studio and open a
new query window.
We need to ensure that we are working against the correct database before
executing any of the commands below. In an earlier article I explained how to
change the active database using the drop-down menu in SSMS. You can also
change databases using T-SQL with the USE command. If your query window is
not showing JoBS as its active database, run the following script. You should see
the title of the query window and the database selection drop-down list update
accordingly. NB: If your tutorial database is not named "JoBS", substitute the
correct name in the USE command.
USE JoBS
We can now create our new table. This table will hold a list of all of the
geographical areas that are serviced by DisasterFix engineers. Each row will have
a two-letter area code and name of the geographical area, with both columns
being mandatory. Later in the tutorial we will use this information to link a
customer's address to a specific area so that customers can be matched to
appropriate engineers. Let's create the table using the script below:
After executing the statement, right-click the "Tables" branch in the Object
Explorer and choose "Refresh". The new table should now be visible. If you right-
click the table name, you can select "Design" to see the table's structure in the
designer, as you did when creating the Customers table in the previous article in
this tutorial.
Page 46
The JoBS database includes only a single filegroup. However, if it had a second
filegroup named "METADATA", we could target the GeographicalAreas table at
this location by modifying the T-SQL command as follows:
ON METADATA
EngineerId INT,
ComplaintTime DATETIME,
Page 47
If the JoBS database contained multiple filegroups, we may decide that the
information would generally be stored in the filegroup named CORE whilst the
large object data would be held in a separate filegroup named LARGE. You have
already seen how to specify the location of the regular data. To set the large
object filegroup, you use the TEXTIMAGE_ON command as shown below:
EngineerId INT,
ComplaintTime DATETIME,
ON CORE
TEXTIMAGE_ON LARGE
CustomerAddresses Table
The CustomerAddresses table holds details of addresses. By holding the addresses
in a separate table to the customers we can have multiple addresses for a single
customer without unnecessary repetition of data. This is known as a "one-to-
many" relationship.
Each address includes a reference to the customer number so that the two tables
can later be linked together. The address also includes a region code that will link
to the GeographicalAreas table. Each address includes a reference number.
Page 48
Column Name Data Type Nullable?
AddressId Int No
CustomerNumber Int No
Address1 VarChar(50) No
Address2 VarChar(50) Yes
Address3 VarChar(50) Yes
TownOrCity VarChar(30) No
AreaCode Char(3) No
Postcode Char(8) No
Contracts Table
The Contracts table holds information relating to single jobs and annual contracts.
Each contract is linked to a customer address; if a customer wishes to obtain an
insurance policy for multiple premises, they will actually create several contracts.
Recurring contracts include a renewal date and a flag that indicates if the
customer wishes to automatically renew or whether contact must be made
beforehand. Single jobs will hold a null value in the renewal columns. Both single
jobs and contracts include a monetary value.
NB: The CustomerAddressId column will be used to provide a link to the
CustomerAddresses table. Note that the name of the two matching columns need
not be identical but that the data type should be.
Skills Table
The Skills table will be used to store the various skills that may be required of an
engineer. A unique code and a name for each skill will be held.
Page 49
The JoBS system will hold a list of standard jobs that may be undertaken by
engineers. For simplicity, custom jobs are not permitted; if an unexpected job
must be undertaken it will be added to this table first. A standard job includes a
unique reference number, a name for the job and a standard price to charge.
RequiredSkills Table
Each standard job will require one or more skills to complete successfully. In order
that the correct engineer is despatched to do the work, the RequiredSkills table
holds links between the StandardJobs and Skills tables, determining which skills
are required. In a later article, we will create links between the tables to cement
this "many-to-many" relationship.
Engineers Table
We need a table to hold a list of people that may be employed to undertake work
on behalf of DisasterFix. This information will be stored in the Engineers table.
The table will hold an optional photograph to be used on engineer ID cards.
EngineerWorkingAreas Table
Page 50
The EngineerWorkingAreas table will provide a many-to-many link between
engineers and geographical areas. This information will determine which areas
the engineer is willing to work in. This data is necessary when matching customer
work to engineers.
EngineerSkills Table
In a similar manner to the EngineerWorkingAreas table, this table will provide a
link between the engineers and the skills they have. This will be used to ensure
that the engineer sent to a job is equipped to undertake the work.
Jobs Table
The Jobs table will hold information relating to all jobs that have been performed
or that are booked as future work. This table is central to the JoBS application.
Each job will be given a unique identifier. There will be links to the standard job
type and the contract under which the work will take place. The contract number
will be used to find the customer's address and other details.
Once a job has been booked with a customer, the engineer and the date of visit
will be recorded. These columns are nullable because they may not be known
when the job has been requested but not yet scheduled.
On completion of a job, the number of hours taken by the engineer will be
recorded along with the charge rate for the time This information will be copied
from the engineer's daily rate but will be stored in this table for historical
purposes, as daily rates may change.
Sometimes a job will not be completed in a single visit. This may be due to a lack
of parts or because the job takes longer than anticipated. In these cases the job
will be marked as requiring a follow up. Once a follow up visit is booked as a new
Job row in the table, a reference to the new JobId will be created so that the
history of a job may be tracked.
Page 51
StandardJobId Int No
ContractNumber Int No
EngineerId Int Yes
VisitDate DateTime Yes
HoursTaken Real Yes
HourlyRate Money Yes
FollowUpRequired Bit Yes
FollowUpJobId UniqueIdentifier Yes
Notes VarChar(MAX) Yes
Parts Table
The Parts table will hold the list of standard parts that may be used by an
engineer. Each part has a cost that will be used when calculating the total cost of
a job and the profit made. Parts that are not included in the list are not
reimbursed by DisasterFix to the engineer.
UsedParts Table
In order for DisasterFix to determine their costs, the parts used during each job
must be recorded. As multiple parts may be used for a single job, the UsedParts
table will be used in a one-to-many relationship. Each row in this table links a part
number to a job and records the unit cost of the part number and the number of
units used in each case. The unit cost is recorded at the time of the job so that the
historical data is not lost if the current cost changes. The number of units used is
held as a fixed-point number to allow the engineer to specify that they used a
fraction of a unit, for example 1.2 metres of copper piping.
Page 52
PartNumber Char(8) No
UnitsUsed Decimal(6,2) No
CostPerUnit Money No
EngineerStock Table
Each engineer is expected to hold a stock of parts that they will use during their
day-to-day work. The current stock level for each part held will be stored in the
final table so that replenishment can be made as required. This will also be used
in a report that details the value of stock that an engineer has in their possession.
Page 53
Page 54
CHAPTER 6
Basic SQL Server Data Manipulation
Creating Data
As the JoBS tutorial database currently holds no data, the first task is to generate
some. As with previous SQL Server processes that have been described in this
tutorial, we can add data using SQL Server Management Studio (SSMS). Furthermore,
we can create information using either the graphical interface tools or with Transact-
SQL (T-SQL) commands.
Page 55
make an error, you can simply modify the information to update a data row or
highlight the entire row and press the delete key to remove it completely.
Once you have added all four rows, your grid should look similar to the image below.
Click the close button at the top-right of the table's window to close the grid. You can
also select "Close" from the File menu.
NB: The final row showing two NULL values appears in readiness for a fifth data row
to be added. This empty row is not stored in the table.
If the command executes correctly, you will see a response that indicates that one
row was affected.
(1 row(s) affected)
You can see the new row in the table by opening the table in the data entry mode
described earlier. Alternatively, you can list the entire contents of the table using a
Page 56
SELECT statement. Try executing the following to see the data in the query window.
NB: The SELECT statement is described in detail in the next article in this series.
VALUES
Page 57
formats should be used with care, particularly where the application may be used in
more than one country. Depending upon the local settings for the SQL Server, the
format may be incorrectly interpreted. For example, '06/07/08' could be 6 July 2008,
7 June 2008 or 8 July 2006, depending upon the local interpretation. A safer option is
the ISO 8601 format. This specifies that dates be provided as YYYY-MM-DD, eg.
'2008-07-27' for 27 July 2008.
To comply with the ISO standard, the time portion of date and time information
should be supplied in 24-hour format with the hours, minutes and seconds separated
by colons, eg. '16:20:25'. If fractions of seconds are required, these can be included.
You may also decide to omit the seconds or the entire time portion if these are not
required.
To demonstrate the creation of date and time information try the following
statements, which will create complaints. Note the use of apostrophes to delimit the
dates.
VALUES
VALUES
UniqueIdentifer Data
Globally unique identifiers (GUIDs) are thirty-two digit hexadecimal numbers with
digits organised into five groups of eight, four, four, four and twelve digits
respectively. To insert a specific GUID into a uniqueidentifier column, these groups
must be separated using hyphens (-) and the entire value must be delimited using
apostrophes. For example:
Page 58
(JobId, PartNumber, UnitsUsed, Cost)
VALUES
You can also generate GUIDs automatically when you wish to create new unique
values.
Binary Data
It is usual to insert binary data into a database using a graphical user interface tool or
application that generates the binary information and appropriate scripts in the
background. However, it is possible to provide binary data directly in T-SQL scripts in
SSMS. To do so, the binary is represented as a continuous set of two-digit
hexadecimal bytes. To indicate that this format is being used, the string of digits is
prefixed with "0x", as with the photograph column in the following example:
VALUES
Page 59
(EngineerId, EngineerName, HourlyRate, Photograph)
VALUES
VALUES
Updating Data
Once you have information in your database you can modify it using the UPDATE
statement. This command requires that you specify the table to be modified
followed by a comma-separated list of the changes that you wish to make to
individual columns. Columns that are to remain unchanged are not included in the
list. The basic syntax of the statement is as follows:
UPDATE table-name
SET column1 = value1,
column2 = value2,
...,
columnN = valueN
Using this syntax, every row in the specified table is updated. For example, to
increase the hourly rate of every engineer by one pound, execute the following
command. Note the use of the column name in a calculation to the right of the
assignment operator (=). The new value for the column can be a literal value or can
be calculated.
Page 60
UPDATE Engineers
UPDATE Engineers
The WHERE clause will be considered in greater detail in the next article in this
series.
Deleting Data
The last T-SQL data manipulation command to consider is DELETE. This command
simply removes rows from a table. As with UPDATE, a DELETE statement can be
executed without a WHERE clause to empty a table, or with a WHERE clause to
delete rows that match specified criteria.
To demonstrate, we will delete all engineers with an hourly rate over £29.50:
NB: Note the use of the "FROM" keyword. This is optional but some SQL developers
prefer to include it.
Page 61
Page 62
CHAPTER 6
BASIC SQL SERVER DATA MANIPULATION
In this chapter we will create, update and delete information in SQL Server
tables.
Page 63
Once you have added all four rows, your grid should look similar to the image
below. Click the close button at the top-right of the table's window to close the
grid. You can also select "Close" from the File menu.
NB: The final row showing two NULL values appears in readiness for a fifth
data row to be added. This empty row is not stored in the table.
The INSERT Statement
When developing software, the above graphical user interface-based method
of creating information is useful for test data. However, in the software itself
you are far more likely to want to generate T-SQL commands that create
information in response to user actions. To do so, you can use the INSERT
statement.
The simplest form of the INSERT statement is used when adding a full row of
data to a table. The syntax for the command is as follows:
INSERT INTO table-name VALUES (value1, value2, ..., valueN)
The table-name specifies the table that you wish to add a row to. The data to
be created is then provided in a comma-separated list. The list of values must
be in the same order as the columns within the table so that information is not
added in the wrong columns.
We can use this syntax to add a row to the Parts table in the database. This
table has three columns to hold an alphanumeric part number, a description of
the part and the unit cost. Using a new query window in SSMS, run the
following command. Note the use of apostrophes around the data for the first
two columns. These delimiters are required for character-based information.
Page 64
SELECT * FROM Parts
VALUES
Page 65
To comply with the ISO standard, the time portion of date and time information
should be supplied in 24-hour format with the hours, minutes and seconds separated
by colons, eg. '16:20:25'. If fractions of seconds are required, these can be included.
You may also decide to omit the seconds or the entire time portion if these are not
required.
To demonstrate the creation of date and time information try the following statements,
which will create complaints. Note the use of apostrophes to delimit the dates.
VALUES
VALUES
UniqueIdentifer Data
Globally unique identifiers (GUIDs) are thirty-two digit hexadecimal numbers with
digits organised into five groups of eight, four, four, four and twelve digits
respectively. To insert a specific GUID into a uniqueidentifier column, these groups
must be separated using hyphens (-) and the entire value must be delimited using
apostrophes. For example:
VALUES
Page 66
background. However, it is possible to provide binary data directly in T-SQL scripts
in SSMS. To do so, the binary is represented as a continuous set of two-digit
hexadecimal bytes. To indicate that this format is being used, the string of digits is
prefixed with "0x", as with the photograph column in the following example:
VALUES
VALUES
Page 67
INSERT INTO Engineers
VALUES
Updating Data
Once you have information in your database you can modify it using the UPDATE
statement. This command requires that you specify the table to be modified followed
by a comma-separated list of the changes that you wish to make to individual
columns. Columns that are to remain unchanged are not included in the list. The basic
syntax of the statement is as follows:
UPDATE table-name
SET column1 = value1,
column2 = value2,
...,
columnN = valueN
Using this syntax, every row in the specified table is updated. For example, to
increase the hourly rate of every engineer by one pound, execute the following
command. Note the use of the column name in a calculation to the right of the
assignment operator (=). The new value for the column can be a literal value or can be
calculated.
UPDATE Engineers
Page 68
UPDATE Engineers
Page 69
Page 70
CHAPTER 7:
Basic T-SQL Queries
This chapter describes the use of the SELECT statement of the structured query
language. This command allows the creation of database queries that return table
rows that meet specified criteria.
Querying Data
The power of a database management system such as SQL Server is the ability to
create both simple and complex queries. A query interrogates one or more tables,
finding all of the rows that match the specified criteria. The rows are gathered into a
set of results, formatted into rows and columns, which are returned to the process that
performed the query.
Querying of data tables in SQL Server is achieved using the SELECT statement. The
command appears deceptively simple, allowing basic queries to be generated very
quickly. However, its flexibility permits the creation of very complex queries that
combine information from many tables that match detailed criteria. It allows
calculations to be performed upon the data as it is gathered, including row-by-row
calculations and aggregation of information based upon groups of rows.
In this article we will concentrate on simple querying of a single table. More advanced
querying techniques that join the results of multiple tables, calculate aggregate
information and perform grouping will be described in later articles.
This article's examples require that you are using the JoBS tutorial database and that
the database is correctly populated with information. If you have not already created
this database, download the script from the link at the top of the page and use it to
create the database and sample data. If you have used the tutorial database, you should
drop the old version and replace it using the downloaded script to ensure that the data
is correct.
The SELECT Statement
The SELECT statement can include a variety of clauses that control the behaviour of
the command and the results that are returned. The most basic form of the statement
simply returns literal values that are not sourced from a table. These values are
provided after the command as a comma-separated list.
Try executing the following command in a SQL Server Management Studio query
window:
Page 71
other styles of query. For example, when combining the results of multiple queries
you may want to add a predefined row or a column categorising the results according
to the query from which they originated.
Selecting Data From a Table
When selecting information from a table, the name of the table must be provided. This
is specified using the FROM clause. You must also provide details of which columns
to return from the database. If you wish to return data from every column in the table,
you can use an asterisk (*) as shown below. This query retrieves every column of each
of the twenty-five rows in the Engineers table.
Page 72
The JoBS tutorial database does not include any columns that include spaces in their
names. If you are using a database with columns that have names that include spaces,
the name in the column list must be delimited using square brackets. For example:
SELECT
EngineerId AS Id,
HourlyRate AS Cost
FROM
Engineers
Page 73
SELECT TOP (10) PERCENT EngineerId, EngineerName, HourlyRate FROM Engineers
Eliminating Duplicates
In all of the above examples, each use of the SELECT statement has resulted in the
return of every row. Had there been duplicate rows in the tables, you would have seen
duplicate rows in the results. We can demonstrate this by switching to a different
table. In this case, we will use the EngineerWorkingAreas table to find the list of areas
that are serviced by DisasterFix.
Page 74
columns or can include literal strings surrounded by apostrophes, as in the next
sample, which combines the part names and numbers from the Parts table.
Type Casting
Operations may require that the values involved be of specific types. If the column
values are not of the correct type, they can be changed using the CAST function. This
function has two parameters. The first is the value to be converted and the second is
the new type, including a size where appropriate.
A simple example of the function is when concatenating text. For this combination to
work correctly, the two operands must be character types. Concatenating numeric
values is not permitted unless they are first cast to fixed or variable length character
fields. The following statement generates a full parts list including a formatted price.
The price is held in a money column that must be cast before concatenation with the
part name.
Page 75
Let's try some of these operators using the Engineers table. Firstly, we can find all of
the engineers that have an hourly rate of exactly £20.00. There should be five such
engineers.
SELECT
EngineerId AS Id,
HourlyRate AS Cost
FROM
Engineers
WHERE
HourlyRate = 20
To find all of the engineers who earn £20.00 per hour or less, try this query:
SELECT
EngineerId AS Id,
HourlyRate AS Cost
FROM
Engineers
WHERE
HourlyRate <= 20
Page 76
SELECT
FirstName,
LastName,
CreatedDate
FROM
Customers
WHERE
SELECT DISTINCT
EngineerId
FROM
EngineerWorkingAreas
WHERE
Page 77
There are several wildcards that can be included in the pattern. The first is a
percentage sign (%), which indicates that the value can include any number of any
characters in place of the wildcard, including no characters at all. This is useful for
"begins with", "ends with" or "contains" queries. For example, executing the
following statement retrieves all customers with a last name that starts with the letter
"M".
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
Page 78
position exists within the brackets. The following statement demonstrates this by
finding customers with a first name of "Dita" or "Rita".
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
SELECT
Page 79
HourlyRate AS Cost
FROM
Engineers
WHERE
SELECT
EngineerId AS Id,
HourlyRate AS Cost
FROM
Engineers
WHERE
Page 80
SELECT DISTINCT
EngineerId
FROM
EngineerWorkingAreas
WHERE
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
Page 81
Developers who are new to SQL Server may expect that changing this query to use
the "not equal" operator, as shown below, would reverse the results. They may expect
to see the other nineteen rows. However, as the comparison returns unknown for all
comparisons with NULL, only two rows are included in the result set. This means that
two apparently opposite queries do not return every row when combined.
SELECT FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
BusinessName IS NULL
The opposite of "IS NULL" is "IS NOT NULL". This evaluates to true only when the
column value does not contain a null value. Therefore, you can find all of the
customers that are associated with businesses by executing the following.
Page 82
SELECT
FirstName,
LastName,
BusinessName
FROM
Customers
WHERE
Page 83
SELECT
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
WHERE
BusinessName IS NULL
AND
OR Operator
The OR operator combines predicates using a logical OR operation. This allows you
to build a result set containing results that meet at least one of several criteria. Again,
consideration must be given to the effects of a predicate being unknown.
Operand 1 Operand 2 Result
true true true
true false true
false true true
false false false
true unknown true
false unknown unknown
unknown true true
unknown false unknown
unknown unknown unknown
We can modify the previous query to use the OR operator instead of AND. This time,
we will obtain all of the rows in the Customers table where the customer is not linked
to a business or where the customer was created on or after 1 January 2008.
Page 84
SELECT
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
WHERE
BusinessName IS NULL
OR
SELECT
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
WHERE
Page 85
BusinessName IS NOT NULL
AND
Sorting Data
The final clause to be considered in this article is ORDER BY. When added to the end
of a SELECT statement, the clause specifies the columns that should be used to sort
the results of the query. The results may be sorted by a single column or by multiple
columns specified in a comma-separated list. For example, we can sort the customers
by business name and then last name using the following query. If the business names
match, the matching rows are sorted by last name. NB: Note that NULL values appear
first in the list.
SELECT
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
ORDER BY
BusinessName,
LastName
Adding the DESC keyword to one or more of the columns reverses the sort order.
This time, we will sort the customers by business name in descending order. For
duplicate business names, we will sort by first name in ascending order. NB: Note the
use of the ASC keyword for ascending order sorting. This is optional as ascending
order is the default.
SELECT
FirstName,
Page 86
LastName,
BusinessName,
CreatedDate
FROM
Customers
ORDER BY
BusinessName DESC,
LastName ASC
SELECT TOP 9
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
ORDER BY
CreatedDate DESC
This does cause an interesting effect. In the case of the above query, the ninth and
tenth newest customers actually have the same CreatedDate value. In the query, SQL
Server arbitrarily selects one of these for inclusion in the results and discards the
other. This may be acceptable in some cases but often you will want to retrieve
additional results where such duplication exists. This is possible if you add the WITH
TIES clause to the TOP keyword. Now if there are duplicates that would normally be
discarded, they are included in the result set.
Page 87
The final sample statement shows the WITH TIES clause. Although the top nine
results are requested, a duplicate means that the query returns ten rows.
FirstName,
LastName,
BusinessName,
CreatedDate
FROM
Customers
ORDER BY
CreatedDate DESC
Page 88
CHAPTER 8
SQL Server Primary Keys
This chapter explains the use and creation of primary keys. A primary key is a
column, or group of columns, within a database table that provides a unique
reference for each row added to that table.
Natural Keys
A natural key is a primary key that uses naturally unique information from the table in
its constituent columns. Many database developers prefer this type of key as the use of
naturally unique data makes reading the raw data in linked tables easier. In the Parts
table below, the PartNumber column provides a naturally unique value that is an ideal
candidate for a primary key.
Page 89
15COPIPE 15mm Copper Pipe 1.52
22COPIPE 22mm Copper Pipe 2.97
10MICCOP 10mm Microbore Copper Tube 1.17
25STPTAP 25x25mm Stop Tap 5.99
This table defines a set of parts that engineers may use when performing jobs for
customers. With the part number being used as the primary key, a related table can be
defined that holds details of the stock that engineers have in their possession. In the
EngineerStock table below, the PartNumber column provides the link to the Parts
table and is easily readable by a developer who understands the data. For example, we
can see immediately that the engineer with an EngineerId of "1" has 18 units of 15mm
copper pipe.
Primary key data should be immutable. This means that once set, the information
should never be changed. Although it is possible to modify the information in a
primary key column, it is inadvisable. This is because the change would need to
be propagated to every linked table to maintain referential integrity. This usually
prevents the use of data such as a car's registration number as the primary key; if
a car's registration number is changed, the primary key information would need
to be updated.
The natural key must be unique for every row in the table and must not include
nullable information. This would prevent the use of a person's name as a primary
key as names are not unique. It is also possible that a person's name changes
through marriage or a legal name change, breaking the first guideline.
Page 90
co-pilot reference number, aircraft registration and departure date and time may
be unique but could be considered too large for use in a primary key.
SQL Server does not permit the use of large object columns in primary keys. If
you are holding binary file data in a database it may be unique but it will not be
possible to use it in a primary key.
Surrogate Keys
When no suitable natural key exists for a table, a surrogate key can be used instead. A
surrogate key is generally a single-column primary key containing a unique value.
Often the value is generated automatically. The use of such a key guarantees a unique
and immutable value for every row in the table. However, it can be more difficult to
read the information in a table when used in a foreign key.
If the previously described Parts table were updated to include an integer-based
surrogate key, the data may appear as follows:
PartsKey PartNumber PartName UnitCost
1 15COPIPE 15mm Copper Pipe 1.52
2 22COPIPE 22mm Copper Pipe 2.97
3 10MICCOP 10mm Microbore Copper Tube 1.17
4 25STPTAP 25x25mm Stop Tap 5.99
If the new surrogate key is used in relationship in the EngineerStock table, the stock
data is more difficult to read. To determine the number of units of 15mm copper pipe,
the data in both tables must be compared.
EngineerId PartsKey UnitsHeld
1 1 18
1 2 15
2 2 9
2 4 1
Surrogate keys add a small number of bytes to the storage requirement for every row
in the table in which they are defined. This can increase the overall size of the
database. However, if the size of the surrogate key is small, this size increase may be
offset by the decrease in size of related tables, when compared to the use of larger
natural keys.
Surrogate keys are often defined as integers or globally unique identifiers (GUIDs). In
the case of integers, the size of integer column selected is dependant upon the
anticipated number of rows that will be required in the table. The number usually
Page 91
starts at one and increments by one for each row added to the table. This is ideal for
databases that are only used in on-line scenarios. If a database includes
synchronisation for offline use, a GUID may be more appropriate to reduce the risk of
duplicated values being created during disconnected use.
Creating a Primary Key
Primary keys can be added to tables using various methods. In this article we will
concentrate on two options: adding a primary key using the SQL Server Management
Studio graphical user interface tools and creating a key using Transact-SQL
statements.
The examples below add primary keys to all of the tables in the JoBS database that we
have created over the course of the tutorial. If you have not followed the tutorial,
download the script using the link at the top of this page and use it to create the
database and some sample data. You should also use this script to recreate the
database if you have followed the tutorial so far, as is includes modifications to the
schema and data not described in the previous article.
Creating a Primary Key Using the SQL Server Management Studio
During development of a new database, SQL Server Management Studio (SSMS)
tools can be used to create primary keys. To demonstrate, we will add a primary key
to the Engineers table. To begin, open SSMS and expand the Object Explorer so that
you can see the Engineers table. Right-click the table name and choose "Design" to
open the table designer.
To add a single-column primary key, you simply select the column in question and
choose "Set Primary Key" from the Table Designer menu. You can also use the Set
Primary Key toolbar button. To try this, ensure that the EngineerId column is selected
by clicking the first row selection button. Use the menu option or toolbar button to set
the primary key. You will see a small key icon alongside the column definition.
NB: The EngineerId column provides a surrogate key as this information has been
added to give a unique reference.
If you mistakenly add a primary key, you can remove it using the same menu option
or toolbar button. You will notice that the description of the menu item, or the tool tip
for the button, is updated to show that they will remove the primary key. Once the key
is correct, save the changes to the table design and close the table designer window.
Now that the Engineers table has a primary key, it is impossible to create rows with
duplicates in the EngineerId column. To see what happens when you try, open a new
query window for the JoBS database and execute the following T-SQL statement:
Page 92
INSERT INTO Engineers VALUES (1, 'Bill Jones', 19.85, null)
The primary key prevents the new row being created and reports an error. NB: Note
the default name of PK_Engineers that has been applied to the new primary key
constraint and the error includes a multipart name.
Violation of PRIMARY KEY constraint 'PK_Engineers'. Cannot insert duplicate
key in object
'dbo.Engineers'. The statement has been terminated.
If you review the contents of the table you will see that the new row was not stored.
Creating a Composite Key Using the Table Designer
A composite key is a primary key containing more than one column. Once defined,
the combined value of all of the primary key columns must be unique but the
individual columns may include duplicates. This is the type of key that is required for
the EngineerStock table. In this table the EngineerId may be repeated, as may the
values in the PartNumber column. However, we want to restrict the data so that the
combination of these columns is always unique. This will ensure that there is only one
stock figure per engineer and part.
In the table designer for the EngineerStock, select both the EngineerId and
PartNumber column. To do this, select the EngineerId column first and then click the
row selector button for the PartNumber column whilst holding the Ctrl key. Once both
columns are highlighted, click the Set Primary Key toolbar button or menu item. A
key icon will appear alongside both columns to indicate that the primary key has been
set. You can now save the table and close the designer window.
Page 93
NB: This is a simplified syntax for the creation of keys with default settings.
Additional settings related to the primary key's index exist and will be examined in a
later article describing indexes.
To use the above syntax within the creation of the Engineers table, you could execute
the following statement:
Photograph VARBINARY(MAX)
EngineerId
)
NB: The above statement will fail as the Engineers table already exists.
Altering an Existing Table
A primary key can be added to an existing table using the ALTER TABLE statement
with the ADD clause. The syntax for the constraint element is the same as for when
creating tables:
ALTER TABLE table-name ADD
CONSTRAINT constraint-name PRIMARY KEY CLUSTERED (column1, column2, ...,
columnN)
We can test this syntax by adding a primary key to the Parts table. This table has a
natural key in the unique part number. To set this column to be the primary key,
execute the following statement:
ALTER TABLE Parts ADD CONSTRAINT PK_Parts PRIMARY KEY CLUSTERED (PartNumber)
Page 94
The remainder of the primary keys can now be added to the JoBS tutorial database.
Use a combination of the graphical user interface tools and Transact-SQL to add the
following primary keys.
Table Column(s)
Contracts ContractNumber
Customers CustomerNumber
CustomerAddresses AddressId
CustomerComplaints ComplaintReference
EngineerSkills EngineerId, SkillCode
EngineerWorkingAreas EngineerId, AreaCode
GeographicalAreas AreaCode
Jobs JobId
RequiredSkills StandardJobId, SkillCode
Skills SkillCode
StandardJobs StandardJobId
UsedParts JobId, PartNumber
Page 95
Page 96
Chapter 9
1
Page 97
Foreign keys create one-to-many relationships. These are so-named because one row
in the parent table can be linked to many rows in the child table. However, each entry
in the child table may only be linked to a single row in the parent. Another type of
relationship is the many-to-many link. In this scenario, many rows in either the parent
or child table may link to many rows in the other table.
Many-to-many relationships cannot be created directly in most relational database
management systems, including SQL Server. To create this type of link, a third table
is required. This table, known as a junction table, includes a foreign key to each of the
tables that you wish to link. The combination of two one-to-many relationships has
the effect of a single many-to-many link.
You can see how a many-to-many link operates below. In this case the Employees
table links to the Skills table via a junction table. This allows each employee to have
multiple skills and each skill to be assigned to any number of employees. In the
example, Chris Green is proficient in both C# and Visual Basic. The C# skill can be
provided by two employees: Arthur Brown and Chris Green. Note also that it is
possible for a row to have no related items, as in the case of the SQL Server skill.
No Action. This is the default setting, which does not allow cascading updates.
Cascade. When updates are set to cascade, a change in the candidate key of the
parent table causes all matching rows in the child table to be updated with the
same value, correctly retaining all links.
2
Page 98
Set NULL. With this setting, any change to a parent's key changes matching
foreign key values to NULL.
Set Default. When using this option, changes to a parent's key causes matching
foreign key values to be changed to their default values.
Cascading deletes are similar to cascading updates. When configured, deleting a row
in the parent table causes matching rows in the child table to be deleted, or matching
foreign key columns to be set to NULL or default values. The actual action depends
upon the setting selected.
3
Page 99
To create the new foreign key relationship we will first specify the linking columns.
To do so, click the Tables and Columns Specification option. An ellipsis button will
appear in the property box. Click the button to display the Tables and Columns
selection dialog box.
The Tables and Columns dialog box is used to select the parent and child tables and
the candidate key and foreign key columns to be linked. The foreign key table cannot
4
Page 100
be changed, as it is the table that is currently being edited. From the primary key table
drop-down box, select "Customers".
With the two tables selected, you must select one or more columns from the parent
table in the left column of the grid and then select the matching foreign key columns
in the right column. In this case, select CustomerNumber in both columns, as shown
in the figure above. Click the OK button to store the selections and return to the
previous dialog box.
The remaining options for the new foreign key relationship are as follows:
Name. The Name property allows you to create a unique name for the foreign
key. Replace the provided name with "FK_CustomerAddresses_Customers".
Description. The Description property allows you to add some comments to the
relationship. This property does not modify the behaviour of the relationship but
can be useful when other developers or database administrators are viewing the
foreign key definition.
Enforce for Replication. This option determines whether the foreign key rules
should be enforced when changes to the data are made by SQL Server replication
processes. Replication is beyond the scope of this tutorial. Set this option to Yes.
INSERT and UPDATE Specification. This property contains two settings that
determine the cascade options when updating and deleting information in the
parent table. Set both of these options to "No Action".
To accept the settings for the new foreign key, click the Close button. Next, save the
table definition and close the table designer. You may be asked to confirm the
changes to the two tables. If so, click the Yes button to complete the save.
We can test the foreign key by attempting to create invalid data or by trying to delete
rows from the parent table that are referenced in the child table. As a first test, try
executing the following insert statement. This references a customer with a customer
number of 99. As there is no such customer, the INSERT fails.
5
Page 101
(CustomerNumber, Address1, TownOrCity, AreaCode, Postcode)
VALUES
6
Page 102
Engineers EngineerId Jobs EngineerId
GeographicalAreas AreaCode CustomerAddresses AreaCode
GeographicalAreas AreaCode EngineerWorkingAreas AreaCode
Jobs JobId Jobs FollowUpJobId
Jobs JobId UsedParts JobId
Parts PartNumber EngineerStock PartNumber
Parts PartNumber UsedParts PartNumber
Skills SkillCode EngineerSkills SkillCode
Skills SkillCode RequiredSkills SkillCode
StandardJobs StandardJobId Jobs StandardJobId
StandardJobs StandardJobId RequiredSkills StandardJobId
7
Page 103
Page 104
Chapter 10
Transact-SQL Joins
Joins
In the previous two articles we have examined the process of normalising a database
and creating foreign key relationships between tables that are logically linked. When
you are working with a normalised relational database, it is essential that you can
perform queries that gather data from several linked tables but return the matching
data in a single set of results. This is achieved with the use of joins.
A join is a clause within a SELECT statement that specifies that the results will be
obtained from two tables. It usually includes a join predicate, which is a conditional
statement that determines exactly which rows from each of the tables will be joined by
the query. Usually the join will be based upon a foreign key relationship and will only
return combined results from the two tables when the key values in both tables match.
However, it is possible to join tables based upon non-key values or even perform a
cross join that has no predicate and returns all possible combinations of values from
the two tables.
Usually normalisation results in a database with many related tables. You may
therefore want to join more than two tables for a single set of results. In this case, you
can include multiple joins in a SELECT statement. The first join combines the first
two tables. The second join combines the results of the first join with a third table, and
so on.
JoBS Database
The examples in this article use the JoBS tutorial database. If you do not have a copy
of the JoBS database available, download the script using the link at the top of this
page. You can execute the script to create and populate a new database.
Performing Queries with Joins
Inner Joins
Possibly the most common form of join that you will use is the inner join. With this
type of join, the two tables are combined based upon a join predicate. Wherever a row
in one table matches a row in the other, the two rows are combined and added to the
outputted results. If a row in either table matches several in the other, each
combination will be included in the results. If a row in either table does not match any
in the other table, it will be excluded from the results altogether.
The join clause, second table name and join predicate are included in the SELECT
statement immediately after the name of the first table. The join uses the INNER
JOIN clause and the predicate uses the ON clause. The basic syntax for the SELECT
statement is as follows:
8
Page 105
SELECT columns FROM table-1 INNER JOIN table-2 ON predicate
As an example, we can join the Jobs and Engineers tables. The following query
returns a list of jobs and the details of the engineer that performed the work. In this
case, the predicate specifies that the tables will be joined only where the EngineerId in
both tables is a match. These columns are used in a foreign key relationship definition
for the two tables, although this is not a requirement for the join to be executed.
However, if non-key columns are frequently used in join operations, you should
consider adding appropriate indexes to improve query performance.
SELECT
EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
ON
Jobs.EngineerId = Engineers.EngineerId
9
Page 106
This query fails because the EngineerId column appears in both tables. You must
therefore specify which table's EngineerId column you require by prefixing it with the
table name and a full-stop (period) character. The following query resolves the
problem:
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
ON
Jobs.EngineerId = Engineers.EngineerId
When using joins, you can also use WHERE clauses, ORDER BY clauses, etc. As
with the column selection, you must specify the table name for any columns with
ambiguous names. For example:
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs
INNER JOIN
Engineers
10
Page 107
ON
Jobs.EngineerId = Engineers.EngineerId
WHERE
Jobs.EngineerId = 4
OR
Jobs.EngineerId = 8
ORDER BY
Jobs.EngineerId
SELECT
Jobs.EngineerId,
JobId,
VisitDate,
EngineerName
FROM
Jobs, Engineers
WHERE
Jobs.EngineerId = Engineers.EngineerId
Generally speaking, the syntax that you use for inner joins is a matter of personal
preference or will be proscribed by your organisation's coding standards. Many people
prefer the explicit syntax as the join predicates are separated from those in the
WHERE clause and are kept closer to the names of the tables that they are acting
upon. It is also easier to change the type of join with the explicit syntax. It is useful to
11
Page 108
understand both variations, as you are likely to encounter each syntax in real-world
scenarios.
Using Table Aliases
Providing the full name of the table as a prefix to column names can lead to long
query statements. This is especially true if, to make the query more easily understood,
you prefer to prefix every column name rather than just those that are ambiguous. In
such circumstances, it is useful to replace the table names with table aliases.
A table alias provides a short code, usually of one or two letters, that can be used in
place of a table name. Each alias is defined in the query after the full table name. In
the next example, the "J" alias represents the Jobs table and the Engineers table has
the alias "E". This makes the query much more readable.
SELECT
J.EngineerId,
J.JobId,
J.VisitDate,
E.EngineerName
FROM
Jobs J
INNER JOIN
Engineers E
ON
J.EngineerId = E.EngineerId
Self-Referencing Joins
Table aliases are always required when creating self-referencing joins, where the two
tables that are being joined are actually the same table. In the following example, the
Jobs table is being joined to itself so that the initial job and follow up job can be
combined in a single row in the results. The initial job's table has an alias of "J",
whilst the follow up job uses the "JF" alias.
SELECT
J.JobId,
12
Page 109
J.StandardJobId,
J.EngineerId,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
JF.EngineerId AS FollowUpEngineerId
FROM
Jobs J
INNER JOIN
Jobs JF
ON
J.FollowUpJobId = JF.JobId
SELECT
E.EngineerName,
E.HourlyRate,
E.OvertimeRate,
S.SkillName
FROM
Engineers E
INNER JOIN
13
Page 110
EngineerSkills ES
ON
E.EngineerId = ES.EngineerId
INNER JOIN
Skills S
ON
ES.SkillCode = S.SkillCode
ORDER BY
S.SkillName
Outer Joins
Outer joins use a similar syntax to explicit inner joins but provide different results in
some cases. The key difference is that outer joins include all of the rows from one or
both tables, even if there are no matching rows defined by the join predicate. Where
information is missing, NULL values are substituted in the columns of the returned
results. The easiest way to explain the difference is with an example.
Firstly, let's consider an inner join. The query below joins the CustomerComplaints
table to the Engineers table so that each complaint can be shown alongside the
engineer that was at fault. Although there are three complaints in the database, only
one row is returned by the query. This is because the other two rows define
complaints that were deemed not to be the engineer's fault. In these cases the
EngineerId column in the CustomerComplaints table is set to NULL.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
INNER JOIN
Engineers E
14
Page 111
ON
C.EngineerId = E.EngineerId
EngineerName Complaint
Joey Ohara Engineer did not have appropriate parts and was
rude.
This may not be the information that we require from the query. If we actually want to
list every complaint in the database, showing the engineer's name only where an
engineer is associated with the complaint, we must use an outer join. In this case, we
will use a left outer join. This indicates that all of the values from the table to the left
of the join clause will be returned. If there is no associated row from the table on the
right of the join, the columns from that table will be included but will contain NULL
values.
To demonstrate, execute the following command, noting that the join clause has been
modified to LEFT JOIN:
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
LEFT JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
The results from this query contain all three complaints. For the two complaints not
associated with an engineer, the EngineerName column's value is NULL.
EngineerName Complaint
NULL Customer has received an incorrect charge on their direct debit account.
NULL Customer does not wish to receive direct marketing information.
Joey Ohara Engineer did not have appropriate parts and was rude.
The opposite of a left outer join is a right outer join. As you may imagine, this returns
all of the rows from the table on the right of the join clause with null values for
15
Page 112
columns from the table to the left of the clause when no matching row exists. If you
alter the previous query to use a right join, all twenty-five engineers will be returned.
One of these engineers, Joey Ohara, will have a complaint listed, whilst all of the
others will have NULL in the Complaint column.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
RIGHT JOIN
Engineers E
ON
C.EngineerId = E.EngineerId
The third style of outer join is the full outer join. This type of join ensures that every
row from both tables is included in the final results. If any row in either table does not
have a matching partner, the row is populated with NULL values for the missing data.
Try modifying the query to use a full join as shown below. This time you will see
twenty-seven returned rows. One will be a complaint with a matching engineer, two
will be complaints without engineers and twenty-four will be for engineers with no
complaints made against them.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
FULL JOIN
Engineers E
ON
16
Page 113
C.EngineerId = E.EngineerId
Cross Joins
Cross joins are the simplest form of join and possibly the least used. A cross join does
not define a join predicate. This means that the results contain every possible
combination from the two tables. This is the Cartesian product of the table so this type
of join is often called a Cartesian join.
If we modify the outer join query to provide a cross join, all combinations of data are
returned. As there are twenty-five rows in the Engineers table and three rows in the
CustomerComplaints table, the resultant list contains seventy-five rows.
SELECT
E.EngineerName,
C.Complaint
FROM
CustomerComplaints C
CROSS JOIN
Engineers E
NB: Cross joins should be used with care, particularly with large tables as the
number of results can grow very quickly.
Implicit Cross Joins
As with inner joins, cross joins have an implicit syntax variation. This is the same
syntax as for inner joins but with no join predicate in the WHERE clause:
17
Page 114