SQL 2012 - Good Tut With TOC

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 97

Table of Contents

Part I

1. Table of Contents
2. Introduction
3. Installing the sample database
4. Defining a query
5. Our first query; the SELECT statement
o 5.1. Aliasing
6. Eliminating duplicates; DISTINCT
7. Filtering data; the WHERE clause
o 7.1. The NOT keyword
o 7.2. Combining predicates
o 7.3. Filtering strings
o 7.4. Filtering dates
o 7.5. The IN and BETWEEN keywords
8. Sorting data; ORDER BY
9. Further limiting results; TOP and OFFSET-FETCH
o 9.1. TOP
o 9.2. OFFSET-FETCH
10. Aggregating data; GROUP BY and HAVING
11. Selecting from multiple tables; using JOINS
o 11.1. CROSS JOIN
o 11.2. INNER JOIN
o 11.3. OUTER JOIN
o 11.4. Self JOIN
o 11.5. Multiple JOINS
12. Multiple groups; GROUPING SETS
o 12.1. GROUPING SETS
o 12.2. CUBE
o 12.3. ROLLUP
o 12.4. GROUPING
o 12.5. GROUPING_ID
13. Windowing functions; the OVER clause
o 13.1. Aggregate functions
o 13.2. Framing
o 13.3. Ranking functions
o 13.4. Offset functions
14. That's not it yet...

Part II
1. Table of Contents
2. Welcome back!
3. Queries in your queries; Subqueries
o 3.1. Writing the same queries differently
o 3.2. More filtering options; IN, ANY, SOME, ALL and EXISTS
 3.2.1. IN
 3.2.2. ANY and SOME
 3.2.3. ALL
 3.2.4. EXISTS
4. Querying from subqueries; Derived tables
5. Common Table Expressions a.k.a. CTEs
6. Set operators; UNION, INTERSECT and EXCEPT
o 6.1. Combining sets; UNION and UNION ALL
o 6.2. Recursion with CTEs and UNION ALL
o 6.3. INTERSECT
o 6.4. EXCEPT
7. Pushing over tables; PIVOT and UNPIVOT
o 7.1. Pivoting
o 7.2. Unpivoting
8. More uses for Table Expressions; APPLY
o 8.1. CROSS APPLY
o 8.2. OUTER APPLY
9. Other aspects of querying
o 9.1. Converting types; CAST and CONVERT, PARSE and FORMAT
 9.1.1 CAST and CONVERT
 9.1.2 PARSE
 9.1.3 FORMAT
o 9.2. VARCHAR functions
o 9.3. DATETIME functions
o 9.4. CASE and IIF
 9.4.1 CASE
 9.4.2 IIF
o 9.5. COALESCE, ISNULL and NULLIF
10. Conclusions

2. Introduction
I am currently studying for my Querying Microsoft SQL Server 2012 exam (70-461) (Update: I passed
the exam, whoohoo! :-) ). I've had a couple of years experience with SQL Server (2005 and 2008), but
I never really got the chance to really get into it. As a result I can find my way around querying
tables, applying where clauses, grouping data etc., but I never got really good at it. Whenever stuff
gets a little harder I have to ask Google or a coworker for help. However it seems I need to use SQL
Server more and more on a daily basis and as such I decided to at least read a book about it. The
book of my choosing was Querying Microsoft SQL Server 2012.
So I am now, at the time of this writing, studying hard for my exam. And what better way is there to
learn than by writing about the subject? So that's exactly what I decided to do. Write about it.
Everything I am going to write here is discussed in the book I am reading (actually chapters 1
through 5), it is probably discussed in many other books as well and it is discussed on the internet,
on CodeProject and countless other websites.

So why should you still read THIS article and not the book I just mentioned or one of those other
articles? Well, for starters, my article is free and most books are not. Second, this article also reads
faster than most books. Third, this article discusses many query clauses in a structured manner
(something many websites lack). Next, perhaps the views and writings of someone who is still
learning might be well suited for people who are still learning themselves. Following, this article has
many examples that you can copy/paste into your SQL Server query window to try it out. And last,
but not least, I even put in some screenshots!

So what is this article all about? It is about selecting data from a database. That may sound boring,
but there is actually a lot to say about what you can do in a single SELECT statement. So much
actually that I have decided to split this article in to two parts.
The first part, which you are reading now, focuses on selecting data from the database using a
single SELECTstatement. We will get a little more advanced with grouping and windowing functions.
At the end of this part of the article you should be able to create queries that select, filter, group and
order data from one or multiple tables.
The second part is about using more than one SELECT statement while still returning a single result
set.
You can read the second article on CodeProject: Querying SQL Server 2012: Part II.

In this article I assume some very basic knowledge about databases. Knowing how to install a
database, opening up a query window, running queries (F5) and browsing tables is a benefit.
Creation of tables, indexes and other objects as well as inserting, modifying and deleting data are
not covered in this article.

3. Installing the sample database


As said I am studying for a SQL Server 2012 certificate. So of course you are going to need SQL
Server 2012. I have downloaded the free SQL Server 2012 Express Edition which is also what I have
used for writing this article. Unfortunately I cannot help you with installing SQL Server. However,
chances are you already have it installed anyway. If you have not you can find it by clicking the
following link: Download SQL Server 2012 Express. Click the big red button that says "Download SQL
Server 2012 Express" and choose the version you want. I am not certain what the differences are
(after all I am only learning this myself), but I chose "Sql Server Express with Advanced Services"
because it looked like it has everything one might need. So download it, install it and read on when
you are done... Go ahead, I will wait.

Done? Next thing you need to do is install a sample database for you to work with. It is probably
best if you install the same database I am using, which is the AdventureWorks2012 sample database.
Getting this was actually harder than installing SQL Server 2012. You can get it on the SQL Server
Sample Database CodePlex website. At the time of this writing you can choose 14(!) downloads.
What the...!? So I picked a few and got successfull with the AdventureWorks2012-Full Database
Backup.zip (near the bottom of list of downloads). I would have included it in this article, but CP
kindly asks me to keep files below 10 MB.

So download the zip file and unzip the .bak file inside. Luckily I can help you with getting this into
SQL Server. Place the .bak file in your SQL Server backup folder. If you have not changed anything
during installation it should be in C:\Program Files\Microsoft SQL
Server\MSSQL11.SQL2012EXPRESS\MSSQL\Backup. Now go to your SQL Server 2012 instance, open
up the Object Explorer (if it is not opened already) and right click on the Database folder. In the
context menu go to Restore Database.

You should now be in the Restore database window.


Click on "Device" and then click on the button behind it. You should now see the Select backup
devices window. Click "Add" and browse to the .bak file you just unzipped (if you don't see it
immediately). Select the .bak file and click "OK". Make sure you have no other files selected in the
Select backup devices window. Click "OK" and you should be back in the Restore database window.
You can review your options and click "OK" again. The AdventureWorks database should now be
restored and ready for use.

If you have better alternatives for getting the database up and running please let me know in the
comments section.

4. Defining a query
So now that you have AdventureWorks up and running we can actually begin writing our first query.
Before we do that I want you to think a moment about what a query is. Seriously, think about it.
Wikipedia has the following to say about it; "In general, a query is a form of questioning, in a line of
inquiry". Alright, so we are going to question the database.

So how exactly are we going to question the database? When you ask a friend to get you something,
say a drink, you would probably ask your friend something like "Can you get me a drink?". If you
want to be polite you would insert the word "please" somewhere in that sentence, but it would not
really alter the meaning of your question, or query. Of course a database would not understand such
a question. A database can only 'understand' a predefined set of statements in a particular order, it
is structured. Now guess what, SQL actually stands for Structured Query Language! This language is
actually the part about databases I will be explaining to you in this article.
I should mention that SQL Server does not use the ANSI and ISO SQL standard syntax completely.
Instead it uses T-SQL (Transact-SQL), which is one of the dialects of the SQL standard. This basically
means that the queries in the article might also work in other SQL databases such as Oracle, MySQL,
PostgreSQL and Firebird, but that this is not guaranteed (also because all these other databases have
their own dialects as well). This article is not about differences in SQL dialects, but I did want to
mention it here.

Before we start we need to know what we are going to query. As you might know SQL Server is
a relational database. It is called this because it stores data in relations (and not because there is a
relation between one piece of data and another, as many think). A relation is a mathematical term for
a collection of unique and unordered tuples, or sets of values (or attributes). In SQL Server these are
represented in tables, a collection of rows and columns. These tables are exactly what we are going
to query. The following picture shows a part of the Production.Product table from the database we
just installed.

Let us take a more practical look at this. Let us assume we want to store our CD collection in a
database. One table could be a collection of CD objects. For simplicity let us say that a CD has an
artist, a title and a release date. So our table is now a collection of tuples holding two names (an
artist name and a title) and a date (release date). We can now query the database for all CD's by the
artist John Williams or that were released in 2012. Or let us say a CD also has an attribute 'number of
songs'. We might want to know the sum of all songs of all CD's by John Williams. Or perhaps the
average number of songs on a CD (by artist or in total).

How to create, fill and maintain tables is not in the scope of this article. The reader is expected to
have at least some knowledge about what a table is and what it looks like.

5. Our first query; the SELECT statement


I hear you thinking "when do we FINALLY get to see some SQL!?". Well, now.

Hide Copy Code


SELECT 'Hello, query!'

Remember when you asked your friend for a drink? You could have said "GET Drink". He probably
would not have gotten it with that tone (which is commanding rather than asking). Luckily SQL
Server does anything you ask. Except we need to replace the GET in this example with SELECT.
The SELECT keyword is the first word you will use in most queries. Actually any query consists of at
least the SELECT statement. In the above query the database simply returns a table with one row
and one column with the value 'Hello query' (excluding the apostrophes).

When selecting data we need to seperate the 'things' we want to select with a comma. The next
example produces one row with two columns. In the SELECT list the two values are seperated by a
comma.

Hide Copy Code


SELECT 'Hello, query!', 'A second value'

Of course the above query is not very useful. We want to query actual data stored by the database.
In order to get this data we need to first indicate where the database should get the data. This is
done by using the FROMstatement. Let us say we want to select some data about products from the
AdventureWorks2012 database.

Hide Copy Code


SELECT ProductID, Name, ProductNumber FROM Production.Product

Actually that query makes a lot of sense, doesn't it? ProductID, Name and ProductNumber are
columns of the Production.Product table (Production is actually the namespace of the
table Product, the namespace and table name need to be unique across a database). Notice again
that the columns are seperated by a comma. The result of the query is a table with three columns
and as many rows as you have products in the Production.Product table.

A shortcut to select all columns from a table is the following syntax.


Hide Copy Code
SELECT * FROM Production.Product

This will return everything from the Production.Product table. It is strongly recommended to
NOT use this syntax though. This has mainly to do with underlying table definitions and is outside
the scope of this article. However, I have written about a little real-life experience using SELECT *,
so if you are interested you can read it: The Evil That is "Select *". Having said that I am still going to
use this syntax a lot in this article, just don't use it in production code!

The list of columns to select can appear in any order and do not have to be in the order they appear
in columns. The following query shows an example.

Hide Copy Code


SELECT Color, ProductNumber, Name
FROM Production.Product

So let us take a look at the general SELECT... FROM syntax. In general you can say it look like this:

Hide Copy Code


SELECT <List of columns> FROM <Table name>

5.1 Aliasing

There are a few more things you can do with even the simplest of queries. One of them is aliasing.
We are actually going back to the very first query, SELECT 'Hello, query!'. Take a look at the output,
do you notice something?

The one column returned by this query has no name! Of course it has no name. When we select
columns from a table SQL Server takes over the names of the selected columns in the result by
default. We did not select from a table in this example though, and as a result SQL Server does not
know how to name the column. See what happens when you select an additional value. You now
have two columns without a name. Luckily we are not forced to talk about columns like He Who
Must Not Be Named. We can actually assign aliases to columns. There are a few ways in which you
can do this and I am going to show you one (because it is the recommended syntax).
The AS <alias> syntax can give new names to your result columns. Notice that the AS keyword is
optional.

Aliasing is also allowed when selecting columns from a table.

Hide Copy Code


SELECT
ProductID AS ID,
Name AS ProductName,
ProductNumber AS ProductNumber,
Color
FROM Production.Product

As you can see it is possible to give new names to your result columns (such
as ID and ProductName), alias with the same name as the column (ProductNumber simply
stays ProductNumber) or not alias at all (Color is not aliased). Aliasing columns becomes a must
when you are going to manipulate data or use functions, which we will see later on in the article.
Column aliases cannot be used in other parts of your query such as WHERE, HAVING and ON clauses
(which are discussed later in this article). The only exception to this rule is the ORDER BY list (which
will also be discussed later).

It is also possible to alias tables. This is especially useful when selecting from multiple tables. I will
come back to that later.

So the general syntax for querying tables now looks as follows:

Hide Copy Code


SELECT
<Column name> (AS Alias)
...
FROM <Table name> (AS Alias)

We have now looked at the very basics of querying in SQL Server 2012. In the next chapter we are
going to look at how we can limit our results.

6. Eliminating duplicates; DISTINCT


Sometimes queries can return duplicate rows. Many products have the same color. If we would select
only the color from the products table we would get the color of each individual product. This looks
as follows:

Surely this is not what we intended. This is also not relational. A relation is a collection of unique and
unordered values, remember? Luckily it is fairly easy to remove duplicate results using
the DISTINCT keyword. The DISTINCT keyword is placed right after SELECT. The following query
demonstrates this.
All products have nine colors in total (NULL is not a color, this is discussed in the next section).

7. Filtering data; the WHERE clause


Of course you will want to write more advanced queries. You might want to select all the products of
a specific color, products that are more expensive than $1000 or products that do not have a price at
all. This is called filtering data.

Filtering in SQL Server can be done in three ways, using the ON, WHERE and HAVING clauses. In this
chapter I will discuss the WHERE clause. The ON and HAVING clauses will be discussed later in this
article.

Filtering in SQL is based on predicates. A predicate is another mathematical term meaning a function
that returns TRUE or FALSE (also called Boolean). For example, imagine a function that checks if a
person is older than 18. Such a function would return true of false, someone is either older than 18
or he is not.

Let us look at a simple example in SQL. We want a list of all products that are more expensive than
$1000. We can do this using the WHERE clause.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice > 1000

Running this query returns 86 of the 504 products in the Production.Products table. Notice how
you can use the following signs for comparison:

Hide Copy Code


> = greater than
< = less than
>= = greater than or equal to
<= = less than or equal to
<> = not equal to (equal to != in C-based languages)
= = equal to (equal to == in C-based languages)

You are probably already used to this kind of syntax. SQL supports it too (among others). In this
example ListPrice > 1000 is the predicate. When using the WHERE clause SQL discards all results
where the predicate returns FALSE (in this case the predicate returns FALSE when a
products ListPrice is not greater than 1000).

There is a catch however. Consider the following query:

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Weight >= 0

We have 504 products in the Production.Product table and there are no products with a negative
weight. Still this query only returns 205 results. What happened to 399 products!? If you take a look
at the product table you might notice that many cells have the value NULL. This is where stuff gets
tricky. NULL is actually not a value. It is an indication that the attribute of this tuple (or column of this
row) has no value. So in the above query how are we going to compare NULL (or no value) to 0? We
might argue that in this case NULL is equal to 0, but SQL Server simply treats NULL as not equal to
anything else. NULL is not even equal to NULL. Although the following syntax is not covered in this
article it should be pretty straightforward.

Luckily not all is lost and we can use the word IS to compare values with NULL.

So I now have to correct myself. Earlier I said a predicate is a function that returns TRUE or FALSE.
When working with NULLs it can return an additional value, UNKNOWN. This is called Three-Valued
Logic. SQL discards results where a predicate in the WHERE clause returns FALSE or UNKNOWN.

The following query returns all products where Weight has no value (some might say "where the
value is NULL" although that is not completely correct).

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Weight IS NULL
It is possible to filter data by columns that are not in your SELECT list. Consider the following query
which is completely valid and also returns a correct result (that is the result we would expect it to
return).

Hide Copy Code


SELECT ProductNumber, Name
FROM Production.Product
WHERE ListPrice > 1000

7.1 The NOT keyword

To negate a predicate you can use the NOT keyword. For example, say you want products that are
not red. You now have two options, using <> for comparison or using = combined with
the NOT keyword.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Color <> 'Red'

Or:

Hide Copy Code


SELECT *
FROM Production.Product
WHERE NOT Color = 'Red'

Both queries return exactly the same results.

You can also use NOT in an IS NULL statement as IS NOT NULL. So the following queries return the
same results

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Weight IS NOT NULL

Returns the same result as:

Hide Copy Code


SELECT *
FROM Production.Product
WHERE NOT Weight IS NULL

The following query shows perfectly valid syntax, but you should beware of such queries. It returns all
rows where Weight has no value, but the double negative makes it confusing.
Hide Copy Code
SELECT *
FROM Production.Product
WHERE NOT Weight IS NOT NULL

7.2 Combining predicates

We can combine predicates to create more advanced filters. We can do this using
the AND or OR keywords. The following query returns all products where the ListPrice is greater
than 1000, but only if it also has the color red.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice > 1000
AND Color = 'Red'

Try it and see the results for yourself. We know that 86 products are more expensive than $1000. This
query returns 20 rows, so 20 products are more expensive than $1000 and have a red color.

Next to using the AND keyword we can also chain predicates by using OR. The following query returns
all products that are more expensive than $1000 or have a red color. This means that products that
are $1000 or less expensive are also returned in the result, but only if they are red.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice > 1000
OR Color = 'Red'

Now look at the following query. Will it return black and red products that are more expensive than
$1000 or will it return red products that are more expensive than $1000 and also any priced black
products?

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice > 1000
AND Color = 'Red'
OR Color = 'Black'

When you run this query you can see that it is the second, the query returns red products that are
more expensive than $1000 and also any priced black products. The AND keyword takes precedence
over the ORkeyword. Because this is not always clear it is considered best practice to use parenthesis,
even when the query already does what you want. Parenthesis always have the highest precedence.
That means that what is between parenthesis is evaluated first.
So let us say we did not want red products more expensive than $1000 and all black products.
Instead we wanted all products more expensive than $1000 that are either red or black. With
parenthesis this is easily accomplished.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice > 1000
AND (Color = 'Red'
OR Color = 'Black')

Run this query and check that all products are now more expensive than $1000.

One last thing to mention is that NOT takes precedence over AND and OR. So in the following
example the results will contain only red products that are not more expensive than $1000 and all
black products.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE NOT ListPrice > 1000
AND Color = 'Red'
OR Color = 'Black'

Now suppose you want all products except red and black products that are more expensive than
$1000. You can use nested parenthesis and the NOT keyword to simply accomplish this task.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE NOT (ListPrice > 1000
AND (Color = 'Red'
OR Color = 'Black'))

7.3 Filtering strings

In the previous examples we have filtered numeric values and strings (text). This was easy enough. A
numeric value does not need any special treatment and a string value should be placed between
apostrophes.

So how will we filter parts of strings? For example we want all products that begin with the letter 'B'.
This can be done with the LIKE operator. When using the LIKE operator you can use the % sign to
indicate 'none, one or more characters'. Consider the following query that returns all products
starting with a 'B'.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Name LIKE 'B%'
So the Name should start with B and then have 'none, one or more characters'. So how would we
query all products that end with a 'B'? We want none, one or more random characters and then a 'B'.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Name LIKE '%B'

Notice that this query actually returns all products that end with 'b' or 'B'. SQL is not case sensitive by
default, although you could make it case sensitive. This is not covered in this article.

And of course we can use LIKE to find certain words or characters in a string. The following query
returns all products that are locks (at least according to their names).

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Name LIKE '%Lock%'

And the following query returns all gear for road bicycles that are black.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Name LIKE '%Road%Black%'

The % sign is called a wildcard. The following wildcards can be used in the LIKE operator:

Hide Copy Code


% = none, one or more characters
_ = a single character
[<character list>] = a single character from the list
[<character range>] = a single character from the range
[^<character list or range>] = a single character that is not in the list or range

Here are some examples of all wildcards. Notice that the first and second example return the same
results.

Hide Copy Code


-- First character is anything
-- Second character is an L
-- Third character is a space
SELECT *
FROM Production.Product
WHERE Name LIKE '_L %'

-- First character is an M, H or L
-- Second character is an L
SELECT *
FROM Production.Product
WHERE Name LIKE '[MHL]L%'

-- First character is an A, B or C
SELECT *
FROM Production.Product
WHERE Name LIKE '[A-C]%'

-- First character is not an A, B or C


SELECT *
FROM Production.Product
WHERE Name LIKE '[^A-C]%'

7.4 Filtering dates

Another point of interest is filtering dates. SQL Server supports multiple date types such
as DateTime, SmallDateTime and DateTime2. The differences between them is outside the scope
of this article. There are several ways of filtering options for dates. A date is represented by a string in
a certain format. I am from the Netherlands and over here we would write a date as follows: 31-12-
2013 (that is 31 december 2013). Other countries have their own format. In America the same date
would be written as 12/31/2013. This is problematic because you don't want a query to return
different results in different countries. For that reason it is recommended you always use the
following format: 20131231, that is four digits for year, two for month and two for the day (also
yyyyMMdd). January first would be written as 20130101. This format is not specific for any country,
or culture neutral. So let us try it out.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ModifiedDate = '20080311'

Running this query returns no results. What went wrong!? Another 'problem' with dates is that they
often have times (other than 00:00:00.000). The following query would give you results (appearently
all products were modified in exactly the same milisecond).

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ModifiedDate = '20080311 10:01:36.827'

Of course such syntax is tedious and selecting all data from a day or a range of dates would become
impossible. For that reason you should search for records where the date is greater than or equal to
the date you are searching for and less than the day after that date. In this example we would use
the following query.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ModifiedDate >= '20080311'
AND ModifiedDate < '20080312'

A last word of caution when filtering dates. The format 2013-12-31 (or yyyy-MM-dd) is an ISO
standard and might seem to work well, but for historic reasons this format is NOT independent of
culture for the DateTimeand SmallDateTime data types.

There are other ways to filter dates such as using functions, but they are not discussed until part II of
this article.

7.5 The IN and BETWEEN keywords

You can also filter data using the keywords BETWEEN and IN. Both let you filter for a range of values.
The difference is that BETWEEN uses consecutive values while IN lets you use any set of values.

The following query returns all products where the price is between $100 and $200 (including both
$100 and $200).

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice BETWEEN 100 AND 200

The query is equivalent to the following:

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ListPrice >= 100
AND ListPrice <= 200

The IN operator allows you to specify the values you want to find delimited by a comma. Let's say
we want all products with product model 15, 25 or 30. The following query would return the proper
results.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ProductModelID IN (15, 25, 30)

The query is equivalent to the following:

Hide Copy Code


SELECT *
FROM Production.Product
WHERE ProductModelID = 15
OR ProductModelID = 25
OR ProductModelID = 30

8. Sorting data; ORDER BY


A lot of times you will want to sort your data, that means presenting your data in a specific sorting
order. For example, you might want to create an overview of all orders that were made last month
and view your latest orders first. As I already mentioned data in SQL Server is stored in relations,
which is an unordered collection of (unique) rows. Practically this might not be the case and SQL
Server will always show rows in order of the primary key (that what makes a record unique). This also
means that whenever you query a table the rows are almost always returned in that order too. There
is a catch though. SQL Server can never guarantee that order. That might sound strange if you query
a database a thousand times and the order of the returned rows is the same every time. Still, the
order is never guaranteed and if you want a specific ordering you will have to explicitly indicate this.
This can be done by using the ORDER BY clause.

So let us look at an example. The following query shows all products in alphabetical order.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY Name

Check out the results and you might notice that this is (probably) the first time a result set is not
ordered by ProductID. Since we did not indicate an order direction SQL sorts the data in ascending
order. You can explicitly indicate ascending order by using ASC, but this is not required. It is also
possible to order data in descending order. In this case you do have to use the direction order
explicitly. You can use the DESC keyword to indicate descending ordering. The following query
returns all products from Z to A.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY Name DESC

It is also possible to order by multiple columns. This is of course only useful when the first column
you order by has multiple rows with the same value. If each value in the first column would be
unique (such as Name) then it is not possible to order the following columns within that group. The
following query returns all products ordered by color name and then by price descending. The first
color in the alphabet we have is Black. So the most expensive black product is returned first, after
that the second most expensive black product etc. After the least expensive black product comes the
most expensive Blue product etc.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY Color, ListPrice DESC

When running this query you might notice something. It is not Black that is the color with the
'lowest' value. It is actually NULL (or no value). That NULL certainly makes things hard, huh? Luckily
this is pretty obvious, even if you did not expect it or disagree with it. When ordering 'no
value' comes before any value.

You might remember that the ORDER BY list can contain column aliases too. This has to do with the
order in which SQL Server internally processes queries. The ORDER BY is actually the only statement
that is internally handled after the SELECT (even though SELECT appears as the very first keyword in
a query) making it possible for ORDER BY to use aliases. I will show an example where the ORDER
BY takes a column alias as parameter.

Hide Copy Code


SELECT
Name,
ProductNumber AS Number
FROM Production.Product
ORDER BY Number

Just like with a WHERE clause it is possible to order data by a column that is not in your SELECT list.
There is one exception to this rule however. When using DISTINCT in your query you cannot order
by columns that are not in the SELECT list. The reason is that when duplicates are removed the result
rows do not necessarily map to the source rows in a one-to-one manner. The following query shows
this.

This query would return ten rows (as we saw earlier), but the table contains 504 unique names. SQL
Server can not order ten rows based on 504 values. Based on what product names should it order
the colors black and yellow? That might be a problem...

It is also possible to order by the (1-based) index of the column, but this is considered bad practice.
The following example shows this. In this query the result is ordered by Name (because it is the first
column in the SELECT list).

Hide Copy Code


SELECT Name, ProductNumber
FROM Production.Product
ORDER BY 1

Now imagine that someone would change the order in the SELECT list or would add columns
before Name? Suddenly Name is not the first column in the SELECT list anymore, but the ORDER
BY still orders by the first column. This can have unexpected results. When using column names in
the ORDER BY this would never be a problem.

If you want to specify a WHERE clause and an ORDER BY the ORDER BY comes last in the query.

Hide Copy Code


SELECT *
FROM Production.Product
WHERE Color = 'Red'
ORDER BY Name

9. Further limiting results; TOP and OFFSET-


FETCH
9.1 TOP

You might not always want to return all rows from a query. For example because returning all rows
would return so many rows that your application would be unresponsive for hours while fetching the
data. Or maybe you are only interested in the top ten most expensive products or perhaps the top
ten percent. Of course this is also possible in SQL using the TOP statement, which can be used as
follows.

Hide Copy Code


SELECT TOP 10 *
FROM Production.Product

This query returns the first ten products. This makes little sense though. Remember that SQL Server
guarantees no order in its results? Simply selecting a top ten theoretically means we are selecting
any ten random products. More useful is to combine TOP with an ORDER BY clause. The next query
gets the top ten most expensive products.

Hide Copy Code


SELECT TOP 10 *
FROM Production.Product
ORDER BY ListPrice DESC

When you run this query you might notice a lot of duplicate prices. We are selecting a top ten, but
actually the next three results (which are not returned) are just as expensive as the last row that is
returned. And even if we returned those three rows we would still only have our three highest prices
(which might or might not be what you want). For this example we really do want the most expensive
products even if they only make up for the three highest prices (there are methods to actually get
the top ten highest prices, but that is not discussed here). We do want to return the three rows that
have the same price as the last row returned though. This can be done by adding WITH TIES to the
query.

Hide Copy Code


SELECT TOP 10 WITH TIES *
FROM Production.Product
ORDER BY ListPrice DESC

We can also use TOP to select a certain percentage of our result. This can be done by using
the PERCENTkeyword.

Hide Copy Code


SELECT TOP 10 PERCENT *
FROM Production.Product
ORDER BY ListPrice DESC

This query returns 51 results. The Product table actually has 504 rows. Ten percent of 504 is actually
50.4. We would not be satisfied if SQL Server returned 0.4 row though, so instead it returns the
ceiling (always rounds up). 50.4 thus becomes 51. WITH TIES can be added to the query again,
which would return 55 rows instead of 51.

One thing I should mention is that the number in TOP can be parameterized. This is outside the
scope of this article, but I will show an example nonetheless.

Hide Copy Code


DECLARE @Top AS INT
SET @Top = 10

SELECT TOP (@Top) WITH TIES *


FROM Production.Product
ORDER BY ListPrice DESC

9.2 OFFSET-FETCH

What if you wanted to skip a couple of rows instead of selecting a top? This is a common scenario
for paging. For this purpose you can use the OFFSET-FETCH clause. Let us skip the top ten most
expensive products and select the next 25 most expensive. With <code>OFFSET we can specify how
many rows we want to skip and with FETCH we can specify how many rows we do want to return.
The following query illustrates this.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY ListPrice DESC
OFFSET 10 ROWS FETCH NEXT 25 ROWS ONLY
You might notice a few things. First of all I started off with including an ORDER BY right away.
The ORDER BY is actually mandatory when using OFFSET-FETCH. Second, the OFFSET-FETCH clause
actually consists of two parts, the OFFSET and the FETCH part.

Let us take a closer look at the two parts first. The OFFSET part is mandatory if you want
a FETCH part. The FETCH part is optional however. Not including the FETCH part makes you skip a
number of rows and return all the other rows. You might want to skip 100 rows and return all the
remaining rows.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY ListPrice DESC
OFFSET 100 ROWS

You can also specify an OFFSET of 0. This would select all rows.

Using an OFFSET of 0 and specifying a FETCH is equivalent to a TOP statement. The following query
selects the top 25 rows using OFFSET-FETCH.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY ListPrice DESC
OFFSET 0 ROWS FETCH FIRST 25 ROWS ONLY

You might notice I used the keyword FIRST instead of <code>NEXT. They are completely
interchangeable, but since we are here selecting a top 25 it might be more correct to
use FIRST instead of NEXT. The ROWS keyword can be replaced by ROW, which would look better if
you have an OFFSET or FETCH of one.

When using OFFSET-FETCH an ORDER BY is required. Since order is not guaranteed in SQL Server it
would be logical to use an ORDER BY, but maybe you really do want to skip a few random rows after
which you select a few other random rows. In this case you can use the following ORDER BY clause.

Hide Copy Code


SELECT *
FROM Production.Product
ORDER BY (SELECT NULL)
OFFSET 1 ROW FETCH FIRST 25 ROWS ONLY

Since OFFSET-FETCH is a SQL standard and TOP is not it is recommended to use OFFSET-
FETCH whenever possible. There are no WITH TIES or PERCENT equivalents for OFFSET-
FETCH though. So if you need that functionality TOP would be your only choice.

10. Aggregating data; GROUP BY and HAVING


Quite often you will want to group data. You might want to know how much money you made from
all orders, from all orders in a specific year or from each customer. The following query simply counts
the rows in the Sales.SalesOrderHeader table. The query returns a single value only.

Hide Copy Code


SELECT COUNT(*)
FROM Sales.SalesOrderHeader

COUNT is known as an aggregate function. COUNT(*) specifically means count rows. What this query
did was group all rows into a single group and count the rows for that group. Now suppose you
want to know how much orders each customer has placed. We want to create a single group for
every customer and then count the number of rows in that group. This is accomplished by using
the GROUP BY clause. The following query shows this scenario.

Hide Copy Code


SELECT CustomerID, COUNT(*)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID

In this query the GROUP BY clause is mandatory. Because COUNT can only be applied to groups
and CustomerIDby itself is not a group you either need to remove CustomerID so a single group is
assumed or explicitly create a group per CustomerID by using the GROUP BY clause.

Let us slightly modify that last query so it shows us how many sales persons each customer has. This
can be done by replacing the * in COUNT(*) with the name of the column you want to count.

Hide Copy Code


SELECT CustomerID, COUNT(SalesPersonID)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID

Take a little time to look at the result and make sure the result is correct. We see a lot of customers
with no sales persons. COUNT ignores NULLS and simply does not count them. So was the result
correct? No, it was not! For example, the customer with ID 29484 only has one distinct sales person,
yet our results indicate there were seven! Actually COUNT does not count the number of unique
values, it simply counts values. So the query above gives us the number of orders per customer
where sales persons has a value. Luckily we can easily fix this.

Hide Copy Code


SELECT CustomerID, COUNT(DISTINCT SalesPersonID)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID

By adding DISTINCT inside the COUNT function we get a count of all unique values (still
excluding NULLS). We can also make groups based on multiple columns. For example we want to
know the number of sales persons per customer per ship to address. In that example we could see
that a single customer uses one or more shipping addresses and the number of sales persons that
serve a single shipping address. In this database there is only one shipping address per customer
which you can check by running the following query.

Hide Copy Code


SELECT
CustomerID,
ShipToAddressID,
COUNT(DISTINCT SalesPersonID)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, ShipToAddressID

We still see a lot of customers that do not have a sales person. For our purposes we are not
interested in those. We can still use a WHERE clause to order out the NULL values.

Hide Copy Code


SELECT
CustomerID,
COUNT(DISTINCT SalesPersonID)
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY CustomerID

Notice that the GROUP BY clause comes after the WHERE clause. So in this example the WHERE clause
filters out all rows where SalesPersonID has no value and then the remaining rows are grouped. But
what if we wanted to filter groups? Let us say we want only customers that have more than one sales
person. The WHERE clause filters on a row basis and not on a group basis. The following query is not
valid syntax.

The error message already gives us the solution. We need to use the HAVING clause in order to filter
groups. The following query solves our problem.

Hide Copy Code


SELECT
CustomerID,
ShipToAddressID,
COUNT(DISTINCT SalesPersonID)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, ShipToAddressID
HAVING COUNT(DISTINCT SalesPersonID) > 1

HAVING does largely the same as WHERE, except that it filters groups and as such can have aggregate
functions. You can use a <code>WHERE and HAVING clause in the same query. Make sure you
understand the difference between them though.

There are other grouping functions as well. The following table lists the most common ones. All can
be combined with DISTINCT to use only unique values. Like COUNT the other functions also
ignore NULLs.

Hide Copy Code


COUNT(*) = counts the number of rows
COUNT(column) = counts the number of values in the column
AVG(column) = gives the average value of the values in the column
MAX(column) = gives the maximum value in the column
MIN(column) = gives the minimum value in the colum
SUM(column) = adds all values in the column

The following query shows how the functions can be used.

Hide Copy Code


SELECT
CustomerID,
COUNT(*) AS NoOfOrders,
AVG(SubTotal) AS AverageSubTotal,
MAX(SubTotal) AS MaxSubTotal,
MIN(SubTotal) AS MinSubTotal,
SUM(SubTotal) AS TotalSubTotal
FROM Sales.SalesOrderHeader
GROUP BY CustomerID

In this last example I aliased all of the aggregated columns. You might have noticed, but SQL Server
cannot give names to aggregated columns. It is considered best practice to alias these columns.
When we order results by aggregated columns we now have two choices. We can either order by the
aggregate function, which makes our query rather cluttered or we can order by alias. The following
queries show the difference.

Hide Copy Code


SELECT
CustomerID,
COUNT(*)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY COUNT(*)

SELECT
CustomerID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY NoOfOrders

Just like with DISTINCT the ORDER BY clause can now not contain columns that are not present in
the SELECTlist.

11. Selecting from multiple tables; using JOINS


You might have noticed that in the last part we worked with CustomerID's and SalesPersonID's
without knowing what these ID's mean. The CustomerID in the SalesOrderHeader table has a
corresponding row in the Customer table, enforced by a Foreign Key Constraint, where we can get
more information about the customer. Take for example the first order in
the SalesOrderHeader table.

Hide Copy Code


SELECT TOP 1
SalesOrderID,
SalesOrderNumber,
CustomerID,
SubTotal
FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID

If we wanted to know the name of this customer we could look it up in a seperate query and get the
customer with that ID. We do not want that however. We need the name of the customer in the
same result set as the order information. We can select from multiple tables using two methods.

The first method to select from multiple tables, which is not the recommended one, is to add more
tables to the FROM clause. The following example shows how this can be done. Do NOT run the
following query.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader, Sales.Customer

This query returns the carthesian product of the SalesOrderHeader and the Customer table. That
means that every order is combined with every customer. The query would run for hours on my
machine and should return 623636300 rows (31456 orders * 19820 customers). Clearly this is not
what we want. We only want to return a subset of the carthesian product. So which rows do we want
to see? We only want to return the rows where the CustomerID from the Customer table matches
the CustomerID from the SalesOrderHeader table. That is fairly simple using a WHERE clause.
Note that if you would have orders that do not have a value for CustomerIDthis order would not be
returned.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader, Sales.Customer
WHERE Sales.SalesOrderHeader.CustomerID = Sales.Customer.CustomerID

Now that we are selecting from multiple tables we might need to use the column names including
table names. After all if we only specify CustomerID do we mean the CustomerID from
the SalesOrderHeader table or the CustomerID of the Customer table? This is where table
aliasing comes in handy. You have already seen the syntax for aliasing, so this should not be a
problem. Here is the query written with aliases.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS s, Sales.Customer AS c
WHERE s.CustomerID = c.CustomerID

Notice how for table aliases we use short names, like a single character (for example the first letter of
the table name). When using table aliases you must use the alias. Using the full table name is not
allowed anymore. In some cases aliasing is mandatory like when you select from the same table
twice. Unfortunately the Customertable does not have the Name we are looking for. A Customer has
a PersonID which corresponds to a row in the Person table. This Person has the name we need.
So we are going to need another table to select from and make sure we select only the correct
person for every row in the WHERE clause.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS s,
Sales.Customer AS c,
Person.Person AS p
WHERE s.CustomerID = c.CustomerID
AND c.PersonID = p.BusinessEntityID

So we now select all columns from the SalesOrderHeader, Customer and Person tables. As you
can see our result has multiple columns with the same name
like CustomerID, rowguid and ModifiedDate. So let us select only the columns we want.

Hide Copy Code


SELECT
s.SalesOrderID,
s.SalesOrderNumber,
s.CustomerID,
p.FirstName,
p.LastName,
s.SubTotal
FROM Sales.SalesOrderHeader AS s,
Sales.Customer AS c,
Person.Person AS p
WHERE s.CustomerID = c.CustomerID
AND c.PersonID = p.BusinessEntityID
As mentioned this is not the recommended method to select from multiple tables, so I will not
elaborate on this further.

11.1 CROSS JOIN

Instead we should use the JOIN operator. There are multiple kinds of JOINs. The simplest is
the CROSS JOIN, which returns the carthesian product. Once again, do not run this query for it will
return over 600 million records and takes hours to run (unless you add a where clause of course).

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader
CROSS JOIN Sales.Customer

11.2 INNER JOIN

The next JOIN, and probably the one you will use most, is the INNER JOIN. Actually INNER JOIN is
the most used JOIN and the INNER keyword is optional. The INNER JOIN operator uses
the ON clause to match rows instead of WHERE. The following example illustrates this.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.Customer AS c ON s.CustomerID = c.CustomerID

When using the INNER JOIN results from the main table that do not have at least one match in the
join table are discarded. When the join table has multiple matches then rows from the main table are
duplicated in the result. The following query illustrates that rows are discarded.
The SalesPersonID column in the SalesOrderHeadertable contains NULLs. Obviously the rows
where there is no value for SalesPersonID cannot be matched with a sales person from
the SalesPerson table. If you run the following query you will see that these rows are not included
in the result.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader
JOIN Sales.SalesPerson ON SalesPersonID = BusinessEntityID

The following query shows that the SalesOrderHeader rows are duplicated when joined with
the SalesOrderDetail table. An order can have more than one detail and for each detail the sales
order header is duplicated.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
ORDER BY soh.SalesOrderID

The ON clause can contain more advanced filters, just like the WHERE clause. We can, for example, join
only on details with a discount.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
AND sod.UnitPriceDiscount > 0
ORDER BY soh.SalesOrderID

An alert reader might have noticed that the sod.UnitPriceDiscount > 0 part could just as well
be in the WHERE clause of the query. Indeed, in an INNER JOIN the ON and WHERE clause are
interchangeable. The following query is equivalent to the query above.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
WHERE sod.UnitPriceDiscount > 0
ORDER BY soh.SalesOrderID

Any row that does not have a match with a row from the other table is discarded in the result, so it
does not matter if the ON clause filters a row in the SalesOrderDetail table which is then also
discarded from the SalesOrderHeader table or if the WHERE clause filters the rows after a match
was found in the ON clause.

The ON clause is always mandatory when working with INNER JOINS. That does not mean you have
to check for equality between two rows. The following is also perfectly valid syntax and is equal to
a CROSS JOIN (or carthesian product) because every row in the SalesOrderDetail table matches
any row in the SalesOrderHeader table (because the 1 = 1 predicate always returns <code>TRUE).

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader
JOIN Sales.SalesOrderDetail ON 1 = 1

11.3 OUTER JOIN

We just saw that with an INNER JOIN rows that do not have a match in the joined table are
discarded. With an OUTER JOIN we can preserve these results. There are three kinds of OUTER
JOINs, one that preserves rows from the main table if no match is found, one that preserves rows
from the joined table and one that preserves both. Let us join the SalesOrderHeader table with
the SalesPerson table again, but this time we want to preserve sales orders that do not have a
sales person.
Hide Copy Code
SELECT *
FROM Sales.SalesOrderHeader
LEFT OUTER JOIN Sales.SalesPerson ON SalesPersonID = BusinessEntityID

As you can see the LEFT OUTER JOIN preserves SalesOrderHeaders (the LEFT part of the JOIN)
when no SalesPerson row can be found. In the result set all SalesPerson columns have no value
for these rows.

The following query joins orders with customers. Not every customer has placed an order though, so
we can choose to keep these customers in the result set and return empty orders for these
customers.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
RIGHT OUTER JOIN Sales.Customer AS c ON c.CustomerID = soh.CustomerID
ORDER BY c.CustomerID

As you can see the RIGHT OUTER JOIN preserves Customers (the RIGHT part of the JOIN) when
no SalesOrderHeaders rows can be found.

The last type of OUTER JOIN is the FULL OUTER JOIN and preserves rows from both sides of
the JOIN, so basically it is both a LEFT OUTER JOIN and a RIGHT OUTER JOIN. In
the SalesOrderHeader table there are orders that have a CurrencyRateID, but there are also
orders without a CurrencyRateID. That would at least qualify for a LEFT OUTER JOIN, but there
are also CurrencyRates that do not have orders, which would also qualify a join between the tables
for a RIGHT OUTER JOIN. In this case, if we want to keep both orders without a currency rate and
currency rates without orders we can use the FULL OUTER JOIN.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
FULL OUTER JOIN Sales.CurrencyRate AS c ON c.CurrencyRateID = soh.CurrencyRateID
ORDER BY soh.SalesOrderID

The OUTER keyword is optional in all OUTER JOINS. So LEFT JOIN, RIGHT JOIN and FULL
JOIN are also valid syntax.

Remember that with an INNER JOIN the ON clause and WHERE clause had the same filtering
function? Well, that is not the case for the OUTER JOIN. The OUTER JOIN discards rows on one side
of the join, but keeps them on the other side. So the ON clause is used for matching purposes only.
When adding an extra predicate to the ONclause it just means the right or left side of the join will
have no values if the predicate results in FALSE (in contrary to the INNER JOIN where the result was
discarded as a whole). For example, the following two queries both return very different results.

Hide Copy Code


SELECT *
FROM Sales.SalesOrderHeader AS soh
FULL JOIN Sales.CurrencyRate AS c ON c.CurrencyRateID = soh.CurrencyRateID
AND c.AverageRate > 1
ORDER BY c.AverageRate

SELECT *
FROM Sales.SalesOrderHeader AS soh
FULL JOIN Sales.CurrencyRate AS c ON c.CurrencyRateID = soh.CurrencyRateID
WHERE c.AverageRate > 1
ORDER BY c.AverageRate

The first query returns all orders, but discards the currency rate for an order if it has an average rate
that is higher than one. The second query only returns orders that have an average rate that is higher
than one.

11.4 Self JOIN

A table can be joined with the same table. Examples of why you would want to do this might be
because an Employee has a ManagerID, which corresponds to a manager who is also an Employee.
Unfortunately I could not find an example in the AdventureWorks2012 database where a
self JOIN would make sense. That does not mean I cannot show you. The next example selects from
the Person.Person table and joins with the Person.Person table, selecting the Person with the
current Persons ID + 1. Of course this is not really a valid business case and the join fails when
ID's are not consecutive, but it shows how you can join a table with itself.

Hide Copy Code


SELECT
p1.BusinessEntityID AS CurrentID,
p1.Title AS CurrentTitle,
p1.FirstName AS CurrentFirstName,
p1.LastName AS CurrentLastName,
p2.BusinessEntityID AS NextID,
p2.Title AS NextTitle,
p2.FirstName AS NextFirstName,
p2.LastName AS NextLastName
FROM Person.Person AS p1
LEFT JOIN Person.Person AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID + 1
ORDER BY CurrentID, CurrentFirstName, CurrentLastName

11.5 Multiple JOINS

In previous examples we selected Customer data based on the CustomerID in


the SalesOrderHeader and then we selected Person data using the PersonID in
the Customer table. We can also use JOINs for this task. In fact, it is possible to use all kinds of joins
in a single query. Beware that one JOIN may influence the other. In the following example you might
think that only orders that ultimately have a customer with a person will be selected (because
the JOIN on Person is an INNER JOIN). But because the JOIN with customer is a RIGHT JOIN and
the person is joined on customer not every person has an order.
Hide Copy Code
SELECT
s.SalesOrderID,
s.SalesOrderNumber,
c.CustomerID,
p.BusinessEntityID,
p.FirstName,
p.LastName,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
RIGHT JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
JOIN Person.Person AS p ON p.BusinessEntityID = c.CustomerID
ORDER BY p.BusinessEntityID

12. Multiple groups; GROUPING SETS


12.1 GROUPING SETS

We have taken a little break from grouping operators to explore the JOIN options that are available
to us. There is more to grouping than we have so far discussed though. A query can have multiple
groupings in a single query. We might want to know the number of orders in total (or avg, max, min
or sum), the number of orders per customer, per sales person and per customer and sales person.
This can be achieved with the GROUPING SETSclause. The following query shows an example.

Hide Copy Code


SELECT
CustomerID,
SalesPersonID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY GROUPING SETS
(
(CustomerID ),
(SalesPersonID ),
(CustomerID, SalesPersonID ),
( )
)
ORDER BY SalesPersonID,
CustomerID

The results of this query are a little difficult to interpret. The first results look as follows:
The first result we see is the total amount of orders. The second row looks like it has the same data
except the NoOfOrders is different from the first. This is actually the number of orders that do not
have a sales person (SalesPersonID is NULL). The rows after that look exactly the same. One
represents a sales order for a specific customer and a specific SalesPersonID (which just happens
to be NULL) and the other represents a sales order for a specific customer regardless
of SalesPersonID.

When we look at later results where SalesPersonID actually has a value this may become clearer.
Here we see that the first three rows contain the number of orders for a specific customer with a
specific sales person. The fourth row has no customer and thus shows the number of orders for that
sales person regardless of customer. The rows that follow break down his sales numbers per
customer until another row follows where customer is empty and the next sales person starts.

How can we know if SalesPersonID is part of a group if it has no value? You might be tempted to
remove orders that have no sales person from our query, but beware! The following query gives a
much better overview of what each sales person has sold, but the totals for many customers and the
grand total are not correct anymore! After all, orders without a sales person are not counted for
customers and for the grand total.

Hide Copy Code


SELECT
CustomerID,
SalesPersonID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY GROUPING SETS
(
(CustomerID ),
(SalesPersonID ),
(CustomerID, SalesPersonID ),
( )
)
ORDER BY SalesPersonID,
CustomerID

Check that this is actually not the result you expected. The total number of orders has suddenly
shrunk to only 3806, but this is only the total number of orders that have a sales person! This might
or might not be what you want. You have been warned. I will get back to this problem later in this
section.

12.2 CUBE

The CUBE clause is a shortcut for the GROUPING SET clause. In the CUBE clause you can specify the
columns you want to group by and CUBE creates all possible sets including the empty one.
The CUBE equivalent of the previous query is the following.

Hide Copy Code


SELECT
CustomerID,
SalesPersonID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CUBE(CustomerID, SalesPersonID)
ORDER BY SalesPersonID,
CustomerID

CUBE now groups by nothing (so taking the count of all rows), it groups by CustomerID (taking the
count of rows per customer), it groups by SalesPersonID (taking the count of rows per sales
person) and it groups by the combination of CustomerID and SalesPersonID (so taking the count
of rows for a customer with a specific sales person). You can check that the result is the same as the
query above (without filtering the rows with no sales person).

12.3 ROLLUP

ROLLUP is a little more complex than CUBE. Like CUBE it is also a shortcut for the GROUPING
SET clause, but unlike CUBE you use it when you need a certain hierarchy. For example each order
has a BillToAddressID. Each BillToAddressID maps to a row in the Person.Address table.
The Address table has a City field and a StateProviceID which maps to a row in
the Person.StateProvice table. The StateProvince table has a Name and
a CountryRegionCode which maps to a row in the Person.CountryRegion table which has
a Name. So performing a few JOINS we are able to get
the City, StateProvinceName and CountryName for each billing address of an order. The
following query shows how to get there.

Hide Copy Code


SELECT
soh.SalesOrderID,
soh.SalesOrderNumber,
soh.BillToAddressID,
a.City,
sp.Name AS StateProvinceName,
cr.Name AS CountryName
FROM Sales.SalesOrderHeader AS soh
JOIN Person.Address AS a ON a.AddressID = soh.BillToAddressID
JOIN Person.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode
ORDER BY CountryName,
StateProvinceName,
City

This query is not actually so difficult. If you do not understand it right away take your time and break
it down into little pieces (one JOIN at a time).

We need to know how much orders were billed per country, per province and per city. However,
multiple countries or provinces may have the same cities. So counting the number of orders placed
in a single city is not interesting if we do not know its province and country. That creates a hierarchy.
The following example with GROUPING SET shows exactly what I mean.

Hide Copy Code


SELECT
a.City,
sp.Name AS StateProvinceName,
cr.Name AS CountryName,
COUNT(*) AS NoOfOrder
FROM Sales.SalesOrderHeader AS soh
JOIN Person.Address AS a ON a.AddressID = soh.BillToAddressID
JOIN Person.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode
GROUP BY GROUPING SETS
(
(cr.Name, sp.Name, a.City ),
(cr.Name, sp.Name ),
(cr.Name ),
( )
)
ORDER BY CountryName,
StateProvinceName,
City

We do not group by City or Province alone, simply because they are not interesting without the
province and/or country. Even though the concept may be a little confusing the results are pretty
straightforward actually.
Reading from the first row a total of 31465 orders were made. The second row indicates 6843 of
those orders are from Australia. The next row indicates that 3009 orders are from New South Wales
(in Australia). The next rows continue to show the number of orders per city (in New South Wales in
Australia). Until row 22 where City is empty and the province Queensland (in Australia) starts. 1461
orders are from Queensland and then come the number of orders per city in Queensland. After all
the cities and provinces in Australia have been counted we move on to the next country, Canada. All
the way down to Rock Springs in Wyoming, USA.

So now let us look at the shortcut by using ROLLUP.

Hide Copy Code


SELECT
a.City,
sp.Name AS StateProvinceName,
cr.Name AS CountryName,
COUNT(*) AS NoOfOrder
FROM Sales.SalesOrderHeader AS soh
JOIN Person.Address AS a ON a.AddressID = soh.BillToAddressID
JOIN Person.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode
GROUP BY ROLLUP (cr.Name, sp.Name, a.City)
ORDER BY CountryName,
StateProvinceName,
City

You can check that the result is the same as the previous query that used GROUPING SETS.

12.4 GROUPING

I want to address a problem we had earlier when using groups. The query we used in the first
example (and in the CUBE example) returned two rows where
both CustomerID and SalesPersonID had no value. One row represented all orders and the other
row represented all orders that did not have a sales person. How can we know which row is which?
For this we can use GROUPING and GROUPING_ID.

GROUPING takes a column as argument and returns 1 in the result if this column is not part of the
grouping and 0if it is part of the grouping. This might be counter-intuitive. A 1 means it is NOT and
a 0 means it IS. The following example shows how to use this.

Hide Copy Code


SELECT
CustomerID,
SalesPersonID,
GROUPING(SalesPersonID) AS IsSalesPersonNotPartOfGroup,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CUBE(CustomerID, SalesPersonID)
ORDER BY SalesPersonID,
CustomerID,
IsSalesPersonNotPartOfGroup

The result is not somewhat easier to read.

We can now see if a group includes SalesPersonID. And with this function we can also exclude
rows where SalesPersonID is part of the grouping, but also has no value. To accomplish this we
need to only include groups where IsSalesPersonNotPartOfGroup = 1 or
where IsSalesPersonNotPartOfGroup = 0 and where SalesPersonID IS NOT NULL. The
following query demonstrates this (note that I am not selecting IsSalesPersonNotPartOfGroup).

Hide Copy Code


SELECT
CustomerID,
SalesPersonID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CUBE(CustomerID, SalesPersonID)
HAVING GROUPING(SalesPersonID) = 1
OR (GROUPING(SalesPersonID) = 0
AND SalesPersonID IS NOT NULL)
ORDER BY CustomerID,
SalesPersonID

Run this query and check the results. One thing to notice is that the total number of sales orders is
now correct (we could not accomplish this using the WHERE clause).

12.5 GROUPING_ID

Another function you can use is the GROUPING_ID. This function works by turning bits on or off
dependent on whether a column is part of a grouping. The following query shows an example.

Hide Copy Code


SELECT
GROUPING_ID(CustomerID, SalesPersonID) AS GroupID,
CustomerID,
SalesPersonID,
COUNT(*) AS NoOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CUBE(CustomerID, SalesPersonID)
ORDER BY SalesPersonID,
CustomerID

Now this is actually not as practical and easy as you might hope. We pass two parameters
to GROUPING_ID, meaning GROUPING_ID uses two bits that it turns on or off. If SalesPersonID is
part of the group then the bit at the location that is the same as SalesPersonID in the list of
parameters (right most) is turned on. The results show what this looks like.
The first row shows GroupID 2. This requires knowledge of calculating with bits. We passed two
parameters to GROUPING_ID, so it has two bits, 00. When a column is not part of a group a bit is
turned on. In this case 2 in bit notation would be 10, that means CustomerID is not part of the
group and SalesPersonID is. The next value is 3, or 11 in bit notation, meaning both columns are
not part of the group. The next value, 0 (bit notation 00), indicates that both columns are part of the
group. The next value, 1 (bit notation 01) indicates that SalesPersonID is not part of the group.

13. Windowing functions; the OVER clause


Sometimes you want to show the results of grouping data, like the number of orders, without
actually grouping the data in your result. This is possible using window functions. A window function
works like the GROUP BYclause we have seen earlier, except it calculates the result and returns a
single value, making it possible to use these values in your query without grouping it. The 'window' is
defined by your result set. So the result set is passed into a window function, the window function
does the calculation (by grouping and ordering the window) and returns a single value for each row
in your actual result set.

Window functions are internally executed after the WHERE and HAVING clauses. That means you
cannot use them in filters or other clauses except the ORDER BY. Common Table
Expressions or CTE's offer the solution, but they are not discussed until part II of this article.

13.1 Aggregate functions

We can use the aggregate functions COUNT, AVG, MAX, MIN and SUM in a window and apply them to
each row in your result set by using the OVER clause. The following query shows an example of each.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
COUNT(*) OVER() AS NoOfOrders,
COUNT(SalesPersonID) OVER() AS OrdersWithSalesPerson,
AVG(SubTotal) OVER() AS AvgSubTotal,
MAX(SubTotal) OVER() AS MaxSubTotal,
MIN(SubTotal) OVER() AS MinSubTotal,
SUM(SubTotal) OVER() AS TotalSubTotal
FROM Sales.SalesOrderHeader

As you can see we can apply aggregating functions without


grouping SalesOrderID and SalesOrderNumber. Each row now simply shows the number of
orders, the number of orders that have a sales person and the average, maximum, minimum and
total subtotal.

That is pretty useful, but often you want to show these results per group. For example you need to
show the total subtotal for the customer in the row. This can be done using PARTITION BY in
the OVER clause. The following query shows how to use this.
Hide Copy Code
SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
SUM(SubTotal) OVER(PARTITION BY CustomerID) AS TotalSubTotalPerCustomer,
SUM(SubTotal) OVER() AS TotalSubTotal
FROM Sales.SalesOrderHeader

When looking at the result you can see the TotalSubTotal is still the same for every row, but
the TotalSubTotalPerCustomer is the same for each row with the same customer and different
for each row with another customer. Your result set, or window, is grouped by CustomerID (defined
in PARTITION BY) and a single result is returned for each row.

And you can specify multiple columns to partition by. For example you can show the total subtotal
per customer per sales person.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
SalesPersonID,
SUM(SubTotal) OVER(PARTITION BY CustomerID, SalesPersonID)
AS TotalSubTotalPerCustomerPerSalesPerson,
SUM(SubTotal) OVER() AS TotalSubTotal
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, SalesPersonID

13.2 Framing

The window aggregate functions support another filtering option called framing. I had a hard time
getting my head around the concept so I will try to explain this as easy as possible. In the previous
examples we have seen that a window function can calculate a value, for example the sum of a
column, for a specific group, for example customers. With a framing function it is possible to make
this calculation dependent on previous or next rows. This makes it possible to show cumulative
results. Let me show you an example.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
OrderDate,
CustomerID,
SubTotal,
SUM(SubTotal) OVER(PARTITION BY CustomerID
ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS TotalSubTotalSoFarForCustomer
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, OrderDate
So again this query gives us the sum of the SubTotal column, however it gives the sum of all
preceding rows instead of of all rows. This is indicated by the ROWS BETWEEN UNBOUNDED
PRECEDING part, which means sum all sub totals of the previous rows in the current group. The last
part AND CURRENT ROW means until the current row. An ORDER BY clause is mandatory because
without it the ROWS BETWEEN ... AND ... would have no meaning (since it would mean any
random rows). So let us look at the result of the above query.

As you can see in the first row the SubTotal and the TotalSubTotalSoFarForCustomer have the
same value. In the second row the TotalSubTotalSoFarForCustomer has the value of the
previous row plus the SubTotal of the current row. The next row again has the value of the previous
row plus the SubTotal of the current row. The next row starts a new group (a new CustomerID)
and so the SubTotal and the TotalSubTotalSoFarForCustomer are equal again. So using this
syntax we have created cumulative results.

I should mention that I order the partition by OrderDate. That means the window is first ordered by
this column before it makes any calculations. It does not mean your actual result set is ordered in
that same way. If I would not have added the ORDER BY CustomerID, OrderDate at the end of
the query the results would have looked kind of odd (although they would still be correct).

It is possible to have more framing functions in a single query and you can partition and order them
any way you like. If you run the following query however you will see that presenting such data in a
logical order is difficult or even impossible.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
OrderDate,
CustomerID,
SalesPersonID,
DueDate,
SubTotal,
SUM(SubTotal) OVER(PARTITION BY CustomerID
ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS TotalSubTotalSoFarForCustomer,
MAX(SubTotal) OVER(PARTITION BY SalesPersonID
ORDER BY DueDate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS MaxSubTotalBySalesPersonSoFar
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, OrderDate

You can also switch the UNBOUNDED PRECEDING and CURRENT ROW around. Doing this will result in
the inverse of cumulating rows. The first row will now have the sum of all rows for that customer,
while the second row will have the value of the first row minus the SubTotal of the first row.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
OrderDate,
CustomerID,
SubTotal,
SUM(SubTotal) OVER(PARTITION BY CustomerID
ORDER BY OrderDate
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS InverseCumulative
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, OrderDate

Notice that instead of <code>PRECEDING we need to use FOLLOWING because we now need to look
at the next rows instead of the previous.

We can also specify a number of rows that should be looked back to or looked ahead. The following
query gives the sum of the previous row and the current row (even if there are more rows preceding
the current one).

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
OrderDate,
CustomerID,
SubTotal,
SUM(SubTotal) OVER(PARTITION BY CustomerID
ORDER BY OrderDate
ROWS BETWEEN 1 PRECEDING
AND CURRENT ROW) AS SumOfLastAndCurrentRowForCustomer
FROM Sales.SalesOrderHeader
ORDER BY CustomerID, OrderDate

PARTITION BY is optional in a framing function. The following query just returns a value that is
cumulative independent of customer. Be aware that I removed the CustomerID from the ORDER
BY clause as well.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
OrderDate,
CustomerID,
SubTotal,
SUM(SubTotal) OVER(ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS TotalSubTotalSoFar
FROM Sales.SalesOrderHeader
ORDER BY OrderDate

13.3 Ranking functions

Ranking functions can rank rows in a window based on a specified ordering. There are four ranking
functions within SQL Server; ROW_NUMBER, RANK, DENSE_RANK and NTILE. An ORDER BY clause is
mandatory, a PARTITION is not (if it is not specified the entire window is considered one group). The
following example shows how to use the functions.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
ROW_NUMBER() OVER(ORDER BY CustomerID) AS RowNumber,
RANK() OVER(ORDER BY CustomerID) AS [Rank],
DENSE_RANK() OVER(ORDER BY CustomerID) AS DenseRank,
NTILE(5000) OVER(ORDER BY CustomerID) AS NTile5000
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

ROW_NUMBER speaks for itself. It simply returns the number of the current row.
RANK and DENSE_RANK assign a number to each row with a unique order value. That means that
rows with the same order value (in this case CustomerID) get the same number. The difference
between RANK and DENSE_RANK is that RANK assigns the current rank number plus the number of
rows that have the same order value to the next row while DENSE_RANK always assigns the previous
rank number plus one, no matter how many rows had the same order value.
NTILE partitions, or tiles, the rows in groups of equal size. In this case the returned result had 31465
rows and we requested 5000 tiles of equal size. 31456 divided by 5000 equals 6 with a remainder of
1456. That means the NTILE value is increased after every six rows. Because there is a remainder of
1456 the first 1456 tiles get an additional row.

The following result shows the ranking functions output.


In this case the ordering by CustomerID is not unique. As such the order within this window is not
guaranteed by SQL Server and neither is your row number. That means that the record that is now
row number two might be row number one or three the next time you run this query. If you want
your row numbering to be guaranteed repeatable you should add another order by column which
makes the ordering unique, for example SalesOrderID. This goes for all your windowing functions
including aggregating functions, framing and offset functions.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
ROW_NUMBER() OVER(ORDER BY CustomerID, SalesOrderID) AS RowNumber,
RANK() OVER(ORDER BY CustomerID) AS [Rank],
DENSE_RANK() OVER(ORDER BY CustomerID) AS DenseRank,
NTILE(5000) OVER(ORDER BY CustomerID, SalesOrderID) AS NTile5000
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

Notice that in this last example I have not added the SalesOrderID ordering to the RANK functions.
These return the same value for rows with the same CustomerID no matter their ordering within
their group. Adding an extra ordering column will actually change the meaning of
the RANK functions!

13.4 Offset functions

Offset functions can return a value from the first or last row in a window or from a row a specified
rows away from the current row. There are four offset
functions LAG, LEAD, FIRST_VALUE and LAST_VALUE.
LAG and LEAD work in much the same way, except LAG looks at rows that come before the current
row while LEAD looks at rows that come after the current row. The syntax is actually much like that of
an aggregate function. The next query shows this.

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
LAG(CustomerID) OVER(ORDER BY OrderDate) AS PreviousCustomer,
CustomerID AS CurrentCustomer,
LEAD(CustomerID) OVER(ORDER BY OrderDate) AS NextCustomer
FROM Sales.SalesOrderHeader
ORDER BY OrderDate

If you run this query you will see that each row has a value indicating the customer of the previous
order and a value indicating the customer of the next order. Mind your ordering! Using such
functions do not make sense if your ordering is incorrect in one of your windowing functions or in
your result set. We see the results here.

You can see that the first row has no value for the previous customer. That is because there was no
previous row and so no previous customer. You will see the same happening in the last row for the
next customer.

So what if we wanted to look two rows ahead, or more? We can pass a second parameter to
the LAG and LEADfunctions specifying how many rows we should skip. The following query shows us
a bit about the frequency of orders per customer. We show the shipdate of the order before the last
two orders of the customer (if there is any) and also the shipdate of the order after the next order of
the customer (if there is any).

Hide Copy Code


SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
OrderDate,
ShipDate,
LAG(ShipDate, 3) OVER(PARTITION BY CustomerID
ORDER BY ShipDate) AS ShipDateThreeOrdersAgo,
LEAD(ShipDate, 2) OVER(PARTITION BY CustomerID
ORDER BY ShipDate) AS ShipDateOfOrderAfterNext
FROM Sales.SalesOrderHeader
ORDER BY OrderDate
Notice that I am ordering my result set by order date while I order my windows by ship date. That
actually does not matter in this case. When looking at a single row I can simply see if the customer
has at least three orders and when that order was shipped. And if the customer has at least two
orders after this one and when that was shipped.

We can also do something about those NULLs when the specified row was not found.
The LAG and LEADfunctions can take a third parameter that specifies a placeholder for NULLs. Make
sure you specify a value of the same type as your column or you will get an error unless you make
a CAST (which is discussed in the next part of the article). The following query shows the previous or
next sales order number for each customer. When there is no previous or next order the text "No
previous/next orders" is displayed (instead of NULL).

Hide Copy Code


SELECT
SalesOrderID,
CustomerID,
LAG(SalesOrderNumber, 1, 'No previous orders')
OVER(PARTITION BY CustomerID
ORDER BY OrderDate) AS PreviousOrderForCustomer,
SalesOrderNumber AS CurrentOrderNumber,
LEAD(SalesOrderNumber, 1, 'No next orders')
OVER(PARTITION BY CustomerID
ORDER BY OrderDate) AS NextOrderForCustomer
FROM Sales.SalesOrderHeader
ORDER BY OrderDate

So now let us take a look at FIRST_VALUE and LAST_VALUE. The function names are self
explanatory. These functions return the specified value of the first or last row in a window.

Hide Copy Code


SELECT
SalesOrderID,
CustomerID,
SalesOrderNumber AS CurrentOrder,
FIRST_VALUE(SalesOrderNumber) OVER(ORDER BY SalesOrderNumber) AS FirstOrder,
LAST_VALUE(SalesOrderNumber) OVER(ORDER BY SalesOrderNumber) AS LastOrder
FROM Sales.SalesOrderHeader
ORDER BY OrderDate

As you can see in the result each row now shows the first and the last order number. There is
something fishy about the last order number though, it always has the value of the current row, as if
that is the last row that is evaluated. That is actually true. Just like aggregate
functions <code>FIRST_VALUE and LAST_VALUE support framing. So if we wanted to show the last
order ever we would have to explicitly indicate this by framing. As you can see the following query
does return the expected result.

Hide Copy Code


SELECT
SalesOrderID,
CustomerID,
SalesOrderNumber AS CurrentOrder,
FIRST_VALUE(SalesOrderNumber)
OVER(ORDER BY SalesOrderNumber
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS FirstOrder,
LAST_VALUE(SalesOrderNumber)
OVER(ORDER BY SalesOrderNumber
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS LastOrder
FROM Sales.SalesOrderHeader
ORDER BY OrderDate

Of course we can also use partitions to get the first and last order per customer.

Hide Copy Code


SELECT
SalesOrderID,
CustomerID,
SalesOrderNumber AS CurrentOrder,
FIRST_VALUE(SalesOrderNumber)
OVER(PARTITION BY CustomerID
ORDER BY SalesOrderNumber
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS FirstOrder,
LAST_VALUE(SalesOrderNumber)
OVER(PARTITION BY CustomerID
ORDER BY SalesOrderNumber
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS LastOrder
FROM Sales.SalesOrderHeader
ORDER BY OrderDate
Table of Contents
Part I

1. Table of Contents
2. Introduction
3. Installing the sample database
4. Defining a query
5. Our first query; the SELECT statement
o 5.1. Aliasing
6. Eliminating duplicates; DISTINCT
7. Filtering data; the WHERE clause
o 7.1. The NOT keyword
o 7.2. Combining predicates
o 7.3. Filtering strings
o 7.4. Filtering dates
o 7.5. The IN and BETWEEN keywords
8. Sorting data; ORDER BY
9. Further limiting results; TOP and OFFSET-FETCH
o 9.1. TOP
o 9.2. OFFSET-FETCH
10. Aggregating data; GROUP BY and HAVING
11. Selecting from multiple tables; using JOINS
o 11.1. CROSS JOIN
o 11.2. INNER JOIN
o 11.3. OUTER JOIN
o 11.4. Self JOIN
o 11.5. Multiple JOINS
12. Multiple groups; GROUPING SETS
o 12.1. GROUPING SETS
o 12.2. CUBE
o 12.3. ROLLUP
o 12.4. GROUPING
o 12.5. GROUPING_ID
13. Windowing functions; the OVER clause
o 13.1. Aggregate functions
o 13.2. Framing
o 13.3. Ranking functions
o 13.4. Offset functions
14. That's not it yet...

Part II

1. Table of Contents
2. Welcome back!
3. Queries in your queries; Subqueries
o 3.1. Writing the same queries differently
o 3.2. More filtering options; IN, ANY, SOME, ALL and EXISTS
 3.2.1. IN
 3.2.2. ANY and SOME
 3.2.3. ALL
 3.2.4. EXISTS
4. Querying from subqueries; Derived tables
5. Common Table Expressions a.k.a. CTEs
6. Set operators; UNION, INTERSECT and EXCEPT
o 6.1. Combining sets; UNION and UNION ALL
o 6.2. Recursion with CTEs and UNION ALL
o 6.3. INTERSECT
o 6.4. EXCEPT
7. Pushing over tables; PIVOT and UNPIVOT
o 7.1. Pivoting
o 7.2. Unpivoting
8. More uses for Table Expressions; APPLY
o 8.1. CROSS APPLY
o 8.2. OUTER APPLY
9. Other aspects of querying
o 9.1. Converting types; CAST and CONVERT, PARSE and FORMAT
 9.1.1 CAST and CONVERT
 9.1.2 PARSE
 9.1.3 FORMAT
o 9.2. VARCHAR functions
o 9.3. DATETIME functions
o 9.4. CASE and IIF
 9.4.1 CASE
 9.4.2 IIF
o 9.5. COALESCE, ISNULL and NULLIF
10. Conclusions

2. Welcome back!
As you can read in the title of this article this is actually the second and last part of an article
dedicated to querying a Microsoft SQL Server database. I strongly recommend that you read the first
article, Querying SQL Server 2012 Part I, if you have not done so already.
The first article focused on building up your query. It started of simple with the SELECT statement
and built up difficulty by adding filtering, grouping and windowing functions to your queries.
This second part of the article focuses on combining multiple SELECT statements and still returning
a single result set. In addition we will see how to use functions and manipulate data so we can
combine values from our database and apply functions to values.

3. Queries in your queries; Subqueries


So as I said this article will focus on using multiple SELECT statements within a query. The easiest
and most common way to do this is by using a subquery. A subquery is a query within a query that
returns a result that is expected at the place where the subquery is placed. Some of the windowing
functions we used in part I of this article return results that could have been returned by using a
subquery. Say we want to query the orders table and for each order we want to show the most
expensive order and least expensive order as well. To get this result we could use the following
query, as we have seen in part I.

Hide Copy Code

SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
SubTotal,
MIN(SubTotal) OVER() AS LeastExpensive,
MAX(SubTotal) OVER() AS MostExpensive
FROM Sales.SalesOrderHeader
ORDER BY SubTotal

We could get this same result using subqueries. The subquery returns a result for each row. So let me
show you the above query rewritten with subqueries.

Hide Copy Code

SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
SubTotal,
(SELECT MIN(SubTotal)
FROM Sales.SalesOrderHeader) AS LeastExpensive,
(SELECT MAX(SubTotal)
FROM Sales.SalesOrderHeader) AS MostExpensive
FROM Sales.SalesOrderHeader
ORDER BY SubTotal

That is a lot more code to get the same result you might think. Besides, this method is more error
prone than using a windowing function. What if we would forget our aggregate function? The
subquery would return allsubtotals from the order table, but we cannot put more than one value in,
well, a single value! See what happens for yourself. The error message is pretty clear I think.
So if subquerying requires more code and is more error prone to boot then why would we use a
subquery? Well, subqueries can be a lot more complex than I have just shown you. You can query
anything from anywhere as long as it returns a result set that is expected at the place where you use
it. In a SELECT statement that would be a single value. You could query from the Person table and
still include the most and least expensive orders.

Hide Copy Code

SELECT
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName,
(SELECT MIN(SubTotal)
FROM Sales.SalesOrderHeader) AS LeastExpensive,
(SELECT MAX(SubTotal)
FROM Sales.SalesOrderHeader) AS MostExpensive
FROM Person.Person
ORDER BY FirstName, MiddleName, LastName

Now that is already pretty neat, but wait, there is more. The subqueries I have just shown you
are self-contained subqueries. That means the subquery can be executed without the outer query and
still return a result. These can be useful, but a lot of the times you might want to select something
based on a value of the outer query. In this case we can use correlated subqueries. That simply means
that we use a value from our outer query in our subquery. Let us use the query above, but now show
the least and most expensive orders for that particular person. Remember that a person does not
make an order. A customer makes an order and a customer is linked to a person. We are going to
need a join for this... Or another subquery!

Hide Copy Code

SELECT
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName,
(SELECT MIN(SubTotal)
FROM Sales.SalesOrderHeader AS s
WHERE s.CustomerID = (SELECT
c.CustomerID
FROM Sales.Customer AS c
WHERE c.PersonID = p.BusinessEntityID)
) AS LeastExpensive,
(SELECT MAX(SubTotal)
FROM Sales.SalesOrderHeader AS s
WHERE s.CustomerID = (SELECT
c.CustomerID
FROM Sales.Customer AS c
WHERE c.PersonID = p.BusinessEntityID)
) AS MostExpensive
FROM Person.Person AS p
ORDER BY FirstName, MiddleName, LastName

Allright, now that is looking complex! It really is not. Both subqueries are the same except for
the MIN and MAXfunctions, so focus on one subquery. Notice that I have used column aliases
because CustomerID is a column in both SalesOrderHeader and Customer. So let us isolate the
subquery and take a closer look at it. Note that you cannot run it, because it references a column
from the outer query.

Hide Copy Code

(SELECT MAX(SubTotal)
FROM Sales.SalesOrderHeader AS s
WHERE s.CustomerID = (SELECT
c.CustomerID
FROM Sales.Customer AS c
WHERE c.PersonID = p.BusinessEntityID)
) AS MostExpensive

So that is it. Notice that you can actually use a subquery in a WHERE clause? Now that is pretty cool!
Remember that windowing functions cannot be used in any clause except ORDER BY. So in
the WHERE clause we select the CustomerID from the Customer table where PersonID is equal to
the BusinessEntityID from the outer query. So far so good, right? The CustomerID returned by
the query is used to select the most (or least) expensive order from that customer from
the SalesOrderHeader table. Notice that when the subquery does not return a value the row is not
discarded, but a NULL is shown instead.

3.1 Writing the same queries differently

Here is a nice challenge, if you are up for it. Rewrite the last query using no subselects. There are
actually multiple ways to tackle this. I used JOINS instead of subqueries. There are a lot of other
methods as well, but they are not covered in this article or not covered yet. Here are two possible
solutions, one using a GROUP BY clause and the same solutions, but using windowing
and DISTINCT instead of GROUP BY.

Hide Shrink Copy Code

SELECT
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName,
MIN(s.SubTotal) AS LeastExpensive,
MAX(s.SubTotal) AS MostExpensive
FROM Person.Person AS p
LEFT JOIN Sales.Customer AS c ON c.PersonID = p.BusinessEntityID
LEFT JOIN Sales.SalesOrderHeader AS s ON s.CustomerID = c.CustomerID
GROUP BY
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
ORDER BY FirstName, MiddleName, LastName

SELECT DISTINCT
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName,
MIN(s.SubTotal) OVER(PARTITION BY s.CustomerID) AS LeastExpensive,
MAX(s.SubTotal) OVER(PARTITION BY s.CustomerID) AS MostExpensive
FROM Person.Person AS p
LEFT JOIN Sales.Customer AS c ON c.PersonID = p.BusinessEntityID
LEFT JOIN Sales.SalesOrderHeader AS s ON s.CustomerID = c.CustomerID
ORDER BY FirstName, MiddleName, LastName

So if all three queries return the same result which one should you use!? query optimization is
outside the scope of this article, but I do want to mention this. You can see IO statistics by running
the following statement in your query window.

Hide Copy Code

SET STATISTICS IO ON

Next you can turn on the option 'Include actual query plan' under the 'Query' menu option.
Now run your query again. In the messages window you will see detailed IO information. As a rule of
thumb goes the lesser reads the better. You will now also find a third window, the 'Execution plan',
which shows exactly what steps SQL Server had to perform to get to your result. Read it from right to
left. Less steps is not always better, some steps require more work from SQL Server than others.

So which query did actually perform best? The one with the GROUP BY. And which was worst (by
far)? The one including the subqueries.
That does not automatically mean subqueries are slow or evil. It simply means there are more
queries to get to the same result and what is best in one scenario is not always the best in another.
In addition, sometimes subqueries can be used where other functions cannot be used.

3.2 More filtering options; IN, ANY, SOME, ALL and EXISTS

3.2.1 IN

I have already shown a subquery in a WHERE clause. You can use it to check if values are greater than,
less than, like another value etc. The following query shows a somewhat lame example of using a
subquery and the LIKEoperator. It selects all people whos first name begins with a 'Z'.

Hide Copy Code

SELECT *
FROM Person.Person
WHERE FirstName LIKE (SELECT 'Z%')

Now remember the IN function? It can actually take a range of values as input. We can easily
combine that with a subquery. We just have to make sure our subquery returns a single column and
whatever number of rows. So far we have only seen subqueries that return a single value. Let us say
we want to find all people who are also a customer. If a Person is a Customer we know that there is
a Customer with the PersonID of that Person. So the next query will easily solve this.

Hide Copy Code

SELECT *
FROM Person.Person AS p
WHERE p.BusinessEntityID IN (SELECT PersonID
FROM Sales.Customer)

Which is equivalent to:

Hide Copy Code

SELECT *
FROM Person.Person AS p
WHERE p.BusinessEntityID = x OR p.BusinessEntityID = y OR p.BusinessEntityID = z... etc.

And if we want to know which people are not a customer we use the keyword NOT. In this case we
also have to check for NULL.

Hide Copy Code

SELECT *
FROM Person.Person AS p
WHERE NOT p.BusinessEntityID IN (SELECT PersonID
FROM Sales.Customer
WHERE PersonID IS NOT NULL)

Remember that NULL can also be explained as UNKNOWN, so if a single NULL is returned from your
subquery SQL Server returns no rows because it does not know if a value is or is not contained in the
result. After all, it might be that NULL value. The previous query is equivalent to the following.

Hide Copy Code

SELECT *
FROM Person.Person
WHERE BusinessEntityID <> x AND BusinessEntityID <> y AND BusinessEntityID <> z... etc.

3.2.2 ANY and SOME

The ANY operator works much like the IN operator, except in that you can use
the >, <, >=, <=, = and <>operators to compare values. ANY returns true if at least one value returned
by the subquery makes the predicate true. So the following query returns all persons except that
with BusinessEntityID 1, because 1 > 1 returns FALSE.

Hide Copy Code

SELECT *
FROM Person.Person
WHERE BusinessEntityID > ANY (SELECT 1)

In queries such as these ANY may not be that useful, even when you specifiy a more meaningful
subquery. In some cases ANY can be useful though. For example if you want to know if all orders
(from a specific date) have at least a certain status. The following query illustrates this.

Hide Copy Code

DECLARE @OrderDate AS DATETIME = '20050517'


DECLARE @Status AS TINYINT = 4
IF @Status > ANY(SELECT Status
FROM Purchasing.PurchaseOrderHeader
WHERE OrderDate = @OrderDate)
PRINT 'Not all orders have the specified status!'
ELSE
PRINT 'All orders have the specified status.'

If @Status is bigger than any result from the subquery then the result is TRUE (because there are
some/any orders that are not at least status 4. The query prints "Not all orders have the
specified status!".

Instead of ANY you can use SOME, which has the same meaning.

Hide Copy Code

DECLARE @OrderDate AS DATETIME = '20050517'


DECLARE @Status AS TINYINT = 4
IF @Status > SOME(SELECT Status
FROM Purchasing.PurchaseOrderHeader
WHERE OrderDate = @OrderDate)
PRINT 'Not all orders have the specified status!'
ELSE
PRINT 'All orders have the specified status.'

3.2.3 ALL

Unlike ANY, ALL looks at all results returned by a subquery and only returns TRUE if the comparison
with all results makes the predicate true. The previous query could have been rewritten as follows.

Hide Copy Code

DECLARE @OrderDate AS DATETIME = '20050517'


DECLARE @Status AS TINYINT = 4
IF @Status < ALL(SELECT Status
FROM Purchasing.PurchaseOrderHeader
WHERE OrderDate = @OrderDate)
PRINT 'All orders have the specified status.'
ELSE
PRINT 'Not all orders have the specified status!'

3.2.4 EXISTS

EXISTS can be used like ANY and ALL, but returns true only if at least one record was returned by
the subquery. It is pretty useful and you will probably use this more often. Let us say we want all
customers that have placed at least one order.

Hide Copy Code

SELECT *
FROM Sales.Customer AS c
WHERE EXISTS(SELECT *
FROM Sales.SalesOrderHeader AS s
WHERE s.CustomerID = c.CustomerID)

Now we might be more interested in customers that have not placed any orders.

Hide Copy Code

SELECT *
FROM Sales.Customer AS c
WHERE NOT EXISTS(SELECT *
FROM Sales.SalesOrderHeader AS s
WHERE s.CustomerID = c.CustomerID)

Notice that the EXISTS functions only returns TRUE or FALSE and not any columns. For that reason
it does not matter what you put in your SELECT statement. In fact, this is the only place where you
can use SELECT *without worrying about it!

4. Querying from subqueries; Derived tables


In the previous chapter we have seen subqueries. We have seen them in our SELECT statement, in
our WHEREstatement and passed as parameters to functions. You can put subqueries almost
anywhere you want, including HAVING and ORDER BY clauses. This also includes the FROM clause.

When we use subqueries in our FROM clause the result is called a derived table. A derived table is a
named table expression and, like a subquery, is only visible to its outer query. It differs from
subqueries in that they return a complete table result. This can actually solve some problems we had
earlier! Remember that we could not use windowing functions anywhere except in the ORDER
BY clause? What if we selected the values first and then filtered them in an outer query? This is
perfectly valid!

Hide Copy Code


SELECT *
FROM (SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
FROM Sales.SalesOrderHeader) AS d
WHERE AvgSubTotal > 100
ORDER BY AvgSubTotal, CustomerID, SalesOrderNumber

There are a few things you will have to keep in mind. The result of a subquery needs to
be relational. That means every column it returns must have a name. AVG(SubTotal)... would not
have a name, so we MUST alias it. We must also alias the derived table itself.

Also, a relation is not ordered, so we cannot specify an ORDER BY in the derived table. There is an
exception to this last rule. Whenever you specify a TOP or OFFSET-FETCH clause in your derived
table you can use ORDER BY. In this case the query will not return an ordered result, but it returns
the top x rows that should be in the result IF the result was ordered. So the ORDER BY is used as a
filter rather than ordering. The next query illustrates this.

Hide Copy Code

SELECT *
FROM (SELECT TOP 100 PERCENT
SalesOrderID,
SalesOrderNumber,
CustomerID,
Freight
FROM Sales.SalesOrderHeader
ORDER BY Freight) AS d

And here is the result.

Notice that I am making a select on the entire table, because it is kind of hard to NOT have SQL
Server return sorted data. In every other case it would have to sort the data first before it can check
which rows should and should not be returned. And when the data is sorted SQL Server does not
unsort them before returning the result. In this case a sort is not necessary because the entire table
needs to be returned anyway. The bottom line here is that the ORDER BY did not actually order our
result set.

The next query illustrates that the result MAY be ordered though.

Hide Copy Code

SELECT *
FROM (SELECT TOP 10000
SalesOrderID,
SalesOrderNumber,
CustomerID
FROM Sales.SalesOrderHeader
ORDER BY CustomerID) AS d

And here are the results.

It may look like these results are ordered, but remember that SQL Server cannot guarantee ordering.
In this case SQL Server needed to sort the rows on CustomerID before returning the rows. Just
remember that ORDER BY in any relational result is used for filtering, not ordering.
One thing you can not do using derived tables is joining with the derived table. You might want to
do something like I did in my example on self JOINs in part I of the article. This is not valid however.

Hide Copy Code

SELECT *
FROM (SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
FROM Sales.SalesOrderHeader) AS d
JOIN d AS d2 ON d2.CusomerID = d.CustomerID + 1
WHERE AvgSubTotal > 100
ORDER BY AvgSubTotal, CustomerID, SalesOrderNumber

This is NOT valid syntax and the only way you will be able to join on your derived table is by
duplicating your derived table in your JOIN clause!

Hide Copy Code

SELECT *
FROM (SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
FROM Sales.SalesOrderHeader) AS d
JOIN (SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
FROM Sales.SalesOrderHeader) AS d2 ON d2.CustomerID = d.CustomerID + 1
WHERE d.AvgSubTotal > 100
ORDER BY d.AvgSubTotal, d.CustomerID, d.SalesOrderNumber

Yikes! This also exposes another problem with derived tables and subqueries in general. They can
make your query big, complex and difficult to read.

5. Common Table Expressions a.k.a. CTEs


Like derived tables a Common Table Expression, also commonly abbreviated as CTE, is a named table
expression that is only visible to the query that defines it. The CTE makes up for some of the
shortcomings of the derived table though. For starters, a CTE is defined at the beginning of your
query, or actually at the top. This makes it more readable than a derived table. You first name and
define your CTE and then work with it in a following query. The first example I showed using a
derived table can be rewritten using a CTE.

Hide Copy Code


WITH CTE
AS
(
SELECT
SalesOrderID,
SalesOrderNumber,
CustomerID,
AVG(SubTotal) OVER(PARTITION BY CustomerID) AS AvgSubTotal
FROM Sales.SalesOrderHeader
)
SELECT *
FROM CTE
WHERE AvgSubTotal > 100
ORDER BY AvgSubTotal, CustomerID, SalesOrderNumber

The syntax as clear and concise. But that is not all a CTE can do. You can use multiple CTEs and join
them together in your final SELECT statement. The following is actually a case I have had a few times
in production. We have a header and a detail table. The detail has some sort of total (a total price or
weight) and the total of all lines is stored in the header. Somethimes things do not go the way they
should and your header total does not actually reflect the total of all of your details.

Hide Copy Code

WITH SOH
AS
(
SELECT
s.SalesOrderID,
s.SalesOrderNumber,
s.CustomerID,
p.FirstName,
p.LastName,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
),
SOD AS
(
SELECT
SalesOrderID,
SUM(LineTotal) AS TotalSum
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
)
SELECT *
FROM SOH
JOIN SOD ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE SOH.SubTotal <> SOD.TotalSum

Notice a few things. The first CTE focuses on getting the data we need and performing necessary
joins. The second CTE focuses on getting the sum for all salesorder details. The final query joins the
two CTEs and filters only those where the total of the order is not equal to that of the details. We
could have written this without CTEs. The following query shows how.
Hide Copy Code

SELECT
s.SalesOrderID,
s.SalesOrderNumber,
s.CustomerID,
p.FirstName,
p.LastName,
s.SubTotal,
sd.SalesOrderID,
SUM(sd.LineTotal) AS TotalSum
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
JOIN Sales.SalesOrderDetail AS sd ON sd.SalesOrderID = s.SalesOrderID
GROUP BY
s.SalesOrderID,
s.SalesOrderNumber,
s.CustomerID,
p.FirstName,
p.LastName,
s.SubTotal,
sd.SalesOrderID
HAVING SUM(sd.LineTotal) <> s.SubTotal

So which one should you use? Actually it really does not matter. Both queries perform exactly the
same in terms of reads. The only difference is that the query plan for the last query performs an extra
step for the HAVINGclause.

You could go crazy with CTEs and even perform your last join in a seperate CTE. This illustrates a nice
use of CTEs. You can refer to a CTE within another CTE. The following query compiles to the exact
same query as the one with 'only' two CTEs.

Hide Shrink Copy Code

WITH SOH
AS
(
SELECT
s.SalesOrderID,
s.SalesOrderNumber,
s.CustomerID,
p.FirstName,
p.LastName,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
JOIN Person.Person AS p ON p.BusinessEntityID = c.PersonID
),
SOD AS
(
SELECT
SalesOrderID,
SUM(LineTotal) AS TotalSum
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
),
TOTAL AS
(
SELECT
SOH.SalesOrderID,
SOH.SalesOrderNumber,
SOH.CustomerID,
SOH.FirstName,
SOH.LastName,
SOH.SubTotal,
SOD.TotalSum
FROM SOH
JOIN SOD ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE SOH.SubTotal <> SOD.TotalSum
)
SELECT *
FROM TOTAL

Unlike a derived table a CTE can be used in JOINs and can also make self JOINs. I still have no use
case for self JOINs in AdventureWorks2012, so I will use the same example I used in part I, but this
time using a CTE.

Hide Copy Code

WITH CTE AS
(
SELECT
BusinessEntityID,
Title,
FirstName,
LastName
FROM Person.Person
)
SELECT
CTE1.BusinessEntityID AS CurrentID,
CTE1.Title AS CurrentTitle,
CTE1.FirstName AS CurrentFirstName,
CTE1.LastName AS CurrentLastName,
CTE2.BusinessEntityID AS NextID,
CTE2.Title AS NextTitle,
CTE2.FirstName AS NextFirstName,
CTE2.LastName AS NextLastName
FROM CTE AS CTE1
LEFT JOIN CTE AS CTE2 ON CTE2.BusinessEntityID = CTE1.BusinessEntityID + 1
ORDER BY CurrentID, CurrentFirstName, CurrentLastName

CTEs actually have another use within SQL, which is recursion. I will get to that in the next section.

6. Set operators; UNION, INTERSECT and


EXCEPT
6.1 Combining sets; UNION and UNION ALL
There are a few operators that can be used to combine multiple result sets into a single set.
The UNION operator is one of those. UNION takes two result sets and glues them together. There are
two types of UNIONs, the UNIONand the UNION ALL. The difference between the two is
that UNION eliminates rows that are in both sets (duplicates that is) while UNION ALL keeps these
rows. The following example makes clear how to use UNIONand UNION ALL and what the difference
is.

Hide Copy Code

SELECT TOP 1
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person

UNION

SELECT TOP 2
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person
ORDER BY BusinessEntityID

And the result.

Notice that we actually selected a total of three records, yet the result shows only two. That is
because the first record was eliminated by the UNION because all attributes had the same value (they
were duplicates). Notice that NULLs are considered as equal.

If we wanted to keep that third row we could use UNION ALL.

Hide Copy Code

SELECT TOP 1
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person

UNION ALL
SELECT TOP 2
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person
ORDER BY BusinessEntityID

This time the duplicate row was not discarded from the result.

A UNION operator can work on any two sets that have the same number of columns with the same
type at each column index.

Hide Copy Code

SELECT
'Sales order' AS OrderType,
SalesOrderID AS OrderID,
SalesOrderNumber,
CustomerID AS CustomerOrVendorID,
SubTotal,
NULL AS RevisionNumber
FROM Sales.SalesOrderHeader
WHERE SubTotal < 2

UNION

SELECT
'Purchase order',
PurchaseOrderID,
NULL,
VendorID,
SubTotal,
RevisionNumber
FROM Purchasing.PurchaseOrderHeader
WHERE RevisionNumber > 5
ORDER BY OrderType, OrderID

As you can see we can select from any two tables, but the number of columns must be the same for
both queries, as well as the type of the returned value (text, numeric, date, etc).

Other than that there are a few things to notice. The column names of the first query are used. For
columns that are not applicable in the current select we can use NULLs (or any other value of the
correct type) as placeholders. Each query has its own WHERE clause (and any other clauses)
except ORDER BY. ORDER BY is placed at the end to order the entire result set.
You can use more than one UNION to combine even more sets. The following query adds an
additional row for each order types total subtotal.

Hide Shrink Copy Code

SELECT
'Sales order' AS OrderType,
SalesOrderID AS OrderID,
SalesOrderNumber,
CustomerID AS CustomerOrVendorID,
SubTotal,
NULL AS RevisionNumber
FROM Sales.SalesOrderHeader
WHERE SubTotal < 2

UNION

SELECT
'Purchase order',
PurchaseOrderID,
NULL,
VendorID,
SubTotal,
RevisionNumber
FROM Purchasing.PurchaseOrderHeader
WHERE RevisionNumber > 5

UNION

SELECT
'Sales order total',
NULL,
NULL,
NULL,
SUM(SubTotal),
NULL
FROM Sales.SalesOrderHeader
WHERE SubTotal < 2

UNION

SELECT
'Purchase order total',
NULL,
NULL,
NULL,
SUM(SubTotal),
NULL
FROM Purchasing.PurchaseOrderHeader
WHERE RevisionNumber > 5

ORDER BY OrderType, OrderID

6.2 Recursion with CTEs and UNION ALL

I already mentioned that CTEs can be used for recursive functions. Remember the self JOIN example
from part I? An Employee might have a ManagerID which refers to another Employee. Of course
any manager can have his own manager all the way up to the top manager. There is no way for us to
tell how many managers are up the hierarchy. For this we can use recursion. Just give us the manager
of the manager of the manager... up to the point where a manager has no manager. This can be
achieved using UNION ALL. Such a recursive CTE consists of two or more queries, one being the
anchor member and the other as the recursive member. The anchor member is invoked once and
returns a relational result. The recursive member is called until it returns an empty result set and has
a reference to the CTE.

Unfortunately I have no use case for recursion in AdventureWorks2012 so I am just going to use
recursion to select any Person from the Person table with a BusinessEntityID that is one lower
than the previous until there is no ID that is one lower anymore

Hide Copy Code

WITH REC AS
(
SELECT
BusinessEntityID,
FirstName,
LastName
FROM Person.Person
WHERE BusinessEntityID = 9

UNION ALL

SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName
FROM REC
JOIN Person.Person AS p ON p.BusinessEntityID = REC.BusinessEntityID - 1
)
SELECT *
FROM REC

And here is the result for running this query for BusinessEntityID 9.
And to prove this really works, here is the result of the same query, but
with BusinessEntityID 1704. It stops at 1699 because appearently there is no Person with
a BusinessEntityID of 1698.

A small word of caution: the maximum recursion in SQL Server is 100. So the following query will run
without problems.

Hide Copy Code

WITH REC AS (
SELECT 100 AS SomeCounter

UNION ALL

SELECT SomeCounter - 1
FROM REC
WHERE SomeCounter - 1 >= 0
)
SELECT *
FROM REC

Adding one will result in an overflow though!

Using the query hint MAXRECURSION can help overcome this limitation. The following query will run
fine again (up to 200 recursion depth).
Hide Copy Code

WITH REC AS (
SELECT 101 AS SomeCounter

UNION ALL

SELECT SomeCounter - 1
FROM REC
WHERE SomeCounter - 1 >= 0
)
SELECT *
FROM REC
OPTION (MAXRECURSION 200)

6.3 INTERSECT

INTERSECT is another set operator and the syntax and rules are the same as that of
the UNION operator. The difference between the two is the results that are returned. UNION returns
all rows and discards duplicate rows. INTERSECT returns only duplicate rows (once). Let us take the
first example I used for UNION, but replace the UNION with an INTERSECT.

Hide Copy Code

SELECT TOP 1
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person

INTERSECT

SELECT TOP 2
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person
ORDER BY BusinessEntityID

Since only the TOP 1 record is returned by both queries this is also the result
that INTERSECT returns.

Like with UNIONs in INTERSECT NULLs are considered as equal.

6.4 EXCEPT
EXCEPT is the third set operator and the syntax and rules are also the same as for that of
the UNION operator. EXCEPT returns only the records from the first query that are not returned by
the second query. In other words, EXCEPT returns rows that are unique to the first query. We should
notice here that with UNION and INTERSECT it does not matter which query comes first and which
query comes second, the result remains the same. With EXCEPT the order of queries does matter. I
will show this by using the same example I used for UNION and INTERSECT, but use EXCEPT instead.
The following query returns no rows.

Hide Copy Code

SELECT TOP 1
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person

EXCEPT

SELECT TOP 2
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person
ORDER BY BusinessEntityID

No rows were returned because the first query returned no rows that were not returned by the
second query. Now let us switch the TOPs. The second row is now only returned by the first query.

Hide Copy Code

SELECT TOP 2
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person

EXCEPT

SELECT TOP 1
BusinessEntityID,
Title,
FirstName,
MiddleName,
LastName
FROM Person.Person
ORDER BY BusinessEntityID

In this query a result is returned.


You can use UNION, INTERSECT and EXCEPT in the same query. In this case INTERSECT takes
precedence over UNION and EXCEPT. UNION and EXCEPT are considered equal and are performed in
the order in which they appear.

7. Pushing over tables; PIVOT and UNPIVOT


7.1 Pivoting

Pivoting and unpivoting are specialized cases of grouping and aggregating data. Pivoting is the
process of creating columns from rows while unpivoting is the process of creating rows from
columns. We start with pivoting data. For example, we want to know the total subtotal that
customers ordered grouped by sales person. However, for some reason we want to make columns
out of the sales persons. So we'd get the columns CustomerID, SalesPerson1, SalesPerson2,
etc. and the rows showing the ID for a customer and then the total subtotal ordered
for SalesPerson1, SalesPerson2, etc. This is exactly what the PIVOT operator is for. To
understand this better let us look at an example.

Hide Copy Code

WITH PivotData AS
(
SELECT
s.CustomerID,
s.SalesPersonID AS SpreadingCol,
s.SubTotal AS AggregationCol
FROM Sales.SalesOrderHeader AS s
)
SELECT
CustomerID,
[274] AS StephenJiang,
[275] AS MichaelBlythe,
[276] AS LindaMitchell
FROM PivotData
PIVOT(SUM(AggregationCol) FOR SpreadingCol IN ([274], [275], [276])) AS P
WHERE [274] IS NOT NULL OR [275] IS NOT NULL OR [276] IS NOT NULL

And here is a part of the result (notice I'm not showing from the first line).
So that is looking pretty difficult. Let us break that up in simpler parts. First of all I am using a CTE to
identify the columns I want to use in my PIVOT. I then make a rather weird SELECT statement where
I select for, what seem to be, random numbers. Actually these are values from the so called spreading
column defined in the PIVOTclause. So 274, 275 and 276 are actually SalesPerson ID's that I want
to see as columns rather than values in rows. In the PIVOT clause I indicate the aggregation operator
I want to use, in this case SUM, and the column I want to aggregate. Then I specify the spreading
column, or the column which values should be columns instead of values in rows. Last, but not least,
your PIVOT must have an alias assigned, even when you do not use it. The WHERE clause is optional.
That is quite something so take a while to study the syntax and let it sink in.

You might have noticed I do not do anything with CustomerID in the PIVOT clause. This is because
with PIVOTthe grouping element(s) are identified by elimination. Since I am not
using CustomerID in either an aggregate function or as speading element it automatically becomes
the grouping element. This is also the reason it is best to use a CTE. If you had just selected from
the Sales.SalesOrderHeader table directly the SalesOrderHeaderID would become part of
the grouping and you would have gotten one row per order instead of per CustomerID!

Now there are a few limitations to using the PIVOT operator. One of those is that you cannot use
expressions to define your aggregation or spreading colum values. Another limitation is
that COUNT(*) is not allowed as aggregation function used by PIVOT. Instead you need to
use COUNT(ColumnName). You can work around this limitation by selecting a constant value in
your CTE and using COUNT on that column. Furthermore you can use only one aggregate function.
The last limitation, and in my opinion the limitation that makes you not want to use PIVOT to often,
is that the spreading values must be a list of static values. In our example these values
are SalesPerson ID's. So what does this mean? It means you are actually hard coding values in your
query! I only showed three SalesPersonsbecause listing them all would make the query a lot
bigger. Just presume I queried for our top three sales people. What if next month someone else sells
better? We would have to alter our query! Or what if a SalesPerson left the company? Back to
rewriting your query... Actually I had to find the names of the SalesPersons manually and use them
as column aliases to make sense of the columns (although I could have joined on the names and use
those as spreading values). You can come around this limitation by using Dynamic SQL, but that is
outside the scope of this article. And of course this is no problem at all when you are writing queries
for values that are really (sort of) static, like VAT percentages or status ID's.

7.2 Unpivoting

Unpivoting, in a sense, is the opposite, or inverse, of pivoting. Instead of making columns out of row
values we can make row values out of column values. In fact the starting point for an UNPIVOT is,
usually, pivoted data. So let us take the PIVOT example and UNPIVOT it again. For this example I
will wrap the result of the PIVOTexample in a CTE and use this to unpivot the data. this is not very
useful in a real world example (after all, why would you PIVOT first and UNPIVOT directly after?),
but this becomes more useful when the pivoted data is stored in a table or the result of
a VIEW or STORED PROCEDURE.

Hide Copy Code

WITH DataToPivot AS
(
SELECT
s.CustomerID,
s.SalesPersonID AS SpreadingCol,
s.SubTotal AS AggregationCol
FROM Sales.SalesOrderHeader AS s
),
DataToUnpivot AS
(
SELECT
CustomerID,
[274] AS StephenJiang,
[275] AS MichaelBlythe,
[276] AS LindaMitchell
FROM DataToPivot
PIVOT(SUM(AggregationCol) FOR SpreadingCol IN ([274], [275], [276])) AS P
WHERE [274] IS NOT NULL OR [275] IS NOT NULL OR [276] IS NOT NULL
)
SELECT
CustomerID,
SalesPerson,
SubTotal
FROM DataToUnpivot
UNPIVOT(SubTotal FOR SalesPerson IN(StephenJiang, MichaelBlythe, LindaMitchell)) AS U
ORDER BY CustomerID

I wrapped the result of the PIVOT example in the DataToUnpivot CTE. Notice that in this example
you should get three times as many rows as in the result of the PIVOT. After all each row from
the PIVOT result is now applied to the StephenJiang value, the MichaelBlythe value and
the LindaMitchell value. This is not the case however, since UNPIVOT removes rows where
the SubTotal would be NULL for a given SalesPerson. The following example of the same
customer for the pivoted and the unpivoted result might clear things up.
In the unpivoted result LindaMitchell was removed because in the pivoted result her value was
missing.

Let us take another look at an example where we unpivot a table that is not actually pivoted.
The Sales.SpecialOffer table has special offers with a Description, a MinQty and a MaxQty.
Instead of showing all three in one row we want a seperate row for both MinQty and MaxQty.

Hide Copy Code

SELECT
SpecialOfferID,
Description,
QtyType,
Qty
FROM Sales.SpecialOffer
UNPIVOT(Qty FOR QtyType IN (MinQty, MaxQty)) AS U

As you can see QtyType gets the value of either MinQty or MaxQty (previous column names) and
the Qty now shows the value that was previously either in the MinQty column or in
the MaxQty column, dependent on QtyType.

Again it is not possible to use expressions for the value of the new values column or the new column
name values column.

8. More uses for Table Expressions; APPLY


The APPLY operator can be used to 'join' a table expression to your query. I use the term 'join'
because an APPLY actually looks like a JOIN in that you can merge multiple tables into the same
result set. There are two types of APPLY operators, being the CROSS APPLY and the OUTER APPLY.
The interesting part about the APPLYoperator is that it can have a table expression that references
values from the outer query. One thing you should remember is that CROSS APPLY actually looks
most like the INNER JOIN while the OUTER APPLY works like an OUTER JOIN. Other than that
an APPLY operator may perform much better than a JOIN when the JOINconditions are rather
complex.

8.1 CROSS APPLY

The CROSS APPLY operator works like an INNER JOIN in that it can match rows from two tables
and leaves out rows that were not matched by the other table in the result. So let us look at an
example. We want to select all Persons that have a SalesOrder and show some order information
for the most expensive order that Personhas made.

Hide Copy Code

SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName,
a.*
FROM Person.Person AS p
CROSS APPLY (SELECT TOP 1
s.SalesOrderID,
s.CustomerID,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.PersonID = p.BusinessEntityID
ORDER BY s.SubTotal DESC) AS a
ORDER BY p.BusinessEntityID

So the CROSS APPLY operator takes a table expression as input parameter and simply joins the
result with each row of the outer query. Notice that we can do a TOP 1, filter by using
a WHERE clause to match the CustomerIDwith the BusinessEntityID and ORDER BY DESC to
get the most expensive order for that particular customer. Something that would not have been
possible by simply using a JOIN! Notice that Persons that have not placed an order are not
returned. Like with PIVOT and UNPIVOT we need to alias the result of the APPLYoperator.

Because we can reference values from the outer query in our APPLY operator it is also possible to
use functionswith the APPLY operator. The AdventureWorks2012 database actually has one user-
defined table valued function called ufnGetContactInformation, which takes a PersonID as input
and returns information about a Person (names and if they are suppliers, customers etc.). So using
the APPLY operator we can show this information in our resultset by passing
the BusinessEntityID of the outer query to the input of the function. Since calling this function
19972 times (once for each Person) is actually quite time consuming we are only selecting a TOP
1000.

Hide Copy Code

SELECT TOP 1000


p.BusinessEntityID,
a.*
FROM Person.Person AS p
CROSS APPLY ufnGetContactInformation(p.BusinessEntityID) AS a
ORDER BY p.BusinessEntityID

Notice that we must specify the function without parenthesis or a SELECT... FROM.

And of course we can use multiple APPLY operators in a single query.

Hide Copy Code


SELECT TOP 1000
p.BusinessEntityID,
a.*,
s.*
FROM Person.Person AS p
CROSS APPLY ufnGetContactInformation(p.BusinessEntityID) AS a
CROSS APPLY (SELECT TOP 1
s.SalesOrderID,
s.CustomerID,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.PersonID = p.BusinessEntityID
ORDER BY s.SubTotal DESC) AS s
ORDER BY p.BusinessEntityID

You do not have to match rows from your outer query with your APPLY result. The
following APPLY simply returns the most expensive order and as a result shows this with
each Person, regardless of whether the order was placed by this Person.

Hide Copy Code

SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName,
a.*
FROM Person.Person AS p
OUTER APPLY (SELECT TOP 1
s.SalesOrderID,
s.CustomerID,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
ORDER BY s.SubTotal DESC) AS a
ORDER BY p.BusinessEntityID

And the rows do not have to be a one on one match either. The following query gets the top three
most expensive orders regardless of customer and as a result each Person is duplicated three times
in the result (once for each order, regardless of whether the order was placed by this Person).

Hide Copy Code

SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName,
a.*
FROM Person.Person AS p
OUTER APPLY (SELECT TOP 3
s.SalesOrderID,
s.CustomerID,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
ORDER BY s.SubTotal DESC) AS a
ORDER BY p.BusinessEntityID
8.2 OUTER APPLY

The OUTER APPLY works in much the same way as the CROSS APPLY with the exception that it also
returns rows if no corresponding row was returned by the APPLY operator. We can see this by using
the first example of the previous section, but by changing the CROSS APPLY into an OUTER APPLY.
By running this query you can see that Persons that have not placed an order are now also returned
in the result set.

Hide Copy Code

SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName,
a.*
FROM Person.Person AS p
OUTER APPLY (SELECT TOP 3
s.SalesOrderID,
s.CustomerID,
s.SubTotal
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.PersonID = p.BusinessEntityID
ORDER BY s.SubTotal DESC) AS a
ORDER BY p.BusinessEntityID

Other than that all rules that apply to the CROSS APPLY operator also apply to OUTER APPLY.

9. Other aspects of querying


So far we have only queried data and returned the values from the database as they actually are.
There is much more that we can do with these values though. Suppose, for example, an order has
a SubTotal and a TotalDuecolumn. What we want to know is how much percent tax the customer
paid on an order. However Freight is also added to TotalDue, so we must subtract that first.
Usually we would calculate this by taking the difference between the SubTotal and TotalDue,
divide that by TotalDue and multiply that by 100.

We can do this kind of math in SQL Server. The following query shows how this can be done.

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
(((TotalDue - Freight) - SubTotal) / (TotalDue - Freight)) * 100 AS TaxPercentage
FROM Sales.SalesOrderHeader
And of course we can use SubTotal and TaxAmt and Freight to calculate TotalDue ourselves.
You should remember is that adding a numeric value to NULL always results in NULL.

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
SubTotal + TaxAmt + Freight AS TotalDueCalc
FROM Sales.SalesOrderHeader

How about if we want to give every customer a ten percent discount on their orders (probably a bad
idea, but let's do this).

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
SubTotal - (SubTotal * 0.10) AS SubTotalAfterDiscount
FROM Sales.SalesOrderHeader

We can also use functions such as FLOOR, CEILING and ROUND to further manipulate our data.

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
FLOOR(SubTotal) AS SubTotalRoundedDown,
CEILING(SubTotal) AS SubTotalRoundedUp,
ROUND(SubTotal, 2) AS RoundedToTwoDecimals
FROM Sales.SalesOrderHeader

There are much more functions that can be used to manipulate data in various ways. I cannot
possibly discuss them all here. The following sections will give an overview of some important and
most used functions for manipulating data in SQL Server.

9.1 Converting types; CAST and CONVERT, PARSE and FORMAT


9.1.1 CAST and CONVERT

Casting and converting values are the same thing, being changing a datatype to another datatype.
For example, we can change a numeric into a string, a string into a numeric (given that the text
actually represents a numeric value), a date to a string, string to date etc.

SQL Server actually has four functions for casting and converting values. CAST,
CONVERT, TRY_CAST and TRY_CONVERT.

Let us look at the CAST function first. First of all, you can cast anything to VARCHAR(x), since each
and every value can be shown as plain text. Here is a simple example of casting some values to
different types.

Hide Copy Code

SELECT
CAST('123' AS INT) AS VarcharToInt,
CAST('20131231' AS DATETIME2) AS VarcharToDateTime2,
CAST(1.2 AS INT) AS FloatToInt,
CAST(1234 AS VARCHAR(4)) AS IntToVarchar

We can cast the SubTotal to a VARCHAR(20) and order it not as numerics, but as alphanumerics
(which means 10 comes before 2 etc.).

Hide Copy Code

SELECT
SalesOrderID,
SubTotal,
CAST(SubTotal AS VARCHAR(20)) AS SubTotalAsAlphaNum
FROM Sales.SalesOrderHeader
ORDER BY SubTotalAsAlphaNum

Unfortunately SQL Server formats the results for us and the alphanumeric outcome has only two
digits after the comma. With CONVERT we have more control over the formatting of values.

The CONVERT function takes three parameters. The first parameter is the type you want to convert to,
the second is the value you want to convert and the optional third is a style parameter. In this case
we want our money to have style number 2.

Hide Copy Code

SELECT
SalesOrderID,
SubTotal,
CONVERT(VARCHAR(20), SubTotal, 2) AS SubTotalAsAlphaNum
FROM Sales.SalesOrderHeader
ORDER BY SubTotalAsAlphaNum
How did I know the format? Look them up on the CAST and CONVERT page on TechNet.

With CONVERT we can cast a DATETIME value to a VARCHAR and use the style parameter to display a
local date format.

Hide Copy Code

SELECT
SalesOrderID,
OrderDate,
CONVERT(VARCHAR, OrderDate, 13) AS OrderDateAsEur,
CONVERT(VARCHAR, OrderDate, 101) AS OrderDateAsUS
FROM Sales.SalesOrderHeader

And of course you can convert VARCHARS to DATETIMES. To do this you should always use the text
format yyyyMMdd as shown in the following query. Every other format is not guaranteed to be
culture invariant and running a query in Europe with result july 6th might have the result of june 7th
in America!

Hide Copy Code

SELECT CONVERT(DATETIME, '20131231')

Some casts may actually not be what you expect. For example, you can cast a DATETIME value to
an INT. The value that is returned is actually the difference in days between the specified value and
the minimum date for the SMALLDATETIME type (which is januari first 1900). The following query
shows this.
The following functions are discussed later in this article, but for now focus on the results.

Hide Copy Code

SELECT CONVERT(INT, CONVERT(DATETIME, '17530101')) AS MinDateAsInt,


CONVERT(INT, GETDATE()) AS TodayAsInt,
DATEADD(d, CONVERT(INT,
CONVERT(DATETIME, '17530101')), '19000101')
AS MinSmallDateTimePlusMinDateAsInt,
DATEADD(d, CONVERT(INT, GETDATE()), '19000101')
AS MinSmallDateTimePlusTodayAsInt

And here is the result (your results will look different since I am using the current datetime and the
time at which I write this is different than the time at which you are reading).

So what happens when we cast a value that cannot be cast to the specified type? In that case we
make an invalid cast and an exception is thrown.
If you do not want an exception to be thrown at an invalid cast you can use
the TRY_CAST and TRY_CONVERTfunctions. They work exactly the same as CAST and CONVERT,
except that the TRY_ variants do not throw an error when a cast is invalid, but returns NULL instead.

Hide Copy Code

SELECT TRY_CAST('Hello' AS INT),


TRY_CONVERT(INT, 'Hello')

SQL Server might give you a warning that TRY_CAST is not recognized as a built-in function name.
This appears to be a bug and you can ignore it, the query will run fine.

9.1.2 PARSE

Parsing is a special kind of cast which always casts a VARCHAR value into another datatype. In SQL
Server we can use the PARSE or TRY_PARSE function which takes as parameters a VARCHAR value, a
datetype and an optional culture code to specify in which culture format the value is formatted. We
can for example parse a VARCHAR value that represents a date formatted to Dutch standards into
a DATETIME value.

Hide Copy Code

SELECT PARSE('12-31-2013' AS DATETIME2 USING 'en-US') AS USDate,


PARSE('31-12-2013' AS DATETIME2 USING 'nl-NL') AS DutchDate

We can also use PARSE for numeric or money types. The following example shows how two
differently formatted money styles produce the same output with PARSE. Notice that Americans use
a point as decimal seperator while Dutch use a comma.

Hide Copy Code

SELECT PARSE('$123.45' AS MONEY USING 'en-US') AS USMoney,


PARSE('€123,45' AS MONEY USING 'nl-NL') AS DutchMoney

If we ommit the currency symbol and the culture info we actually get very different results!

Hide Copy Code

SELECT PARSE('123.45' AS MONEY) AS USMoney,


PARSE('123,45' AS MONEY) AS DutchMoney

Of course a parse can also fail. As with CAST and CONVERT we get an error.

And again you can use TRY_PARSE to return NULL if a PARSE fails.

Hide Copy Code

SELECT TRY_PARSE('Hello' AS MONEY USING 'nl-NL')

It is recommended to use PARSE only to parse date and numeric values represented as text to their
corresponding datatypes. For more general casting use CAST and CONVERT.

9.1.3 FORMAT

The FORMAT function does not really provide a means to convert between datatypes. Instead it
provides a way to output data in a given format.

For example, we can format dates to only show the date without time or we can format numerics to
show leading 0's and always x decimal digits.

Hide Copy Code

SELECT
SalesOrderID,
FORMAT(SalesOrderID, 'SO0') AS SalesOrderNumber,
CustomerID,
FORMAT(CustomerID, '0.00') AS CustomerIDAsDecimal,
OrderDate,
FORMAT(OrderDate, 'dd-MM-yy') AS FormattedOrderDate
FROM Sales.SalesOrderHeader

And you can specify cultures to format to that specified culture.

Hide Copy Code

SELECT
SalesOrderID,
OrderDate,
FORMAT(OrderDate, 'd', 'en-US') AS USShortDate,
FORMAT(OrderDate, 'd', 'nl-NL') AS DutchShortDate,
FORMAT(OrderDate, 'D', 'en-US') AS USLongDate,
FORMAT(OrderDate, 'D', 'nl-NL') AS DutchLongDate
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

And here are some results.

And of course we can format numeric values as well.

Hide Copy Code

SELECT
SalesOrderID,
SubTotal,
FORMAT(SubTotal, 'C', 'nl-NL') AS DutchCurrency
FROM Sales.SalesOrderHeader

You might be wondering if there is a <code>TRY_FORMAT. There is not. Ask yourself, why should a
format fail? It probably doesn't. It might format to unexpected values, but other than that invalid
values can be caught when SQL Server parses the query. Here is an example of what you think might
go wrong, but actually just formats to a weird value.

Hide Copy Code

SELECT
SalesOrderID,
SubTotal,
FORMAT(SubTotal, 'Hello', 'nl-NL') AS DutchCurrency
FROM Sales.SalesOrderHeader

For format values I once again redirect you to the FORMAT page on TechNet.

9.2 VARCHAR functions

You will be working with the (VAR)CHAR type a lot. In the previous section we have
seen FORMAT which can be used to get a specific, culture dependent, output for certain values. But
that is not all you can do with text data.
First of all, there are many times that you want to concatenate text. The Person table, for example,
has a FirstName and a LastName column. Those combined, seperated by a space, can become
a FullName field.

Hide Copy Code

SELECT
BusinessEntityID,
FirstName + ' ' + LastName AS FullName
FROM Person.Person
ORDER BY FullName

When concatenating a string to NULL this results in NULL unless a session option
called CONCAT_NULL_YIELDS_NULL_INPUT is turned off. This is outside the scope of this article.

A word of caution on concatenation. Make sure you are either concatenating text to text or numerics
to numerics. The following concatenation results in an error.

Hide Copy Code

SELECT
BusinessEntityID,
BusinessEntityID + FirstName AS IDName
FROM Person.Person
ORDER BY IDName

And here is the error.

SQL Server tries to convert FirstName to an INT, since BusinessEntityID is an INT. Switching
the values will not help either. What will help is a CAST.

Hide Copy Code

SELECT
BusinessEntityID,
CAST(BusinessEntityID AS VARCHAR(6)) + FirstName AS IDName
FROM Person.Person
ORDER BY IDName

Or you could use the CONCAT function, which concatenates all values passed to the function as
strings. When using CONCAT NULLs are ignored.

Hide Copy Code


SELECT
BusinessEntityID,
CONCAT(BusinessEntityID, FirstName) AS IDName,
CONCAT(NULL, LastName) AS LastName
FROM Person.Person
ORDER BY IDName

Other times you only want to return a portion of a string. For example the first or last letter. This can
be achieved with the LEFT and RIGHT functions.

Hide Copy Code

SELECT
BusinessEntityID,
LEFT(FirstName, 1) + '. ' + LastName AS AbbrvName,
RIGHT(FirstName, 3) AS LastThreeLetters
FROM Person.Person
ORDER BY AbbrvName

You can also use SUBSTRING to get a portion of a string by (1-based) index. The following query
outputs the same data as the last query. Notice I also use the LEN function to determine the length
of a given input.

Hide Copy Code

SELECT
BusinessEntityID,
SUBSTRING(FirstName, 1, 1) + '. ' + LastName AS AbbrvName,
SUBSTRING(FirstName, LEN(FirstName) - 2, 3) AS LastThreeLetters
FROM Person.Person
ORDER BY AbbrvName

To get the index of a specific character in a string you can use the CHARINDEX function which
returns the position of the first occurrence of the specified character. Using this we can, for example,
format a numeric value and return the portion before the decimal seperator and the portion after the
decimal seperator seperately.

Hide Copy Code

SELECT
SubTotal,
SUBSTRING(
FORMAT(SubTotal, 'G', 'en-US'),
0,
CHARINDEX('.', FORMAT(SubTotal, 'G', 'en-US'))) AS DigitsBeforeDecimal,
SUBSTRING(
FORMAT(SubTotal, 'G', 'en-US'),
CHARINDEX('.', FORMAT(SubTotal, 'G', 'en-US')) + 1,
4) AS DigitsAfterDecimal
FROM Sales.SalesOrderHeader
Sometimes you want to format strings in a way that is not supported by the FORMAT function. You
can use the functions UPPER and LOWER to make a string all uppercase or all lowercase. The
functions LTRIM and RTRIMremove leading and trailing spaces from a string (especially useful when
dealing with legacy applications!).

Hide Copy Code

SELECT
UPPER(FirstName) AS UpperName,
LOWER(FirstName) AS LowerName,
LTRIM(' abc ') AS AbcWTrailing,
RTRIM(' abc ') AS AbcWLeading,
LTRIM(RTRIM(' abc ')) AS Abc
FROM Person.Person

There are a few more useful functions that you can use to alter strings. With REPLACE you can
replace a character or a substring of a string with another character or string. With STUFF you can
replace a part of a string based on index. With REVERSE you can, of course, reverse a string. In the
following example we revert the SalesOrderNumber, we replace the 'SO' in
the SalesOrderNumber with 'SALE', and we replace the first two characters of
the PurchaseOrderNumber with 'PURC'.

Hide Copy Code

SELECT
SalesOrderNumber,
REVERSE(SalesOrderNumber) AS ReversedOrderNumber,
REPLACE(SalesOrderNumber, 'SO', 'SALE') AS NewOrderFormat,
PurchaseOrderNumber,
STUFF(PurchaseOrderNumber, 1, 2, 'PURC') AS NewPurchaseFormat
FROM Sales.SalesOrderHeader

There are more functions you can use to format, alter or get information about string values. You can
find them on TechNet.

9.3 DATETIME functions

Working with dates and time in SQL Server (or any language) has never been easy. There is no date
without time, no time without date, comparisons fail if two values differ by a millisecond, every
culture has its own formats, we have timezones, daylight savings time, and not even every culture
has the same calendar! Luckily SQL Server provides us with lots of functions (and datatypes) to work
with dates and time.

I can recommend reading the following page on TechNet: Date and Time Data Types and Functions.

First of all, how do we get the current time? SQL Server has a couple of functions you can
use. GETDATE and CURRENT_TIMESTAMP to get the current date and time (on the computer that the
SQL Server instance is running on) as datatype DATETIME, GETUTCDATE gets the current date and
time in Coordinated Universal Time as datatype DATETIME, SYSDATETIME which also returns the
current date and time, but as datatype DATETIME2(7), SYSUTCDATETIME which returns the current
date and time as Coordinated Universal Time as datatype DATETIME2(7) and
the SYSDATETIMEOFFSET which returns the current date and time including the timezone offset as
datatype DATETIMEOFFSET(7). The following query shows these functions.

Hide Copy Code

SELECT
GETDATE() AS [GetDate],
CURRENT_TIMESTAMP AS CurrentTimestamp,
GETUTCDATE() AS [GetUtcDate],
SYSDATETIME() AS [SysDateTime],
SYSUTCDATETIME() AS [SysUtcDateTime],
SYSDATETIMEOFFSET() AS [SysDateTimeOffset]

To get a date without the time part simply convert a DATETIME value to DATE. Similary, if you want
the time without the date you can cast to TIME.

Hide Copy Code

SELECT
SYSDATETIME() AS DateAndTime,
CAST(SYSDATETIME() AS DATE) AS [Date],
CAST(SYSDATETIME() AS TIME) AS [Time]

You may also be interested in only a part of the date, for example the day, month or year. You can
use the DATEPART function for this, or the 'shortcut' functions YEAR, MONTH and DAY. Notice that
you can actually extract a lot more using DATEPART. In addition there is a DATENAME function which
works the same as DATEPART, except it returns the part of the date as string. DATENAME is especially
useful for returning the name of the month. Be aware that the name of the month is translated in the
language of your session.

Hide Copy Code

SELECT
DATEPART(DAY, SYSDATETIME()) AS DayFromDatePart,
DATEPART(WEEK, SYSDATETIME()) AS WeekFromDatePart,
DATEPART(MONTH, SYSDATETIME()) AS MonthFromDatePart,
DATEPART(YEAR, SYSDATETIME()) AS YearFromDatePart,
DATEPART(SECOND, SYSDATETIME()) AS SecondFromDatePart,
DATEPART(NANOSECOND, SYSDATETIME()) AS NanoSecondFromDatePart,
DAY(SYSDATETIME()) AS DayFromFunc,
MONTH(SYSDATETIME()) AS MonthFromFunc,
YEAR(SYSDATETIME()) AS YearFromFunc,
DATENAME(DAY, SYSDATETIME()) AS DayFromDateName,
DATENAME(MONTH, SYSDATETIME()) AS MonthFromDateName,
DATENAME(YEAR, SYSDATETIME()) AS YearFromDateName

Sometimes you want to add specific intervals to dates. For example, when an order is placed today
the latest delivery date is seven days ahead. Or when an item is not in stock it may take up to a
month. To add or subtract dates you can use the DATEADD function. In the following example I
remove the time part by casting to date.

Hide Copy Code

SELECT
DATEADD(DAY, -1, CAST(SYSDATETIME() AS DATE)) AS PreviousDay,
DATEADD(DAY, 1, CAST(SYSDATETIME() AS DATE)) AS NextDay,
DATEADD(WEEK, 1, CAST(SYSDATETIME() AS DATE)) AS NextWeek,
DATEADD(MONTH, 1, CAST(SYSDATETIME() AS DATE)) AS NextMonth,
DATEADD(YEAR, 1, CAST(SYSDATETIME() AS DATE)) AS NextYear

You can also get the difference between two dates. For example, we want to know the difference in
days between the order date and the delivery date of an order in the SalesOrderHeader table. This
can be accomplished by using the DATEDIFF function.

Hide Copy Code

SELECT
OrderDate,
ShipDate,
DATEDIFF(DAY, OrderDate, ShipDate) AS DiffBetweenOrderAndShipDate
FROM Sales.SalesOrderHeader
ORDER BY DiffBetweenOrderAndShipDate DESC

Be aware that the DATEDIFF function only looks at the part you want to know the difference of. So
when you want the difference in years the function only looks at the year part of the dates. So the
result of the next query is 1 for day, month and year, even though the difference between the dates
is really just one day.

Hide Copy Code

SELECT
DATEDIFF(DAY, '20131231', '20140101') AS DiffInDays,
DATEDIFF(MONTH, '20131231', '20140101') AS DiffInMonths,
DATEDIFF(YEAR, '20131231', '20140101') AS DiffInYears

As mentioned we live in a world divided in timezones. With the SWITCHOFFSET function you can
display a date in any given offset, no matter what timezone you are currently in. For example I live in
the Netherlands, which is UTC/GMT+1, now I want the Hawaiin time, which is UTC/GMT-10 and the
time in Sydney which is UTC/GMT+10, or +11 when it's daylight savings time. Unfortunately there is
no easy way to correct for daylight savings time, so you may figure that out by yourself (and Google).
The following query shows the local time, the time in Sydney (not corrected for daylight savings) and
the time in Hawaii.

Hide Copy Code

SELECT
SYSDATETIMEOFFSET() AS LocalTime,
SWITCHOFFSET(SYSDATETIMEOFFSET(), '+10:00') AS SydneyTime,
SWITCHOFFSET(SYSDATETIMEOFFSET(), '-10:00') AS HawaiianTime

So far we have only constructed dates with strings in a specific format or by calling a function that
returns the current date. There are also a couple of functions that can construct date values from
various date parts. These functions
are DATEFROMPARTS, DATETIME2FROMPARTS, DATETIMEFROMPARTS, DATETIMEOFFSETFROMPAR
TS, SMALLDATETIMEFROMPARTS and TIMEFROMPARTS. The function names describe what they do
pretty well, so I will not expand on that further. Here are some examples of the usages of various
function.

Hide Copy Code

SELECT
DATEFROMPARTS(2013, 12, 31) AS [DateFromParts],
DATETIME2FROMPARTS(2013, 12, 31, 14, 30, 0, 0, 0) AS [DateTime2FromParts],
DATETIMEOFFSETFROMPARTS(2013, 12, 31, 14, 30, 0, 0, 1, 0, 0) AS [DateTimeOffsetFromParts],
TIMEFROMPARTS(14, 30, 0, 0, 0) AS [TimeFromParts]

SQL Server has more useful functions that you can use when working with dates and time. One such
functions is EOMONTH, which returns a date representing the last day of the month of the date(time)
that was passed as a parameter. Another is ISDATE which checks if a string can be converted to a
valid date. You can find those and others on TechNet.

I recently came across a rather nice CP article explaining all there is to know about dates, times and
functions in all versions of SQL Server. Recommended reading: Date and Time Date Types and
Functions - SQL Server (2000, 2005, 2008, 2008 R2, 2012)

9.4 CASE and IIF

9.4.1 CASE

Sometimes you want to return a value based on another value. For example, when a bit is 1 or true
you want to return 'Yes' and else 'No'. Or you want to include a value to the result only when that
value is not empty. With CASE such scenario's become possible. With CASE you can either test a
column for a value and return another value based on that value or you can include more advanced
criteria for testing which value to show.

Let us look at the first CASE variant, the simple form. We know that a Person from the Person table
can have the title 'Mr.' or 'Mrs.', 'Ms.' or 'Ms'. Instead of these values we want to
return 'Mister' for 'Mr.' and 'Miss' for all the 'Ms.' variants. If the title is something else,
like 'Sr.' then we want to show that.

Hide Copy Code

SELECT
BusinessEntityID,
CASE Title
WHEN 'Mr.' THEN 'Mister'
WHEN 'Mrs.' THEN 'Miss'
WHEN 'Ms.' THEN 'Miss'
WHEN 'Ms' THEN 'Miss'
ELSE Title
END AS Salutation,
FirstName,
LastName
FROM Person.Person

So the simple CASE statement has an input expression, in this case Title, which is compared to
multiple values defined in the WHEN clauses. If a match is found the value in the THEN clause is
returned. If no match is found the value in the ELSE clause is returned. When the value was not
matched in any WHEN clause and no ELSE clause is specified a NULL is returned.

Another variant on the CASE expression is the searched form. With the searched form of
the CASE expression we have more flexibility in when clauses. We can now use predicates to test for
a certain criterium. The first WHENclause that returns true determines what value is returned. The
following query shows how you can use the searched CASE expression and also offers an alternative
using CONCAT.

Hide Copy Code

SELECT
BusinessEntityID,
CASE
WHEN Title IS NULL AND MiddleName IS NULL
THEN FirstName + ' ' + LastName
WHEN Title IS NULL AND MiddleName IS NOT NULL
THEN FirstName + ' ' + MiddleName + ' ' + LastName
WHEN Title IS NOT NULL AND MiddleName IS NULL
THEN Title + ' ' + FirstName + ' ' + LastName
ELSE Title + ' ' + FirstName + ' ' + MiddleName + ' ' + LastName
END AS FullNameAndTitle,
CONCAT(Title + ' ', FirstName, ' ', MiddleName + ' ', LastName) AS FullNameAndTitleConcat
FROM Person.Person
ORDER BY FullNameAndTitle

Here is another example which we cannot write using other functions.

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
CASE
WHEN SubTotal < 100
THEN 'Very cheap order'
WHEN SubTotal < 1000
THEN 'Cheap order'
WHEN SubTotal < 5000
THEN 'Moderate order'
WHEN SubTotal < 10000
THEN 'Expensive order'
ELSE 'Very expensive order'
END AS OrderType
FROM Sales.SalesOrderHeader

Notice that in the second WHEN clause we do not have to check if the order is more expensive than
100. If the first WHEN clause returns true then the value in the corresponding THEN clause is returned
and the subsequent WHENclauses are not evaluated.

9.4.2 IIF

Sometimes all you want to know is if a certain attribute has a value or if it is NULL and return a value
based on that predicate. Using a CASE expression can make your query rather wordy and it would be
nice if we had a shortcut. Well, we have. With IIF you can test a predicate and specify a value if it
evaluates to true and a value if it evaluates to false. The following example shows how IIF is used
and can replace a CASE expression.

Hide Copy Code

SELECT
BusinessEntityID,
CASE
WHEN Title IS NULL THEN 'No title'
ELSE Title
END AS TitleCase,
IIF(Title IS NULL, 'No title', Title) AS TitleIIF,
FirstName,
LastName
FROM Person.Person

And of course other types of predicates can be used as well.

Hide Copy Code

SELECT
SalesOrderID,
CustomerID,
SubTotal,
IIF(SubTotal > 5000, 'Expensive order', 'Not so expensive order')
AS OrderType
FROM Sales.SalesOrderHeader

And even the following.

Hide Copy Code

SELECT
BusinessEntityID,
FirstName,
LastName,
IIF(EXISTS(SELECT *
FROM Sales.SalesOrderHeader AS s
JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.PersonID = BusinessEntityID),
'Has orders', 'Does not have orders')
FROM Person.Person

9.5 COALESCE, ISNULL and NULLIF

With COALESCE we can specify a range of values and the first value that is not NULL is returned. It
can actually make our IIF that checks for a NULL from the previous section even shorter.

Hide Copy Code

SELECT
BusinessEntityID,
COALESCE(Title, 'No title'),
FirstName,
LastName
FROM Person.Person

More values can be specified.

Hide Copy Code

SELECT
ProductID,
Name,
ProductNumber,
COALESCE(Style, Class, ProductLine) AS Style
FROM Production.Product

Of course now we do not know if the result represents a style, class or productline, but you can fix
that by adding a CASE expression.

COALESCE returns NULL if all values that were passed to it are NULLs.

ISNULL does the same as COALESCE, but with some differences. The first difference is
that ISNULL can only have two values. So if the first value is NULL it will return the second value
(which may also be NULL).

Hide Copy Code

SELECT
BusinessEntityID,
ISNULL(Title, 'No title'),
FirstName,
LastName
FROM Person.Person
And you can nest ISNULL to get the same effect as COALESCE.

Hide Copy Code

SELECT
ProductID,
Name,
ProductNumber,
ISNULL(Style, ISNULL(Class, ProductLine)) AS Style
FROM Production.Product

So why would you choose one over the other? Well, COALESCE is an ANSI SQL standard function, so
it is more portable than ISNULL. The more important difference, however, is the return type of the
two functions. The type that COALESCE returns is determined by the returned element, while
for ISNULL the type is determined by the first element. In the following query the returned value
of ISNULL is truncated to fit the type of the first element. COALESCE keeps the value intact.

Hide Copy Code

DECLARE @first AS VARCHAR(4) = NULL


DECLARE @second AS VARCHAR(5) = 'Hello'
SELECT
COALESCE(@first, @second) AS [Coalesce],
ISNULL(@first, @second) AS [IsNull]

And the result.

Another difference between the two is that the underlying type of COALESCE is always NULLABLE,
even when a NULL can never be returned. ISNULL recognizes scenario's where NULL is never
returned and gives the underlying value definition the NOT NULLABLE attribute. This difference can
be important when you are creating VIEWS or STORED PROCEDURES.

Though it is not within the scope of this article I want to show the difference just to make it clear.
I have created a view using the following definition.

Hide Copy Code

CREATE VIEW dbo.CoalesceVsIsNull


AS
SELECT
BusinessEntityID,
COALESCE(Title, 'No title') AS TitleIsCoalesce,
ISNULL(Title, 'No title') AS TitleIsNull,
FirstName,
LastName
FROM Person.Person
And here is the views column definition.

As you can see TitleIsCoalesce can contain NULLs even though this is
impossible. TitleIsNull will never have NULLs. If you have created the VIEW you can now delete it
using the following command.

Hide Copy Code

DROP VIEW dbo.CoalesceVsIsNull

Another thing you should be aware of when working with COALESCE, ISNULL or CASE is that every
returned value should have the same data type or conversion errors may occur. For example, the
following query raises an error because the @second parameter is going to be converted to an INT.

Hide Copy Code

DECLARE @first AS INT = NULL


DECLARE @second AS VARCHAR(5) = 'Hello'
SELECT
ISNULL(@first, @second)

A last function I want to mention is NULLIF. This function takes two parameters and returns the first
value if the values are different or NULL if the values are equal.

Hide Copy Code

SELECT
NULLIF(1, 1) AS Equal,
NULLIF(1, 2) AS NotEqual

You might also like