Introduction To Transact SQL


Introduction to Transact SQL

This is intended to be a complete reference to the best practices and techniques of T-SQL and its many uses, and it will give you a flavor for where the language came from and how it can be used. We will begin by covering the basics. If you're looking for a particular topic you might want to skip to the relevant chapter, but to really make use of this book you need a solid foundation to build from.
We will cover a lot of topics and answer a number of questions about SQL Server 2000
and Transact SQL in the next few pages. We will begin with some history and build into
some practical applications of the language. Specifically in this chapter we will cover
topics including:

What is it and where did it come from

When and why you will use it

How to use some of the specific features that make this language so unique and

powerful
What you can do with it

What is it and where did it come from?


Since the advent of the microprocessor and the widespread use of the PC in the early 1980s, affordable
technology has finally caught up with the theory of relational databases. Introduced in 1970
by E.F. Codd as a way to manage large amounts of information using a language based on what he
called "relational calculus", the relational database has become central to our civilization. Information
about our daily lives, what we buy, where we shop, the banks we use, the cars we drive, all the
information we need to be able to transact business is kept in relational databases.
Fortunately someone had the good idea to standardize how we interact with these systems, and the
Structured Query Language was born. Originally called SEQUEL (Structured English QUEry Language), it
was developed by IBM Research in support of an experimental relational database called System R
that eventually morphed into DB2. It defines the access paths, not the physical implementation, for
managing the information and data structures. Since the 1980s the American National Standards Institute
(ANSI) has published successive versions of the SQL standard, including SQL-86, SQL-92 and SQL-99.
The current specification is over 2000 pages in length and is available at a cost from ANSI's web site
(www.ansi.org) or from the International Organization for Standardization, ISO (www.iso.ch). Software
manufacturers use these standards to drive the feature set they include to ensure that their product is
certifiably ANSI compliant.
The Transact SQL Language is a superset of the ANSI specifications, meaning that it not only
implements the required functionality, but it provides much more. It includes a number of programming
extensions including variable handling, flow control, looping, and error handling, that allow the developer
to write sophisticated logic to be executed on the server. In addition, the SQL language includes facilities
for controlling access to the data, declarative referential integrity (DRI), and defining business rules to be
enforced at the server level.

A brief history of Microsoft SQL Server


Microsoft came to the relational database arena by a circuitous route that started as an offshoot from
a joint development effort with IBM to create the next generation operating system, OS/2. By 1985
the Intel 80286 processor promised release from the constraints of the 640KB memory space and the
8-bit bus architecture of the 8088 processor, and a new operating system was
needed to compete with the recently released Macintosh. Shortly after the companies committed to work
together, IBM announced it was developing a special higher-end version of OS/2 that would include a
Database Manager partially compatible with DB2 on the mainframe and a network
manager based on the Systems Network Architecture (SNA) protocol. This left Microsoft holding the bag,
because who would buy the standard version?
At the time, the PC-based desktop database market was dominated by single-user flat file systems
like dBASE, Paradox and R:base. In order to compete with IBM, Microsoft turned
to Sybase, a startup that in 1987 was trying to establish itself in the database market
competing with Informix and Oracle. Sybase had a product called DataServer that it was developing for
mid-range UNIX-based computers. By arguing that PCs would be fertile ground for
growing future customers of their midrange product, Microsoft convinced Sybase to partner with them to
port the product to OS/2. And to add more fodder to the fire in this web of alliances and counter-alliances,
Microsoft went the additional step of convincing Ashton-Tate to support the new multi-user
product and to develop future versions of dBASE to act as a client to the new server-based product. In
the fall of 1988, the three companies announced their new product, called Ashton-Tate/Microsoft SQL Server
for OS/2. The first beta of the product shipped that fall and version 1.0 was released in 1989.
By 1990 the relationship with Ashton-Tate wasn't working. The new version of their product was late and
had other problems as well. The difference between the paradigm of developing for a multi-user environment
with SQL and set-based processing and the single-user, record-at-a-time approach of dBASE showed the
marriage wouldn't last. In the time since the arrangement was announced the companies had found that
their priorities had shifted, and Microsoft and Sybase decided to break with Ashton-Tate and released
Microsoft SQL Server 1.1 in 1991.
A significant feature of note is that release 1.1 included support for the new client platform Windows 3.0.
SQL Server was one of the first applications that provided a direct interface to develop applications
against it using dynamic link library (DLL) calls. Version 1.11 included some bug fixes, but this was still
primarily a Sybase-developed and -supported product. Microsoft was dependent on Sybase to fix bugs
and support customers with their issues. Sybase at the time was rapidly growing its mid-range market
and found its priorities shifting more towards that market. Microsoft did eventually get permission for
read-only access to the source code, but all changes had to be coordinated with Sybase so that the
various versions could be kept in sync.
Release 1.0 of Microsoft SQL Server corresponded with the code base for Sybase's release 3.0 on other
platforms, and after a couple of years of adding new features and functionality, Sybase was planning to
release version 4.0. This new version would include support for stored procedures, unions, and backup
devices, among other things.
As some may remember, Windows 3.0 was a 16-bit operating system, but the new version of OS/2, v2.0,
would be 32-bit. Unfortunately IBM was running into problems developing a stable 32-bit product that it
could release in a timely fashion. As a result, version 4.2 of Microsoft SQL Server was released for the
16-bit OS/2 v1.3 in 1992.

At about this time, Microsoft realized that they needed to break from IBM and build their own 32 bit
operating system, which they called Windows NT. It would include a symmetric multiprocessing
environment with asynchronous I/O. The new operating system would be multi-threaded and would
provide standard facilities for logging and management of the processes. Microsoft decided to bet the
farm on this new platform and moved to make SQL Server one of the first applications for it when
Windows NT was released to the public in July of 1993. Within 3 months Microsoft SQL Server 4.2 for NT
was shipping to the general public.
By the end of 1993, the relationship between Microsoft and Sybase had changed as each company's core
focus had evolved from startup to maturity in the high-tech industry. The new Windows NT platform was
proving to be an alternative to the mini-computer UNIX-based environment, not only on
performance but also on ease of use. Release 4.2 was the last time that Microsoft
and Sybase would share code. Since 1991 Microsoft had been slowly growing its stable of
development expertise to be able to stand on its own supporting and evolving the SQL Server product. In
April of 1994, the companies announced the end of their relationship and each went on to
develop the product line exclusively for its respective platforms.
By the end of 1994 Microsoft began putting together plans for the next version of SQL Server. Originally
called SQL95 but later released as version 6.0, it would include support for scrollable cursors, database
replication, better performance, a greatly expanded library of system stored procedures, and a better
database management tool. For the first time, this release would be developed entirely by
Microsoft. The SQL Server team at Microsoft delivered it on time and it was a great success in the
marketplace.
Version 6.5 came out 10 months later and included support for the Internet, data warehousing and
distributed transactions. Another key event was that it gained certification as ANSI SQL compliant. SQL
Server 7.0 brought us full row-level locking and a new query processing component, also packaged
separately as the Microsoft Data Engine (MSDE), that supports distributed queries and a more efficient
scheme for processing ad-hoc queries. SQL Server 2000 added user defined functions, new data types,
and a host of other new features we will cover later in this chapter.
With each release, SQL Server has expanded its functionality and improved its performance as an
enterprise-level database engine. While there may be some who still try to deny it, Microsoft has
established itself as a serious player in the DBMS industry and has built a core competency at
taking a good product and making it better.

What can we do with it?


The Structured Query Language (SQL) is the primary way of interacting with most relational database
management systems, and SQL Server is no exception. While we assume you have a base level of
understanding, we will quickly cover the basics of the language before we get into how SQL Server has
extended it to provide the developer with better ways of using the database. If you find that this
discussion goes too fast, there are several good books available, including Robin Dewson's 'SQL
Server 2000 Programming' and 'Beginning SQL Programming' (ISBN 1-861001-80-0), that will give you a
more complete introduction to the language.
SQL is the interface for managing information in relational databases. With it we tell the database what
we want to select, insert, update and delete. Each of these commands follows a standard syntax and uses
various clauses to determine exactly which rows are affected. Taken together, these form what is referred
to as the Data Manipulation Language, or DML.

Data Manipulation Language (DML)


Data Manipulation Language (DML) provides the language constructs for selecting information from
tables, adding, modifying, and deleting rows of data, and generally providing a standardized access
method to the data. Any statement that adds, modifies or views information stored in the database is
considered part of the Data Manipulation Language. Four basic statements make up the bulk of this
category: SELECT, INSERT, UPDATE, and DELETE.

The SELECT statement


One of the first tasks any developer will face with a relational database is learning how to retrieve
information from it. A basic SELECT statement consists of a series of clauses that define what
information we are interested in, and how we want to view it. These include a list of columns to include
in the result set, the names of the tables involved, filter criteria that specify how to limit what we
are looking for, and other optional clauses which further refine the results.
SELECT <column>, <column>, ...
FROM <table> ...
WHERE <criteria> ...
For example, to select a list of names from the supplier table in the Northwind database, use the following command:
SELECT
SupplierID, CompanyName, ContactName, *
FROM Suppliers
WHERE Country <> 'USA'
ORDER BY Country
This is pretty basic stuff in that it returns a simple list of suppliers. The SELECT
clause tells the query engine what columns we are interested in. The * is a wildcard that indicates we
want all columns in the table to follow the ones that we specified. The FROM clause indicates where the
information is stored, and the WHERE clause filters it down to a subset of the whole table. This is
typical of most relational databases. The more advanced DBMSs give us the ability to go
quite a bit further in refining our queries. Suppose we wanted to find out how many suppliers were from
each country. The ANSI SQL language specifies some basic aggregation and summarization functionality
that will run unchanged on most databases:
SELECT Country, COUNT(*) as [COUNT]
FROM Suppliers
GROUP BY Country
ORDER BY [COUNT] DESC
We've added a GROUP BY clause, which tells the database how we want to aggregate our results, and
an ORDER BY clause to sort it so the countries with the most suppliers come first. Transact SQL provides
14 aggregate, 23 string, 9 date and time, and 23 mathematical functions out of the box, in addition to
several other system and state functions that we can use in developing our code. Beyond the many
built-in functions, Transact SQL also provides a number of operators that are worth mentioning.
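As a quick taste of those functions, here is a minimal sketch that exercises a few of them against the same Suppliers table (the particular function choices are our own illustration):
SELECT UPPER(CompanyName) as Shouted,   -- string function
LEN(CompanyName) as NameLength,         -- string function
DATEPART(year, GETDATE()) as ThisYear,  -- date and time function
ROUND(123.4567, 2) as Rounded           -- mathematical function
FROM Suppliers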

Distinct
Eliminating duplicate records from our rowset can be done within our query by specifying the DISTINCT
keyword in our SELECT clause. It will cause the query engine to return only unique rows from our query.
SELECT
DISTINCT Country
FROM Suppliers

Top
The TOP operator is a Microsoft extension in Transact SQL that allows us to limit the number of rows we
want in our result set. After the query has been processed, the result set is limited to the number of rows
specified. We can either use a hard number like 10, or include the PERCENT keyword to indicate that we
want to see only a percentage of the full result set.
SELECT TOP 10 ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC
Trying to accomplish the same results on other database platforms requires much more work and
creativity on the part of the developer!
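As an illustration of the PERCENT keyword mentioned above, a small sketch (our own variation on the previous query; note the DESC so the biggest sellers come first):
SELECT TOP 10 PERCENT ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC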

Subqueries
A subquery is a complete SELECT statement that is nested within another SQL statement in place of an
expression. We can use a subquery anywhere that we can use an expression, whether it is part of the
SELECT clause or the WHERE clause. We can use subqueries to perform lookups on related tables, or to
evaluate a condition; a scalar subquery must return a single value, while forms such as EXISTS test for
the presence of rows. We might need to select the orphaned rows, which are those records from the child
table which have no parent record. In our Northwind example we need to reduce our supplier count and
eliminate suppliers who have no product sales. One way to return the list of suppliers with no orders using
subqueries is:
SELECT
SupplierID, CompanyName FROM Suppliers S
WHERE NOT EXISTS
(SELECT
'x' FROM [Order Details] D, Products P
WHERE D.ProductID = P.ProductID
AND S.SupplierID = P.SupplierID)
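A subquery can also sit in the SELECT clause as an expression. A minimal sketch of that form, our own example against the same tables, counts each supplier's products with a correlated scalar subquery:
SELECT S.CompanyName,
(SELECT COUNT(*) FROM Products P
WHERE P.SupplierID = S.SupplierID) as ProductCount  -- one value per outer row
FROM Suppliers S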

String Concatenation

One of the basic operations performed in the database is to generate human-readable output. Because
the process of normalization will break apart things like addresses and names into multiple parts that
need to be reassembled, string concatenation is an important function to understand. SQL Server's
approach is to use the + operator between character based values to concatenate. For example, if we
wanted to do a report in which we wanted to return a single column with our Employee's name we could
use the following:
SELECT
FirstName + ' ' + LastName FROM Employees
We could, for example, write a SQL script whose output is another SQL script. This one generates a script
which counts the rows in each user table in the current database.
SELECT 'SELECT ''' + name + ''', count(*) from [' + name + ']'
FROM sysobjects
WHERE type = 'U'
ORDER BY name
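Each row of the result is itself a runnable statement; the output will look something like this (the exact table names depend on your database):
SELECT 'Categories', count(*) from [Categories]
SELECT 'Customers', count(*) from [Customers]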
One note to developers: this concatenation syntax is specific to SQL Server and doesn't work on other
databases such as Oracle (which uses two pipe characters, ||, to concatenate strings). Also you may notice
the square brackets ('[' and ']') in the code. These are necessary when the database schema contains
non-standard identifiers. In the Northwind database the Order Details table's name has a space between
Order and Details. In order to reference non-standard names we simply surround them with brackets.

Comments
A quick note on comments. Even though all good code should be self-documenting, we may find at times
that it is helpful to readers of our code to include comments from the author explaining what is going on.
Transact SQL recognizes two forms of comments: single-line and multi-line.
The -- operator indicates that from that point on, the line is comment text and should be ignored by the
query engine. In Query Analyzer, the commented text is colored as a comment (green by default).
SELECT
GetDate() -- This is a comment from here on, I can type whatever I want
Multi-line comments follow the C++ syntax of /* comment text */, where the comment text can cover
multiple lines of text.
/* This is a multi-line comment, none of the statements are ever processed
DECLARE @myDate DATETIME
SET @myDate = GetDate()
PRINT 'The current date is ' + convert(varchar(18), @MyDate, 110)
*/

Cast & Convert


These functions allow you to convert from one data type to another. For example, if you have ever worked
on a cross-platform database transformation project in which you are importing data from one DBMS to
another, you may have discovered that the date formats recognized by the different vendors are not the
same. In Oracle a valid date field has the format 16 MAR 2004, while in SQL Server the default format is
2004-03-16. If you are exporting data to a comma separated value file to be loaded into Oracle, you
can use the convert function as:
Select convert(varchar(12), mydate, 113)
The first argument is the target datatype, the second is the source value, and the third is an optional style
value. The convert function is also useful if you are transforming data from one schema to another and
need to concatenate columns but restrict the resulting size.
Select convert(varchar(35), RTRIM(FNAME) + ' ' + RTRIM(LNAME))
Another use is if you are working with text or binary data that is larger than the 8000 character limit of the
varchar datatype. SQL Server doesn't allow these types of columns to be used for comparison or sorting
unless you use the LIKE operator. If you had a table with a column called CODE_TAG that was defined
as TEXT, and you wanted to select all records where it equals 'BO', you would have to use the following
syntax:
select * from codes where convert(varchar(2), CODE_TAG) = 'BO'
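CAST performs the same conversions using ANSI standard syntax, though without the style argument; a minimal sketch (the column and literal are our own choices):
select CAST(UnitPrice as int) as WholePrice,  -- money converted to a whole number
CAST('2004-03-16' as datetime) as AsDate      -- string converted to a datetime
from Products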

Case
The CASE expression allows you to evaluate a variable or a condition inline and return one of multiple
values as the SQL statement is being run. The basic version takes a source variable or column and
depending on the value returns one of a discrete set of values.
Select
case datepart(month, GetDate())
When 1 then 'January is nice'
When 2 then 'February is cold'
When 3 then 'Spring is coming'
Else 'It''s not winter anymore'
End
You can also evaluate Boolean conditions. These are evaluated in order until the conditions are satisfied
and the resulting value is returned.
Select
case
when datepart(month, GetDate()) > 9
then 'Winter is here'
when datepart(hour, GetDate()) > 22
then 'It''s getting late'
when convert(varchar(12), getdate(), 113) = '28 NOV 2003'
then 'Everything''s on SALE!'
when datepart(weekday, GetDate()) = 6
then 'TGIF!'
Else 'Who knows???'
End
Case can be used anywhere an expression can, including defining views that dynamically transform data
based on current conditions. For example we could use this technique to implement a 10 hour sale on all

products for the day after Thanksgiving in the Northwind database by creating a view called PriceLookUp
as follows:
CREATE VIEW PriceLookUp AS
Select
ProductID, ProductName, CategoryID, UnitPrice,
Case when getdate() > '28-NOV-2003 08:00' and
getdate() < '28-NOV-2003 18:00'
then UnitPrice * .85
else UnitPrice
end as SalePrice
from Northwind..Products
go
select
GetDate(), UnitPrice, SalePrice, * from PriceLookUp
We could have written a batch and scheduled it to run at the appropriate times to update the sale price,
but that would have added a dependency on the scheduler and on the amount of time it would take to
process all the rows (mature retailer product tables likely have hundreds of thousands of rows in them).

Coalesce, Nullif, IsNull


Microsoft took the liberty of providing us with commonly used functions to make our lives easier. These
are cousins of the CASE expression and could be coded in that way. The Coalesce function returns the
first non-null expression in a list, otherwise it returns null.
Coalesce (expression1, expression2, ...)
-- this is essentially the same as:
case when expression1 is not null then expression1
when expression2 is not null then expression2
etc...
end
For example, if we wanted to return the most recent date on which an order's status changed, we could
check the latest milestone first and fall back to the earlier ones:
select coalesce (ShippedDate, RequiredDate, OrderDate) as LastUpdate
from orders
NullIf and IsNull operate similarly, the former returning null if a condition is true, and the latter an
alternate value if the expression is null.
NullIf(expression1, expression2)
-- NullIf is evaluated as
case when expression1 = expression2 then Null
else expression1
end
IsNull(expression1, expression2)
-- IsNull is evaluated as
case when expression1 is null then expression2
else expression1
end
SQL Server Books Online has a complete listing of the system functions and provides you with syntax
diagrams for how to use them.
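One idiom worth noting: NullIf can guard a division against divide-by-zero errors. A small sketch using a hypothetical SalesSummary table:
-- SalesSummary, TotalSales and UnitsSold are hypothetical names;
-- when UnitsSold is 0, NULLIF returns NULL and the division yields
-- NULL instead of raising a divide-by-zero error
select TotalSales / NULLIF(UnitsSold, 0) as AvgPrice
from SalesSummary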

Joining tables
As the name "relational" implies, information in well-designed databases is stored in more than a single
table, and much of our work is to develop means for accessing it effectively. When more than one table is
referenced in a query, the tables are said to be joined. A join is the specific criteria which define how two
or more tables are related. Joins come in two flavors, inner and outer. The basic difference is that an inner
join will only return rows in which both tables have columns matching the join criteria.
[Figure: Venn diagrams illustrating the Inner Join, Left Outer Join, and Right Outer Join]

The examples that follow use two sample tables, Suppliers and Products:

Suppliers
SupplierID   CmpyName      Country
1            ABC Widget    USA
2            Sprocket 4U   UK
3            Isa Cmpy      CA

Products
ProdID   ProdName    SupplierID   Units
1        WidgOne     1            4
2        WidgTwo     1            6
3        Spockit     2            3
4        WidgThree   1            1

(SupplierID is the primary key of Suppliers; ProdID is the primary key of Products, and its SupplierID column is a foreign key to Suppliers.)

An inner join returns the set of records in which the SupplierID column in the Suppliers table has records
which match the Products SupplierID column. Records must exist in both tables in order to be part of the
resulting set of rows returned. The join criteria can be specified as part of the FROM clause or as part of
the WHERE clause. The two following statements return identical results:
-- SQL:89 Inner Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID = P.SupplierID
AND P.Units > 2

-- SQL:92 Inner Join Syntax
SELECT *
FROM Suppliers as S JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units > 2
Running these statements will result in the selection of rows from both tables which meet the join criteria,
and will exclude rows which do not have a match. In our result set we won't return Supplier 3, Isa Cmpy,
because they don't have any products, and we don't return WidgThree from the Products table because its
Units value is below the selection criteria.

SupplierID   CmpyName      Country   ProdID   ProdName   SupplierID   Units
1            ABC Widget    USA       1        WidgOne    1            4
1            ABC Widget    USA       2        WidgTwo    1            6
2            Sprocket 4U   UK        3        Spockit    2            3

A left join (also called a left outer join) returns all rows from the first table, and information from the
second if it exists. If it doesn't, then nulls are returned in place of the missing data. The syntax for this
would be:
-- SQL:89 Left Outer Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID *= P.SupplierID
AND P.Units > 2
-- SQL:92 Left Outer Join Syntax (the Units filter moves into the ON clause
-- so that unmatched suppliers are still returned with NULLs)
SELECT *
FROM Suppliers as S LEFT JOIN Products as P
ON S.SupplierID = P.SupplierID AND P.Units > 2
SupplierID   CmpyName      Country   ProdID   ProdName   SupplierID   Units
1            ABC Widget    USA       1        WidgOne    1            4
1            ABC Widget    USA       2        WidgTwo    1            6
2            Sprocket 4U   UK        3        Spockit    2            3
3            Isa Cmpy      CA        NULL     NULL       NULL         NULL

A right outer join is the converse of the left outer join, except that the table on the right is the driver.
Two other types of joins can be made, although you don't see them too often. The first is the full outer join,
in which all rows from both tables are returned (a sketch of its syntax appears below), and the second is
the cross join (also called a Cartesian join), in which every possible combination of rows from the two
tables is returned. If you forget to add the WHERE clause and you've specified more than one table, you
are running a cross join.
Select
* from Suppliers, Products

Our example would return 12 rows (3 rows in Suppliers x 4 rows in Products = 12 combinations of the
two), but a more realistic scenario involving hundreds of suppliers and thousands of products would
result in hundreds of thousands of combinations.
SQL Server supports both styles of join syntax, but not all databases do, so for portability you may find
you need to fall back on the older syntax. In the ANSI SQL 92 standard, the join criteria are recommended
to be part of the FROM clause in order to avoid confusion with the filter criteria of the WHERE clause.
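For completeness, a sketch of the SQL:92 full outer join syntax over our sample tables; it returns the matched pairs plus the unmatched rows from both sides, with NULLs filling the gaps:
SELECT *
FROM Suppliers as S FULL OUTER JOIN Products as P
ON S.SupplierID = P.SupplierID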

Union
A union combines the results of more than one query into a single record set. The only requirement
is that all the queries return the same set of columns, in the same order. For example, if
you wanted to return the names of tables and the number of rows in each, we would use the following
syntax:
select 'Categories' as TAB, count(*) as CNT from Categories UNION
select 'CustomerCustomerDemo', count(*) from CustomerCustomerDemo UNION
select 'CustomerDemographics', count(*) from CustomerDemographics UNION
select 'Customers', count(*) from Customers UNION
select 'Employees', count(*) from Employees UNION
select 'EmployeeTerritories', count(*) from EmployeeTerritories UNION
select 'Order Details', count(*) from [Order Details] UNION
select 'Orders', count(*) from Orders UNION
select 'Products', count(*) from Products UNION
select 'Region', count(*) from Region UNION
select 'ReplenishWork', count(*) from ReplenishWork UNION
select 'Shippers', count(*) from Shippers UNION
select 'Suppliers', count(*) from Suppliers UNION
select 'Territories', count(*) from Territories
The column header names are taken from the first resultset that the query engine sees, in this case TAB
and CNT.
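One detail worth knowing: UNION removes duplicate rows from the combined results, which takes extra work. When the inputs cannot contain duplicates, as here, UNION ALL skips the duplicate-elimination step:
select 'Products' as TAB, count(*) as CNT from Products UNION ALL
select 'Suppliers', count(*) from Suppliers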

Insert
We use the INSERT statement to add new information to the database. The standard SQL statement for
inserting a new row is:
INSERT INTO <table> (<column1>, <column2>, ...) VALUES (<value1>, <value2>, ...)

You can omit the column list only if you supply a value for every column in the row.
Adding a new row to the Suppliers table would look like this:
INSERT INTO Suppliers (SupplierID, CmpyName, Country)
Values (3, 'Isa Cmpy', 'CA')

SQL Server provides us with another form of the INSERT statement in which we can
use an existing table as the source of the new records. In this case we use the syntax:
INSERT INTO <table> (<column1>, <column2>, ...)
SELECT <column1>, <column2>, ... FROM <source table> WHERE <criteria>

For example, supposing we were implementing a reporting database (also called a data
warehouse) in which we wanted to perform aggregation and summarization on our
vendors. We could define a new table to contain our calculations and provide some
level of isolation from the transactional system. Our new table might look like this:
ReplenishWork
SupplierID
CmpyName
ProductsOutOfStock
We can run a SQL batch to load the table using the second form of the INSERT statement as follows:
INSERT INTO ReplenishWork (SupplierID, CmpyName, ProductsOutOfStock)
SELECT
P.SupplierID, S.CmpyName, Count(*)
FROM Suppliers as S INNER JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units < 2
GROUP BY P.SupplierID, S.CmpyName

Update
We manage existing information in the database using the UPDATE statement. SQL Server allows for the
basic update statement that looks like this:
UPDATE <table> SET <column1> = <value1>, <column2> = <value2>
WHERE <criteria>

But it also includes a variation in which you can update one table using another table as
your source. This syntax is an extension Microsoft includes to the ANSI standard that
won't necessarily run in other environments.
UPDATE ReplenishWork SET ProductsOutOfStock = P.UnitsInStock
FROM Products as P

WHERE P.SupplierID = ReplenishWork.SupplierID

Delete
To purge the database of unnecessary information, or to enforce relational integrity, we
use the DELETE statement. The basic syntax is:
DELETE FROM <table>
WHERE <criteria>

The WHERE clause determines which rows will be affected. A word of caution when
you are deleting data: it takes a long time to type it back in, so you want to be sure you
check what you are getting rid of. You can do this by running a count of the rows that
will be affected (SELECT COUNT(*) FROM <table> WHERE <criteria>) before executing the delete. You
can further protect yourself from unwanted aggravation by using transactions, as sketched below.
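A sketch of that workflow, wrapping the delete in a transaction (covered later in this chapter) so it can be undone if the row count surprises us; the table and criteria here are just an example:
BEGIN TRAN
DELETE FROM ReplenishWork
WHERE ProductsOutOfStock IS NULL   -- example criteria
IF @@ROWCOUNT > 10                 -- @@ROWCOUNT holds the number of rows just deleted
ROLLBACK TRAN                      -- more rows than expected; undo the delete
ELSE
COMMIT TRAN                        -- the count looks right; make it permanent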

Variables
Variables are used to temporarily hold values that may be useful in the processing of
statements. We declare them using the DECLARE keyword, specifying the name and
the type, and we assign a value to them with the SET statement. You can also assign a
value as part of a SELECT statement, but SET is the preferred method.
We reference variables as we would column data or constants in the processing logic of
our code, using the @ character in front of the name to denote a variable. Global variables
reference the status of the system, the database engine and other options of the
server, and are referenced with two @ characters (@@).
All user-declared variables are local in scope, meaning they remain accessible only until
the end of the current batch of statements. For example, executing the following
script would result in an error because the variable @MyVar is not declared within the
batch in which it is referenced.
DECLARE @MyVar varchar(15)
SET @MyVar = 'New Value'
Go
PRINT 'The value of MyVar is ' + @MyVar
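For contrast, keeping the declaration and the reference in the same batch works as expected (a minimal sketch):
DECLARE @MyVar varchar(15)
SET @MyVar = 'New Value'
PRINT 'The value of MyVar is ' + @MyVar  -- works: same batch as the DECLARE
GO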

Batches
Batches are a grouping of one or more SQL statements that together form a logical unit
of work. They are sent together to the SQL Server engine, which creates an execution
plan. If errors are generated in creating the plan, none of the statements are actually
executed. Errors that occur after the execution plan has been compiled will cause
succeeding statements to fail, but will not affect logic earlier in the batch. This is true
unless the batch is part of a transaction, in which case all of the statements are rolled
back after the error.
For example, the following Transact SQL code comprises three statements, which are
all part of the same batch:
DECLARE @Discount float
SET @Discount = 0.05
SELECT
ProductID, UnitPrice, UnitPrice - @Discount * UnitPrice as SalePrice
FROM Products
ORDER BY SalePrice Desc
GO

Because the @Discount variable is used in the successive statements, if you try to run
either of the last two statements separately you will raise an error condition.
Creating objects such as views, defaults, procedures, rules and triggers cannot be
combined with other statements in a batch. A batch is completed by the inclusion of the
command GO. An important point to remember about GO is that it is not a Transact
SQL statement; rather, it is recognized by Query Analyzer and ISQL as a marker to send
all statements since the previous GO to the engine and await the result set.
You can see the effect of this when the results of statements before the GO command
are visible in the results pane of Query Analyzer before the next batch of statements
executes. If you include GO in an ODBC call or from one of the database libraries such
as ADO or OLE DB, the SQL Server engine will flag a syntax error. Another point is that
local variables declared in a batch go out of scope when the batch ends, which is one
reason why you cannot include GO in stored procedure declarations.

Transactions
A transaction is a grouping of statements that must either succeed or fail together as
a logical unit of work. You may have heard of the ACID (Atomicity, Consistency,
Isolation and Durability) test for a transaction. These are the four properties which a
valid transaction must meet.

Atomicity means that all the statements that are part of the group must either succeed
or fail together. For example, if you were developing a banking application to
add new accounts, adding rows to the customer table and the
account table might form a logical unit of work in which our data model integrity
constraints require that records exist in both tables to be valid.
Consistency means that the database is left in a stable state at the end of the
transaction, and that all data integrity constraints are met.
Isolation means that data is modified either before or after other
transactions against the same rowset; we cannot see or alter data that is
in an intermediate state. The locking rules of the database prevent this from
happening.

Durability means that the effects of the transaction are persisted after the end
of the transaction, even in the event of a system failure. A system failure in the
middle of a transaction would roll back the state of the database to where it was
before the transaction began.

We begin a transaction with the command BEGIN TRANSACTION, and end it with the
command COMMIT or ROLLBACK. While a transaction is open, locks on the
database prevent other user sessions from altering the data it has touched.
For example, suppose we want to implement a sale price on our Products at Northwind.
BEGIN TRAN
DECLARE @Discount float
SET @Discount = 0.05
UPDATE Products
SET UnitPrice = UnitPrice - (@Discount * UnitPrice)
Go
-- Check the results...
Select
ProductID, UnitPrice
FROM Products
ORDER BY UnitPrice Desc

If we don't like what we see, we can roll back the transaction and leave the database in
its previous state
ROLLBACK
GO
SELECT
ProductID, UnitPrice -- See the original prices!
FROM Products
ORDER BY UnitPrice Desc
GO

Conversely, to make the changes permanent, we use the COMMIT command:


COMMIT
Go

Data Definition Language (DDL)


Data Definition Language (DDL) is used to define tables, views, stored procedures, and
other objects that physically and logically express the data model you are working with.
Data modeling and the Data Definition Language are a big topic that takes up more space
than we've been allotted here. There are some very good books that focus just on this
topic, including Louis Davidson's data modeling book ISBN: ???
At the core of relational databases are the tables that store information. We can create
tables using the graphical tools provided with SQL Server, but this practice lends itself
better to the development of unique, custom objects that are still in the process of evolving.
Deploying a schema out to multiple servers on a consistent basis typically relies on
scripts to complete the tasks involved. Enterprise Manager allows us to generate
these scripts, but we should still understand what the scripts are doing in case we need
to customize or change them.
Managing Tables
Our first task in making our data model usable is to create the base set of tables. We do this
using the CREATE TABLE statement, which specifies how information is stored in the
table. Each column definition includes a name, a data type, whether the column allows
null values to be stored or not, and other optional attributes. The basic syntax diagram
for CREATE TABLE is below:
CREATE TABLE <table name>
(<column definitions>)

A simple example of the ReplenishWork table can be created with the following
statement:
CREATE TABLE ReplenishWork
(SupplierID int NOT NULL,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL)

Business changes drive database changes and we don't always have the luxury of
recreating our schema in a production environment. Transact SQL allows us to expand
column sizes and add new columns to existing tables using the ALTER statement:
ALTER TABLE ReplenishWork
ADD ContactName varchar(20)

We can also change the size and type of columns, drop columns, and manage constraints
and default values using the ALTER statement. If we didn't need to preserve existing
data and were making changes to the schema that prohibited the use of the ALTER
command, we could DROP the table before recreating it. If we needed to reconfigure our
ReplenishWork table to support integration of legacy data that used a character-based
supplier ID, we could do so with the following batch:
DROP TABLE ReplenishWork
GO
CREATE TABLE ReplenishWork
(SupplierID varchar(16) NOT NULL,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL,
ContactName varchar(20) NULL)

Indexes and Keys


Databases manage large amounts of information. In order to be efficient, they use data
structures called indexes to map the contents. In SQL Server, indexes come in two
flavors: clustered and non-clustered. The difference is that there can be only one
clustered index on a table, and this index determines the ordering of how the data is
physically stored. SQL Server supports up to 249 non-clustered indexes on any given
table. To add an index to our ReplenishWork table we could use the following command:
CREATE TABLE ReplenishWork
(SupplierID int PRIMARY KEY CLUSTERED,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL)

By default SQL Server will use clustering on the primary key defined for a table. This is
important because without at least one clustered index on a large table, the data
would be scattered throughout the physical storage, and performance on retrieval of
data may suffer. You can also create indexes on existing tables using the CREATE INDEX
command:
CREATE INDEX IDX_ReplenishWork
ON ReplenishWork (CmpyName)

On the other hand unnecessary indexes can impact performance as well. Think of an
index as a physical data structure that has a cost associated with maintaining it when
data changes. If the nature of your system involves capturing transactional information
as quickly as possible, you want to run with the least number of indexes. Each insert,
update and delete must be reflected in each of the indexes on the table, so the fewer
the indexes the less the overhead of those operations.
Systems which are primarily decision support and reporting based can afford to have
more indexes because once the data has been written it isn't updated as much as a
transactional system. You will tend to see more generous use of indexes in Data
Warehousing systems that support a lot of Ad-Hoc querying.
Create Views
Views allow us to provide an abstraction layer between the physical implementation of
the data and how users see the information. They are in essence virtual tables that we
can use to limit or expand how we are able to use the base set of tables in our
database. In SQL Server we are limited to a single SELECT statement to define the
result set. The security architecture of SQL Server allows us to grant users access to a
view without giving access to the tables (and views) on which the view is based. For
example, if we needed to make an employee phone list available to users but needed to
protect other employee information (such as salary, social security number, birth date,
etc.) we could create a view of just the name and phone number.
CREATE VIEW PhoneList AS
SELECT
FirstName + ' ' + LastName as EmpName, HomePhone
FROM Employees
go
select
* from PhoneList

The ALTER and DROP commands operate in the same way they do with tables. One
interesting feature of SQL Server is that it supports indexed and updateable views. This
means that you can define a view on some base objects, create indexes to optimize
access and even alter data stored in the underlying tables, although you need to be
aware that there is more overhead in updating a view than if you managed the data in
the underlying tables directly.
Schema Binding
In order to create an index on a view, the view must be created with the
SCHEMABINDING option. With this option, information about the underlying schema
is stored in the system tables and SQL Server will prevent changes that would
invalidate our view. Note that schema binding requires two-part object names (such as
dbo.Employees) in the view definition.
CREATE VIEW PhoneList
WITH SCHEMABINDING
AS
SELECT FirstName + ' ' + LastName as EmpName, HomePhone
FROM dbo.Employees
go

Declarative Referential Integrity (DRI)


The business rules which define how our application behaves, such as requiring that
users of our application set up a customer before they make an order, are
implemented using foreign keys and constraints.
Constraints
For example, a primary key constraint can be added to an existing table like this:
ALTER TABLE dbo.Table1 ADD CONSTRAINT
PK_Customer PRIMARY KEY CLUSTERED
(
CustID
) ON [PRIMARY]
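A foreign key constraint expressing the customer-before-order rule might look like the following sketch, where dbo.Table2 stands in for a hypothetical order table:
ALTER TABLE dbo.Table2 ADD CONSTRAINT
FK_Table2_Customer FOREIGN KEY (CustID)  -- each row in Table2 must reference
REFERENCES dbo.Table1 (CustID)           -- an existing CustID in Table1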

Default values and extended properties


We can further define default values and extended properties for the tables we create
using the ALTER TABLE statement:
ALTER TABLE dbo.Table1 ADD CONSTRAINT
DF_Table1_CreditLimit DEFAULT 0 FOR CreditLimit

Data Control Language (DCL)


Data Control Language (DCL) allows the owner and administrator of the database
objects to define what rights users have to the data and objects in the database. The
DBA can manage access to the database by granting and denying users rights to

objects. The term CRUD Matrix refers to the rights to Create, Read, Update, and/or
Delete information in a given table.

GRANT
Grant is used to give access to someone who needs to use the information
stored in the objects. SQL Server allows us to specify user based or group
based rights, and to apply them to objects or to groups of objects within a
database.
REVOKE
We use the REVOKE statement to reverse the effect of the GRANT statement.
DENY
The DENY statement will prevent a user from accessing objects in the
database.
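In practice these statements look something like the following sketch (the role and user names are hypothetical):
GRANT SELECT, INSERT ON Suppliers TO SalesRole  -- allow reads and inserts
REVOKE INSERT ON Suppliers FROM SalesRole       -- take the insert right back
DENY DELETE ON Suppliers TO SomeUser            -- block deletes outright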

Improvements over ANSI SQL


With the release of SQL Server 2000, Microsoft takes the SQL language a step further.
Among the many additions in this release we find new data types, user defined
functions, and extensive support for XML, as well as improvements in indexing, trigger
handling, and an expanded suite of system stored procedures. The result is one of the most
flexible, scalable, and complete offerings in the database management system product field.
Data Types
While all relational databases implement the basic numeric, string, and date data types
for storing data, Microsoft has added some which you may not have heard of. Some of
these were added in support of the replication functions built into the database engine;
others fit more neatly with the new feature of user defined functions. The new data
types include:

BIGINT:
True to its name, this is a very big whole number. You can use it to store values
in the range of -9223372036854775808 to 9223372036854775807. This is 19
digits of precision that can hold some extremely large values. If you are porting
data from platforms that include intelligent keys (like the first 4 digits is the
location, the next 6 the customer, the next 3 the weight, etc.) it would probably
be better to normalize your schema and transform the data into multiple
columns.
UniqueIdentifier:
Columns of this type store system generated identifiers known as GUIDs, or
Globally Unique Identifiers. A GUID is a 16-byte value, usually written as a
36-character string, generated by Windows and guaranteed to be unique across
the world. It takes the same form as the CLSID in the COM world. The NEWID
function is used to generate one, and we store them in columns of type
UniqueIdentifier.

DECLARE @guid uniqueidentifier
select @guid = NEWID()
PRINT 'Value of @guid is: ' + CONVERT(varchar(50), @guid)
-- Output: Value of @guid is: F4840C7B-F44C-4A49-A12E-097B45AE3CE0
While you may not want to use a GUID as a primary key because of its cumbersome
nature for ad-hoc queries (try correctly typing in the 36 character value), it does allow
us to implement n-way replication in which data is created on multiple servers. For
example, if we were going to deploy the Northwind solution across the country and
set up merge replication with a corporate site to contain all orders, using a
sequentially generated OrderID could create problems in that the same ID could be
created on more than one server. We avoid this by redefining it as:
CREATE TABLE Orders (
OrderID uniqueidentifier NOT NULL default NEWID(),
CustomerID nchar (5) NULL ,
EmployeeID int NULL ,
OrderDate datetime NULL ,
RequiredDate datetime NULL ,
ShippedDate datetime NULL ,
ShipVia int NULL ,
Freight money NULL DEFAULT (0),
ShipName nvarchar (40) NULL ,
ShipAddress nvarchar (60) NULL ,
ShipCity nvarchar (15) NULL ,
ShipRegion nvarchar (15) NULL ,
ShipPostalCode nvarchar (10) NULL ,
ShipCountry nvarchar (15) NULL
)

While this makes the replication scheme possible, you should define a clustered primary
key on some other column set. Ordering the data according to a randomly generated
value will result in poor performance when processing the data, because the data is
scattered all over the disk in no particular order and seek time increases.
SQL_Variant:
Similar to the Variant data type in Visual Basic, sql_variant is used to store data
values of all the supported data types except text, ntext, image, and timestamp.
We can use it for column definitions, parameters, variables, and for the return value of
user defined functions. Microsoft provides the function SQL_VARIANT_PROPERTY to
return the base data type and other information about the data stored.
One use for variants is in defining schema to hold attribute data. Suppose you need to
keep a variable number of optional attributes about a person, such as hair color,

weight, age, number of children, birthdate, favorite pet, etc. In our example we are
collecting what information we can, but because there are a lot of optional attributes,
we end up with a sparsely populated table.
PersonID   HairColor     Weight   Age    Sex       Birthdate    Pet
(int)      (varchar 8)   (int)    (int)  (char 1)  (datetime)   (varchar 6)
1          Brown         NULL     NULL   NULL      NULL         Cat
2          NULL          250      32     M         NULL         Dog
3          NULL          NULL     NULL   NULL      1/2/62       Bird

Using SQL Variant we could more efficiently store the same set of information using a
schema that looks like this:
PersonID   AttributeName    AttributeValue
(int)      (varchar 12)     (sql_variant)
1          HairColor        Brown
1          Pet              Cat
2          Weight           250
2          Age              32
2          Sex              M
2          Pet              Dog
3          Birthdate        1/2/62
3          Pet              Bird

Table:
True to its name, the table data type is a variable that temporarily stores a rowset in
memory for later processing and can be used like an ordinary table. It accomplishes this
using space in tempdb, but because it has a well-defined scope, it efficiently cleans up
after itself. It also requires less locking and logging than tables created in tempdb,
which results in fewer recompilations of the execution plan. The table datatype was added
in support of a powerful new feature of SQL Server, the User Defined Function.
You declare a table variable using the DECLARE statement. You can then manage the
content of that table variable, inserting, updating and deleting as needed. Since the scope
of variables is local to the current batch, other processes cannot access your table variable.
declare @t table(PersonID int, AttrNm varchar(12), AttrVal sql_variant)
insert @t values(1, 'HairColor', 'Brown')
insert @t values(1, 'Pet', 'Cat')
insert @t values(2, 'Weight', 250)
insert @t values(2, 'Age', 32)
insert @t values(2, 'Sex', 'M')
insert @t values(2, 'Pet', 'Dog')
insert @t values(3, 'Birthdate', GetDate())
insert @t values(3, 'Pet', 'Bird')

Select
convert(varchar(15),SQL_VARIANT_PROPERTY(AttrVal, 'BaseType')), * from @t

User Defined Functions


Structured programming languages have long provided the ability to create procedures and function calls
to breakdown complex routines into manageable units. The introduction of the User Defined Function
(UDF) with SQL Server 2000 provides us with a new way of organizing and optimizing our use of the
database engine. A User Defined Function takes zero or more input parameters and returns either a
scalar or a table based value. We could use a UDF to calculate the distance between two geographic
locations, return a list of projects for a given client, or return an amortization table for a given set of loan
parameters.
UDF's are similar to stored procedures in that both use control statements and data manipulation
code to define the logic of the module. But they differ in that UDF's don't allow modification of the global
state of the calling process or the database environment. In other words, User Defined Functions are
stateless; there can be no side effects. You can't alter data in tables that are not locally defined
within the UDF, and you can't use non-deterministic functions within your code. Deterministic functions
always return the same value given the same input parameters, such as the DATEADD function. The
GETDATE function is non-deterministic in that the value depends on when you call it.
SQL Server 2000 supports 3 types of User Defined Functions: Scalar Functions, Inline Table Valued
Functions, and Multiple Statement Table Valued Functions. Scalar Functions are UDF's that return a
single discrete value and contain logic that is embedded between BEGIN and END statements. This flavor of
UDF must return a single value each time it is called. We create a UDF using the CREATE FUNCTION
statement:
CREATE FUNCTION udfSalePrice (@ProductID int, @TranDate datetime)
RETURNS varchar(50)
AS
BEGIN
declare @UnitPrice money, @ProductName varchar(40)
select @UnitPrice = UnitPrice, @ProductName = ProductName
from Northwind..Products
where ProductID = @ProductID
if @TranDate > '28-NOV-2003 08:00' and
@TranDate < '28-NOV-2003 18:00'
return @ProductName + ' on Sale for $' +
convert(varchar(10), @UnitPrice * .85)
return @ProductName + ' for $' + convert(varchar(10), @UnitPrice)
END
Go
select
Northwind.dbo.udfSalePrice(3, GetDate()) as Price

Table Valued Functions return a table variable that can be built from one or more statements. If the
function consists of a single SELECT statement and doesn't specify the column set in the declaration,
then it doesn't require the begin/end wrapper, and it is considered to be inline.
create function udfRegionOrders (@Region varchar(15))
returns TABLE
as
return (select
* from Northwind..Orders where ShipRegion = @Region)
go
select
* from northwind..udfRegionOrders ('RJ')
Multi-statement UDF's allow us to do more complex processing in order to derive the result set. Like
scalar UDF's, we can declare local variables, use flow control statements, manage data in locally defined
table variables, and call stateless extended stored procedures.
Using our Northwind database for this example, suppose we wanted to create a function that returns
sales ranking information.
CREATE FUNCTION udfTopSales (@nCount int)
RETURNS @retTable TABLE
(Rank int, ProductID int, SaleAmt money)
AS
BEGIN
DECLARE @ProdID int, @Sales money, @Rank int
SET @Rank = 1
DECLARE curSales CURSOR FOR
SELECT ProductID, Sum(UnitPrice * Quantity * (1 - Discount)) as SaleAmt
FROM [Order Details]
GROUP BY ProductID
ORDER BY SaleAmt DESC
OPEN curSales
FETCH FROM curSales into @ProdID, @Sales
WHILE @@FETCH_STATUS = 0 BEGIN
IF @Rank <= @nCount
INSERT INTO @retTable
values (@Rank, @ProdID, @Sales)
FETCH NEXT FROM curSales into @ProdID, @Sales
SET @Rank = @Rank + 1
END
CLOSE curSales
DEALLOCATE curSales
RETURN
END
go
select * from dbo.udfTopSales (10)

go

Stored Procedures
While we can run ad-hoc SQL statements within Query Analyzer, or execute them from code through
interfaces such as ADO's Execute method, frequently performed logic can be encapsulated into batches
and saved as Stored Procedures on the server. The benefits of this approach include not only reducing the
work of testing multiple programs and applications to ensure that the logic is correct, but also reusing the
execution plan generated on the server when the stored procedure is compiled and saved.
Stored Procedures are a collection of Transact SQL statements which perform a defined set of work. This
can be as simple as updating a single row or returning a configuration setting, or as complex as
implementing a business rule that requires sending an e-mail alert containing crucial information specific
to the problem. They can take both input and output parameters, and always return an integer value.
There are several types of stored procedures, including those that come with SQL Server, user defined
stored procedures, and extended stored procedures.

System Stored Procedures.


Microsoft includes a broad suite of system stored procedures for administering the server engine. There
are over 900 system stored procedures, which SQL Server Books Online breaks into 17 categories. They
include procedures for managing system security, setting up and managing distributed queries and linked
servers (making tables and data on different machines visible to each other), managing data replication,
retrieving information about schema, viewing current performance and usage information, managing SQL
Agent (the scheduler that comes with SQL Server), and interfacing with XML data.
For example, to return the schema of a table we can use the sp_help stored procedure:
EXEC sp_help Employees
If the stored procedure call is the first command in the batch we can omit the EXEC(UTE) command. To
see the definition of a view, a stored procedure or other objects, we use sp_helptext:
sp_helptext PriceLookUp

User defined Stored Procedures.


We can define our own stored procedures to provide users of the database with a consistent
implementation of work logic across applications. This simplifies the administration of business rules by
acting as a single point of control on the database server itself. If the rule is changed, we only need to
update the associated stored procedures instead of the various applications making use of the data.
Stored procedures are declared with the CREATE PROCEDURE statement.
CREATE PROCEDURE spSimple as
select
'HELLO WORLD!'
go
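A slightly more realistic sketch with an input parameter (the procedure name and logic are our own example, not from the book):
CREATE PROCEDURE spProductsBySupplier @SupplierID int AS
SELECT ProductID, ProductName, UnitPrice
FROM Products
WHERE SupplierID = @SupplierID  -- return only this supplier's products
go
EXEC spProductsBySupplier 3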
For more information on developing stored procedures see chapter XX

Extended stored procedures (XP's).


Extended stored procedures allow us to extend Transact SQL by developing logic outside of SQL Server
using tools such as C or Visual Basic. They follow the same calling conventions as a normal stored
procedure, complete with parameters and a return code. XP's are implemented as calls to dynamic
link libraries (DLLs) that conform to the Extended Stored Procedure API. Because these calls are external
to SQL Server, you need to be careful that they are thoroughly tested and checked for aberrant behavior
before using them in a production situation.
If we had written a standard event management system to log information to the NT Event Log and
wanted to include it in our SQL environment, we would register it with the stored procedure
sp_addextendedproc:
USE Master
go
EXEC sp_AddExtendedProc xp_LogEvent 'c:\winnt\system32\EventLogger.dll'

Where we use T-SQL


The Transact SQL Language is the primary interface to the SQL Server database engine. Any time we
interact with SQL Server, whether it is through one of the provided administrative tools, a custom
application, or through the command line interface ISQL, we are executing SQL statements. The vehicle
for this might be ODBC or one of the high-level drivers, or it might be DB-Library, the original
interface provided for talking to the database. In any case, requests in the form of SQL statements are
made to the engine, which then processes them and returns the result set.

Query Analyzer
Query Analyzer is the ad-hoc query interface that comes with SQL Server. It is installed as part of
the client tools and is typically one of the first places a developer will begin working with SQL. By default it
opens to an edit screen in which we can type our code. It provides context-based coloring, which is a nice
enhancement over the previous version.
To run a batch of statements, you can click the Execute button and the results pane will display the
output of the selected batch. By default the entire script will be run, but you can limit execution to portions
of the script by highlighting what you want to execute; only the selected statements will be run.
