Surrogate Key

Introduction

A Surrogate Key in SQL Server is a unique identifier for each row in a table. It is just a key: it lets us
identify a unique row and carries no business meaning. This type of key is generated either by the
database or by another application; it is not supplied by the user.

A Surrogate Key is just a unique identifier for each row, and it may be used as a Primary Key. There is
only one requirement for a surrogate Primary Key: each row must have a unique value for that
column. A Surrogate Key is also known as an artificial key or identity key. It can be used in data
warehouses.

A Surrogate Key should have the following characteristics:

 Unique Value
 The key is generated by the system, in other words automatically generated
 The key is not visible to the user (not a part of the application)
 It is not composed of multiple keys
 There is no semantic meaning of the key

Generally, a Surrogate Key is a sequential unique number generated by SQL Server or the database itself.
The purpose of a Surrogate Key is to act as the Primary Key. There is a slight difference between a
Surrogate Key and a Primary Key. Ideally, every row has both a Primary Key and a Surrogate Key. The
Primary Key identifies the unique row in the database while the Surrogate Key identifies a unique entity in
the model.

Note that Surrogate Keys are never used with any business logic other than simple Create, Read, Update
and Delete (CRUD) operations.

Example of Surrogate Key

 Identity Column in SQL Server
 GUID (Globally Unique Identifier)
 UUID (Universally Unique Identifier)

How can we implement Surrogate Key?

There are several ways to implement Surrogate Keys as in the following:

 Auto Incremental key in Database

A Surrogate Key can be implemented with an auto-incremented key. SQL Server supports an
IDENTITY column to perform the auto-increment feature. It allows a unique number to be
generated when a new record is inserted into the database table.

--Syntax for introducing an auto identity column with CREATE TABLE.
CREATE TABLE [dbo].[EmployeeMaster](
    [EmployeeId] [int] IDENTITY(1,1) NOT NULL,
    [EmployeeCode] [varchar](25) NULL,
    [EmployeeName] [varchar](50) NULL,
    [EmailAddress] [varchar](50) NULL
)

--Syntax for introducing an auto identity column with ALTER TABLE.
--Applies to an existing table that has no identity column yet; a table can have only one.
ALTER TABLE EmployeeMaster ADD ID INT IDENTITY(1,1)

 Manual Incremental key in Database

A Surrogate Key can be implemented with a manually incremented key. Using the MAX() function we
can find the maximum value of a column and increment it by one. This approach
suffers from a performance problem when a table has a large amount of data.

--Example
DECLARE @newId INT
SELECT @newId = ISNULL(MAX(EmployeeId),0)+ 1 FROM EmployeeMaster
PRINT @newId
--The variable @newId can be used as the identifier of the newly inserted data.
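
The same MAX()-plus-one idea can be sketched outside SQL Server. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in database (an assumption, since this article targets SQL Server), and the helper names next_employee_id and insert_employee are hypothetical. It runs on a single connection, so it does not show the extra locking or transaction handling that a multi-user database would need to stop two sessions from reading the same maximum.

```python
import sqlite3


def next_employee_id(conn):
    """Return MAX(EmployeeId) + 1, or 1 when the table is empty (hypothetical helper)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(EmployeeId), 0) + 1 FROM EmployeeMaster"
    ).fetchone()
    return row[0]


def insert_employee(conn, name):
    """Compute the next key manually, then insert the row with it."""
    new_id = next_employee_id(conn)
    with conn:  # commits the insert on success, rolls back on error
        conn.execute(
            "INSERT INTO EmployeeMaster (EmployeeId, EmployeeName) VALUES (?, ?)",
            (new_id, name),
        )
    return new_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeMaster (EmployeeId INTEGER PRIMARY KEY, EmployeeName TEXT)"
)
first_id = insert_employee(conn, "Alice")
second_id = insert_employee(conn, "Bob")
print(first_id, second_id)
```

Every call scans for the current maximum before inserting, which is the extra work the paragraph above warns about.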

 Globally Unique Identifiers (GUID)

GUID is Microsoft's implementation of the Universally Unique Identifier (UUID). Using the NEWID()
function we can generate a new GUID in SQL Server. It is a 16-byte value.

--Example
DECLARE @newID UNIQUEIDENTIFIER
SET @newID = NEWID()
--The variable @newID can be used as the identifier of the newly inserted data.

NEWSEQUENTIALID() can be used only with DEFAULT constraints on table columns of type
uniqueidentifier. The NEWSEQUENTIALID() function cannot be referenced in queries.

 Universally unique identifier (UUID)

A UUID is a 128-bit value. Time-based UUIDs are created from the ID of the Ethernet card and the
current date and time of the server.
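
For readers who want to inspect such identifiers outside the database, the following minimal sketch uses Python's standard uuid module (an assumption; it is not SQL Server's NEWID()). It shows that a UUID occupies 16 bytes, and it contrasts a random (version 4) value with a time-based (version 1) value built from a host identifier and a timestamp.

```python
import uuid

# Version 4: generated from random bits.
random_id = uuid.uuid4()

# Version 1: generated from a host (node) identifier and the current timestamp.
time_based_id = uuid.uuid1()

print(random_id, len(random_id.bytes))      # the bytes form is 16 bytes (128 bits)
print(time_based_id, time_based_id.version)
```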

Advantages of Surrogate Key

 A Surrogate Key does not change, so the application cannot lose its reference to a row in the
database.
 If the Primary Key is changed then the related foreign key does not change across the database
because the Surrogate Key is used as a reference key. In other words, the Surrogate Key value is
never changed, so the foreign key values become stable.
 A Surrogate Key is most often a compact data type such as an integer. A Surrogate Key is less
expensive in a "Join" than the compound key.
 No business logic is embedded in the key.
 A table always has a uniform Surrogate Key, so some tasks can be easily automated by writing
table-independent code.
 There is no locking contention because it is a unique identifier.
 A Surrogate Key does not require an extra field; that helps to save space in the database.
 The relationship between any two tables is simple and consistent in SQL code expressions.
 Object Relational Mapping (ORM) frameworks such as Entity Framework, NHibernate, and so on
are designed to work optimally with Surrogate Keys. Surrogate Keys are much simpler to implement
in them than composite keys.
 It allows for a higher degree of normalization, so data is not duplicated within the database.
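
The stability of foreign keys described in the list above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption), and the child table EmployeeLeave and its columns are hypothetical names introduced only for this example. The business code of an employee changes, yet the child row still joins to its parent because it references the surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EmployeeMaster (
    EmployeeId   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    EmployeeCode TEXT NOT NULL UNIQUE,               -- natural (business) key
    EmployeeName TEXT
);
CREATE TABLE EmployeeLeave (                         -- hypothetical child table
    LeaveId    INTEGER PRIMARY KEY AUTOINCREMENT,
    EmployeeId INTEGER NOT NULL REFERENCES EmployeeMaster (EmployeeId),
    LeaveDays  INTEGER
);
""")

cur = conn.execute(
    "INSERT INTO EmployeeMaster (EmployeeCode, EmployeeName) VALUES (?, ?)",
    ("E-100", "Alice"),
)
employee_id = cur.lastrowid
conn.execute(
    "INSERT INTO EmployeeLeave (EmployeeId, LeaveDays) VALUES (?, ?)",
    (employee_id, 3),
)

# The business code changes; the surrogate key and the child row are untouched.
conn.execute(
    "UPDATE EmployeeMaster SET EmployeeCode = ? WHERE EmployeeId = ?",
    ("E-200", employee_id),
)

rows = conn.execute("""
    SELECT m.EmployeeCode, l.LeaveDays
    FROM EmployeeLeave AS l
    JOIN EmployeeMaster AS m ON m.EmployeeId = l.EmployeeId
""").fetchall()
print(rows)
```

Had EmployeeLeave referenced EmployeeCode instead, the update would also have had to touch every child row.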

Disadvantages of Surrogate Key

 An additional index is required.
 It cannot be used as a search key because it is independent of any business logic.
 There is always a requirement to join to the main table when data is selected from a child table.
 The sequential number may increase by an unpredictable amount, leaving gaps in the sequence.
 There is some administrative overhead to maintain a Surrogate Key.
 Extra disk space is required to store a Surrogate Key.

How does a Surrogate Key differ from a Natural Key?

The alternative to a Surrogate Key is a Natural Key. A Natural Key is a true unique identifier in the
database. It is a single value or composite value that has business meaning. The Natural Key can be one or
more columns with any data type. If a table has no Surrogate Key, there is no need to create an extra
unique index or sequence on it, which helps to reduce administrative overhead.

Disadvantages of Natural Key

 A query join may become complex because the Natural Key can have one or more columns.
 It can lead to a less normalized design.
 It is very difficult to use and time consuming with ORM because ORM is designed to work best
with Surrogate Keys.
 The key type is not consistent.
 More work is required to change a Natural Key when the foreign key relationship has been built
by a Natural Key.
 A Natural Key is larger than a Surrogate Key.
 A Natural Key can be any data type, so it might require a long execution time in a "join" query.
For example, if there is a VARCHAR data type as a Natural Key type then the join between two
tables may take more time to produce output.
 A Natural Key is assigned by the application, so there is no way to know whether a record is new
or an existing record.
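
The first point in the list above, that joins on a multi-column Natural Key are more complex, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption); all table and column names are hypothetical. The same order data is modeled twice, once with a composite natural key and once with a surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural composite key: the child table must repeat both columns.
CREATE TABLE OrderNatural (
    CustomerCode TEXT NOT NULL,
    OrderNumber  TEXT NOT NULL,
    PRIMARY KEY (CustomerCode, OrderNumber)
);
CREATE TABLE OrderLineNatural (
    CustomerCode TEXT NOT NULL,
    OrderNumber  TEXT NOT NULL,
    Product      TEXT NOT NULL,
    FOREIGN KEY (CustomerCode, OrderNumber)
        REFERENCES OrderNatural (CustomerCode, OrderNumber)
);
-- Surrogate key: one integer column carries the relationship.
CREATE TABLE OrderSurrogate (
    OrderId      INTEGER PRIMARY KEY,
    CustomerCode TEXT NOT NULL,
    OrderNumber  TEXT NOT NULL,
    UNIQUE (CustomerCode, OrderNumber)   -- the natural key is still kept unique
);
CREATE TABLE OrderLineSurrogate (
    OrderId INTEGER NOT NULL REFERENCES OrderSurrogate (OrderId),
    Product TEXT NOT NULL
);
INSERT INTO OrderNatural VALUES ('ACME', 'PO-1');
INSERT INTO OrderLineNatural VALUES ('ACME', 'PO-1', 'Widget');
INSERT INTO OrderSurrogate (CustomerCode, OrderNumber) VALUES ('ACME', 'PO-1');
INSERT INTO OrderLineSurrogate VALUES (1, 'Widget');
""")

# Join on the composite natural key: one condition per key column.
natural_join = conn.execute("""
    SELECT l.Product
    FROM OrderLineNatural AS l
    JOIN OrderNatural AS o
      ON o.CustomerCode = l.CustomerCode AND o.OrderNumber = l.OrderNumber
""").fetchall()

# Join on the surrogate key: a single integer comparison.
surrogate_join = conn.execute("""
    SELECT l.Product
    FROM OrderLineSurrogate AS l
    JOIN OrderSurrogate AS o ON o.OrderId = l.OrderId
""").fetchall()

print(natural_join, surrogate_join)
```

Both queries return the same row; the difference lies in how many columns each child table and each join condition has to carry.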

Conclusion

A Surrogate Key is unique in the database table; it is an artificial alternative to a natural Primary
Key, which may be alphanumeric or a composite key. A Surrogate Key is always unique
per table.
Surrogate Keys offer many benefits: their simplicity, consistency and stability make the use of an ORM
extremely feasible. We can use a Natural Key instead of a Surrogate Key when the Natural Key is small and
is never updated.

What is the difference between a primary key and a surrogate key?
32
I googled a lot, but I did not find the exact straight forward answer with an example.

Any example for this would be more helpful.

sql sql-server sql-server-2008 sql-server-2005 sql-server-2012

edited Apr 21 '16 at 15:05 by asktonishant
asked Apr 21 '16 at 14:43 by Dom
6 Answers

50
The primary key is a unique key in your table that you choose that best uniquely identifies a record in
the table. All tables should have a primary key, because if you ever need to update or delete a record
you need to know how to uniquely identify it.

A surrogate key is an artificially generated key. They're useful when your records essentially have no
natural key (such as a Person table, since it's possible for two people born on the same date to have
the same name, or records in a log, since it's possible for two events to happen such that they carry
the same timestamp). Most often you'll see these implemented as integers in an automatically
incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are
almost always surrogate keys.
Unlike primary keys, not all tables need surrogate keys, however. If you have a table that lists the
states in America, you don't really need an ID number for them. You could use the state abbreviation
as a primary key code.

The main advantage of the surrogate key is that they're easy to guarantee as unique. The main
disadvantage is that they don't have any meaning. There's no meaning that "28" is Wisconsin, for
example, but when you see 'WI' in the State column of your Address table, you know what state
you're talking about without needing to look up which state is which in your State table.
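
To make the 'WI' point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption; the question is about SQL Server). The State and Address tables and their columns are hypothetical. The state abbreviation serves as a natural primary key, so an address row is readable without a join, while the address itself gets a surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE State (
    StateCode TEXT PRIMARY KEY,      -- natural key, e.g. 'WI'
    StateName TEXT NOT NULL
);
CREATE TABLE Address (
    AddressId INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
    City      TEXT NOT NULL,
    StateCode TEXT NOT NULL REFERENCES State (StateCode)
);
INSERT INTO State VALUES ('WI', 'Wisconsin');
INSERT INTO Address (City, StateCode) VALUES ('Madison', 'WI');
""")

# The state is readable straight from Address, with no join to State.
rows = conn.execute("SELECT AddressId, City, StateCode FROM Address").fetchall()
print(rows)
```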

edited Apr 21 '16 at 15:16
answered Apr 21 '16 at 15:00 by Bacon Bits
 I think the main disadvantage is that sometimes when people use an autogenerated key
(and integers are often used instead of the natural key not just when no natural key
exists), they often forget to put unique indexes on the natural key that they didn't choose
as the PK. This often allows duplicates to get into the system which can create
problems. The two main advantages of autogenerated keys are that they generally
increase performance in the joins (if integers not GUIDS) and they prevent mass
updating of lots of child records when the Natural Key changes. – HLGEM Apr 21 '16 at 15:22
 @HLGEM Sure, I'll buy those. I think I focus on lack of meaning because I've just
worked in hypernormalized systems where essentially every field was its own table. It
made it impossible to tell where data entry errors had occurred, and very difficult to
apply business rules to locate problems. – Bacon Bits Apr 21 '16 at 18:02
 I love normalization but you can indeed take it too far. – HLGEM Apr 21 '16 at 18:13
 @HLGEM: Interesting, tell me more. You knew your love for normalization had gone too
far when...? – onedaywhen Oct 13 '16 at 15:27
 @onedaywhen Imagine a database which never allowed null values for any field. As
such, nearly every field is in its own table, and nearly every join is an outer join because
you still don't have complete data. So, you haven't actually eliminated nulls, you've just
eliminated storing them. Trying to validate such a system with business rules after the
fact is virtually impossible because it's so difficult to compare records. Since every
potential relation might be many-to-one, you have partial cross joins appearing. And the
performance impact of every query having dozens of joins? – Bacon Bits Mar 16 '17 at 17:55
6
A surrogate key is a made up value with the sole purpose of uniquely identifying a row. Usually,
this is represented by an auto incrementing ID.
Example code:

CREATE TABLE Example
(
    SurrogateKey INT IDENTITY(1,1) -- A surrogate key that increments automatically
)
A primary key is the identifying column or set of columns of a table. It can be a surrogate key or any
other unique combination of columns (for example a compound key). It MUST be unique for any row
and cannot be NULL.
Example code:

CREATE TABLE Example
(
    PrimaryKey INT PRIMARY KEY -- A primary key is just a unique identifier
)
edited Jun 27 '18 at 13:04
answered Apr 21 '16 at 15:02 by tobypls
3
All keys are identifiers used as surrogates for the things they identify. E.F.Codd explained the
concept of system-assigned surrogates as follows [1]:
Database users may cause the system to generate or delete a surrogate, but they have no control over
its value, nor is its value ever displayed to them.
This is what is commonly referred to as a surrogate key. The definition is immediately problematic
however because Codd was assuming that such a feature would be provided by the DBMS. DBMSs
in general have no such feature. The keys are normally visible to at least some DBMS users as, for
obvious reasons, they have to be. The concept of a surrogate has therefore morphed slightly in usage.
The term is generally used in the data management profession to mean a key that is not exposed and
used as an identifier in the business domain. Note that this is essentially unrelated to how the key is
generated or how "artificial" it is perceived to be. All keys consist of symbols invented by humans or
machines. The only possible significance of the term surrogate therefore relates to how the key is used,
not how it is created or what its values are.
[1] Extending the database relational model to capture more meaning, E.F.Codd, 1979

answered Apr 22 '16 at 8:09 by nvogel
1
This is a great treatment describing the various kinds of keys:
http://www.agiledata.org/essays/keys.html
answered Apr 21 '16 at 14:46 by n8wrl
1
A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a
column with an identity property to help generate surrogate key values.

The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys
must contain UNIQUE values. A primary key column cannot contain NULL values. Most tables
should have a primary key, and each table can have only ONE primary key.

http://www.databasejournal.com/features/mssql/article.php/3922066/SQL-Server-Natural-Key-Verses-Surrogate-Key.htm
answered Apr 21 '16 at 15:00 by Bishoy Frank
-1
I think Michelle Poolet describes it in a very clear way:

A surrogate key is an artificially produced value, most often a system-managed, incrementing
counter whose values can range from 1 to n, where n represents a table's maximum number of rows.
In SQL Server, you create a surrogate key by assigning an identity property to a column that has a
number data type.
http://sqlmag.com/business-intelligence/surrogate-key-vs-natural-key
It usually helps to use a surrogate key when you replace a composite key with an identity column.
