
Database Normalization

What is Normalization?
Normalization allows us to organize data so that it:
  Allows faster access (dependencies make sense)
  Takes up less space (less redundancy)

Normal Forms
Normalization is done by transforming data into various Normal Forms.
There are 5 Normal Forms, but we almost never use 4NF or 5NF.
We will only be concerned with 1NF, 2NF, and 3NF.
For a database to be in a normal form, it must meet all requirements of the previous forms:
  E.g. For a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.

Sample Data
This data has some problems:
  The Employees column is not atomic.
    A column must be atomic, meaning that it can only hold a single item of data. This column holds more than one employee name.
Data that is not atomic means:
  We can't easily sort the data
  We can't easily search or index the data
  We can't easily change the data
  We can't easily reference the data in other tables
Breaking the Employees column into more than one column doesn't solve our problems:
  The data may look atomic, but only because we have many identical columns each storing a single piece of data instead of a single column storing many pieces of data.
  We still can't easily sort, search, or index our employees.
  What if a manager has more than 2 employees: 10 employees, 100 employees? We'd need to add columns to our database just for these cases.
  It is still hard to reference our employees in other tables.
By the way, what would be a good choice of a Primary Key for this table?
Students would be expected to answer Manager, since each manager is only listed once and the employees are scattered across multiple columns. Also, an employee may change managers fairly frequently (but once a person is a manager, they are likely to remain a manager).

First Normal Form
1NF means that we must:
  Eliminate duplicate columns from the same table, and
  Create separate tables for each group of
  related data, each with a unique row identifier (primary key)
Let's get started by making our columns atomic.

Atomic Data
By breaking each tuple of our table into an entry for each employee, we have made our data atomic.
What would be the primary key?
Students should now say that Employee is the Primary Key, since there are now multiple identical Manager values in the table. Only Employee is unique.

Primary Key
The best primary key would be the Employee column.
Every employee has only one manager, so each employee appears in exactly one row.

First Normal Form
Congratulations!
The fact that all our data and columns are atomic and we have a primary key means that we are in 1NF!

First Normal Form Revised
Of course, there may come a day when we hire a second employee or manager with the same name. To handle this, let's use an employee ID instead of their name.

Moving to Second Normal Form
A database in 2NF must also be in 1NF:
  Data must be atomic
  Every row (or tuple) must have a unique primary key
Plus:
  Subsets of data that apply to multiple rows (repeating data) are moved to separate tables
This data is in 1NF: all fields are atomic and the CustID serves as the primary key.
But let's pay attention to the City, State, and Zip fields:
  There are 2 rows of repeating data: one for Chicago, and one for St. Paul.
  Both have the same city, state and zip code
The CustID determines all the data in the row, but a U.S. Zip code also determines the City and State. (E.g. a given Zip code can only belong to one city and state, so storing the City and State alongside every Zip code is redundant.)
This means that City and State are functionally dependent on the value in Zip code and not only on the primary key.
To be in 2NF, this repeating data must be in its own table.
So:
  Let's create a Zip code table that maps Zip codes to their City and State.
  Note that Canadian Postal Codes are different: the same city and state can have many different postal codes.
We see that we can actually save 2 rows in the Zip Code table by removing these redundancies: 9 customer records only need 7 Zip code records.
Zip code becomes a foreign key in the customer table linked to the primary key in the Zip code table.

Advantages of 2NF
Saves space in the database by reducing redundancies
If a customer calls, you can just ask them for their Zip code and you'll know their city and state! (No more spelling mistakes)
If a City name changes, we only need to make one change to the database.

Summary So Far
1NF:
  All data is atomic
  All rows have a unique primary key
2NF:
  Data is in 1NF
  Subsets of data in multiple columns are moved to a new table
  These new tables are related using foreign keys

Moving to 3NF
To be in 3NF, a database must:
  Be in 2NF
  Have all columns fully functionally dependent on the primary key (there are no transitive dependencies)
In this table:
  CustID and ProdID
  depend on the OrderID and no other column (good)
  Stated another way, if you know the OrderID, you know the CustID and the ProdID
  So: OrderID → CustID, ProdID
But there are some fields that are not dependent on OrderID:
  Total is the simple product of Price*Quantity. As such, it has a transitive dependency on Price and Quantity.
  Because it is a calculated value, it doesn't need to be included at all.
Also, we can see that Price isn't really dependent on ProdID alone, or on OrderID. Customer 1001 bought AB-111 for $50 (in order 1) and for $75 (in order 7), while 1002 spent $60 for each item in order 2.
Maybe price is dependent on the ProdID and Quantity: the more you buy of a given product, the cheaper that product becomes!
So we ask the business manager, and she tells us that this is the case.
We say that Price has a transitive dependency on ProdID and Quantity.
  This means that Price isn't just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered).

Let's diagram the dependencies.
We can see that all fields are dependent on OrderID, the Primary Key (white lines).
But Total is also determined by Price and Quantity (yellow lines).
  This is a derived field (Price x Quantity = Total)
  We can save a lot of space by getting rid of it altogether and just calculating the total when we need it
Price is also determined by both ProdID and Quantity rather than the primary key (red lines). This is called a transitive dependency. We must get rid of transitive dependencies to have 3NF.
We do this by moving the transitive dependency into a second table.
By splitting out the table, we can quickly adjust our price table to meet our competitors' prices, or when our suppliers' prices change.
The second table is our pricing list.
Think of Quantity as a range:
  AB-111: 1-100, 101-500, 501 and more
  ZA-245: 1-10, 11-50, 51 and more
The Primary Key for this second table is a composite of ProdID and Quantity.
Congratulations! We're now in 3NF!
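
The 3NF design described here (an orders table without Price or Total, plus a pricing list keyed by product and quantity range) can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names, the way a quantity range is stored (min_qty, max_qty), the order quantities, and the prices attached to each tier are all hypothetical. Only the product ID, the quantity ranges, and the $50 and $75 prices for orders 1 and 7 come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Pricing list: one row per product and quantity range.
    CREATE TABLE prices (
        prod_id TEXT    NOT NULL,
        min_qty INTEGER NOT NULL,
        max_qty INTEGER,              -- NULL stands for "and more"
        price   REAL    NOT NULL,     -- hypothetical per-item price
        PRIMARY KEY (prod_id, min_qty)
    );
    -- Orders keep only what depends on the order itself.
    -- Price and Total are not stored; they are looked up and calculated.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL,
        prod_id  TEXT    NOT NULL,
        quantity INTEGER NOT NULL
    );
""")

# Ranges for AB-111 follow the text; the tier prices are made up.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", [
    ("AB-111", 1, 100, 75.0),
    ("AB-111", 101, 500, 60.0),
    ("AB-111", 501, None, 50.0),
])

# Order quantities are made up so that each order lands in a different tier.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1001, "AB-111", 600),
    (7, 1001, "AB-111", 20),
])


def order_totals(connection):
    """Return (order_id, price, total) with total calculated on demand."""
    return connection.execute("""
        SELECT o.order_id, p.price, o.quantity * p.price AS total
        FROM orders AS o
        JOIN prices AS p
          ON p.prod_id = o.prod_id
         AND o.quantity >= p.min_qty
         AND (p.max_qty IS NULL OR o.quantity <= p.max_qty)
        ORDER BY o.order_id
    """).fetchall()


totals = order_totals(conn)
```

With these made-up tiers, order 1 (600 units) falls in the "501 and more" range and order 7 (20 units) in the "1-100" range, so the same product is priced differently without storing Price or Total in the orders table.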
We can also quickly figure out what price to offer our customers for any quantity they want.

To summarize (again)
A database is in 3NF if:
  It is in 2NF
  It has no transitive dependencies
    A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field)
    We remove fields with a transitive dependency to a new table and link them by a foreign key.

Summarizing
A database is in 2NF if:
  It is in 1NF
  There is no repeating data in its tables.
  Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key.

And Finally
A database is in 1NF if:
  All its attributes are atomic (meaning they contain only a single unit or type of data), and
  All rows have a unique primary key.
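
As a rough recap of the 1NF and 2NF steps summarized above, the following sketch first splits a non-atomic Employees value into one row per employee, then moves City and State into a Zip code table referenced by a foreign key. It uses Python's built-in sqlite3 module. All names and values are placeholders chosen for illustration: the original sample tables are not reproduced here, so the manager, employee, and customer names, the Zip codes, and the table and column names are assumptions.

```python
import sqlite3

# 1NF: the Employees value holds several names, so it is not atomic.
# Manager and employee names are placeholders.
raw_rows = [
    ("Manager A", "Employee 1, Employee 2"),
    ("Manager B", "Employee 3"),
]


def to_atomic(rows):
    """Return one (employee, manager) row per employee."""
    return [
        (employee.strip(), manager)
        for manager, employees in rows
        for employee in employees.split(",")
    ]


atomic_rows = to_atomic(raw_rows)

# 2NF as described in this document: City and State repeat for every
# customer sharing a Zip code, so they move to their own table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE zips (
        zip   TEXT PRIMARY KEY,
        city  TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE TABLE customers (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        zip     TEXT NOT NULL REFERENCES zips(zip)
    );
""")

# Illustrative Zip codes and customers, not taken from the sample data.
conn.executemany("INSERT INTO zips VALUES (?, ?, ?)", [
    ("60601", "Chicago", "IL"),
    ("55101", "St. Paul", "MN"),
])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Customer 1", "60601"),
    (2, "Customer 2", "60601"),
    (3, "Customer 3", "55101"),
])

# City and State are recovered through the Zip code foreign key.
located = conn.execute("""
    SELECT c.cust_id, z.city, z.state
    FROM customers AS c
    JOIN zips AS z ON z.zip = c.zip
    ORDER BY c.cust_id
""").fetchall()
```

Here three customer rows need only two Zip code rows, which mirrors the space saving described earlier; the city and state for any customer come from a join rather than from repeated columns.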
