Gaspar, Vanessa C.
2-BSAIS-3
Mon. 6:00PM-9:00PM
Assignment No. 2
The inventor of the relational model, Edgar Codd, proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend the theory with Second and Third Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form. The theory of data normalization in SQL is still being developed further; for example, there are discussions even of a 6th Normal Form. However, in most practical applications, normalization achieves its best in 3rd Normal Form. The evolution of normalization theories is illustrated below.
Assume a video library maintains a database of movies rented out. Without any normalization, all
information is stored in one table as shown below.
1NF Example
What is a KEY?
A KEY is a value used to uniquely identify a record in a table. A KEY could be a single column or a combination of multiple columns.
Note: Columns in a table that are NOT used to identify a record uniquely are called non-key columns.
In our database, we have two people with the same name, Robert Phil, but they live in different places. Hence, we require both Full Name and Address to uniquely identify a record. That is a composite key.
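A composite key can be sketched in SQLite as follows. This is a minimal illustration, assuming hypothetical table and column names (MembersRented, Full_Name, Address, Movies_Rented); the two columns together form the primary key, so the same name may repeat as long as the address differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical 1NF table: neither Full_Name nor Address alone is unique,
# so the pair of columns is declared as a composite primary key.
cur.execute("""
    CREATE TABLE MembersRented (
        Full_Name     TEXT NOT NULL,
        Address       TEXT NOT NULL,
        Movies_Rented TEXT,
        PRIMARY KEY (Full_Name, Address)
    )
""")

# Two different people named Robert Phil, distinguished by address: allowed.
cur.execute("INSERT INTO MembersRented VALUES ('Robert Phil', 'Kansas', 'Pirates of the Caribbean')")
cur.execute("INSERT INTO MembersRented VALUES ('Robert Phil', 'Texas', 'Forgetting Sarah Marshall')")

# Repeating the same (name, address) pair violates the composite key.
try:
    cur.execute("INSERT INTO MembersRented VALUES ('Robert Phil', 'Texas', 'Clash of the Titans')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = cur.execute("SELECT COUNT(*) FROM MembersRented").fetchone()[0]
```

The duplicate insert fails with an integrity error, so only the two distinct records remain.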
Let's move into Second Normal Form (2NF).
Rule 1- Be in 1NF
Rule 2- Single Column Primary Key
It is clear that we can't move our simple database forward to 2nd Normal Form unless we partition the table above.
We have divided our 1NF table into two tables, Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id, which is the primary key for Table 1. Records can be uniquely identified in Table 1 using the membership id.
A Foreign Key references the primary key of another table; it helps connect your tables.
A foreign key can have a different name from its primary key.
It ensures rows in one table have corresponding rows in another.
Unlike the primary key, foreign keys do not have to be unique; most often they aren't.
Foreign keys can be null even though primary keys cannot.
The above problem can be overcome by declaring the membership id in Table 2 a foreign key referencing the membership id in Table 1.
Now, if somebody tries to insert a value in the membership id field that does not exist in the parent table, an error will be shown!
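This behavior can be sketched in SQLite (table and column names are illustrative, and note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Table 1: member details, with Membership_id as a single-column primary key.
conn.execute("""
    CREATE TABLE Members (
        Membership_id INTEGER PRIMARY KEY,
        Full_Name     TEXT,
        Address       TEXT
    )
""")

# Table 2: movies rented, with Membership_id declared as a foreign key
# referencing the primary key of Table 1.
conn.execute("""
    CREATE TABLE MoviesRented (
        Membership_id INTEGER REFERENCES Members(Membership_id),
        Movie_Title   TEXT
    )
""")

conn.execute("INSERT INTO Members VALUES (1, 'Robert Phil', 'Kansas')")
conn.execute("INSERT INTO MoviesRented VALUES (1, 'Pirates of the Caribbean')")  # parent exists: OK

# A membership id with no matching row in the parent table is rejected.
try:
    conn.execute("INSERT INTO MoviesRented VALUES (99, 'Clash of the Titans')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The orphan insert raises an integrity error, which is exactly the protection the foreign key provides.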
Consider Table 1. Changing the non-key column Full Name may change Salutation.
Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to divide our table.
3NF Example
We have again divided our tables and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key in Table 3.
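The 3NF split might be sketched like this. The salutation values and column names are assumptions for illustration; the point is that the salutation text now lives in one place and Table 1 only stores the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table 3: salutations live in their own table, keyed by Salutation_id.
conn.execute("CREATE TABLE Salutations (Salutation_id INTEGER PRIMARY KEY, Salutation TEXT)")
conn.executemany("INSERT INTO Salutations VALUES (?, ?)",
                 [(1, 'Mr.'), (2, 'Ms.'), (3, 'Mrs.'), (4, 'Dr.')])

# Table 1 keeps only the Salutation_id, removing the transitive dependency:
# changing a member's salutation is now a single foreign-key update.
conn.execute("""
    CREATE TABLE Members (
        Membership_id INTEGER PRIMARY KEY,
        Full_Name     TEXT,
        Salutation_id INTEGER REFERENCES Salutations(Salutation_id)
    )
""")
conn.execute("INSERT INTO Members VALUES (1, 'Janet Jones', 2)")

# Join to recover the full salutation text for a member.
salutation = conn.execute("""
    SELECT s.Salutation FROM Members m
    JOIN Salutations s ON s.Salutation_id = m.Salutation_id
    WHERE m.Membership_id = 1
""").fetchone()[0]
```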
Now our little example is at a level that cannot be further decomposed to attain higher forms of normalization. In fact, it is already in higher normal forms. Separate efforts are normally needed to move complex databases into the next levels of normalization. However, we will briefly discuss the next levels of normalization in the following.
Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one candidate key.
A table is in 4th Normal Form if no table instance contains two or more independent, multivalued facts describing the relevant entity.
A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller
tables without loss of data.
6th Normal Form is not yet standardized; however, it has been discussed by database experts for some time. Hopefully, we will have a clear and standardized definition for 6th Normal Form in the near future.
3. What is the difference between normalized data and unnormalized data? Provide examples for each.
Normalization and denormalization are methods used in databases. The terms are differentiable: normalization is a technique for minimizing insertion, deletion and update anomalies by eliminating redundant data, whereas denormalization is the inverse process of normalization, where redundancy is added to the data to improve the performance of a specific application.
Normalization prevents wasted disk space by minimizing or eliminating redundancy.
Comparison Chart
Purpose
Normalization: To reduce data redundancy and inconsistency.
Denormalization: To achieve faster execution of queries by introducing redundancy.

Used in
Normalization: OLTP systems, where the emphasis is on making inserts, deletes and updates faster and on storing quality data.
Denormalization: OLAP systems, where the emphasis is on making search and analysis faster.
Definition of Normalization
Normalization is the method of arranging the data in the database efficiently. It involves constructing tables and setting up relationships between those tables according to certain rules. Redundancy and inconsistent dependencies can be removed using these rules in order to make the database more flexible.
Redundant data wastes disk space, increases data inconsistency and slows down DML queries. If the same data is present in more than one place and an update is made to that data, then the change must be reflected in all locations; inconsistent data makes searching and accessing the data harder. There are various reasons for performing normalization, such as avoiding redundancy and update anomalies, avoiding unnecessary coding, keeping data in a form that can accommodate change more easily and accurately, and enforcing data constraints.
Normalization includes the analysis of functional dependencies between attributes. Relations (tables) with anomalies are decomposed to generate smaller, well-structured relations. This helps in deciding which attributes should be grouped in a relation.
Normalization is based on the concept of normal forms. A relation (table) is said to be in a normal form if it fulfils a certain set of constraints. There are six defined normal forms: 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. Normalization should eliminate redundancy, but not at the cost of integrity.
Definition of Denormalization
Denormalization is the inverse process of normalization, where the normalized schema is converted into a schema which has redundant information. The performance is improved by using redundancy and keeping the redundant data consistent. The reason for performing denormalization is the overhead produced in the query processor by an over-normalized structure.
Denormalization can also be defined as the method of storing the join of higher normal form relations as a base relation, which is in a lower normal form. It reduces the number of tables and complicated table joins, because a higher number of joins can slow down the process. There are various denormalization techniques, such as storing derivable values, pre-joining tables, hard-coded values, and keeping details with the master.
The denormalization approach emphasizes that placing all the data in one place can eliminate the need to search multiple tables to collect that data. The basic strategy in denormalization is to select the most dominant process and examine the modifications that will ultimately improve its performance. The most basic alteration is adding multiple attributes to an existing table to reduce the number of joins.
What is database denormalization? Before diving into the subject, let’s emphasize that normalization still remains the starting point, meaning that you should first of all normalize a database’s structure. The essence of normalization is to put each piece of data in its appropriate place; this ensures data integrity and facilitates updating. However, retrieving data from a normalized database can be slower, as queries need to address many different tables where different pieces of data are stored. Updating, on the contrary, gets faster, as each piece of data is stored in a single place.
The majority of modern applications need to be able to retrieve data in the shortest time possible. And
that’s when you can consider denormalizing a relational database. As the name suggests,
denormalization is the opposite of normalization. When you normalize a database, you organize data to
ensure integrity and eliminate redundancies. Database denormalization means you deliberately put the
same data in several places, thus increasing redundancy.
“Why denormalize a database at all?” you may ask. The main purpose of denormalization is to
significantly speed up data retrieval. However, denormalization isn’t a magic pill. Developers should use
this tool only for particular purposes:
Typically, a normalized database requires joining a lot of tables to satisfy queries; but the more joins, the slower the query. As a countermeasure, you can add redundancy to a database by copying values
between parent and child tables and, therefore, reducing the number of joins required for a query.
You can denormalize a database to provide calculated values. Once they’re generated and added to
tables, downstream programmers can easily create their own reports and queries without having in-
depth knowledge of the app’s code or API.
Often, applications need to provide a lot of analytical and statistical information. Generating reports
from live data is time-consuming and can negatively impact overall system performance.
Denormalizing your database can help you meet this challenge. Suppose you need to provide a total
sales summary for one or many users; a normalized database would aggregate and calculate all invoice
details multiple times. Needless to say, this would be quite time-consuming, so to speed up this process,
you could maintain the year-to-date sales summary in a table storing user details.
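One way to sketch this year-to-date summary in SQLite: a trigger keeps the derived column up to date as invoices arrive. The schema, trigger, and amounts are assumptions for illustration, not the text's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE Invoices (user_id INTEGER, amount REAL)")
# Denormalized, derivable column: year-to-date sales total per user.
conn.execute("CREATE TABLE Users (user_id INTEGER PRIMARY KEY, ytd_sales REAL DEFAULT 0)")
conn.execute("INSERT INTO Users (user_id) VALUES (1)")

# The trigger recalculates the derived value on every invoice insert,
# so reports read one row instead of aggregating all invoice details.
conn.execute("""
    CREATE TRIGGER invoices_ai AFTER INSERT ON Invoices
    BEGIN
        UPDATE Users SET ytd_sales = ytd_sales + NEW.amount
        WHERE user_id = NEW.user_id;
    END
""")

conn.execute("INSERT INTO Invoices VALUES (1, 100.0)")
conn.execute("INSERT INTO Invoices VALUES (1, 50.0)")

ytd = conn.execute("SELECT ytd_sales FROM Users WHERE user_id = 1").fetchone()[0]
```

A summary query now touches a single Users row rather than re-aggregating every invoice.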
Now that you know when you should go for database denormalization, you’re probably wondering how
to do it right. There are several denormalization techniques, each appropriate for a particular situation.
Let’s explore them in depth:
If you need to execute a calculation repeatedly during queries, it’s best to store the results of it. If the
calculation contains detail records, you should store the derived calculation in the master table.
Whenever you decide to store derivable values, make sure that denormalized values are always
recalculated by the system.
Advantages
- No need to look up source values each time a derivable value is needed
- No need to perform a calculation for every query or report

Disadvantages
- Running data manipulation language (DML) statements against the source data requires recalculation of the derivable data
- Data inconsistencies are possible due to data duplication
Example
As an example of this denormalization technique, let’s suppose we’re building an email messaging
service. Having received a message, a user gets only a pointer to this message; the pointer is stored in
the User_messages table. This is done to prevent the messaging system from storing multiple copies of
an email message in case it’s sent to many different recipients at a time. But what if a user deletes a
message from their account? In this case, only the respective entry in the User_messages table is
actually removed. So to completely delete the message, all User_messages records for it must be
removed.
Denormalization of data in one of the tables can make this much simpler: we can add a users_received_count column to the Messages table to keep a count of the User_messages records kept for a specific message. When a user deletes this message (read: removes the pointer to the actual message), the users_received_count column is decremented by one. Naturally, when users_received_count equals zero, the actual message can be deleted completely.
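A minimal sketch of this counter logic (the schema and the helper function are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Messages (
        message_id INTEGER PRIMARY KEY,
        body TEXT,
        users_received_count INTEGER  -- denormalized pointer count
    )
""")
conn.execute("CREATE TABLE User_messages (user_id INTEGER, message_id INTEGER)")

# One message sent to three recipients: one Messages row, three pointers.
conn.execute("INSERT INTO Messages VALUES (1, 'Hello, team', 3)")
conn.executemany("INSERT INTO User_messages VALUES (?, 1)", [(10,), (11,), (12,)])

def delete_for_user(user_id, message_id):
    """Remove the user's pointer, decrement the counter, and delete the
    actual message once no pointers remain."""
    conn.execute("DELETE FROM User_messages WHERE user_id = ? AND message_id = ?",
                 (user_id, message_id))
    conn.execute("UPDATE Messages SET users_received_count = users_received_count - 1 "
                 "WHERE message_id = ?", (message_id,))
    conn.execute("DELETE FROM Messages WHERE message_id = ? AND users_received_count = 0",
                 (message_id,))

for uid in (10, 11):
    delete_for_user(uid, 1)
remaining_after_two = conn.execute("SELECT COUNT(*) FROM Messages").fetchone()[0]

delete_for_user(12, 1)
remaining_after_all = conn.execute("SELECT COUNT(*) FROM Messages").fetchone()[0]
```

After two of the three recipients delete the message it survives; after the last deletion the counter hits zero and the message row itself is removed.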
To pre-join tables, you need to add a non-key column to a table that bears no business value. This way,
you can dodge joining tables and therefore speed up queries. Yet you must ensure that the
denormalized column gets updated every time the master column value is altered.
This denormalization technique can be used when you have to make lots of queries against many
different tables – and as long as stale data is acceptable.
Advantages
- You can put off updates as long as stale data is tolerable

Disadvantages
- An extra column requires additional work and disk space
Example
Imagine that users of our email messaging service want to access messages by category. Keeping the
name of a category right in the User_messages table can save time and reduce the number of necessary
joins.
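A possible sketch of this pre-joining idea in SQLite (all table and column names are hypothetical): the category name is duplicated into the messages table so the common read needs no join, and refreshed whenever the master value changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Categories VALUES (1, 'Work')")

# Denormalized: category_name is copied into User_messages so that listing
# messages by category avoids a join to Categories.
conn.execute("""
    CREATE TABLE User_messages (
        user_id       INTEGER,
        message_id    INTEGER,
        category_id   INTEGER REFERENCES Categories(category_id),
        category_name TEXT
    )
""")
conn.execute("INSERT INTO User_messages VALUES (10, 1, 1, 'Work')")

# Join-free query against the denormalized column.
name = conn.execute(
    "SELECT category_name FROM User_messages WHERE user_id = 10 AND message_id = 1"
).fetchone()[0]

# The duplicated column must be refreshed whenever the master value changes.
conn.execute("UPDATE Categories SET name = 'Business' WHERE category_id = 1")
conn.execute("""
    UPDATE User_messages SET category_name =
        (SELECT name FROM Categories WHERE category_id = User_messages.category_id)
""")
refreshed = conn.execute("SELECT category_name FROM User_messages").fetchone()[0]
```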
However, when using hardcoded values, you should create a check constraint to validate values against the reference values. This constraint must be rewritten each time a new value is required.
This data denormalization technique should be used if values are static throughout the lifecycle of your
system and as long as the number of these values is quite small. Now let’s have a look at the pros and
cons of this technique:
Advantages Disadvantages
Example
Suppose we need to find out background information about users of an email messaging service, for
example the kind, or type, of user. We’ve created a User_kinds table to store data on the kinds of users
we need to recognize.
The values stored in this table aren’t likely to be changed frequently, so we can apply hardcoding. We
can add a check constraint to the column or build the check constraint into the field validation for the
application where users sign in to our email messaging service.
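Such a check constraint might look like this in SQLite. The kind values ('free', 'premium', 'admin') are invented for illustration, since the actual User_kinds values aren't given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Instead of joining to a User_kinds lookup table on every query, the kind
# is hardcoded and validated with a CHECK constraint against the small,
# static set of reference values.
conn.execute("""
    CREATE TABLE Users (
        user_id INTEGER PRIMARY KEY,
        kind    TEXT CHECK (kind IN ('free', 'premium', 'admin'))
    )
""")

conn.execute("INSERT INTO Users VALUES (1, 'premium')")  # allowed value

try:
    conn.execute("INSERT INTO Users VALUES (2, 'guest')")  # not in the reference set
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True
```

Note the trade-off the text describes: adding a new kind means rewriting the constraint itself.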
There can be cases when the number of detail records per master is fixed or when detail records are
queried with the master. In these cases, you can denormalize a database by adding detail columns to
the master table. This technique proves most useful when there are few records in the detail table.
Advantages
- Saves space
Example
Imagine that we need to limit the maximum amount of storage space a user can get. To do so, we need
to implement restraints in our email messaging service − one for messages and another for files. Since
the amount of allowed storage space for each of these restraints is different, we need to track each
restraint individually. In a normalized relational database, we could simply introduce two different tables
− Storage_types and Storage_restraints − that would store records for each user.
In a denormalized design, we could instead keep the details with the master and add the restraint columns directly to the Users table:
message_space_allocated
message_space_available
file_space_allocated
file_space_available
In this case, the denormalized Users table stores not only the actual information about a user but the
restraints as well, so in terms of functionality the table doesn’t fully correspond to its name.
Advantages
- No need to create joins for queries that need a single record

Disadvantages
- Data inconsistencies are possible as a record value must be repeated
Example
Often, users send not only messages but attachments too. The majority of messages are sent either
without an attachment or with a single attachment, but in some cases users attach several files to a
message.
If a database has over three levels of master detail and you need to query only records from the lowest
and highest levels, you can denormalize your database by creating short-circuit keys that connect the
lowest-level grandchild records to higher-level grandparent records. This technique helps you reduce
the number of table joins when queries are executed.
Advantages Disadvantages
Example
Now let’s imagine that an email messaging service has to handle frequent queries that require data from the Users and Messages tables only, without addressing the Categories table. In a normalized database, such queries would have to join the Users and Messages tables through the intermediate Categories table.
To improve database performance and avoid such joins, we can add a primary or unique key from
the Users table directly to the Messages table. This way we can provide information about users and
messages without querying the Categories table, which means we can do without a redundant table
join.
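A short-circuit key might be sketched like this, assuming a hypothetical three-level hierarchy with Users at the top, Categories in the middle, and Messages at the bottom:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, user_id INTEGER)")

# Short-circuit key: Messages carries user_id directly, in addition to the
# category_id that links it to the intermediate level.
conn.execute("""
    CREATE TABLE Messages (
        message_id  INTEGER PRIMARY KEY,
        category_id INTEGER,
        user_id     INTEGER,  -- denormalized grandparent key
        subject     TEXT
    )
""")

conn.execute("INSERT INTO Users VALUES (10, 'Alice')")
conn.execute("INSERT INTO Categories VALUES (1, 10)")
conn.execute("INSERT INTO Messages VALUES (100, 1, 10, 'Weekly report')")

# Lowest-to-highest level query with a single join, skipping Categories.
owner = conn.execute("""
    SELECT u.name FROM Messages m
    JOIN Users u ON u.user_id = m.user_id
    WHERE m.message_id = 100
""").fetchone()[0]
```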
Drawbacks of database denormalization
Though denormalization seems like the best way to increase performance of a database and,
consequently, an application in general, you should resort to it only when other methods prove
inefficient. For instance, often insufficient database performance can be caused by incorrectly written
queries, faulty application code, inconsistent index design, or even improper hardware configuration.
Denormalization sounds tempting and extremely efficient in theory, but it comes with a number of
drawbacks that you must be aware of before going with this strategy:
More storage space
When you denormalize a database, you have to duplicate a lot of data. Naturally, your database will require more storage space.
Additional documentation
Every single step you take during denormalization must be properly documented. If you change
the design of your database sometime later, you’ll need to revise all rules you created before:
you may not need some of them or you may need to upgrade particular denormalization rules.
Potential data anomalies
When denormalizing a database, you should understand that you get more data that can be modified. Accordingly, you need to take care of every single case of duplicate data. You should use triggers, stored procedures, and transactions to avoid data anomalies.
More code
When denormalizing a database you modify select queries, and though this brings a lot of
benefits it has its price − you need to write extra code. You also need to update values in new
attributes that you add to existing records, which means even more code is required.
Slower operations
Database denormalization may speed up data retrievals but at the same time it slows down
updates. If your application needs to perform a lot of write operations to the database, it may
show slower performance than a similar normalized database. So make sure to implement
denormalization without damaging the usability of your application.
As you can see, denormalization is a serious process that requires a lot of effort and skill. If you want to
denormalize databases without any issues, follow these useful tips:
1. Instead of trying to denormalize the whole database right away, focus on particular parts that
you want to speed up.
2. Do your best to learn the logical design of your application really well to understand what parts
of your system are likely to be affected by denormalization.
3. Analyze how often data is changed in your application; if data changes too often, maintaining
the integrity of your database after denormalization could become a real problem.
4. Take a close look at what parts of your application are having performance issues; often, you
can speed up your application by fine-tuning queries rather than denormalizing the database.
5. Learn more about data storage techniques; picking the most relevant can help you do without
denormalization.
There are three main reasons to normalize a database. The first is to minimize duplicate data, the
second is to minimize or avoid data modification issues, and the third is to simplify queries.
As we go through the various states of normalization we’ll discuss how each form addresses these
issues, but to start, let’s look at some data which hasn’t been normalized and discuss some potential
pitfalls.
I think once you understand the issues, you'll better appreciate normalization. Consider the following table:
Notice that for each SalesPerson we have listed both the SalesOffice and OfficeNumber. There is
duplicate sales person data. Duplicated information presents two problems:
Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table, we need to
update the entries for all the SalesPersons currently in Chicago. Our table is a small example, but you
can see if it were larger, that potentially this could involve hundreds of updates.
These situations are modification anomalies. Database normalization fixes them. There are three
modification anomalies that can occur:
Insert Anomaly
There are facts we cannot record until we know information for the entire row. In our example we cannot record a new sales office until we also know the sales person. Why? Because in order to create the record, we need to provide a primary key. In our case this is the EmployeeID.
Update Anomaly
In this case we have the same information in several rows. For instance if the office number changes,
then there are multiple updates that need to be made. If we don’t update all rows, then inconsistencies
appear.
Deletion Anomaly
Deletion of a row causes removal of more than one set of facts. For instance, if John Hunt retires, then deleting that row causes us to lose information about the New York office.
The last reason we’ll consider is making it easier to search and sort your data. In the SalesStaff table if
you want to search for a specific customer such as Ford, you would have to write a query like
SELECT SalesOffice
FROM SalesStaff
WHERE Customer1 = 'Ford' OR
      Customer2 = 'Ford' OR
      Customer3 = 'Ford'
Clearly if the customer were somehow in one column our query would be simpler. Also, consider if you
want to run a query and sort by customer.
Our current table makes this tough. You would have to use three separate UNION queries! You can
eliminate or reduce these anomalies by separating the data into different tables. This puts the data into
tables serving a single purpose.
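That separation can be sketched in SQLite. The sales person and customer names are partly invented for illustration; the point is that once customers live in their own table, the search uses one column and needs no OR chain or UNIONs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesStaff (EmployeeID INTEGER PRIMARY KEY, SalesPerson TEXT)")
# Each customer becomes its own row instead of a Customer1/2/3 column.
conn.execute("""
    CREATE TABLE SalesStaffCustomers (
        EmployeeID INTEGER REFERENCES SalesStaff(EmployeeID),
        Customer   TEXT
    )
""")

conn.execute("INSERT INTO SalesStaff VALUES (1, 'John Hunt')")
conn.executemany("INSERT INTO SalesStaffCustomers VALUES (?, ?)",
                 [(1, 'Ford'), (1, 'GM'), (1, 'Apple')])

# The customer now lives in one column, so searching (and ORDER BY Customer)
# is a single condition rather than three ORed columns or three UNIONs.
rows = conn.execute("""
    SELECT s.SalesPerson, c.Customer
    FROM SalesStaff s
    JOIN SalesStaffCustomers c ON c.EmployeeID = s.EmployeeID
    WHERE c.Customer = 'Ford'
""").fetchall()
```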