Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

14,222,536 members Sign in

Search for articles, questions, tips


articles Q&A forums stuff lounge ?

11 important database designing rules which I follow


Shivprasad koirala Rate this: 4.80 (111 votes)

25 Feb 2014 CPOL

This article will discuss about 11 important database designing rules.

Table of Contents
Introduction
Rule 1: What is the nature of the application (OLTP or OLAP)?
Rule 2: Break your data in to logical pieces, make life simpler
Rule 3: Do not get overdosed with rule 2
Rule 4: Treat duplicate non-uniform data as your biggest enemy
Rule 5: Watch for data separated by separators 
Rule 6: Watch for partial dependencies
Rule 7: Choose derived columns preciously
Rule 8: Do not be hard on avoiding redundancy, if performance is the key
Rule 9: Multidimensional data is a different beast altogether
Rule 10: Centralize name value table design
Rule 11: For unlimited hierarchical data self-reference PK and FK
Courtesy: Image from Motion pictures

Introduction
Before you start reading this article let me confirm to you I am not a guru in database designing. The below 11 points are what I
have learnt via projects, my own experiences, and my own reading. I personally think it has helped me a lot when it comes to DB
designing. Any criticism is welcome.

The reason I am writing a full blown article is, when developers design a database they tend to follow the three normal forms like a
silver bullet. They tend to think normalization is the only way of designing. Due this mind set they sometimes hit road blocks as the
project moves ahead.

If you are new to normalization, then click and see 3 normal forms in action which explains all the three normal forms step by step.

Said and done normalization rules are important guidelines but taking them as a mark on stone is calling for trouble. Below are my
own 11 rules which I remember on the top of my head while doing DB design.

Rule 1: What is the nature of the application (OLTP or OLAP)?


When you start your database design the first thing to analyze is the nature of the application you are designing for, is it
Transactional or Analytical. You will find many developers by default applying normalization rules without thinking about the nature
of the application and then later getting into performance and customization issues. As said, there are two kinds of applications:
transaction based and analytical based, let’s understand what these types are.

Transactional: In this kind of application, your end user is more interested in CRUD, i.e., creating, reading, updating, and deleting
records. The official name for such a kind of database is OLTP.

Analytical: In these kinds of applications your end user is more interested in analysis, reporting, forecasting, etc. These kinds of
databases have a less number of inserts and updates. The main intention here is to fetch and analyze data as fast as possible. The
official name for such a kind of database is OLAP.

In other words if you think inserts, updates, and deletes are more prominent then go for a normalized table design, else create a flat
denormalized database structure.

Below is a simple diagram which shows how the names and address in the left hand side are a simple normalized table and by
applying a denormalized structure how we have created a flat table structure.
Rule 2: Break your data into logical pieces, make life simpler
This rule is actually the first rule from 1st normal form. One of the signs of violation of this rule is if your queries are using too many
string parsing functions like substring, charindex, etc., then probably this rule needs to be applied.

For instance you can see the below table which has student names; if you ever want to query student names having “Koirala” and
not “Harisingh”, you can imagine what kind of a query you will end up with.

So the better approach would be to break this field into further logical pieces so that we can write clean and optimal queries.

Rule 3: Do not get overdosed with rule 2


Developers are cute creatures. If you tell them this is the way, they keep doing it; well, they overdo it leading to unwanted
consequences. This also applies to rule 2 which we just talked above. When you think about decomposing, give a pause and ask
yourself, is it needed? As said, the decomposition should be logical.

For instance, you can see the phone number field; it’s rare that you will operate on ISD codes of phone numbers separately (until
your application demands it). So it would be a wise decision to just leave it as it can lead to more complications.

Rule 4: Treat duplicate non-uniform data as your biggest


enemy
Focus and refactor duplicate data. My personal worry about duplicate data is not that it takes hard disk space, but the confusion it
creates.

For instance, in the below diagram, you can see “5th Standard” and “Fifth standard” means the same. Now you can say the data has
come into your system due to bad data entry or poor validation. If you ever want to derive a report, they would show them as
different entities, which is very confusing from the end user point of view.
One of the solutions would be to move the data into a different master table altogether and refer them via foreign keys. You can
see in the below figure how we have created a new master table called “Standards” and linked the same using a simple foreign key.

Rule 5: Watch for data separated by separators


The second rule of 1st normal form says avoid repeating groups. One of the examples of repeating groups is explained in the below
diagram. If you see the syllabus field closely, in one field we have too much data stuffed. These kinds of fields are termed as
“Repeating groups”. If we have to manipulate this data, the query would be complex and also I doubt about the performance of the
queries.
These kinds of columns which have data stuffed with separators need special attention and a better approach would be to move
those fields to a different table and link them with keys for better management.

So now let’s apply the second rule of 1st normal form: “Avoid repeating groups”. You can see in the above figure I have created a
separate syllabus table and then made a many-to-many relationship with the subject table.

With this approach the syllabus field in the main table is no more repeating and has data separators.

Rule 6: Watch for partial dependencies


Watch for fields which depend partially on primary keys. For instance in the above table we can see the primary key is created on
roll number and standard. Now watch the syllabus field closely. The syllabus field is associated with a standard and not with a
student directly (roll number).

The syllabus is associated with the standard in which the student is studying and not directly with the student. So if tomorrow we
want to update the syllabus we have to update it for each student, which is painstaking and not logical. It makes more sense to
move these fields out and associate them with the Standard table.

You can see how we have moved the syllabus field and attached it to the Standards table.

This rule is nothing but the 2nd normal form: “All keys should depend on the full primary key and not partially”.

Rule 7: Choose derived columns preciously

If you are working on OLTP applications, getting rid of derived columns would be a good thought, unless there is some pressing
reason for performance. In case of OLAP where we do a lot of summations, calculations, these kinds of fields are necessary to gain
performance.

In the above figure you can see how the average field is dependent on the marks and subject. This is also one form of redundancy.
So for such kinds of fields which are derived from other fields, give a thought: are they really necessary?

This rule is also termed as the 3rd normal form: “No column should depend on other non-primary key columns”. My personal
thought is do not apply this rule blindly, see the situation; it’s not that redundant data is always bad. If the redundant data is
calculative data, see the situation and then decide if you want to implement the 3rd normal form.

Rule 8: Do not be hard on avoiding redundancy, if


performance is the key

Do not make it a strict rule that you will always avoid redundancy. If there is a pressing need for performance think about de-
normalization. In normalization, you need to make joins with many tables and in denormalization, the joins reduce and thus increase
performance.

Rule 9: Multidimensional data is a different beast altogether


OLAP projects mostly deal with multidimensional data. For instance you can see the below figure, you would like to get sales per
country, customer, and date. In simple words you are looking at sales figures which have three intersections of dimension data.

For such kinds of situations a dimension and fact design is a better approach. In simple words you can create a simple central sales
fact table which has the sales amount field and it makes a connection with all dimension tables using a foreign key relationship.
Rule 10: Centralize name value table design
Many times I have come across name value tables. Name and value tables means it has key and some data associated with the key.
For instance in the below figure you can see we have a currency table and a country table. If you watch the data closely they actually
only have a key and value.
For such kinds of tables, creating a central table and differentiating the data by using a type field makes more sense.

Rule 11: For unlimited hierarchical data self-reference PK and


FK
Many times we come across data with unlimited parent child hierarchy. For instance consider a multi-level marketing scenario where
a sales person can have multiple sales people below them. For such scenarios, using a self-referencing primary key and foreign key
will help to achieve the same.

This article is not meant to say that do not follow normal forms, instead do not follow them blindly, look at your project's nature
and the type of data you are dealing with first.
Below is a video which explains the three normal forms step by step using a simple school table.

You can also visit my website for step by step videos on Design Patterns, UML, SharePoint 2010, .NET Fundamentals, VSTS, UML,
SQL Server, MVC, and lots more.

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author


Shivprasad koirala
Architect https://www.questpond.com
India

Do not forget to watch my Learn step by step video series.

Learn MVC 5 step by step in 16 hours


Learn MVC Core step by step
Learn Angular tutorials step by step for beginners
Learn Azure Step by Step
Learn Data Science Step by Step
Step by Step Mathematics for Data Science
Learn Xamarin Mobile Programming Step by Step
Learn Design Pattern in 8 hours
Learn C# in ...
show more

Comments and Discussions


 

You must Sign In to use this message board.

Search Comments

First Prev Next

11 important database designing rules which I follow


XolaniG 19-Jun-17 21:08

Best article I have ever read, very nicely explained.

Thanks Shivprasad

Sign In · View Thread  

MS SQL Server developer download


Member 12541993 24-May-16 0:33

Hi,
Hide   Copy Code
I have just gone thru this video and explanation and I found that it was very useful.

I want to know the tools(MS SQL Server...) you have used in this video?
Is it MS SQL server Visual studio? I have normal Oracle SQL Developer v4.
But I want to practice & learn the same with the same sets of background you have used
in this video.

Can you please let me know the tools(with versions) required to implement the same and
also the appropriate website to download?

Thanks.
Sign In · View Thread  

nice
BillW33 2-Mar-16 7:37

A well written and useful article, although I don't agree with rule #10.

Just because the code works, it doesn't mean that it is good code.

Sign In · View Thread  

Greate
Aladár Horváth 18-Aug-15 1:27

Sign In · View Thread  

[My vote of 2] rule 10 is one of common database design mistakes.


seyyed hamed monem 21-Jun-15 21:53

hi,
rule 10 is one of common database design mistakes.
see:
https://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/[^]
https://www.simple-talk.com/sql/database-administration/five-simple--database-design-errors-you-should-avoid/[^]
https://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table/[^]
http://tonyandrews.blogspot.ca/2004/10/otlt-and-eav-two-big-design-mistakes.html[^]
http://sqlmag.com/database-administration/designing-performance-lookup-tables[^]
http://sqlblog.com/blogs/louis_davidson/archive/2010/11/20/one-domain-table-or-many.aspx[^]

Sign In · View Thread 5.00/5 (2 votes)  

My vote of 5
Pratik Bhuva 30-Mar-15 23:41

Great Article.
Thanks Sir.

Sign In · View Thread  

design
Member 11218475 18-Nov-14 1:02

hi!
I'm totally new to database design and really appreciate your article! i find myself in a dead end in my design project and
wonder weather the problem lies within the database structure itself cause i cannot see the way to implement my desired
structure in the SQL framework.
i'd love to get in touch and ask a simple question, but cannot find any contact details here. please get in touch!

Sign In · View Thread  

Rule #10 has to be put down


opc3 18-Sep-14 15:31

If you're designing a data model with the express purpose to gain simplicity in a .NET application class then you're doing it
wrong. If you want to abstract your "code type" table (which is a nonsensical idea for an entity to begin with, think that
through) then abstract the multitude of domain lookup tables you should be creating and maintaining with a "getCodes"
stored procedure that unions all the attributes together, however allow for the specialized attributes to be added to the
domain lookup tables later, e.g. the ISO3 country code that you may one day need to add to your country table. Consider
not oversimplifying your data model with conveniences that paint you into the proverbial corner while trying to pass them
off as Rules for database design. Rule # 10 is a classic anti-pattern.

Sign In · View Thread 5.00/5 (2 votes)  

Re: Rule #10 has to be put down


Anurag Gandhi 17-Aug-16 0:19

Yeah, while reading the article i was thinking that it conflicts my thought process.

When I see your comment, i got some more person who thinks like me.

Life is a computer program and everyone is the programmer of his own life.

Sign In · View Thread  

My vote of 5
Akiii_Lethal 25-Jul-14 0:03

Nice tips sir

Sign In · View Thread  

Good one
Shahriar Iqbal Chowdhury/Galib 8-Jun-14 22:28

informative as always ... and awesome too

Sign In · View Thread  

My vote of 5, but one doubt


Thava Rajan 28-Feb-14 0:21

a good article
is the 8th point acceptable for the large dataset too?

Sign In · View Thread 5.00/5 (1 vote)  

Good Article
AzuriVN 27-Feb-14 17:03

Thank you very much for your helpful article.


Can you tell me which software or tool you use to draw the table in the image?
It looks amazing and I like it

Sign In · View Thread  
My vote of 5
David Days 27-Feb-14 6:57

Really good outline of how to approach a project from the database side. I use many of these same principles, though more
from painful experience than having a written guide.

Sign In · View Thread  

here a question is here! what do you think? which decision you will made?
Phu Hiep DUONG 26-Apr-13 11:05

http://stackoverflow.com/questions/16226683/database-which-is-a-better-design-for-link-tables[^]

Thankyou

Sign In · View Thread  

Rule 8: Do not be hard on redundancy


Jan Zumwalt 30-Nov-12 7:19

I appreciate your well thought out topic.


I would add that redundancy is not always a bad thing
if it makes maintenance clear and and easy.

Sign In · View Thread  

My vote of 5
zeego 13-Nov-12 20:18

My Vote of 5. What an amazingly detailed & informative article, all information in a single article. Thanks a lot !

Sign In · View Thread  

My vote of 5
Xavi Rius 11-Sep-12 11:24

Great article, a really good job. Thanks

Sign In · View Thread  

My vote of 5
Jasmine2501 10-Sep-12 10:49

I agree with Item 10 actually, because I've seen it used to great effectiveness in a real application where maintaining a
whole table for each name-value type would have been nearly impossible.

However, I think that 7 could use a little work. You need to drive home the point that calculated columns are ok if they are
actually defined that way, rather than calculated and then stored. Point out the fact that if column X is supposed be the
product of Y and Z, if it's possible to store a number in X that is NOT the product of Y an Z, violating your rule, then it will
happen at some point and your app will break. Define the column as calculated from those other columns, don't fill it in
with a trigger or something. If you can't define it that way and have the DB enforce that definition, you risk the integrity of
that column.

Sign In · View Thread  
My vote of 2
Member 8528409 7-Sep-12 11:02

so so

Sign In · View Thread  

My vote of 4
brother.gabriel 28-Apr-12 11:45

I would give 5 if there were a few hints about how to "fix" an existing database that is replete with these kinds of errors.
Also, a brief explanation of terms such as "primary key" and "normal forms" at the top of the article would be very nice.

I give a 4 because the explanation of the database refinements is a worthy topic, the diagrams are pleasant, and the
authour is candid.

Also, I know Mr. Koirala is from India, but the English could use a little sprucing up!

Sign In · View Thread 5.00/5 (1 vote)  

My vote of 4
Uilleam 9-Apr-12 5:39

I want to encourage this sort of discussion as these are mostly very good points that frankly all should be at least aware of,
and able to argue one way or another. On the other hand, I too think #10 is really a BAD idea. There are plenty of examples
of similar structures but that alone is no reason to merge data - data management is about a lot more than simply structure
- it's relationships, and for example there are plenty of countries that share currency with others (e.g. the entire EU) . If you
must merge all country data then a better way would be a table of countries having fields of country code, currency code,
and currency name, and then provide indices on both code fields. That at least provides the benefit that you shouldn't have
a country added without both codes provided, but frankly I'd make them separate tables at least for the shared currency
issue.

Sign In · View Thread 5.00/5 (1 vote)  

Re: My vote of 4
Shivprasad koirala 10-Apr-12 0:58

yes i agree to what you are saying and i will be removing the point. You can see the below discussion where people
had raised contention about it and i second and agree it.

I will update with something better.

Thanks

My book .NET interview questions with 500 mostly asked questions in .NET world .NET Interview questions and answers

Sign In · View Thread  

My vote of 5
Manoj Kumar Choubey 5-Apr-12 7:45

nice

Sign In · View Thread  
Great
rajeshrisharma 4-Apr-12 3:25

Great i enjoyed it

Sign In · View Thread  

Refresh 1 2 Next »

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin   

Permalink Layout: fixed | fluid Article Copyright 2012 by Shivprasad koirala


Advertise Everything else Copyright © CodeProject, 1999-2019
Privacy
Cookies Web01 2.8.190627.1
Terms of Use

You might also like