CLASS 1: Database Modeling
Overview
In this lecture you will learn the basics of data (or database) model design. You
will also examine the evolution of database modeling, which led to today’s
relational database model. Finally, you will be introduced to applications,
which in turn affect the database modeling process, since different types of
applications use a database in different ways. Consequently, to figure
out an appropriate database design strategy, you must have a general idea of
the kind of applications your database will serve.
Believe it or not, the first databases are actually predecessors to written language,
found amongst ancient tribes in the Middle East. In the local villages, shepherds were
taking care of rural community flocks belonging to their fellow tribesmen. To keep things
in order, the members of the tribe needed a way to maintain and manipulate the
ownership of animals. Rather than branding the individual animals (that is, putting
name tags on them), they decided on a different scheme in which each member of the
tribe had a set of baked clay tokens. Since they were not concerned about any animal in
particular, possession of one token represented ownership of any one animal.
The purpose of the tokens was not to measure and store economic value. Their
function was to determine ownership by establishing a record-keeping system. They would
“change hands” only after a trade occurred, unlike regular transactions where you “pay
before you receive”. They were updated when a lamb grew to become a ram, deleted
when an animal was eaten or died, and new tokens were inserted when new lambs
were born in spring. As you will soon see, these activities represent all the basic
operations you would expect in a database.
Regulating the tokens involved an illiterate man passing the flock through a gate and
matching one token to one animal (establishing a one-to-one relationship). Then he
would hand the tokens to the head of the tribe, who would then update the tokens
based on rules involving dowry payments, oral traditions, familial relations, shares
owned last year, etc. (Managing records with SQL.) The tokens were then stored in soft
clay bottles that were pinched shut to ensure that they were not tampered with once
accounts were settled. (Locking the database.)
Although the terms are used interchangeably in everyday discussions, information and
data are two distinct pieces of the database puzzle. A collection of raw data (facts) does
not help anyone to make a decision. Following the illustration of the primitive token
database, just by counting the tokens one could not arrive at any kind of relevant
information; the result would simply be 5 rams and 14 lambs. These facts have to be
raised to a higher level of abstraction in order to have any significant meaning.
For example, if it is known that the tokens belong to a tribe member called Nasim, then
combining the data (Nasim, 5 rams, 14 lambs) yields an immediately useful,
although low-level, piece of information: “Nasim owns 5 rams and 14 lambs.” Furthermore,
if Nasim collected his data for several years, then he could move up to a more conceptual
level and produce more information such as “In the second year of the drought, the
number of lambs born is less than in the following three years.” As more data
accumulates, the derived information becomes more qualitative as well as quantitative,
and can be used to understand the past and predict the future.
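The step from data to information can be sketched in a few lines of Python. This is an illustration only: the owner name and the counts come from the example above, while the record layout and the helper name describe_holdings are assumptions made for this sketch.

```python
# Raw data: isolated facts with no context (counts taken from the example above).
token_counts = {"ram": 5, "lamb": 14}


def describe_holdings(owner, counts):
    """Combine an owner's name with raw counts into a low-level statement.

    Hypothetical helper, named for this sketch only.
    """
    parts = [f"{number} {kind}s" for kind, number in counts.items()]
    return f"{owner} owns {' and '.join(parts)}."


# Adding the owner turns the bare counts into usable information.
statement = describe_holdings("Nasim", token_counts)
print(statement)
```

The counts alone say nothing about ownership; only when they are combined with the owner's name does the result answer a question someone might actually ask.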
To examine the file model, here is a simple example. Suppose that you want to set
up a database for the books in your home library. You moved out on your own
only a short while ago, so your library contains only 13 books. To begin with, you
don’t want to overcomplicate things, so for each book you decide to store information
on: ISBN, Author, Title, Publisher and Publisher Contact. After going
through all the books you display the results in a table called LIBRARY_FLAT.
Clearly, creating this table requires no special skills.
In fact, for such a simple database, Microsoft Word has enough power to
manage all the data. Furthermore, such a simple, flat database, consisting of a
single table, does not require much knowledge of database theory and can be
easily managed in the beginning. However, as you add more and more books to
the library, the table starts growing and soon many problems start to appear.
Redundancy
The first issue with large flat databases is the unnecessary repetition of data. Even
in the small LIBRARY_FLAT database you can see repeated information whenever
books share the same Publisher. The Publisher and Publisher Contact values
repeat several times for a few of the books, and you have only 13 books in the
library. How much repetition will there be when the number grows to thousands?
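To make the redundancy concrete, here is a minimal Python sketch that counts repeated Publisher and Publisher Contact values in a flat table. The four rows are invented placeholder data, not the contents of the lecture's LIBRARY_FLAT table, and the field names are assumptions chosen to mirror the columns listed earlier.

```python
from collections import Counter

# Hypothetical rows in the spirit of LIBRARY_FLAT; every value is an
# invented placeholder, not data from the lecture.
library_flat = [
    {"isbn": "isbn-1", "author": "Author A", "title": "Title 1",
     "publisher": "Publisher X", "publisher_contact": "contact-x"},
    {"isbn": "isbn-2", "author": "Author B", "title": "Title 2",
     "publisher": "Publisher X", "publisher_contact": "contact-x"},
    {"isbn": "isbn-3", "author": "Author C", "title": "Title 3",
     "publisher": "Publisher Y", "publisher_contact": "contact-y"},
    {"isbn": "isbn-4", "author": "Author D", "title": "Title 4",
     "publisher": "Publisher X", "publisher_contact": "contact-x"},
]

# Count how many rows carry each (publisher, contact) pair.
pair_counts = Counter(
    (row["publisher"], row["publisher_contact"]) for row in library_flat
)

# Every occurrence beyond the first is a redundant copy of the same fact.
redundant_copies = sum(count - 1 for count in pair_counts.values())
print(redundant_copies)
```

In this made-up sample, the contact for Publisher X is stored three times, so a change to that contact would have to be made in three places.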
Multiple value problems
What happens with the database when a book is written by multiple authors?
There are several possible solutions, each lacking in one or several ways.
1. Multiple authors can be accommodated with multiple rows, one row
for each author.
2. Multiple authors can be displayed with multiple columns in a single
row.
3. All the authors’ names can be included in one column of the table.
The problem with the multiple-row choice is that all of the data about a book must
be repeated as many times as there are authors of the book, an obvious case of
redundancy. The multiple-column approach presents the problem of guessing
how many Author columns you will ever need, and creates a lot of wasted space
(empty fields) for books with only one author. It also creates major programming
headaches. The third choice is to include all authors' names in one cell, which
can lead to trouble of its own. For example, it becomes more difficult to search
the database for a single author. Worse yet, how would you create an
alphabetical list of the authors in the table?
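The three options can be compared side by side in a short Python sketch. The book titles, author names and field names below are hypothetical placeholders, and the comma separator in the single-cell layout is an assumption made for this sketch.

```python
# Option 1: one row per author (the book data is repeated in every row).
multi_row = [
    {"title": "Title 1", "publisher": "Publisher X", "author": "Baker"},
    {"title": "Title 1", "publisher": "Publisher X", "author": "Adams"},
]

# Option 2: a fixed number of author columns (empty fields when a book
# has fewer authors than there are columns).
multi_column = [
    {"title": "Title 1", "author_1": "Baker", "author_2": "Adams"},
    {"title": "Title 2", "author_1": "Clark", "author_2": None},
]

# Option 3: all authors' names in a single cell.
single_cell = [
    {"title": "Title 1", "authors": "Baker, Adams"},
    {"title": "Title 2", "authors": "Clark"},
]

# Sorting the raw cells does not give an alphabetical list of authors,
# because "Adams" is hidden inside the first cell.
naive = sorted(row["authors"] for row in single_cell)

# Each cell must first be split apart, which only works if every cell
# uses the same separator consistently.
authors = sorted(
    name.strip()
    for row in single_cell
    for name in row["authors"].split(",")
)

print(naive)
print(authors)
```

Even in this tiny sample, the single-cell layout forces extra parsing before an ordinary sort or search can be trusted, while the other two layouts pay for their simplicity with repeated or empty fields.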