
CLASS 1: Database Modeling

Overview
In this lecture you will learn the basics of data (or database) model design. You
will also examine the evolution of database modeling, which led to today’s
relational database model. Finally, you will be introduced to applications,
which in turn affect the database modeling process, since different types of
applications use a database in different ways. Consequently, to figure
out an appropriate database design strategy, you must have a general idea of
the kind of applications your database will serve.

Database model or database?


To start things off, you need to understand the difference between a database
model and a database. A database is a collection of information. It
consists of a physical file you set up on a computer. A database model, on the
other hand, is a concept used to create the database. Using an architectural
analogy, you can view the database model as a blueprint for how data is stored,
which means that working on the model leaves you with a
“pretty picture” on a piece of paper. The database, then, is the
physical creation of the data model on a computer or, sticking with the analogy,
the building constructed from the blueprint.
There are numerous, more precise definitions of what a database or
database model is. A database can be defined as an organized, structured object
stored on a computer, consisting of data and metadata. Data is the actual
information stored in the database, such as customer names and addresses.
Metadata describes the structure of the data, such as field length or datatype. As
an example, let’s say that we need to add a Price field to the Products table of
an imaginary Store database. The price of the product stored in the field is
data. The description of the field, which for example limits the entry to
positive integers less than 1000, is metadata.
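The Price example can be sketched in a few lines of code. This is only an illustration: it assumes SQLite (through Python's standard sqlite3 module) as the database engine, which the lecture does not prescribe, and the Name column and the sample product are invented.

```python
import sqlite3

# In-memory database standing in for the imaginary Store database.
conn = sqlite3.connect(":memory:")

# Metadata: the definition of the field, including the rule from the text
# (positive integers less than 1000).
conn.execute(
    """
    CREATE TABLE Products (
        Name  TEXT NOT NULL,   -- illustrative column, not part of the lecture
        Price INTEGER NOT NULL
              CHECK (typeof(Price) = 'integer' AND Price > 0 AND Price < 1000)
    )
    """
)

# Data: the actual value stored in the field.
conn.execute("INSERT INTO Products (Name, Price) VALUES (?, ?)", ("Notebook", 12))


def try_insert(name, price):
    """Return True if the row is accepted, False if the metadata rule rejects it."""
    try:
        conn.execute("INSERT INTO Products (Name, Price) VALUES (?, ?)", (name, price))
        return True
    except sqlite3.IntegrityError:
        return False


# Reading the metadata back: table_info yields one row per column
# (cid, name, type, notnull, default value, primary-key flag).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Products)")]

# Reading the data back.
prices = [row[0] for row in conn.execute("SELECT Price FROM Products")]
```

Calling try_insert("Laptop", 1000) returns False here, because the value violates the rule stored in the metadata, while a price of 999 is accepted.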
The approach to the structuring and organization of data cannot be arbitrary,
since the final solution has to allow for an efficient and easy way to retrieve and
change the stored data. Therefore, the ordered set of data is created by using
database modeling.
Before going into more detail, it is useful to see how different
database modeling techniques have developed over the years to
improve efficiency, in terms of both data retrieval and data changes.
The Evolution of Database Modeling
The most widely used database model today is the relational model. The various
data models that came before (such as the hierarchical database model and the
network database model) were partial solutions to the never-ending problem of
how to store data efficiently. By examining the roots you can understand the
critical problems solved with the relational database model. Therefore, it is
essential that you know how the different data models evolved into the relational
database model as it is today.
The First Databases

Believe it or not, the first databases actually came before written language. They were
found among ancient tribes in the Middle East. In the local villages, shepherds
took care of community flocks belonging to their fellow tribesmen. To keep things
in order, the members of the tribe needed a way to maintain and manipulate the
ownership of animals. Rather than branding the individual animals (in effect, putting
nametags on them), they decided on a different scheme where each member of the tribe had a
set of baked clay tokens. Since they were not concerned about any animal in particular,
possession of one token represented ownership of any one animal.

The purpose of the tokens was not to measure and store economic value. Their
function was to determine ownership by establishing a record-keeping system. They would
“change hands” only after a trade occurred, unlike regular transactions where you “pay
before you receive”. Tokens were updated when a lamb grew to become a ram, deleted
when an animal was eaten or died, and inserted when new lambs
were born in spring. As you will soon see, these activities represent all the basic
operations you would expect in a database.
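The three token activities map directly onto insert, update and delete. A minimal sketch follows, assuming one owner whose tokens are kept as simple counts; the starting numbers are invented for illustration.

```python
from collections import Counter

# Each token kind maps to how many tokens of that kind one owner holds.
# The starting counts are made up for this sketch.
tokens = Counter({"lamb": 3, "ram": 1})


def insert(kind):
    """A new animal is born: add one token."""
    tokens[kind] += 1


def update(old_kind, new_kind):
    """An animal changes state (a lamb grows into a ram): swap one token."""
    if tokens[old_kind] == 0:
        raise ValueError(f"no {old_kind} token to update")
    tokens[old_kind] -= 1
    tokens[new_kind] += 1


def delete(kind):
    """An animal is eaten or dies: remove one token."""
    if tokens[kind] == 0:
        raise ValueError(f"no {kind} token to delete")
    tokens[kind] -= 1


insert("lamb")         # spring: a lamb is born
update("lamb", "ram")  # a lamb grows up
delete("ram")          # a ram is eaten
```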

Regulating the tokens involved an illiterate man passing the flock through a gate and
matching one token to one animal (establishing a one-to-one relationship). Then he
would hand the tokens to the head of the tribe, who would then update the tokens
based on rules involving dowry payments, oral traditions, familial relations, shares
owned last year, etc. (Managing records with SQL.) The tokens were then stored in soft
clay bottles that were pinched shut to ensure that they were not tampered with once
accounts were settled. (Locking the database.)

The evolution of database modeling occurred when each database model
improved upon the previous one. The initial solution was virtually no database
model at all: the file system.

The File System Model


Using a file system model means that no modeling techniques are applied. The
database is stored in flat files in a file system. The term “flat file” is a way of
describing a simple text file that has no structure; there are no commas and no
new lines, only a large string of characters. The data items are found based on
their position in the file. The data can be stored in a single file or spread across multiple files.
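Finding items purely by position can be sketched as follows. The field names, the field widths and the two sample book records are assumptions made for this illustration; the lecture only states that items are located by their position in the file.

```python
# Field widths are an assumption made for this sketch.
FIELDS = [("isbn", 13), ("author", 20), ("title", 20)]
RECORD_LENGTH = sum(width for _, width in FIELDS)


def pack(record):
    """Pad every field to its fixed width and join the fields with no separators."""
    return "".join(record[name].ljust(width)[:width] for name, width in FIELDS)


def unpack(flat, index):
    """Cut record number `index` out of the flat string by position alone."""
    start = index * RECORD_LENGTH
    chunk = flat[start:start + RECORD_LENGTH]
    record, offset = {}, 0
    for name, width in FIELDS:
        record[name] = chunk[offset:offset + width].rstrip()
        offset += width
    return record


books = [
    {"isbn": "1-1111-1111-1", "author": "Gladwell, Malcolm", "title": "Blink"},
    {"isbn": "0-99-999999-9", "author": "Leather, Stephen", "title": "Private Dancer"},
]

# One long string of characters: no commas and no new lines.
flat_file = "".join(pack(book) for book in books)
second = unpack(flat_file, 1)
```

Every program reading the file has to know the widths in FIELDS. If one of them changes, every position after it shifts, which is one reason this layout is fragile.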
Note: The easiest way to understand the file system is to look at the Windows operating
system. You can examine files in the file system by browsing them in Windows
Explorer, which is a good representation of a file system structure.
Any searching through flat files for data has to be explicitly programmed. Any
relationships between different flat files would also have to be programmed and
have limited capability. File systems store data, but that's about it. There is no
efficient access to data, the structure is limited to directories (folders in Windows)
and the files cannot be simultaneously accessed by more than one user.
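What "explicitly programmed" means in practice can be seen in a small sketch. For brevity it assumes one record per line with a '|' separator, which is already more structure than the bare flat file described above. The publisher codes are hypothetical, and the two "files" are kept in memory as lists of lines.

```python
# Two "files" kept in memory as lists of lines. The '|' separator and the
# publisher codes (PEN, PAN) are assumptions made for this sketch.
BOOKS_FILE = [
    "Blink|PEN",
    "Odinn's Child|PAN",
    "King's Man|PAN",
]
PUBLISHERS_FILE = [
    "PEN|Penguin Books",
    "PAN|Pan Books",
]


def find_publisher(code):
    """Linear scan: every lookup reads the publisher file from the top."""
    for line in PUBLISHERS_FILE:
        file_code, name = line.split("|")
        if file_code == code:
            return name
    return None  # nothing guarantees that the code exists in the other file


def books_with_publishers():
    """A hand-written 'relationship' between the two files."""
    result = []
    for line in BOOKS_FILE:
        title, code = line.split("|")
        result.append((title, find_publisher(code)))
    return result


pairs = books_with_publishers()
```

Both the search and the link between the files live in application code. The file system itself knows nothing about either.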
Data vs. Information

Although used interchangeably in everyday discussions, information and data are two
distinct pieces of the database puzzle. A collection of raw data (facts) does not help
anyone to make a decision. Following the illustration of the primitive token database,
just by counting the tokens one could not arrive at any kind of relevant information, the
result being merely 5 rams and 14 lambs. These facts have to be raised to a higher level of
abstraction in order to have any significant meaning.

For example, if it is known that the tokens belong to a tribe member called Nasim, then
combining the data (Nasim, 5 rams, 14 lambs) yields an immediately useful,
although low-level, piece of information: “Nasim owns 5 rams and 14 lambs.” Furthermore, if
Nasim collected his data for several years, he could move up to a more conceptual
level and produce more information, such as “In the second year of the drought,
fewer lambs were born than in the following three years.” As more data
accumulates, the derived information becomes more qualitative as well as quantitative,
with attempts to understand the past and predict the future.
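The step from data to information can be sketched in code. The token counts come from the example above; the yearly lamb figures are invented purely for illustration.

```python
from collections import Counter

# Raw data: one entry per token counted at the gate.
counted_tokens = ["ram"] * 5 + ["lamb"] * 14

# Still data: bare counts with no context.
counts = Counter(counted_tokens)


def describe(owner, counts):
    """Combine the counts with an owner to form a low-level piece of information."""
    return f"{owner} owns {counts['ram']} rams and {counts['lamb']} lambs."


statement = describe("Nasim", counts)

# Several years of data allow higher-level information, such as a comparison.
# These yearly figures are made up for this sketch.
lambs_born = {"drought year 2": 6, "year 3": 11, "year 4": 12, "year 5": 14}
drought = lambs_born["drought year 2"]
fewer_in_drought = all(
    drought < born for year, born in lambs_born.items() if year != "drought year 2"
)
```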

To examine the file model, here’s a simple example. Suppose that you want to set
up a database for the books in your home library. You moved out on your own
a while back, so your library contains only 13 books. To begin with, you don’t
want to overcomplicate things, so for each book you decide to store the following
information: ISBN, Author, Title, Publisher, Publisher Contact and Retail Price. After going
through all the books you display the results in a table called LIBRARY_HOME.

ISBN           Author               Title               Publisher        Publisher Contact              Retail Price
-------------  -------------------  ------------------  ---------------  -----------------------------  ------------
1-1111-1111-1  Gladwell, Malcolm    Blink               Penguin Books    375 Hudson Street, New York    $7.50
0-99-999999-9  Leather, Stephen     Private Dancer      Three Elephants  Soi Rambutri 7, Bangkok        $8.00
0-91-335678-7  Hosseini, Khaled     The Kite Runner     Bloomsbury       36 Soho Square, London         $6.50
0-91-045678-5  Severin, Tim         Odinn’s Child       Pan Books        20 New Wharf Road, London      $8.00
0-103-45678-9  Severin, Tim         King’s Man          Pan Books        20 New Wharf Road, London      $9.00
0-12-345678-9  Gallo, Max           Napoleon            Pan Books        20 New Wharf Road, London      $9.00
0-99-777777-7  Murakami, Haruki     Kafka on the Shore  Vintage          18 Poland Road, Auckland       $7.50
0-555-55555-9  Jin, Ha              The Crazed          Vintage          18 Poland Road, Auckland       $7.50
0-11-345678-9  Jin, Ha              Waiting             Vintage          18 Poland Road, Auckland       $6.00
0-12-333433-3  McCullough, Colleen  Caesar              Avon Books       10 East 53rd Street, New York  $6.00
0-321-32132-1  McCullough, Colleen  October Horse       Pocket Books     5 Broadway Avenue, New York    $6.50
0-55-123456-9  Vaite, Celestine     Frangipani          Arrow Books      14 Milsons Point, Sydney       $9.00
0-123-45678-0  Mehran, Marsha       Pomegranate Soup    Arrow Books      14 Milsons Point, Sydney       $7.50

Note: In reality, displaying a flat-file database as a table would require some
programming. Nevertheless, you can use Microsoft Word or Microsoft Excel when creating
this type of database, if only to resolve the display issues.

It is more than obvious that the creation of this table requires no special skills.
In fact, for such a simple database, Microsoft Word has enough power to
manage all the data. Furthermore, such a simple, flat database consisting of a
single table does not require much knowledge of database theory and can be
easily managed in the beginning. However, as you add more and more books to
the library, the table grows and problems soon start to appear.

Redundancy
The first issue with large flat databases is the unnecessary repetition of data. Even
in the small LIBRARY_HOME database you can see repeated information wherever
books share the same Publisher. The Publisher and Publisher Contact values
repeat for several of the books, and you only have 13 books in the
library. How much repetition will there be when the number increases to thousands?
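The amount of repetition in the 13-row table can be counted with a short sketch. The (Publisher, Publisher Contact) pairs below are copied from the table, one per book; the variable names are illustrative.

```python
from collections import Counter

# (publisher, publisher contact) for each of the 13 rows of the library table.
PAN = ("Pan Books", "20 New Wharf Road, London")
VINTAGE = ("Vintage", "18 Poland Road, Auckland")
ARROW = ("Arrow Books", "14 Milsons Point, Sydney")

rows = [
    ("Penguin Books", "375 Hudson Street, New York"),
    ("Three Elephants", "Soi Rambutri 7, Bangkok"),
    ("Bloomsbury", "36 Soho Square, London"),
    PAN, PAN, PAN,
    VINTAGE, VINTAGE, VINTAGE,
    ("Avon Books", "10 East 53rd Street, New York"),
    ("Pocket Books", "5 Broadway Avenue, New York"),
    ARROW, ARROW,
]

# How often each publisher/contact pair is stored.
occurrences = Counter(rows)

# Publishers whose details are stored more than once.
repeated = {publisher: n for (publisher, _), n in occurrences.items() if n > 1}

# Rows that repeat details already stored in an earlier row.
redundant_copies = len(rows) - len(occurrences)
```

In this table, 5 of the 13 rows repeat publisher details that are already stored elsewhere. If a publisher moves, every one of its rows has to be changed.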
Multiple value problems
What happens with the database when a book is written by multiple authors?
There are several possible solutions, each lacking in one or several ways.
1. Multiple authors can be accommodated with multiple rows, one row
for each author.
2. Multiple authors can be displayed with multiple columns in a single
row.
3. All the authors’ names can be included in one column of the table.
The problem with the multiple-row choice is that all of the data about a book must
be repeated as many times as there are authors of the book—an obvious case of
redundancy. The multiple-column approach presents the problem of guessing
how many Author columns you will ever need, and creates a lot of wasted space
(empty fields) for books with only one author. It also creates major programming
headaches. The third choice is to include all authors' names in one cell, which
can lead to trouble of its own. For example, it becomes more difficult to search
the database for a single author. Worse yet, how would you create an
alphabetical list of the authors in the table?
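The three choices can be compared side by side in a short sketch. The two-author book below is entirely made up, and the "; " separator and the number of author columns are assumptions chosen for illustration.

```python
# A made-up two-author book; none of these values come from the library table.
TITLE = "Example Title"
AUTHORS = ["Zed, Anna", "Abel, Brian"]

# Choice 1: one row per author. The title is repeated on every row.
multi_row = [{"title": TITLE, "author": author} for author in AUTHORS]

# Choice 2: one column per author. The column count (3 here) is a guess,
# and the unused column stays empty.
multi_col = {"title": TITLE, "author1": AUTHORS[0], "author2": AUTHORS[1], "author3": None}

# Choice 3: all authors in one cell. The "; " separator is an assumption.
one_cell = {"title": TITLE, "authors": "; ".join(AUTHORS)}


def has_author(row, name):
    """Searching the one-cell layout for a single author needs string splitting."""
    return name in row["authors"].split("; ")


def sorted_authors(rows):
    """Alphabetical author list: split every cell first, then sort."""
    names = set()
    for row in rows:
        names.update(row["authors"].split("; "))
    return sorted(names)


# Sorting the raw cells files this book under "Z" only; "Abel, Brian" is hidden.
naive = sorted(row["authors"] for row in [one_cell])
proper = sorted_authors([one_cell])
```

Each layout works for this one book, but each pushes extra work into the program: repeated titles in the first, empty columns in the second, and parsing in the third.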
