
UNIT 2: DATA STORAGE APPROACHES

Contents
2.0 Introduction
2.1 Objectives
2.2 Fundamental Data Storage Concepts and Definitions
2.3 Logical, Physical, Conceptual Database Models
2.4 Data Modeling For Relational Databases
2.5 Types of Data Models
2.6 Summary
2.7 Answers to Check Your Progress
2.8 Glossary

2.0 INTRODUCTION

Accountants play a major role in data processing and storage. They must interact with systems
analysts to help answer questions such as the following: What data should be entered and
stored by the organization? Who should have access to the data? Which data storage approach
should be used: manual, file-based, or database? How should the data be organized, updated,
stored, accessed, and retrieved? To answer these and related questions, accountants must
understand data processing and storage concepts.

A company's data is one of its most important resources. However, the mere existence of
relevant data does not guarantee its usefulness. An organization must have ready and easy
access to its data in order to function properly. Therefore, accountants need to understand
how data is organized and stored in an accounting information system (AIS) and how the data
can be accessed. In essence, accountants need to know how to manage data for maximum
corporate use.

2.1 OBJECTIVES

After completing this unit the reader should be able to


• explain how data is stored in files
• explain how data is stored in a database
• describe the main characteristics of a database
• describe types of data models

2.2 FUNDAMENTAL DATA STORAGE CONCEPTS AND DEFINITIONS

2.2.0 Overview
A database management system is a set of integrated programs designed to simplify the tasks
of creating and managing data. A database management system integrates a collection of files
that are independent of application programs and are available to satisfy a number of different
processing needs. A database management system is really the means by which an
organization coordinates the disparate activities of its many functional areas. The database
management system, containing data related to all of an organization's applications, supports
normal data processing needs and enhances the organization's management activities by
providing data useful to managers. While in its strictest sense a database is a collection of
files, we will use the term database synonymously with database management system, since
this is the meaning intended by the vast majority of computer users and developers when
they use the term.

2.2.1 Objectives
After reading this section, the reader must be able to:
• explain the different types of data storage approaches

2.2.2 What are flat file and relational databases?


A database is a collection of data, which is organized into files called tables. These tables
provide a systematic way of accessing, managing, and updating data. A relational database is
one that contains multiple tables of data that relate to each other through special key fields.
Relational databases are far more flexible (though harder to design and maintain) than what
are known as flat file databases, which contain a single table of data.

To understand the advantages of a relational database, imagine the needs of two small
companies that take customer orders for their products. Company A uses a flat file database
with a single table named orders to record orders they receive, while Company B uses a
relational database with two tables: orders and customers.

When a customer places an order with Company A, a new record (or row) in the table orders
is created. Because Company A has only one table of data, all the information pertaining to
that order must be put into a single record. This means that the customer's general
information, such as name and address, is stored in the same record as the order information,
such as product description, quantity, and price. If customers place more than one order, their
general information will need to be re-entered and thus duplicated for each order they place.

Whenever there is duplicate data, as in the case above, many inconsistencies may arise when
users try to query the database. Additionally, a customer's change of address would require
the database manager to find all records in orders that the customer placed, and change the
address data for each one.

Company B is much better off with its relational database. Each of its customers has one and
only one record of general information stored in the table customers. Each customer's record
is identified by a unique customer code which will serve as the relational key. When a
customer orders from Company B, the record in orders need contain only a reference to the
customer's code, because all of the customer's general information is already stored in
customers.

This approach to entering data solves the problems of duplicate data and making changes to
customer information. The database manager need change only one record in customers if
someone changes addresses.
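
The contrast between Company A and Company B can be sketched with Python's built-in sqlite3
module. This is only an illustration under stated assumptions: the table and column names
(flat_orders, customer_code, and so on) and the sample customer "Acme" are hypothetical and
are not taken from either company's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Company A (flat file): customer details are repeated in every order row.
cur.execute("""CREATE TABLE flat_orders (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT, customer_address TEXT,
    product TEXT, quantity INTEGER, price REAL)""")
cur.executemany(
    "INSERT INTO flat_orders VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Acme", "12 Old Road", "Pens", 10, 2.5),
     (2, "Acme", "12 Old Road", "Paper", 5, 4.0)])

# Company B (relational): customer details are stored once; orders hold only the key.
cur.execute("""CREATE TABLE customers (
    customer_code INTEGER PRIMARY KEY, name TEXT, address TEXT)""")
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_code INTEGER REFERENCES customers(customer_code),
    product TEXT, quantity INTEGER, price REAL)""")
cur.execute("INSERT INTO customers VALUES (100, 'Acme', '12 Old Road')")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(1, 100, "Pens", 10, 2.5), (2, 100, "Paper", 5, 4.0)])

# A change of address touches every order row in the flat table...
flat_rows_changed = cur.execute(
    "UPDATE flat_orders SET customer_address = '7 New Street' "
    "WHERE customer_name = 'Acme'").rowcount

# ...but only one row in the relational design.
relational_rows_changed = cur.execute(
    "UPDATE customers SET address = '7 New Street' "
    "WHERE customer_code = 100").rowcount

# The key field lets the two tables be recombined whenever the full picture is needed.
joined = cur.execute("""
    SELECT o.order_id, c.name, c.address, o.product
    FROM orders AS o JOIN customers AS c ON c.customer_code = o.customer_code
    ORDER BY o.order_id""").fetchall()

print(flat_rows_changed, relational_rows_changed)
print(joined)
```

With only two orders the difference is small, but the number of rows to correct in the flat
table grows with every order the customer places, while the relational design always changes
exactly one row.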

The database management system (DBMS) performs several functions, such as:
• defining the data
• defining the relations among the data (whether the data structure is relational,
  object-oriented, hierarchical, or network)
• interfacing with the operating system for storage of the data on the physical media
• mapping each user's view of the data (through subschemas and the schema)

In the language of database management systems (DBMS), a SCHEMA is a complete
description of the configuration of record types and data items and the relationships among
them. The schema defines the logical structure of the database. The schema, therefore, defines
the organizational view of the data.
A subschema is a description of a portion of a schema. The DBMS maps each user's view of
the data from subschemas to the schema. A chief advantage of the DBMS is that it contains a
QUERY LANGUAGE, which is a language much like ordinary language. A query language is
used to access a database and to produce inquiry reports.
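
As a rough illustration of these terms, the sketch below uses Python's sqlite3 module and SQL
as the query language. It treats a SQL view as a stand-in for a subschema, which is an analogy
rather than an exact equivalent. The table, column, and view names (customers, credit_limit,
shipping_view) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Schema: the complete logical description of record types, data items,
-- and the relationship between them.
CREATE TABLE customers (
    customer_code INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT,
    credit_limit REAL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_code INTEGER NOT NULL REFERENCES customers(customer_code),
    product TEXT,
    quantity INTEGER
);
-- Subschema (approximated by a view): one user's portion of the schema.
-- A shipping clerk sees names and addresses but not credit limits.
CREATE VIEW shipping_view AS
    SELECT o.order_id, c.name, c.address, o.product, o.quantity
    FROM orders AS o JOIN customers AS c USING (customer_code);
""")
conn.execute("INSERT INTO customers VALUES (100, 'Acme', '12 Old Road', 5000.0)")
conn.execute("INSERT INTO orders VALUES (1, 100, 'Pens', 10)")

# Query language: an inquiry phrased in near-ordinary language.
report = conn.execute(
    "SELECT name, product, quantity FROM shipping_view WHERE quantity > 5"
).fetchall()

# The columns visible through the view are a subset of the full schema.
view_columns = [d[0] for d in conn.execute("SELECT * FROM shipping_view").description]

print(report)
print(view_columns)
```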

Learning Activity 1
1. What is a flat file?
________________________________________________________________________
________________________________________________________________________
2. What is a database management system?
________________________________________________________________________
________________________________________________________________________

2.3 LOGICAL, PHYSICAL, CONCEPTUAL DATABASE MODELS

2.3.0 Overview
Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts,
data models can be used for a variety of purposes, from high-level conceptual models to
physical data models. This section discusses logical, physical, and conceptual data models.

2.3.1 Objectives
After reading this section, the reader must be able to:
• explain logical data modeling
• describe physical data modeling
• explain conceptual data modeling

2.3.2 What is Data Modeling?
Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts,
data models can be used for a variety of purposes, from high-level conceptual models to
physical data models. From the point of view of an object-oriented developer, data modeling
is conceptually similar to class modeling. With data modeling you identify entity types,
whereas with class modeling you identify classes. Data attributes are assigned to entity types
just as you would assign attributes and operations to classes. There are associations between
entities, similar to the associations between classes – relationships, inheritance, composition,
and aggregation are all applicable concepts in data modeling.

Data modeling is different from class modeling because it focuses solely on data – class
models allow you to explore both the behavior and data aspects of your domain, whereas with
a data model you can only explore data issues. Because of this focus, data modelers have a
tendency to be much better at getting the data "right" than object modelers. The following are
the three types of data models:

• Conceptual data models. These models, sometimes called domain models, are
  typically used to explore domain concepts with project stakeholders. Conceptual data
  models are often created as the precursor to LDMs or as alternatives to LDMs.
• Logical data models (LDMs). LDMs are used to explore the domain concepts, and
  their relationships, of your problem domain. This could be done for the scope of a
  single project or for your entire enterprise. LDMs depict the logical entity types,
  typically referred to simply as entity types, the data attributes describing those entities,
  and the relationships between the entities.
• Physical data models (PDMs). PDMs are used to design the internal schema of a
  database, depicting the data tables, the data columns of those tables, and the
  relationships between the tables.

2.3.2.1 Logical And Physical Database Modeling

Although LDMs and PDMs sound very similar, and in fact they are, the level of detail that
they model can be significantly different. This is because the goal for each diagram is
different – you can use an LDM to explore domain concepts with your stakeholders and the
PDM to define your database design. Figure 1 presents a simple LDM and Figure 2 a simple
PDM, both modeling the concept of customers and addresses as well as the relationship
between them. Both diagrams apply the Barker (1990) notation. Notice how the PDM shows
greater detail, including an associative table required to implement the association as well as
the keys needed to maintain the relationships. PDMs should also reflect your organization's
database naming standards; in this case an abbreviation of the entity name is appended to each
column name and an abbreviation for "Number" is used consistently. A PDM should
also indicate the data types for the columns, such as integer and char(5). Although Figure 2
does not show them, lookup tables for how the address is used as well as for states and
countries are implied by the attributes ADDR_USAGE_CODE, STATE_CODE, and
COUNTRY_CODE.
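
To make the idea of a physical data model concrete, the sketch below writes one possible
customer and address PDM as table definitions, using Python's sqlite3 module. It is not a
transcription of Figure 2: apart from ADDR_USAGE_CODE, STATE_CODE, and COUNTRY_CODE, which
the text mentions, every table name, column name, and data type here is an assumption chosen
to follow the naming standard described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the declared keys

conn.executescript("""
CREATE TABLE CUSTOMER (
    CUST_NO   INTEGER PRIMARY KEY,      -- "NO" abbreviates "Number"
    CUST_NAME VARCHAR(40) NOT NULL
);
CREATE TABLE ADDRESS (
    ADDR_NO      INTEGER PRIMARY KEY,
    ADDR_STREET  VARCHAR(40),
    ADDR_CITY    VARCHAR(20),
    STATE_CODE   CHAR(2),               -- implies a lookup table of states
    COUNTRY_CODE CHAR(3),               -- implies a lookup table of countries
    ADDR_ZIP     CHAR(5)
);
-- Associative table: implements the many-to-many association between
-- customers and addresses, and records how each address is used.
CREATE TABLE CUSTOMER_ADDRESS (
    CUST_NO         INTEGER NOT NULL REFERENCES CUSTOMER(CUST_NO),
    ADDR_NO         INTEGER NOT NULL REFERENCES ADDRESS(ADDR_NO),
    ADDR_USAGE_CODE CHAR(1) NOT NULL,
    PRIMARY KEY (CUST_NO, ADDR_NO, ADDR_USAGE_CODE)
);
""")

conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", [(1, "Acme"), (2, "Zenith")])
conn.executemany(
    "INSERT INTO ADDRESS VALUES (?, ?, ?, ?, ?, ?)",
    [(10, "12 Old Road", "Springfield", "IL", "USA", "62701"),
     (11, "7 New Street", "Springfield", "IL", "USA", "62702")])
conn.executemany(
    "INSERT INTO CUSTOMER_ADDRESS VALUES (?, ?, ?)",
    [(1, 10, "B"), (1, 11, "S"), (2, 10, "B")])

# One customer may have several addresses...
addresses_of_customer_1 = [row[0] for row in conn.execute(
    "SELECT ADDR_NO FROM CUSTOMER_ADDRESS WHERE CUST_NO = 1 ORDER BY ADDR_NO")]
# ...and one address may be the site of several customers.
customers_at_address_10 = [row[0] for row in conn.execute(
    "SELECT CUST_NO FROM CUSTOMER_ADDRESS WHERE ADDR_NO = 10 ORDER BY CUST_NO")]

print(addresses_of_customer_1, customers_at_address_10)
```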

Figure 1. A simple logical data model.

Figure 2. A simple physical data model.

An important observation about Figure 1 and Figure 2 is that they do not slavishly follow
Barker's approach to naming relationships. For example, between Customer and Address
there really should be two names: "Each CUSTOMER may be located in one or more
ADDRESSES" and "Each ADDRESS may be the site of one or more CUSTOMERS".
Although these names explicitly define the relationship, they can become visual noise that
clutters the diagram. Simple names such as "has" are therefore preferred, trusting readers to
interpret the name in each direction. Only add more information where it is needed; in this
case it isn't. However, a significant advantage of describing the names the way that Barker
suggests is that it is a good test of whether you actually understand the relationship – if you
can't name it then you likely don't understand it.

Data models can be used effectively at both the enterprise level and on projects. Enterprise
architects will often create one or more high-level LDMs that depict the data structures that
support your enterprise, models typically referred to as enterprise data models or enterprise
information models. An enterprise data model is one of several critical views that your
organization’s enterprise architects will maintain and support – other views may explore your
network/hardware infrastructure, your organization structure, your software infrastructure,
and your business processes (to name a few). Enterprise data models provide information that
a project team can use both as a set of constraints as well as important insights into the
structure of their system.

Project teams will typically create LDMs as a primary analysis artifact when their
implementation environment is predominantly procedural in nature, for example when they are
using structured COBOL as an implementation language. LDMs are also a good choice when
a project is data-oriented in nature, for example when a data warehouse or reporting system is
being developed. However, LDMs are often a poor choice when the project is not data-oriented
in nature, or when a project team is using object-oriented or component-based technologies,
because the developers would rather work with UML diagrams. When a relational database
is used for data storage, project teams are best advised to create PDMs to model its internal
schema. Experience suggests that a PDM is often one of the critical design artifacts for
business application development projects.

2.3.2.2 Conceptual Models

Halpin (2001) points out that many data professionals prefer to create an Object-Role Model
(ORM), an example of which is depicted in Figure 3, instead of an LDM for a conceptual
model. The advantage is that the notation is very simple, something your project stakeholders
can quickly grasp, although the disadvantage is that the models become large very quickly.
ORMs enable you to first explore actual data examples instead of simply jumping to a
potentially incorrect abstraction – for example, Figure 3 examines the relationship between
customers and addresses in detail. For more information about ORM, visit http://www.orm.net/.

Figure 3. A simple Object-Role Model.

Experience suggests that people capture information in the best place that they know. As a
result, modelers who work as "generalizing specialists" (Ambler 2003b), people with one or
more specialties who also strive to gain general skills and knowledge, typically discard ORMs
after they have finished with them. They sometimes use ORMs to explore the domain with
project stakeholders but later replace them with a more traditional artifact such as an LDM, a
class diagram, or even a PDM. This is an easy decision for them to make, because they know
that the information they have just "discarded" will be captured in another artifact that they
understand – a model, the tests, or even the code. A specialist who only understands a limited
number of artifacts, and who therefore hands off their work to other specialists, doesn't have
this option. Such specialists are tempted not only to keep the artifacts that they create but also
to invest even more time in enhancing them. Generalizing specialists are therefore more likely
than specialists to travel light.

2.4 DATA MODELING FOR RELATIONAL DATABASES

2.4.0 Overview

Data modeling must be preceded by planning and analysis. Planning defines the goals of the
database, explains why the goals are important, and sets out the path by which the goals will
be reached. Analysis involves determining the requirements of the database. This is typically
done by examining existing documentation and interviewing users.

An effective data model completely and accurately represents the data requirements of the end
users. It is simple enough to be understood by the end user yet detailed enough to be used by a
database designer to build the database. The model eliminates redundant data, is
independent of any hardware and software constraints, and can be adapted to changing
requirements with a minimum of effort.

Data modeling is a bottom-up process. A basic model, representing entities and relationships,
is developed first. Then detail is added to the model by including information about attributes
and business rules.

The data model is one part of the conceptual design process. The other is the function model.
The data model focuses on what data should be stored in the database, while the function
model deals with how the data is processed. To put this in the context of the relational
database, the data model is used to design the relational tables. The function model is used
to design the queries that will access and perform operations on those tables.

Data modeling is preceded by planning and analysis. The effort devoted to this stage is
proportional to the scope of the database. The planning and analysis of a database intended to
serve the needs of an enterprise will require more effort than one intended to serve a small
workgroup.

The information needed to build a data model is gathered during the requirements analysis.
Although not formally considered part of the data modeling stage by some methodologies, in
reality the requirements analysis and the ER diagramming part of the data model are done at
the same time.

2.4.1 Objectives

After reading this section the reader must be able to:

• describe how a relational data model can be designed

2.4.2 Requirements Analysis

The goals of the requirements analysis are:

• to determine the data requirements of the database in terms of primitive objects
• to classify and describe the information about these objects
• to identify and classify the relationships among the objects
• to determine the types of transactions that will be executed on the database and the
  interactions between the data and the transactions
• to identify rules governing the integrity of the data

The modeler, or modelers, works with the end users of an organization to determine the data
requirements of the database. Information needed for the requirements analysis can be
gathered in several ways:

• Review of existing documents - such documents include existing forms and reports,
  written guidelines, job descriptions, personal narratives, and memoranda. Paper
  documentation is a good way to become familiar with the organization or activity you
  need to model.
• Interviews with end users - these can be a combination of individual or group
  meetings. Try to keep group sessions to under five or six people. If possible, try to
  have everyone with the same function in one meeting. Use a blackboard, flip charts, or
  overhead transparencies to record information gathered from the interviews.
• Review of existing automated systems - if the organization already has an automated
  system, review the system design specifications and documentation.

The requirements analysis is usually done at the same time as the data modeling. As
information is collected, data objects are identified and classified as entities, attributes, or
relationships; assigned names; and defined using terms familiar to the end users. The objects
are then modeled and analyzed using an ER diagram. The diagram can be reviewed by the
modeler and the end users to determine its completeness and accuracy. If the model is not
correct, it is modified, which sometimes requires additional information to be collected. The
review and edit cycle continues until the model is certified as correct.
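
The review and edit cycle can be pictured with a small Python sketch. It records data objects
as entities, attributes, and relationships and lists the gaps a modeler might take back to the
end users. The class names, the review function, and the two checks it performs are all
hypothetical. A real review depends on the judgment of the modeler and the users, not on a
mechanical test.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    definition: str                      # written in terms familiar to end users
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    from_entity: str
    to_entity: str


def review(entities, relationships):
    """Return the open problems found in one pass of the review cycle."""
    problems = []
    known = {e.name for e in entities}
    for e in entities:
        if not e.attributes:
            problems.append(f"entity {e.name} has no attributes yet")
    for r in relationships:
        for end in (r.from_entity, r.to_entity):
            if end not in known:
                problems.append(f"relationship {r.name} refers to undefined entity {end}")
    return problems


# First pass: built from an initial round of interviews.
entities = [
    Entity("Customer", "someone who buys from us", ["name", "address"]),
    Entity("Order", "a request for goods"),
]
relationships = [
    Relationship("places", "Customer", "Order"),
    Relationship("contains", "Order", "Product"),
]
first_pass = review(entities, relationships)

# Second pass: more information is collected and the model is modified.
entities[1].attributes += ["order date", "quantity"]
entities.append(Entity("Product", "an item we sell", ["description", "price"]))
second_pass = review(entities, relationships)

print(first_pass)
print(second_pass)
```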

Three points to keep in mind during the requirements analysis are:

1. Talk to the end users about their data in "real-world" terms. Users do not think in
terms of entities, attributes, and relationships but about the actual people, things, and
activities they deal with daily.
2. Take the time to learn the basics about the organization and its activities that you want
to model. Having an understanding about the processes will make it easier to build the
model.
3. End-users typically think about and view data in different ways according to their
function within an organization. Therefore, it is important to interview the largest
number of people that time permits.

2.5 TYPES OF DATA MODELS

2.5.0 Overview

This section discusses the following topics with respect to database models:
o Flat Files
o Hierarchical Database Model
o Network Database Model
o ISAM
o Relational Database Model
o Object-Oriented Database Model

2.5.1 Objectives

After reading this section, the reader must be able to:

o describe Flat Files
o describe the Hierarchical Database Model
o describe the Network Database Model
o describe the Relational Database Model
o describe the Object-Oriented Database Model

2.5.2 Flat File Database

In its simplest form, a flat-file database is nothing more than a single, large table (e.g., a
spreadsheet). A flat file contains only one record structure; there are no links between separate
records. Access to data is done in a sequential manner; random access is not supported.
Access times are slow because the entire file must be scanned to locate the desired data.
Access times can be improved if the data is sorted, but this introduces the potential for error
(e.g., one or more records may be misfiled). The problems with a flat-file database include 1)
data redundancy; 2) data maintenance; and 3) data integrity.

EXAMPLE. An 'orders' file might require fields for the order number, date, customer name,
customer address, quantity, part description, price, etc. In this example, each record must
repeat the name and address of the customer (data redundancy). If the customer's address
changed, it would have to be changed in multiple locations (data maintenance). If the customer
name were spelled differently in different orders (e.g., Acme, Acme Inc, Acme Co.) then the
data would be inconsistent (data integrity). (See the Normalization Exercise for a practical
example of these deficiencies.) Flat-file data structures are only viable for small data
processing requirements.
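
The three problems can be seen in a few lines of Python. The sketch below treats a small
comma-separated text as the 'orders' flat file. The field names and sample rows are made up
for illustration, reusing the Acme spellings from the example above.

```python
import csv
import io

# A hypothetical 'orders' flat file: one record structure, one row per order.
FLAT_FILE = """order_no,date,customer_name,customer_address,quantity,part,price
1001,2024-01-05,Acme,12 Old Road,10,Pens,2.50
1002,2024-01-09,Acme Inc,12 Old Road,5,Paper,4.00
1003,2024-02-01,Acme Co.,12 Old Road,2,Ink,9.00
"""


def find_order(order_no):
    """Sequential access: read records from the top until one matches."""
    scanned = 0
    for record in csv.DictReader(io.StringIO(FLAT_FILE)):
        scanned += 1
        if record["order_no"] == order_no:
            return record, scanned
    return None, scanned


# Locating the last order means scanning every record before it.
record, scanned = find_order("1003")

records = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# Data redundancy and maintenance: the address is stored once per order,
# so a change of address must be applied to every copy.
address_copies = sum(1 for r in records if r["customer_address"] == "12 Old Road")

# Data integrity: nothing stops the same customer being spelled three ways.
name_spellings = {r["customer_name"] for r in records}

print(record["part"], scanned, address_copies, sorted(name_spellings))
```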

2.5.3 Hierarchical Database Model

The hierarchical and network database models preceded the relational model; today very few
commercial databases use either of these models. A hierarchical database is a series of
flat files linked in structured 'tree' relationships. IBM's IMS (Information Management System)
database, often referred to by the name of its proprietary language, DL/I (Data Language I), is
the only viable commercial hierarchical database still in use today, primarily on older
mainframe computers. The concept for the hierarchical database model was originally based
on a bill of materials (BOM). Data is represented as a series of parent/child relationships. This
concept is fundamentally different from that used in the relational model, where data resides in
a collection of tables that have no hierarchy and are physically independent of each other.

In the hierarchical model, a database 'record' is a tree that consists of one or more groupings
of fields called 'segments'. Segments make up the individual 'nodes' of the tree (e.g., the
'customers' record may consist of 'customer' and 'order' segments). The model requires that
each child segment can be linked to only one parent and a child can only be reached through
its parent. The requirement for a one-to-many relationship between parent and child can result
in redundant data (e.g., 'orders' might be a child of 'customers' as well as a child of 'parts'). To
get around the data redundancy problem, data is stored in one place and referenced by links or
physical pointers in other places (e.g., the 'customers' record contains actual data in the
'orders' segment while the 'parts' record contains a pointer to the 'orders' data in 'customers').

EXAMPLE: To create a sales report, you have to access 'customers' to get to 'orders'. This
is fundamentally different from the way a relational database operates; in a relational
database there is no hierarchy among tables and any table can be accessed directly or
potentially linked with any other table; THERE ARE NO HARD-CODED, PREDEFINED
PATHS AMONG THE DATA. (Note that while a primary key-foreign key combination in a
relational database represents a logical relationship among data, it does not necessarily limit
the possible physical access paths through the data.) In the hierarchical model, the link
established by the pointers is permanent and cannot be modified; in other words, the links are
hard-coded into the data structure. The hard-coding makes the hierarchical model very
inflexible; a design originally optimized to work with the data in one way may be totally
inefficient in working with the data in other ways. In addition, the physical links make it very
difficult to expand or modify the database; changes typically require substantial rewriting
efforts. (See the following diagram.)

Figure: Hierarchical database model (Salesperson and Territory nodes).
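
The rule that a child can only be reached through its parent can be sketched in Python as a
tree of segments. This is a toy model, not IMS or DL/I: the Segment class, its fields, and the
sample customers and orders are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One node of a hierarchical record; each child has exactly one parent."""
    kind: str
    key: str
    children: list["Segment"] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


# Build the tree: database -> customer -> order.
root = Segment("database", "sales")
acme = root.add(Segment("customer", "Acme"))
acme.add(Segment("order", "1001"))
acme.add(Segment("order", "1002"))
zenith = root.add(Segment("customer", "Zenith"))
zenith.add(Segment("order", "1003"))


def sales_report(db):
    """Follow the only path the structure offers: go through each customer
    in order to reach that customer's orders."""
    report = []
    for customer in db.children:
        for order in customer.children:
            report.append((customer.key, order.key))
    return report


def find_order(db, order_key):
    """There is no direct entry point for orders; even a lookup by order
    number has to descend from the root through the customers."""
    for customer in db.children:
        for order in customer.children:
            if order.key == order_key:
                return [db.key, customer.key, order.key]
    return None


report = sales_report(root)
path = find_order(root, "1003")
print(report)
print(path)
```

Answering a question the tree was not designed for, such as listing all orders for a given
part, would need either a second tree with duplicated or pointed-to order data, or a full
traversal like the one in find_order.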

2.5.4 Network Database Model

The network database model expands on the hierarchical model by providing multiple paths
among segments (i.e., more than one parent/child relationship). The network model was
standardized as the CODASYL DBTG (Conference on Data System Languages, Data Base
Task Group) model. Although supporting multiple paths in the data structure eliminates some
of the inflexibility of the hierarchical model, the network model is not very practical. The
network model only supports simple network relationships that are implemented as 'chains'
connecting individual records. With no restrictions on the number of relations, the database
design can become overwhelmingly complex. The relational model provides the same
flexibility offered by the network model but is much easier to work with. The network model
is for all practical purposes obsolete. (See the following diagram.)

Figure: Network model (customers A, B, and C linked to part numbers 1234, 2345, 3456,
and 4567).
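
The extra freedom of the network model, where a record may have more than one parent, can be
sketched in Python with explicit links between customer and part records. The customer and
part labels come from the diagram, but which customer is linked to which part is an assumption
made for this example. Real CODASYL systems use owner and member sets, which this sketch does
not reproduce.

```python
from collections import defaultdict

# Each pair is one link ("chain") between a customer record and a part record.
# The specific pairings are hypothetical.
links = [
    ("A", "1234"), ("A", "2345"),
    ("B", "2345"), ("B", "3456"),
    ("C", "3456"), ("C", "4567"),
]

parts_of = defaultdict(list)       # path from a customer down to its parts
customers_of = defaultdict(list)   # path from a part back to its customers

for customer, part in links:
    parts_of[customer].append(part)
    customers_of[part].append(customer)

# Unlike the hierarchical model, part 2345 has two parents, and the data can
# be entered from either side.
parents_of_2345 = customers_of["2345"]
parts_of_c = parts_of["C"]

# Every added relationship is another set of links to maintain, which is how
# designs become complex as the number of relations grows.
link_count = len(links)

print(parents_of_2345, parts_of_c, link_count)
```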

2.5.5 Relational Database Model

The theory behind the relational database model is discussed in the section entitled
'Characteristics of a Relational Database'; this section focuses on the distinctions between the
relational model and other database models. In a relational database, the logical design is
independent of the physical design. Queries against a relational database management
system (RDBMS) are based on logical relationships, and processing those queries does not
require pre-defined access paths among the data (i.e., pointers). The relational database
provides flexibility that allows changes to the database structure to be easily accommodated.
Because the data reside in tables, the structure of the database can be changed without having
to change any applications that were based on that structure.

EXAMPLE: You add a new field for e-mail address in the customers table. If you are using a
non-relational database, you probably have to modify the application that will access this
information by including 'pointers' to the new data. With a relational database, the information
is immediately accessible because it is automatically related to the other data by virtue of its
position in the table. All that is required to access the new e-mail field is to add it to a SELECT
list. The structural flexibility of a relational database allows combinations of data to be
retrieved that were never anticipated at the time the database was initially designed. In
contrast, the database structure in older database models is "hard-coded" into the application;
if you add new fields to a non-relational database, any application that accesses the database
will have to be updated. In practice, there is significant confusion as to what constitutes a
relational database management system (RDBMS).
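
The e-mail example can be tried directly with Python's sqlite3 module. This is a minimal
sketch: the customers table layout, the mailing_label function standing in for an existing
application, and the sample address are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_code INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customers VALUES (100, 'Acme', '12 Old Road')")


def mailing_label(code):
    """An 'existing application' written before the e-mail field existed."""
    row = conn.execute(
        "SELECT name, address FROM customers WHERE customer_code = ?", (code,)
    ).fetchone()
    return f"{row[0]}, {row[1]}"


before = mailing_label(100)

# Change the structure: add a new field for e-mail address.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute(
    "UPDATE customers SET email = 'orders@acme.example' WHERE customer_code = 100")

# The existing application keeps working without modification...
after = mailing_label(100)

# ...and the new field is reachable simply by adding it to a SELECT list.
contact = conn.execute(
    "SELECT name, email FROM customers WHERE customer_code = 100"
).fetchone()

print(before, "|", after, "|", contact)
```

The old query names the columns it needs, so it is unaffected by the new one. An application
that instead relied on `SELECT *` and column positions would be more sensitive to this kind of
change.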

2.5.6 Object-Oriented Database

The object-oriented database, also referred to as the 'post-relational' database model,
addresses some of the limitations of the relational model. The most significant
limitation of the relational model is its limited ability to deal with BLOBs. Binary Large
Objects, or BLOBs, are complex data types such as images, spreadsheets, documents, CAD,
e-mail messages, and directory structures. At its most basic level, 'data' is a sequence of bits
('1s' and '0s') residing in some sort of storage structure. 'Traditional' databases are designed to
support small bit streams representing values expressed as numeric or small character strings.
Bit stream data is atomic; it cannot be broken down into smaller pieces. BLOBs are large and
non-atomic data; they have parts and subparts and are not easily represented in a relational
database. There is no specific mechanism in the relational model to allow for the retrieval of
parts of a BLOB. A relational database can store BLOBs, but they are often stored outside the
database and referenced by pointers. The pointers allow the relational database to be searched
for BLOBs, but the BLOB itself must be manipulated by conventional file I/O methods.

Object-oriented databases provide native support for BLOBs. Unfortunately, there is no clear
model or framework for the object-oriented database like the one provided for the relational
database. Under the general concept of an object-oriented database, everything is treated as an
object that can be manipulated. Objects inherit characteristics of their class and have a set of
behaviors (methods) and properties that can be manipulated. The hierarchical notion of
classes and subclasses in the object-oriented database model replaces the relational concept
of atomic data types. The object-oriented approach provides a natural way to represent the
hierarchies that occur in complex data. EXAMPLE: a Word document object consists of
paragraph objects and has a method to 'draw' itself. There are a limited number of commercial
object-oriented database systems available, mostly for engineering or CAD applications. In a
way, the object-oriented concept represents a 'Back to the Future' approach in that it is very
similar to the old hierarchical database design. Relational databases are not obsolete and may
simply evolve by adding additional support for BLOBs. (See the following diagram.)
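
The document example can be sketched as ordinary Python classes. This shows only the object
model (parts, subparts, inheritance, and a 'draw' method), not how any particular
object-oriented database stores such objects. The class names and the plain-text rendering are
hypothetical.

```python
class Paragraph:
    """A part of a document; it knows how to draw itself."""

    def __init__(self, text):
        self.text = text

    def draw(self):
        return self.text


class Heading(Paragraph):
    """A subclass inherits the text property and overrides the behavior."""

    def draw(self):
        return self.text.upper()


class Document:
    """A non-atomic object: it has parts, and each part can be reached
    individually instead of treating the document as one opaque value."""

    def __init__(self, title, parts):
        self.title = title
        self.parts = list(parts)

    def draw(self):
        # The document draws itself by asking each part to draw itself.
        return "\n".join(part.draw() for part in self.parts)

    def part(self, index):
        return self.parts[index]


doc = Document("Report", [Heading("Summary"), Paragraph("Sales rose.")])
rendered = doc.draw()
second_part = doc.part(1).draw()

print(rendered)
```

Retrieving `doc.part(1)` is the kind of access to a part of a complex object that, as noted
above, the relational model has no specific mechanism for when the document is stored as a
single BLOB.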
