Object Persistence Formats

Each of the object persistence types is described in this section. Files are electronic
lists of data that have been optimized to perform a particular transaction. For
example, Figure 9-1 shows a customer order file with information about
customers’ orders, in the form in which it is used, so that the information can be
accessed and processed quickly by the system.
A database is a collection of groupings of information that are related to one
another in some way (e.g., through common fields). Logical groupings of
information could include such categories as customer data, information about an
order, product information, and so on. A database management system (DBMS) is
software that creates and manipulates these databases (see Figure 9-2 for a
relational database example). Such end-user DBMSs as Microsoft Access support
small-scale databases that are used to enhance personal productivity, whereas
enterprise DBMSs, such as DB2, Versant, and Oracle, can manage huge volumes
of data and support applications that run an entire company. An end-user DBMS is
significantly less expensive and easier for novice users to use than its enterprise
counterpart, but it does not have the features or capabilities that are necessary to
support mission-critical or large-scale systems.
Sequential and Random Access Files
From a practical perspective, most object-oriented programming languages support
sequential and random access files as part of the language. In this section, we
describe what sequential access and random access files are. We also describe
how sequential access and random access files are used to support an application.
For example, they can be used to support master files, lookup files, transaction
files, audit files, and history files.
Sequential access files allow only sequential file operations to be performed (e.g.,
read, write, and search). Sequential access files are very efficient for sequential
operations that process all of the objects consecutively, such as report writing.
However, for random operations, such as finding or updating a specific object, they
are very inefficient. On average, 50 percent of the contents of a sequential
access file will have to be searched before finding the specific object of interest in
the file. Sequential access files come in two flavors: ordered and unordered.
An unordered sequential access file is basically an electronic list of information
stored on disk. Unordered files are organized serially (i.e., the order of the file is
the order in which the objects are written to the file). Typically, new objects simply
are added to the file’s end.
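As a minimal Python sketch of an unordered sequential access file, assume a hypothetical text layout with one record per line (customer number, then name) and an in-memory io.StringIO object standing in for a file on disk. The customer numbers and names are invented for illustration. New records go at the end, and finding a record means reading from the start.

```python
import io

def append_record(f, customer_number, name):
    """Add a record at the end of the file (file order = write order)."""
    f.seek(0, io.SEEK_END)
    f.write(f"{customer_number},{name}\n")

def find_record(f, customer_number):
    """Scan from the first record; return (name, records_read).

    name is None when no record has the requested customer number.
    """
    f.seek(0)
    records_read = 0
    for line in f:
        records_read += 1
        number, name = line.rstrip("\n").split(",", 1)
        if int(number) == customer_number:
            return name, records_read
    return None, records_read

# Hypothetical data, stored in the order in which it arrived.
customer_file = io.StringIO()
for number, name in [(2242, "Lee"), (1035, "Black"), (9277, "Patel"), (4254, "Ortiz")]:
    append_record(customer_file, number, name)
```

In this sketch, finding customer 9277 reads three of the four records, and a search for a customer that is not in the file reads all four.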
Ordered sequential access files are placed into a specific sorted order (e.g., in
ascending order by customer number). There is overhead associated with keeping
files in a particular sorted order. The file designer can keep the file in sorted order
by always creating a new file each time a delete or addition occurs, or he or she
can keep track of the sorted order via the use of a pointer, which is information
about the location of the related record. A pointer is placed at the end of each
record, and it “points” to the next record in a series or set. The underlying data/file
structure in this case is the linked list data structure demonstrated in the previous
chapter.
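The pointer approach can be sketched in Python under some simplifying assumptions: a list stands in for the file, a record's position in the list stands in for its location on disk, and each record carries a hypothetical next field that holds the position of the record that follows it in sorted order. The function names are ours, not part of any library.

```python
def insert_sorted(records, head, key, data):
    """Write the record at the physical end of the file, then splice it into
    the logical sorted order by adjusting pointers. Returns the new head."""
    new_pos = len(records)
    records.append({"key": key, "data": data, "next": None})

    # Walk the pointer chain to find where the new key belongs.
    prev, cur = None, head
    while cur is not None and records[cur]["key"] < key:
        prev, cur = cur, records[cur]["next"]

    records[new_pos]["next"] = cur
    if prev is None:
        return new_pos              # new record is now first in sorted order
    records[prev]["next"] = new_pos
    return head

def in_sorted_order(records, head):
    """Follow the pointers from the head, yielding keys in sorted order."""
    cur = head
    while cur is not None:
        yield records[cur]["key"]
        cur = records[cur]["next"]

records, head = [], None
for key in [300, 100, 200]:         # hypothetical customer numbers
    head = insert_sorted(records, head, key, f"customer {key}")
```

The records stay in the physical order in which they were written (300, 100, 200), while following the pointers from the head visits them as 100, 200, 300. No existing record has to be moved when a new one is added.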
Random access files allow only random or direct file operations to be performed.
This type of file is optimized for random operations, such as finding and updating a
specific object. Random access files typically have a faster response time to find
and update operations than any other type of file. However, because they do not
support sequential processing, applications such as report writing are very
inefficient. The various methods to implement random access files are beyond the
scope of this book.
There are times when it is necessary to be able to process files
in both a sequential and random manner. One simple way to do this is to use a
sequential file that contains a list of the keys (the field in which the file is to be kept
in sorted order) and a random access file for the actual objects. This minimizes the
cost of additions and deletions to a sequential file while allowing the random file to
be processed sequentially by simply passing the key to the random file to retrieve
each object in sequential order. It also allows fast random processing to occur by
using only the random access file, thus optimizing the overall cost of file
processing. However, if a file of objects needs to be processed in both a random
and sequential manner, the developer should consider using a database (relational,
object-relational, or object-oriented) instead.
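The key-file idea can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: fixed-length records packed with the struct module, an in-memory io.BytesIO object in place of the random access file, a dictionary of byte offsets as one possible way to reach a record directly, and a sorted list in place of the sequential key file.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<i20s")     # hypothetical layout: int key + 20-byte name

class KeyedFile:
    def __init__(self):
        self.data = io.BytesIO()    # stands in for the random access file
        self.offsets = {}           # key -> byte offset, for direct access
        self.sorted_keys = []       # stands in for the sequential key file

    def add(self, key, name):
        """Append the record and register its key; no records are moved."""
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(RECORD.pack(key, name.encode("ascii")))
        self.offsets[key] = offset
        bisect.insort(self.sorted_keys, key)

    def get(self, key):
        """Random processing: jump straight to the record for this key."""
        self.data.seek(self.offsets[key])
        _, raw = RECORD.unpack(self.data.read(RECORD.size))
        return raw.rstrip(b"\x00").decode("ascii")

    def scan(self):
        """Sequential processing: pass each key, in order, to the random file."""
        for key in self.sorted_keys:
            yield key, self.get(key)

customers = KeyedFile()
for key, name in [(30, "Cho"), (10, "Adams"), (20, "Baker")]:
    customers.add(key, name)
```

A call such as customers.get(20) touches only one record, while customers.scan() yields the records in key order even though they were written in arrival order.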
Files serve many different purposes in an application, e.g., as master files, lookup files,
transaction files, audit files, and history files. Master files store core information
that is important to the business and, more specifically, to the application, such as
order information or customer mailing information. They usually are kept for long
periods of time, and new records are appended to the end of the file as new orders
or new customers are captured by the system. If changes need to be made to
existing records, programs must be written to update the old information.
Lookup files contain static values, such as a list of valid ZIP codes or the names of
the U.S. states. Typically, the list is used for validation. For example, if a
customer’s mailing address is entered into a master file, the state name is validated
against a lookup file that contains U.S. states to make sure that the operator entered
the value correctly.
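A short Python sketch of this kind of validation follows. The set below is a deliberately truncated, hypothetical excerpt; an actual lookup file would list every state. The function name is ours.

```python
# Hypothetical excerpt of a lookup file of U.S. state names.
US_STATES_LOOKUP = {"Georgia", "Texas", "Virginia"}

def validate_state(state_name, lookup=US_STATES_LOOKUP):
    """Return True only if the entered state name appears in the lookup list."""
    return state_name.strip().title() in lookup
```

With this excerpt, an entry such as " georgia " is accepted after normalization, whereas a misspelling such as "Georgai" is rejected before it reaches the master file.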
A transaction file holds information that can be used to update a master file. The
transaction file can be destroyed after changes are added, or the file may be saved
in case the transactions need to be accessed again in the future. Customer address
changes, for example, would be stored in a transaction file until a program is run that
updates the customer address master file with the new information.
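The update program described above might look like the following Python sketch. The record layouts, customer numbers, and addresses are hypothetical, and dictionaries and lists stand in for the master and transaction files.

```python
def apply_transactions(master, transactions):
    """Apply queued address changes to the master records, in order.

    Returns the transactions that could not be applied because the
    customer number does not exist in the master file.
    """
    rejected = []
    for txn in transactions:
        customer = master.get(txn["customer_number"])
        if customer is None:
            rejected.append(txn)
        else:
            customer["address"] = txn["new_address"]
    return rejected

# Hypothetical master file, keyed by customer number.
master = {
    1035: {"name": "Black", "address": "12 Oak St"},
    2242: {"name": "Lee", "address": "9 Elm Ave"},
}
# Hypothetical transaction file holding address changes awaiting processing.
transactions = [
    {"customer_number": 2242, "new_address": "40 Pine Rd"},
    {"customer_number": 7777, "new_address": "1 Main St"},
]
rejected = apply_transactions(master, transactions)
```

After the run, the transaction list can either be discarded or kept, as described above. Returning the rejected transactions is a design choice of this sketch, not something the text prescribes.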
For control purposes, a company might need to store information about how data
change over time. For example, as human resources clerks change employee
salaries in a human resources system, the system should record the person who
made the changes to the salary amount, the date, and the actual change that was
made. An audit file records before and after images of data as they are altered so
that an audit can be performed if the integrity of the data is questioned.
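As a rough Python sketch of an audit file, the function below records who changed a salary, when, and the before and after images. The field names, employee number, and clerk identifier are hypothetical, and a list stands in for the audit file.

```python
from datetime import datetime, timezone

def change_salary(employees, audit_log, employee_id, new_salary, changed_by):
    """Update a salary and append a before/after image to the audit log."""
    before = employees[employee_id]["salary"]
    employees[employee_id]["salary"] = new_salary
    audit_log.append({
        "employee_id": employee_id,
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
        "before": before,
        "after": new_salary,
    })

employees = {501: {"name": "Rivera", "salary": 52000}}   # hypothetical data
audit_log = []
change_salary(employees, audit_log, 501, 55000, changed_by="hr_clerk_17")
```

Because entries are only ever appended, the log preserves the full history of changes even after the salary itself has been overwritten.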
Sometimes files become so large that they are unwieldy, and much of the
information in the file is no longer used. The history file (or archive file) stores
past transactions (e.g., old customers, past orders) that are no longer needed by
system users. Typically the file is stored off-line, yet it can be accessed on an as-
needed basis. Other files, such as master files, can then be streamlined to include
only active or very recent information.
Relational Databases
A relational database is the most popular kind of database for application
development today.
A relational database is based on collections of tables with each table having a
primary key—a field or fields whose values are unique for every row of the table.
The tables are related to one another by placing the primary key from one table
into the related table as a foreign key (see Figure 9-3). Most relational database
management systems (RDBMS) support referential integrity, or the idea of
ensuring that values linking the tables together through the primary and
foreign keys are valid and correctly synchronized. For example, if an order-entry
clerk using the tables in Figure 9-3 attempted to add order 254 for customer
number 1111, he or she would have made a mistake because no customer exists in
the Customer table with that number. If the RDBMS supported referential
integrity, it would check the customer numbers in the Customer table, discover that
the number 1111 is invalid, and return an error to the entry clerk. The clerk would
then go back to the original order form and recheck the customer information.
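The order-entry scenario can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table and column names are assumptions (Figure 9-3 is not reproduced here), customer 2242 is invented, and SQLite is used simply because it ships with Python. Note that SQLite enforces foreign keys only after the PRAGMA shown below is issued.

```python
import sqlite3

ri_conn = sqlite3.connect(":memory:")
ri_conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
ri_conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name TEXT)")
ri_conn.execute("""
    CREATE TABLE orders (
        order_number INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id)  -- foreign key
    )
""")
ri_conn.execute("INSERT INTO customer VALUES (2242, 'Lee')")   # hypothetical customer

def add_order(conn, order_number, cust_id):
    """Try to add an order; report the error if the customer does not exist."""
    try:
        with conn:   # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_number, cust_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    add_order(ri_conn, 253, 2242),   # customer exists
    add_order(ri_conn, 254, 1111),   # no such customer, as in the example above
]
```

The first order is stored, while order 254 is refused by the database itself, so the invalid customer number never reaches the orders table.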
Tables have a set number of columns and a variable number of rows that contain
occurrences of data. Structured query language (SQL) is the standard language for
accessing the data in the tables. SQL operates on complete tables, as opposed to
the individual rows in the tables. Thus, a query written in SQL is applied to all the
rows in a table at once, unlike many programming languages, which manipulate
data row by row. When queries must include information from
more than one table, the tables first are joined based on their primary key and
foreign key relationships and treated as if they were one large table. Examples of
RDBMS software are Microsoft SQL Server, Oracle, DB2, and MySQL.
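To make the set-oriented nature of SQL concrete, the following sketch uses Python's sqlite3 module with hypothetical tables and data. A single query joins the two tables on their primary key and foreign key relationship and summarizes every matching row, with no row-by-row loop in the program.

```python
import sqlite3

join_conn = sqlite3.connect(":memory:")
join_conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders (
        order_number INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customer(cust_id),
        amount REAL
    );
    INSERT INTO customer VALUES (2242, 'Lee'), (9277, 'Patel');
    INSERT INTO orders VALUES (237, 2242, 50.0), (238, 9277, 20.0), (239, 2242, 30.0);
""")

# One statement operates on all rows of the joined tables at once.
rows = join_conn.execute("""
    SELECT c.last_name, COUNT(*) AS order_count, SUM(o.amount) AS total
    FROM customer AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    GROUP BY c.cust_id
    ORDER BY c.last_name
""").fetchall()
```

Here rows holds one summary tuple per customer: two orders totaling 80.0 for Lee and one order of 20.0 for Patel.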
To use an RDBMS to store objects, objects must be converted so that they can be
stored in a table. From a design perspective, this entails mapping a UML class
diagram to a relational database schema.
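At the code level, the conversion might look like the following Python sketch, which maps one hypothetical Customer class to one table, with each attribute becoming a column. The class, its attributes, and the save and load helpers are assumptions made for illustration; a full mapping from a class diagram also has to deal with associations and inheritance, which this sketch does not attempt.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class Customer:                      # hypothetical class from a design
    cust_id: int
    last_name: str
    first_name: str

def save(conn, customer):
    """Flatten the object into one row; attribute order matches column order."""
    conn.execute("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", astuple(customer))

def load(conn, cust_id):
    """Rebuild the object from its row, or return None if no row exists."""
    row = conn.execute(
        "SELECT cust_id, last_name, first_name FROM customer WHERE cust_id = ?",
        (cust_id,),
    ).fetchone()
    return Customer(*row) if row else None

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
save(db, Customer(2242, "Lee", "Ana"))
```

Loading customer 2242 returns an object equal to the one that was saved, and loading an unknown customer number returns None.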
Object-Relational Databases
Object-relational database management systems (ORDBMSs) are relational
database management systems with extensions to handle the storage of objects in
the relational table structure.
This is typically done through the use of user-defined types. For example, an
attribute in a table could have a data type of map, which would support storing a
map. This is an example of a complex data type. In pure RDBMSs, attributes are
limited to simple or atomic data types, such as integers, floats, or chars.
ORDBMSs, because they are simply extensions to their RDBMS counterparts, also
have very good support for the typical data management operations that business
has come to expect from RDBMSs, including an easy-to-use query language
(SQL), authorization, concurrency-control, and recovery facilities. However,
because SQL was designed to handle only simple data types, it too has been
extended to handle complex object data. Currently, vendors deal with this issue in
different manners. For example, DB2, Informix, and Oracle all have extensions
that provide some level of support for objects.
Many of the ORDBMSs on the market still do not support all of the object-oriented
features that can appear in an object-oriented design (e.g., inheritance). As
described in Chapter 8, one of the problems in supporting inheritance is that
inheritance support is language dependent. For example, the way Smalltalk
supports inheritance is different from C++’s approach, which is different from
Java’s approach. Thus, vendors currently must support many different versions of
inheritance, one for each object-oriented language, or decide on a specific version
and force developers to map their object-oriented design (and implementation) to
their approach. As with RDBMSs, a mapping from a UML class diagram to an
object-relational database schema is required.
