

LECTURE ONE
DATABASE MANAGEMENT SYSTEMS

OUTLINE
1. Introduction
2. History of Database Management System
3. Significance of Database Management System
4. Roles in Database environment
5. Database Languages
6. Course Outline

1.1 Introduction
- Database systems are arguably the most important development in the field of software
engineering. The database itself is now the underlying framework of the information system, and has
fundamentally influenced the way many organizations operate.
Nevertheless, owing to the proliferation and simplicity of modern databases, many users
(novices and non-technical users) are creating databases and applications without the requisite knowledge and
skill, leading to the development of ineffective and inefficient systems. This is evidenced by the
increasing number of cases of software failures, crises, depression (if you like, recession), and all the
ugly tales we hear today.

- Therefore, this course, COS 321, is intended to provide the opportunity to sufficiently explore
the concepts of database design and management in order to develop the necessary skills for the
design of new systems and for resolving the inadequacies of existing systems. [I hope this doesn't
sound too ambitious]. To this end, and in the main, attempts shall be made to introduce and discuss
the theory(ies) behind databases and also provide methodology(ies) for database design, which the
students can verify by whatever means, laboratory or otherwise.
- To introduce database systems, in the first instance, it is necessary to consider some
justifications for databases. In this vein, we attempt to distinguish "what is" from "what is not".
Imagine operating a business without knowing who the customers are, what products to sell, details
of personnel, debtors, creditors, etc.

- The above requirements presuppose that businesses have to keep the relevant data and much
more. Besides, data availability is critical to decision making. Business information systems help
businesses to use information as an organizational resource. Central to systems for decision making
and/or information processing are the issues of the collection, storage, aggregation, manipulation,
and management of data.

- For reasons of clarity, we differentiate between data and information as follows: Data are raw
facts, which implies that the facts have not yet been processed to reveal their meaning, while
information is the result of processing data to reveal its meaning. Furthermore, a database is
essentially a collection of related data, and the term database system can be described as a
collection of application programs that interact with the database, while a database application is
simply a program that interacts with the database at some point in its execution. It is important to
note these terminologies in the parlance of database systems.

- Having figured out that a database is a shared collection of logically related data, and a
description of this data, designed to meet the information needs of an organization, it becomes
acceptable to define a database management system as a software system that enables users to define,
create, maintain, and control access to the database.

- Fundamentally, the uses of database systems include, but are not limited to, the following:
- Purchases from supermarkets
- Purchases/transactions using credit cards
- Bookings, such as holidays, from travel agents
- Using the library, and other catalogue-based resources
- Taking out insurance, or other policies
- Internet usage: Amazon, Google searches, Facebook, etc.
- Studying at a university: student information systems, examination information systems,
transcript processing, etc.
- To realize the above objectives, and even more, the Database Management System (DBMS),
being the software that interacts with the users' application programs and the database, provides the
following facilities:
- Mechanisms for defining the database, usually through a Data Definition Language (DDL),
which allows users to specify the data types and structures and the constraints on the data to be
stored in the database.
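As a hedged illustration, a minimal SQL DDL sketch follows (the table and column names are hypothetical, not taken from any particular system):

    CREATE TABLE Student (
        RegNo  CHAR(10)    NOT NULL,   -- registration number
        FName  VARCHAR(30) NOT NULL,
        LName  VARCHAR(30) NOT NULL,
        DOB    DATE,                   -- date of birth
        Dept   VARCHAR(40),
        PRIMARY KEY (RegNo)            -- a constraint on the stored data
    );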

- Mechanism for data manipulation, usually through a Data Manipulation Language (DML),
which allows users to insert, update, delete, and retrieve data from the database. Owing to the
centralized nature of modern data storage systems, the DML offers a general inquiry facility to the data,
called a query language. The query language alleviates the problems of the traditional file-
based systems (to be discussed subsequently), where the user has to work with a fixed set of queries or
there is a proliferation of programs, giving major software management problems. The most common
query language is the Structured Query Language (SQL), which we shall study in detail.
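A hedged sketch of the four basic DML operations in SQL, continuing the hypothetical Student table above:

    INSERT INTO Student (RegNo, FName, LName, Dept)
        VALUES ('2009/12345', 'Ada', 'Eze', 'Computer Science');
    SELECT FName, LName FROM Student
        WHERE Dept = 'Computer Science';                 -- retrieval (a query)
    UPDATE Student SET Dept = 'Statistics'
        WHERE RegNo = '2009/12345';
    DELETE FROM Student WHERE RegNo = '2009/12345';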

- Mechanism for access control: the DBMS, for example, provides the following:
i. a security system, which prevents unauthorized access to the database;
ii. an integrity system, which maintains the consistency of stored data;
iii. a concurrency control system, which allows shared access to the database;
iv. a recovery system, which restores the database to a previous consistent state following a
hardware or software failure. These days we hear of replication systems and servers,
redundancy, cloud computing, etc.;
v. a user-accessible catalog, which contains descriptions of the data in the database.
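For the security facility in particular, most SQL-based DBMSs support the standard GRANT and REVOKE statements; a minimal sketch (the user names are hypothetical):

    GRANT SELECT ON Student TO exams_clerk;       -- read-only access
    GRANT SELECT, UPDATE ON Student TO registrar;
    REVOKE UPDATE ON Student FROM registrar;      -- withdraw a privilege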
The database approach to information processing can be depicted graphically as follows:

[Figure: Sales and Contracts application programs, each with data entry and report facilities, accessing a single shared database through the DBMS.]

1.2 HISTORY OF DATABASE MANAGEMENT SYSTEM


Originally, most businesses were able to keep track of all necessary data with the help of
manual file systems. The organisation of data within file folders was a function of the data's expected use,
and the contents were logically related. For example, a doctor's office might keep patient data, one file
for each patient, describing his medical history. A personnel manager might organise personnel data
by category of employment (clerical, technical, sales, administration, etc.), with each file properly
labelled and classified. Because the data collection was relatively small and the organisation's managers had few
reporting requirements, the manual system served its role as a data repository.
However, as organisations grew and reporting requirements became more complex,
keeping track of data in the manual system became more difficult, time consuming, and cumbersome,
and it became less likely that the data would serve to generate useful information. The creation
of necessary reports, and responses to questions during occasional spot checks, especially by agents
outside the organisation, were unsatisfactory because efficient data access was not in place.
In fact, some business managers faced government-imposed reporting requirements that took
weeks of intense effort each quarter, even when a well-designed manual system was used.
Consequently, the mounting pressure led to the use of computers to keep track of the data and produce the
required reports.
Consequently, a new kind of professional, known as the data processing (DP) specialist, had to be
hired (or grown), who created the necessary computer file structures and often wrote the software that
managed the data within those structures, as well as the application programs that produced the required
reports. Thus numerous home-grown computerized file systems were born.
Initially, the computer file systems resembled the manual files used earlier. Describing
computer files requires a special vocabulary, just as in every other discipline, to enable its practitioners to
be precise in communication. Some of the basic computer file vocabulary includes data, field, record,
file, etc.
The DP specialist wrote programs that produced very useful reports. The reports were
impressive and the ability to perform complex data search produced the materials on which sound
decisions could be based. But as the number of files grew due to increased reporting requirements,
the demand for DP specialists and programming skill grew faster. The job became too much for one
person, and additional hands had to be engaged to handle different aspects of the work. Thus, as
the DP work grew, the DP specialist evolved into a manager, heading the newly formed DP
department.
The traditional file-based systems are generally regarded as the immediate ancestors/predecessors of
the database system. The file-based system is a collection of application programs that perform
services for the end-users, such as the production of reports, in such a way that each program defines
and manages its own data.

Owing to the peculiar nature of traditional file-based systems, as highlighted above, a number of
limitations are associated with the approach, which include:

o Separation and isolation of data: which creates access problems when data is isolated in separate
files.
o Duplication of data: owing to the fact that each program manages its own data. In a
decentralized organisation, each department managing its own data definitely results in
uncontrolled duplication of data.
o Data dependence: owing to the fact that the application code defines the physical structure
and storage of the data files and records. This creates manipulative difficulties.
o Incompatible file formats: different application programs cannot share data because
their files are defined in different languages, such as COBOL or C, whose formats are incompatible.
o Fixed queries/proliferation of application programs: new and different queries have to be
written for each type of request or operation: different strokes for different folks.

- Although traditional file-based systems still exist in some specific areas, their limitations
provided fresh impetus for the development of Database Management Systems (DBMSs), which,
according to several texts, originated in the 1960s.

- The development of DBMSs from the 1960s can be outlined as follows:


(1) The Generalised Update Access Method (GUAM):
This is software developed by North American Aviation (NAA, now Rockwell International) as
an attempt to handle and manage the vast amount of information that President
Kennedy's Apollo moon-landing project would generate. The GUAM system is based on the
hierarchical model, in which smaller components come together as parts of larger components,
until the final product is assembled: an upside-down tree.

(2) Information Management System (IMS):


This system was born out of the collaboration between IBM and NAA in the mid-1960s. IMS is still
the main hierarchical DBMS used by most large mainframe installations.

(3) Integrated Data Store (IDS):


This is yet another development of the mid-1960s, produced by General Electric. The work
was headed by one of the early pioneers of database systems, Charles Bachman. The
evolution of IDS led to the development of a new type of database system known as the network
DBMS.

(4) Network DBMS:


This system was developed partly to address the need to represent more complex data
relationships than could be modeled with hierarchical structures, and partly to impose a
database standard. The standards effort was led by the Conference on Data
Systems Languages (CODASYL), comprising representatives of the US government and the
world of business and commerce, which formed a List Processing Task Force in 1965,
subsequently renamed the Data Base Task Group (DBTG) in 1967. The mandate of the Task
Group was to define standard specifications for an environment that would allow database
creation and data manipulation. To this end, an initial draft report was submitted in 1969, and
the first definitive report in 1971, termed the DBTG proposal.

(5) The DBTG Proposal:


This proposal for a database system identified three components:
- The network schema, which deals with the logical organization of the entire database
as seen by the DBA, and includes a definition of the database name, the type of each
record, and the components of each record type;
- The subschema – the part of the database seen by the user or application program;
- A data management language – to define the data characteristics and the data
structure, and to manipulate the data.

- For standardization, the DBTG specified three distinct languages:

- a schema Data Definition Language (DDL), which enables the DBA to define the schema;
- a subschema DDL, which allows the application programs to define the parts of the
database they require;
- a Data Manipulation Language (DML), to manipulate the data.

- Although the above report was not formally adopted by ANSI, a number of systems
emerged following the DBTG proposal. These systems are known as CODASYL or DBTG systems. The
CODASYL and hierarchical approaches represented the first generation of DBMSs. Nevertheless, these
systems shared some fundamental disadvantages, which include:
- complex programs had to be written to answer even simple queries, based on
navigational record-oriented access;
- minimal data independence; and
- the absence of a widely accepted theoretical foundation.

(6) The Relational Data Model:


The relational data model, based on the influential paper by E. F. Codd in 1970, was a bold attempt
to address the disadvantages of the former approaches. The first commercial products based on this
model appeared in the late 1970s and the early 1980s. These include IBM's System R
project of the late 1970s, which was designed to prove the practicality of the relational model by
providing an implementation of its data structures and operations, and which led to two major
developments:
- The development of SQL, which is the standard language for relational DBMSs; and
- The production of various commercial relational DBMS products during the 1980s, for
example DB2 and SQL/DS from IBM, and Oracle from Oracle Corporation.
Today, there are numerous relational DBMSs for various computer platforms, including multiuser
relational DBMSs such as INGRES II from Computer Associates and Informix from Informix Software, while PC-
based relational DBMSs include Access and FoxPro from Microsoft, Paradox from Corel Corp., InterBase
and BDE from Borland, and R:Base from R:Base Technologies.
In effect, relational DBMSs (RDBMSs) are regarded as the second-generation DBMSs.
Nevertheless, the relational model still showed some shortcomings, warranting further research and
efforts at improvement. Such efforts include the Entity-Relationship model presented by Peter
Chen in 1976, which has become a widely accepted technique/methodology for database design.

(7) Object-Oriented Models:


In response to the increasing complexity of database applications, two systems emerged: the
Object-Oriented DBMS (OODBMS) and the Object-Relational DBMS (ORDBMS), whose
evolution represents the third-generation DBMSs. The OO data model is, of course, based on the
components and principles of object-oriented design, often summarized as APIE (abstraction,
polymorphism, inheritance, encapsulation). Here, an object is
an abstraction of a real-world entity; attributes are used to describe the properties of an object;
objects with similar characteristics are grouped in classes; classes are organized in a class hierarchy;
and inheritance is the ability of an object within the class hierarchy to inherit the attributes and
methods of the classes above it.

(8) Object/Relational and XML:


These are newer data models deemed to support more complex data representations.
Nonetheless, they evolved from relational models, and include the extended relational data model
(ERDM), which adds many of the OO model's features within the inherently simpler relational
database structure. A DBMS based on the ERDM is described as an object/relational database management
system (O/R DBMS).

- It is tempting to continue the discussion by looking at future data models [Coronel, et al., p.
40, 2011], but this section is becoming too lengthy. Before we proceed to the
remaining sections, suffice it to mention that the 1990s also saw the rise of the Internet, the three-tier
client-server architecture, and the demand to allow corporate databases to be integrated with web
applications.
The late 1990s (1998) saw the development of XML (eXtensible Markup Language), which has
had a profound effect on many aspects of IT, including database integration, graphical interfaces,
embedded systems, distributed systems, and database systems.
Currently, most major DBMS vendors provide what are called data warehousing solutions. Data
warehouses are specialized DBMSs, which have evolved to make it possible to store data drawn from
several data sources, possibly maintained by different operating units of an organization. Such
systems provide comprehensive data analysis facilities to allow strategic decisions to be made based
on, for example, historical trends. [We may be lucky to discuss this concept sometime towards the
end]. Another example of the recent development of specialized DBMSs is the Enterprise Resource
Planning (ERP) system, which is an application layer built on top of a DBMS that integrates all the
business functions of an organization, such as manufacturing, sales, finance, marketing, shipping,
invoicing, and human resources. Popular ERP systems are SAP R/3 from SAP and PeopleSoft from
Oracle.
Have you heard of SISs?

1.3 SIGNIFICANCE OF DATABASE MANAGEMENT SYSTEMS



- A DBMS performs several important functions and has several potential advantages.
In the main, the functions of the DBMS guarantee the integrity and consistency of the data in the
database. Most of the functions are transparent to end users, while others can only be achieved
through the use of the DBMS. The functions include: data dictionary management, data storage
management, data integrity management, data transformation and presentation, security
management, multi-user access control, backup and recovery management, database access
languages and application programming interfaces (APIs), database communication interfaces, and a
host of others.
Comments [Students are required to explain these].

- The advantages of the DBMS are in line with the functions it performs. These advantages
include: control of data redundancy, data consistency, more information from the same amount of
data, data sharing, improved data integrity, improved security, enforcement of standards, economies of
scale, balancing of conflicting requirements, improved data accessibility and responsiveness, increased
productivity, improved maintenance through data independence, increased concurrency, and improved
backup and recovery services.

- Unfortunately, there are some challenges associated with the implementation of a DBMS.
These include complexity: the functionality of a DBMS makes it a complex piece of software,
demanding that designers, developers, administrators, and end users understand that
functionality in order to take advantage of the system. The size of the software, the costs of the DBMS,
the associated hardware, and conversion, together with performance concerns and the higher impact of
failure, also add to the disadvantages of the DBMS.

1.4 ROLES IN DATABASE ENVIRONMENT

- There are four distinct types of people that participate in the DBMS environment: data and
database administrators, database designers, application developers, and end users.

- Data administration and database administration are the roles generally associated with the management
and control of a DBMS and its data. The Data Administrator (DA) is responsible for the
management of the data resource, including database planning, development and maintenance of
standards, policies and procedures, and conceptual/logical database design. The DA consults with
and advises senior managers, ensuring that the direction of database development will ultimately
support corporate objectives. On the other hand, the Database Administrator (DBA) is responsible for
the physical realization of the database, including physical database design and implementation,
security and integrity control, maintenance of the operational system, and ensuring satisfactory
performance of the applications for users. The DBA role is more technical than that of the
DA. Some organizations do not distinguish between the two.

Database designers can be of two categories: logical and physical DB designers.
- The former identifies the data (i.e., the entities and attributes), the relationships between the data,
and the constraints on the data. This demands an understanding of the business rules. The
physical DB designer deals with the physical realization of the logical database. This involves
mapping the logical database design into a set of tables and integrity constraints, selecting specific
storage structures and access methods, and designing and adopting the appropriate security
measures required on the data.

- The application developers deal with the development of application programs that provide the
required functionality for the end-users. They work from/with specifications produced by systems
analysts. Each program contains statements that request the DBMS to perform some operation on the
database.

- End-users are the clients for the database, which has been designed and implemented, and is
being maintained to serve their information needs. End-users can be classified as naïve users or
sophisticated users, depending on their degree of competence and/or expertise.

CONCLUSION

To conclude this introduction (which of necessity is long), we have to call the following to mind:
- Data are raw facts. Information is the result of processing data to reveal its meaning.
Accurate, relevant, and timely information is the key to good decision making, and good decision
making is the key to organizational survival in a global environment.

- Data are usually stored in a database. To implement a database and to manage its content,
you need a database management system (DBMS). The DBMS serves as the intermediary between
the user and the database. The database contains the data collected, plus metadata: data about
data.

- Database design defines the database structure, and a database must be designed well. Databases
evolved from manual file systems and their computerized successors. In a file system, data are stored in
independent files, each requiring its own data management programs. That approach is now largely
outmoded, but it acts as a reference point for DBMS improvements.
1.5 COURSE OUTLINE
- The titles to be discussed in this course, CS 321 (Database Management System I), are in
compliance with the NUC's requirements. However, consideration is given to the ongoing curriculum
improvement efforts in the Department, which seek to address the requirements of international
standards and best practices. The course content, as specified by the NUC, is as follows:

1. Database Management Systems
2. Review of basic concepts
3. Functions and components of DBMS
4. File design and access paths
5. Future directions in DBMS
6. Programming and application development in a database environment.

Recommended Texts

- Database Systems: A Practical Approach to Design, Implementation, and Management.
By Thomas Connolly and Carolyn Begg (3rd Edition and newer; 5th Edition, 2011).

- Database Principles: Fundamentals of Design, Implementation, and Management. By Carlos
Coronel, Steven Morris, and Peter Rob. 9th Edition.
LECTURE TWO

Review of Basic Concepts and Terminologies


Outline
- Database Architecture
- Database Structure
- Database Languages
- Data Models
- Relational Database Models

Introductory Comments
- The basic concepts and terminologies pertain to all the relevant aspects of databases and their
management systems. In the regular parlance of database management systems, these
concepts normally range from issues of conceptualization of the database, through analysis, modeling, and
design, to implementation and management.

- Notably, this section of the course fundamentally cuts across all the sections of database
systems. We shall limit the discussion to the topics outlined above.

2.1 Database Architecture


- The database architecture is the set of specifications, rules, and processes that dictate how
data is stored in the database and how data is accessed by components of a system. It includes data
types, relationships, and naming conventions.
- The database architecture describes the organization of all database objects and how they
work together. It affects integrity, reliability, scalability, and performance.
- The database architecture involves anything that defines the nature of the data, the structure
of the data, or how the data flows.
- Another view of database architecture is in terms of application logic. In this case, the
database architecture can be distinguished by examining the way application logic is distributed
throughout the system. Application logic consists of three components: presentation logic, processing
logic, and storage logic.
- The presentation logic component is responsible for formatting and presenting data on the
user's screen.
- The processing logic component handles data processing logic, business rules logic, and data
management logic.
- The storage logic component is responsible for data storage and retrieval from actual
devices, such as a hard drive or RAM.
- Specific database architectures are implemented in database environments that are appropriate
for their realization. By determining where and how the components are processed, we can get a
good idea of what type of architecture and subtype we are dealing with, such as one-tier, two-tier
client-server, or N-tier client-server. As a matter of fact, there are different tiers of architecture that
are used to implement a database system.
- In this course, emphasis is placed on multi-user DBMS architectures. These include
teleprocessing, file-server, and client-server.
(a) Teleprocessing
Teleprocessing is the traditional architecture for multi-user systems. In this architecture,
there is a single central processing unit (CPU) and a number of terminals. All the processing is
performed within the same physical computer housing the CPU. The user terminals (aka dumb
terminals) are incapable of functioning on their own, and are cabled to the central computer.
Messages from the terminals are routed through the communications control
subsystem of the operating system to the user's application program, which in turn uses the services
of the DBMS, and responses travel back the same way.
Unfortunately, this architecture placed a tremendous burden on the central computer, which
not only had to run the application programs and the DBMS, but also had to carry out a significant
amount of work on behalf of the terminals. We can all imagine the other problems associated with
this approach, including availability, performance, security, etc. Nevertheless, advances in the
development of high-performance PCs and networks have made better alternatives possible.
Figure 1: Teleprocessing architecture (a number of terminals cabled to a single central computer housing the CPU).

(b) File-Server Architecture


In this architecture, the processing is distributed around the network, which may be a LAN,
MAN, or WAN. The file-server holds the files required by the applications and the DBMS. However,
the applications and the DBMS run on each workstation, requesting files from the file-server when
needed. The file-server, therefore, acts simply as a shared hard disk.

Figure 2: File-server architecture (workstations, each running the applications and the DBMS, send requests over the network medium to the file-server, which returns the requested files from the database).
This approach can generate a significant amount of network traffic. The shortfalls include:
large amounts of network traffic, the requirement of a full copy of the DBMS on each workstation,
and the fact that concurrency, recovery, and integrity control become more difficult as many DBMSs
may be accessing the same files.

(c) Client-Server Architecture


- This architecture was developed to overcome the disadvantages of the first two approaches.
Client-server refers to the way in which software components interact to form a system. A client
process requests resources, which are served by a server process. The processes can be on different hosts
in different locations.

- In the context of the database, the client manages the user interface and the application logic,
acting as a sophisticated workstation on which to run database applications. The client takes the
user’s requests, checks the syntax and generates database requests in SQL or another database
language appropriate to the application logic. It then transmits the message to the server, waits for a
response, and formats the response for the end-user. The server accepts and processes the database
requests, then transmits the result back to the client.
- The processing involves checking authorization, ensuring integrity, maintaining system catalog,
and performing query and update processing. There is also provision for concurrency and recovery
controls.
- The advantages of the client-server architecture include: wider access to existing databases,
increased performance, reductions in hardware and communication costs, increased consistency,
and easy conformance to open-systems architectures.

NB: The above discussion is not intended to ignore distributed computing, web-based
systems, cloud computing, and other recent developments, some of which will be discussed subsequently.
It is intended to open up reasoning in the area of database architectures, while keeping the section as
short as possible.

2.2 DATABASE STRUCTURE

- This section, like its predecessor, further introduces fundamental concepts in database
systems, specifically those found in database structures. To store
information in a computer, consideration must be given to the pieces of information to be stored,
usually in terms of their size and the type of information involved.
- The first consideration is about the required fields. Fields are the categories of information
that the database is going to store. For example, consider a university that issues transcripts to graduating
students. To produce a transcript database, relevant fields would include: Student Name, RegNo,
Programme Type, Department, Courses, Year of Enrolment, Year of Graduation, etc.
Once a decision is reached with respect to the relevant fields, the next phase is to start
collecting the data.
- The collection of related fields holding all the information for one person (or thing) is called a record.
Thus, each student in the transcript database would have a record in the database.
- Similarly, a collection of related records makes up a table in the database. Tables are also
known as files, if you are using a flat-file database.
- It is important to note that most database programs allow, or rather require, that each field in
the database be associated with a type. The type indicates what kind of data is to be
stored in that field. Common field/data types include integers, floats, characters, dates, and Booleans. More
advanced databases allow for multimedia objects, such as pictures, sound, and video.
- Field types are used to enforce a basic kind of validation (consistency), in that the database
does not allow an item of one type to be stored in a field of another type, say, text in a date
field. Field types also facilitate sorting, as the database "understands" the contents of the field.
- A certain field, usually called the key field or primary key, uniquely identifies
each record in the database. It significantly contributes to a logical structure for the database. We shall
discuss this further when we begin to design our databases.
- Fields are also associated with indexes. An index (just like the index in a book) is an extra bit
added on to the database to help the database program find records quickly. Superfluous use of
indexes is discouraged owing to the overhead involved when creating the records. Nevertheless,
indexing of key fields and those used in searching or sorting is encouraged.
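The ideas of fields, field types, primary keys, and indexes come together in a hedged SQL sketch like the following (the table, column, and index names are hypothetical):

    CREATE TABLE Transcript (
        RegNo    CHAR(10) NOT NULL,   -- key field component
        CourseNo CHAR(6)  NOT NULL,   -- character field
        Score    INTEGER,             -- integer field; a date field would use DATE
        PRIMARY KEY (RegNo, CourseNo) -- uniquely identifies each record
    );
    CREATE INDEX idx_transcript_regno
        ON Transcript (RegNo);        -- speeds up searching and sorting on RegNo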

2.3 DATABASE LANGUAGES

- The database language is a generic term referring to a class of languages used for defining and
accessing databases. A particular database language will be associated with a particular database
management system.
- There are two distinct classes of database languages: those that do not provide complete
programming facilities and are designed to be used in association with some general-purpose
programming language (the host language), and those that do provide complete programming
facilities (database programming languages).
- The products adopting the approach of incomplete programming facilities seek to minimize
host-language programming by the provision of fourth generation language (4GL) facilities.
- A database language must provide for both logical-schema specification and modification (data
description) and for retrieval and update (data manipulation). In line with the CODASYL network
database standard, some products complying with the standard treat data description and manipulation
distinctly, using the Data Description Language (DDL) and the Data Manipulation Language (DML).
These are sometimes regarded as data sublanguages.
- The DDL is used to specify the database schema, and can be defined as a language that allows
the DBA or user to describe and name the entities, attributes, and relationships required for the
application, together with any associated integrity and security constraints. The DML is used to
both read and update the database, and can be defined as a language that provides a set of
operations to support the basic data manipulation operations on the data held in the database.
- The compilation of DDL statements results in a set of tables stored in special files collectively
called the system catalog. The system catalog integrates the metadata, which in turn is the data that
describes objects in the database and makes it easier for those objects to be accessed or manipulated.
The metadata contains definitions of records, data items, and other objects that are of interest to
users or are required by the DBMS. The terms data dictionary and data directory are also used to
describe the system catalog.
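Many RDBMSs expose the system catalog through the SQL-standard information_schema views; a minimal sketch of reading this metadata:

    SELECT table_name, table_type
    FROM information_schema.tables;   -- lists the tables described in the catalog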
- With respect to the DML, data manipulation operations usually include data insertion,
modification, retrieval, and deletion. One of the main functions of the DBMS is to
provide support for data manipulation, such that the user can construct statements for data
manipulation. The part of the DML that involves data retrieval is called a query language. A query
language is a high-level, special-purpose language used to satisfy diverse requests for the retrieval of
data held in the database. Furthermore, DMLs can be classified as either procedural or non-
procedural, depending on their underlying retrieval constructs. Procedural DMLs require the user to
tell the system what data is needed and exactly how to retrieve it, while non-procedural
DMLs allow the user to state what data is needed rather than how it is to be retrieved.
- DMLs provide users with considerable data independence, by freeing the user from having to
know how data structures are internally implemented and what algorithms are required to retrieve
and possibly transform the data. RDBMSs usually include some form of non-procedural language for
data manipulation, for example SQL or QBE (Query-by-Example). Non-procedural DMLs are
normally easier to use and learn than procedural ones.
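To make the contrast concrete, a hedged sketch of a non-procedural request in SQL (names hypothetical): the user states only what is wanted, and the DBMS decides how to retrieve it.

    SELECT FName, LName
    FROM Student
    WHERE Dept = 'Computer Science'
    ORDER BY LName;   -- no mention of files, indexes, or access paths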
- We finish the section by looking at 4GLs. In actual fact, there is still some confusion as to what
constitutes a 4GL. In essence, a 4GL is non-procedural: the user defines what is to be done, not how.
A 4GL relies largely on much higher-level components known as fourth-generation tools.
4GLs are regarded as productivity-enhancing tools. They encompass the following:
- Presentation languages, such as query languages and report generators (SQL and QBE are
examples);
- Specialty languages, such as spreadsheet and database languages;
- Application generators that define, insert, update, and retrieve data from the database to build
applications;
- Very high-level languages that are used to generate application code.

Other types of 4GL tools are: form generators, report generators, graphics generators, and application
generators.

2.4 DATA MODELS

- The terms data models and database models are often used interchangeably. Nevertheless,
there is a subtle difference between the two terminologies.
- A data model can be described as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization. On the
other hand, a database model is mainly used to refer to the implementation of a data model in a
specific database system.
- The essence of data modeling, which bridges the gap between real-world objects and the
database that resides in the computer, stems from the fact that designers, programmers, and end-
users view data in different ways. The different views of the same data can lead to database designs
that do not reflect an organisation’s actual operation, thus failing to meet end-user needs and data
efficiency requirements.
- Significantly, a model is a representation of `real world' objects and events, and their
associations. It is an abstraction that concentrates on the essential, inherent aspects of an
organization and ignores the accidental properties. Generally, a model represents its subject
faithfully enough that conclusions drawn from the model are applicable to the real-world scenario.
Thus, a data model represents the organisation itself, and is required to provide the basic concepts
and notations that will allow database designers and end-users to unambiguously and accurately
communicate their understanding of the organizational data, together with its associations.
- While it is a matter of common knowledge that hardly any two designers will produce the same
model to solve the same problem, what matters is that a correct model has to meet all the end-user
requirements. A data model, ready for implementation, should contain the following components:
1. A description of the data structure that will store the end-user data
2. A set of enforceable rules to guarantee the integrity of the data, and
3. A data manipulation methodology to support the real-world data transformation.
- In order to understand the importance of data models, imagine building a house without a
blueprint (a building plan). It is highly unlikely that one can create a good database without first creating an
appropriate data model. Just as it is not possible to live in a building plan, one cannot draw the required data
out of the data model, which is simply an abstraction. Importantly, data models facilitate interaction
among the designer, the application programmer, and the end-user. Essentially, a data model can
foster improved understanding of the organization for which the database design is developed; data
models are a communication tool.
- There are three common, related data models: the external data model, the
conceptual data model, and the internal data model, which together reflect the ANSI-SPARC architecture.
- The external data model is used to represent each user's view of the organization, which is
sometimes called the Universe of Discourse (UoD).

Note: ANSI – American National Standards Institute, which establishes industry standards;
SPARC – Standards Planning and Requirements Committee.
- The conceptual data model is used to represent the logical (or community) view, which is DBMS-
independent; and
- The internal data model is used to represent the conceptual schema in such a way that it can
be understood by the DBMS.
- Besides the models that bear on the ANSI-SPARC architecture, other data model proposals
include:

(i) Object-based Data Models:


These models use concepts such as entities, attributes, and relationships, which are regarded in
some literature as the basic building blocks of all data models. In any case, an entity is a distinct object
(a person, place, thing, concept, event) in the organization that is to be represented in the database.
An attribute is a property that describes some aspect of the object that is intended for recording.
While a relationship is an association between entities.
Some of the more common types of object-based data model are;
- Entity-Relationship (E-R)
- Semantic
- Functional
- Object-oriented
- While the E-R model is very popular for database design, the object-oriented model extends
the definition of an entity to include not only the attributes that describe the state of the object but
also the actions that are associated with the object, that is, its behaviour. The object is, therefore,
said to encapsulate both state and behaviour [we shall treat E-R modeling subsequently].

(ii) Record-Based Data Models


In these models, the database consists of a number of fixed-format records, possibly of differing
types. Each record type defines a fixed number of fields, each typically of a fixed length. There are
three principal types of record-based logical data model: the relational data model, the network data
model, and the hierarchical data model.
- The hierarchical and network data models were developed long before the relational data
model, and are associated with traditional file processing concepts. The relational data model is
based on the concept of mathematical relations. In the relational model, data and relationships are
represented as tables, each of which has a number of columns with unique names. For example, a
relational schema for part of an examination system, showing student details:

StudentNo | FName | LName | Sex | DOB | Dept. | …
(each row of the table holds one student's details)

(iii) Physical Data Models:


These models describe how data is stored in the computer, representing information such as
record structures, record orderings, and access paths. There are not as many physical data models as
logical data models; the most common ones are the unifying model and the frame model.

(iv) Conceptual Model


- The conceptual schema is usually regarded as the `heart' of the database design, in a three-
level architecture. This model supports all the external views and is, in turn, supported by the internal
schema. The conceptual schema is required to be a complete and accurate representation of the data
requirements of an organization. Otherwise, some relevant organizational information will either be
misrepresented or missing entirely, resulting in difficulty implementing some of the external views.
- Conceptual modeling, or conceptual database design, is the process of constructing a model of
the information used in an enterprise that is independent of implementation details, such as the target
DBMS, application programs, programming languages, or any other physical considerations. In some
literature, conceptual models are also treated/regarded as logical models.

2.5 RELATIONAL DATABASE MODELS


- The relational model was first proposed by E. F. Codd in his seminal paper published in 1970
and titled "A relational model of data for large shared data banks". The relational model's objectives
were specified as follows:
(i) To allow a high degree of data independence, such that application programs are not
affected by modifications to the internal data representation, particularly by changes to file
organizations, record orderings, or access paths.
(ii) To provide substantial grounds for dealing with data semantics, consistency, and redundancy
problems. This is by virtue of the concept of normalisation (to be discussed), which was introduced in
the seminal paper, and pertains to relations that have no repeating groups.
(iii) To enable the expansion of set-oriented data manipulation languages. The set-oriented model was
proposed earlier than the relational model, in 1968, by D. L. Childs in a paper titled "Feasibility of a set-
theoretical data structure".
- The interest and investment in the spin-offs of the relational model have spiraled
beyond the conjectures of its originators, to the extent that Relational Database Management
Systems (RDBMSs) have become the dominant data-processing software in use today. This began with the
development of the prototype relational DBMS, System R, by IBM in 1976, which was deemed to prove
the practicality of the relational model by providing an implementation of its data structures and
operators, while also addressing implementation concerns of transaction management, concurrency
control, recovery techniques, query optimization, data security and integrity, human factors, and user
interfaces. Furthermore, the System R project led to two major developments:
(i) The development of the Structured Query Language (SQL), which has become the formal
International Organisation for Standardization (ISO) standard and the default language for RDBMSs:
an impetus for its popularity.
(ii) The production of various commercial RDBMS products during the late 1970s and the 1980s, e.g.
DB2 and SQL/DS from IBM and Oracle from Oracle Corporation.
- Alongside the development of System R was the INGRES (Interactive Graphics Retrieval
System) project at the University of California, Berkeley.
- Now there are several hundred commercial RDBMSs for both mainframe and PC
environments, as already stated in the history section; PC-based RDBMSs include Office Access and Visual
FoxPro from Microsoft, InterBase from CodeGear, and R:Base from R:Base Technologies.

The Relational Data Structure:


In the relational model, relations are used to hold information about the objects to be
represented in the database. A relation is a table with columns and rows. Thus, a relation is
represented as a two-dimensional table in which the rows of the table correspond to individual
records and the table columns correspond to attributes. Attributes can appear in any order and the
relation will not change, and will convey the same meaning.
- An RDBMS requires only that the database be perceived by the user as tables. However, this
perception applies only to the logical structure of the database: that is, the external and conceptual
levels of the ANSI-SPARC architecture. It does not apply to the physical structure of the database,
which is implemented using a variety of storage structures [what we will discuss later].
- Having mentioned attributes, an attribute is a named column of a relation, for example
fName, LName, RegNo, Sex, DOB, BranchNo, etc. Other terminologies of interest in the relational model
are:
- Domain – which is a set of allowable values for one or more attributes. Domains are important
features of the relational model; every attribute in a relation is defined on a domain, while domains
may differ for each attribute, two or more attributes may be defined on the same domain. The
domain concept is important because it allows the user to define the meaning and source of values
that attributes can hold in a common place. As a result, more information is available to the system
when it undertakes the execution of a relational operation, and operations that are semantically
incorrect can be avoided. For example, a student name cannot be compared with a telephone
number.

Illustration:
Attribute | Domain Name    | Meanings                              | Domain Definition
BranchNo  | Branch Numbers | Set of all possible branch numbers    |
Street    | Street Names   | Set of all street names in Nsukka     | Char: size 25
DOB       | Dates of Birth | Possible values of staff birth dates  | Date, range from 1-Jan-1970, format: dd-mm-yy
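Standard SQL can make the domain concept explicit with CREATE DOMAIN (supported, for example, in PostgreSQL; the names and the date range below are hypothetical):

    CREATE DOMAIN StreetName AS VARCHAR(25);     -- set of all street names
    CREATE DOMAIN BirthDate  AS DATE
        CHECK (VALUE >= DATE '1970-01-01');      -- possible staff birth dates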

- Tuple – a tuple is a row of a relation. Precisely the elements of a relation are the rows or tuples
in the table. The structure of a relation, together with a specification of the domains and any other
restriction on possible values, is sometimes called its intension, which is usually fixed, unless the
meaning of a relation is changed to include additional attributes. The tuples are called the extension
(or state) of a relation, which changes over time.
- Degree – the degree of a relation is the number of attributes it contains. This implies that each
tuple of an n-ary relation contains n values. A relation of degree one is called a unary relation,
degree two binary, degree three ternary, and so on. The degree of a relation is a property of the intension of
the relation.
- Cardinality – the cardinality of a relation is the number of tuples it contains. Cardinality
changes as tuples are added or deleted. The cardinality is a property of the extension of the relation
and is determined from the particular instance of the relation at any given moment.
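A hedged one-line SQL illustration (table name hypothetical): the current cardinality of a relation is simply its row count.

    SELECT COUNT(*) FROM Student;   -- cardinality of the Student relation at this moment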
Finally, we look at the definition of a relational database.
- Relational database – a collection of normalized relations with distinct relation
names. This implies that a relational database consists of relations that are appropriately structured; this
appropriateness is called normalization, a paradigm we shall discuss subsequently.
- Owing to the variety of these terminologies in different texts, we present the alternatives to
the ones already defined.
Formal terms Alternative 1 Alternative 2
Relation Table File
Tuple Row Record
Attribute Column Field

- Furthermore, we note that, as far as the concept of a mathematical relation is concerned, the
Cartesian product of, say, two sets D1 and D2 is the set of all ordered pairs such that the
first element is a member of D1 and the second element is a member of D2. Any subset of this
Cartesian product is a relation. For example, a relation R could be such that
R = {(x, y) | x ∈ D1, y ∈ D2}

Demonstration:
D1 = {2, 4} and D2 = {1, 3, 5}
D1 × D2 = {(2,1), (2,3), (2,5), (4,1), (4,3), (4,5)}
And R can be:
R = {(2,1), (4,1)}, coming from the rule which says
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}

- In terms of database relations, a relation schema is a named relation defined by a set of
attribute and domain name pairs. Let A1, A2, …, An be attributes with domains D1, D2, …, Dn. Then the
set {A1 : D1, A2 : D2, …, An : Dn} is a relation schema. A relation R defined by a relation schema S is a set
of mappings from the attribute names to their corresponding domains.
Thus, relation R is a set of n-tuples
(A1 : d1, A2 : d2, …, An : dn) such that d1 ∈ D1, d2 ∈ D2, …, dn ∈ Dn. Each element in the n-tuple consists of an
attribute and a value for that attribute.
Recall that in a relation, a table, the attributes are usually the column headings, and the tuples
the rows, having the form (d1, d2, …, dn), where each value is taken from the appropriate domain.
Thus, we consider a relation in the relational model as any subset of the Cartesian product of the
domains of the attributes.
A table is simply a physical representation of such a relation. Any specific occurrence of the
relation is referred to as a relation instance.
- Similarly, a relational database has a schema. In this case, a relational database schema is a set
of relation schemas, each with a distinct name. If R1, R2, …, Rn are a set of relation schemas, then we
can write the relational database schema, or simply relational schema, R, as
R = {R1, R2, …, Rn}.
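In SQL terms, a relational database schema corresponds roughly to a named schema grouping several table (relation) definitions; a minimal sketch (all names hypothetical):

    CREATE SCHEMA Examination;
    CREATE TABLE Examination.Student (RegNo CHAR(10) PRIMARY KEY, FName VARCHAR(30));
    CREATE TABLE Examination.Course  (CourseNo CHAR(6) PRIMARY KEY, Title VARCHAR(60));
    -- here R = {Student, Course} plays the role of R = {R1, R2, ..., Rn}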

NB: We shall study properties of relations when we treat ER modeling prior to the section on
normalization.
- Finally, we can assert that we have touched on most of the concepts and terminologies
pertaining to database architecture, structure, and languages, together with those of data models
and relational database models. More concepts will be revealed in the subsequent sections, alongside
their meanings, as they become necessary.
LECTURE THREE
FUNCTIONS AND COMPONENTS OF DBMS
Outline:
- Database environment
- Functions of a DBMS
- Components of a DBMS
- Web Services and Service-Oriented Architectures

3.1 DATABASE ENVIRONMENT

- From our previous discussion, we saw that the DBTG proposal by CODASYL in 1971 was
not formally adopted by ANSI. Recall that the DBTG proposal recognized the need for a two-
level approach, with a system view called the schema and user views called subschemas.
- In spite of the specifications of the DBTG proposal, ANSI's Standards Planning and
Requirements Committee (SPARC), otherwise called ANSI/X3/SPARC, produced a similar
terminology and architecture in 1975.
- The ANSI-SPARC architecture addresses the need for a three-level approach with a system
catalog, i.e. the specification/list of items that comprise the database environment.
- Even though the ANSI-SPARC model did not become a standard, it still
provides a basis for understanding some of the functionality of a DBMS, and it concentrated on
the need for an implementation-independent layer to isolate programs from underlying
representational issues.
- The fundamental point of the ANSI-SPARC report is the identification of
three levels of abstraction, i.e. the three levels at which data items can be described. These
three levels give rise to a three-level architecture made up of an external, a conceptual, and an
internal level.
- These three levels are such that the external level describes
how users perceive the data; the internal level shows the way the DBMS and the operating
system perceive the data, in terms of the actual storage of data using the appropriate data
structures and file organizations; and the conceptual level provides both the mapping and
the desired independence between the external and internal levels.
The three levels are shown in the following figure:

[Figure: The three-level ANSI-SPARC architecture. Users' views (View 1, View 2, …, View n) sit at the external level; a single conceptual schema sits at the conceptual level; an internal schema sits at the internal level; and beneath it lies the physical data organisation in the database.]
- The fundamental objective of the three-level architecture of the database environment is to
separate each user's view of the database from the way the database is physically represented.
- The reasons for this all-important separation of concerns include the following:
1. Access to the same data (consistency, integrity): each user should be able to access the same
data, but can enjoy a different customized view of the data. There is also the provision for each
user to change the way he/she views the data, without necessarily affecting other users.
2. Database independence: this implies that users should not have to deal directly with physical
database storage details, such as indexing or hashing. In other words, a user's interaction with
the database should be independent of storage considerations. This enhances usability, as
users are not compelled to deal with rather technical issues.
3. Flexibility: This implies that the DBA should be able to change the database storage structures
without affecting the user’s views. Also, the DBA should be able to change the conceptual
structure of the database without affecting all users.
4. Maintainability: This implies that the internal structure of the database should be unaffected
by changes to the physical aspects of storage, such as change over to a new storage device.
Let us look at the three-level ANSI-SPARC architecture in a little more detail:
1. The External Level:
As stated earlier, this level is the users' view of the database. It describes the part of the
database that is relevant to each user.
- The external level consists of a number of different external views of the database, in such a
way that each user's real-world view is represented in a form familiar to that user.
- An external view includes only those entities, attributes, and relationships in the real world
that the user is interested in. Other entities, attributes, or relationships that are not of interest
may be represented in the database, but the user will be unaware of them. We shall see
examples when we discuss modeling.
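In relational systems, an external view is commonly realized with the SQL CREATE VIEW statement, deriving a user-specific subset from conceptual-level tables; a hedged sketch (names hypothetical):

    CREATE VIEW TranscriptClerkView AS
        SELECT RegNo, FName, LName, Dept
        FROM Student;   -- this user never sees attributes such as DOB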
2. Conceptual Level:
- This level is the community view of the database, and describes what data is stored in the database
and the relationships among the data.
- As this level represents the middle level in the three-level architecture, it contains the logical
structure of the entire database as seen by the DBA.
- It is, in principle, a complete view of the data requirements of the organization that is
independent of any storage considerations.
- The conceptual level represents the following:
(i) All entities, their attributes, and their relationships
(ii) The constraints on the data
(iii) Semantic information about the data
(iv) Security and integrity information.
- The conceptual level supports each external view, in that any data available to a user must be
contained in, or derivable from, the conceptual level. However, this level must not contain any
storage-dependent details. For instance, the description of an entity should contain only the data
types of attributes, for example integer, real, or character, and their lengths, such as the maximum
number of digits or characters, but not any storage considerations, such as the number of bytes
occupied.
3. Internal Level:
- This level covers the physical implementation of the database to achieve optimal runtime
performance and storage space utilization. Precisely, the internal level connotes the physical
representation of the database on the computer, and describes how the data is stored in the
database.
- In effect, the internal level covers the data structures and file organizations used to store data
on storage devices. It interfaces with the operating system access methods (file management
techniques for storing and retrieving data records) to place the data on the storage devices,
build indexes, retrieve the data, and so on.
- The internal level is concerned with such things as;
- Storage space allocation for data and indexes;
- Record description for storage (with stored sizes for data items);
- Record placement
- Data compression and data encryption techniques.
- Below the internal level is the physical level that may be managed by the operating system
under the direction of the DBMS. However, the functions of the DBMS and the OS at the
physical level are not clear-cut and vary from system to system. Some DBMSs take advantage of
many of the OS access methods, while others use only the most basic ones and create their own
file organizations. The physical level below the DBMS consists of items that only the OS knows,
such as exactly how the sequencing is implemented and whether the fields of internal records
are stored as contiguous bytes on the disk.
Important Notes:
- The overall description of the database is called the database schema.
- There are three different types of schema in the database, and these are defined according to
the levels of abstraction of the three-level ANSI-SPARC architecture.
- At the highest level are the multiple external schemas (also called subschemas) that correspond to
different views of the data.
- At the conceptual level, the conceptual schema describes all the entities, attributes, and
relationships, together with integrity constraints.
- At the lowest level is the internal schema, which is a complete description of the internal model,
containing the definitions of stored records, the methods of representation, the data fields,
and the indexes and storage structures used. There is only one conceptual schema and one
internal schema per database.
- The DBMS is responsible for mapping between these three types of schema. The DBMS must
check the schemas for consistency; in other words, the DBMS must check that each external
schema is derivable from the conceptual schema, and it must use the information in the
conceptual schema to map between each external schema and the internal schema.
- The conceptual schema is related to the internal schema through a conceptual/internal mapping,
which enables the DBMS to find the actual record, or combination of records, in physical storage
that constitutes a logical record in the conceptual schema, together with any constraints to be
enforced on the operations for that logical record.
- The conceptual/internal mapping also allows differences in entity names, attribute names,
attribute order, data types, and so on to be resolved.
- Finally, each external schema is related to the conceptual schema by an external/conceptual
mapping. This mapping enables the DBMS to map names in the user's view to the relevant
part of the conceptual schema. [You can find examples.]
- Furthermore, it is worthy of note that a major objective for the three level architecture is to
provide data independence. This means that upper levels are unaffected by changes to lower
levels. There are two kinds of data independence; logical and physical.
- Logical data independence describes the immunity of the external schemas to changes in the
conceptual schema, such as the addition or removal of entities, attributes, or relationships.
Such changes should be possible without having to change existing external schemas or
rewrite application programs. Clearly, the users for whom the changes have been made need
to be aware of them, but what is important is that other users should not be.
- Physical data independence describes the immunity of the conceptual schema to changes in
the internal schema. This implies that changes to the internal schema, such as using different
file organizations or storage structures, using different storage devices, or modifying indexes or
hashing algorithms, should be possible without having to change the conceptual or external
schemas. The only effect noticeable to users should be a change in performance; in fact,
performance degradation is the most common reason for internal schema changes. The
following figure illustrates where each type of data independence occurs within the three-level
architecture.
    External        External    ...    External
     schema          schema             schema
         \              |               /
          External/Conceptual Mapping        }  Logical data independence
                        |
                Conceptual Schema
                        |
          Conceptual/Internal Mapping        }  Physical data independence
                        |
                 Internal Schema
- While the three-level ANSI-SPARC architecture guarantees data independence, there are still
concerns about efficiency. To improve efficiency, given the overhead of the two-stage mapping, the
model allows the by-passing of the conceptual schema, thus providing a direct external-to-internal
schema mapping; but this reduces data independence, to the extent that changes in the internal
schema can then affect the external schema and any dependent application.
3.2 Functions of a DBMS
- Recall that the relational data model was based on the paper by E. F. Codd in 1970. Codd, in his
Turing Award lecture published in 1982 as "Relational Database: A Practical Foundation for
Productivity", listed eight services that should be provided by any full-scale DBMS. Those, and
some others to be discussed here, make up the functions of the DBMS that follow:
1. Data Storage, Retrieval and Update
- This is the fundamental function of a DBMS, and implies that a DBMS must furnish users with
the ability to store, retrieve, and update data in the database. To achieve this purpose, the
DBMS should hide the internal physical implementation details (such as file organization and
storage structures) from the user.
2. A User-Accessible Catalog
- A key feature of the ANSI-SPARC architecture is the recognition of an integrated system catalog
to hold data about the scheme, users, applications, etc. This implies that the DMBS must furnish a
catalog in which description of data items are stored which is accessible to users.
Typically, the system catalog stores the following:
- names, types, and sizes of data items;
- names of relationships;
- integrity constraints on the data;
- names of authorized users who have access to the data;
- the data items that each user can access and the types of access allowed, e.g. insert, update,
delete, or read access;
- external, conceptual, and internal schemas and the mappings between the schemas;
- usage statistics, such as the frequencies of transactions and counts of the number of accesses
made to objects in the database.
In fact, the system catalog is one of the fundamental components of the system. Some of the benefits
of a system catalog include:
- Information about data can be collected and stored centrally. This helps to maintain control
over the data as a resource.
- Definition of what the data means.
- Simplification of communication, by recording the meaning of data items and identifying the
users and their access rights.
- Ease of identification of redundancy and inconsistencies.
- Records of database changes; and
- Enforcement of security and integrity rules, together with audit trails.
NB: Sometimes, the terms system catalog and data dictionary are used interchangeably.
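As an illustration, many relational DBMSs expose the system catalog through queryable views; the following is a minimal sketch using the standard SQL information_schema (the staff table name is an assumption for illustration):

    -- Names, data types, and sizes of the columns of a (hypothetical) staff table
    SELECT column_name, data_type, character_maximum_length
    FROM information_schema.columns
    WHERE table_name = 'staff';

    -- Privileges that users hold on the same table
    SELECT grantee, privilege_type
    FROM information_schema.table_privileges
    WHERE table_name = 'staff';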
3. Transaction Support
By definition, a transaction is a series of actions carried out by a single user or application
program, which accesses or changes the contents of the database. To this effect, a DBMS must
provide a mechanism that will ensure either that all the updates corresponding to a given transaction
are made or that none of them is made.
4. Concurrency Control Services
- This range of services demands that the DBMS provide a mechanism to ensure that the
database is updated correctly when multiple users are updating it concurrently. One important
objective of using a DBMS is to enable many users to access shared data concurrently. Concurrent
access is relatively straightforward when all users are only reading data, but when two or more users
access the database simultaneously and at least one of them is performing updates, there may be
interference that results in inconsistencies. Thus, the DBMS is required to ensure that when multiple
users are accessing the database, interference cannot occur.
5. Recovery Services
These services imply that the DBMS is required to provide a mechanism for recovering the
database in the event that the database is damaged in any way. The failure of any transaction
requires that the database be returned to a consistent state. Failures may be the result of a
system crash, media failure (such as a disk head crash), a hardware or software error causing the
DBMS to stop, or a user action, such as aborting a transaction due to an error detected by the user. In
all these cases, the DBMS is required to provide a mechanism to restore the database to a consistent
state.
6. Authorization Services
- The DBMS is required to ensure that only authorized users can access the database. The term
“security” refers to the protection of the database against unauthorized access, either intentional or
accidental. The DBMS is expected to ensure that the data is secure, using appropriate mechanisms.
7. Support of Data Communication
This requires that a DBMS must be capable of integrating with communication software. Most
often, users operate from remote workstations, and have to communicate with the host of the DBMS
over a network. Thus, the DBMS receives requests as communication messages and responds in a
similar way. All such transmissions are handled by a data communication manager (DCM). Although
the DCM is not necessarily part of the DBMS, the DBMS is required to be capable of being integrated
with a variety of DCMs if the system is to be commercially viable. It is essential that users (on PCs and
workstations) should be able to access a centralized database from remote locations.
8. Integrity Services
The term "database integrity" refers to the correctness and consistency of stored data. It is
another type of database protection, complementary to security. To this end, the DBMS is required to
provide a means to ensure that both the data in the database and changes to the data follow certain
rules. Integrity is closely related to security, and is concerned with the quality of data itself. Integrity
is usually expressed in terms of constraints, which are consistency rules that the database is not
permitted to violate. For example, we may wish to specify a constraint that no student is allowed to
take more than 14 courses in one academic session in a student database. Therefore, the DBMS has to
check whether this limit is exceeded when assigning courses to students.
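As a minimal sketch, simpler integrity rules can be declared directly in SQL as constraints (all table and column names here are assumptions; a cross-row rule like the 14-course limit would typically require a trigger or an application-level check rather than a simple column constraint):

    CREATE TABLE course (
        courseNo VARCHAR(8)  PRIMARY KEY,
        title    VARCHAR(40) NOT NULL,
        units    SMALLINT    CHECK (units BETWEEN 1 AND 6)   -- domain (CHECK) constraint
    );

    CREATE TABLE enrolment (
        studentNo   VARCHAR(10) NOT NULL,
        courseNo    VARCHAR(8)  NOT NULL REFERENCES course(courseNo),  -- referential integrity
        acadSession CHAR(9)     NOT NULL,
        PRIMARY KEY (studentNo, courseNo, acadSession)                 -- entity integrity
    );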
9. Services to Promote Data Independence
A DBMS must include facilities to support the independence of programs from the actual
structure of the database. Recall that data independence is normally achieved through a view or
subschema mechanism. Physical data independence is generally easier to achieve than logical data
independence: the addition of a new entity, attribute, or relationship can usually be accommodated,
but not their removal. In some systems, any change to an existing component of the logical structure
is prohibited.
10. Utility Services
A DBMS is required to provide a set of utility services. Utility programs help the DBA to
administer the database effectively. Some utilities operate at the internal level and can be provided
only by the DBMS vendor. Examples of utilities provided by the vendor include:
- Import facilities, to load the database from flat files, and export facilities, to unload the
database to flat files.
- Monitoring facilities, to monitor database usage and operation.
- Statistical analysis programs, to extract performance or usage statistics.
- Index reorganization facilities, to reorganize indexes and their overflows.
- Garbage collection and reallocation, to remove deleted records physically from the storage
devices, to consolidate the space released, and to reallocate it where it is needed.
3.3 Components of a DBMS
Because DBMSs are highly complex and sophisticated pieces of software that provide the
functionalities discussed, it is not easy to generalize the component structure of a DBMS, as it varies
greatly from system to system. However, we present a rather simplified view of the common
component structure of a DBMS and the relationships among the components, using the figure that
follows:
   Programmers            Users               DBA
       |                    |                  |
   Application           Queries          Database Schema
    Programs                |                  |
       |                    |                  |
  --------------------- DBMS boundary ---------------------
       |                    |                  |
      DML                 Query               DDL
   Preprocessor         Processor           Compiler
       |                    |                  |
    Program             Database            Catalog
   Object Code           Manager            Manager
                            |
                       File Manager
                            |
                      Access Methods
                            |
                      System Buffers
                            |
               Database and System Catalog
- As depicted in the above figure, a DBMS is partitioned into several software components (or
modules), each of which is assigned a specific operation. Notably, some of the functions of the
DBMS are supported by the underlying operating system. However, the operating system
provides only basic services, and the DBMS must be built on top of it. Therefore, the design of a
DBMS must take into account the interface between the DBMS and the operating system.
- The figure shows how the DBMS interfaces with other software components, such as user
queries and access methods (file management techniques for storing and retrieving data
records). In accordance with the figure, the components of the DBMS are:
* Query Processor: The major DBMS component that transforms queries into a series of low-level
instructions directed to the database manager.
* Database Manager (DM): which interfaces with user-submitted application programs and
queries. The DM accepts queries and examines the external and conceptual schemes to
determine what conceptual records are required to satisfy the request. The DM then places a
call to the file manager to perform the request. Within the DM are subcomponents that make
it possible for the DM to function. These software components (subcomponents) include;
authorization control, command processor, integrity checker, query optimizer, transaction
manager, scheduler, recovery manager, buffer manager, etc.
* File Manager:
The file manager manipulates the underlying storage files and manages the allocation of storage
space on disk. It establishes and maintains the list of structures and indexes defined in the
internal schema. If hashed files are used, it calls on the hashing functions to generate record
addresses. However, the file manager does not directly manage the physical input and output
of data. Rather, it passes the requests on to the appropriate access methods, which either read
data from or write data into the system buffer (or cache).
* DML Preprocessor:
This module converts DML statements embedded in an application program into standard
function calls in the host language. The DML preprocessor must interact with the query processor to
generate the appropriate code.
* DDL Compiler:
This compiler converts DDL statements into a set of tables containing metadata. These tables
are then stored in the system catalog while control information is stored in data file headers.
* Catalog Manager:
The catalog manager manages access to and maintains the system catalog. The system catalog
is accessed by most DBMS components.
NB: With an understanding of the fundamental components of a DBMS, it becomes possible to relate
easily to the components of most commercial DBMSs, such as Oracle, which is based on the client-
server architecture. These are discussed in the recommended texts and in most literature on the subject.
3.4 Web Services and Service-Oriented Architectures
- A web service is a software system designed to support interoperable machine-to-machine
interaction over a network. Though the Internet has allowed companies to provide a wide
range of services to users, sometimes called B2C (Business to Consumer), web services allow
applications to integrate with other applications across the Internet and may be a key
technology that supports B2B (Business to Business) interaction.
- Unlike other web-based applications, web services have no user interface and are not aimed at
web browsers. Web services instead share business logic, data, and processes through a
programmatic interface across a network. In this way, it is the applications that interface, not
the users. Developers can then add the web service to a web page (or an executable
program) to offer specific functionality to users.
- Examples of Web services include:
* Microsoft Virtual Earth Web Services, which offer static map images (gif, jpeg, and png), direct
map file access, search functionality, geocoding, reverse geocoding, and routing. Microsoft
MapPoint Web service provides access to location based services, such as maps, driving
directions, and proximity searches.
* Amazon S3, which is a simple Web services interface that can be used to store and retrieve
large amounts of data, at any time, from anywhere on the Web. It gives any developer access
to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon
uses to run its own global network of Websites. Charges are based on a "pay as you go"
policy, at about $0.15 per GB for the first 50 TB/month of storage used, as at 2011.
* Geonames, which provides a number of location-related web services; for example, to return a
set of Wikipedia entries as XML documents for a given place name, or to return the time zone for a
given latitude/longitude.
* DOTS Web services from Service Objects, an early adopter of web services, provide a range of
services such as company information, reverse telephone number lookup, email address
validation, weather information, and IP address-to-location determination.
The key to the web services approach is the use of widely accepted technologies and standards,
such as:
* XML (eXtensible Markup Language)
* SOAP (Simple Object Access Protocol), which is a communication protocol for exchanging
structured information over the Internet and uses a message format based on XML. It is both
platform- and language-independent.
* WSDL (Web Services Description Language) protocol, which is again based on XML, and used to
describe and locate a Web service.
* UDDI (Universal Description, Discovery, and Integration) protocol, which is a platform-independent,
XML-based registry for businesses to list themselves on the Internet. It was designed to be
interrogated by SOAP messages and to provide access to WSDL documents describing the
protocol bindings and message formats required to interact with the web services listed in its
directory.
- From the database perspective, web services can be used both from within the database (to
invoke an external web service as a consumer), and the web service itself can access its own
database (as a provider) to maintain the data required to provide the requested service. The
details of SOAP, WSDL, and UDDI are outside the scope of this course.
* Service-Oriented Architecture (SOA)
In brief, a service-oriented architecture is a business-centric software architecture for building
applications that implement business processes as sets of services, published at a granularity relevant
to the service consumer. Services can be invoked, published, and discovered, and are abstracted
away from the implementation behind a single, standards-based form of interface.
- Some common SOA principles that provide a unique design approach for building web services
for SOA are: loose coupling, reusability, contract, abstraction, composability, autonomy,
statelessness, discoverability, etc.
LECTURE FOUR
FILE DESIGN AND ACCESS PATH
Outline:
- Database Design
- Entity Relationship (ER) Modelling
- Normalisation
- Introduction to SQL – Database Access
- Analysis of Open Database Connectivity Standard.
4.1 DATABASE DESIGN
- Generally speaking, the failures associated with most software projects (small to large), which are
commonly attributed to the lack of a complete requirements specification and an appropriate
development methodology, together with poor decomposition of the design into manageable
components, have prompted the evolution and adoption of a structured approach to the development
of software systems. This approach is generally referred to as the Information Systems Lifecycle (ISLC)
or the Software Development Lifecycle (SDLC).
- In the case of database systems design, the lifecycle is more specifically referred to as the
Database System Development Lifecycle (DSDLC). Notably, the lifecycle of an organization ’s
information system is inherently linked to the lifecycle of the database system that supports it.
- Typically, the stages in the lifecycle of an information system include: planning, requirements
collection and analysis, design, prototyping, implementation, testing, conversion, and operational
maintenance.
- Because a database system is a fundamental component of the larger organisational
information system, the database system development lifecycle is inherently associated with the
lifecycle of the information system.
- We note that for a small database system with a small number of users, the development
lifecycle need not be complex. However, when designing a medium to large database system with
tens of thousands of users, using hundreds of queries and application programs, the lifecycle can
become considerably complex.
- The figure that follows depicts the stages of the database system development lifecycle.
The stages are not strictly sequential; they involve some amount of repetition of previous stages
through feedback loops.
            Database Planning
                   |
            System Definition
                   |
     Requirements Collection and Analysis
                   |
            Database Design
      (Conceptual design -> Logical design -> Physical design;
       DBMS Selection (optional) and Application Design
       proceed in parallel with these phases)
                   |
     Prototyping (optional)    Implementation
                   |
        Data Conversion and Loading
                   |
                Testing
                   |
        Operational Maintenance
Assignment 1:
- Briefly discuss the main activities associated with each stage of the database system
development lifecycle suitable for medium to large database system.
- By definition, database design can be viewed as the process of creating a design that will
support the enterprise’s mission statement and mission objectives for the required database system.
- The two main approaches to the design of a database are referred to as the "bottom-up" and
"top-down" approaches. The bottom-up approach begins at the fundamental level of attributes (i.e.,
properties of entities and relationships), which, through analysis of the associations between
attributes, are grouped into relations that represent types of entities and relationships between
entities. Fundamentally, the process of normalization (to be discussed) represents a bottom-up
approach to database design. Normalization involves the identification of the required attributes and
their subsequent aggregation into normalized relations based on functional dependencies between
the attributes. The bottom-up approach is more suitable for simple databases with a relatively small
number of attributes.
- The top-down approach is a more appropriate strategy for the design of complex databases,
for which it is difficult to establish up front all the attributes to be included in the data models.
The top-down approach starts with the development of data models that contain a few high-level
entities and relationships, and then applies successive top-down refinement to identify lower-level
entities, relationships, and the associated attributes.
- Another approach to database design is the inside-out approach, which is related to the
bottom-up approach but differs by first identifying a set of major entities and then spreading out to
consider other entities, relationships, and attributes associated with those first identified. There is
also the mixed strategy approach, which uses both the bottom-up and top-down approaches for
various parts of the model before finally combining all parts together.
- Before proceeding to discuss ER modeling, we note the following;
(i) The conceptual database design is the process of constructing a model of the data used in an
enterprise, independent of all physical considerations.
(ii) The logical database design is the process of constructing a model of the data used in an
enterprise based on a specific data model, but independent of a particular DBMS and other
physical considerations.
(iii) The physical database design, being the third and final phase of the database design process, is
the process of producing a description of the implementation of the database on secondary
storage; it describes the base relations, file organizations, and indexes used to achieve efficient
access to the data, and any associated integrity constraints and security measures.
(iv) DBMS selection is the process of selecting an appropriate DBMS to support the database
system. Where no DBMS exists, a decision and selection can be made between the conceptual
and logical phases of database design. The main steps involved in selecting a DBMS
include: defining the terms of reference of the study, shortlisting two or three products,
evaluating the products, and recommending a selection and producing a report.
Comment:
We leave the general description of the stages to the discretion of students, as it forms the first
assignment (Assignment 1).
4.2 ENTITY RELATIONSHIP (ER) MODELLING
- The design of a database commences once the requirements collection and analysis stage
of the lifecycle is completed. To ensure a precise understanding of the nature of the data and how it is
used by the enterprise, there is the need for a model for communication that is non-technical and free
of ambiguities. A typical example of such a model is the Entity-Relationship (ER) model.
- The Entity-Relationship (ER) model is a top-down approach to database design that begins by
identifying the important data, called entities, and the relationships between the data that must be
represented in the model. Then, more details, such as the information to be held about the
entities and relationships (called attributes) and any constraints on the entities, relationships, and
attributes, are added to the model. ER modelling is, therefore, an important technique for
any database designer to master.
- Remarkably, entities are used to represent real-life objects and, therefore, the two terms are
used interchangeably. Entities are categorized according to types, where an entity
type is a group of objects with the same properties, which are identified by the enterprise as having
an independent existence. Thus, the basic concept of the ER model is the entity type, which has an
independent existence and may comprise objects with a physical (or "real") existence or objects with a
conceptual (or "abstract") existence. Note that the definition of entity types is subjective and usually
a matter at the discretion of designers, implying that different designers may identify different
entities (no formal definition exists).
- Each uniquely identifiable object of an entity type is referred to simply as an entity occurrence.
A database normally contains many different entity types, as we find in examples.
- In a typical database, the entities are often related in some way to one another,
termed a relationship. The entities that participate in a relationship are also known as participants,
and each relationship is identified by a name that describes it. The relationship name is
an active or passive verb; e.g. a STUDENT takes a CLASS, a PROFESSOR teaches a CLASS, a DEPARTMENT
employs a PROFESSOR, a DIVISION is managed by an EMPLOYEE, an AIRCRAFT is flown by a CREW, etc.
From these, we can identify the objects and the relationships.
- A relationship type is a set of meaningful associations between one or more participating entity
types, and is given a name that describes its function. A relationship occurrence, meanwhile, is a
uniquely identifiable association that includes one occurrence from each participating entity type.
- Conventionally, diagrams are used for the representation of relationship types. The diagrams
and the symbols used are also subjective and at the discretion of designers. In any case,
relationships are depicted with directions, where some only make sense in one direction, e.g.
BRANCH has STAFF makes more sense than STAFF has BRANCH. The number of participating entity
types in a relationship is called the degree of that relationship. Thus, the degree of a relationship
indicates the number of entity types involved; a relationship of degree two is called binary, degree
three ternary, and so on. An example of a ternary relationship, involving the entities STAFF, BRANCH,
and CLIENT:
    STAFF ------ Registers ------ BRANCH
                     |
                   CLIENT
- Also, a relationship of degree four is called quaternary.
- Sometimes, there exists a relationship type in which the same entity type participates more
than once, in different roles. This is normally termed a recursive relationship. For example,
consider a situation where a member of staff (the supervisee) is supervised by a supervisor who is also
a member of staff. This is depicted as follows:
         +------ Supervises ------+
         |   (role: supervisor)   |
         v                        |
       STAFF ---------------------+
         (role: supervisee)
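In relational terms, such a recursive relationship is commonly realized as a self-referencing foreign key; a minimal sketch, with assumed names:

    CREATE TABLE staff (
        staffNo      VARCHAR(5)  PRIMARY KEY,
        sName        VARCHAR(30) NOT NULL,
        supervisorNo VARCHAR(5)  REFERENCES staff(staffNo)  -- NULL where a member of staff has no supervisor
    );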
- In some other cases, two entities can be associated through more than one relationship. For
example, the STAFF and BRANCH entity types can be associated through two distinct relationships
called Manages and Has, shown as:
    STAFF ----- Manages ----->  BRANCH     (STAFF in role: manager of staff)
    STAFF <------- Has -------  BRANCH     (BRANCH in role: branch office)
- Entities are known and described by their properties. The particular properties of entity types
are called attributes. For example, a STAFF entity type may be described by the staffNO, name,
position, and salary attributes.
- The attributes hold values that describe each entity occurrence and represent the main part of
the data stored in the database. A relationship type that associates entities can also have attributes
similar to those of an entity type.
- We note that the set of allowable values for one or more attributes is called the attribute
domain. Also, attributes can be simple, composed of a single component with an independent
existence, or composite, composed of multiple components, each with an independent existence. For
example, the address attribute of the BRANCH entity with the value (100 Agbani Road, Enugu, Nigeria)
can be subdivided into street (100 Agbani Road), city (Enugu and Country (Nigeria) attributes.
The decision to model an attribute as composite or simple depends on the requirements of the
user views, which in the example refers to whether the address attribute should be treated as a single
simple attribute or subdivided as shown.
- Furthermore, attributes can be single-valued or multi-valued. They are single-valued when
they hold a single value for each occurrence of an entity type, and multi-valued otherwise. For
example, a BRANCH entity may have a telNO attribute that holds two values, such as 0141-339-2178
and 0141-339-4439.
- Sometimes, the values held by some attributes may be derived. A derived attribute is one that
represents a value derivable from the value of a related attribute, or set of attributes, not
necessarily in the same entity type. For example, the value of a courseDuration attribute of the
STUDENT entity (as might appear on a transcript) can be calculated from the yrStart and yrFinish
attributes of the same STUDENT entity type.
- Quite important, for sorting and searching purposes, is a method of uniquely
identifying each occurrence of an entity type. The minimal set of attributes that uniquely identifies
each occurrence of an entity type is called a candidate key.
- The primary key, meanwhile, is the candidate key that is selected to uniquely identify each
occurrence of an entity type; consider, for example, the use of a studentNO attribute to uniquely
identify a student in the STUDENT entity. It is observed that the choice of primary key for an entity is
based on considerations of attribute length, the minimal number of attributes required, and the
future certainty of uniqueness.
- Where there exist two candidate keys that can be used to identify each occurrence of an entity
type, such as staffNO and NIN (National Insurance Number) for a STAFF entity, staffNO can be used as
the primary key while NIN becomes an alternate key.
- In some cases, the key of an entity type is composed of several attributes, and is known as a
composite key: the attribute values together are unique for each entity occurrence, but not
separately. For example, an ADVERT entity can have propertyNO, newspaperName, advertDate, and
cost attributes, and there can exist several adverts in many newspapers on a given date. Therefore,
to uniquely identify each occurrence of the ADVERT entity requires a composite primary key made up
of the propertyNO, newspaperName, and advertDate attributes.
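A sketch of the ADVERT entity as a SQL table with a composite primary key (the data types are assumptions):

    CREATE TABLE advert (
        propertyNo    VARCHAR(8),
        newspaperName VARCHAR(40),
        advertDate    DATE,
        cost          DECIMAL(8,2),
        PRIMARY KEY (propertyNo, newspaperName, advertDate)  -- composite primary key
    );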
- By convention, in ER diagrams, entities are drawn as rectangles, where the first division
of each rectangle shows the entity name and the second division shows the attributes, starting with the
primary key. Any composite attribute is usually indented to the right of the attribute its subdivisions
concern.
- Entities are sometimes classified as strong or weak. A strong entity type is one that is
not existence-dependent on some other entity type, e.g. STUDENT, DEPARTMENT, COURSE. A
characteristic of a strong entity type is that each entity occurrence is uniquely identifiable using the
primary key attributes of that entity type; e.g., we can uniquely identify a student using studentNO.
A weak entity type, on the other hand, is one that is existence-dependent on some other entity type. A
characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only
the attributes associated with that entity type. (Students can find examples of weak entities in the texts.)
- Usually, constraints are placed on the entity types that participate in a relationship. The
constraints should reflect the restrictions on the relationships as perceived in the "real world";
examples are that each department must have students and that each course must have a lecturer.
The main type of constraint on relationships is called multiplicity, where multiplicity is the number
(or range) of possible occurrences of an entity type that may relate to a single occurrence of an
associated entity type through a particular relationship.
- Multiplicity constrains the way that entities are related. It is a representation of the policies (or
business rules) established by the user or enterprise. Ensuring that all appropriate constraints are
identified and represented is an important part of modeling an enterprise.
- The most common degree for relationships is binary. Binary relationships are generally
referred to as being one-to-one (1:1), one-to-many (1:*), or many-to-many (*:*).
Examples:
- A DEPARTMENT has a HEAD (1:1):

      DEPT  1 ------ has ------ 1  HEAD

- A LECTURER teaches COURSES (1:*):

      LECTURER  1 ------ teaches ------ *  COURSES

- DEPARTMENTs offer COURSES (*:*):

      DEPT  * ------ offers ------ *  COURSES
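As a minimal sketch of how such multiplicities are typically realized in a relational schema (table and column names are assumptions), the key of the "one" side becomes a foreign key on the "many" side, while a *:* relationship requires a separate linking relation:

    -- 1:* relationship: each course is taught by one lecturer; a lecturer may teach many
    CREATE TABLE lecturer (
        staffNo VARCHAR(5) PRIMARY KEY
    );
    CREATE TABLE course (
        courseNo   VARCHAR(8) PRIMARY KEY,
        lecturerNo VARCHAR(5) NOT NULL REFERENCES lecturer(staffNo)
    );

    -- *:* relationship: a separate linking relation with a composite key
    CREATE TABLE dept_course (
        deptNo   VARCHAR(5),
        courseNo VARCHAR(8) REFERENCES course(courseNo),
        PRIMARY KEY (deptNo, courseNo)
    );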
- Furthermore, multiplicity actually consists of two separate constraints, known as cardinality and
participation. Cardinality describes the maximum number of possible relationship occurrences
for an entity participating in a given relationship type, while participation determines whether all or
only some entity occurrences participate in a relationship. Where all entity occurrences are
involved in a particular relationship we have mandatory participation, and optional participation
otherwise.
- For reasons of space and time, we conclude this subsection with the requirements for
developing an ER diagram. Building an ERD usually involves the following activities;
* Create a detailed narrative of the organisation’s description of operations
* Identify the business rules based on the description of operations
* Identify the main entities and relationships from business rules.
* Develop the initial ERD
* Identify the attributes and primary keys that adequately describe the entities.
* Revise and review the ERD.
NB
(1) During the review process, it is likely that additional objects, attributes, and relationships will
be uncovered. Therefore, the basic ERM will be modified to incorporate the newly discovered ER
components. As a matter of fact, the review process is repeated until the end users and designers
agree that the ERD is a fair representation of the organization’s activities and functions.
(2) During the design process, the database designer does not depend simply on interviews to help
define entities, attributes and relationships. A surprising amount of information can be gathered by
examining the business forms and reports that an organization uses in its daily operations.
Assignment 2 (20 marks)
Create an ER model for each of the following descriptions:
(a) Each company operates four departments, and each department belongs to one company.
(b) Each department in (a) employs one or more employees, and each employee works for one
department.
(c) Each of the employees in (b) may or may not have one or more departments, and each
department belongs to one employee.
(d) Each employee in (c) may or may not have an employment history.
(e) Represent all the ER models described in (a), (b), (c), and (d) as a single ER model.
[4 Marks each]
4.3 NORMALIZATION
- The design of a database demands an accurate representation of the data, relationships
between the data, and constraints on the data that is pertinent to the enterprise. We already stated
that ER modeling is a model for communication that is non-technical and free of ambiguities, which
ensures a precise understanding of the nature of the data and how it is used by the enterprise.
Another database design technique is that of normalization.
- Normalization is a technique for producing a set of relations with desirable properties, given
the data requirements of an enterprise. The technique begins by examining the relationships (called
functional dependencies) between attributes. Normalization uses a series of tests (described as
normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of
suitable relations that supports the data requirements of the enterprise.
- It is important to note that the characteristics of a suitable set of relations include the
following:
* The minimal number of attributes necessary to support the data requirements of the
enterprise.
* Attributes with a close logical relationship (described as a functional dependency) are found in
the same relation.
* Minimal redundancy, with each attribute represented only once, with the important exception
of attributes that form all or part of foreign keys, which are essential for the joining of related
relations. Where, precisely, a foreign key is an attribute, or set of attributes, within one relation
that matches the candidate key of some (possibly the same) relation.
- The benefit of using a database that has a suitable set of relations is that it will be
easier for users to access and maintain the data, and the data will take up minimal storage space on
the computer. Herein lies the importance of the process of normalization.
- The process of normalization can be used as a standalone, bottom-up database design technique,
or as a validation technique to check the structure of relations that may have been created using a
top-down approach such as ER modelling. Whichever the approach, the common goal is to create a
set of well-designed relations that meet the data requirements of the enterprise.
- In order to further appreciate the problems of data redundancy and update anomalies which
normalization seeks to address, let us consider the following set of relations;
1. STAFF (staffNO, sName, Position, Salary, branchNO)
2. BRANCH (branchNO, bAddress)
3. STAFFBRANCH (staffNO, sName, Position, Salary, branchNO, bAddress)
- For relations described by the above attributes, we observe that the STAFFBRANCH relation
allows for redundant data: different members of staff, with their respective staff numbers, belonging
to the same branch require that both the branchNO and bAddress be repeated in their respective
tuples. This problem does not arise in the case of the STAFF and BRANCH relations. The implication
is that the STAFFBRANCH relation, having redundant data, will suffer from problems normally
regarded as update anomalies, which are classified as insertion, deletion, or modification anomalies.
- An important concept associated with normalization is functional dependency, which describes
the relationship between attributes in a relation. For example: if A and B are attributes of relation R, B
is functionally dependent on A (denoted by A → B) if each value of A is associated with exactly one
value of B. (A and B may each consist of one or more attributes.)
- Functional dependency is a property of the meaning or semantics of the attributes in a relation.
The semantics indicate how attributes relate to one another, and specify the functional dependencies
between attributes. When a functional dependency is present, the dependency is specified as a
constraint between the attributes. Normally, the term determinant refers to the attributes, or group
of attributes, on the left-hand side of the arrow of a functional dependency.
- Functional dependencies can be identified through discussions with end-users, or
documentation, or by the experience of the designers. Functional dependencies can also be used to
identify primary keys.
The Process of Normalization
- The process of normalization is a progressive one, and executed as a series of steps. Each step
corresponds to a specific normal form that has known properties. As normalization proceeds, the
relations become more restricted in format and also less vulnerable to update anomalies. There is, of
course, the First Normal Form, 1NF, the 2NF, the 3NF, the BCNF (Boyce-Codd Normal Form), and the
4NF and 5NF.
- The 1NF step takes unnormalized tables (UNF) and removes all repeating groups to generate
1NF relations, from which partial dependencies are removed to generate 2NF relations, and then
transitive dependencies are removed to reach 3NF. [We skip the details owing to the paucity of time
and space.]
NB: It is important to note the key issues in the process of normalization.
* Further work ==> illustrate the process of normalization; a brief sketch follows.
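As a brief sketch of the idea (not a full worked example), the redundant STAFFBRANCH relation above can be decomposed into the STAFF and BRANCH relations, with branchNO retained in STAFF as a foreign key (data types are assumptions):

    -- Redundant design: bAddress is repeated for every member of staff at a branch
    --   STAFFBRANCH (staffNO, sName, position, salary, branchNO, bAddress)

    -- Normalized design: branch details are stored once ...
    CREATE TABLE branch (
        branchNO VARCHAR(5)  PRIMARY KEY,
        bAddress VARCHAR(60) NOT NULL
    );

    -- ... and STAFF keeps only branchNO, a foreign key joining the two relations
    CREATE TABLE staff (
        staffNO  VARCHAR(5) PRIMARY KEY,
        sName    VARCHAR(30),
        position VARCHAR(20),
        salary   DECIMAL(9,2),
        branchNO VARCHAR(5) REFERENCES branch(branchNO)
    );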
4.4 INTRODUCTION TO SQL
- The Structured Query Language (SQL) is a language that has emerged from the
development of the relational model and has become the standard relational database language,
defined by ANSI in 1986 and adopted as an international standard by ISO in 1987. The latest release
of the standard is SQL:2008.
- The objectives of SQL are to allow users to:
* Perform basic data management tasks, such as the insertion, modification, and deletion of data
from the relations.
* Perform both simple and complex queries.
* Support portability, by conforming to some recognized standard, to the extent that the same
command structure and syntax can be used even when moving from one DBMS to another.
- Notably, SQL is an example of a transform-oriented language, or a language designed to use
relations to transform inputs into required outputs.
- The two major components of the SQL are;
* a Data Definition Language (DDL) for defining the database structure and controlling access to
the data.
* a Data Manipulation Language (DML) for retrieving and updating data.
[Comment]
- SQL has a reputation for being a relatively easy language to learn. Thus, it suffices to treat the
fundamentals here.
- SQL is a nonprocedural language, which implies that the user simply has to specify `what’
information is required, rather than `how’ to get it. In other words, SQL does not require the
specification of the access methods to the data.
* In line with most modern languages, SQL is essentially free format, implying that parts of
statements do not have to be typed at particular locations on the screen.
* The command structure consists of Standard English words such as CREATE TABLE, INSERT,
SELECT.
For example;
- CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2));
- INSERT INTO Staff VALUES ('SG16', 'Ugwu', 50000);
- SELECT staffNo, lName, salary
  FROM Staff
  WHERE salary > 10000;
- SQL can be used by a range of users, including database administrators (DBAs), management
personnel, application developers, and many other types of end-users.
- There are also the UPDATE and DELETE commands, and others supported by the language, as
illustrated below. [Students can do much more on their own.]
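For completeness, minimal sketches of the UPDATE and DELETE commands against the Staff table defined above:

    -- UPDATE: give every member of staff earning below 20000 a 5% raise
    UPDATE Staff
    SET salary = salary * 1.05
    WHERE salary < 20000;

    -- DELETE: remove the member of staff inserted earlier
    DELETE FROM Staff
    WHERE staffNo = 'SG16';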
4.5 ANALYSIS OF OPEN DATABASE CONNECTIVITY STANDARD
- Database connectivity refers to the mechanisms through which application programs connect
and communicate with data repositories. Database connectivity software, also known as database
middleware, provides an interface between the application program and the database.
- The data repository, also known as the data source, represents the data management
application, such as Oracle RDBMS, SQL Server DBMS, or IBM's DBMS, used to store the data
generated by application programs. A data repository could be located anywhere and can hold any
data type. For example, the data source could be a relational database, a hierarchical database, a
spreadsheet, or a text data file.
- Just as SQL has become the de facto standard data manipulation language, there is also the
need for a standard database connectivity interface that will enable applications to connect to data
repositories.
- The many different ways to achieve database connectivity include;
* Native SQL Connectivity (vendor provided)
* Microsoft’s Open Database Connectivity (ODBC), Data Access Objects (DAO), and Remote Data
Objects (RDO).
* Microsoft’s Object Linking and Embedding for Database (OLE-DB)
* Microsoft's ActiveX Data Objects (ADO.NET).
* Sun's Java Database Connectivity (JDBC).
- The Microsoft products are quite popular, supported by the majority of database vendors, and
form the backbone of Microsoft's Universal Data Access (UDA) architecture, which is a collection of
technologies used to access and manage data through a common interface. [Students can read more
about UDA, ODBC, etc.]
LECTURE FIVE
PROGRAMMING AND APPLICATION DEVELOPMENT IN D-BASE ENVIRONMENT
Outline
- Database Planning, Design, and Administration
- Design and Implementation of Distributed Systems
- Object-oriented and Object-relational systems
- Transaction management
- Concurrency and Recovery
- Data Warehousing and Data Mining

5.1 DATABASE PLANNING, DESIGN, AND ADMINISTRATION
- This section is closely related to the previous ones, particularly lecture 4, which discussed the
design of databases and the concept of the DSDLC. For the purposes of this course, the previous
discussion of database design is sufficient. Here, we shed a little more light on the concepts of
database planning and administration.
DATABASE PLANNING
- Database planning encompasses the activities that allow the stages of the database system
development lifecycle (DSDLC) to be realized as efficiently and effectively as possible. Thus, database
planning must be integrated with the overall IS strategy of the organization.
- There are three main issues involved in formulating an IS strategy. These are:
* identification of enterprise plans and goals with subsequent determination of information
system needs;
* Evaluation of current information systems to determine existing strengths and weaknesses.
* Appraisal of IT opportunities that might yield competitive advantage.
- An important first step in database planning is to clearly define the mission statement for the
database system. The mission statement defines the major aims of the database system, and is
normally articulated by the key drivers of the database project, such as the Director and/or owner. In
principle, a mission statement helps to clarify the purpose of the database system and provides a
clearer path towards the efficient and effective creation of the required database system.
- Further work: to show an illustration of the development of a database planning activity.
* Database Design
Comment: For database design kindly refer to section 4.1.
Further work should consider specific illustration/determination of all the stages of database design.
DATABASE ADMINISTRATION
Database administration may be seen as the series of on-going activities involving the planning,
design, evaluation, and monitoring of performance, together with any consequential modification of
the storage schema for purposes of improvement.
- The role involved in database administration is an amalgam of technical and managerial
activities, where the technical aspects of database administration involve the following areas of
operation:
* Evaluating, selecting, and installing the DBMS and related utilities
* Designing and implementing databases and applications.
* Testing and evaluating databases and applications
* Operating the DBMS, utilities, and applications.
* Training and supporting users
* Maintaining the DBMS, utilities, and applications
- In fact, many of the technical activities of database administration are logical extensions of the
managerial activities.
* Further work: to include more details of managerial activities, with illustrations if possible.
5.2 DESIGN AND IMPLEMENTATION OF DISTRIBUTED SYSTEMS
Comment: Already introduced in DDBMS. [skip]
5.3 OBJECT-ORIENTED AND OBJECT-RELATIONAL SYSTEMS
- These systems are evolving with the value proposition of providing adequate modeling
capabilities for the increasing complexity of database applications.
- By definition, an object-oriented data model (OODM) is a logical data model that captures the
semantics of objects supported in object-oriented programming. An object-oriented database
(OODB) is a persistent and sharable collection of objects defined by an OODM, while an object-
oriented database management system (OODBMS) is the manager of an OODB.
- The issue of finding precise definitions for these terminologies is still being debated in the
literature, and research in the area is on-going.
- Some of the advantages of OODBMSs include: enriched modeling capabilities, extensibility, a
more expressive query language, support for schema evolution and long-duration transactions,
improved performance, etc.; while some of the disadvantages include the lack of a universal data
model and the lack of experience and standards.
Comment: The design approach for OODBMS is beyond the scope of this course and is, therefore,
recommended for further work.
- Sometimes it is difficult to distinguish between an OODBMS and an ORDBMS. Nevertheless, the
concept of the ORDBMS can be seen as a hybrid of the RDBMS and the OODBMS, which takes
advantage of both concepts.
Comment: The essence of this note is simply to mention concepts that students should be aware exist.
5.4 TRANSACTION MANAGEMENT
- Of all the functions required of a DBMS, there are three closely related functions that are
intended to ensure that the database is reliable and remains in a consistent state. The three functions
are; transaction support, concurrency control services, and recovery services.
Transaction Management
- The term transaction refers to an action, or series of actions, carried out by a single user or
application program, that reads or updates the contents of the database. In this regard, therefore, a
transaction is a logical unit of work on the database. A transaction may be an entire program, a part
of a program, or a single statement (For example, the SQL statement, INSERT or UPDATE) and it may
involve any number of operations on the database.
* Further work: to include illustrations and properties of transactions; a minimal sketch follows.
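A minimal sketch of a transaction in SQL, assuming a hypothetical account table (the exact syntax for starting a transaction varies between DBMSs):

    START TRANSACTION;                    -- begin the logical unit of work
    UPDATE account SET balance = balance - 500 WHERE accountNo = 'A101';
    UPDATE account SET balance = balance + 500 WHERE accountNo = 'A202';
    COMMIT;                               -- make both updates permanent together
    -- On any failure before COMMIT, ROLLBACK undoes all changes, so either
    -- both updates happen or neither does.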
5.5 CONCURRENCY AND RECOVERY
- Concurrency control deals with the problems that can arise with concurrent access and the
techniques that can be employed to avoid the problems. By definition, concurrency control is the
process of managing simultaneous operations on the database without having them interfere with
one another.
- A major objective in developing a database is to enable many users to access shared data
concurrently. While there may be no problems when all users are performing read operations,
operations involving updates may cause interference that can result in inconsistencies. This
situation is akin to that in multi-user computer systems generally.
- The potential problems that can arise by concurrency include; the lost update problem, the
uncommitted dependency problem, and the inconsistent analysis problem.
- The main concurrency control techniques are locking, time-stamping, and optimistic
concurrency control techniques.
* Further work: to include illustrations, with discussion of the concepts of serializability and
recoverability, together with the details of the concurrency control techniques; a locking sketch follows.
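As a small illustration of the locking technique, many SQL DBMSs allow a transaction to take explicit row locks, so that two concurrent transactions cannot interleave their reads and writes and lose an update (a sketch of avoiding the lost update problem; names assumed):

    START TRANSACTION;
    -- Take a row lock: a concurrent transaction issuing the same statement must wait
    SELECT balance FROM account WHERE accountNo = 'A101' FOR UPDATE;
    UPDATE account SET balance = balance - 500 WHERE accountNo = 'A101';
    COMMIT;   -- the lock is released at commit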
5.6 DATA WAREHOUSING AND DATA MINING
The paradigm of data warehousing, which has continued to grow in popularity and prevalence
since its inception, provides critical storage resources and capabilities that have now become a core
service of database products.
- In addition to the growth in the size and prevalence of data warehouses, there is also increasing
growth in the scope and complexity of such systems, to the extent that current data warehouse
systems are expected not only to support traditional reporting but also to provide more advanced
analysis, such as multi-dimensional and predictive analysis, a range intended to meet the needs of a
growing number of different types of users.
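For instance, part of such multi-dimensional summarization can be expressed directly in SQL using the grouping extensions standardized from SQL:1999 onwards (the sales table and its columns are assumptions; support varies by DBMS):

    -- One query yields totals per (region, year), subtotals per region,
    -- and a grand total: a simple multi-dimensional summary
    SELECT region, saleYear, SUM(amount) AS total_sales
    FROM sales
    GROUP BY ROLLUP (region, saleYear);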
- The concept of a data warehouse emerged as a solution to meet the requirements of a
system capable of supporting decision making and receiving data from multiple operational data
sources. By definition, a data warehouse is a subject-oriented, integrated, time-variant, and non-
volatile collection of data in support of management's decision-making process.
Next: detailing of the definition, advantages, comparison with OLTP, issues, architecture, tools, and
techniques.
DATA MINING
- This is a concept that is often associated with data warehousing. To realize the value of a data
warehouse, it is necessary to extract the knowledge hidden within the warehouse. Data mining
comes in handy, as one of the best ways to extract meaningful trends and patterns from huge amounts
of data. Data mining discovers within data warehouses information that queries and reports cannot
effectively reveal.
- By definition, data mining can be seen as the process of extracting valid, previously unknown,
comprehensible, and actionable information from large databases and using it to make crucial
business decisions.
- Data mining is concerned with the analysis of data and the use of software techniques for
finding hidden and unexpected patterns and relationships in sets of data. It tends to work from the
data up, and the techniques that produce the most accurate results normally require large volumes of
data to deliver reliable conclusions.
- The process of analysis starts by developing an optimal representation of the structure of
sample data, during which time knowledge is acquired. This knowledge is then extended to larger
sets of data, working on the assumption that the larger data set has a structure similar to that of the
sample data.
- Examples of data mining applications include retail/marketing, where data mining is used to
identify buying patterns of customers, find associations among customer demographic
characteristics, predict responses to mailing campaigns, and perform market basket analysis.
- In banking: detecting patterns of fraudulent credit card use, identifying loyal customers,
predicting customers likely to change their credit card affiliation, and determining credit card
spending by customer groups.
- In insurance: claims analysis, and predicting which customers will buy new policies.
- In medicine: characterizing patient behaviour to predict surgery visits, and identifying successful
medical therapies for different illnesses.
- There are four main operations associated with data mining techniques: predictive
modeling, database segmentation, link analysis, and deviation detection. Each of these
operations has its associated techniques that guarantee its effectiveness.
* Further work: to detail the techniques and discuss data mining tools, followed by a discussion of
data mining versus data warehousing.
LECTURE SIX
FUTURE DIRECTIONS IN DBMS
- Database Security
- Distributed DBMS
- Distributed Relational Database Design
- Web Technology and DBMS
- Cloud Computing

6.1 DATABASE SECURITY


- Reasonably, data is a critical and valuable resource to any organisation/enterprise and,
therefore, is required to be strictly controlled and managed. In fact, security features among the
typical functions and services of a DBMS, and these services centre mainly on authorization.
- The term security refers to the protection of the database against unauthorized access, either
intentional or accidental. By definition, database security can be viewed as the mechanisms that
protect the database against intentional or accidental threats.
- Database security may be considered in relation to the following situations: theft and fraud,
loss of confidentiality (secrecy), loss of privacy, loss of integrity, and loss of availability.
- For reasons of being proactive, it is important for any organization to identify all possible
threats. Where a threat is any situation or event, whether intentional or accidental, that may
adversely affect a system and consequently the organisation.
- Threats are quite identifiable, ranging from impersonation, unauthorized data manipulation,
hacking, and physical abuse, down to malware, etc. Nevertheless, industry and research are gearing
up to evolve appropriate counter-measures to deal with both the immediate and future trends in the
areas of the potential threats that can hamper the placement of trust in databases and distributed
systems. Some of the counter-measures take the following key forms: authorization,
authentication, access control (Discretionary Access Control, DAC, and Mandatory Access Control,
MAC), views (virtual relations), back-up and recovery, integrity (by cryptographic means), and RAID
(Redundant Array of Independent Disks) for fault tolerance. A sketch of the authentication
counter-measure follows below.
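
As one concrete instance of the authentication counter-measure, the following is a minimal sketch
in Python using only the standard library (the salted PBKDF2 scheme shown is illustrative, not a
prescription for any particular DBMS):

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # Store only a salted hash, never the plain-text password.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    # Constant-time comparison guards against timing attacks.
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))   # True
print(verify_password("wrong", salt, digest))    # False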

The above measures and future projections provide the specification for the development of DBMSs
and guide their evolution.

6.2 DISTRIBUTED DBMS


- In recent times, the rapid developments in network and data communication technology, seen
in the Internet, mobile and wireless computing, intelligent devices, and grid computing, are
occasioning the evolution of distributed database technology, which is enabling the migration from
a centralized to a decentralized mode of operations.
- The Distributed Database Management System (DDBMS) is a technology that allows users to
access not only data at their own site but also data stored at remote sites. In the first instance, by
definition, a distributed database is a logically interrelated collection of shared data (and a description
of this data) physically distributed over a computer network, while a DDBMS is the software system
that permits the management of the distributed database and makes the distribution transparent to
users.
- Users access the distributed database via applications, which are classified as those that do not
require data from other sites (local applications) and those that do require data from other sites
(global applications).
- As it stands, a DDBMS is required to have at least one global application. Given their
significance, it is obvious that future developments in DBMS technology will witness significant
evolution of DDBMSs; a sketch of the transparency idea follows below.
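
To make distribution transparency concrete, here is a minimal sketch in plain Python (the site
names, tables, and rows are all invented): the user names only the relation, and a catalogue
resolves which site actually stores it.

# A toy catalogue giving location transparency.
CATALOGUE = {
    "customers": "site_lagos",
    "orders":    "site_abuja",
}

SITES = {
    "site_lagos": {"customers": [{"id": 1, "name": "Ada"}]},
    "site_abuja": {"orders":    [{"id": 7, "customer_id": 1}]},
}

def query(table):
    # The caller never mentions a site: the distribution is transparent.
    site = CATALOGUE[table]
    return SITES[site][table]

print(query("customers"))   # served from site_lagos, invisibly to the user

A query that touches only the caller's own site corresponds to a local application; one that resolves
to a remote site corresponds to a global application.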

N/B: Further details to include the component architecture for a DDBMS.

6.3 DISTRIBUTED RELATIONAL DATABASE DESIGN


In contrast to the design of a centralized relational database, in terms of conceptual and
logical design, the consideration of some additional factors makes the design of a distributed
relational database different from that of the centralized relational databases we have discussed.
- These additional factors include fragmentation, allocation, and replication.
Where fragmentation implies that a relation may be divided into a number of sub-relations, called
fragments, which are then distributed. The two main types of fragmentation are horizontal, where
the subsets are tuples, and vertical, where the subsets are attributes (both are sketched below).
Allocation deals with the manner of storing the fragments at the sites so as to achieve an optimal
distribution, while replication is the process by which the DDBMS may maintain copies of a fragment
at different sites.
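
A minimal sketch of both kinds of fragmentation on an in-memory relation (plain Python; the
relation, predicate, and attribute lists are invented for illustration):

# A relation represented as a list of rows.
staff = [
    {"id": 1, "name": "Ngozi", "branch": "Enugu", "salary": 90},
    {"id": 2, "name": "Musa",  "branch": "Kano",  "salary": 70},
    {"id": 3, "name": "Tunde", "branch": "Enugu", "salary": 80},
]

# Horizontal fragmentation: the subsets are tuples, selected by a predicate.
enugu_fragment = [row for row in staff if row["branch"] == "Enugu"]

# Vertical fragmentation: the subsets are attributes; the key is kept in
# each fragment so the original relation can be reconstructed by a join.
salary_fragment = [{"id": r["id"], "salary": r["salary"]} for r in staff]

print(enugu_fragment)
print(salary_fragment)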
- Generally, the definition and allocation of fragments must be based on how the database is to
be used. This involves analyzing some of the transactions of the DDBMS. The design of a DDBMS is
based on both quantitative and qualitative information, where quantitative information is used in
allocation and qualitative information is used in fragmentation.

Further details may be necessary, but we skip them for this session.

6.4 WEB TECHNOLOGY AND DBMS


The World Wide Web, or simply the Web, has had the most dramatic influence on information
dissemination since its inception over two decades ago (in 1989), and appears to remain the cradle of
the present and, most possibly, the future information revolution.
- The ubiquity of the Web has made it a reasonable platform for the delivery and dissemination
of data-driven, interactive applications. The Web provides global application availability to both users
and organizations. In effect, the architecture of the Web has been designed to be platform-independent,
which has the potential to significantly lower deployment and training costs. Organisations are now
rapidly building new database applications or re-engineering existing ones to take full advantage of
the Web as a strategic platform for implementing innovative business solutions, in effect becoming
web-centric organizations.
- The development in web-based technologies and databases is not leaving out government
agencies and educational institutions, and in fact all facets of local and global interconnection. While
small web sites can afford to be file-based, large and complex web sites demand an approach that
allows databases to be accessed from the web for the management of dynamic web content (a minimal
sketch of this follows below). Thus, web technology and the DBMS together will continue to offer
interesting new possibilities in the immediate and foreseeable future.
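
The following is a minimal sketch of a database-backed (rather than file-based) web resource, using
only Python's standard library; the table, data, and port are invented for illustration:

import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Dynamic content: every request is answered from the database, not a file.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('yam', 2.5), ('rice', 1.8)")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        rows = conn.execute("SELECT name, price FROM products").fetchall()
        body = json.dumps([{"name": n, "price": p} for n, p in rows]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8000), Handler).serve_forever()   # uncomment to serve

Updating the products table changes what every subsequent request returns, which is precisely the
property that file-based sites lack.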
- Further work in this area should address the requirements for web-DBMS integration,
advantages and disadvantages of the Web-DBMS approach, and approaches to integrating the web
and DBMS.

6.5 CLOUD COMPUTING


- This is yet another buzz phrase in IT; it is increasing in popularity and appears to have a
tremendous value proposition for the future. Cloud computing is actually a way to increase capacity
or add capabilities to IT systems without investing in new infrastructure, training new personnel, or
licensing new software. It includes services that are subscription-based or pay-as-you-go.
- The services of cloud computing are, in actual fact, computing resources (hardware and
software) that are delivered as a service over a network (typically the Internet). There are many types
of public cloud computing, including Infrastructure as a Service (IaaS), Network as a Service (NaaS),
Storage as a Service (STaaS), Security as a Service (SECaaS), Data as a Service (DaaS), Database as a
Service (DBaaS), etc.
Where Data as a Service (DaaS) is based on the concept that data, being a product, can be provided
on demand to the user regardless of the geographic or organizational separation of provider and
consumer. [The evolution of service-oriented architecture (SOA) has rendered the actual platform on
which the data resides unimportant.] A consumer-side sketch follows below.
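
From the consumer's side, the DaaS idea can be sketched in a few lines of standard-library Python
(the endpoint URL and the JSON shape of the response are entirely hypothetical):

import json
from urllib.request import urlopen

def fetch_dataset(endpoint):
    # The consumer asks for data by name over the network; where and how
    # the provider stores it is irrelevant to the consumer (the SOA point).
    with urlopen(endpoint) as response:
        return json.load(response)

# Hypothetical endpoint; substitute a real DaaS provider's URL.
# records = fetch_dataset("https://example.com/daas/v1/customers")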
- Further details about the operation and supporting techniques may be necessary.
