COS 421 Note A
LECTURE ONE
DATABASE MANAGEMENT SYSTEMS
OUTLINE
1. Introduction
2. History of Database Management System
3. Significance of Database Management System
4. Roles in Database environment
5. Database Languages
6. Course Outline
1.1 Introduction
- Database systems are arguably the most important development in the field of software
engineering. The database itself is now the underlying framework of the information system, and has
fundamentally influenced the way many organizations operate.
Nevertheless, owing to the proliferation and simplicity of modern databases, several users
(novices and non-specialists) are creating databases and applications without the requisite knowledge and
skill, leading to the development of ineffective and inefficient systems. This is evidenced by the
increasing number of cases of software failures, crises, depression (if you like, recession), and all the
ugly tales we hear today.
- Therefore, this course, COS 321, is intended to provide the opportunity to sufficiently explore
the concepts of database design and management, in order to develop the necessary skills for the
design of new systems and for resolving the inadequacies of existing systems. [I hope this doesn't
sound too ambitious.] To this end, and in the main, attempts shall be made to introduce and discuss
the theories behind databases, and also to provide methodologies for database design, which the
students can verify by whatever means, laboratory or otherwise.
- To introduce database systems, in the first instance, it is necessary to consider some
justifications for databases. In this vein, we attempt to distinguish "what is" from "what is not".
Imagine operating a business without knowing who the customers are, what products to sell, details
of personnel, debtors, creditors, etc.
- The above requirements presuppose that businesses have to keep the relevant data and much
more. Besides, data availability is critical to decision making. Business information systems help
businesses to use information as an organizational resource. Central to systems for decision making
and/or information processing are the issues of the collection, storage, aggregation, manipulation,
and management of data.
- For reasons of clarity, we differentiate between data and information as follows: data are raw
facts, which implies that the facts have not yet been processed to reveal their meaning, while
information is the result of processing data to reveal its meaning. Furthermore, a database is
essentially a collection of related data. The term database system describes a collection of
application programs that interact with the database, while a database application is
simply a program that interacts with the database at some point in its execution. It is important to
note these terminologies in the parlance of database systems.
- Having figured out that a database is a shared collection of logically related data, and a
description of this data, designed to meet the information needs of an organization, it becomes
acceptable to define a database management system as a software system that enables users to define,
create, maintain, and control access to the database.
- Fundamentally, the uses of database systems include, but are not limited to, the following:
- Purchases from supermarkets
- Purchases/transactions using credit cards
- Booking a holiday through a travel agent
- Using the library and other catalogue-based resources
- Taking out insurance or other policies
- Internet usage: Amazon, Google searches, Facebook, etc.
- Studying at a university: student information systems, examination information systems,
transcript processing, etc.
- To realize the above objectives, and even more, the Database Management System (DBMS),
being the software that interacts with the users' application programs and the database, provides the
following facilities:
- A mechanism for defining the database, usually through a Data Definition Language (DDL),
which allows users to specify the data types and structures, and the constraints on the data to be
stored in the database.
- A mechanism for data manipulation, usually through a Data Manipulation Language (DML),
which allows users to insert, update, delete, and retrieve data from the database. Owing to the
centralized nature of modern data storage systems, the DML offers a general enquiry facility,
called a query language. The query language alleviates the problems of traditional file-
based systems (to be discussed subsequently), where the user has to work with a fixed set of queries, or
where there is a proliferation of programs, giving rise to major software-management problems. The most
common query language is the Structured Query Language (SQL), which we shall study in detail.
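The roles of the DDL (defining structure and constraints) and the DML (inserting, updating, and querying) can be sketched with Python's built-in sqlite3 module. The table and column names here are illustrative, not taken from the lecture:

```python
import sqlite3

# An in-memory database is enough for a classroom sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure, data types, and constraints.
cur.execute("""
    CREATE TABLE staff (
        staff_no INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0)
    )
""")

# DML: insert and update data.
cur.execute("INSERT INTO staff VALUES (1, 'Ada', 90000.0)")
cur.execute("INSERT INTO staff VALUES (2, 'Obi', 75000.0)")
cur.execute("UPDATE staff SET salary = 80000.0 WHERE staff_no = 2")

# Query language (the retrieval part of the DML): state WHAT is wanted.
rows = cur.execute("SELECT name, salary FROM staff ORDER BY staff_no").fetchall()
```

The same SELECT works unchanged however the DBMS chooses to store the records, which is exactly the flexibility the fixed-query file-based approach lacked.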
- Mechanisms for access control: the DBMS, for example, provides the following:
i. a security system: which prevents unauthorized access to the database.
ii. the integrity system: which maintains the consistency of stored data.
iii. a concurrency control system: which allows shared access of the database
iv. a recovery system: which restores the database to a previous consistent state following a
hardware or software failure. These days we hear of replication systems and servers,
redundancy, cloud computing, etc.
v. a user-accessible catalog: which contains descriptions of the data in the database.
The database approach to information processing can be graphically depicted as follows:
[Figure 1: The database approach. Two application programs (a Sales application program and a
Contracts application program), each with its own data entry and report facilities, access a shared
database through the DBMS.]
Owing to the peculiar nature of traditional file-based systems, as highlighted above, a number of
limitations are associated with the approach. These include:
o Separation and isolation of data: access problems arise when data is isolated in separate
files.
o Duplication of data: each program manages its own data. In a
decentralized organisation, each department managing its own data inevitably results in
uncontrolled duplication of data.
o Data dependence: the application code defines the physical structure
and storage of the data files and records. This creates manipulation difficulties.
o Incompatible file formats: different application programs cannot share data
because of differences in file formats, such as those produced by COBOL, C, etc.
o Fixed queries/proliferation of application programs: new and different queries have to be
written for each type of request or operation: different strokes for different folks.
- Although, traditional file-based systems still exist in some specific areas, their limitations
provided fresh impetus for the development of Database Management Systems (DBMS), which
according to several texts/literature, originated in the 1960s.
- Although the DBTG report was not formally adopted by ANSI, a number of systems
emerged following the DBTG proposal. These systems are known as CODASYL or DBTG systems. The
CODASYL and hierarchical approaches represented the first generation of DBMSs. Nevertheless, these
systems shared some fundamental disadvantages, which include:
- the requirement that complex programs be written to answer even simple queries, based on
navigational, record-oriented access;
- minimal data independence; and
- the absence of a widely accepted theoretical foundation.
- It is tempting to continue the discussion by looking at future data models [Coronel et al., 2011,
p. 40], but this section is becoming too lengthy. Before we proceed to the
remaining sections, suffice it to mention that the 1990s also saw the rise of the Internet, the three-tier
client-server architecture, and the demand to allow corporate databases to be integrated with web
applications.
The late 1990s (1998) saw the development of XML (eXtensible Markup Language), which has
had a profound effect on many aspects of IT, including database integration, graphical interfaces,
embedded systems, distributed systems, and database systems.
Currently, most major DBMS vendors provide what are called data warehousing solutions. Data
warehouses are specialized DBMSs, which have evolved to make it possible to store data drawn from
several data sources, possibly maintained by different operating units of an organization. Such
systems provide comprehensive data analysis facilities to allow strategic decisions to be made based
on, for example, historical trends. [We may be lucky to discuss this concept sometime towards the
end]. Another example of the recent development of specialized DBMSs is the Enterprise Resource
Planning (ERP) system, which is an application layer built on top of a DBMS that integrates all the
business functions of an organization, such as manufacturing, sales, finance, marketing, shipping,
invoicing, and human resources. Popular ERP systems are SAP R/3 from SAP and PeopleSoft from
Oracle.
Have you heard of SISs?
- A DBMS performs several important functions, and has several promising potential advantages.
In the main, the functions of the DBMS guarantee the integrity and consistency of the data in the
database. Most of the functions are transparent to end users, while some others can be achieved
through the use of the DBMS. The functions include; data dictionary management, data storage
management, data integrity management, data transformation and presentation, security
management, multi-user access control, backup and recovery management, database access
languages and application programming interfaces (API), database communication interfaces, and a
host of others.
Comments [Students are required to explain these].
- The advantages of the DBMS are in line with the functions they perform. These advantages
include; control of data redundancy, data consistency, more information from the same amount of
data, data sharing, improved data integrity, improved security, enforcement of standards, economy of
scale, balance of conflicting requirements, improved data accessibility and responsiveness, increased
productivity, improved maintenance through data independence, increased concurrency, improved
backup and recovery services.
- Unfortunately, there are some challenges associated with the implementation of a DBMS.
These include complexity: the functionality of DBMSs gives rise to a complex piece of software,
demanding that designers, developers, administrators, and end users understand that
functionality in order to take advantage of the system. Size, costs (in terms of the DBMS,
associated hardware, and conversion), reduced performance, and the higher impact of failure also
add to the disadvantages of a DBMS.
- There are four distinct types of people that participate in the DBMS environment: data and
database administrators, database designers, application developers, and end users.
- The data and database administration are the roles generally associated with the management
and control of a DBMS and its data. Respectively, the Data Administrator (DA) is responsible for the
management of the data resource, including database planning, development and maintenance of
standards, policies and procedures, and conceptual/logical database design. The DA consults with
and advises senior managers, ensuring that the direction of database development will ultimately
support corporate objectives. On the other hand, the Database Administrator (DBA) is responsible for
the physical realization of the database, including physical database design and implementation,
security and integrity control, maintenance of the operational system, and ensuring satisfactory
performance of the application for users. The technical requirement of the DBA is higher than that of
the DA. Some organizations do not distinguish between the two.
The database designers can be of two categories; the logical and the physical DB designers.
- The former identifies the data (i.e. the entities and attributes), the relationships between the data,
and the constraints on the data. This demands an understanding of the business rules. The
physical DB designer deals with the physical realization of the logical database. This involves
mapping the logical database design into a set of tables and integrity constraints, selecting specific
storage structures and access methods, and designing/adopting appropriate security
measures required on the data.
- The application developers deal with the development of application programs that provide the
required functionality for the end-users. They work from/with specifications produced by systems
analysts. Each program contains statements that request the DBMS to perform some operation on the
database.
- End-users are the clients for the database, which has been designed and implemented, and is
being maintained to serve their information needs. End-users can be classified as naïve users or
sophisticated users, depending on their degree of competence and/or expertise.
CONCLUSION
To conclude this introduction (which of necessity is long), we have to call the following to mind:
- Data are raw facts. Information is the result of processing data to reveal its meaning.
Accurate, relevant, and timely information is the key to good decision making, and good decision
making is the key to organizational survival in a global environment.
- Data are usually stored in a database. To implement a database and to manage its content,
you need a database management system (DBMS). The DBMS serves as the intermediary between
the user and the database. The database contains the data collected and the metadata: data about
data.
- Database design defines the database structure, and it is essential that this design be done well.
Databases evolved from manual file systems and, later, computerized file systems. In a file system, data are stored in
independent files, each requiring its own data management programs. The approach is now largely
outmoded, but serves as a reference point for DBMS improvements.
1.5 COURSE OUTLINE
- The titles to be discussed in this course, CS 321 (Database Management System I), are in
compliance with the NUC's requirements. However, consideration is given to the ongoing curriculum
improvement efforts in the Department, which seek to address the requirements of international
standards and best practices. The course content, as specified by the NUC, is as follows:
Recommended Texts
Introductory Comments
- The basic concepts and terminologies pertain to all the relevant aspects of databases and their
management systems. In the regular parlance of database management systems, these
concepts normally range over issues of conceptualization of the database, analysis, modeling,
design, implementation, and management.
- Notably, this section of the course fundamentally cuts across all the sections of database
systems. We shall limit the discussions on the topics outlined above.
[Figure 2: File-server architecture. Workstations 1 to n are connected over a network medium to a
central computer running the file-server; workstations send requests for data, and the file-server
responds by returning the requested files from the database.]
This approach can generate a significant amount of network traffic. Its shortfalls include:
large amounts of network traffic, the requirement for a full copy of the DBMS on each workstation,
and the fact that concurrency, recovery, and integrity control become more difficult, as many DBMSs
may be accessing the same files.
- In the context of the database, the client manages the user interface and the application logic,
acting as a sophisticated workstation on which to run database applications. The client takes the
user's requests, checks the syntax, and generates database requests in SQL or another database
language appropriate to the application logic. It then transmits the message to the server, waits for a
response, and formats the response for the end-user. The server accepts and processes the database
requests, then transmits the result back to the client.
- The processing involves checking authorization, ensuring integrity, maintaining system catalog,
and performing query and update processing. There is also provision for concurrency and recovery
controls.
- The advantages of client-server architecture include: wider access to existing database,
increased performance, reduction in hardware costs and communication costs, increased consistency,
and easy conformance to the open-systems architecture.
NB: The above discussion is not intended to be ignorant of distributed computing, web-based
systems, cloud computing, and recent developments, some of which will be discussed subsequently.
It is intended to open up reasoning in the area of database architectures, while keeping the section as
short as possible.
- This section, like its predecessor, further introduces more fundamental concepts in database
systems, by precisely introducing those found in database structures. Apparently, to store
information in a computer, consideration must be given to the pieces of information to be stored.
This is usually in terms of its size and what type of information is to be stored.
- The first consideration is about the required fields. Fields are the categories of information
that the database is going to store. For example, consider a university that issues transcripts to graduated
students. To produce a transcript database, relevant fields would include: Student Name, RegNo,
Programme Type, Department, Courses, Year of Enrolment, Year of Graduation, etc.
Once a decision is reached with respect to the relevant fields, the next phase is to start
collecting the data.
- All the information for one person (or thing), a collection of related fields, is called a record.
Thus, each student in the transcript database would have a record in the database.
- Similarly, a collection of related records makes up a table in the database. Tables are also
known as files, if you are using a flat-file database.
- It is important to note that most database programs allow, or rather require, that each field in
the database be associated with a type. The type of information indicates what data type is to be
stored in that field. Common field/data types include; integers, float, character, dates, Boolean. More
advanced databases allow for multimedia objects, such as pictures, sound and video.
- Field types are used to enforce a basic kind of validation (consistency), in that the database
does not allow an item of one type to be stored in a field of another type, say, text in a date
field. Field types also facilitate sorting, as the database understands the contents of the field.
- A certain field, usually called the key field or primary key, is the field that uniquely identifies
each record in the database. It effectively provides a logical structure for the database. We shall
discuss this further when we begin to design our databases.
- Fields are also associated with indexes. An index (just like the index in a book) is an extra bit
added on to the database to help the database program find records quickly. Superfluous use of
indexes is discouraged owing to the overhead involved when creating the records. Nevertheless,
indexing of key fields and those used in searching or sorting is encouraged.
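The ideas of typed fields, a key field, and an index can be sketched with Python's built-in sqlite3 module, using field names modelled on the transcript example above. Note that SQLite enforces declared types only loosely, so this sketch demonstrates the uniqueness of the key field rather than strict type checking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each field is declared with a type; RegNo serves as the key field
# (primary key) that uniquely identifies each record.
cur.execute("""
    CREATE TABLE transcript (
        reg_no          TEXT PRIMARY KEY,
        student_name    TEXT NOT NULL,
        department      TEXT,
        year_enrolled   INTEGER,
        year_graduated  INTEGER
    )
""")

# Index only fields used for searching/sorting; indexing every field is
# discouraged because each index adds overhead when creating records.
cur.execute("CREATE INDEX idx_department ON transcript (department)")

cur.execute("INSERT INTO transcript VALUES "
            "('CS001', 'Ngozi Eze', 'Computer Science', 2018, 2022)")

# The key field rejects a second record with the same RegNo.
try:
    cur.execute("INSERT INTO transcript VALUES "
                "('CS001', 'Someone Else', 'Physics', 2019, 2023)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

The rejected second insert is the primary key doing its job: every record remains uniquely identifiable by its RegNo.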
- The database language is a generic term referring to a class of languages used for defining and
accessing databases. A particular database language will be associated with a particular database
management system.
- There are two distinct classes of database languages: those that do not provide complete
programming facilities and are designed to be used in association with some general-purpose
programming language (the host language), and those that do provide complete programming
facilities (database programming languages).
- The products adopting the approach of incomplete programming facilities seek to minimize
host-language programming by the provision of fourth generation language (4GL) facilities.
- A database language must provide both for logical-schema specification and modification (data
description) and for retrieval and update (data manipulation). In line with the CODASYL network
database standard, some products complying with the standard treat data description and data manipulation
distinctly, using the Data Description Language (DDL) and the Data Manipulation Language (DML).
These are sometimes regarded as the data sublanguage.
- The DDL is used to specify the database schema, and can be defined as a language that allows
the DBA or user to describe and name the entities, attributes, and relationships required for the
application, together with any associated integrity and security constraints. The DML, on the other hand,
is used both to read and to update the database, and can be defined as a language that provides a set of
operations to support the basic data manipulation operations on the data held in the database.
- The compilation of DDL statements results in a set of tables stored in special files collectively
called the system catalog. The system catalog integrates the metadata, which in turn is the data that
describes objects in the database and makes it easier for those objects to be accessed or manipulated.
The metadata contains definitions of records, data items, and other objects that are of interest to
users or are required by the DBMS. The terms data dictionary and data directory are also used to
describe the system catalog.
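As a small-scale illustration, SQLite keeps its system catalog in a table named sqlite_master, holding one row of metadata per database object, including the original DDL text that defined it; the catalog can itself be queried:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Compiling these DDL statements also records metadata in the catalog.
cur.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, street TEXT)")
cur.execute("CREATE INDEX idx_street ON branch (street)")

# Query the catalog; internal sqlite_* entries are filtered out.
catalog = cur.execute(
    "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%'"
).fetchall()
```

Larger DBMSs expose the same idea through views such as INFORMATION_SCHEMA, but the principle is identical: the catalog is data about data, stored and queried like any other data.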
- With respect to the DML, data manipulation operations usually include data insertion,
modification, retrieval, and deletion from the database. One of the main functions of the DBMS is to
provide support for data manipulation, such that the user can construct statements for data
manipulation. The part of the DML that involves data retrieval is called a query language. A query
language is a high-level, special-purpose language used to satisfy diverse requests for the retrieval of
data held in the database. Furthermore, DMLs can be classified as either procedural or non-
procedural, depending on their underlying retrieval constructs. Procedural DMLs require the user to
tell the system both what data is needed and exactly how to retrieve it, while non-procedural
DMLs allow the user to state what data is needed rather than how it is to be retrieved.
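The procedural/non-procedural distinction can be sketched in one snippet (Python with the built-in sqlite3 module; the table and data are illustrative). Both styles retrieve the same result, but only the procedural one spells out how:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE staff (name TEXT, salary REAL)")
cur.executemany("INSERT INTO staff VALUES (?, ?)",
                [("Ada", 90000.0), ("Obi", 80000.0), ("Chi", 60000.0)])

# Procedural style: spell out HOW -- walk every record and filter by hand.
procedural = []
for name, salary in cur.execute("SELECT name, salary FROM staff"):
    if salary > 70000:
        procedural.append(name)

# Non-procedural (declarative) style: state WHAT is wanted; the DBMS
# decides how to retrieve it (e.g. whether to use an index).
declarative = [row[0] for row in
               cur.execute("SELECT name FROM staff WHERE salary > 70000")]
```

Because the declarative query leaves the retrieval strategy to the DBMS, the same statement keeps working (and may get faster) as indexes and storage structures change.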
- DMLs provide users with considerable data independence, by freeing the user from having to
know how data structures are internally implemented and what algorithms are required to retrieve
and possibly transform the data. RDBMSs usually include some form of non-procedural language for
data manipulation, for example SQL or QBE (Query-By-Example). Non-procedural DMLs are
normally easier to use and learn than procedural ones.
- We finish the section by looking at 4GLs. In actual fact, there is still some confusion as to what
constitutes a 4GL. In essence, a 4GL is non-procedural. The user defines what is to be done, not how.
A 4GL relies largely on much higher-level components known as fourth-generation tools.
Nevertheless, 4GLs are known as productivity-enhancing tools. They encompass the following:
- Presentation languages, such as query languages and report generators. SQL and QBE are
examples.
- Specialty languages, such as spreadsheet and database languages.
- Application generators that define, insert, update, and retrieve data from the database to build
applications.
- Very high-level languages that are used to generate application code.
Other types of 4GLs are; Form generators, Report generators, Graphics generators, and Application
generators.
- The terms data models and database models are often used interchangeably. Nevertheless,
there is a subtle difference between the two terminologies.
- A data model can be described as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization. On the
other hand, a database model is mainly used to refer to the implementation of a data model in a
specific database system.
- The essence of data modeling, which bridges the gap between real-world objects and the
database that resides in the computer, stems from the fact that designers, programmers, and end-
users view data in different ways. Different views of the same data can lead to database designs
that do not reflect an organisation's actual operation, thus failing to meet end-user needs and data
efficiency requirements.
- Significantly, a model is a representation of real-world objects and events, and their
associations. It is an abstraction that concentrates on the essential, inherent aspects of an
organization and ignores the accidental properties. Generally, the scale of representation using a
model is such that conclusions drawn from models are applicable to real-world scenarios.
Thus, a data model represents the organisation itself, and is required to provide the basic concepts
and notations that will allow database designers and end-users unambiguously and accurately to
communicate their understanding of the organizational data together with their association(s).
- While it is a matter of common knowledge that hardly any two designers will produce the same
model for the same problem, what matters is that a correct model meets all the end-user
requirements. A data model ready for implementation should contain the following components:
1. A description of the data structure that will store the end-user data
2. A set of enforceable rules to guarantee the integrity of the data, and
3. A data manipulation methodology to support the real-world data transformation.
- In order to understand the importance of data models, imagine building a house without a
blueprint (a building plan). It is highly unlikely that one can create a good database without first creating an
appropriate data model. Just as one cannot live in a building plan, one cannot draw the required data
out of the data model, which is simply an abstraction. Importantly, data models facilitate interaction
among the designer, the application programmer, and the end-user. Essentially, a data model can
foster improved understanding of the organization for which the database design is developed; data
models are a communication tool.
- There are three common, related data models: the external data model, the
conceptual data model, and the internal data model, reflecting the ANSI-SPARC architecture.
- The external data model is used to represent each user's view of the organization, which is
sometimes called the Universe of Discourse (UoD).
Illustration:
Attribute   Domain Name      Meaning                                 Domain Definition
BranchNo    Branch Numbers   Set of all possible branch numbers      Character: size 25
Street      Street Names     Set of all street names in Nsukka
DOB         Dates of Birth   Possible values of staff birth dates    Date; range from 1-Jan-1970; format: dd-mm-yy
- Tuple: a tuple is a row of a relation. Precisely, the elements of a relation are the rows, or tuples,
of the table. The structure of a relation, together with a specification of the domains and any other
restrictions on possible values, is sometimes called its intension, which is usually fixed unless the
meaning of the relation is changed to include additional attributes. The tuples are called the extension
(or state) of a relation, which changes over time.
- Degree: the degree of a relation is the number of attributes it contains. This implies that each
tuple of a relation of degree n contains n values. A relation of degree one is called a unary relation,
a relation of degree two binary, of degree three ternary, and so on. The degree of a relation is a
property of the intension of the relation.
- Cardinality: the cardinality of a relation is the number of tuples it contains. Cardinality
changes as tuples are added or deleted. The cardinality is a property of the extension of the relation,
and is determined from the particular instance of the relation at any given moment. Finally, we look
at the definition of a relational database.
- Relational database: a relational database is a collection of normalized relations with distinct
relation names. This implies that a relational database consists of relations that are appropriately structured;
this appropriateness is called normalization, a paradigm we shall discuss subsequently.
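The degree and cardinality of a relation can be read off programmatically; here is a sketch using Python's built-in sqlite3 module, with an illustrative branch table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE branch (branch_no TEXT, street TEXT, city TEXT)")
cur.executemany("INSERT INTO branch VALUES (?, ?, ?)",
                [("B1", "Main Street", "Nsukka"),
                 ("B2", "High Street", "Enugu")])

# Degree: the number of attributes -- a property of the intension,
# so it is read from the relation's structure, not its contents.
degree = len(cur.execute("SELECT * FROM branch LIMIT 0").description)

# Cardinality: the number of tuples -- a property of the extension,
# so it changes as tuples are inserted or deleted.
cardinality = cur.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
```

Inserting or deleting rows changes the cardinality but never the degree, mirroring the intension/extension distinction above.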
- Owing to the variety of these terminologies in different texts, we present the alternatives to
the ones already defined.
Formal terms Alternative 1 Alternative 2
Relation Table File
Tuple Row Record
Attribute Column Field
- Furthermore, we note that, as far as the concept of a mathematical relation is concerned, the
Cartesian product of two sets, say D1 and D2, is the set of all ordered pairs such that the
first element is a member of D1 and the second element is a member of D2. Any subset of this
Cartesian product is a relation. For example, a relation R could be
R = {(x, y) | x ∈ D1, y ∈ D2}
Demonstration:
D1 = {2, 4} and D2 = {1, 3, 5}
D1 x D2 = {(2, 1), (2, 3), (2,5), (4,1), (4,3), (4,5)}
And R can be
R = {(2, 1), (4, 1)}, coming from the rule
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
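The demonstration above can be checked directly in Python, where sets and comprehensions mirror the mathematical notation:

```python
from itertools import product

# The sets from the demonstration.
D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product D1 x D2: every ordered pair (x, y) with x in D1, y in D2.
cartesian = set(product(D1, D2))

# A relation is ANY subset of the Cartesian product; here the subset
# selected by the rule y = 1.
R = {(x, y) for (x, y) in cartesian if y == 1}
```

The set comprehension is a near-literal transcription of R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}.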
NB: We shall study properties of relations when we treat ER modeling prior to the section on
normalization.
- Finally, we can assert that we have covered most of the concepts and terminologies
pertaining to database architecture, structure, and languages, together with those of data models
and relational database models. More concepts will be introduced, alongside their meanings, in the
subsequent sections as they become necessary.
LECTURE THREE
FUNCTIONS AND COMPONENTS OF DBMS
Outline:
- Database environment
- Functions of a DBMS
- Components of a DBMS
- Web Services and Service-Oriented Architectures
- From our previous discussion, we saw that the DBTG proposal by CODASYL in 1971 was
not formally adopted by ANSI. Recall that the DBTG proposal recognized the need for a two-
level approach, with a system view called the schema and user views called subschemas.
- In spite of the specifications of the DBTG proposal, ANSI's Standards Planning and
Requirements Committee (SPARC), otherwise called ANSI/X3/SPARC, produced a similar
terminology and architecture in 1975.
- The ANSI-SPARC architecture addresses the need for a three-level approach, with a system
catalogue, i.e. the specification/list of items that comprise the database environment.
- Even though the ANSI-SPARC model did not become a standard, it still
provides a basis for understanding some of the functionality of a DBMS, and it concentrated on
the need for an implementation-independent layer to isolate programs from underlying
representational issues.
- The fundamental point of the ANSI-SPARC report is the identification of
three levels of abstraction, i.e. three levels at which data items can be described. These
three levels give rise to a three-level architecture made up of an external, a conceptual, and an
internal level.
- These three levels are such that the external level describes
how users perceive the data; the internal level shows the way the DBMS and the operating
system perceive the data, in terms of the actual storage of data using the appropriate data
structures and file organizations; and the conceptual level provides both the mapping and
the desired independence between the external and internal levels.
The three-levels are shown in the following figure:
[Figure 3: The three-level ANSI-SPARC architecture. Users 1 to n see external views (View 1,
View 2, ..., View n) at the external level; the conceptual level holds the conceptual schema; the
internal level holds the internal schema, which maps to the physical data organisation of the
database.]
Important Notes:
- The overall description of the database is called the database schema.
- There are three different types of schema in the database, and these are defined according to
the levels of abstraction of the three-level ANSI-SPARC architecture.
- At the highest level are the multiple external schemas (also called subschemas) that correspond to
different views of the data.
- At the conceptual level, the conceptual schema describes all the entities, attributes, and
relationships, together with integrity constraints.
- At the lowest level is the internal schema, which is a complete description of the internal model,
containing the definitions of stored records, the methods of representation, the data fields,
and the indexes and storage structures used. There is only one conceptual schema and one
internal schema per database.
- The DBMS is responsible for mapping between these three types of scheme. The DBMS must
check the schemes for consistency; in other words, the DBMS must check that each external
scheme is derivable from the conceptual scheme, and it must use the information in the
conceptual scheme to map between each external scheme and the internal scheme.
- The conceptual scheme is related to the internal scheme through a conceptual/internal mapping, which enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual scheme, together with any constraints to be enforced on operations on that logical record.
- The conceptual/internal mapping also allows differences in entity names, attribute names,
attribute order, data types, and so on to be resolved.
- Finally, the external schemes are related to the conceptual scheme by the external/conceptual mapping. This mapping enables the DBMS to map names in the user's view to the relevant part of the conceptual scheme. [You can find examples].
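The external/conceptual mapping above can be sketched concretely with SQL views. The following is a minimal illustration using Python's built-in sqlite3 module, where a base table stands in for the conceptual scheme and views stand in for two external schemes; all table, view, and column names are invented for the example.

```python
import sqlite3

# The table plays the role of the conceptual scheme; each view plays the role
# of an external scheme (one user's view of the data).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staff (
        staff_no  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        position  TEXT NOT NULL,
        salary    REAL NOT NULL      -- stored once, at the conceptual level
    )
""")
conn.execute("INSERT INTO staff VALUES (1, 'Ada', 'Lecturer', 90000.0)")

# External scheme 1: a payroll user's view exposes salary details.
conn.execute("CREATE VIEW payroll_view AS SELECT staff_no, name, salary FROM staff")
# External scheme 2: a directory user's view hides salary entirely.
conn.execute("CREATE VIEW directory_view AS SELECT name, position FROM staff")

# The DBMS performs the external/conceptual mapping whenever a view is queried.
print(conn.execute("SELECT * FROM payroll_view").fetchall())
print(conn.execute("SELECT * FROM directory_view").fetchall())
```

Note that both views are derivable from the single conceptual scheme, which is exactly the consistency check the DBMS is required to perform.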
- Furthermore, it is worthy of note that a major objective of the three-level architecture is to
provide data independence. This means that upper levels are unaffected by changes to lower
levels. There are two kinds of data independence: logical and physical.
- Logical data independence describes the immunity of the external schemes to changes in the conceptual scheme, such as the addition or removal of entities, attributes, or relationships. Such changes should be possible without altering existing external schemes or rewriting the application programs. Clearly, the users for whom the changes have been made need to be aware of them, but what is important is that other users should not be.
- Physical data independence describes the immunity of the conceptual scheme to changes in the internal scheme. This implies that changes to the internal scheme, such as using different file organizations or storage structures, using different storage devices, or modifying indexes or hashing algorithms, should be possible without having to change the conceptual or external schemes. The only effect the user should observe is a change in performance. In fact, performance degradation is the most common reason for internal scheme changes. The following figure illustrates where each type of data independence occurs within the three-level architecture.
[Figure: data independence in the three-level architecture. Logical data independence arises at the external/conceptual mapping, between the external schemes and the conceptual schema; physical data independence arises at the conceptual/internal mapping, between the conceptual schema and the internal schema.]
- While the three-level ANSI-SPARC architecture guarantees data independence, there are still concerns about efficiency. To improve efficiency, given the overhead of the two-stage mapping, the model allows the bypassing of the conceptual scheme, thus providing a direct external-to-internal scheme mapping; but this reduces data independence, to the extent that changes in the internal scheme can affect the external scheme and any dependent application.
3.2 Functions of a DBMS
- Recall that the relational data model was based on the paper published by E.F. Codd in 1970. In his 1981 ACM Turing Award lecture, published in 1982 as Relational Database: A Practical Foundation for Productivity, Codd listed eight services that should be provided by any full-scale DBMS. Those, together with some others discussed here, make up the functions of the DBMS that follow:
1. Data Storage, Retrieval and Update
- This is the fundamental function of a DBMS, and implies that a DBMS must furnish users with
the ability to store, retrieve, and update data in the database. To achieve this purpose, the
DBMS should hide the internal physical implementation details (such as file organization and
storage structures) from the user.
2. A User-Accessible Catalog
- A key feature of the ANSI-SPARC architecture is the recognition of an integrated system catalog
to hold data about the scheme, users, applications, etc. This implies that the DMBS must furnish a
catalog in which description of data items are stored which is accessible to users.
Typically, the system catalog stores the following:
- names, types, and sizes of data items;
- names of relationships;
- integrity constraints on the data;
- names of authorized users who have access to the data;
- the data items that each user can access and the types of access allowed, e.g. insert, update,
delete, or read access;
- external, conceptual, and internal schemes and the mappings between the schemes;
- usage statistics, such as the frequencies of transactions and counts of the number of accesses
made to objects in the database.
In fact, the system catalog is one of the fundamental components of the system. Some of the benefits
of a system catalog include:
- Information about data can be collected and stored centrally. This helps to maintain control
over the data as a resource.
- Definition of what the data means
- Simplification of communication, by conveying the meaning of data and identifying users
and their access rights.
- Ease of identification of redundancy and inconsistencies.
- Records of database changes, and
- Enforcement of security and integrity rules, together with audit trails.
NB: Sometimes, the terms system catalog and data dictionary are used interchangeably.
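As a concrete sketch of a user-accessible catalog, SQLite exposes its catalog as an ordinary queryable table named sqlite_master, which records the name and defining SQL of every table and index. The sample table and index names below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# sqlite_master is SQLite's system catalog: metadata about every object
# in the database, accessible through ordinary SELECT statements.
for type_, name in conn.execute("SELECT type, name FROM sqlite_master ORDER BY name"):
    print(type_, name)
```

Larger systems expose the same idea through richer catalog schemas (for example, standard INFORMATION_SCHEMA views), but the principle is identical: data about the data, stored in the database itself.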
3. Transaction Support
By definition, a transaction is a series of actions carried out by a single user or application
program, which accesses or changes the contents of the database. To this effect, a DBMS must
provide a mechanism that will ensure that all the updates corresponding to a given transaction are
made or that none of them is made.
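The all-or-nothing property can be sketched with Python's sqlite3 module, whose connection object acts as a transaction context: it commits on success and rolls back if an error occurs. The account table and the simulated crash are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

# A transfer is one transaction: either both updates happen, or neither does.
try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE account SET balance = balance - 80 WHERE id = 1")
        conn.execute("UPDATE account SET balance = balance + 80 WHERE id = 2")
        raise RuntimeError("simulated crash before the transaction commits")
except RuntimeError:
    pass

# Both updates were rolled back, so the balances are unchanged.
print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())
```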
4. Concurrency Control Services
- This range of services demands that the DBMS provide a mechanism to ensure that the database is updated correctly when multiple users are updating it concurrently. One important objective of using a DBMS is to enable many users to access shared data concurrently. Concurrent access becomes more complicated when users go beyond reading data to making simultaneous updates, whose interference can result in inconsistencies. Thus, the DBMS is required to ensure that when multiple users are accessing the database, such interference cannot occur.
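The classic interference problem is the lost update: two users read the same value, each computes a new value, and one update overwrites the other. The sketch below uses a Python threading lock as a stand-in for the DBMS's own locking protocol; the counter table is invented for the example.

```python
import sqlite3
import threading

# Many threads run the same read-modify-write cycle against one counter.
# The shared lock stands in for the DBMS's concurrency control, making each
# cycle behave like a serial transaction; without it, updates would be lost.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE counter (n INTEGER)")
conn.execute("INSERT INTO counter VALUES (0)")
lock = threading.Lock()

def add_one():
    with lock:  # serialize the whole read-modify-write cycle
        (n,) = conn.execute("SELECT n FROM counter").fetchone()
        conn.execute("UPDATE counter SET n = ?", (n + 1,))

threads = [threading.Thread(target=add_one) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(conn.execute("SELECT n FROM counter").fetchone()[0])
```

A real DBMS achieves the same effect transparently, through locking or other concurrency-control protocols, without the application having to manage a lock itself.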
5. Recovery Services
These services imply that the DBMS is required to provide a mechanism for recovering the
database in the event that the database is damaged in any way. The failure of any transaction
requires that the database be returned to a consistent state. Failures may be the result of a
system crash, media failure (such as a damaged disk), a hardware or software error causing the DBMS to
stop, or a user action, such as aborting a transaction due to an error detected by the user. In
all these cases, the DBMS is required to provide a mechanism to restore the database to a consistent
state.
6. Authorization Services
- The DBMS is required to ensure that only authorized users can access the database. The term
security refers to the protection of the database against unauthorized access, either intentional or
accidental. The DBMS is expected to ensure that the data is secure, using appropriate mechanisms.
7. Support of Data Communication
This requires that a DBMS must be capable of integrating with communication software. Most
often, users operate from remote workstations, and have to communicate with the host of the DBMS
over a network. Thus, the DBMS receives requests as communication messages and responds in a
similar way. All such transmissions are handled by a data communication manager (DCM). Although
the DCM is not necessarily part of the DBMS, the DBMS is required to be capable of being integrated
with a variety of DCMs if the system is to be commercially viable. It is essential that (PC, workstation)
users should be able to access a centralized database from remote locations.
8. Integrity Services
The term database integrity refers to the correctness and consistency of stored data; it is another type of database protection, alongside security. To this end, the DBMS is required to provide a means to ensure that both the data in the database and changes to the data follow certain rules. Integrity is closely related to security, but is concerned with the quality of the data itself. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate. For example, in a student database we may wish to specify a constraint that no student is allowed to take more than 14 courses in one academic session. The DBMS then has to check whether this limit is exceeded when assigning courses to students.
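The 14-course rule above can be sketched as a declarative CHECK constraint. Storing a course count directly on the student record is a simplification for illustration (a real design would count rows in an enrolment table); the table and column names are likewise invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DBMS enforces the integrity rule itself: any insert or update that
# would violate the constraint is rejected.
conn.execute("""
    CREATE TABLE student (
        student_no   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        course_count INTEGER NOT NULL CHECK (course_count <= 14)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Ngozi', 14)")      # allowed
try:
    conn.execute("INSERT INTO student VALUES (2, 'Emeka', 15)")  # rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```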
9. Services to Promote Data Independence
A DBMS must include facilities to support the independence of programs from the actual
structure of the database. Recall that data independence is normally achieved through a view or
subschema mechanism. Physical data independence is known to be easier to achieve than logical data independence: the addition of a new entity, attribute, or relationship can usually be accommodated, but not their removal. In some systems, changes to an existing component in the logical structure are prohibited.
10. Utility Services
A DBMS is required to provide a set of utility services. Utility programs help the DBA to
administer the database effectively. Some utilities function at the internal level and can be provided only by
the DBMS vendor. Examples of utilities provided by the vendor include:
- Import facilities, to load the database from flat files, and export facilities, to unload the
database to flat files.
- Monitoring facilities, to monitor database usage and operation.
- Statistical analysis programs, to extract performance or usage statistics.
- Index reorganization facilities, to reorganize indexes and their overflows.
- Garbage collection and reallocation facilities, to remove deleted records physically from the storage
devices, to consolidate the space released, and to reallocate it where it is needed.
3.3 Components of a DBMS
Because DBMSs are highly complex and sophisticated pieces of software providing the functionalities discussed, it is not easy to generalize the component structure of a DBMS, which varies greatly from system to system. However, we present a rather simplified view of the common component structure of a DBMS and the relationships between components, using the figure that follows:
[Figure: major components of a DBMS. Programmers, users, and the DBA interact with the DBMS, which comprises (among other modules) a query processor, a database manager, a file manager, and the access methods; these read from and write to the system buffers, beneath which lie the database and the system catalog.]
- As depicted in the figure above, a DBMS is partitioned into several software components (or
modules), each of which is assigned a specific operation. Notably, some of the functions of the
DBMS are supported by the underlying operating system. However, the operating system
provides only basic services, and the DBMS must be built on top of it. Therefore, the design of a
DBMS must take into account the interface between the DBMS and the operating system.
- The figure also shows how the DBMS interfaces with other software components, such as user
queries and the access methods (file management techniques for storing and retrieving data
records). In accordance with the figure, the components of the DBMS are:
* Query Processor: The major DBMS component that transforms queries into a series of low-level
instructions directed to the database manager.
* Database Manager (DM): which interfaces with user-submitted application programs and
queries. The DM accepts queries and examines the external and conceptual schemes to
determine what conceptual records are required to satisfy the request. The DM then places a
call to the file manager to perform the request. Within the DM are subcomponents that make
it possible for the DM to function. These software components (subcomponents) include;
authorization control, command processor, integrity checker, query optimizer, transaction
manager, scheduler, recovery manager, buffer manager, etc.
* File Manager:
The file manager manipulates the underlying storage files and manages the allocation of storage
space on disk. It establishes and maintains the list of structures and indexes defined in the
internal scheme. If hashed files are used, it calls on the hashing functions to generate record
addresses. However, the file manager does not directly manage the physical input and output
of data; rather, it passes the requests on to the appropriate access methods, which either read
data from or write data into the system buffer (or cache).
* DML Preprocessor:
This module converts DML statements embedded in an application program into standard
function calls in the host language. The DML preprocessor must interact with the query processor to
generate the appropriate code.
* DDL Compiler:
This compiler converts DDL statements into a set of tables containing metadata. These tables
are then stored in the system catalog while control information is stored in data file headers.
* Catalog Manager:
The catalog manager manages access to and maintains the system catalog. The system catalog
is accessed by most DBMS components.
NB: With an understanding of the fundamental components of a DBMS, it becomes possible to relate easily
to the components of most commercial DBMSs, such as Oracle, which is based on the client-
server architecture. These are discussed in recommended texts and most literature on the subject.
3.4 Web Services and Service-Oriented Architectures
- A web service is a software system designed to support interoperable machine-to-machine
interaction over a network. Though the Internet has allowed companies to provide a wide
range of services to users, sometimes called B2C (Business to Consumer), web services allow
applications to integrate with other applications across the Internet and may be a key
technology that supports B2B (Business to Business) interaction.
- Unlike other web-based applications, web services have no user interface and are not aimed at
web browsers. Web services instead share business logic, data, and processes through a
programmatic interface across a network. In this way, it is the applications that interface,
not the users. Developers can then add a web service to a web page (or an executable
program) to offer specific functionality to users.
- Examples of Web services include:
* Microsoft Virtual Earth Web Services, which offer static map images (gif, jpeg, and png), direct
map file access, search functionality, geocoding, reverse geocoding, and routing. Microsoft
MapPoint Web service provides access to location based services, such as maps, driving
directions, and proximity searches.
* Amazon S3, which is a simple Web services interface that can be used to store and retrieve
large amounts of data, at any time, from anywhere on the Web. It gives any developer access
to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon
uses to run its own global network of websites. Charges are based on a pay-as-you-go
policy, at about $0.15 per GB for the first 50 TB/month of storage used, as at 2011.
* Geonames, which provides a number of location-related web services; for example, to return a
set of Wikipedia entries as XML documents for a given place name, or to return the time zone for a
given latitude/longitude.
* DOTS Web services from Service Objects, an early adopter of web services, which provide a range of
services such as company information, reverse telephone number lookup, email address
validation, weather information, and IP address-to-location determination.
The key to the web services approach is the use of widely accepted technologies and standards,
such as:
* XML (eXtensible Markup Language)
* SOAP (Simple Object Access Protocol), which is a communication protocol for exchanging
structured information over the Internet and uses a message format based on XML. It is both
platform- and language-independent.
* WSDL (Web Services Description Language) protocol, which is again based on XML, and used to
describe and locate a Web service.
* UDDI (Universal Discovery, Description, and Integration) protocol is a platform-independent,
XML-based registry for businesses to list themselves on the Internet. It was designed to be
interrogated by SOAP messages and to provide access to WSDL documents describing the
protocol bindings and message formats required to interact with the web services listed in its
directory.
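To make the XML message format concrete, the sketch below builds a SOAP-style envelope with Python's standard xml.etree.ElementTree module. The operation name (GetTimeZone), its namespace URI, and the placeName parameter are invented for illustration; only the envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

# Build the Envelope/Body structure that SOAP messages use. The payload
# inside the Body (GetTimeZone and placeName) is a hypothetical service call.
ENV = "http://schemas.xmlsoap.org/soap/envelope/"
envelope = ET.Element(f"{{{ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{ENV}}}Body")
call = ET.SubElement(body, "GetTimeZone", xmlns="http://example.org/geo")
place = ET.SubElement(call, "placeName")
place.text = "Enugu"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

In practice this message would be posted over HTTP to the service endpoint, whose WSDL document describes exactly which operations and parameters are available.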
- From the database perspective, web services can be used both from within the database (to
invoke an external web service as a consumer), and the web service itself can access its own
database (as a provider) to maintain the data required to provide the requested service. The
details of SOAP, WSDL, and UDDI are outside the scope of this course.
* Service-Oriented Architecture (SOA)
In brief, a service-oriented architecture is a business-centric software architecture for building
applications that implement business processes as sets of services published at a granularity relevant
to the service consumer. Services can be invoked, published, and discovered, and are abstracted
away from the implementation using a single, standards-based form of interface.
- Some common SOA principles that provide a distinctive design approach for building web services
for SOA are loose coupling, reusability, contract, abstraction, composability, autonomy, statelessness,
discoverability, etc.
LECTURE FOUR
FILE DESIGN AND ACCESS PATH
Outline:
- Database Design
- Entity Relationship (ER) Modelling
- Normalisation
- Introduction to SQL Database Access
- Analysis of Open Database Connectivity Standard.
[Figure: stages of the database system development lifecycle.]
- System definition
- Requirements collection and analysis
- Database design: conceptual design, logical design, and physical design, with DBMS selection (optional) and application design carried out alongside
- Prototyping (optional) and implementation
- Data conversion and loading
- Testing
- Operational maintenance
Assignment 1:
- Briefly discuss the main activities associated with each stage of the database system
development lifecycle, suitable for medium to large database systems.
- By definition, database design can be viewed as the process of creating a design that will
support the enterprise's mission statement and mission objectives for the required database system.
- The two main approaches to the design of a database are referred to as the bottom-up and top-down approaches. The bottom-up approach begins at the fundamental level of attributes (i.e. properties of entities and relationships), which, through analysis of the associations between attributes, are grouped into relations that represent types of entities and relationships between entities. Fundamentally, the process of normalization (to be discussed) represents a bottom-up approach to database design. Normalization involves the identification of the required attributes and their subsequent aggregation into normalized relations based on functional dependencies between the attributes. The bottom-up approach is most suitable for simple databases with a relatively small number of attributes.
- The top-down approach is a more appropriate strategy for the design of complex databases,
for which it is difficult to establish up front all the attributes to be included in the data models.
The top-down approach starts with the development of data models that contain a few high-level
entities and relationships, and then applies successive top-down refinements to identify lower-level
entities, relationships, and the associated attributes.
- There are other approaches to database design. The inside-out approach is related to the bottom-up approach, but differs by first identifying a set of major entities and then spreading out to consider other entities, relationships, and attributes associated with those first identified. There is also the mixed strategy approach, which uses both the bottom-up and top-down approaches for various parts of the model before finally combining all parts together.
- Before proceeding to discuss ER modeling, we note the following;
(i) The conceptual database design is the process of constructing a model of the data used in an
enterprise, independent of all physical consideration.
(ii) The logical database design is the process of constructing a model of the data used in an
enterprise based on a specific data model, but independent of a particular DBMS and other
physical considerations.
(iii) The physical database design, being the third and final phase of the database design process, is
the process of producing a description of the implementation of the database on secondary
storage; it describes the base relations, file organizations, and indexes used to achieve efficient
access to the data, and any associated integrity constraints and security measures.
(iv) DBMS selection is the process of selecting an appropriate DBMS to support the database
system. Where no DBMS exists, a decision and selection can be made between the conceptual
and logical phases of the database design lifecycle. The main steps involved in selecting a DBMS
include: defining the terms of reference of the study, shortlisting two or three products,
evaluating the products, and recommending a selection and producing a report.
Comment:
We leave the general description of the stages to the discretion of students, as it forms the first
assignment (Assignment 1).
4.2 ENTITY RELATIONSHIP (ER) MODELLING
- The design of a database commences once the requirements collection and analysis stage
of the database system development lifecycle is completed. To ensure a precise understanding of the nature of the data and how it is
used by the enterprise, there is the need for a model for communication that is non-technical and free
of ambiguities. A typical example of such a model is the Entity Relationship (ER) model.
- The Entity Relationship (ER) model is a top-down approach to database design that begins by
identifying the important data called entities and relationships between the data that must be
represented in the model. Thereafter, more details, such as the information that should be held about
the entities and relationships (called attributes) and any constraints on the entities, relationships, and
attributes, are added to the model. ER modeling is, therefore, an important technique for
any database designer to master.
- Remarkably, entities are used to represent real-life objects, and the two terms are, therefore, used interchangeably. Entities are categorized according to types: an entity type is a group of objects with the same properties, which are identified by the enterprise as having an independent existence. Thus, the basic concept of the ER model is the entity type, which has an independent existence and can comprise objects with a physical (or "real") existence or objects with a conceptual (or "abstract") existence. Note that the definition of entity types is subjective and usually at the discretion of designers, implying that different designers may identify different entities (no formal definition exists).
- Each uniquely identifiable object of an entity type is referred to simply as an entity occurrence.
A database normally contains many different entity types, as we find in examples.
- In a typical database, the entities are often related in some way to one another; each such association is termed a relationship. The entities that participate in a relationship are also known as participants, and each relationship is identified by a name that describes it. The relationship name is an active or passive verb; e.g. a STUDENT takes a CLASS, a PROFESSOR teaches a CLASS, a DEPARTMENT employs a PROFESSOR, a DIVISION is managed by an EMPLOYEE, an AIRCRAFT is flown by a CREW, etc. From such statements we can identify the objects and the relationships.
- A relationship type is a set of meaningful associations between one or more participating entity
types, and relationship types are given names that describe their function. A relationship occurrence, on the other hand, is a uniquely
identifiable association that includes one occurrence from each participating entity type.
- Conventionally, diagrams are used for the representation of relationship types. The diagrams and the symbols used are also subjective and at the discretion of designers. In any case, relationships are depicted with directions, since some only make sense in one direction; e.g. BRANCH has Staff makes more sense than Staff has BRANCH. The number of participating entity types in a relationship type is called the degree of that relationship. Thus, the degree of a relationship indicates the number of entity types involved: a relationship of degree two is called binary, one of degree three ternary, etc. An example of a ternary relationship involves the entities STAFF, BRANCH, and CLIENT:
[Figure: a ternary relationship involving the STAFF, BRANCH, and CLIENT entity types.]
- Also, a relationship of degree four is called quaternary.
- Sometimes, there exists a relationship type in which the same entity type participates more than once, in different roles. This is normally termed a recursive relationship. For example, consider a situation where a member of staff (the supervisee) is supervised by a supervisor who is also a member of staff. This is depicted as follows:
[Figure: the recursive relationship Supervises on the Staff entity type, with role names supervisor and supervisee.]
- In other cases, two entities can be associated through more than one relationship. For
example, the STAFF and BRANCH entity types can be associated through two distinct relationships
called Manages and Has.
- Entities are known and described by their properties. The particular properties of entity types
are called attributes. For example, a STAFF entity type may be described by the staffNO, name,
position, and salary attributes.
- The attributes hold values that describe each entity occurrence and represent the main part of
the data stored in the database. A relationship type that associates entities can also have attributes
similar to those of an entity type.
- We note that the set of allowable values for one or more attributes is called the attribute
domain. Also, attributes can be simple, composed of a single component with an independent
existence, or composite, composed of multiple components, each with an independent existence. For
example, the address attribute of the BRANCH entity with the value (100 Agbani Road, Enugu, Nigeria)
can be subdivided into street (100 Agbani Road), city (Enugu), and country (Nigeria) attributes.
The decision between composite and simple attributes depends on the requirements of the
user views, which in the example determine whether the address attribute should be treated as a simple
attribute or subdivided as shown.
- Furthermore, attributes can be single-valued or multi-valued. They are single-valued when
they hold a single value for each occurrence of an entity type, and multi-valued otherwise. For
example, the telNO attribute of a BRANCH entity is multi-valued if a branch has two telephone numbers, such as 0141-339-2178 and 0141-339-4439.
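In a relational design, a multi-valued attribute such as telNO is usually moved into its own table, one row per value. The sketch below illustrates this with sqlite3; the table and column names are invented for the example.

```python
import sqlite3

# BRANCH keeps its single-valued attributes; the multi-valued telNO attribute
# becomes a separate table keyed by (branch_no, tel_no).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_no INTEGER PRIMARY KEY, city TEXT)")
conn.execute("""
    CREATE TABLE branch_phone (
        branch_no INTEGER REFERENCES branch(branch_no),
        tel_no    TEXT,
        PRIMARY KEY (branch_no, tel_no)
    )
""")
conn.execute("INSERT INTO branch VALUES (1, 'Glasgow')")
conn.executemany("INSERT INTO branch_phone VALUES (1, ?)",
                 [("0141-339-2178",), ("0141-339-4439",)])
print(conn.execute("SELECT tel_no FROM branch_phone WHERE branch_no = 1").fetchall())
```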
- Sometimes, the values held by some attributes may be derived. A derived attribute is one that
represents a value that is derivable from the value of a related attribute or set of attributes, not
necessarily in the same entity type. For example, the value of a courseDuration attribute of the
STUDENT entity can be calculated from the yrStart and yrFinish attributes of the same STUDENT
entity type.
- Quite importantly, for sorting and searching purposes there must be a method to uniquely
identify each occurrence of an entity type. The minimal set of attributes that uniquely identifies each
occurrence of an entity type is called a candidate key.
- The primary key is the candidate key that is selected to uniquely identify each
occurrence of an entity type; consider, for example, the use of a studentNO attribute to uniquely
identify a student in a STUDENT entity. The choice of primary key for an entity is
based on considerations of attribute length, the minimal number of attributes required, and the
future certainty of uniqueness.
- Where there exist two candidate keys that can be used to identify each occurrence of an entity
type, such as staffNO and NIN (National Insurance Number) for a STAFF entity, staffNO can be used as the
primary key while NIN becomes an alternate key.
- In some cases, the key of an entity type is composed of several attributes, known as a composite
key, whose values together are unique for each entity occurrence, but not separately. For example,
an ADVERT entity may have propertyNO, newspaperName, advertDate, and cost attributes. There can
exist several adverts in many newspapers on a given date; therefore, uniquely identifying each
occurrence of the ADVERT entity requires a composite primary key made up of the propertyNO,
newspaperName, and advertDate attributes.
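The ADVERT example can be sketched directly as a table with a composite primary key. The sample rows are invented for illustration.

```python
import sqlite3

# No single attribute is unique, so the primary key spans three attributes:
# (property_no, newspaper_name, advert_date).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE advert (
        property_no    TEXT,
        newspaper_name TEXT,
        advert_date    TEXT,
        cost           REAL,
        PRIMARY KEY (property_no, newspaper_name, advert_date)
    )
""")
conn.execute("INSERT INTO advert VALUES ('PG4', 'Daily Times', '2011-05-01', 50.0)")
# Same property, same date, different newspaper: a distinct occurrence.
conn.execute("INSERT INTO advert VALUES ('PG4', 'Guardian', '2011-05-01', 60.0)")
try:
    # Repeating all three key attributes violates uniqueness and is rejected.
    conn.execute("INSERT INTO advert VALUES ('PG4', 'Guardian', '2011-05-01', 70.0)")
except sqlite3.IntegrityError as e:
    print("duplicate key rejected:", e)
```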
- By convention, in ER diagrams the entities are often drawn as rectangles, where the first division
of each rectangle shows the entity name and the second division shows the attributes, starting with the
primary key. Any composite attribute is usually indented to the right of the attribute whose
subdivisions it comprises.
- Entities are sometimes classified as strong or weak. A strong entity type is one that is not existence-dependent on some other entity type, e.g. STUDENT, DEPARTMENT, COURSE. A characteristic of a strong entity type is that each entity occurrence is uniquely identifiable using the primary key attributes of that entity type; e.g. we can uniquely identify a student using studentNO. A weak entity type, on the other hand, is one that is existence-dependent on some other entity type. A characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type. (Students can find examples of weak entities in texts.)
- Usually, constraints are placed on entity types that participate in a relationship. The
constraints should reflect the restrictions on the relationships as perceived in the "real world".
An example is that each department must have students and each course must have a lecturer. The main
type of constraint on relationships is called multiplicity: the number (or range) of
possible occurrences of an entity type that may relate to a single occurrence of an associated entity
type through a particular relationship.
- Multiplicity constrains the way that entities are related. It is a representation of the policies (or
business rules) established by the user or enterprise. Ensuring that all appropriate constraints are
identified and represented is an important part of modeling an enterprise.
- The most common degree for relationships is binary. Binary relationships are generally
referred to as being one-to-one (1:1), one-to-many (1:*), or many-to many (*:*).
Examples:
- A DEPARTMENT has a HEAD (1:1): [Figure: the DEPT and HEAD entities linked by the Has relationship.]
- Furthermore, multiplicity actually consists of two separate constraints, known as cardinality and participation. Cardinality describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type, while participation determines whether all or only some entity occurrences participate in a relationship. Where all entity occurrences are involved in a particular relationship, we have mandatory participation; otherwise, participation is optional.
- For reasons of space and time, we conclude this subsection with the requirements for
developing an ER diagram. Building an ERD usually involves the following activities;
* Create a detailed narrative of the organisation's description of operations.
* Identify the business rules based on the description of operations
* Identify the main entities and relationships from business rules.
* Develop the initial ERD
* Identify the attributes and primary keys that adequately describe the entities.
* Revise and review the ERD.
NB
(1) During the review process, it is likely that additional objects, attributes, and relationships will
be uncovered. Therefore, the basic ERM will be modified to incorporate the newly discovered ER
components. As a matter of fact, the review process is repeated until the end users and designers
agree that the ERD is a fair representation of the organization's activities and functions.
(2) During the design process, the database designer does not depend simply on interviews to help
define entities, attributes and relationships. A surprising amount of information can be gathered by
examining the business forms and reports that an organization uses in its daily operations.
4.3 NORMALIZATION
- The design of a database demands an accurate representation of the data, relationships
between the data, and constraints on the data that are pertinent to the enterprise. We already stated
that ER modeling is a model for communication that is non-technical and free of ambiguities, which
ensures a precise understanding of the nature of the data and how it is used by the enterprise.
Another database design technique is that of normalization.
- Normalization is a technique for producing a set of relations with desirable properties, given
the data requirements of an enterprise. The technique begins by examining the relationships (called
functional dependencies) between attributes. Normalization uses a series of tests (described as
normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of
suitable relations that supports the data requirements of the enterprise.
- It is important to note that the characteristics of a suitable set of relations include the
following;
* The minimal number of attributes necessary to support the data requirements of the
enterprise.
* Attributes with a close logical relationship (described as functional dependency) are found in
the same relation.
* Minimal redundancy, with each attribute represented only once, with the important exception
of attributes that form all or part of foreign keys, which are essential for the joining of related
relations. Where, precisely, a foreign key is an attribute, or set of attributes, within one relation
that matches the candidate key of some (possibly the same) relation.
- The benefit of using a database that has a suitable set of relations is that it will be
easier for users to access and maintain the data, and it will take up minimal storage space on the
computer. Herein lies the importance of the process of normalization.
- The process of normalization can be used as a bottom-up standalone database technique, or as
a validation technique to check the structure of relations, which may have been created using a top-
down approach such as ER modeling. Notwithstanding the approach, the common goal is to create a
set of well-designed relations that meet the data requirements of the enterprise.
- In order to further appreciate the problems of data redundancy and update anomalies which
normalization seeks to address, let us consider the following set of relations;
1. STAFF (staffNO, sName, Position, Salary, branchNO)
2. BRANCH (branchNO, bAddress)
3. STAFFBRANCH (staffNO, sName, Position, Salary, branchNO, bAddress)
- For relations described by the above attributes, we observe that the STAFFBRANCH relation
allows for redundant data: because different staff (with their respective staff numbers) may belong
to the same branch, both branchNO and bAddress must be repeated across the corresponding tuples.
This problem does not arise in the case of the STAFF and BRANCH relations. The implication is that
the STAFFBRANCH relation, having redundant data, will suffer from problems normally regarded as
update anomalies, which are classified as insertion, deletion, or modification anomalies.
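The redundancy in STAFFBRANCH, and its removal by decomposition, can be made concrete with a small sketch. The rows below are made-up sample data, not taken from the lecture; the point is only that the branch address is stored once per staff member before decomposition and once per branch after it.

```python
import sqlite3

# Build the redundant STAFFBRANCH relation with three staff at one branch.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE staffbranch (
    staffNO TEXT PRIMARY KEY, sName TEXT, position TEXT,
    salary REAL, branchNO TEXT, bAddress TEXT)""")
rows = [
    ('SG5',  'Eze',   'Manager',   65000, 'B3', '12 Main St'),
    ('SG14', 'Ugwu',  'Assistant', 30000, 'B3', '12 Main St'),
    ('SG37', 'Okoro', 'Assistant', 32000, 'B3', '12 Main St'),
]
conn.executemany("INSERT INTO staffbranch VALUES (?,?,?,?,?,?)", rows)

# The branch address is stored three times, so changing it must touch
# three rows (a modification anomaly):
n = conn.execute(
    "SELECT COUNT(*) FROM staffbranch WHERE bAddress = '12 Main St'"
).fetchone()[0]
print(n)  # 3

# After decomposition into a separate BRANCH relation, the address
# appears once; staff tuples refer to it through the branchNO key.
conn.execute(
    "CREATE TABLE branch AS SELECT DISTINCT branchNO, bAddress FROM staffbranch")
m = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
print(m)  # 1
```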
- An important concept associated with normalization is functional dependency, which describes
the relationship between attributes in a relation. For example: if A and B are attributes of relation R, B
is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one
value of B. (A and B may each consist of one or more attributes.)
- Functional dependency is a property of the meaning or semantics of the attributes in a relation.
The semantics indicate how attributes relate to one another, and specify the functional dependencies
between attributes. When a functional dependency is present, the dependency is specified as a
constraint between the attributes. Normally, the term determinant refers to the attributes, or group
of attributes, on the left-hand side of the arrow of a functional dependency.
- Functional dependencies can be identified through discussions with end-users, or
documentation, or by the experience of the designers. Functional dependencies can also be used to
identify primary keys.
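Whether a proposed functional dependency A → B holds over a given set of tuples can be checked mechanically: it holds if and only if no value of A is paired with two different values of B. A minimal sketch (the attribute names and sample tuples below are illustrative, not from the lecture):

```python
def holds(tuples, a, b):
    """Return True if the functional dependency a -> b holds in the data."""
    seen = {}
    for t in tuples:
        # If this value of `a` was seen before with a different `b`,
        # the dependency is violated.
        if t[a] in seen and seen[t[a]] != t[b]:
            return False
        seen[t[a]] = t[b]
    return True

staff = [
    {'staffNO': 'SG5',  'branchNO': 'B3'},
    {'staffNO': 'SG14', 'branchNO': 'B3'},
    {'staffNO': 'SG37', 'branchNO': 'B5'},
]
fd1 = holds(staff, 'staffNO', 'branchNO')  # True: each staffNO has one branch
fd2 = holds(staff, 'branchNO', 'staffNO')  # False: B3 maps to two staff
print(fd1, fd2)
```

Note that sample data can only refute a dependency, never prove it in general; confirming that A → B is a genuine constraint still requires the semantics of the enterprise, as stated above.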
[Comment]
- SQL is reputed to be a relatively easy language to learn. Thus, it suffices to treat the
fundamentals here.
- SQL is a nonprocedural language, which implies that the user simply has to specify 'what'
information is required, rather than 'how' to get it. In other words, SQL does not require the
specification of the access methods to the data.
* In line with most modern languages, SQL is essentially free format, implying that parts of
statements do not have to be typed at particular locations on the screen.
* The command structure consists of Standard English words such as CREATE TABLE, INSERT,
SELECT.
For example;
- CREATE TABLE Staff (staffNO VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2));
- INSERT INTO Staff VALUES ('SG16', 'Ugwu', 50000);
- SELECT staffNO, lName, salary
FROM Staff
WHERE salary > 10000;
- SQL can be used by a range of users, including database administrators (DBAs), management
personnel, application developers, and many other types of end user.
- The language also supports UPDATE, DELETE, and other commands.
[Students can do much more on their own].
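As a starting point for that further study, the UPDATE and DELETE commands can be sketched against a Staff table like the one created above. The rows and the 10% raise below are made up for illustration:

```python
import sqlite3

# Recreate the Staff table from the CREATE TABLE example and add rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNO VARCHAR(5), lName VARCHAR(15), salary DECIMAL(7,2))")
conn.executemany("INSERT INTO Staff VALUES (?,?,?)",
                 [('SG16', 'Ugwu', 50000), ('SG21', 'Bello', 9000)])

# UPDATE: give every member of staff earning under 10000 a 10% raise.
conn.execute("UPDATE Staff SET salary = salary * 1.1 WHERE salary < 10000")
sal = conn.execute(
    "SELECT salary FROM Staff WHERE staffNO = 'SG21'").fetchone()[0]
print(round(sal))  # 9900

# DELETE: remove a member of staff identified by key.
conn.execute("DELETE FROM Staff WHERE staffNO = 'SG21'")
remaining = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
print(remaining)  # 1
```

Note that, like SELECT, both commands take a WHERE clause specifying 'what' rows are affected, not 'how' to find them.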
DATABASE PLANNING
- Database planning encompasses the activities that allow the stages of the database
development lifecycle (DBDLC) to be realized as efficiently and effectively as possible. Thus, database
planning must be integrated with the overall IS strategy of the organization.
- There are three main issues involved in formulating an IS strategy. These are:
* Identification of enterprise plans and goals, with subsequent determination of information
system needs;
* Evaluation of current information systems to determine existing strengths and weaknesses;
* Appraisal of IT opportunities that might yield competitive advantage.
- An important first step in database planning is to clearly define the mission statement for the
database system. The mission statement defines the major aims of the database system, and is
normally articulated by the key drivers of the database project, such as the Director and/or owner. In
principle, a mission statement helps to clarify the purpose of the database system and provides a
clearer path towards the efficient and effective creation of the required database system.
- Further work: an illustration of the development of a database planning activity.
* Database Design
Comment: For database design kindly refer to section 4.1.
Further work should consider specific illustration/determination of all the stages of database design.
DATABASE ADMINISTRATION
Database administration may be seen as the set of on-going activities involving the planning,
design, evaluation, and monitoring of performance, and any consequential modification of the storage
schema for purposes of improvement.
- The role involved in database administration is an amalgam of technical and managerial
activities. The technical aspects of database administration involve the following areas of
operation:
* Evaluating, selecting, and installing the DBMS and related utilities
* Designing and implementing databases and applications.
* Testing and evaluating databases and applications
* Operating the DBMS, utilities, and applications.
* Training and supporting users
* Maintaining the DBMS, utilities, and applications
- In fact, many of the technical activities of database administration are logical extensions of the
managerial activities.
* Further work: include more details of managerial activities, with illustrations if possible.
Comment: The design approach for OODBMS is beyond the scope of this course and is, therefore,
recommended for further work.
Comment: The essence of this note is simply to mention the concepts that students should be aware
they exist.
Transaction Management
- The term transaction refers to an action, or series of actions, carried out by a single user or
application program, that reads or updates the contents of the database. In this regard, therefore, a
transaction is a logical unit of work on the database. A transaction may be an entire program, a part
of a program, or a single statement (For example, the SQL statement, INSERT or UPDATE) and it may
involve any number of operations on the database.
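The sense in which a transaction is a logical unit of work can be sketched with a classic funds transfer: either both updates are committed together, or a failure causes the partial work to be rolled back. The account table and values below are made up for illustration:

```python
import sqlite3

# Two accounts with illustrative opening balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (accNO TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?,?)",
                 [('A1', 100.0), ('A2', 50.0)])
conn.commit()

try:
    # First half of the unit of work: debit A1.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE accNO = 'A1'")
    # Simulate a failure before the matching credit can run:
    raise RuntimeError("simulated failure mid-transaction")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE accNO = 'A2'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the partial update, restoring consistency

balance = conn.execute(
    "SELECT balance FROM account WHERE accNO = 'A1'").fetchone()[0]
print(balance)  # 100.0 -- the lone debit was rolled back
```

Had the transaction committed, both updates would have become permanent together; the database never exposes the inconsistent in-between state.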
Next: detailing of the definition, advantages, comparison with OLTP, issues, architecture, tools, and
techniques.
DATA MINING
- This is a concept that is often associated with data warehousing. To realize the value of a data
warehouse, it is necessary to extract the knowledge hidden within the warehouse. Data mining
comes in handy, as one of the best ways to extract meaningful trends and patterns from huge amounts
of data. Data mining discovers within data warehouses information that queries and reports cannot
effectively reveal.
- By definition, data mining can be seen as the process of extracting valid, previously unknown,
comprehensible, and actionable information from large databases and using it to make crucial
business decisions.
- Data mining is concerned with the analysis of data and the use of software techniques for
finding hidden and unexpected patterns and relationships in sets of data. It tends to work from the
data up, and the techniques that produce the most accurate results normally require large volumes of
data to deliver reliable conclusions.
- The process of analysis starts by developing an optimal representation of the structure of
sample data, during which time knowledge is acquired. This knowledge is then extended to larger
sets of data, working on the assumption that the larger data set has a structure similar to the sample
data.
- Examples of data mining applications include; retail/marketing where data mining is used to
identify buying patterns of customers, finding associations among customer demographic
characteristics, predicting response to mailing campaigns, market basket analysis, etc.
- In banking, applications include detection of patterns of fraudulent credit card use, identifying loyal
customers, predicting customers likely to change their credit card affiliation, and determining credit card
spending by customer groups.
- In insurance: claims analysis, and predicting which customers will buy new policies.
- In medicine: characterizing patient behaviour to predict surgery visits, and identifying successful
medical therapies for different illnesses.
- There are four main operations associated with data mining techniques: predictive
modeling, database segmentation, link analysis, and deviation detection. Each of these
operations has associated techniques that guarantee its effectiveness.
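The link-analysis operation, in its market-basket form mentioned under retail/marketing above, can be sketched very simply: count how often pairs of products appear together across customer transactions. The transactions below are made up for illustration:

```python
from collections import Counter
from itertools import combinations

# Illustrative customer baskets (each basket is one transaction).
transactions = [
    {'bread', 'butter', 'milk'},
    {'bread', 'butter'},
    {'beer', 'bread'},
    {'butter', 'milk'},
]

# Count co-occurrences of every pair of products within a basket.
pairs = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pairs[pair] += 1

# ('bread', 'butter') appears together in 2 of the 4 baskets.
print(pairs[('bread', 'butter')])  # 2
```

Real data mining tools extend this idea with support and confidence thresholds over far larger volumes of data, consistent with the observation above that reliable conclusions require large data sets.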
* Further work: detail the techniques and discuss data mining tools; then a discussion of data
mining and data warehousing.
LECTURE SIX
FUTURE DIRECTIONS IN DBMS
- Database Security
- Distributed DBMS
- Distributed Relational Database Design
- Web Technology and DBMS
- Cloud Computing
The above areas and future projections provide the specification for the development of DBMSs
and guide their evolution.