
The Creation and Management of Database Systems

Edited by:

Adele Kuzmiakova

Arcler Press

www.arclerpress.com
The Creation and Management of Database Systems
Adele Kuzmiakova

Arcler Press
224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.arclerpress.com
Email: orders@arclereducation.com

e-book Edition 2023

ISBN: 978-1-77469-673-6 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material
sources are indicated and copyright remains with the original owners. Copyright for images and
other graphics remains with the original owners as indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data. Authors, editors, and publishers
are not responsible for the accuracy of the information in the published chapters or the
consequences of their use. The publisher assumes no responsibility for any damage or grievance to
persons or property arising out of the use of any materials, instructions, methods, or ideas in
the book. The authors, editors, and publisher have attempted to trace the copyright holders
of all material reproduced in this publication and apologize to copyright holders if permission has
not been obtained. If any copyright holder has not been acknowledged, please write to us so we
may rectify the omission.

Notice: Registered trademarks of products or corporate names are used only for explanation and
identification without intent to infringe.

© 2023 Arcler Press

ISBN: 978-1-77469-442-8 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about
Arcler Press and its products, visit our website at www.arclerpress.com
ABOUT THE EDITOR

Adele Kuzmiakova is a machine learning engineer focused on solving problems in machine
learning, deep learning, and computer vision. Adele currently works as a senior machine
learning engineer at Ifolor, focusing on creating engaging photo stories and products. Adele
attended Cornell University in New York, United States for her undergraduate studies, where
she studied engineering with a focus on applied math. Some of the deep learning problems
Adele has worked on include predicting air quality from public webcams, developing real-time
human movement tracking, and using 3D computer vision to create 3D avatars from selfies
in order to bring online clothes shopping closer to reality. She is also passionate about
exchanging ideas and inspiring other people, and acted as a workshop organizer at the Women
in Data Science conference in Geneva, Switzerland.
TABLE OF CONTENTS

List of Figures.........................................................................................................xi
List of Tables........................................................................................................xiii
List of Abbreviations.............................................................................................xv
Preface.................................................................................................................xix

Chapter 1 Introduction to Database Systems.............................................................. 1


1.1. Introduction......................................................................................... 2
1.2. Why Databases?.................................................................................. 2
1.3. Data Vs. Information............................................................................ 4
1.4. Introducing the Database..................................................................... 6
1.5. Importance of Database Design......................................................... 12
1.6. Evolution of File System Data Processing........................................... 13
1.7. Problems with File System Data Processing....................................... 18
1.8. Database Systems.............................................................................. 24
References................................................................................................ 34

Chapter 2 Data Models............................................................................................. 45


2.1. Introduction....................................................................................... 46
2.2. Importance of Data Models............................................................... 47
2.3. Data Model Basic Building Blocks..................................................... 48
2.4. Business Rules................................................................................... 50
2.5. The Evolution of Data Models............................................................ 54
References................................................................................................ 66

Chapter 3 Database Environment............................................................................. 73


3.1. Introduction....................................................................................... 74
3.2. Three-Level ANSI-SPARC Architecture................................................... 74
3.3. Database Languages.......................................................................... 81
3.4. Conceptual Modeling and Data Models............................................. 86
3.5. Functions of a DBMS......................................................................... 91
3.6. Components of a DBMS.................................................................... 97
References.............................................................................................. 102

Chapter 4 The Relational Model............................................................................. 111


4.1. Introduction..................................................................................... 112
4.2. Brief History of the Relational Model............................................... 112
4.3. Terminology..................................................................................... 114
4.4. Integrity Constraints......................................................................... 125
4.5. Views............................................................................................... 128
References.............................................................................................. 132

Chapter 5 Database Planning and Design............................................................... 139


5.1. Introduction..................................................................................... 140
5.2. The Database System Development Lifecycle................................... 141
5.3. Database Planning........................................................................... 143
5.4. Definition of the System................................................................... 144
5.5. Requirements Collection and Analysis............................................. 145
5.6. Database Design.............................................................................. 149
References.............................................................................................. 154

Chapter 6 Data Manipulation................................................................................. 159


6.1. Introduction..................................................................................... 160
6.2. Introduction to SQL......................................................................... 160
6.3. Writing SQL Commands.................................................................. 165
6.4. Data Manipulation........................................................................... 167
References.............................................................................................. 173

Chapter 7 Database Connectivity and Web Technologies...................................... 179


7.1. Introduction..................................................................................... 180
7.2. Database Connectivity..................................................................... 180
7.3. Internet Databases........................................................................... 194
7.4. Extensible Markup Language........................................................... 204
References.............................................................................................. 208

Chapter 8 Database Administration and Security................................................... 217
8.1. Introduction..................................................................................... 218
8.2. The Role of a Database in an Organization...................................... 220
8.3. Introduction of a Database............................................................... 222
8.4. The Evolution of Database Administration Function......................... 224
References.............................................................................................. 230

Index...................................................................................................... 235

LIST OF FIGURES

Figure 1.1. Taking raw data and turning it into information


Figure 1.2. The database management system (DBMS) controls the interface between
the end user and the database
Figure 1.3. The customer file’s contents
Figure 1.4. The agent file’s contents
Figure 1.5. A straightforward file structure
Figure 1.6. Comparing and contrasting database and file management organizations
Figure 1.7. The database system’s surroundings
Figure 1.8. Using Microsoft SQL Server Express to visualize information
Figure 1.9. Using Oracle to demonstrate data storage management
Figure 2.1. Creating connections between relational tables
Figure 2.2. A diagram that shows how things are connected
Figure 2.3. Chen and Crow's Foot notations
Figure 2.4. OO, UML, and ER models are compared
Figure 3.1. The ANSI-SPARC 3-level design
Figure 3.2. The distinctions between the 3 levels
Figure 3.3. The ANSI-SPARC 3-level architecture and data independence
Figure 3.4. This is an example of a relational schema
Figure 3.5. This is an example of a network schema
Figure 3.6. This is an example of a hierarchical schema
Figure 3.7. The lost update
Figure 3.8. Main elements of a DBMS
Figure 3.9. Elements of a database manager
Figure 4.1. Relationships between the branch and also the staff
Figure 4.2. Some Branch and Staff Relations qualities have their domains
Figure 4.3. A DreamHome rental database instance
Figure 5.1. The steps in the building of a database system
Figure 5.2. User views 1, 2, and 3, as well as 5 and 6, contain overlapping requirements
(shown as hatched regions), while user view 4 has distinct requirements
Figure 5.3. Multiple user views 1 to 3 are managed in a centralized manner
Figure 5.4. Controlling multiple users’ views 1 to 3 utilizing the view integration
technique
Figure 6.1. A graphical example of the manipulation of data files
Figure 6.2. SQL: A basic overview
Figure 7.1. Oracle intrinsic connectivity
Figure 7.2. Utilizing ODBC, RDO, and DAO to access databases
Figure 7.3. Setting up an Oracle ODBC data source
Figure 7.4. MS Excel uses ODBC to link to the Oracle database
Figure 7.5. OLE-DB design
Figure 7.6. Framework of ADO.NET
Figure 7.7. Framework of JDBC
Figure 7.8. Web-to-database middleware
Figure 7.9. Web server API and CGI interfaces
Figure 7.10. The productlist.xml page’s contents
Figure 8.1. The cycle of data-information-decision making
Figure 8.2. The IS department’s internal organization
Figure 8.3. The placement of the DBA function
Figure 8.4. A DBA functional organization
Figure 8.5. Multiple database administrators in an organization

LIST OF TABLES

Table 1.1. The characteristics of many famous file organization systems are compared
Table 1.2. Terminology for basic files
Table 2.1. Major data models have changed over time
Table 4.1. Alternate names for terms in relational models
Table 5.1. A synopsis of the principal activities connected with each phase of the
DSDLC
Table 5.2. The requirements for an optimum data model
Table 6.1. Table of results for example #1
Table 6.2. Table of results for example #2
Table 6.3. With duplicates, the outcome table for example #3
Table 6.4. Duplicates are removed from the outcome table for example #3
Table 6.5. Table of results for example #4
Table 7.1. Example OLE-DB interfaces and classes
Table 7.2. Example ADO objects
Table 7.3. Features and advantages of internet technologies
LIST OF ABBREVIATIONS

3GL third-generation language


4GL fourth-generation language
ADO ActiveX data objects
ANSI American National Standards Institute
API application programming interface
ASP active server pages
ATM automated teller machine
B2C business to consumer
BNF Backus Naur form
CGI common gateway interface
COM component object model
DBA database administrator
DBLC database life cycle
DBMS database management system
DBTG Data Base Task Group
DCM data communication manager
DDL data definition language
DFD data flow diagrams
DLL dynamic-link library
DM database manager
DML data manipulation language
DP data processing
DSDLC database system development lifecycle
DSN data source name
EDP electronic data processing
ER entity-relationship
ERD entity-relationship diagram
FIPS federal information processing standard
fName first name
GUI graphical user interface
HIPO hierarchical input process output
INGRES interactive graphics retrieval system
IRDS information resource dictionary system
IRM information resource manager
IS information systems
ISAPI internet server API
ISLC information systems lifecycle
ISO International Organization for Standardization
JDBC java database connectivity
lName last name
MDM master data management
NDL network database languages
ODBC open database connectivity
OLAP online analytical processing
OLE-DB object linking and embedding for database
OODM object-oriented data model
QBE query-by-example
RDA remote data access
RDBMS relational database management system
RDL relational database language
RDO remote data objects
SAA systems application architecture
SAD structured analysis and design
SDLC software development lifecycle
SGML standard generalized markup language
sNo staff number
SPARC Standards Planning and Requirements Committee
SQL structured query language
UDA universal data access
UML unified modeling language

UoD universe of discourse
W3C world wide web consortium
WSAPI WebSite API
XML extensible markup language

PREFACE

The database system has become one of the most significant advancements in software
engineering due to the extraordinarily productive history of database research over the
past 30 years. The database has now become the foundation of information systems (IS)
and drastically altered how organizations store and access information. This technology
has advanced significantly recently, resulting in more robust and user-friendly designs.
Database systems have thus become accessible to a much broader audience. Unfortunately,
because of the systems' apparent simplicity, users have created databases and applications
without the skills needed to build reliable and effective systems. And thus the "software
crisis," or "software slump" as it is frequently called, persists.
The authors’ experience in the industry, where they provided database design consulting
for new software systems and addressed shortcomings with existing systems, served
as the initial impetus for this book. Additionally, the authors' transition to academia
saw the same issues raised by another group of users: students. The goal of this book,
therefore, is to explain database theory and design as simply as possible.
The approach for relational Database Management Systems (DBMSs), the current
industry standard for business applications, is explained in this book and has been
tried and proven throughout time in both academic and industrial settings. Conceptual,
logical, and physical database design are the three key stages that make up this process.
The first phase begins with the creation of a conceptual data model that is independent
of all physical considerations. In the second phase, this model is refined into a logical
data model by removing constructs that cannot be represented in relational systems. In the
third phase, the logical data model is translated into a physical design for the target
DBMS. The physical design phase considers the storage structures and access methods needed
for efficient and secure access to the database on secondary storage.
The book is divided into eight chapters. Chapter 1 introduces the reader to the principles
of database systems. Chapter 2 discusses data models in considerable depth. Chapter 3
covers the database environment in detail. Chapter 4 presents the relational model.
Chapter 5 gives close attention to database planning and design. Chapter 6 breaks down
the concept of data manipulation. Chapter 7 discusses database connectivity and web
technologies. The last chapter, Chapter 8, covers database administration and security.
This book does an outstanding job of providing an overview of the many different
aspects of database systems. The content is presented in such a way that even an
untrained reader should have no trouble understanding the fundamental ideas behind
DBMSs if they read this text.

CHAPTER 1
INTRODUCTION TO DATABASE
SYSTEMS

CONTENTS
1.1. Introduction......................................................................................... 2
1.2. Why Databases?.................................................................................. 2
1.3. Data Vs. Information............................................................................ 4
1.4. Introducing the Database..................................................................... 6
1.5. Importance of Database Design......................................................... 12
1.6. Evolution of File System Data Processing........................................... 13
1.7. Problems with File System Data Processing....................................... 18
1.8. Database Systems.............................................................................. 24
References................................................................................................ 34

1.1. INTRODUCTION
Databases and the systems that manage them are an indispensable part of
life within today’s contemporary civilization. The majority of us participate
in so many activities daily that need some level of connection with only a
database (Valduriez, 1993). For instance, if we’re in a financial institution
to deposit money or withdrawal, if we make reservations at a restaurant or
aviation, if we use computer-based library resources to seek a bibliometric
item, or even if we buy anything online like a book, plaything, or computer,
there is a good chance that our actions will engage somebody or computer
simulation accessing a dataset. This is the case regardless of whether we are
the ones trying to access it or not. Even just the act of buying things in a
supermarket may sometimes immediately access the data that maintains the
inventories of food products (Güting, 1994).
Such interactions are instances of what we may call traditional database applications.
In such applications, the vast majority of data that is stored and retrieved is either
textual or numeric. In the recent past, advances in technology have made possible
fascinating new applications of database management systems (DBMSs) (Zaniolo et al.,
1997). The prevalence of social networking sites, including Facebook, Twitter, and
Flickr, among many others, has necessitated the creation of massive databases that
store non-traditional data, such as posts, tweets, photos, and short videos. New kinds
of database systems have been developed to manage information for social media sites;
these are often referred to as big data storage systems or NoSQL systems (Jukic et al.,
2014).
These kinds of systems are also used by firms such as Amazon and Google to manage the
information needed by their search engines and to provide cloud services. Cloud storage
is a method by which customers are supplied with online storage systems for managing
all kinds of data, including files, photos, videos, and e-mails; it is offered by
companies such as Google, Yahoo, and Amazon (Abadi et al., 2013).

1.2. WHY DATABASES?


Imagine having to run a company without knowing who your customers are, which products
you sell, who your employees are, whom you owe money to, and who owes money to you. All
firms must store this sort of data, and more, and make that information accessible to
decision makers when they require it. It may be argued that the real goal of any
corporate data system is to assist organizations in using information as a resource.
Data gathering, storage, aggregation, manipulation, dissemination, and management are
at the core of all these systems (Karp et al., 2000).

Depending on the kind of data system and the characteristics of the organization, that
data might range from just a few megabytes on only one or two topics to terabytes
spanning hundreds of subjects within the firm's micro and macro environment. Sprint and
AT&T are both known for maintaining systems that store data on billions of telephone
calls (Tolar et al., 2019), with new call data being added at rates of up to around
70,000 calls at a time. Such businesses must not only store and handle massive amounts
of data, but they must also be able to swiftly locate any given item within it.
Consider the case of Google, the Web's most popular search engine. Although Google is
reluctant to disclose many specifics about its data storage, it is thought that the
company responds to over 91 million queries per day over a data set that spans many
terabytes. Surprisingly, the results of these queries are returned virtually
instantaneously (Zhulin, 2015).
How do these companies handle all this information? How can they store everything and
then rapidly retrieve just the information that decision makers need, whenever they
require it? The answer is that they use databases. Databases are complex and
interconnected structures that enable computer-based programs to save, organize, and
retrieve information quickly, as detailed in depth in this textbook (Grimm et al.,
2013). Practically all modern commercial organizations depend on databases, so every
data-management professional needs a systematic grasp of how these systems are built
and how to use them properly. Even if your professional path does not lead you down the
exciting route of database design and construction, databases will be a serious
component of the technology you work with. In any event, you will probably make choices
in your career based on data-driven knowledge. As a result, understanding the
distinction between data and information is critical (Thomson, 1996).

1.3. DATA VS. INFORMATION


To understand what drives database design, you must first comprehend the distinction
between data and information (Madnick et al., 2009).

Data are raw facts. The word "raw" indicates that the facts have not yet been processed
to reveal their meaning. Consider the following scenario: suppose you want to know what
the users of a computer lab think of its services. Typically, you would begin by
surveying users to assess the computer lab's performance (Changnon & Kunkel, 1999).
Figure 1.1, Panel A, shows the Web survey form that enables users to respond to your
questions. When the survey form has been completed, the form's raw data are saved to a
data repository, such as the one shown in Figure 1.1, Panel B (Chen et al., 2008).
Although you now have the data, studying page after page of zeros and ones is unlikely
to provide much insight. Therefore, you transform the raw data into the data summary
shown in Panel C of Figure 1.1. It is now possible to get quick answers to questions
such as "How diverse is our lab's customer base?" In this case, you can quickly see
that most of the customers are juniors (24.59%) and seniors (53.01%). Because graphics
can help you extract meaning from data more quickly, you present the bar graph shown in
Figure 1.1, Panel D (Van Rijsbergen, 1977).
Information is the result of processing raw data to reveal its meaning. Data processing
can be as simple as organizing data to highlight trends or as complex as making
forecasts or drawing inferences using statistical modeling (Batini & Scannapieco,
2016).

To reveal its meaning, information requires context. A temperature reading of 105, for
instance, means nothing until you know that context: Is it in degrees Fahrenheit or
Celsius? Is it a machine temperature, a body temperature, or an outside air
temperature? Information can be used as the foundation for decision making (Destaillats
et al., 2008).

Figure 1.1. Taking raw data and turning it into information.

Source: https://www.researchgate.net/figure/The-transformation-process-from-
raw-data-into-knowledge-3-US-Joint-Publication-3–13_fig1_281102148.
The information summary for each item on the survey questionnaire, for example, may
highlight the lab's strengths and weaknesses, allowing you to make better-informed
judgments about how to serve lab customers. Keep in mind that raw data must be properly
formatted before it can be stored, processed, or presented. For instance, in Figure
1.1, Panel C, the student classification is organized to present data based on the
Freshman, Junior, Senior, and Master's Student categories (McKelvey & Ordeshook, 1985).
For data storage, the yes/no answers of the respondents might need to be converted to a
Y/N format. More extensive formatting is required when dealing with complex data types
such as audio, video, or images (Cheng et al., 2002).
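The raw-data-to-information step described above, processing individual survey records into a summary that reveals meaning, can be sketched in a few lines of Python. The class-standing responses below are hypothetical stand-ins, not the data behind Figure 1.1:

```python
from collections import Counter

# Hypothetical raw survey data: each record is one respondent's
# class standing (illustrative values only).
raw_responses = ["Senior", "Junior", "Senior", "Freshman", "Senior",
                 "Junior", "Senior", "Graduate", "Senior", "Junior"]

# Processing: tally each category and express it as a percentage.
counts = Counter(raw_responses)
total = len(raw_responses)
summary = {level: round(100 * n / total, 2) for level, n in counts.items()}

# The summary is information: it reveals meaning the raw list did not show.
print(summary)
```

The raw list answers no questions by itself; the derived percentages answer "How diverse is our lab's customer base?" at a glance, which is exactly the Panel B to Panel C transformation.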
Making appropriate decisions in the "information age" requires timely and useful
information generated from reliable data. As a result, good decision making is critical
to a firm's success in a global market (Stowe et al., 1997). Indeed, the "knowledge
age" is now thought to be upon us. Facts and information are the basis of knowledge,
which is the body of information and facts about a particular subject. Knowledge
entails being acquainted with, conscious of, or comprehending data as it relates to a
particular environment. An important feature of knowledge is the ability to generate
"new" knowledge from "old" knowledge (Schultze & Avital, 2011).
Let’s have a look at a few essential points (Rapps & Weyuker, 1985):
• The building components of knowledge are data.
• Data analysis generates information.
• The purpose of the information is to explain the sense of data.
• The key to creating excellent decisions is having reliable, relevant,
and real-time information.
• In a global world, a wise decision is essential for organizational
sustainability.
Timely and useful information requires reliable data. Such data must be properly
generated and stored in a format that is easy to access and process (Sidorov et al.,
2002). And, like any basic resource, the data environment must be managed carefully.
The discipline of data management focuses on the proper generation, storage, and
retrieval of data. Given the importance of data, it should not surprise you that data
management is a core activity for any business, government agency, non-profit, or
charity (Kovach & Cathcart Jr, 1999).

1.4. INTRODUCING THE DATABASE


A computerized database is usually required for effective data management. A database
is a shared, integrated computer structure that stores a collection of (Cox et al.,
1964):
•	End-user data, or raw facts that are of interest to the end user.
•	Metadata, or data about data, through which the end-user data are integrated
and managed.
The metadata describes the data characteristics as well as the set of relationships
that links the data found within the database. For example, the metadata component
stores information about the name of each data element, the kind of values (numeric,
dates, or text) permitted for each data element, whether the data element may be left
empty, and so on. The metadata complements and expands the value and use of the data by
providing additional information (Becerik-Gerber et al., 2014).
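To make the idea of metadata concrete, here is a minimal sketch in Python using SQLite as the DBMS; the CUSTOMER table and its columns are hypothetical examples, not taken from the book's sample databases:

```python
import sqlite3

# Create an in-memory database and declare a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cus_id    INTEGER PRIMARY KEY,  -- numeric, identifies each row
        cus_name  TEXT NOT NULL,        -- text, may not be left empty
        cus_phone TEXT                  -- text, may be left empty
    )
""")

# The DBMS itself keeps the metadata: the name, type, and nullability of
# every data element can be read back from its system catalog.
rows = conn.execute("PRAGMA table_info(customer)").fetchall()
schema = [(name, col_type, bool(notnull or pk))
          for _cid, name, col_type, notnull, _default, pk in rows]
print(schema)
conn.close()
```

Note that nothing here re-states the column definitions by hand: the names, types, and "may be left empty" flags come back from the DBMS's own catalog, which is what makes the database a collection of self-describing data.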
In a nutshell, metadata provides a more complete picture of the data in a database.
Because of this characteristic, a database may be described as a "collection of
self-describing data" (Martinikorena et al., 2018).

A database management system (DBMS) is the collection of programs that manages the
database structure and controls access to the data it stores. In some ways, a database
resembles a very well-organized electronic filing cabinet, with a sophisticated
software program (the DBMS) assisting in the management of the cabinet's contents
(Zins, 2007).

1.4.1. Role and Advantages of the DBMS


The DBMS serves as the intermediary between the user and the database. The database
structure itself is stored as a collection of files, and the only way to access the
data in those files is through the DBMS. As shown in Figure 1.2, the DBMS presents the
end user (or application program) with a single, integrated view of the data in the
database. The DBMS receives all application requests and translates them into the
complex operations required to fulfill them. In doing so, the DBMS hides much of the
database's internal complexity from application programs and users. The application
program could be written by a programmer in a language such as Visual Basic, Java, or
C#, or it could be generated by a DBMS utility program (Sigmund, 2006).

Figure 1.2. The database management system (DBMS) controls the interface
between the end-user then the database.

Source: https://slideplayer.com/slide/14646685/.
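The intermediary role shown in Figure 1.2 can be sketched with Python's built-in SQLite bindings; the AGENT table and its rows are hypothetical. The application never touches the storage files directly: it hands declarative requests to the DBMS, which translates them into the low-level operations needed to fulfill them.

```python
import sqlite3

# SQLite stands in as the DBMS; an in-memory database keeps the sketch
# self-contained.
dbms = sqlite3.connect(":memory:")
dbms.execute("CREATE TABLE agent (agent_id INTEGER, agent_name TEXT)")
dbms.executemany("INSERT INTO agent VALUES (?, ?)",
                 [(501, "Alex Alby"), (502, "Leah Hahn")])

# The end user's request states *what* data is wanted, not *how* to
# locate it on disk; the DBMS handles the rest.
result = dbms.execute(
    "SELECT agent_name FROM agent WHERE agent_id = ?", (502,)).fetchone()
print(result[0])
dbms.close()
```

Swapping SQLite for a server DBMS would change the connection call but not the shape of the interaction, which is the point of Figure 1.2.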

Placing a DBMS between the end user's applications and the database yields many
benefits. For starters, the DBMS allows the database's contents to be shared across
multiple applications and users. Second, the DBMS integrates the many different users'
views of the data into a single, all-encompassing data repository (Hübner, 2016).

Because data is the crucial raw material from which information is derived, you need a
good way to manage such data. As you will see in this book, the DBMS makes data
management more efficient and effective. A DBMS, for instance, offers advantages such
as (LaFree & Dugan, 2007):
• Improved Data Sharing: The DBMS
helps create an environment in which end users and
programs have better access to more and better-managed data. With
such access, end users can respond quickly to
changes in their environment.
• Improved Data Security: The risk of data security
breaches increases as the number of users who have access
to data grows. A company invests considerable amounts of time,
effort, and money to ensure that its data are used properly.
A DBMS provides a framework for
enforcing data privacy and security policies more effectively.
• Better Data Integration: Wider access to well-managed
data promotes an integrated view of the organization's
operations and a clearer grasp of the bigger picture. It becomes
much easier to see how actions in one segment of the company
affect other segments (Freilich et al., 2014).
• Minimized Data Inconsistency: Data
inconsistency exists when different versions of the same data
appear in different places. For example, data inconsistency exists when a
company's sales department stores a sales representative's name
as "Bill Brown" while its personnel department stores the same person's
name as "William G. Brown," or when the company's
regional sales office shows the price of a product as $45.95
while its national sales office shows the same
product's price as $43.95. The
probability of data inconsistency is greatly reduced in a properly
designed database (Broatch et al., 2019).

• Improved Data Access: The DBMS makes it possible
to produce quick answers to ad hoc queries. A query
is a request for data manipulation submitted to the DBMS,
for example, to retrieve or change data. Simply put, a
query is a question, and an ad hoc query is
a question posed on the spur of the moment. The DBMS
sends back an answer (called the
query result set) to the application. End users may need quick answers
to such questions (ad hoc queries) when working with large amounts of
sales data, for example (Degrandi et al., 2020):
1. What was the dollar volume of sales by product during the past 6 months?
2. What was the sales bonus figure for each
of our sales representatives during the past 3 months?
3. Which of our customers have outstanding balances of $3,000 or more on
their accounts?
• Improved Decision Making: Better-managed data and
improved data access make it possible
to generate better-quality information, on which
better decisions are based. The quality of the information generated
depends on the quality of the underlying data.
Data quality is a comprehensive approach to
promoting the accuracy, validity, and timeliness of the data.
While the DBMS does not guarantee data
quality, it provides a framework to facilitate data quality initiatives.
• Increased End-User Productivity: The availability
of data, combined with the tools that transform
data into usable information, empowers end users to
make quick, informed decisions that can make the difference between
success and failure in the global economy. The advantages of using a
DBMS are not limited to those listed above. In
fact, as you gain a better understanding of the technical
details of databases and how to design them properly,
you may discover many other advantages (Gerardi et al.,
2021).
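The kind of ad hoc questions listed above can be answered directly by a DBMS without any new report program being written. The following sketch illustrates this with Python's built-in sqlite3 module; the table name, column names, and figures are invented for illustration and are not taken from the text.

```python
import sqlite3

# Hypothetical SALES table used only to illustrate ad hoc queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, rep TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("widget", "Hahn", 120.0), ("widget", "Okon", 80.0),
     ("gadget", "Hahn", 200.0), ("gadget", "Okon", 50.0)],
)

# Ad hoc query 1: dollar volume of sales by product.
per_product = dict(conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"))

# Ad hoc query 2: sales figure for each representative.
per_rep = dict(conn.execute(
    "SELECT rep, SUM(amount) FROM sales GROUP BY rep"))

print(per_product)  # {'gadget': 250.0, 'widget': 200.0}
print(per_rep)      # {'Hahn': 320.0, 'Okon': 130.0}
```

Each question is expressed as a short query rather than a custom program, which is exactly the improvement in data access the text describes.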

1.4.2. Types of Databases


A database management system (DBMS) can manage many different types of
databases. Databases are commonly classified by the number of users, the database
location(s), and the expected type and extent of use. The number of
users determines whether the database is a single-user or multiuser database.
A single-user database supports only one user at a time. In other words,
if user A is using the database, users B and C must wait until he or she
is done. A single-user database that runs on
a personal computer is called a desktop database (Chen et al., 2021).
A multiuser database, in contrast, supports multiple
users at the same time. A workgroup database is a multiuser database
that supports a relatively small number of users (usually fewer than 50) or a specific
department within an organization. A database used by the
entire organization and supporting a large number of users (usually hundreds)
across many departments is known as an enterprise database (Johnson, 2021;
Kodera et al., 2019). Databases
can also be classified by location. A centralized database, for
instance, supports data located at a single site. A distributed database, in
contrast, supports data distributed across several sites. Distributed database
design determines the extent to which a database can be distributed and how that
distribution is managed (Gavazzi et al., 2003).
Probably the most popular way of classifying databases
today is based on how they will be used and on the time-sensitivity
of the information gathered from them. For example, transactions such as product and
service sales, payments, and supply purchases reflect critical day-to-
day operations. Such transactions must be recorded accurately and immediately.
A database that is designed primarily to support a company's day-
to-day operations is classified as an operational database (sometimes
referred to as a transactional or production
database) (Urvoy et al., 2012; Beets et al., 2015). A data warehouse, on the
other hand, focuses primarily on storing data used to generate
information required to make tactical or strategic decisions. To make
pricing decisions, sales forecasts, market positioning strategies, and other decisions,
the data usually require extensive "massaging" (data manipulation) (Joshi
& Darby, 2013). Most decision-support data are derived from operational databases and
kept in data warehouses over time. Furthermore, the data warehouse can store
data derived from many sources. To make it simpler
to retrieve such data, the data warehouse structure is quite different
from that of an operational or transactional database. The design, implementation,
and use of data warehouses are covered under the umbrella of data warehousing and
business intelligence (Cole et al., 2007).
Databases can also be classified by the degree to which the data are
structured. Unstructured data are data that exist in their original (raw)
state, that is, in the format in which they were collected. Therefore,
unstructured data exist in a format that does not lend itself readily to the
processing that yields information.
Structured data are the result of taking unstructured data and
formatting (structuring) them to facilitate storage, use, and the generation
of information (Rosselló-Móra et al., 2017). You apply structure (format) based
on the type of processing you intend to perform. Some
data might be unstructured for one type of processing,
yet structured for another. The data element
37890, for example, might be a ZIP code, a sales value, or a product code.
If this value represents a
ZIP code or a product code and is stored as text, you cannot perform
mathematical computations with it. If this value represents a
sales transaction, however, it must be formatted as numeric (Yoon
et al., 2017).
To better illustrate the structure concept, consider stacks of freshly
printed invoices. If you merely want to keep the invoices as images for
retrieval and display, you can scan and save them in a graphic format.
On the other hand, graphic storage will not help if you want
to generate information such as quarterly totals or average invoice amounts. Instead, you
could store the invoice data in a (structured) spreadsheet file and perform
the required computations there. Most data you encounter are best classified
as data that can be structured (Kim et al., 2012).
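The invoice example can be sketched in a few lines of Python. Once the invoice data are stored in a structured (tabular) form rather than as scanned images, totals and averages become simple computations; the field names and amounts below are invented for illustration.

```python
# Hypothetical structured invoice records (a spreadsheet-like table).
invoices = [
    {"quarter": 1, "amount": 500.0},
    {"quarter": 1, "amount": 300.0},
    {"quarter": 2, "amount": 450.0},
]

# Quarterly totals: trivial with structured data, impossible with images.
quarterly_totals = {}
for inv in invoices:
    q = inv["quarter"]
    quarterly_totals[q] = quarterly_totals.get(q, 0.0) + inv["amount"]

# Average invoice amount across all records.
average_invoice = sum(inv["amount"] for inv in invoices) / len(invoices)

print(quarterly_totals)  # {1: 800.0, 2: 450.0}
```

Had the same invoices been stored only as scanned images, neither computation would be possible without first extracting and structuring the data.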
Semi-structured data are data that have already been processed to some
extent. When you look at a typical Web page, for instance, the
content is presented to you in a predetermined format to convey
some information. The database types discussed so far all focus on the storage
and management of highly structured data. Companies, however, are not
limited to the use of structured data. They also use semi-structured and
unstructured data. Just think of the valuable information contained in
company e-mails, memoranda, procedure and policy documents, Web
pages, and so on. A new generation of databases known as XML databases has
emerged to handle the storage and management of unstructured
and semi-structured data. Extensible Markup Language (XML)
is a text-based markup language for representing and manipulating data
elements. Semi-structured XML data can be stored and managed in an
XML database (El Haddad et al., 2017) (Table 1.1).
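As a small, hedged illustration of semi-structured XML data, the snippet below uses Python's standard `xml.etree.ElementTree` module to extract data elements from an XML fragment; the tags and values are invented, not drawn from any particular XML database product.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical XML document: semi-structured company memos.
doc = """
<memos>
  <memo author="Hahn"><subject>Renewals</subject></memo>
  <memo author="Okon"><subject>Payroll</subject></memo>
</memos>
"""

root = ET.fromstring(doc)

# The markup lets us address individual data elements by name,
# even though the memos themselves are only loosely structured.
subjects = {m.get("author"): m.findtext("subject")
            for m in root.findall("memo")}

print(subjects)  # {'Hahn': 'Renewals', 'Okon': 'Payroll'}
```

An XML database applies the same idea at scale, storing and querying such tagged documents directly.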

Table 1.1. Comparison of the Characteristics of Several Well-Known Database
Management Systems

1.5. IMPORTANCE OF DATABASE DESIGN


Database design refers to the activities that focus on the design of the
database structure that will be used to store and manage end-user data.
A database that meets all user requirements does not just happen; its
structure must be carefully designed. Because database design is such a crucial
aspect of working with databases, much of this textbook
is dedicated to the development of good database design techniques. Even
a good DBMS will perform poorly with a badly designed database
(Bell & Badanjak, 2019; Herodotou, 2016). To design a good database,
the designer must first identify what the database will be used for. The
need for accurate and consistent data, as well as operational speed,
is emphasized when designing a transactional database. The use of
historical and aggregated data is stressed when designing a data
warehouse database. A centralized, single-user database requires a different
design than a distributed, multiuser database. Most databases contain multiple
tables, and designing them involves a process of decomposing the relevant data
into its component pieces. Each piece of data should be
properly decomposed and stored in its own table (Bissett et al., 2016). In
addition, the relationships between those tables should be carefully studied
and implemented so that the integrated view of the data
can be re-created later as information for the end user.
A well-designed database facilitates data management and
generates accurate and valuable information (Kam & Matthewson, 2017).
A poorly designed database, on the other hand, is likely to become a breeding
ground for hard-to-trace errors, which may lead to poor decision making, and
poor decision making can ultimately lead to an organization's failure. Database
design is simply too important to be left to chance. That is why database design
is a popular subject among
college students, why companies of all kinds send employees to database
design workshops, and why database design consultants can earn
a lot of money (Tiirikka & Moilanen, 2015).

1.6. EVOLUTION OF FILE SYSTEM DATA


PROCESSING
To gain a better understanding of what a database is, what
it accomplishes, and how to use it properly, it helps to examine what a
database is not. A brief description of the evolution of file system data
processing helps in recognizing the data-management
limitations that databases attempt to overcome. Understanding
these limitations is relevant to database designers because
database technologies do not make those problems magically disappear (Smith &
Seltzer, 1997); rather, they make it easier to create solutions that avoid
them. Designing database systems that avoid the pitfalls of earlier systems
requires that the designer understand the problems of those earlier systems
and how to avoid them; otherwise, the database technologies will
be no better (possibly even worse!) than the technologies and
methods they have replaced (Maneas & Schroeder, 2018).

1.6.1. Manual File Systems


To remain successful, every organization must develop systems for handling
core business tasks. In the past, such systems were often manual,
paper-and-pencil systems. The papers in these systems were organized
to make the intended use of the data as easy as possible. This was
typically accomplished through a filing system of manila folders and
filing cabinets. The manual system served well as a data repository
as long as the data collection was relatively small and an organization's
business users had few reporting requirements. However, keeping track of the
data in a manual file system became increasingly difficult as organizations
grew and reporting requirements became more complex. As a result, companies
turned to computer technology for assistance (McKusick & Quinlan, 2009).

1.6.2. Computerized File Systems


Generating reports from manual file systems was slow and cumbersome.
Even when a well-designed manual system was used, some company managers
faced regulatory reporting requirements that demanded weeks of intensive
effort each quarter (Kakoulli & Herodotou, 2017). As a result, a data
processing (DP) specialist was hired to create a computer-based
system for tracking data and producing required reports.
Initially, the computer files within the file system were similar to the
manual files. Figure 1.3 shows a simple example of customer data files
for a small insurance company. (You will see shortly that, although the
file system shown in Figure 1.3 is typical of data storage, it is unsuitable
for a database) (Sivaraman & Manickachezian, 2014).
The description of computer files requires the use of specialized
vocabulary. Every discipline develops its own terminology to enable
its practitioners to communicate clearly. The basic file vocabulary shown
in Table 1.2 will help you to follow the rest of the
discussion (Heidemann & Popek, 1994).

Figure 1.3. The customer file’s contents.

Source: https://slideplayer.com/slide/16127191/.

Table 1.2. Terminology for Basic Files

Using the proper file terminology given in Table 1.2, you can identify
the file components shown in Figure 1.3. The CUSTOMER file shown in Figure 1.3
contains ten records. Each record is composed of nine fields: C_NAME,
C_PHONE, C_ADDRESS, C_ZIP, A_NAME, A_PHONE, TP, AMT, and REN.
The ten records are stored in a named file.
Because the file in Figure 1.3 contains customer data for the
insurance company, its filename is CUSTOMER (Pal & Memon, 2009).
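The field/record/file terminology of Table 1.2 maps naturally onto simple data structures. The sketch below illustrates that mapping in Python; the field names follow the C_/A_ prefix convention described for Figure 1.3, while the values are invented placeholders rather than data from the actual figure.

```python
# One record: a named set of fields describing a single customer.
customer_record = {
    "C_NAME": "Alfred A. Ramas",   # each key/value pair is a field
    "C_PHONE": "615-844-2573",
    "A_NAME": "Leah F. Hahn",      # the customer's agent
    "A_PHONE": "615-882-1244",
}

# The file: a named collection of related records.
CUSTOMER = [customer_record]

print(len(CUSTOMER))              # number of records in the file
print(CUSTOMER[0]["A_NAME"])      # one field of one record
```

A real file system would persist these records on disk, but the vocabulary, fields grouped into records, records grouped into a named file, is the same.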
Business users submitted requests to the DP specialist for information
from the computer files. For each request, the DP specialist had to create
programs to retrieve the data from the file, manipulate it as the user
required, and print it out. When a user wanted a previously run report again,
the DP specialist could rerun the program and deliver the results. Soon other
business users wanted to examine their data in the same creative ways the
customer data were being presented (Blomer et al., 2015; Zhang et al., 2016).
This resulted in more requests for the DP specialist to create computer files
of other company data, which in turn required more data management programs
and more requests for reports. The insurance company's sales department, for
instance, created a SALES file to track daily sales efforts. The effectiveness
of the computerized files was so obvious that the personnel department wanted
the DP specialist to automate payroll processing and other
personnel functions. Consequently, the DP specialist was asked to create
the AGENT file shown in Figure 1.4. The data in the AGENT file were
used for a variety of purposes, including writing checks, keeping track
of taxes paid, and summarizing insurance coverage (Adde et al., 2015).

Figure 1.4. The agent file’s contents.

Source: https://slideplayer.com/slide/6206909/.
As more and more computerized files were created, the drawbacks of
this type of file system became apparent. While those problems are examined
in detail in the next section, in short, they centered on
having many file systems containing related, often overlapping,
data and no means of controlling or managing the data consistently across
all of the files. As illustrated in Figure 1.5, each file in the system used
its own application programs to store, retrieve, and modify data. And each
file was owned by the individual or department that had commissioned its
creation (Merceedi & Sabry, 2021).

Figure 1.5. A straightforward file structure.

Source: https://slideplayer.com/slide/8951940/.

The advent of computer files to store company data was
significant; it not only marked a turning point in the use of computer
technology, it also represented a substantial improvement in a company's
ability to process data. Previously, users had direct, hands-on access to
all of the business data, but they lacked the
tools to convert those data into the information they needed (Saurabh &
Parikh, 2021). The creation of computerized file systems gave them
improved tools for manipulating company data, allowing them to create new
information. Unfortunately, it also introduced a schism between
end users and their data. The desire to close the gap between end
users and their data prompted the development of a wide range of computer
technologies and system designs, as well as the use (and misuse) of many
technologies and techniques. However, those developments also created a split
in how data were perceived by DP specialists and end users (Ramesh et al.,
2016).
• From the DP specialist's perspective, the computer files within
the file system were created to be similar to the manual
files. Data management programs were created to add to, update, and
delete records in those files.
• From the end user's perspective, the available
technologies separated the people from the data. As the users' competitive
environment pushed them to make more and more decisions
in less and less time, the delay
between when the users conceived of a new way to create
information from the data and when the DP specialist could
create the programs to generate that information became
a source of great frustration.

1.6.3. File System Redux


The users' desire for direct, hands-on access to
data helped drive the adoption of personal computers for
business use. Though not directly related to file system evolution,
the widespread use of personal productivity applications introduced the same
problems as the older file systems (Johnson & Laing, 1996).
Personal computer spreadsheet programs such as Microsoft
Excel are widely used by business users because they allow users to enter
data in a series of rows and columns and manipulate it with a variety of
functions. The widespread use of spreadsheet applications has enabled
users to perform sophisticated data analysis, greatly enhancing their
capacity to understand data and make better decisions. However,
users have become so adept at working with spreadsheets that they tend
to use them for tasks for which spreadsheets are not appropriate, as the old
saying goes, "When all you have is a hammer, every problem looks like a nail"
(Prabhakaran et al., 2005).
One of the most common ways in which spreadsheets are misused is as a
substitute for a database. Users, for example, often store the limited data
to which they have direct access in a spreadsheet, structured
much like the traditional, manual data storage systems, which is precisely
what the early DP specialists did when creating computer files.
Due to the large number of users working with spreadsheets, each
creating independent copies of the data, the resulting
"file systems" of spreadsheets suffer from the same
problems as the file systems created by the early DP specialists,
which are discussed in the next section (Oldfield & Kotz, 2001).

1.7. PROBLEMS WITH FILE SYSTEM DATA


PROCESSING
The file system method of organizing and storing data was a significant
improvement over the manual process, and it served a useful role in data
processing for more than two decades, a long time in
the computer era. Nonetheless, many problems and limitations became
evident in this approach. A critique of the file system method serves
two major purposes (Ghemawat et al., 2003):
• Understanding the shortcomings of the file system helps you understand
the development of modern databases.
• Many of these problems are not unique to file systems.
Although database technology makes it easy to avoid
some of these problems, failing to understand them is likely
to lead to their duplication in a database environment.
The problems with file system data processing, whether created by DP
specialists or through a series of spreadsheets, severely challenge the types
of information that can be created from the data, as well as the accuracy of
the information (Ovsiannikov et al., 2013):
• Lengthy Development Times: The first and most glaring
problem with the file system approach is that even the simplest
data-retrieval task requires extensive programming.
With the older file systems, programmers had to specify what
should be done and how it was to be done.
As you will learn in upcoming chapters, modern databases
use a nonprocedural data manipulation language that
allows the user to specify what must be done without
specifying how it is to be done.
Getting quick answers is also difficult. Because of the need to write
programs to produce even the simplest reports, ad hoc queries are impossible.
DP specialists who work with mature file systems are often swamped with
requests for new reports. They are often forced to say that the report
will be ready "next week" or even "next month." If you need the information
now, getting it next week or next month will not serve your
purposes (McKusick et al., 1984).
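The procedural/nonprocedural contrast described above can be sketched side by side. In the file-system style the programmer spells out how the answer is computed; in the DBMS style the user states only what is wanted. The record values and table name below are invented for illustration, with SQLite standing in for the DBMS.

```python
import sqlite3

# Hypothetical raw records: (customer name, outstanding amount).
records = [("Brown", 45.95), ("Smith", 12.50), ("Brown", 30.00)]

# Procedural (file-system) style: spell out HOW,
# open, loop, test, accumulate.
total = 0.0
for name, amount in records:
    if name == "Brown":
        total += amount

# Nonprocedural (DBMS) style: state WHAT is wanted;
# the DBMS decides how to retrieve and aggregate the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, amount REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", records)
(declarative_total,) = conn.execute(
    "SELECT SUM(amount) FROM t WHERE name = 'Brown'").fetchone()

# Both approaches produce the same answer.
assert abs(total - declarative_total) < 1e-9
```

The single SQL statement replaces the hand-written loop, which is the essence of the nonprocedural data manipulation language mentioned earlier.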
• Complex System Administration: System administration
becomes more difficult as the number of files in the
system expands. Even a simple file system with a few files
requires the creation and maintenance of several file
management programs (each file requires
its own file management programs that allow the user to add,
modify, and delete records; to list the file contents; and to
generate reports). Because ad hoc queries are not possible,
the file reporting programs can multiply quickly. The problem
is compounded by the fact that each department in the organization "owns"
its data by creating its own files.
• Lack of Security and Limited Data Sharing: Another
fault of a file system data repository
is its lack of security and its limited data-sharing capabilities.
Data security and data sharing are closely
related. Sharing data among multiple geographically dispersed
users introduces a lot of security risks. Most
spreadsheet programs provide rudimentary security options, but they
are not always used, and even when they are, they are insufficient
for robust data sharing among users. Because security and
data-sharing capabilities are difficult to program, they are often
omitted when data management and reporting
programs are developed in a file system environment. Examples of such
capabilities include password protection, the ability to lock out parts
of files or entire files, and other measures designed to safeguard data
confidentiality. Furthermore, even when attempts are made to enhance data
and information security, the security measures tend to be limited in scope
and effectiveness.
• Extensive Programming: Making changes to an existing file structure
can be difficult in a file system environment.
For example, changing just one field in the original CUSTOMER file
would require a program that (Ergüzen & Ünver, 2018):
• Reads a record from the original file.
• Transforms the data to fit the structural requirements of the
new file.
• Writes the transformed data to the new file
structure.
• Repeats the preceding steps for each record in the original
file.
Each change to a file structure, no matter how
minor, requires changes in all of the programs that rely on the data in
that file. Errors (bugs) are likely to occur as a result of
these changes, and additional time is required to use a debugging process to
find them. Those limitations, in turn, lead to structural and data
dependence problems (Alange & Mathur, 2019).
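The four conversion steps listed above can be sketched as a short program. In-memory dictionaries stand in for file records here, and the field names (and the new date-of-birth column) are hypothetical placeholders, not the actual structure of Figure 1.3.

```python
# Hypothetical "old file" records before the structure change.
old_file = [
    {"C_NAME": "Alfred A. Ramas", "C_PHONE": "615-844-2573"},
    {"C_NAME": "Leona K. Dunne", "C_PHONE": "713-894-1238"},
]

new_file = []
for record in old_file:          # step 4: repeat for every record
    rec = dict(record)           # step 1: read a record from the old file
    rec["C_DOB"] = None          # step 2: transform the data to fit the
                                 #         new structure (added DOB field)
    new_file.append(rec)         # step 3: write it to the new file structure

print(new_file[0])
```

Every program that reads this file must now be updated to expect the extra field, which is exactly the structural dependence problem discussed next.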

1.7.1. Structural and Data Dependence


Structural dependence in a file system means that access
to a file is dependent on its structure. Adding
a customer date-of-birth field to the CUSTOMER file shown in Figure 1.3, for
instance, would require the four steps described in the previous section.
Given this change, none of the previous programs will work with the new
CUSTOMER file structure (Tatebe et al., 2010; Shafer et al., 2010). Therefore,
all of the file system programs must be
modified to conform to the new file structure. In short, file system
programs exhibit structural dependence because they are affected by
changes in the file structure. Structural independence, in contrast,
exists when changes in the file structure can be made without affecting
the application program's ability to access the data (Eck & Schaefer, 2011).
Changes in data characteristics, such as changing an integer
field to a decimal field, also require changes in all of the programs that
access the file. Because all data access programs are subject to change
when any of the file's data storage characteristics change (that is, when
the data type is altered), the file system is said to exhibit data dependence.
Data independence, in contrast, exists when changes in data storage
characteristics can be made without affecting the program's ability to access
the data (Veeraiah & Rao, 2020).
The practical significance of data dependence lies in the difference between
the logical data format (how a human views the data) and the physical data
format (how the computer must work with the data). Any program
that accesses a file system's file must tell the
computer not only what to do, but also how to do it. Consequently, each
program must contain lines that specify the opening of a specific file type,
as well as its record and field descriptions. From the perspective of
programmers and database managers, data dependence makes a file system
extremely cumbersome (Jin et al., 2012).

1.7.2. Data Redundancy


The file system's structure makes it difficult to combine data from
multiple sources. Additionally, the file system's lack of security renders it
vulnerable to security breaches. The organizational structure promotes
the storage of the same basic data in different locations. (Database
professionals refer to such scattered data locations as "islands of
information") (Mahmoud et al., 2018).
The use of spreadsheets to store data adds to the dispersal of
data. Through the data management and reporting programs created by the
DP specialists, the entire sales department would share access to the SALES
data file in the file system. With spreadsheets, each member of the sales
department can create his or her own copy of the sales data (Magoutis et al.,
2002). Because data stored in different locations will probably not be
updated consistently, the islands of information often contain different
versions of the same data. In Figures 1.3 and 1.4, for example, the agent
names and phone numbers appear in both the CUSTOMER and the AGENT files. Only
one correct copy of the agents' names and phone numbers is needed. Having
them in multiple locations creates redundancy. Data redundancy exists when
the same data are stored unnecessarily in different places (Lin et al., 2012).
Uncontrolled data redundancy sets the stage for (McKusick &
Quinlan, 2009):
• Poor Data Security: Having multiple copies of the data
increases the chances that at least one copy will be
susceptible to unauthorized access. The
issues and techniques associated with data security are explored in
Chapter 15, Data Management and Security.
• Data Inconsistency: Data inconsistency exists when
different and conflicting versions of the same data
appear in different places. For instance, suppose you change an agent's
phone number or address in the AGENT file.
If you fail to make the corresponding changes in the CUSTOMER
file, the files will contain different data for the same agent.
Reports will yield inconsistent results, depending on which version of the
data is used. Data entry errors are much more likely to occur when complex
entries (such as 10-digit phone numbers) are made in several
different files and/or are repeated frequently in one or more files. In
fact, the third record in the CUSTOMER file shown in Figure 1.3 contains a
transposed digit in the agent's phone number (615-882-2144 rather than
615-882-1244) (Kim et al., 2021).
It is possible to enter a nonexistent sales agent's name and phone number
into the CUSTOMER file, but customers are not likely to be impressed if the
insurance company supplies the name and phone number of a nonexistent agent.
And should the personnel department pay a nonexistent agent
commissions and benefits? In fact, a data entry error, such as an incorrectly
spelled name or an incorrect phone number, yields the same kind of data
integrity problem (Menon et al., 2003).
•	Data Anomalies: The dictionary defines an anomaly as “an
	irregularity.” Ideally, a change in an attribute value is made
	in only one place. Redundancy, however, fosters an abnormal
	condition by forcing the same field value to be changed in many
	different places. Look at the CUSTOMER file in Figure 1.3.
	If agent Leah F. Hahn marries and moves, her name, address,
	and phone number are all likely to change. Rather than making
	that change once in a single file (AGENT), you must make the
	change each time the agent's name, phone number, and address
	occur in the CUSTOMER file. You could be faced with hundreds
	of corrections, one for each of the customers served by that
	agent! The same problem arises whenever an agent leaves: a new
	agent must be assigned to each customer handled by that agent.
	To maintain data integrity, each change in an attribute value
	must be made correctly in several
	locations. Whenever the required changes to the redundant data
	are not made successfully, a data anomaly occurs. The following
	definitions describe the most common data anomalies, as seen in
	Figure 1.3 (Welch et al., 2008):
•	Update Anomalies: If agent Leah F. Hahn gets a new phone
	number, that number must be entered in every CUSTOMER file
	record in which Ms. Hahn's phone number appears. Only three
	changes are required in this case, but in a large file such a
	modification might affect hundreds or thousands of records.
	Clearly, the potential for data inconsistencies is great.
•	Insertion Anomalies: If only the CUSTOMER file existed, then to
	add a new agent you would have to create a dummy customer record
	to carry the new agent's data. Again, the potential for data
	inconsistencies is great.
•	Deletion Anomalies: If you delete the customers Amy B. O’Brian,
	George Williams, and Olette K. Smith, you also erase John T.
	Okon's agent data. Clearly, this is not desirable (Maneas &
	Schroeder, 2018).
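
The anomalies above disappear once the agent facts are stored only once. The following sketch uses Python's built-in sqlite3 module with hypothetical AGENT and CUSTOMER tables (the column names are invented for illustration): the phone correction becomes a single UPDATE, and every customer record sees the new value through a join.

```python
import sqlite3

# Hypothetical normalized schema: the agent's phone number lives in one
# row of agent, and customer rows refer to the agent by agent_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent (
    agent_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    phone    TEXT NOT NULL
);
CREATE TABLE customer (
    cust_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    agent_id INTEGER NOT NULL REFERENCES agent(agent_id)
);
""")

conn.execute("INSERT INTO agent VALUES (501, 'Leah F. Hahn', '615-882-1244')")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Amy B. O'Brian", 501),
                  (2, "George Williams", 501),
                  (3, "Olette K. Smith", 501)])

# The update anomaly is gone: one change in one place, and every
# customer row sees the new value through the join.
conn.execute("UPDATE agent SET phone = '615-882-2144' WHERE agent_id = 501")

rows = conn.execute("""
    SELECT c.name, a.phone
    FROM customer AS c JOIN agent AS a ON a.agent_id = c.agent_id
""").fetchall()
```

This is the essence of the normalization techniques developed in later chapters: each fact is stored exactly once, so it can only be updated once.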

1.7.3. Lack of Design and Data-Modeling Skills

A new problem has emerged with the use of personal productivity tools
such as spreadsheets and desktop databases: users generally lack proper
data-design skills. People naturally have an integrated view of the data
in their environment. Consider a student's class schedule, for instance
(Mukhopadhyay et al., 2014). The schedule is likely to contain the
student's identification number and name, the class code, the class
description, the class credit hours, the name of the instructor teaching
the class, the class meeting days and times, and the classroom number.
In the student's mind, these data items compose a single unit. If a
student organization wanted to keep track of all of its members'
schedules, an individual might create a spreadsheet to record the data.
Even if the student ventures into the realm of desktop databases, he or
she is likely to create a structure consisting of a single table that
closely resembles the layout of the schedule. As you will see in later
chapters, cramming this much information into a single flat table
structure is poor data design that yields extensive duplication for
numerous data items (Liao et al., 2010; Lu et al., 2017).
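
The cost of the flat layout can be made concrete with a small sketch (the class codes, instructor names, and rooms are invented). In the flat layout a single instructor correction must be repeated on every schedule row; after the class facts are split into their own records, the same correction touches exactly one place:

```python
# Flat, spreadsheet-style rows: the instructor's name is retyped on
# every meeting row for the class.
flat_rows = [
    {"class_code": "CS101",   "instructor": "Dr. Jones", "room": "KLR 209"},
    {"class_code": "CS101",   "instructor": "Dr. Jones", "room": "KLR 209"},
    {"class_code": "MATH243", "instructor": "Dr. Chen",  "room": "AAK 005"},
]

# The same facts split in two: each class fact is stored once, and the
# schedule rows refer to it by class code.
classes = {
    "CS101":   {"instructor": "Dr. Jones", "room": "KLR 209"},
    "MATH243": {"instructor": "Dr. Chen",  "room": "AAK 005"},
}
meetings = ["CS101", "CS101", "MATH243"]

def edits_needed_flat(rows, class_code):
    """Rows that must be edited to fix one instructor name in the flat layout."""
    return sum(1 for r in rows if r["class_code"] == class_code)

def edits_needed_split(catalog, class_code):
    """Records that must be edited after splitting: at most one."""
    return 1 if class_code in catalog else 0
```

With only three meetings the difference is small, but a club tracking hundreds of members' schedules would face hundreds of redundant edits per correction in the flat layout.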

Data-modeling skills are also an essential part of the design process.
It is critical to document the design that is produced. Design
documentation is needed to facilitate communication among the database
designer, the end user, and the programmer. Data modeling, discussed
later in this work, is the most common method of documenting database
designs. Using a consistent data-modeling technique ensures that the
data model fulfills its role of facilitating communication among the
designer, user, and developer (Wang et al., 2014). The data model also
becomes a valuable resource when the database must be maintained or
upgraded as business requirements change. Too often, final data designs
are not documented at all, much less documented with a consistent
data-modeling technique. On a more positive note, by reading this book
you are undergoing the sort of training required to develop the database
design and data-modeling skills needed to create databases that ensure
consistency of the data, enforce integrity, and provide a stable and
flexible platform for delivering timely, accurate information to users
(Salunkhe et al., 2016).

1.8. DATABASE SYSTEMS


Because of the problems with file systems, the DBMS is a much better
alternative. Unlike the file system, with its many separate and
unrelated files, the DBMS consists of logically related data stored in a
single logical data repository. (The label “logical” reflects the fact
that the repository's contents may be physically distributed among
multiple data storage facilities and/or locations, even though the
database appears to the end user as a single unit) (Paton & Diaz, 1999).
	Because the database's data repository is a single logical unit, the
database represents a major change in the way end-user data are stored,
accessed, and managed. The DBMS, shown in Figure 1.6, provides numerous
advantages over the file system management shown in Figure 1.5 by
eliminating most of the file system's data inconsistency, data anomaly,
data dependence, and structural dependence problems. Better yet, current
DBMS software stores not only the data structures, but also the
relationships between those structures and the access paths to them, all
in a central location. All required access paths to those components are
likewise defined, stored, and managed by the current generation of
database systems (Güting, 1994).

Figure 1.6. Comparing and contrasting database and file management organizations.

Source: https://slideplayer.com/slide/9384902/.
Keep in mind that the DBMS is just one of several crucial components of
a database system. The DBMS is sometimes even described as the database
system's heart. However, just as a human being needs more than a heart
to function, a database system needs more than a DBMS to function. In
the next section, you will learn what a database system is, what its
components are, and how the DBMS fits into the data management picture
(DeWitt & Gray, 1992).

1.8.1. The Database System Environment


The term database system describes an organization of components that
define and regulate the collection, storage, management, and use of data
within a database environment. From a general management point of view,
the database system is composed of the five major parts shown in Figure
1.7: hardware, software, people, procedures, and data (Connoly et al.,
2002).

Now let us take a closer look at the five components shown in Figure 1.7
(Zaniolo et al., 1997):
•	Hardware: Hardware refers to all of the system's physical
	devices, such as computers (personal computers, workstations,
	servers, and supercomputers), storage devices, printers, network
	devices (hubs, switches, routers, fiber optics), and other
	devices (automated teller machines (ATMs), ID readers, and so
	on) (Bonnet et al., 2001).

Figure 1.7. The database system environment.

Source: https://www.slideshare.net/yhen06/database-system-environment-
ppt-14454678.
•	Software: Although the DBMS is the most readily identified piece
	of software, three types of software are needed to make the
	database system function fully: operating system software, DBMS
	software, and application programs and utilities.
•	Operating System Software: Manages all hardware components and
	makes it possible for all other software to run on the computers.
	Microsoft Windows, Linux, Mac OS, UNIX, and VMS are examples of
	operating systems.

•	DBMS Software: Manages the database within the database system.
	Microsoft's SQL Server, Oracle Corporation's Oracle, Sun's
	MySQL, and IBM's DB2 are examples of DBMS software (Abadi et
	al., 2013).
•	Application Programs and Utility Software: Used to access and
	manipulate data in the DBMS and to manage the computer
	environment in which data access and manipulation take place.
	Application programs are most commonly used to retrieve data
	from the database to generate reports, tabulations, and other
	information to facilitate decision making. Utilities are the
	software tools used to help manage the database system's
	computer components. For instance, all of the major DBMS vendors
	now provide graphical user interfaces (GUIs) to help create
	database structures, control database security, and monitor
	database performance (Özsu & Valduriez, 1996).
•	People: This component includes all users of the database
	system. On the basis of primary job functions, five types of
	users can be identified in a database system: system
	administrators, database administrators, database designers,
	system analysts and programmers, and end users. Each user type,
	described below, performs both unique and complementary
	functions.
•	System administrators oversee the database system's general
	operations.
•	Database administrators, also known as DBAs, manage the DBMS
	and ensure that the database is functioning properly. The DBA's
	role is important enough to warrant separate coverage under
	Database Administration and Security (Silberschatz et al., 1991).
•	Database designers design the database structure. They are, in
	effect, the database architects. If the database design is poor,
	even the best application programmers and the most dedicated
	DBAs cannot produce a useful database environment. Because
	organizations strive to optimize their data resources, the
	database designer's job description has expanded to cover new
	dimensions and duties.
•	System analysts and programmers design and implement the
	application programs. They design and create the data-entry
	screens, reports, and procedures through which end users access
	and manipulate the database's data (Özsu, & Valduriez, 1999).

•	End users are the people who use the application programs to
	run the organization's daily operations. For instance,
	salespeople, supervisors, managers, and executives are all
	classified as end users. High-level end users employ the
	information obtained from the database to make tactical and
	strategic business decisions.
•	Procedures: Procedures are the instructions and rules that
	govern the design and use of the database system.
	Procedures are a critical, although occasionally forgotten,
component of the system. Procedures play an important role in a company
because they enforce the standards by which business is conducted within
the organization and with customers. Procedures are also used to ensure
that there is an organized way to monitor and audit both the data that
enter the database and the information that is generated through the use
of those data (Tamura & Yokoya, 1984).
•	Data: The word “data” covers the collection of facts stored in
	the database. Because data are the raw material from which
	information is generated, determining which data to enter into
	the database and how to organize those data is a vital part of
	the database designer's job.
A database system adds a new dimension to an organization's management
structure. The complexity of this managerial structure depends on the
organization's size, its functions, and its corporate culture. Therefore,
database systems can be created and managed at different levels of
complexity and with varying adherence to precise standards. Compare, for
example, a local movie rental system with a state pension claims system
(Kießling, 2002). The movie rental system may be managed by two people,
the hardware is probably a single PC, the procedures are likely to be
simple, and the data volume tends to be low. The pension claims system
is likely to have at least one systems administrator, several full-time
DBAs, and many designers and programmers; the hardware is probably
distributed across many sites; the procedures are likely to be numerous,
complex, and rigorous; and the data volume tends to be high (Abadi et
al., 2009).
In addition to the different levels of database system complexity,
managers must also take another important factor into account: database
systems must be cost-effective as well as tactically and strategically
effective. Creating a million-dollar solution for a thousand-dollar
problem is hardly a good example of database system selection, database
design, or database administration. Finally, the database technology
already in use is likely to affect the selection of a database system
(Bitton et al., 1983).

1.8.2. DBMS Functions


A DBMS performs several important functions that guarantee the integrity
and consistency of the data in the database. Most of those functions are
transparent to end users, and most can be achieved only through the use
of a DBMS (Florescu & Kossmann, 2009). They include data dictionary
management, data storage management, data transformation and
presentation, security management, multiuser access control, backup and
recovery management, data integrity management, database access
languages and application programming interfaces, and database
communication interfaces. Each of these functions is explained in the
sections that follow (Kifer et al., 2006).
•	Data Dictionary Management: The DBMS uses a data dictionary to
	store definitions of the data elements and their relationships
	(metadata). Consequently, all programs that access the data in
	the database work through the DBMS. The DBMS uses the data
	dictionary to look up the required data element structures and
	relationships, thus relieving you from having to code such
	complex relationships in each program. Additionally, any changes
	made to the database structure are automatically recorded in the
	data dictionary, thereby freeing you from having to modify all
	of the programs that access the changed structure. In other
	words, the DBMS provides data abstraction, and it removes
	structural and data dependence from the system. Figure 1.8 shows
	the data definition for a customer ID in Microsoft SQL Server
	Express (Rabitti et al., 1991).
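
As a small illustration of a queryable data dictionary, the sketch below reads SQLite's own catalog through Python's sqlite3 module instead of hard-coding the table layout in the program; full-scale DBMSs expose richer catalogs (for example, INFORMATION_SCHEMA views) for the same purpose. The table and column names are invented:

```python
import sqlite3

# SQLite records every table's definition in its catalog table
# (sqlite_master) and describes columns via PRAGMA table_info.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Ask the engine, not the source code, what structures exist.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

A program that discovers structure this way does not break when a column is added, which is exactly the structural independence described above.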
•	Data Storage Management: The DBMS creates and manages the
	complex structures required for data storage, thus relieving
	you of the difficult task of defining and programming the
	physical data characteristics.

Figure 1.8. Using Microsoft SQL Server Express to visualize information.

Source: https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-
graph-architecture?view=sql-server-ver16.
A contemporary DBMS stores not only the data, but also related data-entry
forms and screen definitions, report definitions, data validation rules,
procedural code, and structures to handle video and picture formats,
among other things. Data storage management is also important for
database performance tuning (Jagadish et al., 2007).
	Performance tuning refers to the activities that improve database
performance in terms of both storage and access speed. Although the
database appears to the user as a single data storage unit, the DBMS
actually stores the data in multiple physical data files (see Figure
1.9). Such data files may even be stored on different storage media.
Therefore, the DBMS does not have to wait for one disk request to finish
before moving on to the next. In other words, the DBMS can fulfill
multiple database requests concurrently. Data storage management and
performance tuning are discussed in Chapter 11, Database Performance
Tuning and Query Optimization (Chaudhuri & Narasayya, 2007).
•	Data Transformation and Presentation: The DBMS transforms
	entered data to conform to the required data structures. You do
	not have to worry about distinguishing between the logical data
	format and the physical data format, because the DBMS handles
	that for you. That is, the DBMS formats the physically retrieved
	data to conform to the user's logical expectations. Consider an
	enterprise database used by a multinational corporation. An end
	user in England might enter data such as July 11, 2010, as
	“11/07/2010.” In the United States, the same date would be
	entered as “07/11/2010.” Regardless of the presentation format,
	the DBMS must manage the date in the proper format for each
	country (Thomson & Abadi, 2010).
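
The date example can be sketched in a few lines: one stored value, two locale-specific presentations, a simplified stand-in for what the DBMS or application layer does when formatting physical data for each user:

```python
from datetime import date

# One physical value stored by the engine...
stored = date(2010, 7, 11)

# ...presented differently for each locale's logical expectations.
uk_view = stored.strftime("%d/%m/%Y")  # day/month/year, as in England
us_view = stored.strftime("%m/%d/%Y")  # month/day/year, as in the U.S.
```

Because the value itself is stored only once, both users see the same date; only its presentation differs.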
•	Security Management: The DBMS creates a security system that
	enforces user security and data privacy. Security rules
	determine which users can access the database, which data items
	each user can access, and which data operations (read, add,
	delete, or modify) the user can perform. This is especially
	important in multiuser database systems. Database Administration
	and Security examines data security and privacy issues in
	greater detail.

Figure 1.9. Using Oracle to demonstrate data storage management.

Source: https://docs.oracle.com/cd/E11882_01/server.112/e10897/storage.htm.
All users must log on to the database system with a username and password
or with biometric authentication, such as a fingerprint scan. The DBMS
uses that information to assign access privileges to database components
such as queries and reports (Jarke & Koch, 1984).
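
SQLite has no GRANT/REVOKE, so the sketch below models a privilege table in plain Python rather than using a real DBMS feature; it only illustrates the rule that security management maps (user, object) pairs to permitted operations. The user names are invented:

```python
# Toy privilege table: which operations each user may perform on each
# database object. A real DBMS stores and enforces this internally.
privileges = {
    ("clerk", "customer"): {"read"},
    ("dba",   "customer"): {"read", "add", "delete", "modify"},
}

def allowed(user, obj, operation):
    """True if `user` may perform `operation` on database object `obj`."""
    return operation in privileges.get((user, obj), set())
```

Unknown users fall through to the empty privilege set, so access is denied by default, the usual design choice for a security subsystem.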

•	Multiuser Access Control: To provide data integrity and data
	consistency, the DBMS uses sophisticated algorithms to ensure
	that multiple users can access the database concurrently
	without compromising its integrity. The details of multiuser
	access control are covered under Transaction Management and
	Concurrency Control.
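
A minimal demonstration of this idea with sqlite3: two connections to the same temporary database file, where a conflicting write from the second connection is refused while the first holds the write lock, instead of being allowed to corrupt shared data:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

conn1 = sqlite3.connect(path, isolation_level=None)
conn1.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn1.execute("INSERT INTO account VALUES (1, 100)")

conn1.execute("BEGIN IMMEDIATE")  # conn1 takes the write lock
conn1.execute("UPDATE account SET balance = 90 WHERE id = 1")

# A second user tries to write concurrently; timeout=0 makes the
# refusal immediate instead of waiting for the lock.
conn2 = sqlite3.connect(path, isolation_level=None, timeout=0)
try:
    conn2.execute("UPDATE account SET balance = 0 WHERE id = 1")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

conn1.execute("COMMIT")  # lock released; conn2 could now retry
final = conn2.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

Client/server DBMSs use finer-grained schemes (row locks, multiversion concurrency), but the guarantee is the same: concurrent users never see or produce a corrupted state.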
•	Backup and Recovery Management: The DBMS provides backup and
	data recovery to ensure data safety and integrity. Most DBMS
	platforms provide utilities that allow the DBA to perform
	routine and special backup and restore procedures. Recovery
	management deals primarily with the recovery of the database
	after a failure, such as a power outage or a bad disk sector.
	This capability is critical to preserving the database's
	integrity (Kao & Garcia-Molina, 1994).
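
Python's sqlite3 module exposes SQLite's online-backup API (Connection.backup, available since Python 3.7), which makes for a compact backup-and-restore sketch: copy the live database, simulate data loss, then restore from the copy. The table contents are invented:

```python
import sqlite3

live = sqlite3.connect(":memory:", isolation_level=None)
live.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO customer VALUES (1, 'Amy B. O''Brian')")

# Routine backup: full copy of the live database into another database.
snapshot = sqlite3.connect(":memory:", isolation_level=None)
live.backup(snapshot)

# Simulated failure / data loss...
live.execute("DELETE FROM customer")

# ...and recovery: restore the live database from the backup copy.
snapshot.backup(live)

restored = live.execute("SELECT name FROM customer").fetchall()
```

Production systems add transaction logs so the database can be rolled forward past the last backup, but the backup/restore cycle above is the core of recovery management.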
•	Data Integrity Management: The DBMS promotes and enforces
	integrity rules, thus minimizing data redundancy and maximizing
	data consistency. The relationships stored in the data
	dictionary are used to enforce data integrity. Ensuring data
	integrity is especially important in transaction-oriented
	database systems (Abadi et al., 2006).
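
Integrity enforcement can be sketched with a foreign-key constraint: the rule is declared once in the schema, and the engine rejects any customer row that references a non-existent agent. Note that SQLite requires PRAGMA foreign_keys = ON, since enforcement is off by default for backward compatibility; the names below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (
    cust_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    agent_id INTEGER NOT NULL REFERENCES agent(agent_id)
);
""")
conn.execute("INSERT INTO agent VALUES (501, 'Alex Alby')")
conn.execute("INSERT INTO customer VALUES (1, 'Amy B. O''Brian', 501)")  # valid

try:
    # No agent 999 exists, so the engine refuses the row outright.
    conn.execute("INSERT INTO customer VALUES (2, 'Ghost', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The application never has to re-check the rule; it is impossible to store a customer who points at a non-existent agent.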
•	Database Access Languages and Application Programming
	Interfaces (APIs): The DBMS provides data access through a
	query language. A query language is a nonprocedural language,
	one that lets the user specify what must be done without having
	to specify how it is to be done. Structured Query Language (SQL)
	is the de facto query language and data access standard
	supported by the majority of DBMS vendors. The use of SQL is
	covered in the introduction to Structured Query Language (SQL)
	and in Chapter 8, Advanced SQL. The DBMS also provides APIs to
	procedural languages such as COBOL, C, Java, Visual Basic .NET,
	and C#. In addition, the DBMS provides administrative utilities
	that the DBA and database designer can use to create, implement,
	monitor, and maintain the database (Stefanidis et al., 2011).
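
The API side can be sketched with Python's sqlite3 module, a call-level interface to SQL: the program states what it wants (the SQL text, with ? placeholders for values) and the DBMS decides how to execute it. The table and names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO agent VALUES (?, ?)",
                 [(501, "Alex Alby"), (502, "Leah F. Hahn")])
conn.commit()

# Nonprocedural access: the SELECT says *what* is wanted; the engine
# chooses *how* (index lookup, scan, etc.). The ? placeholder passes the
# value through the API instead of pasting it into the SQL string.
name = conn.execute("SELECT name FROM agent WHERE agent_id = ?",
                    (502,)).fetchone()[0]
```

The same pattern, SQL text plus bound parameters through a call-level API, is what JDBC, ODBC, and the other language bindings mentioned above provide.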
•	Database Communication Interfaces: Current-generation DBMSs
	accept end-user requests via multiple, different network
	environments. For instance, the DBMS might provide access to
	the database via the Internet through the use of Web browsers
	such as Mozilla Firefox or Microsoft Internet Explorer. In this
	environment, communications can be accomplished in several ways:
•	End users can generate answers to queries by filling in screen
	forms through their preferred Web browser.
•	The DBMS can publish predefined reports on a Web site on a
	scheduled basis.
•	The DBMS can connect to third-party systems to distribute
	information via e-mail or other productivity applications
	(Ramakrishnan & Ullman, 1995).

REFERENCES
1. Abadi, D. J., Boncz, P. A., & Harizopoulos, S., (2009). Column-oriented
database systems. Proceedings of the VLDB Endowment, 2(2), 1664,
1665.
2. Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., & Madden, S.,
(2013). The design and implementation of modern column-oriented
database systems. Foundations and Trends® in Databases, 5(3), 197–
280.
3. Abadi, D., Madden, S., & Ferreira, M., (2006). Integrating compression
and execution in column-oriented database systems. In: Proceedings of
the 2006 ACM SIGMOD International Conference on Management of
Data (Vol. 1, pp. 671–682).
4. Adde, G., Chan, B., Duellmann, D., Espinal, X., Fiorot, A., Iven, J.,
& Sindrilaru, E. A., (2015). Latest evolution of EOS filesystem. In:
Journal of Physics: Conference Series (Vol. 608, No. 1, p. 012009).
IOP Publishing.
5. Alange, N., & Mathur, A., (2019). Small sized file storage problems in
Hadoop distributed file system. In: 2019 International Conference on
Smart Systems and Inventive Technology (ICSSIT) (Vol. 1, pp. 1202–
1206). IEEE.
6. Batini, C., & Scannapieco, M., (2016). Data and Information Quality
(Vol. 1, pp. 2–9). Cham, Switzerland: Springer International Publishing.
7. Becerik-Gerber, B., Siddiqui, M. K., Brilakis, I., El-Anwar, O., El-
Gohary, N., Mahfouz, T., & Kandil, A. A., (2014). Civil engineering
grand challenges: Opportunities for data sensing, information analysis,
and knowledge discovery. J. Comput. Civ. Eng., 28(4), 04014013.
8. Beets, G. L., Figueiredo, N. L., Habr-Gama, A., & Van De, V. C. J.
H., (2015). A new paradigm for rectal cancer: Organ preservation:
Introducing the international watch & wait database (IWWD).
European Journal of Surgical Oncology, 41(12), 1562–1564.
9. Bell, C., & Badanjak, S., (2019). Introducing PA-X: A new peace
agreement database and dataset. Journal of Peace Research, 56(3),
452–466.
10. Bissett, A., Fitzgerald, A., Meintjes, T., Mele, P. M., Reith, F., Dennis, P.
G., & Young, A., (2016). Introducing BASE: The biomes of Australian
soil environments soil microbial diversity database. Gigascience, 5(1),
s13742–016.
11. Bitton, D., DeWitt, D. J., & Turbyfill, C., (1983). Benchmarking
Database Systems-A Systematic Approach (Vol. 1, pp. 3–9). University
of Wisconsin-Madison Department of Computer Sciences.
12. Blomer, J., Buncic, P., Meusel, R., Ganis, G., Sfiligoi, I., & Thain, D.,
(2015). The evolution of global scale filesystems for scientific software
distribution. Computing in Science & Engineering, 17(6), 61–71.
13. Bonnet, P., Gehrke, J., & Seshadri, P., (2001). Towards sensor database
systems. In: International Conference on Mobile Data Management
(Vol. 1, pp. 3–14). Springer, Berlin, Heidelberg.
14. Broatch, J. E., Dietrich, S., & Goelman, D., (2019). Introducing data
science techniques by connecting database concepts and dplyr. Journal
of Statistics Education, 27(3), 147–153.
15. Changnon, S. A., & Kunkel, K. E., (1999). Rapidly expanding uses of
climate data and information in agriculture and water resources: Causes
and characteristics of new applications. Bulletin of the American
Meteorological Society, 80(5), 821–830.
16. Chaudhuri, S., & Narasayya, V., (2007). Self-tuning database systems:
A decade of progress. In: Proceedings of the 33rd International
Conference on Very Large Data Bases (Vol. 1, pp. 3–14).
17. Chen, J. M., Norman, J. B., & Nam, Y., (2021). Broadening the stimulus
set: Introducing the American multiracial faces database. Behavior
Research Methods, 53(1), 371–389.
18. Chen, M., Ebert, D., Hagen, H., Laramee, R. S., Van, L. R., Ma, K. L., &
Silver, D., (2008). Data, information, and knowledge in visualization.
IEEE Computer Graphics and Applications, 29(1), 12–19.
19. Cheng, J., Greiner, R., Kelly, J., Bell, D., & Liu, W., (2002). Learning
Bayesian networks from data: An information-theory based approach.
Artificial Intelligence, 137(1, 2), 43–90.
20. Cole, J. R., Chai, B., Farris, R. J., Wang, Q., Kulam-Syed-Mohideen, A.
S., McGarrell, D. M., & Tiedje, J. M., (2007). The ribosomal database
project (RDP-II): Introducing myRDP space and quality controlled
public data. Nucleic Acids Research, 35(suppl_1), D169–D172.
21. Connoly, T., Begg, C., & Strachan, A., (1996, 2002). Database Systems
(Vol. 1, pp. 2–8). Addison-Wesley.
22. Cox, A., Doell, R. R., & Dalrymple, G. B., (1964). Reversals of the
earth’s magnetic field: Recent paleomagnetic and geochronologic data
provide information on time and frequency of field reversals. Science,
144(3626), 1537–1543.
23. Degrandi, T. M., Barcellos, S. A., Costa, A. L., Garnero, A. D., Hass,
I., & Gunski, R. J., (2020). Introducing the bird chromosome database:
An overview of cytogenetic studies in birds. Cytogenetic and Genome
Research, 160(4), 199–205.
24. Destaillats, H., Maddalena, R. L., Singer, B. C., Hodgson, A. T., &
McKone, T. E., (2008). Indoor pollutants emitted by office equipment:
A review of reported data and information needs. Atmospheric
Environment, 42(7), 1371–1388.
25. DeWitt, D., & Gray, J., (1992). Parallel database systems: The future
of high performance database systems. Communications of the ACM,
35(6), 85–98.
26. Eck, O., & Schaefer, D., (2011). A semantic file system for integrated
product data management. Advanced Engineering Informatics, 25(2),
177–184.
27. El Haddad, K., Torre, I., Gilmartin, E., Çakmak, H., Dupont, S., Dutoit,
T., & Campbell, N., (2017). Introducing amuS: The amused speech
database. In: International Conference on Statistical Language and
Speech Processing (Vol. 1, pp. 229–240). Springer, Cham.
28. Ergüzen, A., & Ünver, M., (2018). Developing a file system structure
to solve healthy big data storage and archiving problems using a
distributed file system. Applied Sciences, 8(6), 913.
29. Florescu, D., & Kossmann, D., (2009). Rethinking cost and performance
of database systems. ACM SIGMOD Record, 38(1), 43–48.
30. Freilich, J. D., Chermak, S. M., Belli, R., Gruenewald, J., & Parkin,
W. S., (2014). Introducing the United States extremis crime database
(ECDB). Terrorism and Political Violence, 26(2), 372–384.
31. Gavazzi, G., Boselli, A., Donati, A., Franzetti, P., & Scodeggio, M.,
(2003). Introducing GOLDMine: A new galaxy database on the WEB.
Astronomy & Astrophysics, 400(2), 451–455.
32. Gerardi, F. F., Reichert, S., & Aragon, C. C., (2021). TuLeD (Tupían
lexical database): Introducing a database of a South American language
family. Language Resources and Evaluation, 55(4), 997–1015.
33. Ghemawat, S., Gobioff, H., & Leung, S. T., (2003). The google
file system. In: Proceedings of the Nineteenth ACM Symposium on
Operating Systems Principles (Vol. 1, pp. 29–43).
34. Grimm, E. C., Bradshaw, R. H., Brewer, S., Flantua, S., Giesecke,
T., Lézine, A. M., & Williams, Jr. J. W., (2013). Databases and Their
Application, 1, 3–7.
35. Güting, R. H., (1994). An introduction to spatial database systems. The
VLDB Journal, 3(4), 357–399.
36. Heidemann, J. S., & Popek, G. J., (1994). File-system development
with stackable layers. ACM Transactions on Computer Systems
(TOCS), 12(1), 58–89.
37. Herodotou, H., (2016). Towards a distributed multi-tier file system for
cluster computing. In: 2016 IEEE 32nd International Conference on
Data Engineering Workshops (ICDEW) (Vol. 1, pp. 131–134). IEEE.
38. Hübner, D. C., (2016). The ‘national decisions’ database (Dec. Nat):
Introducing a database on national courts’ interactions with European
law. European Union Politics, 17(2), 324–339.
39. Jagadish, H. V., Chapman, A., Elkiss, A., Jayapandian, M., Li, Y.,
Nandi, A., & Yu, C., (2007). Making database systems usable. In:
Proceedings of the 2007 ACM SIGMOD International Conference on
Management of Data (Vol. 1, pp. 13–24).
40. Jarke, M., & Koch, J., (1984). Query optimization in database systems.
ACM Computing Surveys (CsUR), 16(2), 111–152.
41. Jin, S., Yang, S., Zhu, X., & Yin, H., (2012). Design of a trusted file
system based on Hadoop. In: International Conference on Trustworthy
Computing and Services (Vol. 1, pp. 673–680). Springer, Berlin,
Heidelberg.
42. Johnson, J. E., & Laing, W. A., (1996). Overview of the spiralog file
system. Digital Technical Journal, 8, 5–14.
43. Johnson, J., (2021). Introducing the military mutinies and defections
database (MMDD), 1945–2017. Journal of Peace Research, 58(6),
1311–1319.
44. Joshi, M., & Darby, J., (2013). Introducing the peace accords matrix
(PAM): A database of comprehensive peace agreements and their
implementation, 1989–2007. Peacebuilding, 1(2), 256–274.
45. Jukic, N., Vrbsky, S., Nestorov, S., & Sharma, A., (2014). Database
Systems: Introduction to Databases and Data Warehouses (Vol. 1, p.
400). Pearson.
46. Kakoulli, E., & Herodotou, H., (2017). OctopusFS: A distributed file
system with tiered storage management. In: Proceedings of the 2017
ACM International Conference on Management of Data (Vol. 1, pp. 65–78).
47. Kam, C. L. H., & Matthewson, L., (2017). Introducing the infantbook
reading database (IBDb). Journal of Child Language, 44(6), 1289–
1308.
48. Kao, B., & Garcia-Molina, H., (1994). An overview of real-time
database systems. Real Time Computing, 261–282.
49. Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M., &
Pellegrini-Toole, A., (2000). The EcoCyc and MetaCyc databases.
Nucleic Acids Research, 28(1), 56–59.
50. Kießling, W., (2002). Foundations of preferences in database systems.
In: VLDB’02: Proceedings of the 28th International Conference on
Very Large Databases (Vol. 1, pp. 311–322). Morgan Kaufmann.
51. Kifer, M., Bernstein, A. J., & Lewis, P. M., (2006). Database Systems:
An Application-Oriented Approach (Vol. 1, pp. 4–8). Pearson/Addison-
Wesley.
52. Kim, J., Jang, I., Reda, W., Im, J., Canini, M., Kostić, D., & Witchel, E.,
(2021). LineFS: Efficient SmartNIC offload of a distributed file system
with pipeline parallelism. In: Proceedings of the ACM SIGOPS 28th
Symposium on Operating Systems Principles (Vol. 1, pp. 756–771).
53. Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., & Chun,
J., (2012). Introducing EzTaxon-e: A prokaryotic 16S rRNA gene
sequence database with phylotypes that represent uncultured species.
International Journal of Systematic and Evolutionary Microbiology,
62(Pt_3), 716–721.
54. Kodera, Y., Yoshida, K., Kumamaru, H., Kakeji, Y., Hiki, N., Etoh, T.,
& Konno, H., (2019). Introducing laparoscopic total gastrectomy for
gastric cancer in general practice: A retrospective cohort study based
on a nationwide registry database in Japan. Gastric Cancer, 22(1),
202–213.
55. Kovach, K. A., & Cathcart, Jr. C. E., (1999). Human resource
information systems (HRIS): Providing business with rapid data access,
information exchange and strategic advantage. Public Personnel
Management, 28(2), 275–282.
56. LaFree, G., & Dugan, L., (2007). Introducing the global terrorism
database. Terrorism and Political Violence, 19(2), 181–204.
57. Liao, H., Han, J., & Fang, J., (2010). Multi-dimensional index on
Hadoop distributed file system. In: 2010 IEEE Fifth International
Conference on Networking, Architecture, and Storage (Vol. 1, pp.
240–249). IEEE.
58. Lin, H. Y., Shen, S. T., Tzeng, W. G., & Lin, B. S. P., (2012). Toward
data confidentiality via integrating hybrid encryption schemes and
Hadoop distributed file system. In: 2012 IEEE 26th International
Conference on Advanced Information Networking and Applications
(Vol. 1, pp. 740–747). IEEE.
59. Lu, Y., Shu, J., Chen, Y., & Li, T., (2017). Octopus: An {RDMA-
enabled} distributed persistent memory file system. In: 2017 USENIX
Annual Technical Conference (USENIX ATC 17) (Vol. 1, pp. 773–785).
60. Madnick, S. E., Wang, R. Y., Lee, Y. W., & Zhu, H., (2009). Overview
and framework for data and information quality research. Journal of
Data and Information Quality (JDIQ), 1(1), 1–22.
61. Magoutis, K., Addetia, S., Fedorova, A., Seltzer, M. I., Chase, J. S.,
Gallatin, A. J., & Gabber, E., (2002). Structure and performance of the
direct access file system. Management, 21, 31.
62. Mahmoud, H., Hegazy, A., & Khafagy, M. H., (2018). An approach
for big data security based on Hadoop distributed file system. In:
2018 International Conference on Innovative Trends in Computer
Engineering (ITCE) (Vol. 1, pp. 109–114). IEEE.
63. Maneas, S., & Schroeder, B., (2018). The evolution of the Hadoop
distributed file system. In: 2018 32nd International Conference on
Advanced Information Networking and Applications Workshops
(WAINA) (Vol. 1, pp. 67–74). IEEE.
64. Martinikorena, I., Cabeza, R., Villanueva, A., & Porta, S., (2018).
Introducing i2head database. In: Proceedings of the 7th Workshop on
Pervasive Eye Tracking and Mobile Eye-Based Interaction (Vol. 1, pp.
1–7).
65. McKelvey, R. D., & Ordeshook, P. C., (1985). Elections with limited
information: A fulfilled expectations model using contemporaneous
poll and endorsement data as information sources. Journal of Economic
Theory, 36(1), 55–85.
66. McKusick, M. K., & Quinlan, S., (2009). GFS: Evolution on fast-
forward: A discussion between kirk McKusick and Sean Quinlan About
the origin and evolution of the google file system. Queue, 7(7), 10–20.
40 The Creation and Management of Database Systems

67. McKusick, M. K., Joy, W. N., Leffler, S. J., & Fabry, R. S., (1984). A
fast file system for UNIX. ACM Transactions on Computer Systems
(TOCS), 2(3), 181–197.
68. Menon, J., Pease, D. A., Rees, R., Duyanovich, L., & Hillsberg, B.,
(2003). IBM storage tank—A heterogeneous scalable SAN file system.
IBM Systems Journal, 42(2), 250–267.
69. Merceedi, K. J., & Sabry, N. A., (2021). A comprehensive survey for
Hadoop distributed file system. Asian Journal of Research in Computer
Science, 1, 4–7.
70. Mukhopadhyay, D., Agrawal, C., Maru, D., Yedale, P., & Gadekar, P.,
(2014). Addressing name node scalability issue in Hadoop distributed
file system using cache approach. In: 2014 International Conference
on Information Technology (Vol. 1, pp. 321–326). IEEE.
71. Oldfield, R., & Kotz, D., (2001). Armada: A parallel file system for
computational grids. In: Proceedings First IEEE/ACM International
Symposium on Cluster Computing and the Grid (Vol. 1, pp. 194–201).
IEEE.
72. Ovsiannikov, M., Rus, S., Reeves, D., Sutter, P., Rao, S., & Kelly,
J., (2013). The Quantcast file system. Proceedings of the VLDB
Endowment, 6(11), 1092–1101.
73. Özsu, M. T., & Valduriez, P., (1996). Distributed and parallel database
systems. ACM Computing Surveys (CSUR), 28(1), 125–128.
74. Özsu, M. T., & Valduriez, P., (1999). Principles of Distributed Database
Systems (Vol. 2, pp. 1–5). Englewood Cliffs: Prentice Hall.
75. Pal, A., & Memon, N., (2009). The evolution of file carving. IEEE
Signal Processing Magazine, 26(2), 59–71.
76. Paton, N. W., & Diaz, O., (1999). Active database systems. ACM
Computing Surveys (CSUR), 31(1), 63–103.
77. Prabhakaran, V., Arpaci-Dusseau, A. C., & Arpaci-Dusseau, R. H.,
(2005). Analysis and evolution of journaling file systems. In: USENIX
Annual Technical Conference, General Track (Vol. 194, pp. 196–215).
78. Rabitti, F., Bertino, E., Kim, W., & Woelk, D., (1991). A model of
authorization for next-generation database systems. ACM Transactions
on Database Systems (TODS), 16(1), 88–131.
79. Ramakrishnan, R., & Ullman, J. D., (1995). A survey of deductive
database systems. The Journal of Logic Programming, 23(2), 125–149.
Introduction to Database Systems 41

80. Ramesh, D., Patidar, N., Kumar, G., & Vunnam, T., (2016). Evolution and
analysis of distributed file systems in cloud storage: Analytical survey.
In: 2016 International Conference on Computing, Communication and
Automation (ICCCA) (Vol. 1, pp. 753–758). IEEE.
81. Rapps, S., & Weyuker, E. J., (1985). Selecting software test data using
data flow information. IEEE Transactions on Software Engineering,
(4), 367–375.
82. Rosselló-Móra, R., Trujillo, M. E., & Sutcliffe, I. C., (2017).
Introducing a digital protologue: A timely move towards a database-
driven systematics of archaea and bacteria. Antonie Van Leeuwenhoek,
110(4), 455–456.
83. Salunkhe, R., Kadam, A. D., Jayakumar, N., & Thakore, D., (2016).
In search of a scalable file system state-of-the-art file systems review
and map view of new scalable file system. In: 2016 International
Conference on Electrical, Electronics, and Optimization Techniques
(ICEEOT) (Vol. 1, pp. 364–371). IEEE.
84. Saurabh, A., & Parikh, S. M., (2021). Evolution of distributed file
system and Hadoop: A mathematical appraisal. Recent Advances in
Mathematical Research and Computer Science, 2, 105–112.
85. Schultze, U., & Avital, M., (2011). Designing interviews to generate rich
data for information systems research. Information and Organization,
21(1), 1–16.
86. Shafer, J., Rixner, S., & Cox, A. L., (2010). The Hadoop distributed
filesystem: Balancing portability and performance. In: 2010 IEEE
International Symposium on Performance Analysis of Systems &
Software (ISPASS) (Vol. 1, pp. 122–133). IEEE.
87. Sidorov, J., Shull, R., Tomcavage, J., Girolami, S., Lawton, N., &
Harris, R., (2002). Does diabetes disease management save money
and improve outcomes? A report of simultaneous short-term savings
and quality improvement associated with a health maintenance
organization–sponsored disease management program among patients
fulfilling health employer data and information set criteria. Diabetes
Care, 25(4), 684–689.
88. Sigmund, M., (2006). Introducing the database exam stress for speech
under stress. In: Proceedings of the 7th Nordic Signal Processing
Symposium-NORSIG 2006 (Vol. 1, pp. 290–293). IEEE.
42 The Creation and Management of Database Systems

89. Silberschatz, A., Stonebraker, M., & Ullman, J., (1991). Database
systems: Achievements and opportunities. Communications of the
ACM, 34(10), 110–120.
90. Sivaraman, E., & Manickachezian, R., (2014). High performance and
fault tolerant distributed file system for big data storage and processing
using Hadoop. In: 2014 International Conference on Intelligent
Computing Applications (Vol. 1, pp. 32–36). IEEE.
91. Smith, K. A., & Seltzer, M. I., (1997). File system aging—increasing
the relevance of file system benchmarks. In: Proceedings of the 1997
ACM SIGMETRICS International Conference on Measurement and
Modeling of Computer Systems (Vol. 1, pp. 203–213).
92. Stefanidis, K., Koutrika, G., & Pitoura, E., (2011). A survey on
representation, composition and application of preferences in database
systems. ACM Transactions on Database Systems (TODS), 36(3),
1–45.
93. Stowe, L. L., Ignatov, A. M., & Singh, R. R., (1997). Development,
validation, and potential enhancements to the second‐generation
operational aerosol product at the national environmental satellite,
data, and information service of the national oceanic and atmospheric
administration. Journal of Geophysical Research: Atmospheres,
102(D14), 16923–16934.
94. Tamura, H., & Yokoya, N., (1984). Image database systems: A survey.
Pattern Recognition, 17(1), 29–43.
95. Tatebe, O., Hiraga, K., & Soda, N., (2010). Gfarm grid file system.
New Generation Computing, 28(3), 257–275.
96. Thomson, A., & Abadi, D. J., (2010). The case for determinism in
database systems. Proceedings of the VLDB Endowment, 3(1, 2), 70–
80.
97. Thomson, G. H., (1996). The DIPPR® databases. International Journal
of Thermophysics, 17(1), 223–232.
98. Tiirikka, T., & Moilanen, J. S., (2015). Human chromosome Y and
haplogroups; introducing YDHS database. Clinical and Translational
Medicine, 4(1), 1–9.
99. Tolar, B., Joseph, L. A., Schroeder, M. N., Stroika, S., Ribot, E. M.,
Hise, K. B., & Gerner-Smidt, P., (2019). An overview of PulseNet USA
databases. Foodborne Pathogens and Disease, 16(7), 457–462.
Introduction to Database Systems 43

100. Urvoy, M., Barkowsky, M., Cousseau, R., Koudota, Y., Ricorde, V., Le
Callet, P., & Garcia, N., (2012). NAMA3DS1-COSPAD1: Subjective
video quality assessment database on coding conditions introducing
freely available high quality 3D stereoscopic sequences. In: 2012
Fourth International Workshop on Quality of Multimedia Experience
(Vol. 1, pp. 109–114). IEEE.
101. Valduriez, P., (1993). Parallel database systems: Open problems and
new issues. Distributed and Parallel Databases, 1(2), 137–165.
102. Van, R. C. J., (1977). A theoretical basis for the use of co‐occurrence
data in information retrieval. Journal of Documentation, 1, 3–9.
103. Veeraiah, D., & Rao, J. N., (2020). An efficient data duplication
system based on Hadoop distributed file system. In: 2020 International
Conference on Inventive Computation Technologies (ICICT) (Vol. 1,
pp. 197–200). IEEE.
104. Wang, L., Ma, Y., Zomaya, A. Y., Ranjan, R., & Chen, D., (2014).
A parallel file system with application-aware data layout policies
for massive remote sensing image processing in digital earth. IEEE
Transactions on Parallel and Distributed Systems, 26(6), 1497–1508.
105. Welch, B., Unangst, M., Abbasi, Z., Gibson, G. A., Mueller, B., Small,
J., & Zhou, B., (2008). Scalable performance of the panasas parallel
file system. In: FAST (Vol. 8, pp. 1–17).
106. Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H., &
Chun, J., (2017). Introducing EzBioCloud: A taxonomically united
database of 16S rRNA gene sequences and whole-genome assemblies.
International Journal of Systematic and Evolutionary Microbiology,
67(5), 1613.
107. Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R. T., Subrahmanian, V.
S., & Zicari, R., (1997). Advanced Database Systems (Vol. 1, pp. 4–9).
Morgan Kaufmann.
108. Zhang, J., Shu, J., & Lu, Y., (2016). {ParaFS}: A {log-structured} file
system to exploit the internal parallelism of flash devices. In: 2016
USENIX Annual Technical Conference (USENIX ATC 16) (Vol. 1, pp.
87–100).
109. Zhulin, I. B., (2015). Databases for microbiologists. Journal of
Bacteriology, 197(15), 2458–2467.
110. Zins, C., (2007). Conceptual approaches for defining data, information,
and knowledge. Journal of the American Society for Information
Science and Technology, 58(4), 479–493.
CHAPTER 2
DATA MODELS

CONTENTS
2.1. Introduction....................................................................................... 46
2.2. Importance of Data Models............................................................... 47
2.3. Data Model Basic Building Blocks..................................................... 48
2.4. Business Rules................................................................................... 50
2.5. The Evolution of Data Models............................................................ 54
References................................................................................................ 66
2.1. INTRODUCTION
When designing a database, the primary emphasis is on the data structures and how they will be used to store records for end users. The first step in building a database is known as data modeling: the process of creating a specific data representation for a particular problem domain. (A problem domain is a clearly defined region of the real world that is to be systematically addressed; it has well-defined scope and boundaries) (Mullahy, 1986). A data model is a relatively simple representation, often graphical, of more complex real-world data structures. Any model is, in its most fundamental sense, an abstraction of a more complicated real-world object or event. The primary purpose of a model is to improve one's ability to understand the complexities of the real-world environment. Within the context of a database system, a data model represents data structures together with their attributes, relationships, constraints, transformations, and other constructs that support a particular problem domain (Peckham & Maryanski, 1988).
The process of data modeling is iterative and progressive. You begin with a basic understanding of the problem domain, and as your knowledge of the problem grows, so does the level of detail in the data model you create. Done correctly, the resulting data model serves as a "blueprint" containing all the instructions needed to build a database that satisfies end-user requirements. This blueprint is both narrative and graphical: it contains not only text descriptions written in plain, straightforward language but also clear, useful diagrams that depict the main data elements (Brodie, 1984; Blundell et al., 2002).
In the past, database designers relied on good judgment to help them develop a sound data model. Unfortunately, good judgment is often in the eye of the beholder, and it usually develops only after much practice and many mistakes. If, for example, each student in a class were assigned to design a database for a video store, it is quite probable that each of them would come up with a different model for the business (Bond, 2002; Hellerstein & Mendelsohn, 1993; Bachman & Daya, 1977). Which one would be the correct choice? The straightforward answer is "the one that satisfies all the end-user requirements," and there may be more than one correct option! Fortunately, the potential for errors in database modeling has been greatly reduced because database administrators (DBAs) can make use of well-established data-modeling constructs and powerful data-modeling tools. In the following sections, you will discover how various degrees of data abstraction can ease data modeling, and how current data models can be used to represent real-world information (Elhorst, 2014).

2.2. IMPORTANCE OF DATA MODELS


Data models can facilitate communication among the designer, the applications programmer, and the end user. A well-developed data model can even foster a better understanding of the organization for which the database is being designed. In short, data models are a communication tool. One customer's reaction to this crucial aspect of data modeling was: "I created this company, I have worked with this business for a long time, and this is the first time I've really understood how all the pieces fit together" (Blundell & Bond, 1998).
The importance of data modeling cannot be overstated. Data are the most basic information units employed by a system. Applications are built to help manage data and to translate data into useful information. But different people view data in different ways. Compare a company manager's (data) view with that of a company clerk. Although both the manager and the clerk work for the same organization, the manager is much more likely than the clerk to take a company-wide view of the organization's data (Aubry et al., 2017).
Different managers also view data differently. A company director, for instance, is likely to take a broad view of the data, since he or she must be able to tie the firm's divisions into a single (database) view. A purchasing manager and the firm's inventory manager, in contrast, have a much narrower view of the data. In effect, each manager works with a subset of the company's data. The inventory manager is more concerned with stock levels, while the purchasing manager is more concerned with item costs and with close relationships with the suppliers of those items (Tekieh & Raahemi, 2015).
Application developers see data yet another way: they are concerned with data location, formatting, and specific reporting requirements. In essence, application developers translate business rules and procedures from many sources into appropriate interfaces, reports, and query screens. The old parable of the blind people and the elephant often applies to data users and producers: the blind person who touched the elephant's trunk had a different picture of the animal than the blind person who felt its leg or tail. What is needed is a complete view of the elephant. Likewise, a house is not a collection of random rooms; someone planning to build a house should first get the overall view provided by blueprints. Similarly, a sound data environment requires an overall database design based on an appropriate data model (Wooldridge, 2005).
It doesn’t issue if such an applications developer’s vision of a data
differs from those of the management and/or the final user whenever a
suitable database design is provided. Whenever a suitable database plan isn’t
accessible, though, issues are likely to arise. For example, a contradictory
commodity scheme between an inventory management software and also an
ordering key system might cost a significant amount of resources (Brown &
Lemmon, 2007).
Please remember that a home layout is only a representation; you never
live in one. Conversely, the database model is a representation from which
you cannot extract the needed data. You won’t be able to construct a nice
building without such a blueprint, and you won’t be able to establish a better
database before even designing a suitable database structure (Feeley &
Silman, 2010).

2.3. DATA MODEL BASIC BUILDING BLOCKS


Entities, attributes, relationships, and constraints are the basic building blocks of all data models. An entity is anything (a person, a place, a thing, or an event) about which data are to be collected and stored. An entity represents a particular type of object in the real world. Entities are "distinguishable"; that is, each occurrence of an entity is unique and distinct. For example, a CUSTOMER entity would have many distinguishable customer occurrences, such as John Smith, Pedro Dinamita, Tom Strickland, and so on (Kiviet, 1995).
Entities may be physical objects, such as customers or products, but they may also be abstractions, such as flight routes or musical events. An attribute is a characteristic of an entity. For example, a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone, customer address, and customer credit limit. Attributes are the equivalent of fields in file systems (Chang et al., 2017).
A relationship describes an association among two or more entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent. Data models use one-to-many, many-to-many, and one-to-one relationships. Database designers usually use the shorthand notations 1:M or 1‥*, M:N or *‥*, and 1:1 or 1‥1, respectively. (Although the M:N notation is the most common label for the many-to-many relationship, the label M:M may also be used.) The following examples illustrate the distinctions among the three (Sudan, 2005).
• One-to-many (1:M or 1…*) relationship. A painter creates many different paintings, but each painting is created by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Database designers therefore label the relationship "PAINTER paints PAINTING" as 1:M. (Note that entity names are often capitalized as a convention, so they stand out.) Similarly, a customer (the "one") may generate many invoices, but each invoice (the "many") is generated by a single customer. The "CUSTOMER generates INVOICE" relationship would likewise be labeled 1:M (Monakova et al., 2009).
• Many-to-many (M:N or *…*) relationship. An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. Similarly, a student can take many classes, and each class can be taken by many students, yielding the M:N label for the relationship expressed by "STUDENT takes CLASS."
• One-to-one (1:1 or 1…1) relationship. Under a retail company's current structure, each of its stores may be managed by a single employee. In turn, each store manager, who is also an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1. The preceding discussion identified each relationship in both directions; that is, relationships are bidirectional:
• One CUSTOMER can generate many INVOICES.
• Each of the many INVOICES is generated by only one CUSTOMER.
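Two of the relationship types above (1:M and 1:1) can be sketched with Python's built-in sqlite3 module. The tables and names below are illustrative, and using a UNIQUE foreign key for the 1:1 case is an assumption of this sketch, one common implementation choice rather than anything prescribed by the text.

```python
import sqlite3

# Illustrative sketch: a 1:M relationship (CUSTOMER generates INVOICE)
# and a 1:1 relationship (EMPLOYEE manages STORE), enforced with keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE customer (
    cus_id   INTEGER PRIMARY KEY,
    cus_name TEXT NOT NULL
);
-- 1:M: each invoice row points to exactly one customer,
-- but many invoice rows may point to the same customer.
CREATE TABLE invoice (
    inv_id INTEGER PRIMARY KEY,
    cus_id INTEGER NOT NULL REFERENCES customer(cus_id)
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL
);
-- 1:1: the UNIQUE constraint means an employee manages at most one store.
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY,
    emp_id   INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'John Smith')")
conn.execute("INSERT INTO invoice VALUES (10, 1)")
conn.execute("INSERT INTO invoice VALUES (11, 1)")  # same customer: allowed (1:M)

conn.execute("INSERT INTO employee VALUES (1, 'Pedro Dinamita')")
conn.execute("INSERT INTO store VALUES (100, 1)")
try:
    conn.execute("INSERT INTO store VALUES (101, 1)")  # a second store for employee 1
except sqlite3.IntegrityError:
    print("1:1 enforced: an employee cannot manage two stores")
```

Note that the "many" side carries the reference to the "one" side; that design choice is what makes the 1:M direction explicit in the schema.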
A constraint is a restriction placed on the data. Constraints are important because they help to ensure data integrity (Falk & Bassett, 2017). Constraints are normally expressed in the form of rules. For example (Makarova et al., 2013):
• An employee's salary must be between 6,000 and 350,000 dollars.
• A student's GPA must be between 0.00 and 4.00.
• Each class must have one and only one teacher.
How do you properly identify entities, attributes, relationships, and constraints? The first step is to clearly identify the business rules for the problem domain you are modeling (Moonen et al., 2005).
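The example constraints above can be sketched as CHECK constraints using Python's built-in sqlite3 module. The table and column names here are hypothetical, chosen only to mirror the rules in the list.

```python
import sqlite3

# Illustrative sketch: business constraints expressed as CHECK constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    emp_salary REAL NOT NULL CHECK (emp_salary BETWEEN 6000 AND 350000)
);
CREATE TABLE student (
    stu_id  INTEGER PRIMARY KEY,
    stu_gpa REAL NOT NULL CHECK (stu_gpa BETWEEN 0.00 AND 4.00)
);
""")

conn.execute("INSERT INTO student VALUES (1, 3.75)")      # accepted
try:
    conn.execute("INSERT INTO student VALUES (2, 4.75)")  # rejected: GPA > 4.00
except sqlite3.IntegrityError:
    print("constraint enforced: GPA must be between 0.00 and 4.00")
```

Because the database itself rejects the out-of-range row, every application that uses the database inherits the rule instead of re-implementing it.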

2.4. BUSINESS RULES


When database designers decide which entities, attributes, and relationships to use in building a data model, they might begin by learning what types of data exist in the organization, how the data are used, and in what time frames they are used. But such data and information do not, by themselves, yield the required understanding of the entire business (Nemuraite et al., 2010). From a database point of view, the collection of data becomes meaningful only when it reflects properly defined business rules. A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. In a sense, business rules are misnamed: they apply to any organization that stores and uses data to generate information, whether it is a large or small business, a government unit, a religious group, or a research laboratory (Whitby et al., 2007).
Business rules, derived from a detailed description of an organization's operations, help to create and enforce actions within that organization's environment. Business rules must be put in writing and updated whenever the organization's operational environment changes. Properly written business rules are used to define entities, attributes, relationships, and constraints. You will see business rules at work when you encounter statements such as "an agent can serve many customers, and each customer can be served by only one agent." Business rules are used extensively throughout this book, particularly in the chapters on data modeling and database design (Araujo‐Pradere, 2009).
To be effective, business rules must be easy to understand and widely disseminated, so that everyone in the organization shares a common interpretation of the rules. Business rules describe, in simple language, the main and distinguishing characteristics of the data as viewed by the company. Examples of business rules include (Pulparambil et al., 2017):
• A customer may generate many invoices.
• Each invoice is generated by only one customer.
• A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.
These business rules establish entities, relationships, and constraints, among other things. For example, the first two business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between them. The third business rule establishes a constraint (no fewer than 10 people and no more than 30 people), two entities (EMPLOYEE and TRAINING), and a relationship between them (Bajec & Krisper, 2005).

2.4.1. Discovering Business Rules


The main sources of business rules are company managers, policy makers, department managers, and written documentation such as a company's procedures, standards, and operations manuals. A faster and more direct source of business rules is informal interviews with end users (Taveter & Wagner, 2001). Unfortunately, because perceptions differ, end users are not always the most reliable source for defining business rules. For example, a maintenance department mechanic might believe that any mechanic can initiate a maintenance procedure, when in fact only mechanics with inspection authorization can do so. Although such a distinction might seem trivial, it can have major legal consequences (Knolmayer et al., 2000).
Although end users are crucial contributors to the development of business rules, it pays to verify end-user perceptions. Interviews with several people who perform the same job can yield very different perceptions of what that job entails. While such a discovery may point to "management problems," that diagnosis does not help the database designer. The database designer's job is to reconcile such differences and verify the results to ensure that the business rules are appropriate and accurate (Herbst et al., 1994).
The process of identifying and documenting business rules is essential to database design for several reasons (Wan-Kadir & Loucopoulos, 2004):
• They help to standardize the company's view of its data.
• They can serve as a communication tool between users and designers.
• They allow the designer to understand the nature, role, and scope of the data.
• They allow the designer to understand business processes.
• They allow the designer to develop an accurate data description and to define appropriate relationship participation rules and constraints.
Of course, not all business rules can be modeled. For example, a business rule such as "no pilot may fly more than 10 hours within any 24-hour period" cannot be represented in the data model itself. Such a business rule can, however, be enforced by application software (Zur Muehlen & Indulska, 2010).
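As a sketch of how application software might enforce such a pilot-hours rule: the function, data layout, and values below are illustrative assumptions, not taken from any real system, and the check is deliberately simplified to the 24-hour window that ends when the new flight ends.

```python
from datetime import datetime, timedelta

# Illustrative sketch: application-level enforcement of "no pilot may fly
# more than 10 hours within any 24-hour period."
MAX_HOURS = 10.0
WINDOW = timedelta(hours=24)

def can_log_flight(flights, start, hours):
    """flights: list of (start_datetime, hours) already logged for one pilot.

    Simplification: when flights are appended in time order, the binding
    24-hour window is the one ending when the new flight ends, so only
    that window is checked here.
    """
    window_start = start + timedelta(hours=hours) - WINDOW
    flown = sum(h for s, h in flights if s >= window_start)
    return flown + hours <= MAX_HOURS

log = [(datetime(2023, 5, 1, 6, 0), 6.0)]  # 6 hours already flown that morning
print(can_log_flight(log, datetime(2023, 5, 1, 18, 0), 3.0))  # True: 9 h in 24 h
print(can_log_flight(log, datetime(2023, 5, 1, 18, 0), 5.0))  # False: 11 h in 24 h
```

The rule lives in code rather than in the schema, which is exactly the trade-off the text describes: the data model cannot express it, so the application layer must.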

2.4.2. Translating Business Rules into Data Model Components


Business rules set the stage for the proper identification of entities, attributes, relationships, and constraints. In the real world, names are used to identify objects. If the business environment needs to keep track of those objects, there will be specific business rules for them. As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) that associates nouns will translate into a relationship among the entities (Rosenberg & Dustdar, 2005).
For example, the business rule "a customer may generate many invoices" contains two nouns (customer and invoices) and a verb (generate) that associates them. From this business rule, you can deduce the following (Van Eijndhoven et al., 2008):
• Customer and invoice are objects of interest in the environment and should be represented by their respective entities.
• There is a "generate" relationship between the customer and the invoice.
To properly identify the type of relationship, remember that relationships are bidirectional; that is, they go both ways. For example, the business rule "a customer may generate many invoices" is complemented by the business rule "each invoice is generated by only one customer." In that case, the relationship is one-to-many (1:M). Customer is the "one" side, and invoice is the "many" side (Charfi & Mezini, 2004).
As a general rule, you should ask two basic questions to properly identify the relationship type (Kardasis & Loucopoulos, 2004):
• How many instances of B are related to one instance of A?
• How many instances of A are related to one instance of B?
For example, you can assess the relationship between student and class by asking two questions (Rosca et al., 1997):
• In how many classes can one student enroll? The answer is many classes.
• How many students can enroll in one class? The answer is many students.
Therefore, the relationship between STUDENT and CLASS is many-to-many (M:N). As you proceed through this book, you will have many opportunities to identify the relationships between entities, and the process will soon become second nature (Herbst, 1996).
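An M:N relationship such as the student-class example above is commonly implemented in a relational database through a linking table whose rows each pair one student with one class. A minimal sketch using Python's sqlite3 module follows; the table, student, and class names are hypothetical.

```python
import sqlite3

# Illustrative sketch: "STUDENT takes CLASS" (M:N) via a linking table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT);
CREATE TABLE class   (class_id INTEGER PRIMARY KEY, class_name TEXT);
CREATE TABLE enroll (
    stu_id   INTEGER REFERENCES student(stu_id),
    class_id INTEGER REFERENCES class(class_id),
    PRIMARY KEY (stu_id, class_id)   -- each (student, class) pair appears once
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO class VALUES (?, ?)",
                 [(10, "Databases"), (11, "Statistics")])
# One student in many classes, and one class taken by many students:
conn.executemany("INSERT INTO enroll VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

rows = conn.execute("""
    SELECT s.stu_name, c.class_name
    FROM enroll e
    JOIN student s ON s.stu_id = e.stu_id
    JOIN class   c ON c.class_id = e.class_id
    ORDER BY s.stu_name, c.class_name
""").fetchall()
print(rows)   # [('Ann', 'Databases'), ('Ann', 'Statistics'), ('Ben', 'Databases')]
```

The linking table turns one M:N relationship into two 1:M relationships, which is the standard way relational systems handle many-to-many associations.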

2.4.3. Naming Conventions


As you translate business rules into data model components, you identify entities, attributes, relationships, and constraints. That identification process includes naming each object in a way that makes it distinct from the other objects in the problem domain. Therefore, it is important that you pay close attention to how you name the objects you discover (Demuth et al., 2001). Entity names should be descriptive of the objects in the business environment and should use terminology that is familiar to the users. An attribute name should be descriptive of the data it represents. It is also good practice to prefix the name of an attribute with the name of the entity (or an abbreviation of the entity name) in which it occurs. For example, in the CUSTOMER entity, the customer's credit limit might be named CUS_CREDIT_LIMIT (Graml et al., 2007). The CUS prefix indicates that the attribute is descriptive of the CUSTOMER entity, while CREDIT_LIMIT makes the attribute's contents easy to recognize. This will become more important in a later section, when we address the need to use common attributes to express relationships between entities. The use of proper naming conventions improves the data model's ability to facilitate communication among the designers, application programmers, and end users. In fact, a consistent naming convention can give your model a self-documenting quality (Grosof et al., 1999).
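The entity-prefix convention can be sketched with Python's sqlite3 module. Apart from CUS_CREDIT_LIMIT, which the text names, the column names below are hypothetical but follow the same pattern.

```python
import sqlite3

# Illustrative sketch: every attribute of the CUSTOMER entity carries a
# CUS_ prefix, so a reader can tell at a glance which entity it describes.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    CUS_CODE         INTEGER PRIMARY KEY,
    CUS_LNAME        TEXT NOT NULL,
    CUS_FNAME        TEXT NOT NULL,
    CUS_PHONE        TEXT,
    CUS_CREDIT_LIMIT REAL CHECK (CUS_CREDIT_LIMIT >= 0)
)
""")
# PRAGMA table_info lists one row per column; index 1 holds the column name.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(cols)   # every attribute name carries the CUS_ prefix
```

The payoff comes later, when a column such as CUS_CODE reappears in another table: the prefix immediately signals which entity the shared attribute came from.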

2.5. THE EVOLUTION OF DATA MODELS


The quest for better data management has led to several models that attempt to resolve the file system's critical shortcomings. These models represent different schools of thought as to what a database is, what it should do, what types of structures it should employ, and how the technology should implement those structures. Somewhat confusingly, these models are also called data models, just like the graphical data models we have been examining (Navathe, 1992).
In this section, we discuss the major data models in roughly chronological order. As you will see, many of the "new" database concepts and structures bear a remarkable resemblance to some of the "old" data model concepts and structures. The evolution of the major data models is shown in Table 2.1 (Fry & Sibley, 1976).

Table 2.1. Major Data Models Have Changed Over Time


2.5.1. Hierarchical and Network Models


In the 1960s, the hierarchical model was developed to manage large amounts of data for complex manufacturing projects such as the Apollo rocket that landed on the moon in 1969. Its basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system's record type. Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent.) (Zinner et al., 2006).
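The parent-child structure described above can be sketched in a few lines of Python; the segment names (CUSTOMER, ORDER, and so on) are hypothetical:

```python
# Hierarchical model sketch: each segment records exactly one parent,
# while a parent may have many children (the 1:M relationship).
parent_of = {
    "ORDER": "CUSTOMER",       # ORDER's single parent is CUSTOMER
    "PAYMENT": "CUSTOMER",
    "ORDER_LINE": "ORDER",
}

def children(segment):
    """Return every segment whose single parent is `segment` (the M side)."""
    return sorted(child for child, parent in parent_of.items()
                  if parent == segment)

print(children("CUSTOMER"))     # one parent, many children
print(parent_of["ORDER_LINE"])  # each child has exactly one parent
```

Because every child has exactly one parent pointer, navigating from a segment back to the root is trivial, but reaching a segment through any other path requires traversing the tree from the top, which is one reason the model proved rigid.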
The network model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. The user perceives the network database as a collection of records in 1:M relationships. Unlike the hierarchical model, however, the network model allows a record to have more than one parent. Although the network model is no longer widely used, its definitions of standard database concepts are still found in modern data models. The following key concepts were defined at that time (Kumar & Van Hillegersberg, 2000):
• The schema is the conceptual organization of the entire database as viewed by the database administrator.
• The subschema defines the portion of the database "seen" by the application programs that produce the desired information from the database's data.
• A data management language (DML) defines the environment in which data can be managed and is used to work with the data in the database.
• A schema data definition language (DDL) enables the database administrator to define the schema components.
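The DDL/DML distinction in the list above can be sketched with Python's built-in sqlite3 driver; the AGENT table and its contents are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the schema component is *defined* (a hypothetical AGENT table).
conn.execute(
    "CREATE TABLE AGENT (AGENT_CODE INTEGER PRIMARY KEY, AGENT_NAME TEXT)"
)

# DML: the data inside the defined structure is *managed*.
conn.execute("INSERT INTO AGENT VALUES (501, 'Alex Alby')")
conn.execute("UPDATE AGENT SET AGENT_NAME = 'Alex B. Alby' "
             "WHERE AGENT_CODE = 501")

name = conn.execute(
    "SELECT AGENT_NAME FROM AGENT WHERE AGENT_CODE = 501"
).fetchone()[0]
print(name)
```

CREATE (and DROP, ALTER) statements change the schema; INSERT, UPDATE, DELETE, and SELECT operate on the data within it.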
As information needs grew and more sophisticated databases and applications were required, the network model became too cumbersome. The lack of ad hoc query capability put heavy pressure on programmers to generate the code required to produce even the simplest reports. And although the existing models provided a measure of data independence, any structural change in the database could still wreak havoc in all application programs that drew data from it. Because of these drawbacks, the hierarchical and network models were largely replaced by the relational model in the 1980s (Stearns, 1983).

2.5.2. The Relational Model


E. F. Codd (of IBM) first proposed the relational model in 1970 in his landmark paper "A Relational Model of Data for Large Shared Data Banks." Both users and designers hailed the relational model as a major breakthrough. To use an analogy, the relational model produced an "automatic transmission" database to replace the "standard transmission" databases that preceded it. Its conceptual simplicity set the stage for a genuine database revolution (Pinzger et al., 2005).

The relational model is founded on a mathematical construct known as a relation. To avoid the complexity of abstract mathematical theory, you can think of a relation (also called a table) as a matrix composed of intersecting rows and columns. Each row in a relation is called a tuple, and each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts (D'Ambros et al., 2008).
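The relation/tuple/attribute vocabulary can be made concrete with a small Python sketch; the AGENT data below is invented for illustration:

```python
# A relation as a matrix of intersecting rows and columns:
# each row is a tuple, each column an attribute (names are hypothetical).
attributes = ("AGENT_CODE", "AGENT_NAME")   # the columns
tuples = [                                  # the rows
    (501, "Alex Alby"),
    (502, "Leah Hahn"),
]

# Every tuple carries exactly one value per attribute.
assert all(len(t) == len(attributes) for t in tuples)

first = dict(zip(attributes, tuples[0]))
print(first["AGENT_NAME"])
```

Reading a single cell means intersecting one tuple (row) with one attribute (column), which is exactly the matrix picture used in the text.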
In 1970, Codd's work was considered ingenious but impractical. The relational model's conceptual simplicity was bought at the expense of computer overhead; computers at that time lacked the power to implement the relational model. Fortunately, both computer power and operating system efficiency increased dramatically. Even better, as computing power grew, the cost of computers fell rapidly. Today even PCs can run sophisticated relational database software such as Oracle, DB2, Microsoft SQL Server, MySQL, and other mainstream relational products at a fraction of the cost of their mainframe predecessors (Blundell et al., 1999).
The relational model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions as the hierarchical and network DBMSs, plus a host of other functions that make the relational model easier to understand and implement. Arguably the RDBMS's greatest asset is its ability to hide the complexities of the relational model from the user. The RDBMS manages all of the physical details, while the user sees the relational database as a collection of tables in which data is stored. The user can manipulate and query the data in a logical, intuitive manner (Manegold et al., 2009).
Tables are related to each other by sharing a common attribute (a value in a column). For example, the CUSTOMER table in Figure 2.1 might contain a sales agent's number that also appears in the AGENT table (Yoder & Yang, 2000).

Figure 2.1. Creating connections between relational tables.

Source: https://launchschool.com/books/sql/read/table_relationships.
Even though the customer data is stored in one table and the sales representative data in another, the common attribute shared by the CUSTOMER and AGENT tables makes it possible to match each customer with his or her sales representative. For example, you can easily determine that customer Dunne's agent is Alex Alby because, in the CUSTOMER table, Dunne's AGENT_CODE is 501, which matches the AGENT table's AGENT_CODE for Alex Alby (Elith et al., 2010).
Although the tables are independent of one another, the data in them can easily be linked. The relational model provides a controlled amount of redundancy that eliminates most of the inconsistencies found in file systems (Schilthuizen & Davison, 2005).
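A minimal sketch of this linking-through-a-common-attribute idea, using Python's sqlite3 module; the table layouts are simplified assumptions rather than the book's full schema:

```python
import sqlite3

# Two independent tables that share the AGENT_CODE attribute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENT    (AGENT_CODE INTEGER PRIMARY KEY, AGENT_NAME TEXT);
    CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_LNAME TEXT,
                           AGENT_CODE INTEGER REFERENCES AGENT(AGENT_CODE));
    INSERT INTO AGENT    VALUES (501, 'Alex Alby');
    INSERT INTO CUSTOMER VALUES (10010, 'Dunne', 501);
""")

# Matching on the common attribute pairs each customer with their agent.
agent = conn.execute("""
    SELECT A.AGENT_NAME
      FROM CUSTOMER C JOIN AGENT A ON C.AGENT_CODE = A.AGENT_CODE
     WHERE C.CUS_LNAME = 'Dunne'
""").fetchone()[0]
print(agent)
```

Neither table stores the other's data; the shared AGENT_CODE value alone is enough to reconstruct the relationship at query time.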
A relational diagram, such as the one in Figure 2.2, often shows the relationship type (1:1, 1:M, or M:N). A relational diagram is a representation of the relational database's entities, the attributes within those entities, and the relationships between those entities (Staub et al., 2010).
The relational diagram shown in Figure 2.2 depicts the linking fields (in this example, AGENT_CODE) as well as the relationship type, 1:M. Microsoft Access, the database software used to generate Figure 2.2, employs the infinity symbol (∞) to indicate the "many" side. In this example, CUSTOMER represents the "many" side because an AGENT can have many CUSTOMERs. AGENT represents the "1" side because each CUSTOMER has only one AGENT (Andreozzi et al., 2008).

Figure 2.2. A relational diagram.

Source: https://www.nsf.gov/news/mmg/mmg_disp.jsp?med_id=79315.
A relational table stores a collection of related entities. In this respect, a relational table resembles a file. However, there is a crucial difference between a table and a file: because a table is a purely logical structure, it yields complete data and structural independence. How the data is physically stored in the database is of no concern to the user or the designer; what matters is how the data is perceived. This property of the relational model, which is examined in later chapters, became the catalyst for a true database revolution (Schönrich & Binney, 2009).
The relational model's rise to dominance is also due to its powerful and flexible query language. For most relational database software, that language is Structured Query Language (SQL), which allows the user to specify what must be done without specifying how. The RDBMS uses SQL to translate user queries into instructions for retrieving the requested data. SQL makes it possible to retrieve data with far less effort than any other database or file environment. From the user's point of view, any SQL-based relational database application has three parts: a user interface, a set of tables stored in the database, and the SQL "engine." Each of these components is described below (Manel et al., 2001).
• The User Interface: The interface allows the end user to interact with the data (by automatically generating SQL code). Each interface is a product of the software vendor's idea of meaningful interactions with the data. You can also design your own customized interface with the help of application generators, which are now common in the database software arena.
• A Collection of Tables Stored in the Database: In a relational database, all data is perceived to be stored in tables. The tables simply "present" the data to the end user in an easily understood way. Each table is independent; rows in different tables are related by common values in common attributes.
• SQL Engine: Largely hidden from the end user, the SQL engine executes all queries, or data requests. Keep in mind that the SQL engine is part of the database management system (DBMS). The end user uses SQL to create table structures and to perform data access and table maintenance. The SQL engine processes all user requests largely behind the scenes and without the end user's knowledge. Hence, SQL is said to be a declarative language that tells what must be done but not how.
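SQL's declarative character can be sketched as follows: the query names only the result wanted (here, customers per agent) and leaves the access strategy to the SQL engine. The schema and data are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, AGENT_CODE INTEGER);
    INSERT INTO CUSTOMER VALUES (10010, 501), (10011, 501), (10012, 502);
""")

# The query says WHAT is wanted; scanning, grouping, and sorting strategy
# (HOW) is chosen by the SQL engine, not by the user.
counts = conn.execute("""
    SELECT AGENT_CODE, COUNT(*)
      FROM CUSTOMER
     GROUP BY AGENT_CODE
     ORDER BY AGENT_CODE
""").fetchall()
print(counts)
```

In a hierarchical or network system, producing the same report would have required hand-written navigation code following parent-child links record by record.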
Because the RDBMS performs these behind-the-scenes tasks, it is not necessary to focus on the database's physical components. Instead, the next chapters concentrate on the logical portion of the relational database and its design (Blaschka et al., 1999).

2.5.3. The Entity-Relationship Model


The conceptual simplicity of relational database technology triggered the demand for RDBMSs. In turn, rapidly increasing transaction and information requirements created the need for more complex database implementation structures, and thus for more effective database design tools. (Building a skyscraper, for example, requires more detailed design activities than building a doghouse.) (Harrison, 2015).

Complex design activities require conceptual simplicity to be effective. Although the relational model was a vast improvement over the hierarchical and network models, it still lacked the features that would make it an effective database design tool. Because it is easier to examine structures graphically than to describe them in text, database designers prefer a graphical tool in which entities and their relationships are pictured. Thus, the entity-relationship (ER) model, or ERM, became a widely accepted database design standard (Blaschka et al., 1999).

Peter Chen first introduced the ER data model in 1976; it was a graphical representation of entities and their relationships in a database structure that quickly became popular as a complement to the relational model's concepts. Together, the relational model and the ERM provided the foundation for tightly structured database design. ER models are depicted in entity-relationship diagrams (ERDs), which use graphical representations to model the database's components (Masud et al., 2010).
The following components make up the ER model (Liu et al., 2011):
• Entity: An entity was described earlier in this chapter as anything about which data will be collected and stored. An entity is represented in the ERD by a rectangle, also known as an entity box. The entity's name, a noun, is written in the center of the rectangle. Entity names are usually written in capital letters and in singular form: PAINTER rather than PAINTERS, and EMPLOYEE rather than EMPLOYEES. When the ERD is used with a relational model, each entity maps to a related table. In the ER model, each row in the relational table is known as an entity instance or entity occurrence. Each entity is described by a set of attributes that capture that entity's particular characteristics. For example, contact information, a last name, and a first name might be attributes of the entity EMPLOYEE.
• Relationships: A relationship describes an association among data. Most relationships describe associations between two or more entities. Three types of relationships were introduced when the basic data modeling building blocks were presented: one-to-many (1:M), many-to-many (M:N), and one-to-one (1:1). The ER model uses the term connectivity to label the relationship types. The name of the relationship is usually an active or passive verb. For example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns many SKILLs, and an EMPLOYEE manages a STORE.
Figure 2.3 uses two ER notations to show the different types of relationships: the original Chen notation and the newer Crow's Foot notation (Schwartz & Schäffer, 2017).

The left side of the ER diagram shows the Chen notation, based on Peter Chen's landmark paper. In this notation, the connectivities are written next to each entity box. Relationships are represented by a diamond connected to the related entities through a relationship line, and the relationship name is written inside the diamond. The right side of Figure 2.3 shows the Crow's Foot notation. The name "Crow's Foot" comes from the three-pronged symbol used to represent the "many" side of the relationship. As you examine the basic Crow's Foot ERD in Figure 2.3, note that the connectivities are represented by symbols: a short line segment represents the "1," while the three-pronged "crow's foot" represents the "M." In this example, the relationship name is written above the relationship line (Herrmannsdoerfer et al., 2010).

In Figure 2.3, entities and relationships are shown in a horizontal format, but they may also be oriented vertically. The entity location and the order in which the entities are presented are immaterial; just remember to read a 1:M relationship from the "1" side to the "M" side. In this textbook, the Crow's Foot notation is used as the design standard. However, the Chen notation is used when necessary to illustrate some of the ER modeling concepts. Most data modeling software supports the Crow's Foot notation. The Crow's Foot diagrams you will see in the following chapters were generated with Microsoft Visio Professional software (Bolker et al., 2009).

Figure 2.3. The Chen and Crow's Foot notations.

Source: http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/471-the-entity-relationship-model.html.
Its exceptional visual simplicity makes the ER model the dominant database modeling and design tool. Nevertheless, as the data environment keeps changing, the search for new data models continues (Moser et al., 2011).

2.5.4. The Object-Oriented (OO) Model


As real-world problems became more complex, the need for a database model that more closely represented the real world became evident. In the object-oriented data model (OODM), both data and its relationships are contained in a single structure known as an object. In turn, the OODM is the basis for the object-oriented database management system (OODBMS) (McLennan & Taylor, 1980).

An OODM reflects a fundamentally different way of defining and using entities. Like the relational model's entity, an object is described by its factual content. Unlike an entity, however, an object also includes information about relationships between the facts within the object, as well as information about its relationships with other objects. Therefore, the facts within the object are given greater meaning. Because semantics indicates meaning, the OODM is known as a semantic data model (Warren et al., 2008).

Subsequent OODM developments allowed an object also to contain all of the operations that can be performed on it, such as changing its data values, finding a specific data value, and printing data values. Because objects include data, various types of relationships, and operational procedures, the object becomes self-contained, making it, at least potentially, a basic building block for autonomous structures (Calvet Liñán & Juan Pérez, 2015).
The OO data model is based on the following components (Wu et al., 2013):
• An object is an abstraction of a real-world entity. In general terms, an object may be considered equivalent to an ER model's entity. More precisely, an object represents only one occurrence of an entity. (The object's semantic content is defined through several of the items in this list.)
• Attributes describe the properties of an object. For example, a PERSON object has attributes such as Name, Social Security Number, and Date of Birth.
• Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class differs from an entity set in that it contains a set of procedures known as methods. A class's method represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior (Robins et al., 1995).
• Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. (Note the similarity to the hierarchical data model.)
• Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses of the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
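A minimal Python sketch of classes, methods, a class hierarchy, and inheritance as listed above; the attribute names and method bodies are hypothetical:

```python
class Person:
    """Sketch of the PERSON class: attributes plus a method."""
    def __init__(self, name, birth_date):
        self.name = name              # attribute
        self.birth_date = birth_date  # attribute

    def print_name(self):
        # Method: a real-world action (here, producing the formatted name).
        return f"Name: {self.name}"

class Customer(Person):   # subclass: inherits PERSON's attributes and methods
    pass

class Employee(Person):   # subclass: inherits PERSON's attributes and methods
    pass

c = Customer("Dunne", "1990-05-01")
print(c.print_name())     # inherited from Person without being redefined
```

Customer and Employee define nothing of their own, yet each can use print_name and the name and birth_date attributes, which is exactly the inheritance behavior described in the last bullet.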
Object-oriented data models are often depicted using Unified Modeling Language (UML) class diagrams. The UML is a language based on OO concepts that provides a set of diagrams and symbols for graphically modeling a system. Within the larger UML object-oriented modeling language, the class diagram is used to model data and their relationships. For a more detailed explanation of UML, see the sources cited here (Sauter, 2007; Hansen & Martins, 1996).

To illustrate the main features of the object-oriented data model, consider a simple invoicing problem. In this case, customers generate invoices, and each invoice references one or more lines, with each line representing an item purchased by the customer. Figure 2.4 shows the object representation, the corresponding UML class diagram, and the ER model for this simple invoicing problem. The object representation is a simple way to depict a single object occurrence (Lynch, 2007).

Figure 2.4. OO, UML, and ER models are compared.

Source: http://www.differencebetween.info/difference-between-uml-and-erd.
Take note of the following as you study Figure 2.4 (Manatschal, 2004):
• The object representation of the INVOICE includes all related objects within the same object box. Note that the connectivities (1 and M) indicate the relationships of the related objects to the INVOICE. For example, the 1 next to the CUSTOMER object indicates that each INVOICE is related to only one CUSTOMER. The M next to the LINE object indicates that each INVOICE contains many LINEs.
• The UML class diagram uses three separate classes (CUSTOMER, INVOICE, and LINE) and two relationships to represent this simple invoicing problem. The connectivities are represented by the 1..1, 0..*, and 1..* symbols, and the relationships are named at both ends to represent the different "roles" that the objects play in the relationship (Ingram & Mahler, 2013).
• The ER model uses three separate entities and two relationships to represent this simple invoicing problem.
The advances of the OODM have been felt in many areas, from system modeling to programming. The added semantics of the OODM allows for a richer representation of complex objects. This, in turn, has enabled applications to support increasingly complex objects in innovative ways. As you will see in the next section, these evolutionary advances also affected the relational data model.
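The invoicing scenario above can be sketched as plain objects; the class shapes follow the CUSTOMER/INVOICE/LINE example, while the attribute names are assumptions made for illustration:

```python
class Line:
    """One purchased item on an invoice (hypothetical attributes)."""
    def __init__(self, product, qty):
        self.product, self.qty = product, qty

class Invoice:
    def __init__(self, number):
        self.number = number
        self.lines = []        # the M side: an INVOICE contains many LINEs

class Customer:
    def __init__(self, name):
        self.name = name
        self.invoices = []     # the M side: a CUSTOMER generates many INVOICEs

    def add_invoice(self, invoice):
        self.invoices.append(invoice)
        invoice.customer = self   # the 1 side: each INVOICE has one CUSTOMER
        return invoice

cust = Customer("Dunne")
inv = cust.add_invoice(Invoice(1001))
inv.lines.append(Line("brush", 2))
print(inv.customer.name, len(cust.invoices), len(inv.lines))
```

Note how each object carries its relationships with it: the invoice knows its one customer, and the customer and invoice each hold the "many" side as a list, mirroring the 1 and M connectivities in Figure 2.4.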

REFERENCES
1. Andreozzi, S., Burke, S., Field, L., & Konya, B., (2008). Towards
GLUE 2: Evolution of the computing element information model. In:
Journal of Physics: Conference Series (Vol. 119, No. 6, p. 062009).
IOP Publishing.
2. Araujo‐Pradere, E. A., (2009). Transitioning space weather models into
operations: The basic building blocks. Space Weather, 7(10), 33–40.
3. Aubry, K. B., Raley, C. M., & McKelvey, K. S., (2017). The importance
of data quality for generating reliable distribution models for rare,
elusive, and cryptic species. PLoS One, 12(6), e0179152.
4. Bachman, C. W., & Daya, M., (1977). The role concept in data models.
In: Proceedings of the Third International Conference on Very Large
Data Bases (Vol. 1, pp. 464–476).
5. Bajec, M., & Krisper, M., (2005). A methodology and tool support for
managing business rules in organizations. Information Systems, 30(6),
423–443.
6. Blaschka, M., Sapia, C., & Höfling, G., (1999). On schema evolution
in multidimensional databases. In: International Conference on
Data Warehousing and Knowledge Discovery (Vol. 1, pp. 153–164).
Springer, Berlin, Heidelberg.
7. Blundell, R., & Bond, S., (1998). Initial conditions and moment
restrictions in dynamic panel data models. Journal of Econometrics,
87(1), 115–143.
Blundell, R., Griffith, R., & Van Reenen, J., (1999). Market share, market value and innovation in a panel of British manufacturing firms. The Review of Economic Studies, 66(3), 529–554.
9. Blundell, R., Griffith, R., & Windmeijer, F., (2002). Individual effects
and dynamics in count data models. Journal of Econometrics, 108(1),
113–131.
10. Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J.
R., Stevens, M. H. H., & White, J. S. S., (2009). Generalized linear
mixed models: A practical guide for ecology and evolution. Trends in
Ecology & Evolution, 24(3), 127–135.
11. Bond, S. R., (2002). Dynamic panel data models: A guide to micro data
methods and practice. Portuguese Economic Journal, 1(2), 141–162.
12. Brodie, M. L., (1984). On the development of data models. In: On
Conceptual Modeling (Vol. 1, pp. 19–47). Springer, New York, NY.
13. Brown, J. M., & Lemmon, A. R., (2007). The importance of data
partitioning and the utility of Bayes factors in Bayesian phylogenetics.
Systematic Biology, 56(4), 643–655.
14. Calvet Liñán, L., & Juan Pérez, Á. A., (2015). Educational data mining and learning analytics: Differences, similarities, and time evolution. International Journal of Educational Technology in Higher Education, 12(3), 98–112.
15. Chang, K. C., Dutta, S., Mirams, G. R., Beattie, K. A., Sheng, J., Tran, P.
N., & Li, Z., (2017). Uncertainty quantification reveals the importance
of data variability and experimental design considerations for in silico
proarrhythmia risk assessment. Frontiers in Physiology, 1, 917.
16. Charfi, A., & Mezini, M., (2004). Hybrid web service composition:
Business processes meet business rules. In: Proceedings of the 2nd
International Conference on Service Oriented Computing (Vol. 1, pp.
30–38).
17. Collins, A. G., Schuchert, P., Marques, A. C., Jankowski, T., Medina,
M., & Schierwater, B., (2006). Medusozoan phylogeny and character
evolution clarified by new large and small subunit rDNA data and an
assessment of the utility of phylogenetic mixture models. Systematic
Biology, 55(1), 97–115.
18. D’Ambros, M., Gall, H., Lanza, M., & Pinzger, M., (2008). Analyzing
software repositories to understand software evolution. In: Software
Evolution (Vol. 1, pp. 37–67). Springer, Berlin, Heidelberg.
19. Demuth, B., Hussmann, H., & Loecher, S., (2001). OCL as a
specification language for business rules in database applications. In:
International Conference on the Unified Modeling Language (Vol. 1,
pp. 104–117). Springer, Berlin, Heidelberg.
20. Elhorst, J. P., (2014). Spatial panel data models. In: Spatial Econometrics
(Vol. 1, pp. 37–93). Springer, Berlin, Heidelberg.
21. Elith, J., Kearney, M., & Phillips, S., (2010). The art of modelling
range‐shifting species. Methods in Ecology and Evolution, 1(4), 330–
342.
22. Falk, E. B., & Bassett, D. S., (2017). Brain and social networks:
Fundamental building blocks of human experience. Trends in Cognitive
Sciences, 21(9), 674–690.
23. Feeley, K. J., & Silman, M. R., (2010). Modelling the responses of
Andean and Amazonian plant species to climate change: The effects of
Biogeography, 37(4), 733–740.
24. Fry, J. P., & Sibley, E. H., (1976). Evolution of data-base management
systems. ACM Computing Surveys (CSUR), 8(1), 7–42.
25. Graml, T., Bracht, R., & Spies, M., (2007). Patterns of business rules to
enable agile business processes. In: 11th IEEE International Enterprise
Distributed Object Computing Conference (EDOC 2007) (Vol. 1, pp.
365–365). IEEE.
26. Grosof, B. N., Labrou, Y., & Chan, H. Y., (1999). A declarative approach
to business rules in contracts: Courteous logic programs in XML. In:
Proceedings of the 1st ACM Conference on Electronic Commerce (Vol.
1, pp. 68–77).
27. Ha, D., & Schmidhuber, J., (2018). Recurrent world models facilitate
policy evolution. Advances in Neural Information Processing Systems,
1, 31.
28. Hansen, T. F., & Martins, E. P., (1996). Translating between
microevolutionary process and macroevolutionary patterns: The
correlation structure of interspecific data. Evolution, 50(4), 1404–1417.
29. Harrison, X. A., (2015). A comparison of observation-level random
effect and beta-binomial models for modelling overdispersion in
binomial data in ecology & evolution. PeerJ, 3, 4.
30. Hellerstein, D., & Mendelsohn, R., (1993). A theoretical foundation
for count data models. American Journal of Agricultural Economics,
75(3), 604–611.
31. Herbst, H., (1996). Business rules in systems analysis: A meta-model
and repository system. Information Systems, 21(2), 147–166.
32. Herbst, H., Knolmayer, G., Myrach, T., & Schlesinger, M., (1994). The
specification of business rules: A comparison of selected methodologies.
In: Methods and Associated Tools for the Information Systems Life
Cycle (Vol. 1, pp. 29–46).
33. Herrmannsdoerfer, M., Vermolen, S. D., & Wachsmuth, G., (2010). An
extensive catalog of operators for the coupled evolution of metamodels
and models. In: International Conference on Software Language
Engineering (Vol. 1, pp. 163–182). Springer, Berlin, Heidelberg.
34. Ingram, T., & Mahler, D. L., (2013). SURFACE: Detecting convergent
evolution from comparative data by fitting Ornstein‐Uhlenbeck models
with stepwise Akaike information criterion. Methods in Ecology and
Evolution, 4(5), 416–425.
35. Kardasis, P., & Loucopoulos, P., (2004). Expressing and organizing
business rules. Information and Software Technology, 46(11), 701–718.
36. Kiviet, J. F., (1995). On bias, inconsistency, and efficiency of various
estimators in dynamic panel data models. Journal of Econometrics,
68(1), 53–78.
37. Knolmayer, G., Endl, R., & Pfahrer, M., (2000). Modeling processes
and workflows by business rules. In: Business Process Management
(Vol. 1, pp. 16–29). Springer, Berlin, Heidelberg.
38. Kumar, K., & Van Hillegersberg, J., (2000). ERP experiences and evolution. Communications of the ACM, 43(4), 22–22.
39. Liu, C., Mao, Y., Van, D. M. J., & Fernandez, M., (2011). Cloud
resource orchestration: A data-centric approach. In: Proceedings of
the Biennial Conference on Innovative Data Systems Research (CIDR)
(Vol. 1, pp. 1–8).
40. Lynch, V. J., (2007). Inventing an arsenal: Adaptive evolution and
neofunctionalization of snake venom phospholipase A2 genes. BMC
Evolutionary Biology, 7(1), 1–14.
41. Makarova, K. S., Wolf, Y. I., & Koonin, E. V., (2013). The basic
building blocks and evolution of CRISPR–Cas systems. Biochemical
Society Transactions, 41(6), 1392–1400.
42. Manatschal, G., (2004). New models for evolution of magma-poor
rifted margins based on a review of data and concepts from west Iberia
and the Alps. International Journal of Earth Sciences, 93(3), 432–466.
43. Manegold, S., Kersten, M. L., & Boncz, P., (2009). Database architecture
evolution: Mammals flourished long before dinosaurs became extinct.
Proceedings of the VLDB Endowment, 2(2), 1648–1653.
44. Manel, S., Williams, H. C., & Ormerod, S. J., (2001). Evaluating
presence–absence models in ecology: The need to account for
prevalence. Journal of Applied Ecology, 38(5), 921–931.
45. Masud, M. M., Chen, Q., Khan, L., Aggarwal, C., Gao, J., Han, J., &
Thuraisingham, B., (2010). Addressing concept-evolution in concept-
drifting data streams. In: 2010 IEEE International Conference on Data
Mining (Vol. 1, pp. 929–934). IEEE.
46. McLennan, S. M., & Taylor, S. R., (1980). Th and U in sedimentary
rocks: Crustal evolution and sedimentary recycling. Nature, 285(5767),
621–624.
47. Monakova, G., Kopp, O., Leymann, F., Moser, S., & Schäfers, K.,
(2009). Verifying business rules using an SMT solver for BPEL
processes. Business Process, Services–Computing and Intelligent
Service Management, 1, 2–8.
48. Moonen, N. N., Flood, A. H., Fernández, J. M., & Stoddart, J. F.,
(2005). Towards a rational design of molecular switches and sensors
from their basic building blocks. Molecular Machines, 1, 99–132.
49. Moser, T., Mordinyi, R., Winkler, D., Melik-Merkumians, M., & Biffl,
S., (2011). Efficient automation systems engineering process support
based on semantic integration of engineering knowledge. In: ETFA2011
(Vol. 1, pp. 1–8). IEEE.
50. Mullahy, J., (1986). Specification and testing of some modified count
data models. Journal of Econometrics, 33(3), 341–365.
51. Navathe, S. B., (1992). Evolution of data modeling for databases.
Communications of the ACM, 35(9), 112–123.
52. Nemuraite, L., Skersys, T., Sukys, A., Sinkevicius, E., & Ablonskis,
L., (2010). VETIS tool for editing and transforming SBVR business
vocabularies and business rules into UML&OCL models. In: 16th
International Conference on Information and Software Technologies,
Kaunas: Kaunas University of Technology (Vol. 1, pp. 377–384).
53. Peckham, J., & Maryanski, F., (1988). Semantic data models. ACM
Computing Surveys (CSUR), 20(3), 153–189.
54. Pinzger, M., Gall, H., Fischer, M., & Lanza, M., (2005). Visualizing
multiple evolution metrics. In: Proceedings of the 2005 ACM
Symposium on Software Visualization (Vol. 1, pp. 67–75).
55. Pulparambil, S., Baghdadi, Y., Al-Hamdani, A., & Al-Badawi,
M., (2017). Exploring the main building blocks of SOA method:
SOA maturity model perspective. Service Oriented Computing and
Applications, 11(2), 217–232.
56. Robins, J. M., Rotnitzky, A., & Zhao, L. P., (1995). Analysis of
semiparametric regression models for repeated outcomes in the
presence of missing data. Journal of the American Statistical
Association, 90(429), 106–121.
57. Rosca, D., Greenspan, S., Feblowitz, M., & Wild, C., (1997). A
decision making methodology in support of the business rules lifecycle.
In: Proceedings of ISRE’97: 3rd IEEE International Symposium on
Requirements Engineering (Vol. 1, pp. 236–246). IEEE.
58. Rosenberg, F., & Dustdar, S., (2005). Business rules integration in
BPEL-a service-oriented approach. In: Seventh IEEE International
Conference on E-Commerce Technology (CEC’05) (Vol. 1, pp. 476–
479). IEEE.
59. Sauter, T., (2007). The continuing evolution of integration in
manufacturing automation. IEEE Industrial Electronics Magazine,
1(1), 10–19.
60. Sauter, T., (2010). The three generations of field-level networks—
Evolution and compatibility issues. IEEE Transactions on Industrial
Electronics, 57(11), 3585–3595.
61. Schilthuizen, M., & Davison, A., (2005). The convoluted evolution of
snail chirality. Naturwissenschaften, 92(11), 504–515.
62. Schönrich, R., & Binney, J., (2009). Chemical evolution with radial
mixing. Monthly Notices of the Royal Astronomical Society, 396(1),
203–222.
63. Schwartz, R., & Schäffer, A. A., (2017). The evolution of tumor
phylogenetics: Principles and practice. Nature Reviews Genetics,
18(4), 213–229.
64. Staub, R. B., E Souza, G. D. S., & Tabak, B. M., (2010). Evolution
of bank efficiency in Brazil: A DEA approach. European Journal of
Operational Research, 202(1), 204–213.
65. Stearns, S. C., (1983). A natural experiment in life-history evolution:
Field data on the introduction of mosquitofish (Gambusia affinis) to
Hawaii. Evolution, 1, 601–617.
66. Sudan, R., (2005). The basic building blocks of e-government.
E-Development: From Excitement to Effectiveness, 1, 79–100.
67. Taveter, K., & Wagner, G., (2001). Agent-oriented enterprise modeling
based on business rules. In: International Conference on Conceptual
Modeling (Vol. 1, pp. 527–540). Springer, Berlin, Heidelberg.
72 The Creation and Management of Database Systems

68. Tekieh, M. H., & Raahemi, B., (2015). Importance of data mining
in healthcare: A survey. In: Proceedings of the 2015 IEEE/ACM
International Conference on Advances in Social Networks Analysis
and Mining 2015 (Vol. 1, pp. 1057–1062).
69. Van, E. T., Iacob, M. E., & Ponisio, M. L., (2008). Achieving business
process flexibility with business rules. In: 2008 12th International IEEE
Enterprise Distributed Object Computing Conference (Vol. 1, pp. 95–
104). IEEE.
70. Wan-Kadir, W. M., & Loucopoulos, P., (2004). Relating evolving
business rules to software design. Journal of Systems Architecture,
50(7), 367–382.
71. Warren, D. L., Glor, R. E., & Turelli, M., (2008). Environmental niche
equivalency versus conservatism: Quantitative approaches to niche
evolution. Evolution: International Journal of Organic Evolution,
62(11), 2868–2883.
72. Whitby, M., Pessoa-Silva, C. L., McLaws, M. L., Allegranzi, B., Sax,
H., Larson, E., & Pittet, D., (2007). Behavioral considerations for
hand hygiene practices: The basic building blocks. Journal of Hospital
Infection, 65(1), 1–8.
73. Wooldridge, J. M., (2005). Simple solutions to the initial conditions
problem in dynamic, nonlinear panel data models with unobserved
heterogeneity. Journal of Applied Econometrics, 20(1), 39–54.
74. Wu, X., Zhu, X., Wu, G. Q., & Ding, W., (2013). Data mining with big
data. IEEE Transactions on Knowledge and Data Engineering, 26(1),
97–107.
75. Yoder, A. D., & Yang, Z., (2000). Estimation of primate speciation
dates using local molecular clocks. Molecular Biology and Evolution,
17(7), 1081–1090.
76. Zinner, E., Nittler, L. R., Gallino, R., Karakas, A. I., Lugaro, M.,
Straniero, O., & Lattanzio, J. C., (2006). Silicon and carbon isotopic
ratios in AGB stars: SiC grain data, models, and the galactic evolution
of the Si isotopes. The Astrophysical Journal, 650(1), 350.
77. Zur, M. M., & Indulska, M., (2010). Modeling languages for business
processes and business rules: A representational analysis. Information
Systems, 35(4), 379–390.
CHAPTER 3
DATABASE ENVIRONMENT

CONTENTS
3.1. Introduction....................................................................................... 74
3.2. Three-Level ANSI-SPARC Architecture................................... 74
3.3. Database Languages.......................................................................... 81
3.4. Conceptual Modeling and Data Models............................................. 86
3.5. Functions of a DBMS......................................................................... 91
3.6. Components of a DBMS.................................................................... 97
References.............................................................................................. 102
74 The Creation and Management of Database Systems

3.1. INTRODUCTION
A major aim of a database system is to provide users with an abstract
view of data, hiding the details of how the data is stored and managed.
Therefore, the design of a database must begin with an abstract and
general description of the organization's information requirements that
is to be represented in the database. In this chapter, the term
"organization" is used loosely to mean the whole organization or part of
it. In the Dream Home case study, for example, we may be interested in
modeling (Philip et al., 1992):
• the 'real-world' entities Rental Property, Staff, Client, and
Private Owner;
• the attributes that describe the properties or features of each
entity (for example, Staff have a position, a name, and a salary)
(Yannakoudakis et al., 1999); and
• the relationships that exist between these entities (for example,
Staff Controls Rental Property) (Pieterse & Olivier, 2012).
In addition, since a database is a resource shared among many users,
each user may require a different view of the data held in the database.
To satisfy these needs, the architecture of most commercial database
management systems (DBMSs) is based on the ANSI-SPARC architecture. In
this chapter we discuss various architectural and functional
characteristics of DBMSs (Zdonik & Wegner, 1988).

3.2. THREE-LEVEL ANSI-SPARC ARCHITECTURE


An early proposal for a standard terminology and general architecture
for database systems was produced in 1971 by the Data Base Task Group
(DBTG), appointed by the Conference on Data Systems and Languages
(CODASYL, 1971). The DBTG recognized the need for a two-level approach,
with a system view called the schema and user views called subschemas.
The Standards Planning and Requirements Committee (SPARC) of the
American National Standards Institute (ANSI), ANSI/X3/SPARC, produced a
similar terminology and architecture in 1975 (ANSI, 1975) (Sy et al.,
2018).
ANSI-SPARC recognized the need for a three-level approach with a system
catalog. These proposals reflected those made a few years earlier by the
IBM user groups Guide and Share, and concentrated on the need for an
implementation-independent layer to isolate programs from
underlying representation issues (Share/Guide, 1970). Although the
ANSI-SPARC model did not become a standard, it still provides a basis
for understanding some of the functionality of a DBMS (Grefen et al.,
2003).
For our purposes, the fundamental point of these and later reports is
the identification of 3 levels of abstraction: 3 distinct levels at
which data items can be described. As shown in Figure 3.1, the levels
form a 3-level architecture comprising an external, a conceptual, and an
internal level. The external level is the way users perceive the data.
The internal level is the way the DBMS and the OS perceive the data, and
it is where the data is stored using data structures and file
organizations (Grefen & Angelov, 2001).

Figure 3.1. The ANSI-SPARC 3-level design.

Source: https://www.geeksforgeeks.org/the-three-level-ansi-sparc-architec-
ture/.
The conceptual level provides both the mapping and the desired
independence between the external and internal levels. The purpose of
the 3-level
design is to isolate the user's perspective from the database's physical
representation. Such separation is beneficial for several reasons (Kemp et
al., 2002):
• Each user should be able to access the same data but have a
different customized view of it. Each user should be able to change
the way he or she views the data, and this change should not affect
other users.
• The consumers of the database must not be required to deal
directly with the details of the physical database storage, like
hashing or indexing (see Appendix C). That is to say, how a user
interacts with the database ought to be unaffected by concerns
about data storage.
• The Database Administrator (DBA) must be allowed to modify
database storage structures without impacting user views.
• The internal structure of the database should be unaffected by
changes to the physical aspects of storage, such as a changeover to
a new storage device.
• The DBA should be allowed to alter the database’s conceptual
structure without impacting all clients (Grefen et al., 2002).

3.2.1. External Level


The external level consists of a number of different external views of
the database. Each user has a view of the 'real world' represented in a
form that is familiar to that user. The external view includes only
those entities, attributes, and relationships in the 'real world' that
the user is interested in. Other entities, attributes, or relationships
that are not of interest may be represented in the database, but the
user will be unaware of them (Habela et al., 2006).
In addition, different views may represent the same data differently.
For example, one user may view dates in the form (day, month, year),
while another may view dates as (year, month, day). Some views may
include derived or calculated data: data that is not actually stored in
the database but is created when needed (Aier & Winter, 2009). In the
Dream Home case study, for example, we may wish to view the age of a
member of staff.
However, it is unlikely that ages would be stored, as they would have to
be updated regularly. Instead, the member of staff's date of birth is
recorded,
and the DBMS calculates the employee's age when it is accessed
(Samos et al., 1998).
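The derived-age idea can be sketched in a few lines of Python (the staff record and dates here are invented for illustration): only the date of birth is stored, and the age is computed each time it is requested.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive age from a stored date of birth; the age itself is never stored."""
    # Subtract one year if this year's birthday has not yet occurred.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

# Hypothetical staff record: only the DOB is kept in the database.
dob = date(1990, 6, 15)
print(age_on(dob, date(2023, 6, 14)))  # 32 (day before the birthday)
print(age_on(dob, date(2023, 6, 15)))  # 33 (on the birthday)
```

Because the age is derived on demand, it is always up to date without any periodic maintenance of stored values.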

3.2.2. Conceptual Level


The conceptual level is the middle level in the 3-level architecture.
This level contains the logical structure of the entire database as seen
by the DBA. It is a complete view of the data requirements of the
organization that is independent of any storage considerations. The
conceptual level represents the following (CLASS, 2001):
• all entities, their attributes, and their relationships;
• the constraints on the data;
• semantic information about the data; and
• security and integrity information.
The conceptual level supports each external view, in that any data
available to a user must be contained in, or derivable from, the
conceptual level. However, this level must not contain any
storage-dependent details. For instance, the description of an entity
should contain only the data types of its attributes (such as integer,
real, and character) and their length (such as the maximum number of
digits or characters), but not any storage considerations, such as the
number of bytes occupied (Okayama et al., 1998).

3.2.3. Internal Level


The internal level covers the physical implementation of the database to
achieve optimal runtime performance and storage space utilization. It
covers the data structures and file organizations used to store data on
storage devices. It interfaces with the OS access methods (file
management techniques for storing and retrieving data records) to place
the data on the storage devices, retrieve the data, build the indexes,
and so on. The internal level is concerned with such things as (Biller,
1982):
• storage space allocation for data and indexes;
• record descriptions for storage (with stored sizes for data items);
• record placement; and
• data compression and data encryption techniques.
There is a physical level beneath the internal level, which may be
managed by the OS under the direction of the DBMS. At the physical
level, however, the division of functions between the DBMS and the OS is
not clear-cut and varies from system to system. Some DBMSs take
advantage of many of the OS access methods, while others use only the
most basic ones and maintain their own file organizations. The physical
level below the DBMS consists of items known only to the operating
system, such as how the sequencing is implemented and whether the fields
of internal records are stored as contiguous bytes on the disk (Li &
Wang, 2007).

3.2.4. Mappings, Schemas, and Instances


The overall description of a database is called the database schema.
There are 3 different types of schema in the database, defined according
to the levels of abstraction of the 3-level architecture shown in Figure
3.1. At the highest level are the multiple external schemas (also called
subschemas) that correspond to different views of the data (de Brock,
2018). At the conceptual level is the conceptual schema, which describes
all the entities, attributes, and relationships together with integrity
constraints. At the lowest level of abstraction is the internal schema,
which is a complete description of the internal model, containing the
definitions of stored records, the methods of representation, the data
fields, and the indexes and storage structures used. There is only one
conceptual schema and one internal schema per database (Fischer et al.,
2010).
The DBMS is responsible for mapping between these 3 types of schema. It
must also check the schemas for consistency; in other words, the DBMS
must confirm that each external schema is derivable from the conceptual
schema, and it must use the conceptual schema to map between the
external and internal schemas. The conceptual schema is related to the
internal schema through a conceptual/internal mapping (Koonce, 1995).
This enables the DBMS to find the actual record or combination of
records in physical storage that constitute a logical record in the
conceptual schema, together with any constraints to be enforced on the
operations for that logical record. It also allows any differences in
entity names, attribute names, attribute order, data types, and so on to
be resolved. Finally, each external schema is related to the conceptual
schema by an external/conceptual mapping, which enables the DBMS to map
names in the user's view onto the relevant part of the conceptual schema
(van Bommel et al., 1994).
Figure 3.2. The distinctions between the 3 levels.

Source: https://slideplayer.com/slide/14504100/.
Figure 3.2 illustrates the differences between the 3 levels. There are 2
alternative external views of employee information: one with the last name
(lName), first name (fName), age, staff number (sNo), and salary, and the
other with the last name (lName), a staff number (staffNo), and the number
of the branch where the employee works (branchNo) (Rossiter & Heather,
2005). These external views are merged into one conceptual view. The
main difference in this merging process is that the age field has been
changed to a DOB (date of birth) field. The DBMS maintains the
external/conceptual mapping; for example, it maps the sNo field of the
first external view to the staffNo field of the conceptual record. The
conceptual level is then mapped to the internal level, which contains a
physical description of the structure of the conceptual record (Li et
al., 2009).
At this level, the structure is defined in a high-level language. The
structure contains a pointer, next, which allows the list of staff
records to be physically chained together. Note that the ordering of
fields differs
from the conceptual ordering. Again, the DBMS maintains the
internal/conceptual mapping. It is important to distinguish between the
description of the database and the database itself. The description of
the database is the database schema (Flender, 2010). The schema is
specified during the database design process and is not expected to
change frequently. However, the actual
data in the database can change often, for instance, if we add information
on a new employee or property. A database instance refers to the data in the
database at a particular point in time. Consequently, many database
instances can correspond to the same database schema. The schema is
sometimes called the intension of the database, whereas an instance is
called the extension (or state) of the database (Cooper, 1995).

3.2.5. Data Independence


A major objective of the 3-level architecture is to provide data
independence, which means that upper levels are unaffected by changes to
lower levels. There are two kinds of data independence: logical and
physical (Samos et al., 1998).

Figure 3.3. The ANSI-SPARC 3-level architecture and data independence.

Source: https://www.researchgate.net/figure/Schemas-data-independence-in-
ANSI-SPARC-three-level-architecture_fig2_326468693.
Logical data independence means that it should be possible to make
changes to the conceptual schema, such as the addition or removal of
entities, attributes, or relationships, without having to change
existing external schemas or rewrite application programs. Clearly, the
users for whom the changes have been made need to be aware of them, but
no other users should have to be (Shahzad, 2007).
Physical data independence means that it should be possible to make
changes to the internal schema, such as using different file
organizations or storage structures, using different storage media, or
modifying indexes or hashing algorithms, without having to change the
conceptual or external schemas. The only effect users may notice is a
change in performance, as this is the only aspect that is altered;
indeed, a deterioration in performance is the most common reason for
internal schema changes. Figure 3.3 shows where each type of data
independence sits in relation to the 3-level architecture (Ligêza,
2006).
Although the two-stage mapping of the ANSI-SPARC architecture may be
inefficient, the greater data independence it provides is well worth the
tradeoff. The ANSI-SPARC model does, however, also permit external
schemas to be mapped directly onto the internal schema. Bypassing the
conceptual schema makes the mapping more efficient, but at the cost of
reduced data independence: whenever the internal schema changes, the
external schema, and any application programs built on it, may also have
to change, because the external schema now depends directly on the
internal schema (Neuhold & Olnhoff, 1981).

3.3. DATABASE LANGUAGES


A data sublanguage has two components: a Data Definition Language (DDL)
and a Data Manipulation Language (DML). The DDL is used to define the
database schema, whereas the DML is used to read and update the
database. These languages are called data sublanguages because they lack
constructs for general computing needs, such as conditional and
iterative statements, which are provided by high-level programming
languages. Many DBMSs allow the sublanguage to be embedded in a
high-level programming language, such as COBOL, Fortran, Pascal, Ada, C,
C++, or Java; in this case, the high-level language is also called the
host language (Vossen, 1991).
To compile the embedded file, the data sublanguage statements in the
host language program are first replaced by function calls. The
preprocessed file is then compiled, placed in an object module, linked
with a DBMS-specific library containing the replaced functions, and
executed when required. Most data sublanguages also provide interactive
or non-embedded commands that can be entered directly from a terminal
(Bidoit, 1991).

3.3.1. The Data Definition Language (DDL)


The database schema is specified by a set of definitions expressed in a
special language called a DDL. The DDL is used to create a new schema or
to modify an existing one. It cannot be used to manipulate data. The DDL
statements are compiled into a set of tables stored in special files
collectively called the system catalog. The system catalog contains
metadata, that is, data that describes objects in the database, and
makes it easier to access and manipulate those objects (Ramamohanarao &
Harland, 1994). The metadata describes data items, records, and other
objects that are of interest to users or are required by the DBMS. The
DBMS normally consults the system catalog before accessing data in the
database. The terms data dictionary and data directory are also used to
describe the system catalog, although the term 'data dictionary' usually
refers to a more general software system than a catalog for a DBMS (Chen
et al., 1990).
Theoretically, we may identify distinct DDLs for every schema in the
3-level design, including a DDL for external schemas, a DDL for conceptual
schemas, and a DDL for internal schemas. In practice, however, there is
a single DDL that allows specification of at least the conceptual and
external schemas (Bach & Werner, 2014).
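SQLite offers a compact way to observe both halves of this process: a DDL statement defines the schema, and the resulting definition is visible as metadata in the system catalog, which SQLite exposes as the sqlite_master table. The Branch table below is a Dream Home-style assumption, not taken from the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A DDL statement: it defines structure but stores no user data.
con.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, "
            "street TEXT, city TEXT)")

# The DBMS records the definition as metadata in its system catalog;
# in SQLite the catalog is the sqlite_master table.
name, sql = con.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(name)  # Branch
print(sql)   # the stored CREATE TABLE text
```

Querying the catalog rather than the data files is exactly what a DBMS does internally before accessing data: the schema is itself stored as data.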

3.3.2. The Data Manipulation Language (DML)


The following are common data manipulation operations (Liu, 1999):
• insertion of new data into the database;
• modification of data stored in the database;
• retrieval of data stored in the database; and
• deletion of data from the database.
As a result, one of the DBMS's primary functions is to support a data
manipulation language in which the user can write statements that cause
data to be manipulated. Data manipulation applies at the external,
conceptual, and internal levels. At the internal level, however, we must
define rather complex low-level procedures that allow efficient access
to data. At higher levels, by contrast, the emphasis is on ease of use,
and effort is directed at providing efficient user interaction with the
system (Kanellakis, 1995).
The part of a DML that involves data retrieval is called a query
language. A query language is a high-level special-purpose language used
to satisfy diverse requests for the retrieval of data held in the
database. The word 'query' is therefore reserved to denote a retrieval
statement expressed in a query language. Although technically incorrect,
the terms 'query language' and 'DML' are often used
interchangeably. DMLs are distinguished by their underlying retrieval
constructs (Atkinson & Buneman, 1987). DMLs fall into two categories:
procedural and non-procedural. The chief distinction between the two is
that procedural languages specify how the output of a DML statement is
to be obtained, whereas non-procedural DMLs describe only what output is
to be obtained. Typically, procedural languages treat records
individually, whereas non-procedural languages operate on sets of
records (Chandra, 1981).

3.3.2.1. Procedural DMLs


With a procedural DML, the user, or more commonly the programmer,
specifies what data is needed and how to obtain it. This means that the
user must express all the data access operations by calling the
appropriate procedures to obtain the information required. Typically, a
procedural DML retrieves a record, processes it, and then retrieves
another record to be processed in the same way, and so on. This process
of retrieval continues until all the data requested has been gathered.
Procedural DMLs are usually embedded in a high-level programming
language that provides constructs for iteration and navigational logic.
Most network and hierarchical DMLs are procedural (Chen & Zaniolo,
1999).

3.3.2.2. Non-Procedural DMLs


With non-procedural DMLs, the required data can be specified in a single
retrieval or update statement. The user specifies what data is required
without specifying how it is to be obtained. The DBMS translates a DML
statement into one or more procedures that manipulate the required sets
of records. This frees the user from having to know how data structures
are implemented internally and what algorithms are required to retrieve
and possibly transform the data, giving users a considerable degree of
data independence. Non-procedural languages are also called declarative
languages. Relational DBMSs usually include some form of non-procedural
data manipulation language, typically Structured Query Language (SQL) or
Query-By-Example (QBE). Non-procedural DMLs are normally easier to learn
and use than procedural DMLs, since less work is done by the user and
more by the DBMS (Date, 1984).
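The contrast can be made concrete with Python's sqlite3 module (the table and figures below are invented): the procedural version navigates record by record and spells out how to filter, while the non-procedural version states in one SQL statement what is wanted and leaves the how to the DBMS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Staff (staffNo TEXT, salary REAL)")
con.executemany("INSERT INTO Staff VALUES (?, ?)",
                [("SL21", 30000), ("SG37", 12000), ("SG14", 18000)])

# Procedural style: fetch each record and apply the test ourselves.
well_paid = []
for staff_no, salary in con.execute("SELECT staffNo, salary FROM Staff"):
    if salary > 15000:
        well_paid.append(staff_no)

# Non-procedural style: state what is wanted; the DBMS works out how.
declarative = [r[0] for r in con.execute(
    "SELECT staffNo FROM Staff WHERE salary > 15000")]

print(sorted(well_paid))    # ['SG14', 'SL21']
print(sorted(declarative))  # ['SG14', 'SL21']
```

Both produce the same result, but only the declarative form lets the DBMS choose the access path (for example, using an index if one exists).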

3.3.3. Fourth-Generation Languages (4GLs)


There is no consensus about what constitutes a fourth-generation
language (4GL); in essence, it is a shorthand programming language. An
operation that requires hundreds of lines in a third-generation language
(3GL), such as COBOL, generally requires significantly fewer lines in a
4GL (Stemple et al., 1992).
In contrast to a 3GL, which is procedural, a 4GL is non-procedural: the
user defines what is to be done, not how. A 4GL is expected to rely
largely on much higher-level components known as fourth-generation
tools. The user does not define the steps that a program needs to
perform a task, but instead defines parameters for the tools that use
them to generate an application program. It is claimed that 4GLs can
improve productivity by a factor of ten, at the cost of limiting the
types of problem that can be tackled. Fourth-generation languages
include (Ceri et al., 1991):
• presentation languages, such as query languages and report
generators;
• specialty languages, such as spreadsheet and database languages;
• application generators that define, insert, update, and retrieve
data from the database to build applications; and
• very high-level languages that are used to generate application
code.
SQL and QBE, mentioned previously, are examples of 4GLs. We now look
briefly at some of the other types of 4GL.

3.3.3.1. Forms Generators


A forms generator is an interactive facility for rapidly creating data
input and display layouts for screen forms. The forms generator allows
the user to define what the screen is to look like, what information is
to be displayed, and where on the screen it is to be displayed. It may
also allow the definition of colors for screen elements and other
characteristics, such as bold, underline, blinking, and reverse video.
The better forms generators allow the creation of derived attributes,
such as those calculated using arithmetic operators or aggregates, and
the specification of validation checks for data input (Tresch & Scholl,
1994).

3.3.3.2. Report Generators


A report generator is a facility for creating reports from data stored
in the database. It is similar to a query language in that it allows the
user to ask questions of the database and retrieve information from it
for a report. However, with a report generator we have much greater
control over what the output looks like. We can either let the report
generator determine how the output is to look or create our own
customized output reports using special report-generator command
instructions (Piatetsky-Shapiro & Jakobson, 1987).
There are two main types of report generator: language-oriented and
visually oriented. In the first case, we enter a command in a
sublanguage to define what data is to be included in the report and how
the report is to be laid out. In the second case, we define the same
information using a facility similar to a forms generator (Romei &
Turini, 2011).

3.3.3.3. Graphics Generators


A graphics generator is a facility for retrieving data from the database
and displaying it as a graph showing trends and relationships in the
data. Typically, it allows the user to create bar charts, pie charts,
line charts, scatter charts, and other kinds of graph (Solovyev &
Polyakov, 2013).

3.3.3.4. Application Generators


An application generator is a facility for producing a program that
interfaces with the database. Using an application generator can reduce
the time it takes to design an entire software application. Application
generators typically consist of prewritten modules that comprise the
fundamental functions most programs need. These modules, usually coded
in a high-level language, constitute a 'library' of functions to choose
from. The user specifies what the program is supposed to do; the
application generator determines how to perform the tasks (Decker,
1998).
3.4. CONCEPTUAL MODELING AND DATA MODELS


As we mentioned previously, a schema is written using a data definition
language. In practice, it is written in the DDL of a particular DBMS.
Unfortunately, this type of language is too low-level to describe an
organization's data requirements in a way that is readily understandable
by a variety of users. What we require is a higher-level description of
the schema: that is, a data model. A model is a representation of
real-world objects and events, and their associations (Mathiske et al.,
1995; Overmyer et al., 2001). It is an abstraction that concentrates on
an organization's essential, inherent aspects while ignoring its
accidental properties. A data model represents the organization itself.
It should provide the basic concepts and notations that allow database
designers and end-users to communicate their understanding of the
organization's data unambiguously and accurately. A data model can be
thought of as comprising 3 components (Wang & Zaniolo, 1999):
• a structural part, consisting of a set of rules according to which
databases can be constructed;
• a manipulative part, defining the types of operation that are
allowed on the data (for example, operations for retrieving or
updating data in the database and for changing the structure of the
database); and
• a set of integrity constraints, which ensure that the data is
accurate.
A data model’s goal is to describe data and make it intelligible to others.
If it may accomplish this, it can simply be utilized to create a database. We
may identify 3 related data models that mirror the ANSI-SPARC architecture
outlined in Section 2.1 (Andries & Engels, 1994):
• an external data model, to represent each user's view of the
organization, sometimes called the Universe of Discourse (UoD);
• a conceptual data model, to represent the logical view that is
DBMS-independent; and
• an internal data model, to represent the conceptual schema in a way
that can be understood by the DBMS.
Many data models have been proposed in the literature. They fall into 3
broad categories: object-based, record-based, and physical data models.
The first two are used to describe data at the external and conceptual
levels, while the third is used to describe data at the internal level
(Trinder, 1990).

3.4.1. Object-Based Data Models


Object-based data models use the concepts of entities, attributes, and
relationships. An entity is a distinct object (a person, place, thing,
concept, or event) that is to be represented in the database. An
attribute is a property that describes some aspect of the object we wish
to record, and a relationship is an association between entities. Some
of the more common types of object-based data model are (Sutton & Small,
1995):
• Semantic;
• Entity–Relationship;
• Object-Oriented; and
• Functional.
The Entity-Relationship (ER) model has emerged as one of the main
techniques for database design and is the basis for the database design
methodology used in this chapter. The object-oriented data model (OODM)
extends the definition of an entity to include not only the attributes
that describe the state of the object but also the actions associated
with it, that is, its behavior. The object is said to encapsulate both
state and behavior (Chorafas, 1986).

3.4.2. Record-Based Data Models


In a record-based model, the database is composed of a number of fixed-format records, possibly of differing types. Each record type defines a fixed number of fields, each typically of a fixed length. The three principal types of record-based logical data models are the relational data model, the network data model, and the hierarchical data model. The hierarchical and network data models were developed almost a decade before the relational data model, so their links to traditional file-processing concepts are stronger (Bédard & Larrivée, 2008).

3.4.2.1. Relational Data Model


The relational data model is based on the mathematical concept of a relation. In the relational model, data and relationships are represented as tables, each of which has a number of columns with unique names.


Figure 3.4 shows a sample relational schema for part of the Dream Home case study, covering branch and staff details (Wang & Brooks, 2007). For example, the first table shows that employee John White is a manager with a salary of £30,000 who works at branch (branchNo) B005, which is located at 22 Deer Rd in London. It is important to note that there is a relationship between Staff and Branch: a branch office employs staff. However, there is no explicit link between the two tables; the relationship can be established only by knowing that the attribute branchNo in the Staff relation corresponds to branchNo in the Branch relation (May et al., 1997).
Note that the relational data model requires only that the database be perceived by the user as tables (Robinson, 2011).

Figure 3.4. This is an example of a relational schema.

Source: https://www.researchgate.net/figure/Source-relational-schema-LI-
BRARY-REL_fig2_264458232.
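The Staff–Branch link described above can be sketched concretely. The SQLite snippet below is an illustrative reconstruction of part of Figure 3.4, not the book's exact schema: the column names (staffNo, branchNo, and so on) and the staff number 'SL21' are assumptions, while the John White values come from the text.

```python
import sqlite3

# In the relational model, both data and relationships are held in tables;
# the only link between Staff and Branch is the shared branchNo value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branchNo TEXT PRIMARY KEY,
        street   TEXT,
        city     TEXT
    );
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY,
        name     TEXT,
        position TEXT,
        salary   INTEGER,
        branchNo TEXT REFERENCES Branch(branchNo)
    );
    INSERT INTO Branch VALUES ('B005', '22 Deer Rd', 'London');
    INSERT INTO Staff  VALUES ('SL21', 'John White', 'Manager', 30000, 'B005');
""")

# The relationship is recovered declaratively, by matching branchNo values.
row = conn.execute("""
    SELECT s.name, s.salary, b.street, b.city
    FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo
""").fetchone()
print(row)   # → ('John White', 30000, '22 Deer Rd', 'London')
```

Nothing in the schema records the relationship as a pointer; the join reconstructs it from values alone, which is exactly the point made in the text.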

Figure 3.5. This is an example of a network schema.

Source: https://creately.com/blog/examples/network-diagram-templates-cre-
ately/.
However, this perception applies only to the database's logical structure, that is, to the external and conceptual levels of the ANSI-SPARC architecture. It does not apply to the database's physical structure, which can be implemented using a variety of storage structures (Roussopoulos & Karagiannis, 2009).

3.4.2.2. Network Data Model


In the network model, data is represented as collections of records, and relationships are represented by sets. In contrast to the relational model, the sets explicitly model the relationships between entities, and they are implemented as pointers (Montevechi et al., 2010). The records are organized as generalized graph structures, with records appearing as nodes (also called segments) and sets as edges. Figure 3.5 shows an instance of a network schema for the same data set shown in Figure 3.4. The most popular network DBMS has been IDMS/R from Computer Associates (Badia, 2002).
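The contrast with the relational model can be sketched in plain Python: in a network-model set, the owner record holds explicit pointers to its member records, so the relationship is fixed when the records are stored rather than derived at query time. The record contents below are invented for illustration.

```python
# A minimal sketch of a network-model "set": the owner record (a branch)
# holds explicit pointers to its member records (staff), and navigation
# follows those pointers rather than matching column values.
class Record:
    def __init__(self, **fields):
        self.fields = fields
        self.members = []          # the "set": pointers to member records

branch = Record(branchNo="B005", street="22 Deer Rd")
white = Record(staffNo="SL21", name="John White")
beech = Record(staffNo="SG37", name="Ann Beech")

# Inserting members wires up the pointers, fixing the relationship
# at storage time instead of reconstructing it at query time.
branch.members.extend([white, beech])

names = [m.fields["name"] for m in branch.members]
print(names)   # → ['John White', 'Ann Beech']
```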

3.4.2.3. Hierarchical Data Model


The hierarchical model is a restricted type of network model. Again, data is represented as collections of records and relationships as sets. However, the hierarchical model allows a node to have only one parent. A hierarchical model can therefore be represented as a tree graph, with records appearing as nodes (also called segments) and sets as edges. Figure 3.6 shows an instance of a hierarchical schema for the data in Figure 3.4. The main hierarchical DBMS is IBM's IMS, although it also provides non-hierarchical features (Calvanese et al., 2009).
Record-based (logical) data models are used to specify the overall structure of the database and a higher-level description of the implementation. Their main drawback is that they do not provide adequate facilities for explicitly specifying constraints on the data. Object-based data models, by contrast, lack the means to specify logical structure but provide more semantic substance by allowing the user to specify constraints on the data. Most modern commercial systems are based on the relational paradigm, whereas early database systems were based on either the network or the hierarchical data model (Ram & Liu, 2006).

Figure 3.6. This is an example of a hierarchical schema.

Source: https://www.geeksforgeeks.org/difference-between-hierarchical-and-
relational-data-model/.
The network and hierarchical models require the user to have knowledge of the physical database being accessed, whereas the relational model provides a substantial amount of data independence. Hence, whereas relational systems adopt a declarative approach to database processing (that is, they specify what data is to be retrieved), network and hierarchical systems adopt a navigational approach (that is, they specify how the data is to be retrieved) (Borgida, 1986).
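The declarative/navigational distinction can be made concrete with a small sketch; the record layout and values here are assumptions for illustration, not the book's schemas.

```python
import sqlite3

# Navigational style (hierarchical/network): the program walks the record
# structure itself, spelling out HOW the data is to be obtained.
tree = {"B005": {"street": "22 Deer Rd",
                 "staff": [{"name": "John White", "salary": 30000}]}}

def navigational_salaries(branch_no):
    node = tree[branch_no]                        # step 1: locate the parent
    return [s["salary"] for s in node["staff"]]   # step 2: walk its children

# Declarative style (relational): the query states WHAT is wanted,
# and the DBMS decides how to obtain it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (name TEXT, salary INTEGER, branchNo TEXT)")
conn.execute("INSERT INTO Staff VALUES ('John White', 30000, 'B005')")
declarative = [r[0] for r in conn.execute(
    "SELECT salary FROM Staff WHERE branchNo = 'B005'")]

print(navigational_salaries("B005"), declarative)   # → [30000] [30000]
```

Both return the same data; what differs is who is responsible for the access path — the program or the DBMS.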

3.4.3. Physical Data Models


Physical data models describe how data is stored in the computer, representing information such as record structures, record orderings, and access paths. There are far fewer physical data models than logical data models; the most common are the unifying model and the frame memory (Boehnlein & Ulbrich-vom Ende, 1999).

3.4.4. Conceptual Modeling


Examination of the three-level architecture reveals that the conceptual schema is the 'heart' of the database. It supports all the external views and is, in turn, supported by the internal schema. However, the internal schema is merely the physical implementation of the conceptual schema. The conceptual schema should be a complete and accurate representation of the enterprise's data requirements. If it is not, some business information may be missing or incorrectly represented, and we will have difficulty fully implementing one or more of the external views (Jarke & Quix, 2017).
Conceptual modeling is the process of developing a model of the information used in an enterprise that is independent of implementation details, such as the target DBMS, application programs, programming languages, or any other physical considerations. This process is also called conceptual database design (Ceri et al., 2002), and the resulting model is termed a conceptual data model. In the academic literature, conceptual models are sometimes called logical models. In this work, however, we distinguish between conceptual and logical data models: the conceptual model is independent of all implementation details, whereas the logical model assumes knowledge of the underlying data model of the target DBMS (Delcambre et al., 2018).
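One way to read "independent of implementation details" is that a conceptual model can be written down without committing to any DBMS at all — for instance, as plain data classes that name only entities, attributes, and relationships. The entities below are illustrative, echoing the Dream Home excerpt.

```python
from dataclasses import dataclass, field
from typing import List

# A conceptual model names entities, attributes, and relationships only;
# nothing here commits to a target DBMS, storage layout, or query language.
@dataclass
class Staff:
    staffNo: str
    name: str
    salary: int

@dataclass
class Branch:
    branchNo: str
    city: str
    employs: List[Staff] = field(default_factory=list)   # the relationship

b = Branch("B005", "London", employs=[Staff("SL21", "John White", 30000)])
print(b.employs[0].name)   # → John White
```

Mapping such a model onto a particular DBMS's data model is exactly the step at which the logical model, as distinguished above, comes in.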

3.5. FUNCTIONS OF A DBMS


In this section, we examine the types of function and service that we would expect a DBMS to provide. Codd (1982) lists eight services that should be provided by any full-scale DBMS, and we have added two more that can reasonably be expected (Jaedicke & Mitschang, 1998).
• Data Storage, Retrieval, and Update: A DBMS must furnish users with the ability to store, retrieve, and update data in the database. This is the fundamental function of a DBMS, and in providing it the DBMS should hide the internal physical implementation details (such as storage structures and file organization) from the user (He et al., 2005).
• A User-Accessible Catalog: A DBMS must furnish a catalog in which descriptions of data items are stored and which is accessible to users. A key feature of the ANSI-SPARC architecture is the recognition of an integrated system catalog that serves as a repository for information about schemas, users, programs, and other such things. The catalog is expected to be accessible to users as well as to the DBMS (Stonebraker et al., 1993). A system catalog, often referred to as a data dictionary, is the repository of information describing the data in the database; in other words, it holds the "data about the data," or metadata. The amount of information held and the way that information is used vary from one DBMS to another. In most cases, the system catalog contains the following information (Dogac et al., 1994):
• Types, sizes, and names of data elements;
• Names of associations;
• Integrity restrictions on the data;
• Names of authenticated persons who have access to the
information;
• Data elements that every client may access and the sorts of access
permitted, such as delete, update, insert or read access;
• Internal, conceptual, and external schemas as well as schema
mappings;
• Usage statistics, such as the frequency of database transactions and counts of accesses to objects in the database.
The DBMS catalog is one of the fundamental components of the system. Many of the software components described in the next section rely on the system catalog for their information. Some of the benefits of a system catalog are (Arens et al., 2005):
• Information about data can be collected and stored centrally. This helps to maintain control over the data as a resource;
• The meaning of data can be defined, which helps other users understand the purpose of the data;
• Communication is made easier, because precise meanings are stored. The system catalog may also identify the user or users who own or access the data;
• It is easier to identify redundancies and discrepancies when data
is consolidated (Zalipynis, 2020);
• Database modifications may be logged;
• Because the system catalog records each item of data, all its relationships, and all its users, the impact of a change can be determined before it is implemented;
• Security may be implemented;
• Integrity is attainable; and
• Audit data is provided.
Some authors distinguish between a system catalog and a data directory, where a data directory holds information about where and how the data is stored. The ISO has adopted the Information Resource Dictionary System (IRDS) as a standard for data dictionaries (ISO, 1990, 1993). IRDS is a software tool that can be used to control and document an organization's information resources. It defines the tables that comprise the data dictionary and the operations that can be used to access them. In this book, we use the term "system catalog" to refer to all repository information. Other types of statistical information stored in the system catalog to assist query optimization are discussed later (Ng & Muntz, 1999).
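Relational DBMSs typically expose the catalog as ordinary tables that clients can query. SQLite, for example, keeps its catalog in the sqlite_master table and reports column metadata through PRAGMA table_info (both real SQLite features); the sample table below is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary INTEGER)")

# The system catalog is itself queryable: data about the data.
names = [r[0] for r in
         conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(names)   # → ['Staff']

# Column-level metadata: the names and types of each data element.
cols = [(c[1], c[2]) for c in conn.execute("PRAGMA table_info(Staff)")]
print(cols)    # → [('staffNo', 'TEXT'), ('salary', 'INTEGER')]
```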
• Transaction Support: A DBMS must furnish a mechanism that ensures either that all the updates corresponding to a given transaction are made or that none of them is made. A transaction is a series of actions, carried out by a single user or application program, that accesses or changes the contents of the database. In the Dream Home case study, examples of simple transactions would be adding a new member of staff to the database, updating the salary of a member of staff, or deleting a property from the register (Carey et al., 1988).

Figure 3.7. The lost update.

Source: https://www.educba.com/hierarchical-database-model/.
An example of a more complicated transaction would be deleting a member of staff from the database and reassigning the responsibilities that he or she held to another member of staff. In this case, several changes have to be made to the database. If the transaction fails during execution, perhaps because of a computer crash, the database would be left in an inconsistent state, with some changes made and others not. Consequently, the changes that were made would have to be undone to return the database to a consistent state (Nidzwetzki & Güting, 2015).
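The all-or-nothing behavior described above can be sketched with explicit SQLite transactions. The schema and the property and staff numbers below are assumptions in the spirit of the Dream Home example.

```python
import sqlite3

# Autocommit mode so that BEGIN/COMMIT/ROLLBACK are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Property (propNo TEXT PRIMARY KEY, staffNo TEXT)")
conn.execute("INSERT INTO Property VALUES ('PG4', 'SG14')")

# Reassign property PG4 to another member of staff -- as one transaction.
try:
    conn.execute("BEGIN")
    conn.execute("UPDATE Property SET staffNo = 'SA9' WHERE propNo = 'PG4'")
    raise RuntimeError("simulated crash before the remaining updates")
    conn.execute("COMMIT")          # never reached in this run
except RuntimeError:
    conn.execute("ROLLBACK")        # undo the partial change

owner = conn.execute(
    "SELECT staffNo FROM Property WHERE propNo = 'PG4'").fetchone()[0]
print(owner)   # → SG14  (the half-finished transaction left no trace)
```

Because the failure occurred before COMMIT, the rollback restores the database to its state before the transaction began — no update is left half-done.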
• Concurrency Control Services: A DBMS must furnish a mechanism to ensure that the database is updated correctly when multiple users are updating it concurrently. One of the main objectives of using a DBMS is to enable many users to access shared data concurrently (Chakravarthy, 1989). Concurrent access is relatively straightforward when all users are only reading data, since there is no way for them to interfere with one another. However, when two or more users are accessing the database simultaneously and at least one of them is updating data, there may be interference that results in inconsistencies. As an illustration, consider two transactions T1 and T2 executing concurrently, as shown in Figure 3.7 (Khuan et al., 2008).
T1 is withdrawing £10 from an account (with balance balx, initially £100) and T2 is depositing £100 into the same account. If these transactions were executed serially, one after the other with no interleaving, the final balance would be £190 regardless of which was performed first. However, in this example transactions T1 and T2 start at nearly the same time, and both read a balance of £100. T2 then increases balx by £100 to £200 and stores the update in the database. Meanwhile, transaction T1 decrements its copy of balx by £10 to £90 and stores this value in the database, "losing" the £100 deposit. The DBMS must ensure that interference like this cannot occur when multiple users access the database simultaneously (Linnemann et al., 1988).
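The schedule in Figure 3.7 can be replayed step by step. Each transaction works on its own copy of balx, exactly as in the text:

```python
# Replaying Figure 3.7's schedule: balx starts at 100; T1 withdraws 10,
# T2 deposits 100. Each transaction reads balx into its own copy.
balx = 100

t1_copy = balx            # T1 reads balx        (100)
t2_copy = balx            # T2 reads balx        (100)
t2_copy += 100            # T2 adds the deposit  (200)
balx = t2_copy            # T2 writes balx back  (200)
t1_copy -= 10             # T1 subtracts         (90)
balx = t1_copy            # T1 writes balx back  (90) -- T2's update is lost

print(balx)               # → 90, not the correct serial result of 190
```

A concurrency control mechanism (for example, locking balx until T1 completes) would force T2 to read the value only after T1's write, eliminating the lost update.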
• Recovery Services: A DBMS must furnish a mechanism for recovering the database in the event that it is damaged in any way. When discussing transaction support, we emphasized that if the transaction fails, the database has to be returned to a consistent state. This may be the result of a media failure, a system crash, a hardware or software error causing the DBMS to stop, or the user may have detected an error and aborted the transaction before it completed. In all these cases, the DBMS must provide a mechanism to restore the database to a consistent state (Zlatanova, 2006).
• Authorization Services: A DBMS must furnish a mechanism to ensure that only authorized users can access the database. It is not hard to imagine instances where we would want to prevent some of the data stored in the database from being seen by all users. For example, we may want only branch managers to see salary-related information for staff and to prevent all other users from viewing this data. Additionally, we may want to protect the database from unauthorized access. The term security refers to the protection of the database against unauthorized access, whether intentional or accidental. We anticipate that the DBMS will provide mechanisms to ensure the security of the data (Isaac & Harikumar, 2016).
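A minimal sketch of such an authorization check follows; the roles, objects, and permission table are invented for illustration — a real DBMS would hold rules like these in its system catalog.

```python
# Hypothetical authorization rules: which roles may do what to which object.
# The branch-manager/salary example from the text is encoded directly.
permissions = {
    ("manager", "Staff.salary"): {"read", "update"},
    ("assistant", "Staff.salary"): set(),          # no access at all
}

def authorized(role, obj, action):
    """Return True only if the catalog grants this role the action."""
    return action in permissions.get((role, obj), set())

print(authorized("manager", "Staff.salary", "read"))     # → True
print(authorized("assistant", "Staff.salary", "read"))   # → False
```

The DBMS performs exactly this kind of lookup (its "authorization control" module, described in Section 3.6) before any requested operation is allowed to proceed.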
• Support for Data Communication: A DBMS must be capable of integrating with communication software. Most database users access the database from workstations. Sometimes these workstations are connected directly to the computer hosting the DBMS; in other cases, remote workstations communicate with that computer over a network. In either case, the DBMS receives requests as communications messages and responds in a similar way (Härder et al., 1987). All such transmissions are handled by a Data Communication Manager (DCM). Although the DCM is not part of the DBMS, the DBMS must be capable of being integrated with a variety of DCMs if the system is to be commercially viable. Even DBMSs for personal computers should be capable of running on a local area network so that one centralized database can be established for users to share, rather than a series of disparate databases, one for each user. This does not imply that the database has to be distributed across the network; rather, users should be able to access a centralized database from remote locations (Burns et al., 1986).
• Integrity Services: A DBMS must furnish a means to ensure that both the data in the database and changes to the data follow certain rules. Database integrity refers to the correctness and consistency of stored data and is another form of database protection. Although integrity is related to security, it has wider implications: integrity is concerned with the quality of the data itself. Database integrity is usually expressed in terms of constraints (Lohman et al., 1991), which are consistency rules that the database is not permitted to violate. For example, we may want to specify a constraint that no member of staff may manage more than a given number of properties at any one time. Here we would want the DBMS to check that this limit is not exceeded when a property is assigned to a member of staff, and to prevent the assignment from taking place if it is.
In addition to these eight services, a DBMS might also reasonably be expected to provide the following two services (Jovanov et al., 1992).
• Services to Promote Data Independence: A DBMS must include facilities to support the independence of programs from the actual structure of the database. Data independence is normally achieved through a view or subschema mechanism. Physical data independence is easier to achieve: there are usually several types of change that can be made to the physical characteristics of the database without affecting the views (Joo et al., 2006). However, complete logical data independence is harder to achieve. The addition of a new entity, attribute, or relationship can usually be accommodated, but not their removal. In some systems, any type of change to the logical structure of existing components is prohibited (Ge & Zdonik, 2007).
• Utility Services: A DBMS should provide a set of utility services. Utility programs help the DBA administer the database effectively. Some utilities work at the external level and consequently can be produced by the DBA. Other utilities work at the internal level and can be provided only by the DBMS vendor. Examples of the latter kind of utility include:
• import and export facilities, to load the database from flat files and to unload the database to flat files;
• monitoring facilities, to monitor database usage and operation;
• statistical analysis programs, to examine performance or usage statistics;
• index reorganization facilities, to reorganize indexes and their overflows; and
• garbage collection and reallocation facilities, to physically remove deleted records from the storage devices, consolidate the space released, and reallocate it where it is needed (Gravano et al., 2001).
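Returning to the integrity service described earlier: the property-limit rule is awkward to state as a simple column constraint, but the mechanism itself — the DBMS rejecting a change that violates a declared rule — can be sketched with a SQLite CHECK constraint. The salary rule below is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity rule is declared once, with the schema; every later
# change is checked against it by the DBMS itself.
conn.execute("""
    CREATE TABLE Staff (
        staffNo TEXT PRIMARY KEY,
        salary  INTEGER CHECK (salary > 0)
    )
""")
conn.execute("INSERT INTO Staff VALUES ('SL21', 30000)")   # accepted

try:
    conn.execute("INSERT INTO Staff VALUES ('SG37', -5)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True            # the DBMS refused the inconsistent change

print(rejected)   # → True
```

The application never has to re-check the rule itself; the constraint lives with the data, which is exactly the point of an integrity service.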

3.6. COMPONENTS OF A DBMS


DBMSs are highly complex and sophisticated pieces of software that aim to provide the services discussed above. It is not possible to generalize the component structure of a DBMS, as it varies greatly from system to system. However, when trying to understand database systems, it is useful to try to view the components and their relationships. In this section, we present a possible architecture for a DBMS (Chakravarthy, 1989).
A DBMS is partitioned into several software components, each of which performs a specialized function. As mentioned earlier, some of the DBMS's functions are supported by the underlying operating system. However, the operating system provides only basic services, and the DBMS must be built on top of it. Thus, the design of a DBMS must take into account the interface between the DBMS and the operating system. Figure 3.8 shows the major software components in a DBMS environment. The diagram shows how the DBMS interfaces with other software components, such as user queries and access methods (file management techniques for storing and retrieving data records). We will discuss file organizations and access methods later. A more comprehensive treatment is given by Fry and Teorey (1982), Weiderhold (1983), Ullman (1988), and Barnes and Smith (1987) (Snodgrass, 1992).

Figure 3.8. Main elements of a DBMS.

Source: https://www.techtarget.com/searchdatamanagement/definition/data-
base-management-system.

Figure 3.9. Elements of a database manager.

Source: https://corporatefinanceinstitute.com/resources/knowledge/data-anal-
ysis/database/.
The following components are shown in Figure 3.8 (Hellerstein et al., 2007):
• Query processor: This is a major DBMS component that transforms queries into a series of low-level instructions directed to the database manager (DM).
• Database Manager (DM): The DM interfaces with application programs and queries submitted by users. The DM accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the request. The DM then places a call to the file manager to perform the request. The components of the DM are shown in Figure 3.9 (Härder et al., 1987).
• File manager: The file manager manages the allocation of storage space on disk and manipulates the underlying storage files. It establishes and maintains the list of structures and indexes defined in the internal schema. If hashed files are used, it calls on the hashing functions to generate record addresses. However, the file manager does not directly manage the physical input and output of data. Rather, it passes the requests on to the appropriate access methods, which either read data from or write data into the system buffer (or cache) (Arulraj & Pavlo, 2017).
• DML preprocessor: This module converts DML statements embedded in an application program into standard function calls in the host language. The DML preprocessor must interact with the query processor to generate the appropriate code.
• DDL compiler: The DDL compiler converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog, while control information is stored in the headers of the data files (Zhang et al., 2018).
• Catalog manager: The catalog manager manages access to and maintains the system catalog. The system catalog is accessed by most DBMS components.
The major software components of the DM are as follows:
• Authorization control: This module confirms whether the user has the necessary authority to carry out the required operation (Delis & Roussopoulos, 1993).
• Command processor: Once the system has confirmed that the user has authority to carry out the operation, control is passed to the command processor.
• Integrity checker: For an operation that changes the database, the integrity checker checks whether the requested operation satisfies all necessary integrity constraints (such as key constraints).
• Query optimizer: This module determines an optimal strategy for query execution (Batory & Thomas, 1997).
• Transaction manager: This module performs the required processing of operations that it receives from transactions.
• Scheduler: This module is responsible for ensuring that concurrent operations on the database proceed without conflicting with one another. It controls the relative order in which transaction operations are executed (Aref & Samet, 1991).
• Recovery manager: This module ensures that the database remains in a consistent state in the presence of failures. It is responsible for transaction commit and abort.
• Buffer manager: This module is responsible for the transfer of data between main memory and secondary storage devices, such as disks and tapes. The recovery manager and the buffer manager are sometimes referred to collectively as the data manager. The buffer manager is also known as the cache manager (Rochwerger et al., 2009).
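The query optimizer's role — choosing an execution strategy — can be observed directly in SQLite, which reports its chosen plan through EXPLAIN QUERY PLAN (a real SQLite feature; the table and index below are illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT)")

# Without an index on branchNo, the optimizer's only strategy is a table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Staff WHERE branchNo = 'B005'"
).fetchall()
scan_detail = plan[0][-1]          # e.g. 'SCAN Staff'

# Once the DBA creates an index, the optimizer chooses a different strategy.
conn.execute("CREATE INDEX idx_branch ON Staff(branchNo)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Staff WHERE branchNo = 'B005'"
).fetchall()
index_detail = plan[0][-1]         # e.g. 'SEARCH Staff USING INDEX idx_branch'

print(scan_detail)
print(index_detail)
```

The exact plan text varies between SQLite versions, but the shift from a scan to an index search is the optimizer at work: the same declarative query, a different implementation strategy.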

REFERENCES
1. Aier, S., & Winter, R., (2009). Virtual decoupling for IT/business
alignment–conceptual foundations, architecture design and
implementation example. Business & Information Systems Engineering,
1(2), 150–163.
2. Andries, M., & Engels, G., (1994). Syntax and semantics of hybrid
database languages. In: Graph Transformations in Computer Science
(Vol. 1, pp. 19–36). Springer, Berlin, Heidelberg.
3. Aref, W. G., & Samet, H., (1991). Extending a DBMS with spatial
operations. In: Symposium on Spatial Databases (Vol. 1, pp. 297–318).
Springer, Berlin, Heidelberg.
4. Arens, C., Stoter, J., & Van, O. P., (2005). Modeling 3D spatial objects
in a geo-DBMS using a 3D primitive. Computers & Geosciences,
31(2), 165–177.
5. Arulraj, J., & Pavlo, A., (2017). How to build a non-volatile memory
database management system. In: Proceedings of the 2017 ACM
International Conference on Management of Data (Vol. 1, pp. 1753–
1758).
6. Atkinson, M. P., & Buneman, O. P., (1987). Types and persistence in
database programming languages. ACM Computing Surveys (CSUR),
19(2), 105–170.
7. Bach, M., & Werner, A., (2014). Standardization of NoSQL database
languages. In: International Conference: Beyond Databases,
Architectures and Structures (Vol. 1, pp. 50–60). Springer, Cham.
8. Badia, A., (2002). Conceptual modeling for semistructured data. In:
Web Information Systems Engineering Workshops, International
Conference (Vol. 1, pp. 160–170). IEEE Computer Society.
9. Batory, D., & Thomas, J., (1997). P2: A lightweight DBMS generator.
Journal of Intelligent Information Systems, 9(2), 107–123.
10. Bédard, Y., & Larrivée, S., (2008). Spatial database modeling with
pictrogrammic languages. Encyclopedia of GIS, 1, 716–725.
11. Bidoit, N., (1991). Negation in rule-based database languages: A
survey. Theoretical Computer Science, 78(1), 3–83.
12. Biller, H., (1982). On the architecture of a system integrating data base
management and information retrieval. In: International Conference
on Research and Development in Information Retrieval (Vol. 1, pp.
80–97). Springer, Berlin, Heidelberg.
13. Boehnlein, M., & Ulbrich-Vom, E. A., (1999). Deriving initial
data warehouse structures from the conceptual data models of the
underlying operational information systems. In: Proceedings of the 2nd
ACM International Workshop on Data Warehousing and OLAP (Vol.
1, pp. 15–21).
14. Borgida, A., (1986). Conceptual modeling of information systems. On
Knowledge Base Management Systems, 1, 461–469.
15. Burns, T., Fong, E., Jefferson, D., Knox, R., Mark, L., Reedy, C., &
Truszkowski, W., (1986). Reference model for DBMS standardization.
SIGMOD Record, 15(1), 19–58.
16. Calvanese, D., Giacomo, G. D., Lembo, D., Lenzerini, M., & Rosati,
R., (2009). Conceptual modeling for data integration. In: Conceptual
Modeling: Foundations and Applications (Vol. 1, pp. 173–197).
Springer, Berlin, Heidelberg.
17. Carey, M. J., DeWitt, D. J., Graefe, G., Haight, D. M., Richardson, J.
E., Schuh, D. T., & Vandenberg, S. L., (1988). The EXODUS Extensible
DBMS Project: An Overview, 1, 3–9.
18. Ceri, S., Fraternali, P., & Matera, M., (2002). Conceptual modeling
of data-intensive web applications. IEEE Internet Computing, 6(4),
20–30.
19. Ceri, S., Tanca, L., & Zicari, R., (1991). Supporting interoperability
between new database languages. In: [1991] Proceedings, Advanced
Computer Technology, Reliable Systems and Applications (Vol. 1, pp.
273–281). IEEE.
20. Chakravarthy, S., (1989). Rule management and evaluation: An active
DBMS perspective. ACM SIGMOD Record, 18(3), 20–28.
21. Chandra, A. K., (1981). Programming primitives for database languages.
In: Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages (Vol. 1, pp. 50–62).
22. Chen, C. X., & Zaniolo, C., (1999). Universal temporal extensions for
database languages. In: Proceedings 15th International Conference on
Data Engineering (Cat. No. 99CB36337) (Vol. 1, pp. 428–437). IEEE.
23. Chen, W., Kifer, M., & Warren, D. S., (1990). Hilog as a platform for
database languages. In: Proceedings of the 2nd. International Workshop
on Database Programming Languages (Vol. 1, pp. 315–329).
24. Chorafas, D. N., (1986). Fourth and Fifth Generation Programming
Languages: Integrated Software, Database Languages, and Expert
Systems (Vol. 1, pp. 5–9). McGraw-Hill, Inc.
25. CLASS, A., (2001). To illustrate how this language can be used, Figure
6 shows the definition of the example. In: OOIS 2000: 6th International
Conference on Object Oriented Information Systems (Vol. 1, p. 11).
London, UK: Proceedings. Springer Science & Business Media.
26. Cooper, R., (1995). Configuring database query languages. In:
Interfaces to Database Systems (IDS94) (Vol. 1, pp. 3–21). Springer,
London.
27. Date, C. J., (1984). Some principles of good language design: With
especial reference to the design of database languages. ACM SIGMOD
Record, 14(3), 1–7.
28. De Brock, B., (2018). Towards a theory about continuous requirements
engineering for information systems. In: REFSQ Workshops (Vol. 1,
pp. 3–7).
29. Decker, S., (1998). On domain-specific declarative knowledge
representation and database languages. In: KRDB-98—Proceedings of
the 5th Workshop Knowledge Representation Meets Data Bases (Vol. 1,
pp. 4–9). Seattle, WA.
30. Delcambre, L. M., Liddle, S. W., Pastor, O., & Storey, V. C., (2018).
A reference framework for conceptual modeling. In: International
Conference on Conceptual Modeling (Vol. 1, pp. 27–42). Springer,
Cham.
31. Delis, A., & Roussopoulos, N., (1993). Performance comparison of
three modern DBMS architectures. IEEE Transactions on Software
Engineering, 19(2), 120–138.
32. Dogac, A., Ozkan, C., Arpinar, B., Okay, T., & Evrendilek, C., (1994).
METU object-oriented DBMS. In: Advances in Object-Oriented
Database Systems (Vol. 1, pp. 327–354). Springer, Berlin, Heidelberg.
33. Fischer, M., Link, M., Ortner, E., & Zeise, N., (2010). Servicebase
management systems: A three-schema-architecture for service-
management. INFORMATIK 2010: Service Science–Neue Perspektiven
für die Informatik. Band 1 (Vol. 1, pp. 3–9).
34. Flender, C., (2010). A quantum interpretation of the view-update
problem. In: Proceedings of the Twenty-First Australasian Conference
on Database Technologies (Vol. 104, pp. 67–74).
35. Ge, T., & Zdonik, S., (2007). Fast, secure encryption for indexing in a
column-oriented DBMS. In: 2007 IEEE 23rd International Conference
on Data Engineering (Vol. 1, pp. 676–685). IEEE.
36. Gravano, L., Ipeirotis, P. G., Jagadish, H. V., Koudas, N., Muthukrishnan,
S., Pietarinen, L., & Srivastava, D., (2001). Using q-grams in a DBMS
for approximate string processing. IEEE Data Eng. Bull., 24(4), 28–34.
37. Grefen, P. W. P. J., Ludwig, H., & Angelov, S., (2002). A Framework
for E-Services: A Three-Level Approach Towards Process and Data
Management (Vol. 1, pp. 02–07). Centre for Telematics and Information
Technology, University of Twente.
38. Grefen, P., & Angelov, S., (2001). A three-level process framework
for contract-based dynamic service outsourcing. In: Proceedings of the
2nd International Colloquium on Petri Net Technologies for Modelling
Communication Based Systems (Vol. 1, pp. 123–128). Berlin, Germany.
39. Grefen, P., Ludwig, H., & Angelov, S., (2003). A three-level framework
for process and data management of complex e-services. International
Journal of Cooperative Information Systems, 12(04), 487–531.
40. Habela, P., Stencel, K., & Subieta, K., (2006). Three-level object-
oriented database architecture based on virtual updateable views. In:
International Conference on Advances in Information Systems (Vol. 1,
pp. 80–89). Springer, Berlin, Heidelberg.
41. Härder, T., Meyer-Wegener, K., Mitschang, B., & Sikeler, A., (1987).
PRIMA: A DBMS Prototype Supporting Engineering Applications, 1,
2–9.
42. He, Z., Lee, B. S., & Snapp, R., (2005). Self-tuning cost modeling
of user-defined functions in an object-relational DBMS. ACM
Transactions on Database Systems (TODS), 30(3), 812–853.
43. Hellerstein, J. M., Stonebraker, M., & Hamilton, J., (2007). Architecture
of a database system. Foundations and Trends® in Databases, 1(2),
141–259.
44. Isaac, J., & Harikumar, S., (2016). Logistic regression within DBMS.
In: 2016 2nd International Conference on Contemporary Computing
and Informatics (IC3I) (Vol. 1, pp. 661–666). IEEE.
45. Jaedicke, M., & Mitschang, B., (1998). On parallel processing
of aggregate and scalar functions in object-relational DBMS. In:
Proceedings of the 1998 ACM SIGMOD International Conference on
Management of Data (Vol. 1, pp. 379–389).
46. Jarke, M., & Quix, C., (2017). On warehouses, lakes, and spaces:
The changing role of conceptual modeling for data integration. In:
Conceptual Modeling Perspectives (Vol. 1, pp. 231–245). Springer, Cham.
47. Joo, Y. J., Kim, J. Y., Lee, Y. I., Moon, K. K., & Park, S. H., (2006).
Design and implementation of map databases for telematics and car
navigation systems using an embedded DBMS. Spatial Information
Research, 14(4), 379–389.
48. Jovanov, E., Starcevic, D., Aleksic, T., & Stojkov, Z., (1992). Hardware
implementation of some DBMS functions using SPR. In: Proceedings
of the Twenty-Fifth Hawaii International Conference on System
Sciences (Vol. 1, pp. 328–337). IEEE.
49. Kanellakis, P., (1995). Constraint programming and database languages:
A tutorial. In: Proceedings of the Fourteenth ACM SIGACT-SIGMOD-
SIGART Symposium on Principles of Database Systems (Vol. 1, pp.
46–53).
50. Kemp, G. J., Angelopoulos, N., & Gray, P. M., (2000). A schema-
based approach to building a bioinformatics database federation. In:
Proceedings IEEE International Symposium on Bio-Informatics and
Biomedical Engineering (Vol. 1, pp. 13–20). IEEE.
51. Kemp, G. J., Angelopoulos, N., & Gray, P. M., (2002). Architecture of
a mediator for a bioinformatics database federation. IEEE Transactions
on Information Technology in Biomedicine, 6(2), 116–122.
52. Khuan, C. T., Abdul-Rahman, A., & Zlatanova, S., (2008). 3D solids
and their management in DBMS. In: Advances in 3D Geoinformation
Systems (Vol. 1, pp. 279–311). Springer, Berlin, Heidelberg.
53. Koonce, D. A., (1995). Information model level integration for CIM
systems: A unified database approach to concurrent engineering.
Computers & Industrial Engineering, 29(1–4), 647–651.
54. Li, Q., & Wang, L., (2007). Devising a semantic model for multimedia
databases: Rationale, facilities, and applications. In: Third International
Conference on Semantics, Knowledge and Grid (SKG 2007) (Vol. 1,
pp. 14–19). IEEE.
55. Li, Q., Li, N., Wang, L., & Sun, X., (2009). A new semantic model
with applications in a multimedia database system. Concurrency and
Computation: Practice and Experience, 21(5), 691–704.
56. Ligêza, A., (2006). Principles of verification of rule-based systems.
Logical Foundations for Rule-Based Systems, 1, 191–198.
57. Linnemann, V., Küspert, K., Dadam, P., Pistor, P., Erbe, R., Kemper, A.,
& Wallrath, M., (1988). Design and implementation of an extensible
database management system supporting user defined data types and
functions. In: VLDB (Vol. 1, pp. 294–305).
58. Liu, M., (1999). Deductive database languages: Problems and solutions.
ACM Computing Surveys (CSUR), 31(1), 27–62.
59. Lohman, G. M., Lindsay, B., Pirahesh, H., & Schiefer, K. B.,
(1991). Extensions to starburst: Objects, types, functions, and rules.
Communications of the ACM, 34(10), 94–109.
60. Mathiske, B., Matthes, F., & Schmidt, J. W., (1995). Scaling database
languages to higher-order distributed programming. In: Proceedings
of the Fifth International Workshop on Database Programming
Languages (Vol. 1, pp. 1–12).
61. May, W., Ludäscher, B., & Lausen, G., (1997). Well-founded semantics
for deductive object-oriented database languages. In: International
Conference on Deductive and Object-Oriented Databases (pp. 320–
336). Springer, Berlin, Heidelberg.
62. Montevechi, J. A. B., Leal, F., De Pinho, A. F., Costa, R. F., De Oliveira,
M. L. M., & Silva, A. L. F., (2010). Conceptual modeling in simulation
projects by mean adapted IDEF: An application in a Brazilian tech
company. In: Proceedings of the 2010 Winter Simulation Conference
(Vol. 1, pp. 1624–1635). IEEE.
63. Neuhold, E. J., & Olnhoff, T., (1981). Building data base management
systems through formal specification. In: International Colloquium on
the Formalization of Programming Concepts (Vol. 1, pp. 169–209).
Springer, Berlin, Heidelberg.
64. Ng, K. W., & Muntz, R. R., (1999). Parallelizing user-defined functions
in distributed object-relational DBMS. In: Proceedings. IDEAS’99:
International Database Engineering and Applications Symposium
(Cat. No. PR00265) (Vol. 1, pp. 442–450). IEEE.
65. Nidzwetzki, J. K., & Güting, R. H., (2015). Distributed SECONDO:
A highly available and scalable system for spatial data processing. In:
International Symposium on Spatial and Temporal Databases (Vol. 1,
pp. 491–496). Springer, Cham.
66. Okayama, T., Tamura, T., Gojobori, T., Tateno, Y., Ikeo, K., Miyazaki,
S., & Sugawara, H., (1998). Formal design and implementation of an
improved DDBJ DNA database with a new schema and object-oriented
library. Bioinformatics (Oxford, England), 14(6), 472–478.
67. Overmyer, S. P., Benoit, L., & Owen, R., (2001). Conceptual modeling
through linguistic analysis using LIDA. In: Proceedings of the 23rd
International Conference on Software Engineering: ICSE 2001 (Vol.
1, pp. 401–410). IEEE.
68. Philip, S. Y., Heiss, H. U., Lee, S., & Chen, M. S., (1992). On workload
characterization of relational database environments. IEEE Trans.
Software Eng., 18(4), 347–355.
69. Piatetsky-Shapiro, G., & Jakobson, G., (1987). An intermediate
database language and its rule-based transformation to different
database languages. Data & Knowledge Engineering, 2(1), 1–29.
70. Pieterse, H., & Olivier, M., (2012). Data hiding techniques for database
environments. In: IFIP International Conference on Digital Forensics
(Vol. 1, pp. 289–301). Springer, Berlin, Heidelberg.
71. Ram, S., & Liu, J., (2006). Understanding the semantics of data
provenance to support active conceptual modeling. In: International
Workshop on Active Conceptual Modeling of Learning (Vol. 1, pp. 17–
29). Springer, Berlin, Heidelberg.
72. Ramamohanarao, K., & Harland, J., (1994). An introduction to
deductive database languages and systems. VLDB J., 3(2), 107–122.
73. Robinson, S., (2011). Choosing the right model: Conceptual modeling
for simulation. In: Proceedings of the 2011 Winter Simulation
Conference (WSC) (Vol. 1, pp. 1423–1435). IEEE.
74. Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K., Llorente,
I. M., & Galán, F., (2009). The reservoir model and architecture
for open federated cloud computing. IBM Journal of Research and
Development, 53(4), 4–1.
75. Romei, A., & Turini, F., (2011). Inductive database languages:
Requirements and examples. Knowledge and Information Systems,
26(3), 351–384.
76. Rossiter, B. N., & Heather, M. A., (2005). Conditions for interoperability.
In: ICEIS (1) (Vol. 1, pp. 92–99).
77. Roussopoulos, N., & Karagiannis, D., (2009). Conceptual modeling:
Past, present and the continuum of the future. In: Conceptual Modeling:
Foundations and Applications (Vol. 1, pp. 139–152). Springer, Berlin,
Heidelberg.
78. Samos, J., Saltor, F., & Sistac, J., (1998). Definition of derived classes
in OODBs. In: Proceedings. IDEAS’98. International Database
Engineering and Applications Symposium (Cat. No. 98EX156) (Vol. 1,
pp. 150–158). IEEE.
79. Samos, J., Saltor, F., Sistac, J., & Bardes, A., (1998). Database
architecture for data warehousing: An evolutionary approach. In:
International Conference on Database and Expert Systems Applications
(Vol. 1, pp. 746–756). Springer, Berlin, Heidelberg.
80. Shahzad, M. K., (2007). Version-manager: For vital simulating
advantages in data warehouse. Computing and Information Systems,
11(2), 35.
81. Snodgrass, R. T., (1992). Temporal databases. In: Theories and
Methods of Spatio-Temporal Reasoning in Geographic Space (Vol. 1,
pp. 22–64). Springer, Berlin, Heidelberg.
82. Solovyev, V. D., & Polyakov, V. N., (2013). Database “languages of the
world” and its application. State of the art. Computational Linguistics,
1, 3–9.
83. Stemple, D., Sheard, T., & Fegaras, L., (1992). Linguistic reflection:
A bridge from programming to database languages. In: Proceedings of
the Twenty-Fifth Hawaii International Conference on System Sciences
(Vol. 2, pp. 844–855). IEEE.
84. Stonebraker, M., Agrawal, R., Dayal, U., Neuhold, E. J., & Reuter, A.,
(1993). DBMS research at a crossroads: The Vienna update. In: VLDB
(Vol. 93, pp. 688–692).
85. Sutton, D., & Small, C., (1995). Extending functional database
languages to update completeness. In: British National Conference on
Databases (Vol. 1, pp. 47–63). Springer, Berlin, Heidelberg.
86. Sy, O., Duarte, D., & Dal, B. G., (2018). Ontologies and relational
databases meta-schema design: A three-level unified lifecycle. In: 2018
5th International Conference on Control, Decision and Information
Technologies (CoDIT) (Vol. 1, pp. 518–523). IEEE.
87. Tresch, M., & Scholl, M. H., (1994). A classification of multi-database
languages. In: Proceedings of 3rd International Conference on Parallel
and Distributed Information Systems (Vol. 1, pp. 195–202). IEEE.
88. Trinder, P., (1990). Referentially transparent database languages. In:
Functional Programming (Vol. 1, pp. 142–156). Springer, London.
89. Van, B. P., Kovacs, G., & Micsik, A., (1994). Transformation of
database populations and operations from the conceptual to the internal
level. Information Systems, 19(2), 175–191.
90. Vossen, G., (1991). Data Models, Database Languages and Database
Management Systems (Vol. 1, pp. 3–5). Addison-Wesley Longman
Publishing Co., Inc.
91. Wang, H., & Zaniolo, C., (1999). User-defined aggregates in database
languages. In: International Symposium on Database Programming
Languages (Vol. 1, pp. 43–60). Springer, Berlin, Heidelberg.
92. Wang, W., & Brooks, R. J., (2007). Empirical investigations of
conceptual modeling and the modeling process. In: 2007 Winter
Simulation Conference (Vol. 1, pp. 762–770). IEEE.
93. Yannakoudakis, E. J., Tsionos, C. X., & Kapetis, C. A., (1999). A new
framework for dynamically evolving database environments. Journal
of Documentation, 1, 3–9.
94. Zalipynis, R. A. R., (2020). BitFun: Fast answers to queries with
tunable functions in geospatial array DBMS. Proceedings of the VLDB
Endowment, 13(12), 2909–2912.
95. Zdonik, S. B., & Wegner, P., (1988). Language and methodology for
object-oriented database environments. In: Data Types and Persistence
(Vol. 1, pp. 155–171). Springer, Berlin, Heidelberg.
96. Zhang, B., Van, A. D., Wang, J., Dai, T., Jiang, S., Lao, J., & Gordon,
G. J., (2018). A demonstration of the otterTune automatic database
management system tuning service. Proceedings of the VLDB
Endowment, 11(12), 1910–1913.
97. Zlatanova, S., (2006). 3D geometries in spatial DBMS. In: Innovations
in 3D Geo Information Systems (Vol. 1, pp. 1–14). Springer, Berlin,
Heidelberg.
CHAPTER 4
THE RELATIONAL MODEL

CONTENTS
4.1. Introduction
4.2. Brief History of the Relational Model
4.3. Terminology
4.4. Integrity Constraints
4.5. Views
References
4.1. INTRODUCTION
It is estimated that annual prices for new licenses for the Relational Database
Management System (RDBMS) range from $6 billion to $10 billion U.S.
dollars (or $25 billion U.S. dollars when product purchases would be
included). RDBMS becoming the preeminent data-processing technology
being used today. This piece of software is similar to the second phase of
database management systems (DBMSs), and it is built just on a relational
schema that E. F. Codd suggested (1970). The relational model organizes
all data in terms of its logical structure and places it inside relations (tables)
(Schek & Scholl, 1986). Every connection does have a name, and the data
that constitute it are organized into columns with the same name. Every row
(or tuple) has exactly one value for each property. The structural model’s
straightforward logical organization is one of its many compelling advantages.
However, underneath this seemingly straightforward architecture lies a
robust theoretical basis—something that the first iteration of DBMSs was
severely missing (the system and ranked DBMSs) (Tyler & Lind, 1992).

4.2. BRIEF HISTORY OF THE RELATIONAL MODEL


The relational model was proposed by E. F. Codd in his seminal paper ‘A
relational model of data for large shared data banks’ (Codd, 1970).
Although a set-oriented model had been proposed earlier (Childs, 1968),
this paper is now generally regarded as a landmark in the history of
database management systems (DBMSs). The objectives of the relational data
model were (Atzeni et al., 2013):
•	To allow a high degree of data independence. Application programs
must not be affected by modifications to the internal data
representation, particularly by changes to file organizations,
record orderings, or access paths.
•	To provide a sound basis for dealing with data semantics,
consistency, and redundancy problems. In particular, Codd’s paper
introduced the concept of normalized relations, that is, relations
with no repeating groups.
•	To enable the expansion of set-oriented data manipulation languages.
Although interest in the relational model came from several directions, the
most significant research may be attributed to three projects with rather
different perspectives. The first of these, at IBM’s San José Research
Laboratory in California, was the prototype relational DBMS System R,
developed during the late 1970s (Astrahan et al., 1976). This project was
designed to prove the practicality of the relational model by providing an
implementation of its data structures and operations. It also proved to be
an excellent source of information about implementation concerns such as
transaction management, concurrency control, recovery techniques, query
processing, data security and integrity, human factors, and user interface
design, and led to the publication of many research papers and the
development of other prototypes. In particular, the System R project led to
two major developments (Biber et al., 2008):
•	the development of SQL (pronounced ‘S-Q-L’, or sometimes
‘See-Quel’), a structured query language that has since become
the formal International Organization for Standardization (ISO)
standard and the de facto standard language for relational DBMSs;
•	the production of various commercial relational DBMS products
during the late 1970s and early 1980s, for example DB2 and SQL/DS
from IBM, and Oracle from Oracle Corporation.
The second major initiative in the development of the relational model was
the INGRES (Interactive Graphics Retrieval System) project at the
University of California at Berkeley, which was active at about the same
time as the System R project. The INGRES project involved the development
of a prototype RDBMS, with the research concentrating on the same overall
objectives as the System R project. This research led to the commercial
products INGRES from Relational Technology Inc. (later Advantage Ingres
Enterprise Relational Database from Computer Associates) and the
Intelligent Database Machine from Britton Lee Inc., and advanced the
general awareness of relational concepts (Codd, 1979).
The third project was the Peterlee Relational Test Vehicle at the IBM UK
Scientific Centre in Peterlee (Todd, 1976). This project had a more
theoretical orientation than the System R and INGRES projects and was
significant principally for research into such issues as query processing
and optimization, as well as functional extension. Commercial systems based
on the relational model began to appear in the late 1970s and early 1980s
(Gardner et al., 2008). Now there are several hundred RDBMSs for both
mainframe and PC environments, even though many do not strictly adhere to
the definition of the relational model. Office Access and Visual FoxPro
from Microsoft, InterBase and JDataStore from Borland, and
R:Base from R:BASE Technologies are all examples of PC-based RDBMSs
(Yu et al., 2006).
Owing to the success of the relational model, many non-relational systems
now provide a relational user interface, regardless of the underlying
model. IDMS, the principal network DBMS from Computer Associates, has
become Advantage CA-IDMS, supporting a relational view of data. Other
mainframe DBMSs that support some relational features include Model 204
from Computer Corporation of America and Software AG’s ADABAS. Several
extensions to the relational model have also been proposed, for example,
to (Smith & Kirby, 2009):
•	capture more closely the meaning of data (for example, Codd,
1979);
•	support object-oriented concepts (for example, Stonebraker and
Rowe, 1986).

4.3. TERMINOLOGY
The relational model is based on the mathematical concept of a relation,
which is physically represented as a table. Codd, a trained mathematician,
used terminology taken from mathematics, principally set theory and
predicate logic. This section explains the terminology and structural
concepts of the relational data model (Getoor & Sahami, 1999).

4.3.1. Relational Data Structure


Relation: A relation is a table with columns and rows. An RDBMS requires
only that the database be perceived by the user as tables. Note, however,
that this perception applies only to the logical structure of the database:
the external and conceptual levels of the ANSI-SPARC architecture. It does
not apply to the physical structure of the database, which can be
implemented using a variety of storage structures (Martin, 2013).
Attribute: An attribute is a named column of a relation. In the relational
model, relations are used to hold information about the objects to be
represented in the database. Each relation is represented as a
two-dimensional table in which the rows of the table correspond to
individual records and the columns correspond to attributes. Attributes can
appear in any order and the relation will still be the same relation, and
therefore will convey the same meaning (Fan et al., 2019).
For example, the Branch relation holds information on branch offices and
has columns for the attributes branchNo (the branch number), street, city,
and postcode. Similarly, the Staff relation holds information on staff
members and has columns for the attributes staffNo (the staff number),
fName, lName, job, gender, DOB (date of birth), income, and branchNo (the
number of the branch at which the member of staff works). Figure 4.1
illustrates the Branch and Staff relations. As this example shows, a column
contains values of a single attribute; for example, the branchNo column
contains only the numbers of existing branch offices (Codd, 2002).
Domain: A domain is the set of allowable values for one or more attributes.
Domains are an extremely powerful feature of the relational model. Every
attribute in a relation is defined on a domain. Domains may be distinct for
each attribute, or two or more attributes may be defined on the same
domain. Figure 4.2 shows the domains for some of the attributes of the
Branch and Staff relations (Sommestad et al., 2010). Note that, at any
given time, there will typically be values in a domain that do not
currently appear as values in the corresponding attribute. The domain
concept is important because it allows the user to define, in one central
place, the meaning and source of the values that attributes can hold. As a
result, more information is available to the system when it undertakes the
execution of a relational operation, and operations that are semantically
incorrect can be avoided. For example, it is not sensible to compare a
street address with a telephone number, even though the domain definitions
for both attributes are character strings. On the other hand, the monthly
rental on a property and the number of months a property has been leased
have different domains (the first a monetary value, the second an integer
value), yet it would still be a legal operation to multiply two values from
these domains (Schek & Scholl, 1986).
Figure 4.1. The Branch and Staff relations.

Source: https://www.researchgate.net/figure/Relationship-between-the-num-
ber-of-staff-and-network-range_fig1_266068320.

Figure 4.2. Domains for some attributes of the Branch and Staff relations.

Source: https://opentextbc.ca/dbdesign01/chapter/chapter-8-entity-relation-
ship-model/.
As these two examples illustrate, a complete implementation of domains is
not straightforward, and as a result many RDBMSs do not support them fully
(Horsburgh et al., 2008).
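The domain idea can be sketched in code: a domain is a named pool of legal values, and each attribute draws its values from exactly one of them. The sketch below is illustrative only; the domain names and the `in_domain` helper are our own simplifications, not part of any DBMS.

```python
import re

# Each domain is modeled as a name plus a membership test (a predicate).
DOMAINS = {
    "BranchNumbers": lambda v: isinstance(v, str) and bool(re.fullmatch(r"B\d{3}", v)),
    "StreetNames":   lambda v: isinstance(v, str) and len(v) > 0,
    "MonthlyRental": lambda v: isinstance(v, (int, float)) and v >= 0,
    "MonthsLeased":  lambda v: isinstance(v, int) and v >= 0,
}

def in_domain(domain_name, value):
    """True if the value belongs to the named domain."""
    return DOMAINS[domain_name](value)

# A branch number drawn from BranchNumbers is acceptable ...
assert in_domain("BranchNumbers", "B005")
# ... but a postcode-shaped string is not, even though both are character strings.
assert not in_domain("BranchNumbers", "SW1 4EH")

# Values from *different* domains can still combine meaningfully:
# monthly rental (a monetary value) times months leased (an integer).
total_rent = 350.0 * 12
```

With such domain predicates, a semantically wrong comparison (branch number against postcode) can be caught, while the cross-domain multiplication remains legal.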
Tuple: A tuple is a row of a relation. The elements of a relation are the
rows, or tuples, in the table. In the Branch relation, each row contains
four values, one for each attribute. Tuples can appear in any order and the
relation will still be the same relation, and therefore convey the same
meaning. The structure of a relation, together with a specification of the
domains and any other restrictions on possible values, is referred to as
its intension, which is usually fixed unless the definition of the relation
is changed to include additional attributes. The tuples are referred to as
the extension (or state) of the relation, which changes over time
(Vassiliadis & Sellis, 1999).
Degree: The degree of a relation is the number of attributes it contains.
The Branch relation in Figure 4.1 has four attributes, that is, degree
four. This means that each row of the table is a four-tuple, containing
four values. A relation with only one attribute would have degree one and
be called a unary relation (or one-tuple). A relation with two attributes
is called binary, one with three attributes is called ternary, and after
that the term n-ary is usually used. The degree of a relation is a property
of the intension of the relation (Johannesson, 1994).
Cardinality: The cardinality of a relation is the number of tuples it
contains. By contrast with the degree, the cardinality of a relation
changes as tuples are added or deleted. The cardinality is a property of
the extension of the relation and is determined from the particular
instance of the relation at any given moment. Finally, we define a
relational database. Relational database: A collection of normalized
relations with distinct relation names. A relational database consists of
relations that are appropriately structured; we refer to this
appropriateness as normalization (Gougeon, 2010).
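The distinction between degree (a property of the intension) and cardinality (a property of the extension) can be shown in a small sketch. Only the B005 tuple below matches data quoted later in this chapter; the other rows are invented for illustration.

```python
# A relation sketched as a tuple of attribute names plus a set of rows.
attributes = ("branchNo", "street", "city", "postcode")
branch = {
    ("B005", "22 Deer Rd", "London", "SW1 4EH"),
    ("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),   # invented example row
    ("B003", "163 Main St", "Glasgow", "G11 9QX"),    # invented example row
}

degree = len(attributes)   # a property of the intension: fixed at 4
cardinality = len(branch)  # a property of the extension: varies over time

assert degree == 4
assert cardinality == 3

# Adding a tuple changes the cardinality but never the degree.
branch.add(("B007", "16 Argyll St", "Aberdeen", "AB2 3SU"))  # invented row
assert len(branch) == 4 and degree == 4
```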
Alternative terminology: The terminology for the relational model can be
quite confusing. Two sets of terms have been introduced; in fact, a third
set of terms is sometimes used: a relation may be referred to as a file,
the tuples as records, and the attributes as fields. This terminology stems
from the fact that, physically, the RDBMS may store each relation in a
file. Table 4.1 summarizes the different terms for the relational model
(Pierce & Lydon, 2001).
Table 4.1. Alternative Terminology for Relational Model Terms

Formal terms    Alternative 1    Alternative 2
relation        table            file
tuple           row              record
attribute       column           field

4.3.2. Mathematical Relations


To understand the true meaning of the word relation, we first have to
review some concepts from mathematics. Suppose we have two sets, D1 and D2,
where D1 = {2, 4} and D2 = {1, 3, 5}. The Cartesian product of these two
sets, written D1 × D2, is the set of all ordered pairs such that the first
element is a member of D1 and the second element is a member of D2. An
alternative way of expressing this is to find all combinations of elements
with the first from D1 and the second from D2. In our case, we have:
D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)} (Martin, 2013).
Any subset of this Cartesian product is a relation. For example, we could
produce a relation R such that:
R = {(2, 1), (4, 1)}
We may specify which ordered pairs will be in the relation by giving some
condition for their selection. For example, if we observe that R includes
all ordered pairs in which the second element is 1, we can write R as
(Kazemi & Poole, 2018):
R = {(x, y) | x ∈D1, y ∈D2, and y = 1}
Using the same sets, we could form another relation, S, in which the first
element is always twice the second. Thus, we can write S as (Armstrong,
1974):
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y} or, in this example,
S = {(2, 1)}
since there is only one ordered pair in the Cartesian product that
satisfies this condition (Cozman & Mauá, 2015).
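The two worked examples above can be replayed directly in code, a small sketch using Python's itertools:

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product D1 × D2: all ordered pairs (x, y), x from D1, y from D2.
cartesian = set(product(D1, D2))
assert cartesian == {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}

# A relation is any subset of the product.  R: pairs whose second element is 1.
R = {(x, y) for (x, y) in cartesian if y == 1}
assert R == {(2, 1), (4, 1)}

# S: pairs where the first element is twice the second.
S = {(x, y) for (x, y) in cartesian if x == 2 * y}
assert S == {(2, 1)}
```

The set comprehensions play the role of the selection conditions written in set-builder notation above.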
The concept of a relation can easily be extended to three sets. Let D1, D2,
and D3 be three sets. The Cartesian product D1 × D2 × D3 of these three
sets is the set of all ordered triples such that the first element is from
D1, the second element is from D2, and the third element is from D3. Any
subset of this Cartesian product is a relation. For example (Mereish &
Poteat, 2015):
D1 = {1, 3} D2 = {2, 4} D3 = {5, 6}
D1 × D2 × D3 = {(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (3, 2, 5), (3, 2, 6),
(3, 4, 5), (3, 4, 6)}
Any set of these ordered triples is a relation. We can extend the three
sets and define a general relation on n domains. Let D1, D2, ..., Dn be n
sets. Their Cartesian product is defined as:
D1 × D2 × ... × Dn = {(d1, d2, ..., dn) | d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn}
and is more often written as (Gadia, 1988):
×ⁿᵢ₌₁ Dᵢ
Any set of n-tuples from this Cartesian product is a relation on the n
sets. Note that in defining these relations we have to specify the sets, or
domains, from which we choose values (Smets & Jarzabkowski, 2013; Mesquita
et al., 2008).
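The three-set example and its generalization to n domains can be sketched as follows; the helper `cartesian_n` is our own illustrative name:

```python
from itertools import product

D1, D2, D3 = {1, 3}, {2, 4}, {5, 6}

# The three-way Cartesian product: all ordered triples (d1, d2, d3).
triples = set(product(D1, D2, D3))
assert len(triples) == 8
assert (3, 4, 6) in triples

def cartesian_n(*domains):
    """D1 × D2 × ... × Dn as a set of n-tuples, for any number of domains."""
    return set(product(*domains))

# Any subset of the n-way product is a relation on the n sets; here the
# selection condition keeps triples whose elements sum to more than 10.
relation = {t for t in cartesian_n(D1, D2, D3) if sum(t) > 10}
assert relation <= cartesian_n(D1, D2, D3)
```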

4.3.3. Database Relations


Applying the above concepts to databases, we can define a relation schema.
Relation schema: A named relation defined by a set of attribute and domain
name pairs (Pecherer, 1975).
Let A1, A2, ..., An be attributes with domains D1, D2, ..., Dn. Then the
set {A1:D1, A2:D2, ..., An:Dn} is a relation schema. A relation R defined
by a relation schema S is a set of mappings from the attribute names to
their corresponding domains. Thus, relation R is a set of n-tuples
(Brilmyer, 2018):
(A1:d1, A2:d2, . . . , An:dn) such that d1 ∈D1, d2 ∈D2, . . . , dn ∈Dn
Each element in the n-tuple consists of an attribute and a value for that
attribute. Normally, when we write out a relation as a table, we list the
attribute names as column headings and write the tuples as rows having the
form (d1, d2, ..., dn), where each value is taken from the appropriate
domain. In this way, a relation in the relational model can be regarded as
any subset of the Cartesian product of the domains of the attributes. A
table is simply a physical representation of such a relation (Aghili
Ashtiani & Menhaj, 2014).
For example, the Branch relation in Figure 4.1 has attributes branchNo,
street, city, and postcode, each with its corresponding domain. The Branch
relation is any subset of the Cartesian product of the domains, that is,
any set of 4-tuples in which the first element is from the domain
BranchNumbers, the second is from the domain StreetNames, and so on. One of
the 4-tuples is (Blau & McGovern, 2003):
{(B005, 22 Deer Rd, London, SW1 4EH)} or more correctly:
{(branchNo: B005, street: 22 Deer Rd, city: London, postcode: SW1
4EH)}
We refer to this as a relation instance. The Branch table is a convenient
way of writing out all the 4-tuples that form the relation at a specific
moment in time, which explains why table rows in the relational model are
called tuples. In the same way that a relation has a schema, so does a
relational database. Relational database schema: A set of relation schemas,
each with a distinct name (Cadiou, 1976).
If R1, R2, ..., Rn are a set of relation schemas, then we can write the
relational database schema, or simply relational schema, R, as:
R = {R1, R2, ..., Rn}
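A relation schema and its tuples-as-mappings can be sketched directly. The domain predicates below are simplified stand-ins of our own, not real DBMS domain checks:

```python
# A relation schema sketched as attribute-name/domain pairs, with each tuple
# a mapping from attribute names to values drawn from the matching domains.
schema = {
    "branchNo": lambda v: isinstance(v, str) and v.startswith("B"),
    "street":   lambda v: isinstance(v, str),
    "city":     lambda v: isinstance(v, str),
    "postcode": lambda v: isinstance(v, str),
}

def is_valid_tuple(t, schema):
    """A tuple must map every schema attribute to a value in its domain."""
    return set(t) == set(schema) and all(chk(t[a]) for a, chk in schema.items())

branch_tuple = {"branchNo": "B005", "street": "22 Deer Rd",
                "city": "London", "postcode": "SW1 4EH"}

assert is_valid_tuple(branch_tuple, schema)
assert not is_valid_tuple({"branchNo": "B005"}, schema)  # attributes missing
```

Writing the tuple as a mapping mirrors the (A1:d1, A2:d2, ..., An:dn) form used above, where each value is paired with its attribute name.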

4.3.4. Properties of Relations


The properties of a relation are as follows (Ehrhard, 2012):
•	The relation has a name that is distinct from all other relation
names in the relational schema;
•	Each cell of the relation contains exactly one atomic (single)
value;
•	Each attribute has a distinct name;
•	The values of an attribute are all from the same domain;
•	Each tuple is distinct; there are no duplicate tuples;
•	The order of attributes has no significance;
•	The order of tuples has no significance, theoretically. (In
practice, however, the order may affect the efficiency of
accessing tuples.)
To illustrate what these restrictions mean, consider again the Branch
relation shown in Figure 4.1. Since each cell must contain only one value,
it is illegal to store two postcodes for a single branch office in a single
cell. In other words, relations do not contain repeating groups. A relation
that satisfies this property is said to be normalized (Vyawahare et al.,
2018).
The column names listed at the tops of the columns correspond to the
attributes of the relation. The values in the branchNo attribute are all
from the BranchNumbers domain; we should not allow a postcode value to
appear in this column. There can be no duplicate tuples in a relation. For
example, the row (B005, 22 Deer Rd, London, SW1 4EH) appears only once
(Hull, 1986).
Provided the attribute name is moved along with the attribute values, we
can interchange columns. The table would represent the same relation if we
were to put the city attribute before the postcode attribute, although for
readability it makes more sense to keep the address elements in their
normal order. Similarly, tuples can be interchanged, so the records for
branches B005 and B004 can be switched and the relation will still be the
same (Schumacher & Fuchs, 2012).
Most of the properties specified for relations follow from the properties
of mathematical relations (Lyons-Ruth et al., 2004):
•	When we derived the Cartesian product of sets with simple,
single-valued elements such as integers, each element in each
tuple was single-valued. Similarly, each cell of a relation
contains exactly one value. However, a mathematical relation need
not be normalized; Codd chose to disallow repeating groups to
simplify the relational model.
•	In a mathematical relation, the possible values for a given
position are determined by the set, or domain, on which that
position is defined. In a table, the values in each column must
come from the same attribute domain.
•	In a set, no elements are repeated. Similarly, in a relation,
there are no duplicate tuples.
•	Since a relation is a set, the order of elements has no
significance. Therefore, the order of tuples in a relation is
immaterial.
In a mathematical relation, however, the order of the elements in a tuple is important: the tuple (1, 2), for example, is quite different from the tuple (2, 1). This is not the case for relations in the relational model, which stipulates that the order of attributes is irrelevant (Ramanujam et al., 2009). The reason is that the column headings identify which attribute each value belongs to. This means that the order of column headings in the intension is immaterial, but once the structure of the relation is decided, the order of elements within the tuples of the extension must match the order of the attributes (Greenyer & Kindler, 2010).
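The distinction can be sketched in code. In the illustration below (our own, not from the text), a mathematical tuple is modelled as an ordered Python tuple, while a relational tuple is modelled as a mapping from attribute names to values, so both attribute order and tuple order stop mattering:

```python
# Mathematical tuples: the order of elements is significant.
math_tuple_a = (1, 2)
math_tuple_b = (2, 1)
assert math_tuple_a != math_tuple_b

# Relational tuples: each value is tagged with its attribute name,
# so the order in which attributes are written is irrelevant.
row_a = {"branchNo": "B005", "city": "London"}
row_b = {"city": "London", "branchNo": "B005"}
assert row_a == row_b  # dict equality ignores insertion order

# A relation is a *set* of such tuples, so tuple order is also irrelevant.
relation_1 = [{"branchNo": "B005"}, {"branchNo": "B004"}]
relation_2 = [{"branchNo": "B004"}, {"branchNo": "B005"}]
same = ({frozenset(r.items()) for r in relation_1}
        == {frozenset(r.items()) for r in relation_2})
```

The dictionaries compare equal regardless of key order, and the two relations compare equal once represented as sets, mirroring the properties listed above.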

4.3.5. Relational Keys


As stated earlier, there are no duplicate tuples within a relation. Therefore, we need to be able to identify one or more attributes (called relational keys) that uniquely identify each tuple in a relation. This section explains the terminology used for relational keys.
Superkey: An attribute, or set of attributes, that uniquely identifies a tuple within a relation (Chamberlin, 1976).
A superkey uniquely identifies each tuple within a relation. However, a superkey may contain additional attributes that are not necessary for unique identification, and we are interested in superkeys that contain only the minimum number of attributes needed for unique identification (Kloesgen & Zytkow, 1994).
Candidate key: A superkey such that no proper subset of it is a superkey within the relation.
A candidate key, K, of a relation R has two properties:
•	Uniqueness: In each tuple of R, the values of K uniquely identify that tuple; and
•	Irreducibility: No proper subset of K has the uniqueness property.
There may be several candidate keys for a relation. When a key consists of more than one attribute, we call it a composite key (Vinciquerra, 1993). Consider the Branch relation shown in Figure 4.1. Given a value of city, we can identify several branch offices (for example, London has two branch offices), so this attribute cannot be a candidate key. On the other hand, since DreamHome assigns a unique branch number to each branch office, given a branch number value, branchNo, we can identify at most one tuple, so branchNo is a candidate key. Similarly, postcode is a candidate key for this relation (He & Naughton, 2008). Now consider the relation Viewing, which contains information about properties viewed by clients. The relation comprises a client number (clientNo), a property number (propertyNo), a viewing date (viewDate), and, optionally, a comment. Given a client number, clientNo, there may be several corresponding viewings for different properties. Likewise, given a property number, propertyNo, the property may have been viewed by several clients (Hofmeyr, 2021). Therefore, clientNo by itself or propertyNo by itself cannot be a candidate key. However, since the combination of clientNo and propertyNo identifies at most one tuple, clientNo and propertyNo together form the (composite) candidate key for the Viewing relation. If we needed to allow for the possibility that a client might view a property more than once, we could add viewDate to the composite key; however, we assume that this is not necessary (Ceruti, 2021).
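The two properties can be checked mechanically against a relation instance. The helper below is our own sketch (the sample Viewing rows are invented for illustration); note that an instance can only disprove that a set of attributes is a candidate key, never prove it, as the text goes on to explain:

```python
from itertools import combinations

def is_unique(rows, attrs):
    """Uniqueness: the projection of rows onto attrs has no duplicates."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(projected) == len(set(projected))

def is_candidate_key(rows, attrs):
    """Unique, and no proper subset is itself unique (irreducibility)."""
    if not is_unique(rows, attrs):
        return False
    return not any(is_unique(rows, list(subset))
                   for n in range(1, len(attrs))
                   for subset in combinations(attrs, n))

viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14", "viewDate": "24-May-12"},
    {"clientNo": "CR76", "propertyNo": "PG4",  "viewDate": "20-Apr-12"},
    {"clientNo": "CR56", "propertyNo": "PG4",  "viewDate": "26-May-12"},
]
# Neither attribute alone is unique over this instance, but the pair is,
# so the pair passes both the uniqueness and irreducibility tests.
```

For this instance, `is_candidate_key(viewing, ["clientNo", "propertyNo"])` holds while each attribute on its own fails the uniqueness test.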
Note that an instance of a relation cannot be used to prove that an attribute or combination of attributes is a candidate key. The fact that there are no duplicates for the values that appear at a particular moment in time does not guarantee that duplicates are not possible. However, the presence of duplicates in an instance can be used to show that some attribute combination is not a candidate key (Dong & Su, 2000). To identify whether duplicates are possible, we must know the real-world meaning of the attribute(s) involved. Only by using this semantic information can we be certain that an attribute combination is a candidate key. For example, from the data in Figure 4.3, we might think that lName, the employee's surname, would be a suitable candidate key for the Staff relation. However, although the Staff relation shown has only a single value of 'White', a new member of staff with the surname 'White' may later join the company, invalidating the choice of lName as a candidate key (Paredaens et al., 2012).
Primary Key: The candidate key that is selected to identify tuples uniquely within the relation (Bucciarelli et al., 2009).
Because a relation has no duplicate tuples, it is always possible to identify each row uniquely; thus, a relation always has a primary key. In the worst case, the entire set of attributes may serve as the primary key, but usually some smaller subset of the attributes is sufficient to distinguish the tuples. The candidate keys that are not selected to be the primary key are called alternate keys. For the Branch relation, if we choose branchNo as the primary key, postcode is then an alternate key. For the Viewing relation there is only one candidate key, comprising clientNo and propertyNo, so these attributes automatically form the primary key (Allen & Terry, 2005).
Foreign Key: An attribute, or set of attributes, within one relation that matches the primary key of some (possibly the same) relation (Selinger et al., 1989).
When an attribute appears in more than one relation, its appearance usually represents a relationship between the tuples of the two relations. For example, the inclusion of branchNo in both the Branch and Staff relations is quite deliberate, as it links each branch to the details of the staff working there. In the Branch relation, branchNo is the primary key (Reilly, 2009). In the Staff relation, however, the branchNo attribute exists to match staff to the branch office they work in. In the Staff relation, branchNo is a foreign key: the attribute branchNo in the Staff relation is said to target the primary key attribute branchNo in the home relation, Branch. As we will see in the following chapter, these common attributes play an important role in data manipulation (Brookshire, 1993).

4.3.6. Representing Relational Database Schemas


A relational database consists of any number of normalized relations. The relational schema for part of the DreamHome case study is as follows (Pirotte, 1982):

Figure 4.3. A DreamHome rental database instance.

Source: https://www.transtutors.com/questions/create-the-dreamhome-rental-database-schema-defined-in-section-3–2-6-and-insert-the--2009490.htm.
Branch (branchNo, street, city, postcode)
Staff (staffNo, fName, lName, position, sex, DOB, salary, branchNo)
PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo)
Client (clientNo, fName, lName, telNo, prefType, maxRent)
PrivateOwner (ownerNo, fName, lName, address, telNo)
Viewing (clientNo, propertyNo, viewDate, comment)
Registration (clientNo, branchNo, staffNo, dateJoined)
A common convention for representing a relation schema is to give the name of the relation followed by the attribute names in parentheses. Normally, the primary key is underlined. The conceptual model, also known as the logical model, is the set of all such schemas for the database. An instance of this relational schema is shown in Figure 4.3 (Bagley, 2010).
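The same schema can be written as SQL data definition statements. The sketch below, using Python's sqlite3 module, is our own rendering of three of the relations: the column types are assumptions, and the PRIMARY KEY clauses play the role of the underlining in the written notation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY,
    street   TEXT, city TEXT, postcode TEXT
);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    fName TEXT, lName TEXT, position TEXT,
    sex TEXT, DOB TEXT, salary REAL,
    branchNo TEXT REFERENCES Branch(branchNo)  -- foreign key to Branch
);
CREATE TABLE Viewing (
    clientNo TEXT, propertyNo TEXT,
    viewDate TEXT, comment TEXT,
    PRIMARY KEY (clientNo, propertyNo)          -- composite primary key
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Here `tables` lists the three relations just created, confirming the schema was accepted by the DBMS.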

4.4. INTEGRITY CONSTRAINTS


The preceding section explored the structural part of the relational model. A data model also has two other parts: a manipulative part, defining the types of operation that may be performed on the data, and a set of integrity constraints that ensure the data is accurate. We discuss the relational integrity constraints in this section, and the relational manipulation operations in the following chapter (Calì et al., 2002).
We have already seen one example of an integrity constraint: because every attribute has an associated domain, there are constraints (known as domain constraints) that restrict the set of values allowed for the attributes of relations. In addition, there are two important integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules of the relational model are known as entity integrity and referential integrity. Two further types of integrity constraint, which we also present, are multiplicity and general constraints. Before we can define entity and referential integrity, it is necessary to understand the concept of nulls (Tansel, 2004).
4.4.1. Nulls
Null: Represents a value for an attribute that is currently unknown or is not applicable for this tuple (Calı et al., 2004).
A null can be taken to mean the logical value 'unknown'. It may mean that a value is not applicable to a particular tuple, or it may merely mean that no value has yet been supplied. Nulls are a way of dealing with incomplete or exceptional data (Zimmermann, 1975). A null, however, is not the same as a numeric value of zero or a text string filled with spaces; zeros and spaces are values, whereas a null represents the absence of a value. Therefore, nulls should be treated differently from other values. Some publications use the phrase 'null value'; however, since a null is not a value but the absence of one, the term is deprecated (Cockcroft, 1997).
For example, the comment attribute in the Viewing relation shown in Figure 4.3 may be undefined until the prospective renter has visited the property and returned his or her comment to the agency. Without nulls, false data must be introduced to represent this state, or extra attributes must be added that may not be meaningful to the user (Graham & Urnes, 1992). In our example, we might use the value '1' to represent a null comment. Alternatively, we might add a new attribute to the Viewing relation, hasCommentBeenSupplied, which contains a Y (Yes) if a comment has been supplied and an N (No) otherwise. Both of these approaches can be confusing to the user (Rasdorf et al., 1987).
Nulls can cause implementation problems because the relational model is based on first-order predicate calculus, which is a two-valued, or Boolean, logic – the only values allowed are true or false. Allowing nulls forces us to work with higher-valued logics, such as three- or four-valued logic (Codd, 1986, 1987, 1990) (Yazıcı & Sözat, 1998).
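SQL inherits this three-valued logic, which can be demonstrated directly. In the sketch below (our own, using Python's sqlite3 module), a comparison with NULL yields 'unknown' rather than true, so an equality test never matches a null row; the IS NULL predicate must be used instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT, comment TEXT)")
conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG36', NULL)")  # no comment yet
conn.execute("INSERT INTO Viewing VALUES ('CR62', 'PA14', '')")    # empty string IS a value

# '' is a value, so equality matches it; NULL is the absence of a value,
# so 'comment = NULL' evaluates to unknown for every row and matches nothing.
eq_empty = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = ''").fetchone()[0]
eq_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = NULL").fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment IS NULL").fetchone()[0]
```

The empty-string row is found by `=`, the null row only by `IS NULL`, illustrating that a null is not a zero, not an empty string, and not even equal to another null.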
The use of nulls in the relational model is a point of contention. Codd later regarded nulls as an integral feature of the model (Codd, 1990). Others consider this approach misguided, arguing that the missing-information problem is not fully understood, that no fully satisfactory solution has been found and that, consequently, nulls should not be incorporated into the relational model (see, for example, Date, 1995) (Grefen & Apers, 1993).
We are now in a position to define the two relational integrity rules.
4.4.2. Entity Integrity


The first integrity rule applies to the primary keys of base relations. For the moment, we define a base relation as a relation that corresponds to an entity in the conceptual schema; we give a more precise definition shortly (Demuth & Hussmann, 1999).
Entity integrity: In a base relation, no attribute of a primary key can be null.
By definition, a primary key is a minimal identifier that is used to identify tuples uniquely. This means that no subset of the primary key is sufficient to provide unique identification of tuples. Allowing a null for any part of a primary key would imply that not all the attributes are needed to distinguish between tuples, which contradicts the definition of the primary key (Yang & Wang, 2001; Vermeer & Apers, 1995). For example, as branchNo is the primary key of the Branch relation, we should not be able to insert a tuple into the Branch relation with a null for the branchNo attribute. Consider also the composite primary key of the Viewing relation, comprising the client number (clientNo) and the property number (propertyNo). We should not be able to insert a tuple into the Viewing relation with a null for the clientNo attribute, a null for the propertyNo attribute, or nulls for both attributes (Motik et al., 2007).
If we examine this rule more closely, we find some anomalies. First, why does the rule apply only to primary keys and not, more generally, to candidate keys, which also uniquely identify tuples? Second, why does the rule apply only to base relations? Consider, for example, the query 'List all comments from viewings', using the data of the Viewing relation in Figure 4.3 (Mcminn et al., 2015). This produces a unary relation consisting of the attribute comment. By definition, this attribute must be a primary key, yet it contains nulls (corresponding to the viewings on PG36 and PG4 by client CR56). Since this is not a base relation, the rule does not apply, and the primary key can be null. There have been several attempts to redefine this rule (see, for example, Codd, 1988; Date, 1990) (Qian, 1994).
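Entity integrity can be seen in action with a small experiment (our own, using Python's sqlite3 module). One caveat: for historical reasons SQLite permits nulls in non-INTEGER primary keys unless NOT NULL is declared explicitly, so the sketch spells it out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly on each part of the composite primary key,
# since SQLite does not otherwise enforce entity integrity on TEXT keys.
conn.execute("""
CREATE TABLE Viewing (
    clientNo   TEXT NOT NULL,
    propertyNo TEXT NOT NULL,
    viewDate   TEXT,
    comment    TEXT,
    PRIMARY KEY (clientNo, propertyNo)
)""")

conn.execute("INSERT INTO Viewing VALUES ('CR56', 'PG4', '26-May-12', NULL)")
try:
    # A null in any part of the composite primary key violates entity integrity.
    conn.execute("INSERT INTO Viewing VALUES (NULL, 'PA14', '24-May-12', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The first insert succeeds (a null comment is allowed, since comment is not part of the key); the second is rejected because clientNo, part of the primary key, is null.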

4.4.3. Referential Integrity


The second integrity rule applies to foreign keys.
Referential integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation, or the foreign key value must be wholly null (Fan & Siméon, 2003).
For example, branchNo in the Staff relation is a foreign key targeting the branchNo attribute of the home relation, Branch. It should not be possible to create a staff record with branch number B025, for instance, unless there is already a record for branch B025 in the Branch relation. However, we should be able to create a new staff record with a null branch number, to allow for the situation where a new member of staff has joined the company but has not yet been assigned to a particular branch office (Hammer & McLeod, 1975).
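The rule can be demonstrated as follows (our own sketch using Python's sqlite3 module, where foreign-key enforcement must be switched on explicitly): a foreign key value must either match a branchNo in the home relation or be wholly null:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK checks by default
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    branchNo TEXT REFERENCES Branch(branchNo)
);
INSERT INTO Branch VALUES ('B005', 'London');
""")

conn.execute("INSERT INTO Staff VALUES ('SL21', 'B005')")  # matches home relation
conn.execute("INSERT INTO Staff VALUES ('SG37', NULL)")    # wholly null is allowed
try:
    conn.execute("INSERT INTO Staff VALUES ('SA9', 'B025')")  # no such branch
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The staff member assigned to B005 and the staff member not yet assigned to any branch are both accepted; the record referencing the nonexistent branch B025 is rejected by the DBMS.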

4.4.4. General Constraints


General constraints: Additional rules, specified by the users or database administrators, that define or constrain some aspect of the enterprise represented in the database.
Users may specify additional constraints that the data must satisfy. For example, if the number of staff who may work at a branch office is limited to 20, the user should be able to specify this general constraint and expect the DBMS to enforce it (Golfarelli et al., 1999). If the number of staff currently assigned to a branch is already 20, it should then not be possible to add a new member of staff to the Staff relation. Unfortunately, the level of support for general constraints varies considerably from one system to another (Hadzilacos & Tryfona, 1992).
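A constraint of this kind can often be expressed with a trigger. The sketch below (our own, using Python's sqlite3 module) caps the number of staff per branch at 2 rather than 20, purely to keep the demonstration short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
-- General constraint: at most 2 staff per branch (2 rather than 20,
-- so the demonstration stays short).
CREATE TRIGGER branch_quota BEFORE INSERT ON Staff
WHEN (SELECT COUNT(*) FROM Staff WHERE branchNo = NEW.branchNo) >= 2
BEGIN
    SELECT RAISE(ABORT, 'branch staff quota exceeded');
END;
""")
conn.execute("INSERT INTO Staff VALUES ('SL21', 'B005')")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'B005')")
try:
    conn.execute("INSERT INTO Staff VALUES ('SA9', 'B005')")  # would exceed quota
    enforced = False
except sqlite3.IntegrityError:
    enforced = True
```

The first two insertions succeed; the third, which would exceed the branch quota, is aborted by the trigger, showing the DBMS enforcing a business rule rather than a structural constraint.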

4.5. VIEWS
In the three-level ANSI-SPARC architecture, a view is the structure of the database as it appears to a particular user. In the relational model, the word 'view' has a slightly different meaning. Rather than being the entire external model of a user's view, a view is a virtual or derived relation: a relation that does not necessarily exist in its own right, but may be dynamically derived from one or more base relations. Thus, an external model can consist of both base (conceptual-level) relations and views derived from them. This section briefly discusses views in relational systems; we examine views in more detail and show how to create and use them in SQL (Cosmadakis & Papadimitriou, 1984).

4.5.1. Terminology
The relations we have been dealing with so far in this chapter are base relations.
Base relation: A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database (Dayal & Bernstein, 1982).
We can define a view in terms of base relations:
View: The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request (Braganholo et al., 2003).
A view is a relation that appears to the user to exist and can be manipulated as if it were a base relation, but it does not have to be stored in the way base relations are (although its definition is stored in the system catalog). The contents of a view are defined as a query on one or more base relations (Pirahesh et al., 1994). Any operations on the view are automatically translated into operations on the relations from which it is derived. Views are dynamic: changes to the base relations that affect the view are immediately reflected in the view, and when users make permitted changes to the view, those changes are made to the underlying relations. In this section we describe the purpose of views and briefly examine the restrictions that apply to updates made through views. The discussion of how views are defined and processed, however, is deferred (Scholl & Schek, 1990).
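The dynamic character of views can be demonstrated directly (our own sketch using Python's sqlite3 module): the view below hides the salary attribute, and an insertion into the base relation is immediately visible through it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary REAL);
-- A view hiding the salary attribute: only its definition is stored.
CREATE VIEW StaffPublic AS SELECT staffNo, lName FROM Staff;
INSERT INTO Staff VALUES ('SL21', 'White', 30000);
""")
before = conn.execute("SELECT COUNT(*) FROM StaffPublic").fetchone()[0]

# Views are dynamic: a change to the base relation is immediately reflected.
conn.execute("INSERT INTO Staff VALUES ('SG37', 'Beech', 12000)")
after = conn.execute("SELECT COUNT(*) FROM StaffPublic").fetchone()[0]
columns = [d[0] for d in conn.execute("SELECT * FROM StaffPublic").description]
```

Users of StaffPublic see the new tuple at once but remain unaware that the salary attribute exists, which is exactly the security use of views described in the next section.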

4.5.2. Purpose of Views


The view mechanism is desirable for several reasons (Dayal & Bernstein, 1982):
•	It provides a powerful and flexible security mechanism by hiding parts of the database from certain users. Users are not aware of the existence of any attributes or tuples that are missing from the view.
•	It permits users to access data in a way that is customized to their needs, so that the same data can be seen by different users in different ways at the same time.
•	It can simplify complex operations on the base relations. For example, if a view is defined as the combination (join) of two relations, users may perform simpler operations on the view, which are translated by the DBMS into equivalent operations on the join.
A view should be designed to support the user's existing external model. Consider the following examples (Clifford & Tansel, 1985):
•	A user might need Branch tuples that include the names of managers as well as the other Branch attributes. This view is produced by joining the Branch relation with a restricted version of the Staff relation in which the staff position is 'Manager'.
•	Some staff should see Staff tuples without the salary attribute.
•	Attributes may be renamed or the order of attributes changed. For example, a user accustomed to referring to the branchNo attribute of branches by its full name, Branch Number, may see that column heading.
•	Some staff should see only the property records for those properties that they manage.
Although all these examples demonstrate that a view provides logical data independence, views allow a more significant type of logical data independence, one that supports the reorganization of the conceptual schema. For example, if a new attribute is added to a relation, existing users can remain unaware of its existence if their views are defined to exclude it. If an existing relation is rearranged or split up, a view may be defined so that users can continue to see their original views (Furtado & Casanova, 1985).

4.5.3. Updating Views


All updates to a base relation should be immediately reflected in all views that reference that base relation. Similarly, if a view is updated, the underlying base relation should be updated. There are restrictions, however, on the types of modification that can be made through views. The conditions under which most systems determine whether an update through a view is allowed are summarized below (Pellicano et al., 2018):
•	Updates are allowed through a view defined by a simple query involving a single base relation and containing either the primary key or a candidate key of the base relation.
•	Updates are not allowed through views involving multiple base relations.
•	Updates are not allowed through views involving aggregation or grouping operations.
Classes of views have been defined that are theoretically not updatable, theoretically updatable, and partially updatable. A survey on updating relational views can be found in Furtado and Casanova (1985) (Barsalou et al., 1991).
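By way of illustration (our own sketch using Python's sqlite3 module, whose views are read-only unless an INSTEAD OF trigger is supplied): the view below satisfies the first condition above, since it is defined on a single base relation and includes its primary key, so an update through it can be routed unambiguously to the base relation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary REAL);
INSERT INTO Staff VALUES ('SL21', 'White', 30000);
-- The view includes the base relation's primary key, so each view row
-- maps to exactly one base tuple and the update is unambiguous.
CREATE VIEW StaffNames AS SELECT staffNo, lName FROM Staff;
CREATE TRIGGER StaffNames_upd INSTEAD OF UPDATE ON StaffNames
BEGIN
    UPDATE Staff SET lName = NEW.lName WHERE staffNo = NEW.staffNo;
END;
""")
conn.execute("UPDATE StaffNames SET lName = 'Whyte' WHERE staffNo = 'SL21'")
base_value = conn.execute(
    "SELECT lName FROM Staff WHERE staffNo = 'SL21'").fetchone()[0]
```

The update issued against the view is translated into an update on the underlying Staff relation; a view joining several relations, or one built on aggregates, could not be handled this simply.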
REFERENCES
1. Aghili, A. A., & Menhaj, M. B., (2014). Construction and applications
of a modified fuzzy relational model. Journal of Intelligent & Fuzzy
Systems, 26(3), 1547–1555.
2. Allen, S., & Terry, E., (2005). Understanding relational modeling
terminology. Beginning Relational Data Modeling, 1, 57–87.
3. Armstrong, W. W., (1974). Dependency structures of data base
relationships. In: IFIP congress (Vol. 74, pp. 580–583).
4. Atzeni, P., Jensen, C. S., Orsi, G., Ram, S., Tanca, L., & Torlone, R.,
(2013). The relational model is dead, SQL is dead, and I don’t feel so
good myself. ACM SIGMOD Record, 42(2), 64–68.
5. Bagley, S. S., (2010). Students, teachers and alternative assessment
in secondary school: Relational models theory (RMT) in the field of
education. The Australian Educational Researcher, 37(1), 83–106.
6. Barsalou, T., Siambela, N., Keller, A. M., & Wiederhold, G., (1991).
Updating relational databases through object-based views. ACM
SIGMOD Record, 20(2), 248–257.
7. Biber, P., Hupfeld, J., & Meier, L. L., (2008). Personal values and
relational models. European Journal of Personality, 22(7), 609–628.
8. Blau, H., & McGovern, A., (2003). Categorizing unsupervised relational
learning algorithms. In: Proceedings of the Workshop on Learning
Statistical Models from Relational Data, Eighteenth International
Joint Conference on Artificial Intelligence (Vol. 1, pp. 3–9).
9. Braganholo, V. P., Davidson, S. B., & Heuser, C. A., (2003). On the
updatability of XML views over relational databases. In: WebDB (Vol.
1, pp. 31–36).
10. Brilmyer, G., (2018). Archival assemblages: Applying disability
studies’ political/relational model to archival description. Archival
Science, 18(2), 95–118.
11. Brookshire, R. G., (1993). A relational database primer. Social Science
Computer Review, 11(2), 197–213.
12. Bucciarelli, A., Ehrhard, T., & Manzonetto, G., (2009). A relational
model of a parallel and non-deterministic λ-calculus. In: International
Symposium on Logical Foundations of Computer Science (Vol. 1, pp.
107–121). Springer, Berlin, Heidelberg.
13. Cadiou, J. M., (1976). On semantic issues in the relational model of data. In: International Symposium on Mathematical Foundations of Computer Science (Vol. 1, pp. 23–38). Springer, Berlin, Heidelberg.
14. Calı, A., Calvanese, D., De Giacomo, G., & Lenzerini, M., (2004).
Data integration under integrity constraints. Information Systems,
29(2), 147–163.
15. Calì, A., Calvanese, D., Giacomo, G. D., & Lenzerini, M., (2002). Data
integration under integrity constraints. In: International Conference
on Advanced Information Systems Engineering (Vol. 1, pp. 262–279).
Springer, Berlin, Heidelberg.
16. Ceruti, M. G., (2021). A review of database system terminology. Data
Management, 13–31.
17. Chamberlin, D. D., (1976). Relational data-base management systems.
ACM Computing Surveys (CSUR), 8(1), 43–66.
18. Clifford, J., & Tansel, A. U., (1985). On an algebra for historical
relational databases: Two views. ACM SIGMOD Record, 14(4), 247–
265.
19. Cockcroft, S., (1997). A taxonomy of spatial data integrity constraints.
GeoInformatica, 1(4), 327–343.
20. Codd, E. F., (1979). Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems (TODS), 4(4),
397–434.
21. Codd, E. F., (2002). A relational model of data for large shared data
banks. In: Software Pioneers (Vol. 1, pp. 263–294). Springer, Berlin,
Heidelberg.
22. Cosmadakis, S. S., & Papadimitriou, C. H., (1984). Updates of
relational views. Journal of the ACM (JACM), 31(4), 742–760.
23. Cozman, F. G., & Mauá, D. D., (2015). Specifying probabilistic
relational models with description logics. Proceedings of the XII
Encontro Nacional de Inteligência Artificial e Computacional (ENIAC)
(Vol. 1, pp. 4–8).
24. Dayal, U., & Bernstein, P. A., (1982). On the correct translation of
update operations on relational views. ACM Transactions on Database
Systems (TODS), 7(3), 381–416.
25. Dayal, U., & Bernstein, P. A., (1982). On the updatability of network
views—Extending relational view theory to the network model.
Information Systems, 7(1), 29–46.
26. Demuth, B., & Hussmann, H., (1999). Using UML/OCL constraints
for relational database design. In: International Conference on the
Unified Modeling Language (Vol. 1, pp. 598–613). Springer, Berlin,
Heidelberg.
27. Dong, G., & Su, J., (2000). Incremental maintenance of recursive views
using relational calculus/SQL. ACM SIGMOD Record, 29(1), 44–51.
28. Ehrhard, T., (2012). The Scott model of linear logic is the extensional
collapse of its relational model. Theoretical Computer Science, 424,
20–45.
29. Fan, W., & Siméon, J., (2003). Integrity constraints for XML. Journal
of Computer and System Sciences, 66(1), 254–291.
30. Fan, X., Li, B., Li, C., Sisson, S., & Chen, L., (2019). Scalable deep generative relational model with high-order node dependence. Advances in Neural Information Processing Systems, 32.
31. Furtado, A. L., & Casanova, M. A., (1985). Updating relational views.
Query Processing in Database Systems, 127–142.
32. Gadia, S. K., (1988). A homogeneous relational model and query
languages for temporal databases. ACM Transactions on Database
Systems (TODS), 13(4), 418–448.
33. Gardner, D., Goldberg, D. H., Grafstein, B., Robert, A., & Gardner,
E. P., (2008). Terminology for neuroscience data discovery: Multi-tree
syntax and investigator-derived semantics. Neuroinformatics, 6(3),
161–174.
34. Getoor, L., & Sahami, M., (1999). Using probabilistic relational models
for collaborative filtering. In: Workshop on Web Usage Analysis and
User Profiling (WEBKDD’99) (Vol. 1, pp. 1–6).
35. Golfarelli, M., Maio, D., & Rizzi, S., (1999). Vertical fragmentation
of views in relational data warehouses. In: SEBD (Vol. 1, pp. 19–33).
36. Gougeon, N. A., (2010). Sexuality and autism: A critical review
of selected literature using a social-relational model of disability.
American Journal of Sexuality Education, 5(4), 328–361.
37. Graham, T. N., & Urnes, T., (1992). Relational views as a model for
automatic distributed implementation of multi-user applications. In:
Proceedings of the 1992 ACM Conference on Computer-Supported
Cooperative Work (Vol. 1, pp. 59–66).
38. Greenyer, J., & Kindler, E., (2010). Comparing relational model
transformation technologies: Implementing query/view/transformation
with triple graph grammars. Software & Systems Modeling, 9(1), 21–
46.
39. Grefen, P. W., & Apers, P. M., (1993). Integrity control in relational
database systems—An overview. Data & Knowledge Engineering,
10(2), 187–223.
40. Hadzilacos, T., & Tryfona, N., (1992). A model for expressing
topological integrity constraints in geographic databases. In: Theories
and Methods of Spatio-Temporal Reasoning in Geographic Space (Vol.
1, pp. 252–268). Springer, Berlin, Heidelberg.
41. Hammer, M. M., & McLeod, D. J., (1975). Semantic integrity in a
relational data base system. In: Proceedings of the 1st International
Conference on Very Large Data Bases (Vol. 1, pp. 25–47).
42. He, J. S. K. T. G., & Naughton, C. Z. D. D. J., (2008). Relational
databases for querying XML documents: Limitations and opportunities.
In: Proceedings of VLDB (Vol. 1, pp. 302–314).
43. Hofmeyr, J. H. S., (2021). A biochemically-realizable relational model
of the self-manufacturing cell. Biosystems, 207, 104463.
44. Horsburgh, J. S., Tarboton, D. G., Maidment, D. R., & Zaslavsky, I.,
(2008). A relational model for environmental and water resources data.
Water Resources Research, 44(5), 33–38.
45. Hull, R., (1986). Relative information capacity of simple relational
database schemata. SIAM Journal on Computing, 15(3), 856–886.
46. Johannesson, P., (1994). A method for transforming relational
schemas into conceptual schemas. In: Proceedings of 1994 IEEE 10th
International Conference on Data Engineering (Vol. 1, pp. 190–201).
IEEE.
47. Kazemi, S. M., & Poole, D., (2018). Bridging weighted rules and graph
random walks for statistical relational models. Frontiers in Robotics
and AI, 5, 8.
48. Kloesgen, W., & Zytkow, J. M., (1994). Machine discovery terminology.
In: KDD Workshop (Vol. 1, p. 463).
49. Lyons-Ruth, K., Melnick, S., Bronfman, E., Sherry, S., & Llanas,
L., (2004). Hostile-helpless relational models and disorganized
attachment patterns between parents and their young children: Review
of research and implications for clinical work. Attachment Issues in
Psychopathology and Intervention, 1, 65–94.
50. Martin, J. J., (2013). Benefits and barriers to physical activity for
individuals with disabilities: A social-relational model of disability
perspective. Disability and Rehabilitation, 35(24), 2030–2037.
51. Mcminn, P., Wright, C. J., & Kapfhammer, G. M., (2015). The
effectiveness of test coverage criteria for relational database schema
integrity constraints. ACM Transactions on Software Engineering and
Methodology (TOSEM), 25(1), 1–49.
52. Mereish, E. H., & Poteat, V. P., (2015). A relational model of sexual
minority mental and physical health: The negative effects of shame on
relationships, loneliness, and health. Journal of Counseling Psychology,
62(3), 425.
53. Mesquita, L. F., Anand, J., & Brush, T. H., (2008). Comparing the
resource‐based and relational views: Knowledge transfer and spillover
in vertical alliances. Strategic Management Journal, 29(9), 913–941.
54. Motik, B., Horrocks, I., & Sattler, U., (2007). Adding integrity
constraints to OWL. In: OWLED (Vol. 258, pp. 3–9).
55. Paredaens, J., De Bra, P., Gyssens, M., & Van, G. D., (2012). The
Structure of the Relational Database Model (Vol. 17, pp. 22–27).
Springer Science & Business Media.
56. Pecherer, R. M., (1975). Efficient evaluation of expressions in a
relational algebra. In: ACM Pacific (Vol. 75, pp. 44–49).
57. Pellicano, M., Ciasullo, M. V., Troisi, O., & Casali, G. L., (2018). A
journey through possible views of relational logic. In: Social Dynamics
in a Systems Perspective (Vol. 1, pp. 195–221). Springer, Cham.
58. Pierce, T., & Lydon, J. E., (2001). Global and specific relational models
in the experience of social interactions. Journal of Personality and
Social Psychology, 80(4), 613.
59. Pirahesh, H., Mitschang, B., Südkamp, N., & Lindsay, B., (1994).
Composite-object views in relational DBMS: An implementation
perspective. Information Systems, 19(1), 69–88.
60. Pirotte, A., (1982). A precise definition of basic relational notions and
of the relational algebra. ACM SIGMOD Record, 13(1), 30–45.
61. Qian, X., (1994). Inference channel-free integrity constraints in
multilevel relational databases. In: Proceedings of 1994 IEEE
Computer Society Symposium on Research in Security and Privacy
(Vol.1, pp. 158–167). IEEE.
The Relational Model 137

62. Ramanujam, S., Gupta, A., Khan, L., Seida, S., & Thuraisingham, B.,
(2009). R2D: Extracting relational structure from RDF stores. In: 2009
IEEE/WIC/ACM International Joint Conference on Web Intelligence
and Intelligent Agent Technology (Vol. 1, pp. 361–366). IEEE.
63. Rasdorf, W. J., Ulberg, K. J., & Baugh, J. W., (1987). A structure-
based model of semantic integrity constraints for relational data bases.
Engineering with Computers, 2(1), 31–39.
64. Reilly, C., (2009). The concept of knowledge in KM: A relational
model. Electronic Journal of Knowledge Management, 7(1), 145–154.
65. Schek, H. J., & Scholl, M. H., (1986). The relational model with
relation-valued attributes. Information Systems, 11(2), 137–147.
66. Scholl, M. H., & Schek, H. J., (1990). A relational object model. In:
International Conference on Database Theory (Vol. 1, pp. 89–105).
Springer, Berlin, Heidelberg.
67. Schumacher, R. F., & Fuchs, L. S., (2012). Does understanding
relational terminology mediate effects of intervention on compare
word problems?. Journal of Experimental Child Psychology, 111(4),
607–628.
68. Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A.,
& Price, T. G., (1989). Access path selection in a relational database
management system. In: Readings in Artificial Intelligence and
Databases (Vol. 1, pp. 511–522). Morgan Kaufmann.
69. Smets, M., & Jarzabkowski, P., (2013). Reconstructing institutional
complexity in practice: A relational model of institutional work and
complexity. Human Relations, 66(10), 1279–1309.
70. Smith, C. A., & Kirby, L. D., (2009). Putting appraisal in context:
Toward a relational model of appraisal and emotion. Cognition and
Emotion, 23(7), 1352–1372.
71. Sommestad, T., Ekstedt, M., & Johnson, P., (2010). A probabilistic
relational model for security risk analysis. Computers & Security,
29(6), 659–679.
72. Tansel, A. U., (2004). Temporal data modeling and integrity constraints
in relational databases. In: International Symposium on Computer
and Information Sciences (Vol. 1, pp. 459–469). Springer, Berlin,
Heidelberg.
138 The Creation and Management of Database Systems

73. Tyler, T. R., & Lind, E. A., (1992). A relational model of authority in
groups. In: Advances in Experimental Social Psychology (Vol. 25, pp.
115–191). Academic Press.
74. Vassiliadis, P., & Sellis, T., (1999). A survey of logical models for
OLAP databases. ACM SIGMOD Record, 28(4), 64–69.
75. Vermeer, M. W., & Apers, P. M., (1995). Object-oriented views of
relational databases incorporating behavior. In: DASFAA (Vol. 1, pp.
26–35).
76. Vinciquerra, K. J., (1993). Terminology Database Record
Standardization and Relational Organization in Computer-Assisted
Terminology (pp. 170–171). ASTM Special Technical Publication,
1166.
77. Vyawahare, H. R., Karde, P. P., & Thakare, V. M., (2018). A hybrid
database approach using graph and relational database. In: 2018
International Conference on Research in Intelligent and Computing in
Engineering (RICE) (Vol. 1, pp. 1–4). IEEE.
78. Yang, X., & Wang, G., (2001). Mapping referential integrity constraints
from relational databases to XML. In: International Conference on
Web-Age Information Management (Vol. 1, pp. 329–340). Springer,
Berlin, Heidelberg.
79. Yazıcı, A., & Sözat, M. İ., (1998). The integrity constraints for
similarity‐based fuzzy relational databases. International Journal of
Intelligent Systems, 13(7), 641–659.
80. Yu, K., Chu, W., Yu, S., Tresp, V., & Xu, Z., (2006). Stochastic
relational models for discriminative link prediction. Advances in
Neural Information Processing Systems, 1, 19.
81. Zimmermann, K., (1975). Different views of a data base: Coexistence
between network model and relational model. In: Proceedings of the
1st International Conference on Very Large Data Bases (Vol. 1, pp.
535–537).
CHAPTER 5
DATABASE PLANNING AND DESIGN

CONTENTS
5.1. Introduction
5.2. The Database System Development Lifecycle
5.3. Database Planning
5.4. Definition of the System
5.5. Requirements Collection and Analysis
5.6. Database Design
References
5.1. INTRODUCTION
Many computer-based systems now rely on software rather than hardware for their success. Unfortunately, the track record of software development is not particularly impressive. Software applications have proliferated in recent decades, ranging from small, relatively simple applications of a few lines of code to large, complex programs with thousands of lines. Most of these programs had to be maintained continually (Al-Kodmany, 1999). This included correcting faults that were discovered, implementing new user requirements, and modifying the software to run on new or upgraded platforms. Maintenance efforts began to absorb resources at an alarming rate. As a result, many major software projects were late, over budget, unreliable, difficult to maintain, and performed poorly (Skaar et al., 2022).
These problems led to what became known as the software crisis. Although the term was originally coined in the late 1960s, the problem has remained with us more than four decades later. As a result, some authors now refer to the software crisis as the software slump. Research conducted in the United Kingdom by OASIG, a Special Interest Group concerned with the Organizational Aspects of Information Technology, reported the following findings about software projects (OASIG, 1996) (Teng & Grover, 1992):
•	80 to 90% do not meet their performance goals;
•	about 80% are delivered late and over budget;
•	around 40% fail or are abandoned;
•	fewer than 40% fully address training and skills requirements;
•	fewer than 25% properly integrate enterprise and technology objectives; and
•	only 10 to 20% meet all their success criteria.
There are several major reasons for the failure of software projects, including (El-Mehalawi & Miller, 2003):
•	lack of a complete requirements specification;
•	lack of an appropriate development methodology; and
•	poor decomposition of the design into manageable components.
As a solution to these problems, a structured approach to software development called the Software Development Lifecycle (SDLC) or the Information Systems Lifecycle (ISLC) was proposed. When the software being developed is a database system, the lifecycle is more specifically known as the Database System Development Lifecycle (DSDLC) (Hernandez, 2013).

5.2. THE DATABASE SYSTEM DEVELOPMENT LIFECYCLE
Because a database system is a fundamental component of a larger organization-wide information system, the DSDLC is inherently linked to the lifecycle of the information system. Figure 5.1 depicts the stages of the DSDLC; alongside each stage, the figure indicates the section of this chapter that covers it (Tetlay & John, 2009).
It is important to recognize that the stages of the DSDLC are not strictly sequential; some of the preceding stages are repeated through feedback loops. For example, problems encountered during database design may require the collection and analysis of additional requirements. Since feedback loops exist between most stages, Figure 5.1 shows only some of the more prominent ones. Table 5.1 summarizes the main activities associated with each stage of the DSDLC (Ruparelia, 2010).
The lifetime doesn’t have to be complicated for tiny database systems
with a modest number of consumers. The lifecycle may get quite complex
whenever developing a medium to big database system having hundreds to
millions of consumers and thousands of questions and application programs.
The activities related to the creation of medium to large database systems are
the main focus. The primary tasks connected with every step of the DSDLC
are described in further detail in the following parts (Weitzel & Kerschberg,
1989).
Figure 5.1. The steps in the building of a database system.

Source: https://www.slideshare.net/AfrasiyabHaider/database-development-life-cycle.
Table 5.1. A Synopsis of the Principal Activities Connected with Each Phase
of the DSDLC

5.3. DATABASE PLANNING


The management activities that allow the stages of the DSDLC to be realized as efficiently and effectively as possible.
Database development should be integrated into the organization's overall IT strategy. There are three main issues to consider when developing an IS strategy (Haslum et al., 2007):
•	identification of corporate strategy and goals, followed by determination of information system requirements;
•	evaluation of current information systems (IS) to determine their strengths and weaknesses; and
•	appraisal of IT opportunities that might yield competitive advantage.
The methodologies used to address these issues are beyond the scope of this chapter; interested readers may consult Robson (1997) for a fuller discussion. An important first step in database planning is to clearly define the mission statement for the database system (Taylor Jr et al., 2001). The mission statement defines the major aims of the database system. It is usually drawn up by those in charge of the database project within the organization (such as the Owner or Director). A mission statement clarifies the purpose of the database system and provides a clearer path toward the efficient and effective development of the required system (Junk, 1998). Once the mission statement has been defined, the next step is to identify the mission objectives. Each mission objective should specify a particular task that the database system must be able to support. The expectation is that if the database system supports the mission objectives, the mission statement will be met. The mission statement and objectives may be accompanied by additional information that specifies, in general terms, the work to be done, the resources available to do it, and the money to pay for it all (Álvarez-Romero et al., 2018).
Database planning should also include the development of standards that govern how data will be collected, how the format will be specified, what documentation will be needed, and how design and implementation will proceed. Standards can be time-consuming to develop and maintain, requiring resources both to establish them and to keep them up to date (Moe et al., 2006). A well-designed set of standards, however, provides a basis for training staff and measuring quality control, and ensures that work conforms to a pattern, irrespective of staff skills and experience. For example, particular rules may govern how data items in the data dictionary are named, preventing both redundancy and inconsistency. Any legal or enterprise data requirements, such as the stipulation that some types of data must be treated confidentially, should also be documented (Sievers et al., 2012).

5.4. DEFINITION OF THE SYSTEM


This stage describes the scope and boundaries of the database system, together with the major user views.
Before attempting to design a database system, it is essential that we first identify the boundaries of the system and how it interfaces with other parts of the organization's information system. This must be done before we can move on to designing the system. We must include not only the current users and application areas but also future users and applications that may fall within the boundaries of our system. The scope and boundaries of the database system are then extended to accommodate the important new user views that the database will make possible (Noor et al., 2009).

5.4.1. User Views


Specifies the requirements of the database system from the perspective of a particular job role (such as Supervisor or Manager) or enterprise application area (such as stock control, personnel, or marketing) (Johansson et al., 2017).

Figure 5.2. User views 1, 2, and 3, and user views 5 and 6, have overlapping requirements (shown as hatched regions), whereas user view 4 has distinct requirements.

Source: https://slideplayer.com/slide/5694649/.

5.5. REQUIREMENTS COLLECTION AND ANALYSIS


Gathering and analyzing information about the part of the organization to be supported by the database system, and using this information to identify the system's requirements. This stage involves the collection and analysis of information about the part of the enterprise to be served by the database. There are various techniques, known as fact-finding techniques, for gathering the necessary information. For each major user view (that is, each job role or enterprise application area), the following information is collected (Hall & Fagen, 2017):
•	a description of the data used or generated;
•	details of how the data is to be used or generated; and
•	any additional requirements for the new database system.
This information is then analyzed to identify the requirements (or features) to be included in the new database system. These requirements are described in a set of documents collectively referred to as the acceptance criteria for the new database system (Arnold & Wade, 2015). Requirements collection and analysis is a preliminary stage of database design. The amount of information gathered depends on the nature of the problem and the organization's policies. Too much study too soon can lead to analysis paralysis; too little thought can waste both time and money through working on the wrong solution to the wrong problem (Westmark, 2004).
The information gathered at this stage may be poorly structured and include some informal requests, which must be converted into a more structured statement of requirements. This is achieved using requirements specification techniques such as Data Flow Diagrams (DFD), Structured Analysis and Design (SAD), and Hierarchical Input Process Output (HIPO) charts, all supported by documentation (Knight et al., 2003).
Identifying the required functionality of a database system is a critical task, since systems with inadequate or incomplete functionality will annoy users, leading to rejection or underuse of the system. Excessive functionality can also be problematic, as it can make a system difficult to develop, maintain, use, and learn. Another important activity in this stage is deciding how to handle the situation in which the database system has more than one user view. There are three main approaches to managing the requirements of a database system with multiple user views (Lantos, 1998):
•	the centralized approach;
•	the view integration approach; and
•	a combination of both approaches.
Figure 5.3. Multiple user views 1 to 3 are managed in a centralized manner.

5.5.1. Centralized Approach


Every user view’s needs are integrated into a single set of database system
needs. During the database designing phase, a data model describing all user
views has been built. The consolidated approach entails gathering all the
criteria for various user views into a single list. A name has been provided
to the set of user views that gives a few insights into the functional region
surrounded by all the integrated user views (Franco‐Santos et al., 2007).
A model of global data, that reflects all user views, has been built during
the database design phase. The global data model is comprised of graphics
and documents that explicitly outline the consumers’ data needs. Figure 5.3
shows a schematic depicting the administration of user views 1 to 3 utilizing
a consolidated manner. Whenever there has been a lot of intersection in
needs for every user view and the database system isn’t too complicated,
this technique is usually recommended (Gacek et al., 1995).

5.5.2. View Integration Approach


Every user’s needs are kept in separate lists. During the database design
phase, data models reflecting every user view have been built and then
148 The Creation and Management of Database Systems

combined. The view integration strategy entails treating every user view’s
needs as a distinct set of criteria. For every user view, we 1st develop a
data model in the database design step (Chen et al., 2005). A local data
model is a data model that identifies a specific user view. Every model is
comprised of documentation and diagrams that officially define the needs
of certain, though not all, database user views. The local data models are
eventually brought together to form a global data model during a later
stage of the database design process, that reflects all database user needs.
Figure 5.4 shows a schematic depicting the administration of user views 1
through 3 utilizing the view integration technique. In general, this method is
recommended (Bedogni et al., 2012).

Figure 5.4. Controlling multiple user views 1 to 3 utilizing the view integration technique.
In general, the view integration approach is preferred when there are considerable differences between user views and the database system is complex enough to justify dividing the work into more manageable parts. To manage multiple user views in some large database systems, a combination of the centralized and view integration approaches can be effective (O'Neill et al., 1977). For example, the requirements for two or more user views may first be merged using the centralized approach, producing a local logical data model. The view integration approach is then used to merge this model with other local logical data models, yielding a global logical data model. In this scenario, each local logical data model represents the requirements of two or more user views, while the final global logical data model represents the requirements of the entire set of user views for the database system (Schalock & Luckasson, 2004).

5.6. DATABASE DESIGN


The process of creating a design for a database that will support the enterprise's mission statement and mission objectives.
This section introduces the most common approaches to database design and discusses the role that data modeling plays in the design of databases and their applications. It then discusses the three phases of database design: conceptual design, logical design, and physical design (Zilio et al., 2004).

5.6.1. Database Design Methodologies


There are two main approaches to database design: bottom-up and top-down. The bottom-up approach begins at the fundamental level of attributes (that is, the properties of entities and relationships), which, through analysis of the associations between them, are grouped into relations that represent types of entities and relationships between entities (Rao et al., 2002).
The process of normalization involves identifying the required attributes and then aggregating them into normalized relations based on the functional dependencies between them. The bottom-up approach is best suited to the design of simple databases with a relatively small number of attributes. It becomes difficult, however, when designing more complex databases with a larger number of attributes, where it is impractical to identify all the functional dependencies between the attributes. Because the conceptual and logical data models for large databases may contain hundreds of attributes, it is important to adopt an approach that simplifies the design process (Navathe et al., 1984).
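The functional dependencies that drive this bottom-up grouping can be checked mechanically against sample data: a dependency X → Y holds if no two tuples agree on the X attributes but differ on the Y attributes. A minimal Python sketch (the relation and attribute names are illustrative, not from any particular case study):

```python
# Check whether a functional dependency lhs -> rhs holds in a list of
# row dictionaries: it fails as soon as two rows share the same lhs
# values but have different rhs values.

def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key; a later
        # mismatch means the dependency is violated.
        if seen.setdefault(key, val) != val:
            return False
    return True

staff = [
    {"staff_no": "S1", "name": "Ann", "branch_no": "B1", "branch_city": "Leeds"},
    {"staff_no": "S2", "name": "Bob", "branch_no": "B1", "branch_city": "Leeds"},
    {"staff_no": "S3", "name": "Cal", "branch_no": "B2", "branch_city": "York"},
]

print(fd_holds(staff, ["staff_no"], ["name"]))          # True
print(fd_holds(staff, ["branch_no"], ["branch_city"]))  # True
print(fd_holds(staff, ["branch_city"], ["staff_no"]))   # False
```

The second dependency (branch_no → branch_city) is the kind of discovery that, during normalization, suggests splitting out a separate Branch relation so the city is not repeated for every staff member. Note that sample data can only refute a dependency, not prove it; confirmed dependencies must come from the enterprise's rules.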
Furthermore, it can be difficult to establish all the attributes to be included in the data models during the early stages of identifying the data requirements for a complex database. The top-down approach is better suited to the design of complex databases. This approach starts with the development of data models containing a few high-level entities and relationships, followed by successive top-down refinement to identify lower-level entities, relationships, and their associated attributes (Shneiderman & Plaisant, 2010). The Entity-Relationship (ER) model is used to illustrate the top-down approach, beginning with the identification of entities and relationships between entities that are of interest to the organization. For example, we may begin by identifying the entities Property for Rent and Private Owner, then the relationship between them, Private Owner Keeps Property for Rent, and finally the associated attributes, such as Private Owner (owner name, number, and address) and Property for Rent (property address and number) (Naiburg et al., 2001).
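As a rough illustration of where this top-down refinement eventually leads, the entities and relationship above can be mapped onto relational tables. The SQLite sketch below assumes hypothetical column names for the attributes just mentioned (owner number, name, address; property number and address); the Private Owner Keeps Property for Rent relationship becomes a foreign key, and the sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

# One table per entity; attribute names are an assumed mapping of the
# ER example in the text.
conn.execute("""
    CREATE TABLE PrivateOwner (
        owner_no TEXT PRIMARY KEY,   -- owner number
        name     TEXT NOT NULL,      -- owner name
        address  TEXT                -- owner address
    )""")
conn.execute("""
    CREATE TABLE PropertyForRent (
        property_no TEXT PRIMARY KEY,  -- property number
        address     TEXT NOT NULL,     -- property address
        owner_no    TEXT NOT NULL REFERENCES PrivateOwner(owner_no)
    )""")

# The 1:* "Keeps" relationship is represented by the foreign key above.
conn.execute("INSERT INTO PrivateOwner VALUES ('CO93', 'T. Shaw', '12 Park Pl')")
conn.execute("INSERT INTO PropertyForRent VALUES ('PA14', '16 Holhead', 'CO93')")

print(conn.execute(
    "SELECT p.property_no, o.name FROM PropertyForRent p "
    "JOIN PrivateOwner o ON o.owner_no = p.owner_no").fetchall())
# -> [('PA14', 'T. Shaw')]
```

The top-down refinement stops at attributes; the choice of SQL types and keys shown here already belongs to the later logical and physical design phases discussed below.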
A high-level data model is thus built using the concepts of the ER model. Other database design approaches include the inside-out approach and the mixed strategy approach. The inside-out approach is related to the bottom-up approach, but differs in that it first identifies a set of major entities and then spreads out to consider other entities, relationships, and attributes associated with those identified first. The mixed strategy approach uses both the top-down and bottom-up approaches for different parts of the model before finally combining all parts (Schema, 1995).

5.6.2. Data Modeling


The two main purposes of data modeling are to assist in understanding the meaning (semantics) of the data and to facilitate communication about the information requirements. Building a data model requires answering questions about entities, relationships, and attributes. In doing so, the designers discover the semantics of the enterprise's data, which exist whether or not they happen to be recorded in a formal data model. Entities, relationships, and attributes are fundamental to all enterprises; however, their meaning may remain poorly understood until they have been correctly documented. A data model makes it easier to understand the meaning of the data, and we model data to ensure that users understand (Wiederhold, 1983):
•	each user's perspective of the data;
•	the use of data across different user views; and
•	the nature of the data itself, independent of its physical representations.
Data models can also be used to convey the designer's understanding of the information requirements of the enterprise. Provided both parties are familiar with the notation used in the model, this supports communication between the designers and the users. Increasingly, enterprises are standardizing the way they model data by selecting a particular approach to data modeling and using it in all database development projects. The most popular high-level data model used in database design is based on the concepts of the Entity-Relationship (ER) model, and this is also the model we use in this chapter (Hernandez, 2013).
Table 5.2. The Requirements for an Optimum Data Model

5.6.3. Database Design Phases


The three main phases of database design are conceptual, logical, and physical design. The first phase, known as conceptual database design, is the process of constructing a model of an enterprise's data independent of all physical considerations; it involves creating a conceptual data model of the part of the enterprise we wish to model. The data model is built using the information documented in the users' requirements specification. Conceptual database design is entirely independent of implementation details such as the target DBMS software, application programs, hardware platform, programming languages, or any other physical considerations (Chin & Ozsoyoglu, 1981).
Throughout the process of developing a conceptual data model, the model is tested and validated against the users' requirements. The conceptual data model of the enterprise then serves as a source of information for the next phase, logical database design (Teng & Grover, 1992).
Logical database design is the second phase of the database design process. This phase results in the construction of a logical data model of the part of the enterprise we are attempting to model. The conceptual data model created in the previous phase is refined and then mapped onto a logical data model. The logical data model is based on the target data model for the database (for example, the relational data model) (Navathe et al., 1986).
Whereas a conceptual data model is independent of all physical considerations, a logical data model is produced with knowledge of the underlying data model of the target DBMS. In other words, we know whether the DBMS is relational, hierarchical, network, or object-oriented; however, we ignore any other aspects of the chosen DBMS, in particular physical details such as storage structures and indexes (Klochkov et al., 2016).
Throughout the development of a logical data model, the model is validated and tested against the users' requirements. The technique of normalization is used to test the correctness of a logical data model. Normalization ensures that the relations derived from the data model do not contain redundant data, which could cause update anomalies when implemented. The logical data model should also be examined to ensure that it supports the transactions requested by the users (Letkowski, 2015).
The logical data model is a source of information for the next phase, physical database design. It provides the physical database designer with a vehicle for making trade-offs that are very important to efficient database design. The logical model also plays an important role during the operational maintenance stage of the DSDLC, which follows the design phase. Properly maintained and kept up to date, the data model allows future changes to be reflected in the database accurately and efficiently (Finkelstein et al., 1988).
The third and final phase of the database design process is physical database design, in which the designer decides how the database is to be implemented. The previous phase of database design involved the development of a logical structure for the database, which describes the relations and enterprise constraints (Chowdary et al., 2008). Although this structure is DBMS-independent, it is developed around a particular data model, such as the relational, network, or hierarchical model. However, before the physical database design can begin, we must first select the target DBMS; consequently, physical design is tailored to a specific DBMS. There is feedback between physical and logical design, because decisions taken during physical design to improve performance can affect the structure of the logical data model (Dey et al., 1999).
The main objective of physical database design is to describe the physical implementation of the logical database design. For the relational model, this involves (Carroll, 1987):
•	creating a set of relational tables and the constraints on those tables from the information presented in the logical data model;
•	identifying the specific storage structures and access methods for the data to achieve optimum performance for the database system; and
•	designing security protection for the system.
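The first two of these tasks can be sketched for a relational DBMS. The SQLite example below (table, column, and index names are invented for illustration) derives a base table with its constraints from a logical model and then adds an index as a chosen access structure; SQLite has no GRANT statement, so the security task would typically be handled at the application or server level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Task 1: a base table plus enterprise constraints taken from the
# logical data model (primary key, NOT NULL, a CHECK constraint).
conn.execute("""
    CREATE TABLE PropertyForRent (
        property_no TEXT PRIMARY KEY,
        city        TEXT NOT NULL,
        rent        REAL CHECK (rent > 0)   -- enterprise constraint
    )""")

# Task 2: choose an access structure for an expected frequent query
# pattern, e.g. lookups by city.
conn.execute("CREATE INDEX idx_property_city ON PropertyForRent (city)")

conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?, ?)",
                 [("PA14", "Aberdeen", 650.0), ("PG4", "Glasgow", 350.0)])

# Ask the query planner how it would evaluate a lookup by city; with the
# index in place it can search rather than scan the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM PropertyForRent WHERE city = 'Glasgow'"
).fetchall()
print(plan)
```

In a full server DBMS, the third task would add statements such as GRANT and REVOKE on these tables; the trade-off behind Task 2 is that each index speeds reads on its columns at the cost of extra storage and slower updates.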
For larger systems, conceptual and logical database design should ideally be kept separate from physical design, for three main reasons (Ling & Dobbie, 2004):
•	it deals with a different subject matter – the what, rather than the how;
•	it is performed at a different time – the what must be understood before the how can be determined; and
•	it requires different skills, which are often found in different people.
Database design is an iterative process that has a starting point and an almost endless procession of refinements; it is best regarded as a learning process. As the designers come to understand the workings of the enterprise and the meanings of its data, and express that understanding in the selected data models, the information gained may well require changes to other parts of the design (Kent & Williams, 1993). In particular, conceptual and logical database design are critical to the overall success of the system. If the designs do not accurately represent the enterprise, it will be extremely difficult, if not impossible, to define all the required user views or to maintain database integrity. It may even prove difficult to define the physical implementation or to keep the system performing at an acceptable level. On the other hand, the ability to handle change is one hallmark of good database design, so it is worthwhile to invest the time and effort required to produce the best possible design (Atassi et al., 2014).
REFERENCES
1. Al-Kodmany, K., (1999). Using visualization techniques for enhancing
public participation in planning and design: Process, implementation,
and evaluation. Landscape and Urban Planning, 45(1), 37–45.
2. Álvarez-Romero, J. G., Mills, M., Adams, V. M., Gurney, G. G.,
Pressey, R. L., Weeks, R., & Storlie, C. J., (2018). Research advances
and gaps in marine planning: Towards a global database in systematic
conservation planning. Biological Conservation, 227, 369–382.
3. Arnold, R. D., & Wade, J. P., (2015). A definition of systems thinking:
A systems approach. Procedia Computer Science, 44, 669–678.
4. Atassi, N., Berry, J., Shui, A., Zach, N., Sherman, A., Sinani, E., &
Leitner, M., (2014). The PRO-ACT database: Design, initial analyses,
and predictive features. Neurology, 83(19), 1719–1725.
5. Bedogni, A., Fusco, V., Agrillo, A., & Campisi, G., (2012). Learning
from experience. Proposal of a refined definition and staging system
for bisphosphonate-related osteonecrosis of the jaw (BRONJ). Oral
Diseases, 18(6), 621.
6. Carroll, J. M., (1987). Strategies for extending the useful lifetime of
DES. Computers & Security, 6(4), 300–313.
7. Chen, C. H., Khoo, L. P., & Yan, W., (2005). PDCS—a product
definition and customization system for product concept development.
Expert Systems with Applications, 28(3), 591–602.
8. Chin, F. Y., & Ozsoyoglu, G., (1981). Statistical database design. ACM
Transactions on Database Systems (TODS), 6(1), 113–139.
9. Chowdary, V. M., Chandran, R. V., Neeti, N., Bothale, R. V., Srivastava,
Y. K., Ingle, P., & Singh, R., (2008). Assessment of surface and sub-
surface waterlogged areas in irrigation command areas of bihar state
using remote sensing and GIS. Agricultural Water Management, 95(7),
754–766.
10. Dey, D., Storey, V. C., & Barron, T. M., (1999). Improving database
design through the analysis of relationships. ACM Transactions on
Database Systems (TODS), 24(4), 453–486.
11. El-Mehalawi, M., & Miller, R. A., (2003). A database system of
mechanical components based on geometric and topological similarity.
Part I: Representation. Computer-Aided Design, 35(1), 83–94.
12. Finkelstein, S., Schkolnick, M., & Tiberio, P., (1988). Physical database
design for relational databases. ACM Transactions on Database
Systems (TODS), 13(1), 91–128.
13. Franco‐Santos, M., Kennerley, M., Micheli, P., Martinez, V., Mason,
S., Marr, B., & Neely, A., (2007). Towards a definition of a business
performance measurement system. International Journal of Operations
& Production Management, 1, 2–9.
14. Gacek, C., Abd-Allah, A., Clark, B., & Boehm, B., (1995). On the
definition of software system architecture. In: Proceedings of the First
International Workshop on Architectures for Software Systems (Vol. 1,
pp. 85–94).
15. Hall, A. D., & Fagen, R. E., (2017). Definition of system. In: Systems
Research for Behavioral Science Systems Research (Vol. 1, pp. 81–92).
Routledge.
16. Haslum, P., Botea, A., Helmert, M., Bonet, B., & Koenig, S., (2007).
Domain-independent construction of pattern database heuristics for
cost-optimal planning. In: AAAI (Vol. 7, pp. 1007–1012).
17. Hernandez, M. J., (2013). Database Design for Mere Mortals: A
Hands-on Guide to Relational Database Design. Pearson Education.
18. Hernandez, M. J., (2013). Database Design for Mere Mortals: A
Hands-on Guide to Relational Database Design (Vol. 1, pp. 2–6).
Pearson Education.
19. Johansson, Å., Skeie, Ø. B., Sorbe, S., & Menon, C., (2017). Tax
Planning by Multinational Firms: Firm-Level Evidence from a Cross-
Country Database, 1, 2–5.
20. Junk, M., (1998). Domain of definition of Levermore’s five-moment
system. Journal of Statistical Physics, 93(5), 1143–1167.
21. Kent, A., & Williams, J. G., (1993). Encyclopedia of Microcomputers:
Multistrategy Learning to Operations Research: Microcomputer
Applications (Vol. 1, pp. 2–9). CRC Press.
22. Klochkov, Y., Klochkova, E., Antipova, O., Kiyatkina, E., Vasilieva, I.,
& Knyazkina, E., (2016). Model of database design in the conditions of
limited resources. In: 2016 5th International Conference on Reliability,
Infocom Technologies and Optimization (Trends and Future Directions)
(ICRITO) (Vol. 1, pp. 64–66). IEEE.
23. Knight, J. C., Strunk, E. A., & Sullivan, K. J., (2003). Towards a
rigorous definition of information system survivability. In: Proceedings
DARPA Information Survivability Conference and Exposition (Vol. 1,
pp. 78–89). IEEE.
24. Lantos, P. L., (1998). The definition of multiple system atrophy:
A review of recent developments. Journal of Neuropathology and
Experimental Neurology, 57(12), 1099.
25. Letkowski, J., (2015). Doing database design with MySQL. Journal of
Technology Research, 6, 1.
26. Ling, T. W., & Dobbie, G., (2004). Semistructured Database Design
(Vol. 1, pp. 2–9). Springer Science & Business Media.
27. Moe, S., Drüeke, T., Cunningham, J., Goodman, W., Martin, K., Olgaard,
K., & Eknoyan, G., (2006). Definition, evaluation, and classification
of renal osteodystrophy: A position statement from kidney disease:
Improving global outcomes (KDIGO). Kidney International, 69(11),
1945–1953.
28. Naiburg, E., Naiburg, E. J., & Maksimchuck, R. A., (2001). UML for
Database Design (Vol. 1, pp. 2–4). Addison-Wesley Professional.
29. Navathe, S., Ceri, S., Wiederhold, G., & Dou, J., (1984). Vertical
partitioning algorithms for database design. ACM Transactions on
Database Systems (TODS), 9(4), 680–710.
30. Navathe, S., Elmasri, R., & Larson, J., (1986). Integrating user views
in database design. Computer, 19(01), 50–62.
31. Noor, A. M., Alegana, V. A., Gething, P. W., & Snow, R. W., (2009). A
spatial national health facility database for public health sector planning
in Kenya in 2008. International Journal of Health Geographics, 8(1),
1–7.
32. O’Neill, J. P., Brimer, P. A., Machanoff, R., Hirsch, G. P., & Hsie, A. W.,
(1977). A quantitative assay of mutation induction at the hypoxanthine-
guanine phosphoribosyl transferase locus in Chinese hamster ovary
cells (CHO/HGPRT system): Development and definition of the
system. Mutation Research/Fundamental and Molecular Mechanisms
of Mutagenesis, 45(1), 91–101.
33. Rao, J., Zhang, C., Megiddo, N., & Lohman, G., (2002). Automating
physical database design in a parallel database. In: Proceedings of the
2002 ACM SIGMOD International Conference on Management of
Data (Vol. 1, pp. 558–569).
34. Ruparelia, N. B., (2010). Software development lifecycle models.
ACM SIGSOFT Software Engineering Notes, 35(3), 8–13.
35. Schalock, R. L., & Luckasson, R., (2004). American association on
mental retardation’s definition, classification, and system of supports
and its relation to international trends and issues in the field of
intellectual disabilities. Journal of Policy and Practice in Intellectual
Disabilities, 1(3, 4), 136–146.
36. Schema, C., (1995). Relational Database Design (Vol. 1, pp. 2–9).
Prentice Hall Austria.
37. Shneiderman, B., & Plaisant, C., (2010). Designing the User Interface:
Strategies for Effective Human-Computer Interaction (Vol. 1, pp. 2, 3).
Pearson Education India.
38. Sievers, S., Ortlieb, M., & Helmert, M., (2012). Efficient implementation
of pattern database heuristics for classical planning. In: International
Symposium on Combinatorial Search (Vol. 3, No. 1, pp. 1–5).
39. Skaar, C., Lausselet, C., Bergsdal, H., & Brattebø, H., (2022). Towards
a LCA database for the planning and design of zero-emissions
neighborhoods. Buildings, 12(5), 512.
40. Taylor, Jr. F. B., Toh, C. H., Hoots, K. W., Wada, H., & Levi, M.,
(2001). Towards definition, clinical and laboratory criteria, and a
scoring system for disseminated intravascular coagulation. Thrombosis
and Haemostasis, 86(11), 1327–1330.
41. Teng, J. T. C., & Grover, V., (1992). Factors influencing database
planning: An empirical study. Omega, 20(1), 59–72.
42. Teng, J. T., & Grover, V., (1992). An empirical study on the determinants
of effective database management. Journal of Database Management
(JDM), 3(1), 22–34.
43. Tetlay, A., & John, P., (2009). Determining the Lines of System
Maturity, System Readiness and Capability Readiness in the System
Development Lifecycle, 1, 2–6.
44. Weitzel, J. R., & Kerschberg, L., (1989). Developing knowledge-
based systems: Reorganizing the system development life cycle.
Communications of the ACM, 32(4), 482–488.
45. Westmark, V. R., (2004). A definition for information system
survivability. In: 37th Annual Hawaii International Conference on
System Sciences, 2004: Proceedings of the (Vol. 1, p. 10). IEEE.
46. Wiederhold, G., (1983). Database Design (Vol. 1077, pp. 4–9). New
York: McGraw-Hill.
47. Zilio, D. C., Rao, J., Lightstone, S., Lohman, G., Storm, A., Garcia-
Arellano, C., & Fadden, S., (2004). DB2 design advisor: Integrated
automatic physical database design. In: Proceedings of the Thirtieth
International Conference on Very Large Data Bases (Vol. 1, pp. 1087–
1097).
CHAPTER 6
DATA MANIPULATION

CONTENTS
6.1. Introduction..................................................................................... 160
6.2. Introduction to SQL......................................................................... 160
6.3. Writing SQL Commands.................................................................. 165
6.4. Data Manipulation........................................................................... 167
References.............................................................................................. 173
6.1. INTRODUCTION
We provided an in-depth description of such relational models as well
as relational languages. Query Language, as well as SQL which is more
generally known, is a special language which has arisen as a result of the
evolution of the relational paradigm. In the recent past, SQL has emerged
as the dominant language for relational database management systems
(RDBMSs). The Structured Query Language (SQL) specification was first
developed in 1986 first by American National Standards Institute (ANSI),
and it has been later approved for use internationally and accepted in
1987 even by International Organization for Standardization (ISO) (ISO,
1987) (Warnes et al., 2014). SQL is presently supported by more than one
hundred Database Systems, and it can operate on a wide variety of hardware
platforms, including personal computers to mainframes. Even though SQL
is now rather important, we do not make an effort to cover all aspects of
the language because this particular standard is somewhat complicated. In
this chapter, we will concentrate on the expressions of a language that are
responsible for data modification (Chatterjee & Segev, 1991).

Figure 6.1. A graphical example of the manipulation of data files.

Source: https://www.solvexia.com/blog/5-top-tips-for-data-manipulation.

6.2. INTRODUCTION TO SQL


In this section, we discuss the objectives of SQL, provide a brief history
of the language, and explain why the language is so important to database
systems (Kim, 1992).
Figure 6.2. SQL: A basic overview.

Source: https://www.javatpoint.com/dbms-sql-introduction.

6.2.1. Objectives of SQL


Ideally, a relational language should allow a user to (Melton & Simon, 2001):
• create the database and relation structures;
• perform basic data management tasks, such as the insertion,
modification, and deletion of data from the relations; and
• perform both simple and complex queries.
A language must perform these tasks with minimal user effort, and its
command structure and syntax must be relatively easy to learn. Finally,
the language must be portable; that is, it must conform to some widely
accepted standard so that we can use the same command structure and
syntax when we move from one database management system (DBMS)
to the next. SQL was designed to satisfy these requirements (Emerson et al.,
1989).
SQL is an example of a transform-oriented language, that is, a language
designed to use relations to transform inputs into required outputs.
As a language, the ISO SQL standard has two major components (Lans,
2006):
• a Data Definition Language (DDL) for defining the database
structure and controlling access to the data;
• a Data Manipulation Language (DML) for retrieving and updating
data.
Until SQL:1999, SQL contained only these definition and manipulation
commands; it did not contain flow-of-control statements such as IF …
THEN … ELSE, GO TO, or DO … WHILE. These had to be implemented
in a programming or job-control language, or performed interactively by
the decisions of the user. Owing to this lack of computational
completeness, SQL can be used in two ways. The first is to use SQL
interactively, by entering statements at a terminal. The second is to
embed SQL statements in a procedural language (Ali, 2010).
SQL is a relatively easy language to learn (Kofler, 2005):
• It is a non-procedural language: you specify what information you
require, rather than how to get it. In other words, SQL does not
require you to specify the access methods to the data.
• Like most modern languages, SQL is essentially free-format, which
means that parts of statements do not have to be typed at particular
locations on the screen.
The command structure consists of standard English words such as
CREATE TABLE, INSERT, and SELECT. For example (Kumar et al., 2014):
– CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15),
salary DECIMAL(7,2));
– INSERT INTO Staff VALUES (‘SG16’, ‘Brown’, 8300);
– SELECT staffNo, lName, salary
FROM Staff
WHERE salary > 10000;
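These three statements can be exercised end to end with any SQL engine. The sketch below runs them against an in-memory SQLite database from Python; SQLite is used here purely as a convenient interpreter, and the second row ('SL21') is invented so that the final query has something to return:

```python
import sqlite3

# Self-contained, in-memory database.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Create the table structure (DDL).
cur.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), "
    "salary DECIMAL(7,2))"
)

# Insert rows (DML): 'SG16' is the book's example; 'SL21' is an extra,
# hypothetical row above the salary threshold.
cur.execute("INSERT INTO Staff VALUES ('SG16', 'Brown', 8300)")
cur.execute("INSERT INTO Staff VALUES ('SL21', 'White', 30000)")

# Query: only staff earning more than 10000 are returned.
cur.execute("SELECT staffNo, lName, salary FROM Staff WHERE salary > 10000")
rows = cur.fetchall()
print(rows)  # [('SL21', 'White', 30000)]
```

Note that SQLite does not enforce the declared VARCHAR and DECIMAL types, but the statements themselves are standard SQL.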
SQL can be used by a range of users, including database administrators
(DBAs), management personnel, application developers, and many other
types of end-user (de Haan, 2005).
An international standard now exists for the SQL language, making it
both the formal and the de facto standard language for defining and
using relational databases (ISO, 1992, 1999a; Kriegel, 2011).
6.2.2. History of SQL


The history of the relational model (and indirectly SQL) began with the
publication of the seminal paper by E. F. Codd, written while he was
working at IBM’s Research Laboratory in San José (Codd, 1970). In
1974, D. Chamberlin, also of the IBM San José Laboratory, defined the
Structured English Query Language, or SEQUEL. A revised version,
SEQUEL/2, was defined in 1976, but the name was subsequently changed
to SQL for legal reasons (Chamberlin & Boyce, 1974; Chamberlin et
al., 1976). Although the official pronunciation is ‘S-Q-L,’ many people
still pronounce SQL as ‘See-Quel’ (Harkins & Reid, 2002).
IBM produced a prototype DBMS based on SEQUEL/2, called System R
(Astrahan et al., 1976). The purpose of this prototype was to validate the
feasibility of the relational model. Besides its other successes, one of the
most notable results attributed to this project was the development of
SQL. However, the roots of SQL lie in the language SQUARE (Specifying
Queries As Relational Expressions), which predates the System R
project. SQUARE was designed as a research language to implement
relational algebra with English sentences (Boyce et al., 1975).
In the late 1970s, the database system Oracle, produced by what is now
called the Oracle Corporation, was probably the first commercial
implementation of a relational DBMS based on SQL. Shortly afterwards
came INGRES, with a query language called QUEL, which, although
more structured than SQL, is somewhat less English-like. When SQL
emerged as the standard database language for relational systems,
INGRES was converted into an SQL-based DBMS. IBM produced its
first commercial RDBMS, SQL/DS, for the DOS/VSE and VM/CMS
environments in 1981 and 1982, respectively, followed by DB2 for the
MVS environment in 1983 (Haan et al., 2009).
In 1982, ANSI began work on a Relational Database Language (RDL)
based on a concept paper from IBM. ISO joined the effort in 1983, and
the two groups worked together to define a standard for SQL. (The name
RDL was dropped in 1984, and the draft standard reverted to a form
closer to the existing implementations of SQL.) The first ISO standard,
published in 1987, attracted a considerable degree of criticism. Date, an
influential researcher in this field, claimed that important features such
as referential integrity constraints and certain relational operators had
been omitted (Gorman, 2014; Leng & Terpstra, 2010). He also pointed
out that the language was extremely redundant; in other words, the same
query could be written in several different ways (Date, 1986, 1987a,
1990). Much of the criticism was valid, and the standards bodies had
recognized this before the standard was published. It was decided,
however, that it was more important to release a standard as early as
possible, to establish a common base from which the language and the
implementations could develop, than to wait until all the features that
people felt should be present could be defined and agreed upon (Willis,
2003).
An ‘Integrity Enhancement Feature’ was defined in an addendum to the
ISO standard published in 1989 (ISO, 1989). In 1992, the first major
revision of the ISO standard appeared, referred to as SQL2 or SQL-92
(ISO, 1992). Although some features were being defined in the standard
for the first time, many of them had already been implemented, in part or
in a similar form, in one or more of the many SQL implementations. The
next release of the standard, commonly referred to as SQL:1999, did not
become official until 1999 (ISO, 1999a). This release included additional
features for object data management. A further release, SQL:2003,
appeared in late 2003 (Roof & Fergus, 2003).
Features provided by vendors above and beyond the standard are called
extensions. For example, the standard specifies six different data types
for data in an SQL database; many implementations supplement this list
with a variety of extensions. Each implementation of SQL is called a
dialect. No two dialects are exactly alike, and currently no dialect exactly
matches the ISO standard. Moreover, as database vendors introduce new
functionality, they expand their SQL dialects and move them even
further apart (Jesse, 2018).
However, the central core of the SQL language is showing signs of
becoming more standardized. To claim conformance with the SQL:2003
standard, a vendor must implement a set of features known as Core SQL.
Many of the remaining features are divided into packages; for example,
there are packages for object features and OLAP (Online Analytical
Processing). Although SQL was originally an IBM concept, its
importance soon motivated other vendors to create their own
implementations. Today there are literally hundreds of SQL-based
products, with new ones being released regularly (El Agha et al., 2018).

6.2.3. Importance of SQL


SQL is the first and, so far, the only widely accepted standard database
language. The only other standardized database language, the Network
Database Language (NDL), based on the CODASYL network model,
has few followers (Thomas et al., 1977). Nearly every major current
vendor provides database products based on SQL or with an SQL
interface, and most are represented on at least one of the standard-making
bodies. There is a huge investment in the SQL language, both by vendors
and by users. It has been adopted by many large and influential
organizations as part of their structural arrangements, for example by
the X/OPEN consortium for UNIX standards, and it is part of IBM’s
Systems Application Architecture (SAA) (Der Lans, 2007).
SQL has also become a Federal Information Processing Standard
(FIPS), to which conformance is required for all sales of DBMSs to the
US government. The SQL Access Group, a consortium of vendors,
defined a set of enhancements to SQL that would support interoperability
across disparate systems. SQL is also used in other standards, and it
even influences the development of other standards as a definitional tool
(Rose, 1989). Examples include ISO’s Information Resource Dictionary
System (IRDS) standard and Remote Data Access (RDA) standard.
Research into the language remains strong, providing a theoretical basis
for the language and the techniques needed to implement it successfully.
This is especially true of query processing, data distribution, and
security. There are now specialized implementations of SQL directed at
new markets, such as OnLine Analytical Processing (OLAP)
(Ntagwabira & Kang, 2010).

6.2.4. Terminology
The ISO SQL standard uses the terms tables, rows, and columns rather
than the formal terms relations, tuples, and attributes. In our presentation
of SQL we mostly use the ISO terminology. SQL also departs from the
definition of the relational model in other respects. For example, SQL
allows duplicate rows in the table produced by a SELECT statement,
imposes an ordering on the columns, and allows the user to order the
rows of a result table (Duncan, 2018).

6.3. WRITING SQL COMMANDS


In this section, we go through the structure of an SQL statement and the
notation we use to define the format of the various SQL constructs. An
SQL statement consists of reserved words and user-defined words.
Reserved words are a fixed part of the SQL language and have a fixed
meaning; they must be spelled exactly as required and cannot be split
across lines. User-defined words are made up by the user (according to
certain syntax rules) and represent the names of various database
objects, such as tables, columns, views, and indexes (Ma et al., 2019).
The words in a statement are built according to a set of syntax rules.
Although the standard does not require it, many dialects of SQL require
the use of a statement terminator (usually the semicolon ‘;’) to mark the
end of each SQL statement. Most components of an SQL statement are
case-insensitive, which means that letters can be typed in either upper- or
lowercase. The one important exception is that literal character data must
be typed exactly as it appears in the database. For example, if we store a
person’s surname as ‘SMITH’ and then search for it using the string
‘Smith,’ that row will not be found (Kearns et al., 1977).
Because SQL is free-format, an SQL statement or set of statements is
more readable if indentation and lineation are used. For example
(Saisanguansat & Jeatrakul, 2016):
• each clause in a statement should begin on a new line;
• the beginning of each clause should line up with the beginning of
the other clauses; and
• if a clause has several parts, each should appear on a separate line
and be indented under the start of the clause to show the relationship.
Throughout this and the following section, we use the following extended
form of the Backus Naur Form (BNF) notation to define SQL statements
(Julavanich et al., 2019):
• reserved words are written in uppercase letters and must be spelled
exactly as shown;
• user-defined words are written in lowercase letters;
• a vertical bar (|) indicates a choice among alternatives, for example,
a | b | c;
• curly braces indicate a required element, for example, {a}; and
• square brackets indicate an optional element, for example, [a].
The ellipsis (…) is used to indicate optional repetition of an item zero
or more times.
For example (Otair et al., 2018):
{a | b} [, c . . .]
means either a or b, followed by zero or more repetitions of c, separated
by commas.
In practice, the DDL statements are used to create the database structure
(the tables) and the access mechanisms (what each user can legally
access), and then the DML statements are used to populate and query the
tables. However, in this chapter we present the DML statements before
the DDL statements, to reflect the greater importance of DML statements
to the general user (Batra, 2018).

6.4. DATA MANIPULATION


This section looks at the SQL DML statements, namely (Piyayodilokchai
et al., 2013):
• SELECT – to query data in the database;
• INSERT – to insert data into a table;
• UPDATE – to update data in a table; and
• DELETE – to delete data from a table.
Owing to the complexity of the SELECT statement and the relative
simplicity of the other DML statements, we devote most of this section to
the SELECT statement and its various formats. We begin with simple
queries and successively add more complexity to show how more
complicated queries that use sorting, grouping, aggregates, and queries
on multiple tables can be generated. The INSERT, UPDATE, and
DELETE statements are discussed at the end of the chapter (Sarhan et
al., 2017).
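As a minimal round trip through all four statements, the sketch below uses Python's sqlite3 module on a cut-down, three-column Staff table; the staff number and figures are invented for the demonstration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), "
    "salary DECIMAL(7,2))"
)

# INSERT -- put a row into the table.
cur.execute("INSERT INTO Staff VALUES ('SG37', 'Beech', 12000)")

# UPDATE -- modify data already in the table (a 5% raise).
cur.execute("UPDATE Staff SET salary = salary * 1.05 WHERE staffNo = 'SG37'")

# SELECT -- query the data.
cur.execute("SELECT salary FROM Staff WHERE staffNo = 'SG37'")
new_salary = cur.fetchone()[0]

# DELETE -- remove the row again, leaving an empty table.
cur.execute("DELETE FROM Staff WHERE staffNo = 'SG37'")
remaining = cur.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]

print(new_salary, remaining)
```

After the UPDATE, the stored salary is 12,600 (12,000 × 1.05), and after the DELETE the table holds no rows.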
The SQL statements are illustrated using the following tables from the
case study:
Branch (branchNo, street, city, postcode)
Staff (staffNo, fName, lName, position, sex, DOB, salary, branchNo)
PropertyForRent (propertyNo, street, city, postcode, type, rooms, rent,
ownerNo, staffNo, branchNo)
Client (clientNo, fName, lName, telNo, prefType, maxRent)
Literals: Before we discuss the SQL DML statements, it is necessary to
understand the concept of literals. Literals are constants that are used in
SQL statements. There are different forms of literal for every data type
supported by SQL. However, for simplicity, we can distinguish between
literals that are enclosed in single quotation marks and those that are
not. All non-numeric data values must be enclosed in single quotes;
numeric data values must not be. For example, we could use literals to
insert data into a table (Zhang H. & Zhang X., 2018):
INSERT INTO PropertyForRent (propertyNo, street, city, postcode,
type, rooms, rent, ownerNo, staffNo, branchNo)
VALUES (‘PA14’, ‘16 Holhead’, ‘Aberdeen’, ‘AB7 5SU’, ‘House’, 6,
650.00, ‘CO46’, ‘SA9’, ‘B007’);
The value in column rooms is an integer literal and the value in column
rent is a decimal literal; neither is enclosed in quotation marks. All the
other columns are character strings, which are enclosed in quotation
marks (Kline et al., 2008).
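The same INSERT can be checked mechanically. The sketch below feeds it to SQLite through Python and reads the row back; note that only the numeric literals 6 and 650.00 go unquoted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE PropertyForRent ("
    "propertyNo VARCHAR(5), street VARCHAR(25), city VARCHAR(15), "
    "postcode VARCHAR(8), type VARCHAR(10), rooms INTEGER, "
    "rent DECIMAL(6,2), ownerNo VARCHAR(5), staffNo VARCHAR(5), "
    "branchNo VARCHAR(4))"
)

# Character literals in single quotes; numeric literals (6, 650.00) unquoted.
cur.execute(
    "INSERT INTO PropertyForRent "
    "(propertyNo, street, city, postcode, type, rooms, rent, "
    " ownerNo, staffNo, branchNo) "
    "VALUES ('PA14', '16 Holhead', 'Aberdeen', 'AB7 5SU', 'House', 6, "
    "650.00, 'CO46', 'SA9', 'B007')"
)

# Read the numeric columns back to confirm they round-trip as numbers.
cur.execute("SELECT rooms, rent FROM PropertyForRent WHERE propertyNo = 'PA14'")
rooms, rent = cur.fetchone()
print(rooms, rent)
```

Both values come back as Python numbers, not strings, confirming they were stored as numeric data.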
Simple Queries: The purpose of the SELECT statement is to retrieve and
display data from one or more database tables. It is an extremely
powerful command, capable of performing the equivalent of the
relational algebra’s Selection, Projection, and Join operations in a single
statement. SELECT is the most frequently used SQL command and has
the following general form (Budiman et al., 2017):

SELECT [DISTINCT | ALL] {* | [columnExpression [AS newName]] [, . . .]}
FROM TableName [alias] [, . . .]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList];

Here, columnExpression represents a column name or an expression,
TableName is the name of an existing database table or view to which
you have access, and alias is an optional abbreviation for TableName.
The sequence of processing in a SELECT statement is as follows (Haan
et al., 2009):
FROM specifies the table or tables to be used (Morton et al., 2010);
WHERE filters the rows subject to some condition;
GROUP BY forms groups of rows with the same column value;
HAVING filters the groups subject to some condition;
SELECT specifies which columns are to appear in the output;
ORDER BY specifies the order of the output.
The order of the clauses in the SELECT statement cannot be changed.
The only two mandatory clauses are the first two, SELECT and FROM;
the remainder are optional. The SELECT operation is closed: the result
of a query on a table is another table. As we will see, there are many
variations of this statement (Piyayodilokchai et al., 2011).

6.4.1. Example #1 Retrieve All Columns, All Rows

List the full details of all staff.

Since there are no restrictions specified in this query, the WHERE clause
is unnecessary and all columns are required. We write this query as:
SELECT staffNo, fName, lName, position, sex, DOB, salary, branchNo
FROM Staff;
Since many SQL retrievals require all columns of a table, there is a quick
way of expressing ‘all columns’ in SQL, using an asterisk (*) in place of
the column names. The following statement is an equivalent and shorter
way of expressing this query (Kemalis & Tzouramanis, 2008):
SELECT *
FROM Staff;
In both cases, the result table is shown in Table 6.1.

Table 6.1. Table of Results for Example #1


6.4.2. Example # 2 Retrieve Specific Columns, All Rows


Produce a list of salaries for all staff, showing only the staff number, the
first and last names, and the salary details.
SELECT staffNo, fName, lName, salary
FROM Staff;
In this example, a new table is created from Staff containing only the
designated columns staffNo, fName, lName, and salary, in the specified
order. The result of this operation is shown in Table 6.2. Note that, unless
specified, the rows in the result table may not be sorted. Some DBMSs
do sort the result table by one or more columns (for example, Microsoft
Office Access would sort this result table based on the primary key
staffNo). We describe how to sort the rows of a result table in the next
section (Renaud & Biljon, 2004).

Table 6.2. Table of Results for Example #2

6.4.3. Example # 3 Use of Distinct


List the property numbers of all properties that have been viewed (Arcuri
& Galeotti, 2019).
SELECT propertyNo
FROM Viewing;
The result is shown in Table 6.3. Notice that there are several duplicates:
unlike the relational algebra Projection operation, SELECT does not
eliminate duplicates when it projects over one or more columns. To
eliminate the duplicates, we use the DISTINCT keyword. Rewriting the
query as (Rankins et al., 2010):
SELECT DISTINCT propertyNo
FROM Viewing;
we obtain the result table with the duplicates eliminated, shown in Table
6.4.

Table 6.3. With Duplicates, the Outcome Table for Example #3

Table 6.4. Duplicates are Removed from the Outcome Table for Example #3
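The effect of DISTINCT is easy to reproduce. The sketch below builds a tiny Viewing table (the client numbers, property numbers, and dates are invented in the style of the case study) and compares the two projections, mirroring Tables 6.3 and 6.4:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE Viewing (clientNo VARCHAR(5), propertyNo VARCHAR(5), "
    "viewDate DATE)"
)
# Two different clients viewed property PG4, so it appears twice.
cur.executemany("INSERT INTO Viewing VALUES (?, ?, ?)", [
    ("CR56", "PA14", "2013-05-24"),
    ("CR76", "PG4",  "2013-04-20"),
    ("CR56", "PG4",  "2013-05-26"),
])

# Plain projection keeps duplicate rows (sorted here for a stable result).
with_dups = sorted(r[0] for r in cur.execute("SELECT propertyNo FROM Viewing"))

# DISTINCT eliminates the duplicate rows.
no_dups = sorted(
    r[0] for r in cur.execute("SELECT DISTINCT propertyNo FROM Viewing")
)

print(with_dups)  # ['PA14', 'PG4', 'PG4']
print(no_dups)    # ['PA14', 'PG4']
```

The same property number appears once per viewing without DISTINCT, but only once overall with it.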

6.4.4. Example # 4 Calculated Fields


Produce a list of monthly salaries for all staff, showing the staff number,
the first and last names, and the salary details (Phewkum et al., 2019).
SELECT staffNo, fName, lName, salary/12
FROM Staff;
This query is almost identical to that of Example #2, except that monthly
salaries are required. In this case, the desired result is obtained simply
by dividing the salary by 12, giving the result shown in Table 6.5 (Bagui
et al., 2006).
This is an example of the use of a calculated field (sometimes called a
computed or derived field). In general, to use a calculated field you
specify an SQL expression in the SELECT list. An SQL expression can
involve addition, subtraction, multiplication, and division, and
parentheses can be used to build complex expressions. More than one
table column can be used in a calculated column, although the columns
referenced in an arithmetic expression must have a numeric type (Sadiq
et al., 2004). In this case, the fourth column of the result has been output
as col4. Normally, a column in the result table takes its name from the
corresponding column of the database table from which it was retrieved.
However, in this case, SQL does not know how to label the column. Some
dialects give the column a name corresponding to its position in the table
(Gini, 2008).

Table 6.5. Table of Results for Example #4

(for instance, col4); others may leave the column name blank or use the
expression entered in the SELECT list. The ISO standard allows the
column to be named using an AS clause. In the preceding example, we
could have written (Cosentino et al., 2015):
SELECT staffNo, fName, lName, salary/12 AS monthlySalary
FROM Staff;
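The calculated field and its AS name can be checked quickly with Python's sqlite3 (the staff row is invented); the cursor's description attribute shows the label the result column receives:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), fName VARCHAR(15), "
    "lName VARCHAR(15), salary DECIMAL(7,2))"
)
cur.execute("INSERT INTO Staff VALUES ('SL21', 'John', 'White', 30000)")

# The expression salary/12 produces a calculated column; AS names it.
cur.execute(
    "SELECT staffNo, fName, lName, salary/12 AS monthlySalary FROM Staff"
)
col_names = [d[0] for d in cur.description]
row = cur.fetchone()

print(col_names)  # ['staffNo', 'fName', 'lName', 'monthlySalary']
print(row[3])     # 2500
```

Without the AS clause, SQLite would label the fourth column with the expression text `salary/12` instead.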
Row selection (WHERE clause)
The previous examples show the use of the SELECT statement to retrieve
all rows from a table. However, we often need to restrict the rows that
are retrieved. This is done with the WHERE clause, which consists of the
keyword WHERE followed by a search condition that specifies the rows
to be retrieved. The five basic search conditions (or predicates, in the
ISO terminology) are as follows (Zhang et al., 2009):
• Comparison: compare the value of one expression with the value of
another expression.
• Range: test whether the value of an expression falls within a
specified range of values.
• Set Membership: test whether the value of an expression equals one
of a set of values.
• Pattern Match: test whether a string matches a specified pattern.
• Null: test whether a column has a null (unknown) value.
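Each of the five search conditions can be sketched on a small Staff table; the rows and telephone numbers below are invented, and sqlite3 again serves as the interpreter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15), "
    "position VARCHAR(10), salary DECIMAL(7,2), telNo VARCHAR(13))"
)
cur.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    ("SL21", "White", "Manager",    30000, "0171-884-5112"),
    ("SG37", "Beech", "Assistant",  12000, None),
    ("SG14", "Ford",  "Supervisor", 18000, "0141-339-2177"),
])

queries = {
    # Comparison: compare one expression with another.
    "comparison": "SELECT staffNo FROM Staff WHERE salary > 20000",
    # Range: BETWEEN includes both end values.
    "range": "SELECT staffNo FROM Staff WHERE salary BETWEEN 15000 AND 25000",
    # Set membership: value equals one of a set of values.
    "set": "SELECT staffNo FROM Staff WHERE position IN ('Manager', 'Supervisor')",
    # Pattern match: % matches any sequence of characters.
    "pattern": "SELECT staffNo FROM Staff WHERE lName LIKE 'B%'",
    # Null: IS NULL tests for an unknown value (= NULL would not match).
    "null": "SELECT staffNo FROM Staff WHERE telNo IS NULL",
}
results = {name: sorted(r[0] for r in cur.execute(q))
           for name, q in queries.items()}
print(results)
```

Here the comparison picks out the manager, the range picks out the supervisor, the set-membership test matches both, the pattern matches the surname beginning with 'B,' and the null test finds the row with no telephone number.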
REFERENCES
1. Ali, M., (2010). An introduction to Microsoft SQL server stream insight.
In: Proceedings of the 1st International Conference and Exhibition on
Computing for Geospatial Research & Application (Vol. 1, pp. 1–1).
2. Arcuri, A., & Galeotti, J. P., (2019). SQL data generation to enhance
search-based system testing. In: Proceedings of the Genetic and
Evolutionary Computation Conference (Vol. 1, pp. 1390–1398).
3. Bagui, S., Bagui, S. S., & Earp, R., (2006). Learning SQL on SQL
server 2005 (Vol. 1, pp. 2–9). O’Reilly Media, Inc.
4. Batra, R., (2018). SQL Primer (Vol. 1, pp. 183–187). Apress, Berkeley,
CA.
5. Budiman, E., Jamil, M., Hairah, U., & Jati, H., (2017). Eloquent object
relational mapping models for biodiversity information system. In:
2017 4th International Conference on Computer Applications and
Information Processing Technology (CAIPT) (Vol. 1, pp. 1–5). IEEE.
6. Chatterjee, A., & Segev, A., (1991). Data manipulation in heterogeneous
databases. ACM SIGMOD Record, 20(4), 64–68.
7. Cosentino, V., Izquierdo, J. L. C., & Cabot, J., (2015). Gitana: A
SQL-based git repository inspector. In: International Conference on
Conceptual Modeling (Vol. 1, pp. 329–343). Springer, Cham.
8. De Haan, L., (2005). Introduction to SQL, i SQL* plus, and SQL*
plus. Mastering Oracle SQL and SQL* Plus, 1, 25–64.
9. Der, L. V., (2007). Introduction to SQL: Mastering the Relational
Database Language, 4/E (With Cd) (Vol. 1, pp. 3–9). Pearson Education
India.
10. Duncan, S., (2018). Practical SQL. Software Quality Professional,
20(4), 64–64.
11. El Agha, M. I., Jarghon, A. M., & Abu-Naser, S. S., (2018). SQL Tutor
for Novice Students (Vol. 1, pp. 4–8).
12. Emerson, S. L., Darnovsky, M., & Bowman, J., (1989). The Practical
SQL Handbook: Using Structured Query Language (Vol. 1, pp. 2–6).
Addison-Wesley Longman Publishing Co., Inc.
13. Gini, R., (2008). Stata tip 56: Writing parameterized text files. Stata
Journal, 8(199-2016-2514), 134–136.
14. Gorman, T., (2014). Introduction to SQL and SQL developer. In:
Beginning Oracle SQL (Vol. 1, pp. 23–58). Apress, Berkeley, CA.
15. Haan, L. D., Fink, D., Gorman, T., Jørgensen, I., & Morton, K., (2009).
Introduction to SQL, SQL* plus, and SQL developer. In: Beginning
Oracle SQL (Vol. 1, pp. 25–69). Apress.
16. Haan, L. D., Fink, D., Gorman, T., Jørgensen, I., & Morton, K., (2009).
Writing and automating SQL* plus scripts. In: Beginning Oracle SQL
(Vol. 1, pp. 287–327). Apress.
17. Harkins, S. S., & Reid, M. W., (2002). Introduction to SQL server. In:
SQL: Access to SQL Server (Vol. 1, pp. 307–370). Apress, Berkeley,
CA.
18. Jesse, G., (2018). SQL: An introduction to SQL lesson and hands-on
lab. In: Proceedings of the EDSIG Conference ISSN (Vol. 2473, p.
3857).
19. Julavanich, T., Nalintippayawong, S., & Atchariyachanvanich, K.,
(2019). RSQLG: The reverse SQL question generation algorithm. In:
2019 IEEE 6th International Conference on Industrial Engineering and
Applications (ICIEA) (Vol. 1, pp. 908–912). IEEE.
20. Kearns, R., Shead, S., & Fekete, A., (1997). A teaching system for
SQL. In: Proceedings of the 2nd Australasian Conference on Computer
Science Education (Vol. 1, pp. 224–231).
21. Kemalis, K., & Tzouramanis, T., (2008). SQL-IDS: A specification-
based approach for SQL-injection detection. In: Proceedings of the
2008 ACM Symposium on Applied Computing (Vol. 1, pp. 2153–2158).
22. Kim, W., (1992). Introduction to SQL/X. In: Future Databases’ 92
(Vol. 1, pp. 2–7).
23. Kline, K., Kline, D., & Hunt, B., (2008). SQL in a Nutshell: A Desktop
Quick Reference Guide (Vol. 1, pp. 1–4). O’Reilly Media, Inc.
24. Kofler, M., (2005). An introduction to SQL. The Definitive Guide to
MySQL5 (2nd edn., pp. 189–216).
25. Kriegel, A., (2011). Discovering SQL: A Hands-on Guide for Beginners
(Vol. 1, pp. 5–7). John Wiley & Sons.
26. Kumar, R., Gupta, N., Charu, S., Bansal, S., & Yadav, K., (2014).
Comparison of SQL with HiveQL. International Journal for Research
in Technological Studies, 1(9), 2348–1439.
27. Lans, R. F. V. D., (2006). Introduction to SQL: Mastering the Relational
Database Language (Vol. 1, pp. 2–5). Addison-Wesley Professional.
28. Leng, C., & Terpstra, W. W., (2010). Distributed SQL queries with
bubblestorm. In: From Active Data Management to Event-Based
Systems and More (Vol. 1, pp. 230–241). Springer, Berlin, Heidelberg.
29. Ma, L., Zhao, D., Gao, Y., & Zhao, C., (2019). Research on SQL
injection attack and prevention technology based on web. In: 2019
International Conference on Computer Network, Electronic and
Automation (ICCNEA) (Vol. 1, pp. 176–179). IEEE.
30. Melton, J., & Simon, A. R., (2001). SQL: 1999: Understanding
Relational Language Components (Vol. 1, pp. 6–9). Elsevier.
31. Morton, K., Osborne, K., Sands, R., Shamsudeen, R., & Still, J., (2010).
Core SQL. In: Pro Oracle SQL (Vol. 1, pp. 1–27). Apress.
32. Naeem, M. A., Ullah, S., & Bajwa, I. S., (2012). Interacting with data
warehouse by using a natural language interface. In: International
Conference on Application of Natural Language to Information
Systems (Vol. 1, pp. 372–377). Springer, Berlin, Heidelberg.
33. Ntagwabira, L., & Kang, S. L., (2010). Use of query tokenization to
detect and prevent SQL injection attacks. In: 2010 3rd International
Conference on Computer Science and Information Technology (Vol. 2,
pp. 438–440). IEEE.
34. Otair, M., Al-Sardi, R., & Al-Gialain, S., (2008). An Arabic retrieval
system with native language rather than SQL queries. In: 2008 First
International Conference on the Applications of Digital Information
and Web Technologies (ICADIWT) (Vol. 1, pp. 84–89). IEEE.
35. Phewkum, C., Kaewchaiya, J., Kobayashi, K., & Atchariyachanvanich,
K., (2019). Scramble SQL: A novel drag-and-drop SQL learning
tool. In: 2019 23rd International Computer Science and Engineering
Conference (ICSEC) (Vol. 1, pp. 340–344). IEEE.
36. Piyayodilokchai, H., Panjaburee, P., Laosinchai, P., Ketpichainarong,
W., & Ruenwongsa, P., (2013). A 5E learning cycle approach–based,
multimedia-supplemented instructional unit for structured query
language. Journal of Educational Technology & Society, 16(4), 146–
159.
37. Piyayodilokchai, H., Ruenwongsa, P., Ketpichainarong, W., Laosinchai,
P., & Panjaburee, P., (2011). Promoting students’ understanding of
SQL in a database management course: A learning cycle approach.
International Journal of Learning, 17(11), 2–6.
38. Rankins, R., Bertucci, P., Gallelli, C., & Silverstein, A. T., (2010).
Microsoft SQL Server 2008 R2 Unleashed (Vol. 1, pp. 2–9). Pearson
Education.
39. Renaud, K., & Biljon, J. V., (2004). Teaching SQL—Which pedagogical
horse for this course?. In: British National Conference on Databases
(Vol. 1, pp. 244–256). Springer, Berlin, Heidelberg.
40. Roof, L., & Fergus, D., (2003). Introduction to SQL Server CE. In:
The Definitive Guide to the. NET Compact Framework (Vol. 1, pp.
453–481). Apress, Berkeley, CA.
41. Rose, J. A., (1989). Introduction to SQL, by Rick F. Van Der Lans,
Addison-Wesley Publishing Company, Wokingham, England, 348
pages including index, 1988 (£16.95). Robotica, 7(4), 365–366.
42. Sadiq, S., Orlowska, M., Sadiq, W., & Lin, J., (2004). SQLator: An
online SQL learning workbench. In: Proceedings of the 9th Annual
SIGCSE Conference on Innovation and Technology in Computer
Science Education (Vol. 1, pp. 223–227).
43. Saisanguansat, D., & Jeatrakul, P., (2016). Optimization techniques for
PL/SQL. In: 2016 14th International Conference on ICT and Knowledge
Engineering (ICT&KE) (Vol. 1, pp. 44–48). IEEE.
44. Sarhan, A. A., Farhan, S. A., & Al-Harby, F. M., (2017). Understanding
and discovering SQL injection vulnerabilities. In: International
Conference on Applied Human Factors and Ergonomics (Vol. 1, pp.
45–51). Springer, Cham.
45. Thomas J. Watson IBM Research Center. Research Division, & Denny,
G. H., (1977). An Introduction to SQL, a Structured Query Language
(Vol. 1, pp. 4–9).
46. Warnes, G. R., Bolker, B., Gorjanc, G., Grothendieck, G., Korosec, A.,
Lumley, T., & Rogers, J., (2014). gdata: Various R programming tools
for data manipulation. R Package Version, 2(3), 35.
47. Willis, T., (2003). Introduction to SQL Server 2000. In: Beginning SQL
Server 2000 for Visual Basic Developers (Vol. 1, pp. 11–28). Apress,
Berkeley, CA.
48. Zhang, H., & Zhang, X., (2018). SQL injection attack principles
and preventive techniques for PHP site. In: Proceedings of the 2nd
International Conference on Computer Science and Application
Engineering (Vol. 1, pp. 1–9).
49. Zhang, Y., Xiao, Y., Wang, Z., Ji, X., Huang, Y., & Wang, S., (2009).
ScaMMDB: Facing challenge of mass data processing with MMDB.
In: Advances in Web and Network Technologies, and Information
Management (Vol. 1, pp. 1–12). Springer, Berlin, Heidelberg.
CHAPTER 7
DATABASE CONNECTIVITY AND WEB
TECHNOLOGIES

CONTENTS
7.1. Introduction..................................................................................... 180
7.2. Database Connectivity..................................................................... 180
7.3. Internet Databases........................................................................... 194
7.4. Extensible Markup Language........................................................... 204
References.............................................................................................. 208
7.1. INTRODUCTION
A database, as we already know, is a central repository for crucial
corporate data. This data may be generated by conventional business
applications or by emerging business channels such as the Internet and
mobile devices like smartphones (Wahle et al., 2009). The data should be
accessible to all business users, who require access through a variety
of means, including spreadsheets, Visual Basic applications, Web front
ends, and Microsoft Access reports and forms. This chapter discusses the
architectures that programs use to connect to databases (Coelho et al., 2011).
The Internet has altered the way businesses of all sizes operate.
Purchasing products and services over the Web, for instance, has become
commonplace. In today's environment, interconnectivity occurs not only
between an application and the database, but also among applications
exchanging messages and data (Baca et al., 2009). The Extensible Markup
Language (XML) standardizes the exchange of structured and unstructured
data among programs. Because the Web and databases are becoming ever
more intertwined, database experts must understand how to design, use,
and maintain Web interfaces to databases (Migliore & Chinta, 2017).

7.2. DATABASE CONNECTIVITY


This refers to the mechanisms and protocols through which application
programs connect and communicate with data repositories. Because it
provides an interface between the application program and the database,
database connectivity software is known as database middleware. The data
store, also called the data source, is the data management application,
such as Oracle RDBMS, IBM DBMS, or SQL Server DBMS, that will be used
to store the data generated by the application program. Ideally, the
data source could be located anywhere and hold any type of data. The
data source could be, for instance, a relational database, a structured
database, an Excel spreadsheet, or a text data file (Johnson, 2014).
The importance of standard database connectivity interfaces cannot be
overstated. Just as SQL has become the de facto universal data
manipulation language, a standard database connectivity interface is
needed to enable applications to connect to data sources. Database
connectivity can be achieved in several ways; only the following
interfaces will be covered in this section (Ramakrishnan & Rao, 1997):
a. Native SQL connectivity;
b. OLE-DB (Object Linking and Embedding for Database), developed
by Microsoft;
c. ODBC (Open Database Connectivity), DAO (Data Access Objects),
and RDO (Remote Data Objects), developed by Microsoft;
d. Sun's JDBC (Java Database Connectivity);
e. Microsoft's ADO.NET (ActiveX Data Objects).
Most of these interfaces are Microsoft products. Even so, client
applications connect to databases, and most such applications run on
computers that use some form of Microsoft Windows. The data connectivity
interfaces shown here are market leaders, and they enjoy the backing of
many database vendors. In fact, ODBC, OLE-DB, and ADO.NET form the
foundation of Microsoft's Universal Data Access (UDA) framework, a
collection of technologies used to access and manipulate data from any
type of data source through a single interface. Microsoft's database
connectivity interfaces have evolved over time: each interface builds on
the previous one, delivering greater functionality, flexibility, and
support (Hackathorn, 1993).

7.2.1. Native SQL Connectivity


Most DBMS vendors offer their own connection mechanisms for their
databases. Native SQL connectivity is the connection interface provided
by the database vendor and exclusive to that vendor. The best example of
this kind of native interface is the Oracle RDBMS. To connect a client
application to an Oracle database, the Oracle SQL*Net interface must be
installed and configured on the client machine. Figure 7.1 depicts the
Oracle SQL*Net interface settings on the client PC (Ivanov & Carson, 2002).
Native database connectivity interfaces are tailored to "their" database
management system (DBMS), and they provide access to the vast majority
of the database's functionality. Maintaining several native interfaces
for different databases, however, can be burdensome for the programmer,
so universal database connectivity is desirable. In most cases, the
vendor's native database connectivity interface is not the only way to
connect to the database; most modern DBMS products support several
database connectivity standards, the most prevalent of which is ODBC
(Brown et al., 2012).
7.2.2. ODBC, RDO, and DAO


Open Database Connectivity (ODBC), created by Microsoft in the early
1990s, is a specialized version of the SQL Access Group CLI (Call
Level Interface) standard for database access, and it is the most widely
used database connectivity interface. ODBC is a basic application
programming interface (API) that allows any Windows application to
access relational data sources using SQL. The Webopedia online
dictionary defines an API as a set of routines, protocols, and tools for
building software applications (Kötter, 2004).

Figure 7.1. Oracle native connectivity.

Source:https://docs.oracle.com/en/database/oracle/oracle-database/119/db-
seg/introduction-to-strong-authentication.html.
A good API simplifies software development by providing all the
necessary building blocks; the programmer assembles the blocks. Most
operating systems, including Microsoft Windows, offer an API so that
developers can write programs consistent with the operating system.
Although APIs are designed for programmers, they ultimately benefit
users, because they ensure that all applications using a common API have
similar user interfaces; this makes it easier for users to learn new
programs (Stephan et al., 2001). The ODBC database middleware standard
was the first widely adopted standard, and it was quickly embraced by
Windows applications. As programming languages matured, however, ODBC
offered little functionality beyond the ability to execute SQL and
manipulate relational-style data, so programmers needed a more efficient
way to access data. To meet this need, Microsoft created two other data
access APIs (Norrie et al., 1998):
• Data Access Objects (DAO) is an object-based API that allows
Visual Basic applications to access MS Access, dBase, and MS
FoxPro databases. DAO provided an efficient interface that
exposed to programmers the functionality of the Jet data engine
(on which the MS Access database is based). The DAO interface
can also be used to access other relational-style data sources.
• Remote Data Objects (RDO) is an object-based application
interface used to access remote database servers. RDO uses the
lower-level DAO and ODBC to access databases directly. RDO was
optimized for server-based databases such as Microsoft SQL
Server and DB2.
Figure 7.2 shows how Windows applications can access local and remote
relational data sources using ODBC, RDO, and DAO (Abdalhakim, 2009).
As shown in Figure 7.2, client applications can use ODBC to access
relational data sources. The object interfaces of RDO and DAO, however,
provide more functionality, and both use the basic ODBC data services.
ODBC, RDO, and DAO are implemented as shared code that is dynamically
linked to the Windows operating system through dynamic-link libraries
(DLLs), which are stored as .dll files. Running as DLLs, the code speeds
up load and run times (Quirós et al., 2018).

Figure 7.2. Utilizing ODBC, RDO, and DAO to access databases.

Source: https://www.oreilly.com/library/view/ado-activex-data/155659241150/
ch01s01.html.
The basic ODBC architecture has three main components (Kuan et al.,
2015):
• An ODBC API through which application programs access ODBC
functionality.
• A driver manager that is in charge of managing all database
connections.
• An ODBC driver that communicates directly with the DBMS.
The first step in using ODBC is to define the data source; to do this,
one must create a data source name (DSN) for the data source (Jansons &
Cook, 2006).
To create the DSN, one must provide the following:
• ODBC Driver: One must specify which driver to use to connect to
the data source. The ODBC driver is usually supplied by the
database vendor, although Microsoft offers several drivers that
connect to most common databases. If the Oracle DBMS is used,
for instance, one would select the ODBC driver for Oracle
supplied by Oracle or, if needed, the one supplied by Microsoft.
• DSN Name: This is the name by which ODBC, and therefore
applications, will identify the data source. ODBC offers two
types of data sources: user and system. User data sources are
available only to the user. System data sources are available to
all users, including operating system services (Lamb, 2007).
• ODBC Driver Parameters: Most ODBC drivers require specific
parameters to establish a connection to the database. For an MS
Access database, for instance, one must indicate the location of
the Microsoft Access file and, if necessary, a username and
password. For a DBMS server, one must provide the server name,
database name, username, and password to connect to the
database. Figure 7.3 shows the ODBC screens required to create a
System ODBC data source for an Oracle DBMS. Note that some ODBC
drivers rely on the native driver supplied by the DBMS vendor.
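The DSN pieces above (driver, name, login, password) end up in a keyword=value connection string that the ODBC driver manager parses. The sketch below assembles such a string; the DSN name and credentials are hypothetical, and the commented-out call shows how the string would typically be used from Python via the third-party pyodbc package (not part of the standard library).

```python
def dsn_connection_string(dsn, uid=None, pwd=None):
    """Assemble the keyword=value string the ODBC driver manager parses."""
    parts = ["DSN=" + dsn]
    if uid:
        parts.append("UID=" + uid)
    if pwd:
        parts.append("PWD=" + pwd)
    return ";".join(parts)

print(dsn_connection_string("OracleProd", "scott", "tiger"))
# DSN=OracleProd;UID=scott;PWD=tiger

# With a User or System DSN configured as described above, the actual
# connection would typically be opened with pyodbc and a working driver:
#   import pyodbc
#   conn = pyodbc.connect(dsn_connection_string("OracleProd", "scott", "tiger"))
```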
Figure 7.3. Setting up an Oracle ODBC data source.

Source: tackoverflow.com/questions/511026065/how-do-i-setup-an-odbc-con-
nection-to-oracle-using-firedac.
Once the ODBC data source has been defined, application programmers can
write to the ODBC API by issuing specific commands with the required
parameters. The ODBC driver manager routes the calls to the appropriate
data source. The ODBC API standard defines three levels of conformance
(Core, Level 1, and Level 2) that offer progressively greater
functionality. Level 1, for example, might support most SQL DML and DDL
statements, including subqueries and aggregate functions, but not
procedural SQL or cursors. Database vendors can choose which level to
support. To interface with ODBC, however, the database vendor must
implement all of the functionality specified in the corresponding ODBC
API support level (Stephens & Huhns, 1999).
Figure 7.4 shows how Microsoft Excel can use ODBC to retrieve data from
an Oracle RDBMS. Because most of the features these interfaces provide
are aimed at accessing relational data sources, their usefulness with
other types of data source was limited. With the introduction of
object-oriented programming languages, access to other, non-relational
data sources became more important (Huang et al., 2015).
Figure 7.4. MS Excel uses ODBC to connect to an Oracle database.

Source: https://stackoverflow.com/questions/488196768/connecting-to-oracle-
database-through-excel.

7.2.3. OLE-DB
ODBC, RDO, and DAO were widely used, but they did not support
non-relational data. To address this need and simplify data
connectivity, Microsoft created Object Linking and Embedding for
Database (OLE-DB). Based on Microsoft's Component Object Model (COM),
OLE-DB is database middleware that adds object-oriented functionality
for access to relational and non-relational data. OLE-DB was the first
part of Microsoft's strategy to provide a unified object-oriented
framework for the development of next-generation applications (Moffatt, 1996).
OLE-DB is composed of a series of COM objects that provide low-level
database connectivity for applications. Because OLE-DB is based on COM,
the objects contain data and methods, also known as the interface. The
OLE-DB model is better understood when its functionality is divided
between two types of objects (Saxena & Kumar, 2012):
• Consumers are objects (applications or processes) that request
data. Data consumers request data by invoking the methods
exposed by data provider objects and passing the required
parameters.
• Providers are objects that manage the connection with a data
source and provide data to consumers. There are two types of
providers: data providers and service providers.
• Data providers supply data to other processes. Database vendors
create data provider objects that expose the functionality of
the underlying data source.
• Service providers provide additional functionality to consumers.
A service provider is located between the data provider and the
consumer: it obtains data from a data provider, transforms the
data, and then supplies the transformed data to the data
consumer. In other words, the service provider acts as a data
consumer of the data provider and as a data provider for the
data consumer. A service provider might, for instance, offer
cursor management, transaction management, query processing, and
indexing services.
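The consumer/provider roles can be sketched in a few lines of Python; the classes and the uppercase "transformation" below are invented purely to show how a service provider acts as both a consumer and a provider.

```python
class DataProvider:
    """Maintains the 'connection' to a data source and supplies rows."""
    def __init__(self, rows):
        self._rows = rows
    def fetch(self):
        return list(self._rows)

class UppercaseServiceProvider:
    """Sits between provider and consumer: fetches, transforms, re-provides."""
    def __init__(self, provider):
        self._provider = provider            # acts as a data consumer here...
    def fetch(self):                         # ...and as a data provider here
        return [row.upper() for row in self._provider.fetch()]

class Consumer:
    """Only ever talks to whichever provider it is handed."""
    def __init__(self, provider):
        self._provider = provider
    def run(self):
        return self._provider.fetch()

raw = DataProvider(["widget", "gadget"])
print(Consumer(raw).run())                            # ['widget', 'gadget']
print(Consumer(UppercaseServiceProvider(raw)).run())  # ['WIDGET', 'GADGET']
```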
Many vendors provide OLE-DB objects to complement their ODBC support,
in effect creating a shared object layer on top of their existing
database connectivity (native or ODBC) with which applications can
interact (Bazghandi, 2006). The OLE-DB objects expose database
functionality; there are objects for relational data, structured data,
and flat-file text data, for instance. The objects also perform specific
tasks, such as establishing a connection, executing a query, invoking a
stored procedure, defining a transaction, or invoking an OLAP function.
Rather than having to support all of the functionality all of the time,
the database vendor can choose which functionality to implement, in a
modular way, using OLE-DB objects. Table 7.1 lists a few of the
object-oriented classes used by OLE-DB and some of the methods the
objects provide (Baber & Hodgkin, 1992).
Table 7.1. Example OLE-DB Interfaces and Classes

OLE-DB provided additional capabilities to the applications accessing
the data. However, it lacked support for scripting languages, especially
those used in Web development, such as Active Server Pages (ASP) and
ActiveX. (A script is written in an interpreted programming language
that is executed at run time) (Vecchio et al., 2020). To provide this
capability, Microsoft developed ActiveX Data Objects (ADO), which offers
a high-level, application-oriented interface for interacting with
OLE-DB, RDO, and DAO. ADO provides a unified interface through which
programming languages can access data using OLE-DB objects. Figure 7.5
shows how the ADO/OLE-DB architecture interacts with the native and ODBC
connectivity options. ADO introduced a simplified object model, composed
of a handful of interacting objects, to provide the data manipulation
services required by applications. Table 7.2 shows examples of ADO
object types (Song & Gao, 2012).

Table 7.2. Example ADO Objects


Figure 7.5. OLE-DB design.

Source: http://www.differencebetween.net/technology/web-applications/differ-
ence-between-oledb-and-odbc/.
Although the ADO model is a vast improvement over the OLE-DB model,
Microsoft is encouraging programmers to use ADO.NET, its newest data
access architecture (Lee et al., 2016).

7.2.4. ADO.NET
ADO.NET, which is based on ADO, is the data access component of
Microsoft's .NET application development framework. The Microsoft .NET
framework is a component-based platform for developing distributed,
heterogeneous, interoperable applications that can manipulate any type
of data, over any network, in any programming language. A comprehensive
examination of the .NET framework is beyond the scope of this book
(Khannedy, 2011).
Consequently, this section introduces only ADO.NET, the fundamental data
access component of the .NET architecture. It is important to understand
that the .NET framework extends and enhances the functionality provided
by ADO/OLE-DB. ADO.NET introduced two new features that are critical for
the development of distributed applications: DataSets and XML support
(Hall & Kier, 2000).
To appreciate the significance of this new approach, keep in mind that
the DataSet is a disconnected, memory-resident representation of the
database; it contains tables, columns, rows, relationships, and
constraints. After the data are read from the data provider, they are
placed in the memory-resident DataSet, which is then disconnected from
the data provider. The data consumer application interacts with the data
in the DataSet object to make changes to it. When the processing is
complete, the DataSet data are synchronized with the data source and the
changes are made permanent (Press et al., 2001).
The DataSet is internally stored in XML format, and the data in the
DataSet can be made persistent as XML documents. This is especially
important in today's distributed environments; you can think of the
DataSet as an in-memory, XML-based database that represents the
persistent data stored in the data source. Figure 7.6 depicts the key
components of the ADO.NET object model (Poo et al., 2008).

Figure 7.6. Framework of ADO.NET.

Source: https://www.codeproject.com/Articles/338643/ASP-NET-Providers-
for-the-ADO-NET-Entity-Framework.
The ADO.NET framework consolidates all data access functionality under a
single object model. Within this object model, several objects interact
to perform specific data manipulation functions; these objects can be
grouped as data providers and data consumers (Ternon et al., 2014).
Data provider objects are provided by the database vendors. However,
ADO.NET comes with two basic data providers: one for OLE-DB data sources
and one for SQL Server. ADO.NET can therefore connect to any database
that was previously supported, including an ODBC database with an OLE-DB
data provider, and it also includes a highly optimized data provider for
SQL Server. Whatever the data provider, it must support a set of
specific objects in order to manipulate the data in the data source.
Figure 7.6 shows several of these objects; a brief description of them
follows (Liu et al., 2018).
• Connection object defines the data source, the server name, the
database, and so on. This object enables the client application
to open and close a database connection.
• Command object represents a database command to be executed
within a specified database connection. This object contains the
SQL statement or stored procedure to be run against the
database. When a SELECT query is executed, the Command object
returns a set of rows and columns.
• DataReader object is a specialized object that creates a
read-only session with the database to retrieve data
sequentially (forward only), very quickly.
• DataAdapter object is in charge of managing the DataSet object;
it is the most specialized object in the ADO.NET architecture.
The DataAdapter object contains the SelectCommand,
InsertCommand, UpdateCommand, and DeleteCommand objects, which
manage the data in a DataSet. The DataAdapter object uses these
objects to populate the DataSet and to synchronize its data with
the persistent data source (Hodge et al., 2016).
• DataSet object is the in-memory representation of the data in
the database. This object contains two main objects: the
DataTableCollection object, which contains a collection of
DataTable objects that make up the in-memory database, and the
DataRelationCollection object, which contains a collection of
objects that define the data relationships and ways to associate
one row in one table with the related row in another table.
• DataTable object represents the data in tabular format. This
object has one very important property, PrimaryKey, which allows
entity integrity to be enforced. The DataTable object is, in
turn, composed of three main objects:
• DataColumnCollection contains one or more column definitions.
Each column definition includes attributes such as the column
name, data type, whether nulls are allowed, and maximum and
minimum values.
• DataRowCollection contains zero, one, or more rows of data as
described in the DataColumnCollection.
• ConstraintCollection contains the definitions of the constraints
for the table. Two types of constraints are supported:
ForeignKeyConstraint and UniqueConstraint (Huang et al., 2012).
The DataSet is, in effect, a simple database with tables, rows, and
constraints. Importantly, the DataSet does not require a permanent
connection to the data source. The DataAdapter uses the SelectCommand
object to populate the DataSet from a data source; once the DataSet is
populated, however, it is completely independent of the data source,
which is why it is called "disconnected" (Shum, 1997).
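The disconnected pattern can be sketched with Python's sqlite3 module standing in for an ADO.NET data provider: fill an in-memory copy, close the connection, edit offline, then reconnect and synchronize. The file, table, and row contents below are invented for illustration.

```python
import os
import sqlite3

db = "dataset_demo.db"  # hypothetical data source file
if os.path.exists(db):
    os.remove(db)

conn = sqlite3.connect(db)
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Employee VALUES (1, 'Ann')")
conn.commit()

# "Fill" the in-memory data set, then disconnect from the source.
dataset = {emp_id: name for emp_id, name in
           conn.execute("SELECT id, name FROM Employee")}
conn.close()

dataset[1] = "Anne"   # changes are made while no connection is open
dataset[2] = "Bob"

# Reconnect and make the changes persistent (the DataAdapter's role).
conn = sqlite3.connect(db)
conn.executemany("INSERT OR REPLACE INTO Employee VALUES (?, ?)",
                 dataset.items())
conn.commit()
print(sorted(conn.execute("SELECT id, name FROM Employee")))
# [(1, 'Anne'), (2, 'Bob')]
conn.close()
os.remove(db)
```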
Furthermore, the DataTable objects in a DataSet can come from different
data sources. This means that an Oracle database could hold the
EMPLOYEE table and a SQL Server database could hold the SALES table.
You could then create a DataSet that relates the two tables as though
they were located in the same database (Harris & Reiner, 1990). In summary, the
DataSet object provides a way for applications to support truly
heterogeneous distributed databases. The ADO.NET framework is designed
to work well in disconnected environments, in which applications
exchange messages in request/reply fashion. The most common example of a
disconnected system is the Internet: the Internet serves as the network
platform for modern applications, and the Web browser serves as the
graphical user interface (GUI) (Deng et al., 2010).

7.2.5. Connectivity of Java Database


Sun Microsystems created Java, an object-oriented programming language that runs on top of Web browser software. Java is among the most widely used languages for Web development. Sun Microsystems designed Java to be a "write once, run anywhere" environment; that is, a programmer can build a Java application once and then run it in numerous settings (Apple OS X, Microsoft Windows, IBM AIX) without making any changes. Java's portable design underpins its cross-platform abilities (Robinson et al., 2010). Applets are pre-processed
chunks of Java code that execute in a virtual machine environment on the host operating system. The borders of this environment are clearly defined, and interaction with the host operating system is strictly monitored. Sun supplies Java runtime environments for most operating systems. Another benefit of Java is its on-demand design: when a Java application starts up, it can use the Internet to download dynamically whatever modules or components it needs (Regateiro et al., 2017).
Java applications employ established APIs to reach data outside the Java execution environment. JDBC (Java Database Connectivity) is an API that permits a Java program to work with a variety of data sources (relational databases, tabular data sources, spreadsheets, and text files). JDBC enables the Java program to connect to a data source, prepare and send SQL code to the database server, and evaluate the query results (Coetzee & Eloff, 2002; Luo et al.,
2018). One of the primary benefits of JDBC is that it permits the firm to
capitalize on its existing investments in technology and staff training. JDBC
enables programmers to process data in the company’s databases using their
SQL expertise. In fact, JDBC supports both direct connection to the database server and access through database middleware. Moreover, JDBC allows users to connect to a database via an ODBC driver. Figure 7.7 depicts the
fundamental JDBC design as well as the numerous database access patterns
(Li & Yang, 2020).
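The basic JDBC call sequence just described (connect, prepare a statement, send SQL, iterate the result set) can be sketched as follows. The host, port, database, table, and column names are placeholders, a vendor driver (here PostgreSQL's) would have to be on the classpath, and printEmployees is shown for shape only since it needs a live server:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class JdbcSketch {
    // JDBC URLs follow jdbc:<subprotocol>:<subname>; host and db are made up.
    public static String buildUrl(String host, int port, String db) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + db;
    }

    // Typical JDBC flow: connect, prepare, execute, iterate the ResultSet.
    public static void printEmployees(String url, String user, String pass)
            throws SQLException {
        try (Connection con = DriverManager.getConnection(url, user, pass);
             PreparedStatement ps = con.prepareStatement(
                 "SELECT emp_id, emp_name FROM employee WHERE dept = ?")) {
            ps.setString(1, "SALES");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(
                        rs.getInt("emp_id") + " " + rs.getString("emp_name"));
                }
            }
        }
    }
}
```

The try-with-resources blocks ensure the connection, statement, and result set are closed even if the query fails, which matters in server environments where connections are pooled.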
The components and operation of any database access middleware are
comparable. One benefit of JDBC as compared to the other middleware
is that it doesn’t need any client configuration: the JDBC driver is downloaded and installed automatically as part of the Java applet download. Since Java is a Web-oriented platform, apps can use a simple URL to connect to the database. When the URL is entered, the Java architecture kicks in, the appropriate applets (such as the JDBC database driver and all of the configuration information) are transferred to the client, and the applets are safely executed in the client’s operating environment (Manzotti et al.,
2015; Ng et al., 2009).

Figure 7.7. Framework of JDBC.

Source: https://www.tutorialspoint.com/jdbc/jdbc-introduction.htm.
Every day, more organizations invest money in building and extending
their Web presence, as well as finding new ways to do business online. This
type of business will produce a growing quantity of information, which
will be kept in databases. Java and the .NET architecture are examples of how businesses are increasingly relying on the Web as a crucial business resource. Indeed, the Internet is expected to become the next development platform. The sections below explain more about Web databases and how to use them (Chen et al., 2000).

7.3. INTERNET DATABASES


Millions of individuals all over the globe use the Internet to connect to databases through Web browsers or data services (e.g., using smartphone apps to obtain weather information). The ability to connect to databases over the Internet opens the door to novel advanced services that (Minkiewicz
et al., 2016):
• Allow for quick reactions to competitive demands by quickly bringing novel products and services to market;
• Improve customer satisfaction by establishing Web-based support services;
• Permit mobile smart devices to access data from anywhere and at
any time through the Internet; and
• Ensure that information is disseminated quickly and effectively by providing global access from across the street or around the world.
Given these benefits, numerous firms rely on their information technology
teams to develop UDA designs centered on Internet protocols. Table 7.3
displays a selection of Internet technology features and the advantages they
offer (Van Steensel & Winter, 1998).

Table 7.3. Features and Advantages of Internet Technologies

It is simple to see why many database experts consider the DBMS connection to the Internet to be a vital aspect of IS development in today’s business and worldwide information environment. The Web has a substantial impact on database application development, particularly the development and administration of user interfaces and database connectivity, as described in the next sections. Having a Web-oriented database interface, on the other hand, doesn’t do away with database design and implementation challenges. In the end, whether you buy something online or in person, the system-level transaction details are fundamentally the same, and both require the same basic database structures and relationships.
If there’s one thing to remember right now, it’s this (Liu et al., 2003):
The consequences of poor database design, implementation, and maintenance are magnified in an environment in which transactions are counted in hundreds of thousands per day rather than hundreds (Shahabi et al., 2000).
The Internet is quickly altering the generation, accessibility, and
distribution of information. At the heart of this transformation are the
Web’s capacity to access (local and remote) database data, the simplicity of its interface, and its cross-platform capabilities. The Web has helped create a new information dissemination standard. The sections that follow examine how Web-to-database middleware enables end users to interact with databases over the Internet (Ouzzani et al., 2000).

7.3.1. Web-to-Database Middleware


Generally, the Web server serves as the central hub for all Internet services.
The client browser, for instance, requests the Web page when the end user
utilizes the Web browser to interactively query a database. When the Web server receives the page request, it searches the hard disk for the page; after it locates the page (for instance, a stock quote or product catalog information), it returns it to the client (Yerneni et al., 1998).
Modern Web sites rely heavily on dynamic Web pages. In this database-query scenario, the Web server constructs the contents of the Web page before sending the page to the client’s Web browser. The catch is that the database query result must be included in the page before it is returned to the client. Unfortunately, neither the Web browser nor the Web server knows how to connect to and read data from the database. To support this kind of request, the Web server’s capabilities must be extended to understand and execute database requests. This task is carried out via a server-side extension (Blinowska &
Durka, 2005).
The server-side extension is an application that communicates directly
with a Web server to process certain kinds of requests. In the previous
database query example, the server-side extension application retrieves the information from the database and transmits it to the Web server, which then delivers it to the client’s browser for display. This extension enables the retrieval and presentation of query results but, more importantly, it offers its services to the Web server in a manner that is completely transparent to the client browser (Xu, 2012).
In summary, the server-side extension significantly enhances the capability of the Web server and, by extension, the Internet (Minkiewicz et al., 2015). Web-to-database middleware is another term for this database server-side extension software. The communication among the browser, the Web server, and the Web-to-database middleware is depicted in Figure 7.8.
Figure 7.8 shows the Web to database middleware activities (Khan et
al., 2001):
• The client browser sends a page request to the Web server.
• The Web server receives and validates the request. The server then passes the request to the Web-to-database middleware for processing. In most cases, the requested page contains a scripting language that enables database interaction.

Figure 7.8. Web-to-database middleware.

Source: https://flylib.com/books/en/2.6772.1.65/1/.
• The Web-to-database middleware reads, validates, and executes the script. In this case, it uses the database connectivity layer to connect to the database and execute the query.
• The database server executes the query and returns the result to the Web-to-database middleware (Sylvester, 1997).
• The Web-to-database middleware takes the query results, dynamically builds an HTML-formatted page containing the database data, and transmits it to the Web server.
• The Web server sends the newly produced HTML page, which already contains the result set, to the client browser.
• The page is shown on the local machine by the client browser.
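The middleware's page-building step in the sequence above (merging query results into HTML before handing the page to the Web server) boils down to string templating. A toy sketch, with renderTable an invented helper rather than part of any middleware product:

```java
import java.util.List;

class HtmlRenderer {
    // Turn query-result rows into a minimal HTML table, as Web-to-database
    // middleware does before passing the finished page to the Web server.
    public static String renderTable(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table><tr>");
        for (String h : headers) sb.append("<th>").append(h).append("</th>");
        sb.append("</tr>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) sb.append("<td>").append(cell).append("</td>");
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }
}
```

A production middleware product would also escape the cell values and stream the output, but the shape of the work (rows in, markup out) is the same.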
The communication between the Web server and the Web-to-database middleware is essential to the proper deployment of an Internet database. Thus, the middleware should be well integrated with the other Internet services and related components. For instance, when Web-to-database middleware is deployed, the software should check the Web server type and install itself according to that server’s specifications. Additionally, the interfaces the Web server exposes will determine how well the Web server and the Web-to-database service communicate (Bouguettaya et al., 1999).

7.3.2. Web Server Interfaces


Enhancing Web server capabilities requires proper communication between the Web server and the Web-to-database middleware. (Interoperate is the term database experts use to describe how one party can respond to the interactions of the other; the word communicate is used in this book with the same meaning.) If the Web server and an external program are to communicate effectively, both programs should adopt a standard way of exchanging messages and responding to requests. The way a Web server
interfaces with external programs is defined by the Web server interface.
There are 2 well-described Web server interfaces available at the moment
(de Leeuw et al., 2012):
• Common Gateway Interface (CGI)
• Application programming interface (API)
CGI uses script files that perform certain functions based on the parameters passed to the Web server by the client. The script file is a small program containing commands written in a programming language, most commonly Perl, Visual Basic, or C#. Using the parameters passed by the Web server, the script can connect to the database and retrieve data from it. The script then converts the retrieved data to HTML format and passes it to the Web server, which delivers the HTML-formatted page to the client (Meng et al., 1998).
The primary drawback of using CGI scripts is that the script is an external program that is executed separately for every user request, which reduces system efficiency. For instance, if there are 200 concurrent requests, the script will be loaded 200 times, consuming substantial CPU and memory on the Web server. The script’s language and how it is written can also affect system performance; using an interpreted language or writing the script poorly, for instance, can reduce performance (Mohr et al., 1998).
The newer Web server interface standard, the API, is faster and more efficient than CGI scripts. APIs are more efficient because they are implemented as shared code or dynamic-link libraries (DLLs). That is, the API is treated as if it were part of the Web server program and is called dynamically as required (Van Dulken, 1999).
APIs are faster than CGI scripts because the code resides in memory, eliminating the need to run a separate program for every request; all requests are served by the same API instead. Another benefit is that the API can use a shared connection to the database rather than creating a new one each time, as CGI scripts do. Even though APIs handle requests more efficiently, they have significant drawbacks. Because the API and the Web server share the same memory space, an API error can bring the server down. Another issue is that APIs are specific to the Web server and the operating system. At the time of writing, well-known Web server APIs include (de Macedo et al., 2008):
• ISAPI (Internet Server API) for Web servers of Microsoft
Windows.
• WSAPI (WebSite API) for O’Reilly Web servers.
• JDBC to offer database connectivity for applications of Java.
The several kinds of Web interfaces are demonstrated in Figure 7.9
(Lemkin et al., 2005).

Figure 7.9. Web server API and CGI interfaces.

Source: https://frankfu.click/database/database-system-design-management/
chapter-15-database-and-internet/.
The Web-to-database middleware program should be able to connect to the database regardless of the kind of Web server interface used. One of two methods can be used to make that connection (Ou & Zhang, 2006):
• Make use of the vendor’s native SQL access middleware. If
you’re using Oracle, you can, for instance, use SQL*Net.
• Use Open Database Connectivity (ODBC), OLE-DB (Object
Linking and Embedding for Database), ActiveX Data Objects
(ADO), ADO.NET (ActiveX Data Objects for .NET) interface,
or Java Database Connectivity (JDBC) for Java.

7.3.3. The Web Browser


The Web browser is an application program on the client computer, like Microsoft Internet Explorer or Mozilla Firefox, that allows end users to browse the Web. When a visitor clicks a hyperlink, the browser sends an HTTP GET page request over the TCP/IP Internet protocol to the chosen Web server (Mai et al., 2014). The task of the Web browser is to interpret the HTML code received from the Web server and display the various page elements in a standardized manner. Unfortunately, the browser’s ability to interpret and present information is not sufficient for developing Web-oriented applications. This is because the Web is a stateless system: at any given time, the Web server does not track the status of any client communicating with it (Bradley, 2003). That is, there is no open communication line between the server and each client that accesses it, which would obviously be impractical on the Internet. Instead, client and server computers interact in brief “conversations” that follow the request-reply model. Because the browser is concerned only with the current page, the second page has no way of knowing what was done on the first. The only time the client and server computers communicate is when the client requests a page, after the user clicks a link, and the server sends the requested page to the client (Falk, 2005).
The client/server communication ends once the client receives the page and its components. As a result, although you might be browsing a page and believe the communication is still open, you are actually just viewing the HTML document saved in the browser’s local cache (temporary directory). The server has no idea what the user is doing with the page: what data is entered in a form, which option is selected, and so on. To respond to a client’s choices on the Web, you must navigate to a new page, losing track of whatever was done before (Kiley, 1997)!
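Because each request/reply exchange stands alone, any continuity has to be rebuilt by the application, typically by handing the client a token (e.g., in a cookie) and keeping the associated state on the server. A minimal sketch; SessionStore and its methods are invented for illustration, not taken from any framework:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Server-side session store: each stateless request carries only a token;
// the server looks up whatever state it chose to remember for that token.
class SessionStore {
    private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

    // Start a new session and return the token the client will echo back.
    public String open() {
        String token = UUID.randomUUID().toString();
        sessions.put(token, new ConcurrentHashMap<>());
        return token;
    }

    public void put(String token, String key, String value) {
        sessions.get(token).put(key, value);
    }

    // Returns null for unknown tokens, mirroring an expired session.
    public String get(String token, String key) {
        Map<String, String> s = sessions.get(token);
        return s == null ? null : s.get(key);
    }
}
```

This is exactly the service that the application servers discussed later provide out of the box when they advertise "session management."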
The role of the Web browser is to display a page on the client computer. Using HTML alone, the browser has no computational abilities beyond formatting output text and accepting form field inputs. Even rapid data-entry validation of form fields is not possible. To perform such important client-side processing, the Web relies on other Web programming languages such as Java, JavaScript, and VBScript (Chen et al., 1996).
The browser resembles a dumb terminal that can only present data and perform simple processing such as accepting form data inputs. Plug-ins and other client-side extensions are required to enhance the functionality of the Web browser; Web application servers provide the necessary computing power on the server side (Bouguettaya et al., 1998).

7.3.4. Client-Side Extensions


Client-side extensions extend the capabilities of the browser. Even though
client-side extensions come in a variety of shapes and sizes, the following
are the most typical ones (Kerse et al., 2001):
• JavaScript and Java.
• Plug-ins.
• VBScript and ActiveX.
The plug-in is an external application that the Web browser invokes when necessary. Because the plug-in is an external application, it is operating-system-specific. To allow the browser to handle data it was not originally built to support, the plug-in is associated with the data object, typically through the file extension. For instance, if one of the page components is a PDF document, the browser will receive the data, identify it as a Portable Document Format object, and launch Adobe Acrobat Reader on the client computer to display the document (Jacobsen & Andenæs, 2011).
Java runs on top of the Web browser software, as previously stated. Java programs are compiled and stored on the Web server. (Java is similar to C++ in numerous ways.) Java routines are called from within the HTML page. When the browser encounters such a call, it downloads the Java classes from the Web server and executes them on the client machine. The fundamental benefit of Java is that it lets application developers write an application once and run it in many environments. (Interoperability is a critical consideration when creating Web applications.) However, different client browsers are not always compatible, which restricts portability (Gordon, 2003).
JavaScript is a scripting language (a language that enables the execution of a series of commands or macros) that allows Web designers to build interactive sites. JavaScript is simpler to learn and use than Java. JavaScript code is embedded in Web pages; it is downloaded along with the Web page and is activated when a specific event occurs, such as a mouse click on an item or a page being loaded into memory from the server (Fatima & Wasnik, 2016).
ActiveX is Microsoft’s alternative to Java. ActiveX is a set of specifications for creating programs that run in Microsoft’s client browser. Because it is designed primarily for Windows applications, ActiveX has limited portability. ActiveX extends the Web browser by adding controls to Web pages (drop-down lists, calendars, sliders, and calculators are examples of such controls) (Carmichael, 2002). These controls, which are downloaded from the Web server when needed, let the user manipulate data inside the browser. ActiveX controls can be written in a variety of programming languages, the most popular being C++ and Visual Basic. Microsoft’s .NET framework permits ActiveX-based applications (like ADO.NET) to work across various operating systems (Braxton et al., 2003).
VBScript is another Microsoft tool used to extend browser capabilities. VBScript is derived from Microsoft Visual Basic. Like JavaScript, VBScript code is embedded in an HTML page and is activated by triggering events such as clicking a link. From the developer’s perspective, using routines that perform data validation on the client side is a must (Dubey & Chueh, 1998). When data is entered on a Web form and no data validation is performed on the client side, for instance, the entire data set must be sent to the Web server; the server must then perform all the data validation, wasting valuable CPU processing cycles. Consequently, client-side data input validation is one of the most basic requirements for Web applications. The vast majority of data validation functions are written in Java, JavaScript, VBScript, or ActiveX (Liu et al., 1998).
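Even when client-side scripts validate input, the server cannot trust them and must re-check every field itself. A hedged sketch of such a server-side fallback (written in Java rather than the browser scripting languages named above; the form-field names and rules are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FormValidator {
    // Server-side fallback validation: every field is re-checked even if
    // JavaScript/VBScript validation already ran in the browser.
    public static List<String> validateOrder(Map<String, String> form) {
        List<String> errors = new ArrayList<>();

        String qty = form.get("quantity");
        // Require a positive integer (rejects empty, "0", and non-digits).
        if (qty == null || !qty.matches("0*[1-9][0-9]*")) {
            errors.add("quantity must be a positive integer");
        }

        String email = form.get("email");
        // Deliberately loose shape check: something@something.something
        if (email == null || !email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
            errors.add("email address is malformed");
        }
        return errors;
    }
}
```

An empty error list means the submission may proceed; anything else is sent back to the user, which is exactly the round trip that client-side validation exists to avoid.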

7.3.5. Web Application Servers


A Web application server is a middleware application that connects Web servers to a variety of services, such as databases, search engines, and directory systems. The Web application server provides a consistent run-time environment for Web applications. These servers can be used to do the following (BL Bard & Davies, 1995):
• Connect to and query a database from a Web page.
• Present database data on a Web page, using various formats.
• Create dynamic Web search pages.
• Create Web pages to insert, update, and delete database data.
• Enforce referential integrity in the application program logic.
• Use simple and nested queries and programming logic to represent business rules.
Web application servers offer features such as (Etzioni & Weld, 1995):
• An integrated development environment that supports persistent application variables and manages sessions.
• User authentication and security via user IDs and passwords.
• Computational languages for specifying and storing business logic in the application server.
• Automatic HTML page generation using Java, JavaScript, ASP, VBScript, and other programming languages.
• Performance and fault-tolerance features.
• Database access with transaction management capabilities.
• Access to multiple services, such as file transfers (FTP), database connectivity, and e-mail.
As of this writing, major Web application servers include Adobe’s ColdFusion/JRun, Oracle’s WebLogic Server, IBM’s WebSphere Application Server, NetObjects’ Fusion, and Apple’s WebObjects. All Web application servers can connect to various data sources and other services. They differ in the features they offer, their robustness, scalability, ease of use, compatibility with other Web and database tools, and the scope of their development environments (Di Martino et al., 2019).
Present-generation systems entail more than merely Web-enabled
database application development. In addition, they need apps capable of
connecting with one another and with non-Web-based systems. Obviously,
systems should be able to share data in a format centered on industry
standards. That is the function of XML (Bray et al., 2000).

7.4. EXTENSIBLE MARKUP LANGUAGE


The Internet has ushered in novel technologies that make it easier for business
associates and consumers to share information. Companies are using the Internet to develop new systems that integrate their data in order to boost efficiency and lower costs. Electronic commerce allows businesses
of all sizes to advertise and sell their goods and services to millions of people
across the world. E-commerce transactions can take place between businesses (business-to-business, or B2B) or between businesses and consumers (business-to-consumer, or B2C) (Friedman et al., 1999).
The majority of e-commerce transactions take place between businesses. Because B2B e-commerce integrates business processes across organizations, it requires the exchange of business data among different business entities. However, how firms represent, identify, and use data varies considerably from one company to another (Bryan, 1998).
Until recently, a Web-based purchase order would typically take the form of an HTML document. The HTML Web page displayed by a Web browser would include both formatting and order information. HTML tags govern how something appears on a Web page, such as bold text or heading style, and usually appear in pairs that open and close a formatting feature (Sperberg-McQueen, 2000).
If an application wants to extract order details (such as the order number, date, customer number, item, quantity, price, or payment details) from an HTML page, there is no straightforward way to do so. The HTML page can only specify how to display the order in the browser; it cannot delineate the order’s data elements, such as the date, shipping information, payment details, and so on. A new markup language, XML, was created to address this problem (Castan et al., 2001).
XML is a metalanguage used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents, such as orders and receipts, over the Internet. In 1998, the World Wide Web Consortium (W3C) released the first XML 1.0 standard specification. That standard laid the groundwork that gave XML the real-world appeal of being a truly vendor-independent platform. Not surprisingly, XML has rapidly become the data exchange standard for e-commerce applications (Sanin & Szczerbicki, 2006).
The XML metalanguage allows the definition of new elements, such as <ProdPrice>, to describe the data items in an XML document. The X in XML stands for extensible; the language is designed to be extended. XML is derived from the Standard Generalized Markup Language (SGML), an international standard for the production and delivery of highly complex technical documents; the aviation industry and the military, for instance, use documents too complex and bulky for the Web. Like HTML, which was also derived from SGML, an XML document is a text file. It has, however, a few crucial additional characteristics, as follows (Huh, 2014):
• XML allows the definition of new tags to describe data elements, such as <ProductId>.
• XML is case sensitive; thus, <ProductID> and <Productid> are not the same thing.
• XML documents must be well-formed, which means tags must be formatted correctly: every opening tag must have a corresponding closing tag. For instance, the product identifier would be written <ProductId>2345-AA</ProductId>.
• XML elements must be properly nested. A correctly nested XML fragment would look like this: <Product><ProductId>2345-AA</ProductId></Product>.
• The <!-- and --> symbols are used to enclose comments in an XML document.
• The XML and xml prefixes are reserved; user-defined names may not begin with them.
XML is not a new version of HTML or a replacement for it. XML is concerned with the definition and processing of data rather than its presentation. XML provides the semantics that allow structured documents to be shared, exchanged, and manipulated across organizational boundaries (Yoshida et al., 2005). XML and HTML thus perform complementary, rather than overlapping, functions. XHTML (Extensible Hypertext Markup Language) is the next generation of HTML, built on the XML framework; the XHTML specification adds XML features to HTML. Although more powerful than HTML, XHTML has fairly strict syntactic rules (Vuong et al., 2001).
As an example of how XML can be used for data exchange, assume a B2B scenario in which Company A uses XML to exchange product data with Company B over the Web. The contents of the ProductList.xml document are shown in Figure 7.10 (Baski & Misra, 2011).

Figure 7.10. The productlist.xml page’s contents.

Source: https://www.slideshare.net/SaranyaPsg/1-xml-fundamen-
tals-1972357783.
The XML example in Figure 7.10 demonstrates the following fundamental XML characteristics (Berman, 2005):
• The first line is required; it is the XML document declaration.
• Every XML document has a root element. The second line in the example declares the ProductList root element.
• The root element contains child elements or sub-elements. Line 3 in the example declares Product as a child element of ProductList.
• Each element can contain sub-elements. For instance, each Product element is composed of several child elements, such as P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, and P_PRICE.
• The XML document has a hierarchical tree structure in which elements are related in a parent-child relationship; each parent element can have several child elements. For instance, the root element is ProductList, ProductList has a child element called Product, and Product in turn has six child elements: P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, and P_PRICE (Yousfi et al., 2020).
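The parent-child structure just described maps directly onto a DOM tree. The sketch below parses a trimmed-down ProductList document with the JDK's built-in DOM parser; the underscored element names follow Figure 7.10, and the helper name is invented:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ProductListParser {
    // Parse an XML string into a DOM tree and return the first P_CODE value.
    public static String firstProductCode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            // Root is ProductList; each Product child holds P_CODE and so on.
            NodeList codes = doc.getDocumentElement()
                                .getElementsByTagName("P_CODE");
            return codes.item(0).getTextContent();
        } catch (Exception e) {
            // Parser setup, I/O, and well-formedness errors folded for brevity.
            throw new IllegalStateException("cannot parse document", e);
        }
    }
}
```

Note that the parser itself enforces the well-formedness rules listed above: a missing closing tag or badly nested elements makes the parse fail outright rather than yield a partial tree.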
If Company B understands the tags created by Company A, it can process the ProductList.xml document once it receives it. The meaning of the XML tags in Figure 7.10 is fairly self-evident, but there is no easy way to validate the data or to check whether the data are complete. For instance, you could encounter a P_INDATE value of 25/14/2009, but is that a correct value? And what if Company B expects a vendor element as well? How can businesses share data descriptions for common business data elements? The following section shows how Document Type Definitions (DTDs) and XML schemas are used to address these issues (Katehakis et al., 2001).

REFERENCES
1. Abdalhakim, H., (2009). Addressing burdens of open database
connectivity standards on the users. In: 2009 Third International
Symposium on Intelligent Information Technology Application
Workshops (Vol. 1, pp. 305–308). IEEE.
2. Baber, J. C., & Hodgkin, E. E., (1992). Automatic assignment of
chemical connectivity to organic molecules in the Cambridge structural
database. Journal of Chemical Information and Computer Sciences,
32(5), 401–406.
3. Baca, A., Dabnichki, P., Heller, M., & Kornfeind, P., (2009). Ubiquitous
computing in sports: A review and analysis. Journal of Sports Sciences,
27(12), 1335–1346.
4. Baski, D., & Misra, S., (2011). Metrics suite for maintainability of
extensible markup language web services. IET Software, 5(3), 320–
341.
5. Bazghandi, A., (2006). Web database connectivity methods (using
MySQL) in windows platform. In: 2006 2nd International Conference
on Information & Communication Technologies (Vol. 2, pp. 3577–
3581). IEEE.
6. Berman, J. J., (2005). Pathology data integration with eXtensible
markup language. Human Pathology, 36(2), 139–145.
7. Bl Bard, J., & Davies, J. A., (1995). Development, databases and the
internet. BioEssays, 17(11), 999–1001.
8. Blinowska, K. J., & Durka, P. J., (2005). Efficient application of
internet databases for new signal processing methods. Clinical EEG
and Neuroscience, 36(2), 123–130.
9. Bouguettaya, A., Benatallah, B., & Edmond, D., (1998). Reflective
data sharing in managing internet databases. In: Proceedings 18th
International Conference on Distributed Computing Systems (Cat. No.
98CB36183) (Vol. 1, pp. 172–181). IEEE.
10. Bouguettaya, A., Benatallah, B., Ouzzani, M., & Hendra, L., (1999).
Using java and CORBA for implementing internet databases. In:
Proceedings 15th International Conference on Data Engineering (Cat.
No. 99CB36337) (Vol. 1, pp. 218–227). IEEE.
11. Bradley, G., (2003). Introduction to extensible markup language
(XML) with operations research examples. Newletter of the INFORMS
Computing Society, 24(1), 1–20.
12. Braxton, S. M., Onstad, D. W., Dockter, D. E., Giordano, R., Larsson,
R., & Humber, R. A., (2003). Description and analysis of two internet-
based databases of insect pathogens: EDWIP and VIDIL. Journal of
Invertebrate Pathology, 83(3), 185–195.
13. Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., &
Cowan, J., (2000). Extensible Markup Language (XML), 1, 3–7.
14. Brown, J. A., Rudie, J. D., Bandrowski, A., Van, H. J. D., & Bookheimer,
S. Y., (2012). The UCLA multimodal connectivity database: A web-
based platform for brain connectivity matrix sharing and analysis.
Frontiers in Neuroinformatics, 6, 28.
15. Bryan, M., (1998). An introduction to the extensible markup language
(XML). Bulletin of the American Society for Information Science,
25(1), 11–14.
16. Carmichael, P., (2002). Extensible markup language and qualitative data
analysis. In: Forum Qualitative Sozialforschung/Forum: Qualitative
Social Research (Vol. 3, No. 2).
17. Castan, G., Good, M., & Roland, P., (2001). Extensible markup
language (XML) for music applications: An introduction. The Virtual
Score, 12, 95–102.
18. Chen, H., Schuffels, C., & Orwig, R., (1996). Internet categorization and
search: A self-organizing approach. Journal of Visual Communication
and Image Representation, 7(1), 88–102.
19. Chen, J., DeWitt, D. J., Tian, F., & Wang, Y., (2000). NiagaraCQ:
A scalable continuous query system for internet databases. In:
Proceedings of the 2000 ACM SIGMOD International Conference on
Management of Data (Vol. 1, pp. 379–390).
20. Coelho, P., Aguiar, A., & Lopes, J. C., (2011). OLBS: Offline location
based services. In: 2011 Fifth International Conference on Next
Generation Mobile Applications, Services and Technologies (Vol. 1,
pp. 70–75). IEEE.
21. Coetzee, M., & Eloff, J., (2002). Secure database connectivity on the
www. In: Security in the Information Society (Vol. 1, pp. 275–286).
Springer, Boston, MA.
22. De Leeuw, N., Dijkhuizen, T., Hehir‐Kwa, J. Y., Carter, N. P., Feuk, L.,
Firth, H. V., & Hastings, R., (2012). Diagnostic interpretation of array
data using public databases and internet sources. Human Mutation,
33(6), 930–940.
210 The Creation and Management of Database Systems

23. De Macedo, D. D., Perantunes, H. W., Maia, L. F., Comunello, E., Von,
W. A., & Dantas, M. A., (2008). An interoperability approach based
on asynchronous replication among distributed internet databases. In:
2008 IEEE Symposium on Computers and Communications (Vol. 1,
pp. 658–663). IEEE.
24. Deng, Y., Tanga, Z., & Yunhua, C. M. C., (2010). Information integration
based on open geospatial database connectivity specification. In:
ISPRS Technical Commission IV, ASPRS/CaGIS 2010 Fall Specialty
Conference (Vol. 1, pp. 4–8).
25. Di Martino, S., Fiadone, L., Peron, A., Riccabone, A., & Vitale, V. N.,
(2019). Industrial internet of things: Persistence for time series with
NoSQL databases. In: 2019 IEEE 28th International Conference on
Enabling Technologies: Infrastructure for Collaborative Enterprises
(WETICE) (Vol. 1, pp. 340–345). IEEE.
26. Dubey, A. K., & Chueh, H., (1998). Using the extensible markup
language (XML) in automated clinical practice guidelines. In:
Proceedings of the AMIA Symposium (Vol. 1, p. 735). American
Medical Informatics Association.
27. Etzioni, O., & Weld, D. S., (1995). Intelligent agents on the internet:
Fact, fiction, and forecast. IEEE Expert, 10(4), 44–49.
28. Falk, H., (2005). State library databases on the internet. The Electronic
Library, 23(4), 492–498.
29. Fatima, H., & Wasnik, K., (2016). Comparison of SQL, NoSQL and
NewSQL databases for internet of things. In: 2016 IEEE Bombay
Section Symposium (IBSS) (Vol. 1, pp. 1–6). IEEE.
30. Friedman, C., Hripcsak, G., Shagina, L., & Liu, H., (1999). Representing
information in patient reports using natural language processing and
the extensible markup language. Journal of the American Medical
Informatics Association, 6(1), 76–87.
31. Gordon, A., (2003). Terrorism and knowledge growth: A databases
and internet analysis. In: Research on Terrorism (Vol. 1, pp. 124–138).
Routledge.
32. Hackathorn, R. D., (1993). Enterprise Database Connectivity: The Key
to Enterprise Applications on the Desktop (Vol. 1, pp. 4–8). John Wiley
& Sons, Inc.
33. Hall, L. H., & Kier, L. B., (2000). Molecular connectivity chi indices
for database analysis and structure-property modeling. In: Topological
Indices and Related Descriptors in QSAR and QSPAR (Vol. 1, pp. 317–
370). CRC Press.
34. Harris, P., & Reiner, D. S., (1990). The lotus DataLens approach to
heterogeneous database connectivity. IEEE Data Eng. Bull., 13(2),
46–51.
35. Hodge, M. R., Horton, W., Brown, T., Herrick, R., Olsen, T., Hileman,
M. E., & Marcus, D. S., (2016). Connectome DB—sharing human
brain connectivity data. Neuroimage, 124, 1102–1107.
36. Huang, H., Nguyen, T., Ibrahim, S., Shantharam, S., Yue, Z., & Chen, J.
Y., (2015). DMAP: A connectivity map database to enable identification
of novel drug repositioning candidates. In: BMC Bioinformatics (Vol.
16, No. 13, pp. 1–11). BioMed Central.
37. Huang, H., Wu, X., Pandey, R., Li, J., Zhao, G., Ibrahim, S., & Chen,
J. Y., (2012). C2Maps: A network pharmacology database with
comprehensive disease-gene-drug connectivity relationships. BMC
Genomics, 13(6), 1–14.
38. Huh, S., (2014). Coding practice of the journal article tag suite
extensible markup language. Sci. Ed., 1(2), 105–112.
39. Ivanov, S., & Carson, J. H., (2002). Database connectivity. In: New
Perspectives on Information Systems Development (Vol. 1, pp. 449–
460). Springer, Boston, MA.
40. Jacobsen, H. E., & Andenæs, R., (2011). Third year nursing students’
understanding of how to find and evaluate information from
bibliographic databases and internet sites. Nurse Education Today,
31(8), 898–903.
41. Jansons, S., & Cook, G. J., (2006). Web-enabled database connectivity:
A comparison of programming, scripting, and application-based access.
Information Systems Management, 1, 4–6.
42. Johnson, R. A., (2014). Java database connectivity using SQLite:
A tutorial. International Journal of Information, Business and
Management, 6(3), 207.
43. Katehakis, D. G., Sfakianakis, S., Tsiknakis, M., & Orphanoudakis,
S. C., (2001). An infrastructure for integrated electronic health record
services: The role of XML (extensible markup language). Journal of
Medical Internet Research, 3(1), e826.
44. Kerse, N., Arroll, B., Lloyd, T., Young, J., & Ward, J., (2001). Evidence
databases, the internet, and general practitioners: The New Zealand
story. New Zealand Medical Journal, 114(1127), 89.
45. Khan, L., Mcleod, D., & Shahabi, C., (2001). An adaptive probe-based
technique to optimize join queries in distributed internet databases.
Journal of Database Management (JDM), 12(4), 3–14.
46. Khannedy, E. K., (2011). MySQL Dan Java Database Connectivity
(Vol. 1, pp. 4–9). Bandung: StripBandung.
47. Kiley, R., (1997). Medical databases on the internet: Part 1. Journal of
the Royal Society of Medicine, 90(11), 610–611.
48. Kötter, R., (2004). Online retrieval, processing, and visualization
of primate connectivity data from the CoCoMac database.
Neuroinformatics, 2(2), 127–144.
49. Kuan, L., Li, Y., Lau, C., Feng, D., Bernard, A., Sunkin, S. M., & Ng,
L., (2015). Neuroinformatics of the Allen mouse brain connectivity
atlas. Methods, 73(1), 4–17.
50. Lamb, J., (2007). The connectivity map: A new tool for biomedical
research. Nature Reviews Cancer, 7(1), 54–60.
51. Lee, J. M., Kyeong, S., Kim, E., & Cheon, K. A., (2016). Abnormalities
of inter-and intra-hemispheric functional connectivity in autism
spectrum disorders: A study using the autism brain imaging data
exchange database. Frontiers in Neuroscience, 10(1), 191.
52. Lemkin, P. F., Thornwall, G. C., & Evans, J., (2005). Comparing
2-D electrophoretic gels across internet databases. The Proteomics
Protocols Handbook, 1, 279–305.
53. Li, Z., & Yang, L., (2020). Underlying mechanisms and candidate drugs
for COVID-19 based on the connectivity map database. Frontiers in
Genetics, 11, 558557.
54. Liu, T. P., Hsieh, Y. Y., Chou, C. J., & Yang, P. M., (2018). Systematic
polypharmacology and drug repurposing via an integrated L1000-
based connectivity map database mining. Royal Society Open Science,
5(11), 181321.
55. Liu, X., Liu, L. C., Koong, K. S., & Lu, J., (2003). An examination of
job skills posted on internet databases: Implications for information
systems degree programs. Journal of Education for Business, 78(4),
191–196.
56. Liu, Z., Du, X., & Ishii, N., (1998). Integrating databases in internet. In:
1998 Second International Conference. Knowledge-Based Intelligent
Electronic Systems: Proceedings KES’98 (Cat. No. 98EX111) (Vol. 3,
pp. 381–385). IEEE.
57. Luo, B., Gu, Y. Y., Wang, X. D., Chen, G., & Peng, Z. G., (2018).
Identification of potential drugs for diffuse large b-cell lymphoma
based on bioinformatics and connectivity map database. Pathology-
Research and Practice, 214(11), 1854–1867.
58. Mai, P. T. A., Nurminen, J. K., & Di Francesco, M., (2014). Cloud
databases for internet-of-things data. In: 2014 IEEE International
Conference on Internet of Things (iThings), and IEEE Green Computing
and Communications (GreenCom) and IEEE Cyber, Physical and
Social Computing (CPSCom) (Vol. 1, pp. 117–124). IEEE.
59. Manzotti, G., Parenti, S., Ferrari-Amorotti, G., Soliera, A. R., Cattelani,
S., Montanari, M., & Calabretta, B., (2015). Monocyte-macrophage
differentiation of acute myeloid leukemia cell lines by small molecules
identified through interrogation of the connectivity map database. Cell
Cycle, 14(16), 2578–2589.
60. Meng, W., Liu, K. L., Yu, C., Wang, X., Chang, Y., & Rishe, N., (1998).
Determining Text Databases to Search in the Internet, 1, 4, 5.
61. Migliore, L. A., & Chinta, R., (2017). Demystifying the big data
phenomenon for strategic leadership. SAM Advanced Management
Journal, (07497075), 82(1).
62. Minkiewicz, P., Darewicz, M., Iwaniak, A., Bucholska, J., Starowicz,
P., & Czyrko, E., (2016). Internet databases of the properties, enzymatic
reactions, and metabolism of small molecules—Search options and
applications in food science. International Journal of Molecular
Sciences, 17(12), 2039.
63. Minkiewicz, P., Iwaniak, A., & Darewicz, M., (2015). Using internet
databases for food science organic chemistry students to discover
chemical compound information. Journal of Chemical Education,
92(5), 874–876.
64. Moffatt, C., (1996). Designing client-server applications for enterprise
database connectivity. In: Database Reengineering and Interoperability
(Vol. 1, pp. 215–234). Springer, Boston, MA.
65. Mohr, E., Horn, F., Janody, F., Sanchez, C., Pillet, V., Bellon, B., &
Jacq, B., (1998). FlyNets and GIF-DB, two internet databases for
molecular interactions in Drosophila melanogaster. Nucleic Acids Research, 26(1), 89–93.
66. Ng, C. K., White, P., & McKay, J. C., (2009). Development of a web
database portfolio system with PACS connectivity for undergraduate
health education and continuing professional development. Computer
Methods and Programs in Biomedicine, 94(1), 26–38.
67. Norrie, M. C., Palinginis, A., & Wurgler, A., (1998). OMS connect:
Supporting multidatabase and mobile working through database
connectivity. In: Proceedings 3rd IFCIS International Conference on
Cooperative Information Systems (Cat. No. 98EX122) (Vol. 1, pp.
232–240). IEEE.
68. Ou, C., & Zhang, K., (2006). Teaching with databases: Begin with the
internet. TechTrends, 50(5), 46.
69. Ouzzani, M., Benatallah, B., & Bouguettaya, A., (2000). Ontological
approach for information discovery in internet databases. Distributed
and Parallel Databases, 8(3), 367–392.
70. Poo, D., Kiong, D., & Ashok, S., (2008). Java database connectivity.
In: Object-Oriented Programming and Java (Vol. 1, pp. 297–314).
Springer, London.
71. Press, W. A., Olshausen, B. A., & Van, E. D. C., (2001). A graphical
anatomical database of neural connectivity. Philosophical Transactions
of the Royal Society of London. Series B: Biological Sciences,
356(1412), 1147–1157.
72. Quirós, M., Gražulis, S., Girdzijauskaitė, S., Merkys, A., & Vaitkus,
A., (2018). Using SMILES strings for the description of chemical
connectivity in the crystallography open database. Journal of
Cheminformatics, 10(1), 1–17.
73. Ramakrishnan, S., & Rao, B. M., (1997). Classroom projects on
database connectivity and the web. ACM SIGCSE Bulletin, 29(1),
116–120.
74. Regateiro, D. D., Pereira, Ó. M., & Aguiar, R. L., (2017). SPDC:
Secure proxied database connectivity. In: DATA (Vol. 1, pp. 56–66).
75. Robinson, J. L., Laird, A. R., Glahn, D. C., Lovallo, W. R., & Fox, P. T.,
(2010). Metaanalytic connectivity modeling: Delineating the functional
connectivity of the human amygdala. Human Brain Mapping, 31(2),
173–184.
76. Sanin, C., & Szczerbicki, E., (2006). Extending set of experience
knowledge structure into a transportable language extensible markup
language. Cybernetics and Systems: An International Journal, 37(2,
3), 97–117.
77. Saxena, V., & Kumar, S., (2012). Object-Oriented Database
Connectivity for Hand Held Devices, 1, 4–8.
78. Shahabi, C., Khan, L., & McLeod, D., (2000). A probe-based technique
to optimize join queries in distributed internet databases. Knowledge
and Information Systems, 2(3), 373–385.
79. Shum, A. C., (1997). Open Database Connectivity Development of the
Context Interchange System (Vol. 1, pp. 2–5). Doctoral dissertation,
Massachusetts Institute of Technology.
80. Song, H., & Gao, L., (2012). Use ORM middleware realize
heterogeneous database connectivity. In: 2012 Spring Congress on
Engineering and Technology (Vol. 1, pp. 1–4). IEEE.
81. Sperberg-McQueen, C. M., (2000). Extensible Markup Language
(XML) 1.0 (Vol. 1, pp. 4–9). World Wide Web Consortium.
82. Stephan, K. E., Kamper, L., Bozkurt, A., Burns, G. A., Young, M. P., &
Kötter, R., (2001). Advanced database methodology for the collation
of connectivity data on the macaque brain (CoCoMac). Philosophical
Transactions of the Royal Society of London; Series B: Biological
Sciences, 356(1412), 1159–1186.
83. Stephens, L. M., & Huhns, M. N., (1999). Database Connectivity Using
an Agent-Based Mediator System. Univ. of S. Carolina, report A, 38.
84. Sylvester, R. K., (1997). Incorporation of internet databases into
pharmacotherapy coursework. American Journal of Pharmaceutical
Education, 61(1), 50–54.
85. Ternon, E., Agyapong, P., Hu, L., & Dekorsy, A., (2014). Database-
aided energy savings in next generation dual connectivity heterogeneous
networks. In: 2014 IEEE Wireless Communications and Networking
Conference (WCNC) (Vol. 1, pp. 2811–2816). IEEE.
86. Van, D. S., (1999). Free patent databases on the internet: A critical
view. World Patent Information, 21(4), 253–257.
87. Van, S. M. A. M., & Winter, R. M., (1998). Internet databases for
clinical geneticists‐an overview 1. Clinical Genetics, 53(5), 323–330.
88. Vecchio, F., Miraglia, F., Judica, E., Cotelli, M., Alù, F., & Rossini,
P. M., (2020). Human brain networks: A graph theoretical analysis of
cortical connectivity normative database from EEG data in healthy elderly subjects. GeroScience, 42(2), 575–584.
89. Vuong, N. N., Smith, G. S., & Deng, Y., (2001). Managing security
policies in a distributed environment using extensible markup language
(XML). In: Proceedings of the 2001 ACM Symposium on Applied
Computing (Vol. 1, pp. 405–411).
90. Wahle, S., Magedanz, T., Gavras, A., Hrasnica, H., & Denazis, S.,
(2009). Technical infrastructure for a pan-European federation of
testbeds. In: 2009 5th International Conference on Testbeds and Research
Infrastructures for the Development of Networks & Communities and
Workshops (Vol. 1, pp. 1–8). IEEE.
91. Xu, D., (2012). Protein databases on the internet. Current Protocols in
Protein Science, 70(1), 2–6.
92. Yerneni, R., Papakonstantinou, Y., Abiteboul, S., & Garcia-Molina,
H., (1998). Fusion queries over internet databases. In: International
Conference on Extending Database Technology (Vol. 1, pp. 55–71).
Springer, Berlin, Heidelberg.
93. Yoshida, Y., Miyazaki, K., Kamiie, J., Sato, M., Okuizumi, S., Kenmochi,
A., & Yamamoto, T., (2005). Two‐dimensional electrophoretic profiling
of normal human kidney glomerulus proteome and construction of an
extensible markup language (XML)‐based database. Proteomics, 5(4),
1083–1096.
94. Yousfi, A., El Yazidi, M. H., & Zellou, A., (2020). xMatcher: Matching
extensible markup language schemas using semantic-based techniques.
International Journal of Advanced Computer Science and Applications,
1, 11(8).
CHAPTER 8
DATABASE ADMINISTRATION AND SECURITY

CONTENTS
8.1. Introduction..................................................................................... 218
8.2. The Role of a Database in an Organization...................................... 220
8.3. Introduction of a Database............................................................... 222
8.4. The Evolution of Database Administration Function......................... 224
References.............................................................................................. 230
8.1. INTRODUCTION
To determine the monetary value of data, consider what is stored in a company's database: information about customers, vendors, inventory, operations, and so on. How many opportunities would be lost if that data were destroyed? What is the actual cost of such data loss? A firm whose entire data store is destroyed, for instance, would incur enormous costs and expenses (Mata-Toledo & Reyes-Garcia, 2002). During tax season, an accounting firm's troubles would be compounded by the loss of its data. Data loss puts any business in a precarious position: the firm may be unable to manage daily operations effectively, it may lose clients who expect prompt and professional service, and it may miss opportunities to expand its customer base (Bertino & Sandhu, 2005).
Information is a valuable resource derived from data. When information is accurate and timely, it can motivate actions that improve a company's products and generate wealth. In practice, a company operates within a data-information-decision cycle: the data user applies intelligence to data to produce information, and that information serves as the basis of knowledge for decision makers. Figure 8.1 depicts this cycle (Leong-Hong & Marron, 1978).

Figure 8.1. The cycle of data-information-decision making.

Source: https://slideplayer.com/slide/5985787/.
As Figure 8.1 shows, decisions made by upper-level managers trigger actions at lower organizational levels. Those actions generate additional data with which to monitor the firm's performance, and that data must in turn be recycled within the data-information-decision framework. Data thus form the basis for decision making, strategic planning, control, and operations monitoring (Zygiaris, 2018).
Effective asset management is a fundamental performance factor for any firm. To manage information as a business asset, executives must understand the value of information and data. There are businesses (such as those that provide credit reports) whose primary product is data and whose viability depends entirely on data administration (Said et al., 2009). Most firms continuously look for new ways to maximize the value of the data they collect. This value can take many forms, from data warehouses that enable improved customer relationship management to closer integration with suppliers and customers in support of automated logistics operations. As businesses grow more dependent on data, the accuracy of that data becomes increasingly critical. Such firms face an even greater risk from dirty data: inaccurate and inconsistent information. Data can become dirty for several reasons, including (Bullers Jr et al., 2006):
• Lack of enforcement of integrity constraints (NOT NULL, UNIQUE, referential integrity, and so on);
• Typographical errors during data entry;
• Use of synonyms and/or homonyms across different systems;
• Use of nonstandard abbreviations in character data; and
• Differences in how composite attributes are decomposed into simple attributes across systems.
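The first cause above, missing integrity constraints, is the one most directly fixable at the database level. The following sketch, using Python's built-in sqlite3 module with an invented illustrative schema, shows the DBMS rejecting each kind of dirty row once NOT NULL, UNIQUE, and referential-integrity constraints are declared:

```python
import sqlite3

# In-memory database; the tables and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE      -- NOT NULL and uniqueness constraints
    )""")
conn.execute("""
    CREATE TABLE invoice (
        inv_id  INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id)  -- referential integrity
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")

# Each of these dirty rows is rejected by the DBMS instead of polluting the data.
violations = []
for stmt in [
    "INSERT INTO customer VALUES (2, NULL)",            # NOT NULL violation
    "INSERT INTO customer VALUES (3, 'a@example.com')", # UNIQUE violation
    "INSERT INTO invoice VALUES (10, 99)",              # FK violation: no customer 99
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as exc:
        violations.append(str(exc))

print(len(violations))  # all three dirty rows were blocked
```

Note that SQLite enforces foreign keys only when the `foreign_keys` pragma is enabled; server DBMSs such as PostgreSQL enforce declared foreign keys by default.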
Many causes of dirty data can be addressed at the database level, for example through the proper enforcement of integrity constraints. Other sources of dirty data, however, are more difficult to resolve. One such source is the migration of data between systems, as in the establishment of a new database system. Data quality initiatives are the usual means of bringing dirty data under control (Tahir & Brézillon, 2013).
Data quality is a comprehensive approach to ensuring the accuracy, validity, and consistency of data. That the approach is comprehensive is crucial: data quality is concerned with more than simply cleansing dirty data; it also seeks to prevent new data errors and to build user confidence in the data. Large-scale data quality initiatives tend to be complex and costly, so they must be aligned with company objectives and have the support of senior management. Although data quality initiatives vary widely from one organization to another, most incorporate a combination of (Paul & Aithal, 2019):
• A data governance framework accountable for data quality;
• Assessments of the current level of data quality;
• Development of data quality standards in line with enterprise objectives; and
• Implementation of tools and processes to ensure the quality of new data.
Several tools can aid in the execution of data quality initiatives. In particular, a number of vendors offer data profiling and master data management (MDM) tools to help ensure data quality. Data profiling software gathers statistics about, and assesses, existing data sources. These programs analyze current data sets to discover patterns in the data and can then compare those patterns against organizationally defined standards (El-Bakry & Hamada, 2010). Such analysis helps the business understand the quality of its existing data and identify sources of dirty data. MDM software helps prevent dirty data by coordinating shared information across multiple systems. MDM provides a "master" copy of entities, such as customers, that appear in several organizational systems. Although these technologies contribute significantly to data quality, the complete answer to high-quality data within an organization rests chiefly on data management and administration (Kahn & Garceau, 1985).
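As a rough sketch of what a data profiling tool computes, the following Python fragment derives null counts, distinct counts, and pattern conformance for one column. The records, field names, and the phone-format rule are invented for illustration:

```python
import re

# Toy customer records; the fields and values are illustrative assumptions.
rows = [
    {"name": "Ann", "phone": "555-0101"},
    {"name": "Bob", "phone": "5550102"},   # does not conform to the pattern
    {"name": None,  "phone": "555-0103"},  # missing value
    {"name": "Ann", "phone": "555-0101"},  # duplicate of the first record
]

# An organizationally defined standard the profiler checks values against.
PHONE_PATTERN = re.compile(r"^\d{3}-\d{4}$")

def profile(rows, column, pattern=None):
    """Collect simple quality metrics for one column, as a profiler would."""
    values = [r[column] for r in rows]
    present = [v for v in values if v is not None]
    metrics = {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if pattern:
        metrics["conforming"] = sum(1 for v in present if pattern.match(v))
    return metrics

print(profile(rows, "phone", PHONE_PATTERN))
```

A real profiler computes many more statistics (value distributions, inferred types, cross-column dependencies), but the principle is the same: measure the data, then compare the measurements against the organization's standards.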

8.2. THE ROLE OF A DATABASE IN AN ORGANIZATION
Different personnel in different departments use data for different purposes. Data management must therefore be built around the concept of shared data. When used correctly, the DBMS enables (Yao et al., 2007):
• Data interpretation and presentation in meaningful formats, by converting data into information.
• Distribution of data and information to the right people at the right time.
• Data retention and tracking for a suitable duration of time.
• Internal and external control on data duplication and usage.
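The first capability, converting data into information, can be illustrated with a simple aggregate query; the table and the sales figures below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("North", 100.0), ("North", 250.0), ("South", 80.0)],
)

# The raw rows are data; the grouped totals are information a manager can act on.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sale GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 350.0), ('South', 80.0)]
```

The same principle scales up: reports, dashboards, and decision-support queries are all transformations of stored data into forms that are meaningful at a given management level.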
Regardless of the kind of organization, the primary objective of the
database is to enable management decision-making at all levels of the
business while maintaining data privacy and security (Khan et al., 2014).
A company's management structure may be divided into three levels: top, middle, and operational. Top-level management makes strategic decisions, middle management makes tactical decisions, and operational management makes day-to-day operational decisions. Short-term operational decisions affect only daily activities, for example, deciding to discount a product to clear it from stock. Tactical decisions have a longer time horizon and affect larger-scale operations, for example, changing a product's price in response to competitive pressures (Armstrong & Densham, 1990).
Strategic decisions affect the organization's long-term well-being or even its survival, for example, shifting pricing strategy across the product line to win market share. The DBMS must give each management level a view of the data appropriate to that level and must support the required degree of decision making. The activities listed below are typical of each management level. At the top management level, the database must be able to (Yusuf & Nnadi, 1988):
• Provide the information necessary for strategic decision making, corporate planning, policy making, and goal setting.
• Provide access to external and internal data to identify growth opportunities and to chart the direction of such growth. (Direction refers to the nature of the operations: will a company become a service organization, a manufacturing organization, or some combination of the two?)
• Establish a structure for developing and implementing
organizational policies. (Keep in mind that such policies are
transformed into business requirements at a lower level of the
company.)
• Improve the likelihood of a favorable return on investment for the company by searching for new ways to reduce costs and/or boost productivity.
• Provide feedback to gauge whether the company is achieving its goals.
At the middle management level, the database must be able to (Rabinovitch & Wiese, 2007):
• Provide the data required for tactical judgments and strategy.
• Monitor and control the allocation and use of organizational resources, and evaluate the performance of the organization's various units.
• Provide a framework for enforcing and ensuring the security and privacy of the data in the database. Data security means protecting the data against accidental or intentional use by unauthorized users. Privacy concerns the rights of individuals and organizations to determine the "who, when, why, how, and what" of data usage.
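The security half of this distinction can be sketched as a minimal permission check; the roles, tables, and grants below are invented for illustration and do not correspond to any particular DBMS's authorization mechanism:

```python
# Minimal permission table; every role, table, and grant here is a made-up
# illustration of access control, not a feature of any real DBMS.
PERMISSIONS = {
    "analyst": {"customer": {"read"}},
    "dba":     {"customer": {"read", "write"}},
}

def is_allowed(role, table, action):
    """Security check: reject any data use the role was never granted."""
    return action in PERMISSIONS.get(role, {}).get(table, set())

print(is_allowed("dba", "customer", "write"))      # an authorized use
print(is_allowed("analyst", "customer", "write"))  # accidental misuse, denied
```

Privacy policy then decides what should be granted in the first place: which people and organizations may see which data, when, and for what purpose.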
At the operational management level, the database must be able to (Bhardwaj et al., 2012):
• Represent and support the company's operations as accurately as possible. The data model must be flexible enough to incorporate all essential current and expected future data.
• Produce query results within specified performance limits. Note that performance requirements become more stringent at the lower levels of the organization; at the operational management level, the database must therefore support fast responses to a greater number of transactions.
• Enhance the company's short-term operations by providing timely information for customer support and for application development and computer operations.
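The requirement for fast responses to a large transaction volume is typically met with indexes. The following SQLite sketch, with invented table and index names, shows the optimizer's query plan switching from a full table scan to an index search once an index is created:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

# Without an index, a lookup by cust_id must scan the whole table.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cust_id = 7").fetchall()

conn.execute("CREATE INDEX idx_orders_cust ON orders(cust_id)")

# With the index, SQLite reports an index search instead of a full scan.
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cust_id = 7").fetchall()

print("USING INDEX" in str(after))
```

The difference is negligible over a thousand rows but decisive over millions, which is why operational systems are usually the most heavily indexed.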
A general objective of any database is to provide a seamless flow of information throughout the company. The company's database is also known as the corporate or enterprise database. The enterprise database can be described as "the company's data representation that supports all present and expected future operations." Most successful companies today depend on their enterprise database to support all of their operations, from design to implementation, from sales to services, and from daily decision making to strategic planning (Dewett & Jones, 2001).

8.3. INTRODUCTION OF A DATABASE


Possessing a computerized database does not by itself ensure that the data will be used correctly to give managers the best options. A database management system (DBMS) is a tool for managing information; like any tool, it must be used effectively to deliver the intended results. Consider this metaphor: a hammer can help a craftsman produce fine woodwork, but it can also cause damage in the hands of a child. The mere existence of a computerized central database is not the answer to a company's problems; its proper administration and use are (King, 2006).
Introducing a DBMS represents a major change and challenge; the DBMS is likely to have a profound impact on the company, which may be positive or negative depending on how it is administered. One important consideration, for instance, is tailoring the DBMS to the organization rather than forcing the organization to conform to the DBMS (Yarkoni et al., 2010). The company's needs, not the features of the DBMS, should be the primary concern. Introducing a DBMS, however, cannot be done without affecting the company: the flow of new management information has a significant effect on how the company functions and, therefore, on its corporate culture (Motomura et al., 2008).
The introduction of a DBMS into an organization has been described as a process that includes three important components (Schoenbachler & Gordon, 2002):
• Technological: DBMS hardware and software;
• Managerial: administrative functions; and
• Cultural: corporate resistance to change.
The technological aspect involves selecting, installing, configuring, and maintaining the DBMS to ensure that it handles data storage, access, and security efficiently. The person or people responsible for the technological side of the DBMS must have the technical skills needed to provide or secure adequate support for the DBMS's various users: programmers, managers, and end users. Database administration staffing is therefore a critical technical issue in any DBMS implementation. To ensure a successful transition to the new environment, the chosen personnel must have the right mix of technical and managerial skills (Kalmegh & Navathe, 2012).
The managerial component of DBMS implementation should not be underestimated. Having a high-quality DBMS, like having the best race car, does not guarantee a high-quality information system (Dunwell et al., 2000).
Introducing a DBMS into an organization requires careful planning to establish an appropriate management structure supporting the person or people in charge of administering the DBMS. A well-defined monitoring and controlling function should be built into the organizational structure.
Administrative professionals must have good communication skills, as well as a thorough grasp of organizational and business principles (Chen et al., 2021).
Top management must commit to the new system by defining and supporting the organization's data administration functions, goals, and roles. The cultural impact of introducing a database system must be assessed carefully: the existence of the DBMS affects people, functions, and interactions. For example, additional personnel may be hired, current personnel may be given new responsibilities, and job performance may be evaluated against new standards (Goldewijk, 2001).
A cultural impact is to be expected because the database approach creates a more controlled and structured information flow. Department managers accustomed to handling their own data must surrender personal control of that data to the data administration function and share it with the rest of the company. Application developers must learn and follow new design and development standards (Chen et al., 2004; Amano & Maeda, 1987). Managers may be faced with what they perceive to be an information overload and may need time to adjust to the new environment. When the new database goes live, users may be reluctant to use the information it provides, questioning its value or accuracy. (Some people will be surprised, if not outraged, to discover that the facts contradict their preconceived notions and deeply held beliefs.) Database administrators must be prepared to welcome end users, listen to their concerns, address those concerns where possible, and educate end users about the system's uses and benefits (Buhrmester, 1998).

8.4. THE EVOLUTION OF DATABASE ADMINISTRATION FUNCTION
The origins of data administration may be traced back to the old,
decentralized world of file systems. The cost of data and management
duplication in those file systems prompted the creation of an electronic data
processing (EDP) or data processing (DP) department as a centralized data
administration function (Fry & Sibley, 1976). The DP department’s job was
to pool all computing resources to provide operational support to every
division. The DP administration function was granted the authority to
administer all of the company’s existing file systems, as well as to address
the data and management problems arising from data duplication and/or
misuse (Kahn & Garceau, 1985).

The introduction of the DBMS and its shared view of data brought a
new level of data management complexity, transforming the DP department
into an information systems (IS) department (Sherif, 1984):
• A service function was added to the IS department’s duties to offer
end-users continuous data management assistance.
• A production function was added to provide end-users customized
solutions for their data needs, using integrated application or
management computational modeling.
The IS department’s internal organization mirrored its functional
emphasis. Figure 8.2 depicts how most IS departments were organized. As
the demand for application development grew, the IS application development
section was subdivided by the category of the supported system: financial
reporting, stock, advertising, and so on (Mistry et al., 2013). However, this
development split the data management duties: the database operations
section was in charge of implementing, monitoring, and controlling DBMS
operations, while the application development section was in charge of
gathering database requirements and producing the logical data layout
(Gillenson, 1991).

Figure 8.2. The IS department’s internal organization.

Source: https://www.researchgate.net/figure/IIASAs-internal-organizational-
structure_fig1_318585146.
The scope and role of the DBA function, as well as its placement within
a company’s structure, differ from one company to the next. The DBA role
may be classified as either a staff or a line position in the
organization structure (Barki et al., 1993). Placing the DBA role on staff
often results in a consulting environment, in which the DBA may establish
a data administration plan but lacks the power to enforce it or handle any
disputes. In a line role, the DBA is responsible for planning, defining,
implementing, and enforcing the rules, standards, and procedures utilized in
the data administration activities. Figure 8.3 illustrates the two alternative
DBA function positions (Jain & Ryu, 1988).

Figure 8.3. The placement of the DBA function.

Source: https://slideplayer.com/slide/13196309/.
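In a line position, much of this enforcement can be reduced to concrete, repeatable checks. The following sketch is only an illustration of that idea, not a procedure from this chapter: it assumes a SQLite database and a hypothetical naming standard (table names must be lowercase and start with a letter), and reports the tables that violate it.

```python
import re
import sqlite3

# Hypothetical naming standard: lowercase, starts with a letter,
# only letters, digits, and underscores afterward.
NAME_RULE = re.compile(r"^[a-z][a-z0-9_]*$")

def check_table_names(conn):
    """Return the table names that violate the naming standard."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [name for (name,) in rows if not NAME_RULE.match(name)]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_orders (id INTEGER)")
    conn.execute('CREATE TABLE "BadName" (id INTEGER)')
    print(check_table_names(conn))  # → ['BadName']
```

A real DBA would run such checks against the production catalog (many SQL products expose one, e.g., an information schema) and fold the output into the standards-violation reports that a line-role DBA is empowered to act on.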
How the DBA role fits within an organization’s structure is not
standardized. In part, this is because the DBA function is perhaps the most
flexible of all organizational roles. In fact, the rapid evolution of DBMS
technology necessitates a shift in organizational structures (Weber &
Everest, 1979):
• The growth of distributed database systems may compel a business
to decentralize its data management role. The distributed
environment enables the systems administrator to establish and
assign the duties of every local DBA, hence placing new and
complicated coordination activities on the systems DBA.
• Microcomputer-based DBMS packages are becoming more
complex and capable, which makes it easier to create user-friendly,
inexpensive, and effective department-specific solutions.
However, such an atmosphere is also conducive to data redundancy, not
to mention the challenges caused by those who lack the technical expertise
to construct effective database designs. In summary, the new microcomputer
context requires that the database administrator (DBA) acquire a new set of
managerial and technical abilities (Teng & Grover, 1992).
It is customary to characterize the DBA function by separating DBA
activities into the stages of the Database Life Cycle (DBLC). If this strategy
is adopted, the DBA role will need staff for the following activities (Houser
& Pecchioli, 2000):
• Database planning, including the development of rules and
procedures and their enforcement.
• Gathering of database requirements and conceptual design.
• Logical design and transaction design of the database.
• Physical database design and implementation.
• Database testing and debugging.
• Database operation and maintenance, including installation,
conversion, and migration.
• Database training and support.
• Data quality control and reporting.
Per this paradigm, Figure 8.4 depicts a suitable DBA functional structure
(Aiken et al., 2013).

Figure 8.4. A DBA functional organization.

Source: https://documen.site/download/chapter-15–29_pdf.
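Several of these activities, notably data quality control and reporting, lend themselves to simple automation. The sketch below is a hedged illustration only, assuming a SQLite database and a hypothetical employee table; the quality_report helper and its required-column rule are inventions for this example, not part of the DBLC itself.

```python
import sqlite3

def quality_report(conn, table, required_columns):
    """Count NULLs in columns that the data standard says must be filled."""
    report = {}
    for col in required_columns:
        (nulls,) = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        ).fetchone()
        report[col] = nulls
    return report

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ada", "IS"), (2, None, "DP"), (3, "Grace", None)],
    )
    print(quality_report(conn, "employee", ["name", "dept"]))
    # → {'name': 1, 'dept': 1}
```

Scheduled reports of this kind give the DBA staff responsible for quality control something concrete to review and escalate.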
Consider that a business may have several incompatible DBMSs deployed
to serve distinct processes. For instance, it is not unusual for organizations
to run a hierarchical DBMS to handle daily transactions at the operational
level and a relational database system to meet the ad hoc data needs of
middle and upper management (Leonard, 1990). Several microcomputer
DBMSs may also be deployed in the various departments. In such a
scenario, the business may assign one DBA to each DBMS. Figure 8.5
depicts the role of the systems administrator, who serves as the general
coordinator of all DBAs (Ramanujam & Capretz, 2005).

Figure 8.5. Multiple database administrators in an organization.

Source: http://onlineopenacademy.com/database-administrator/.

In the data management role, there is an increasing tendency toward
specialization. For example, some of the bigger businesses’ organizational
charts distinguish between a DBA and a data administrator (DA). The DA,
sometimes known as the information resource manager (IRM), reports
directly to senior management and is given more responsibility and power
than the DBA, although the two functions overlap to some extent (Guynes
& Vanecek, 1995; Roddick, 1995).
The DA is in charge of the company’s comprehensive data resources,
both electronic and manual. Because the DA is in charge of handling not just
computerized data but also data beyond the purview of the DBMS, the DA’s
job description encompasses a broader range of actions than the DBA’s.
The DBA’s position within the enlarged organizational structure may differ
from one firm to the next. The DBA may report to the DA, the IRM, the IS
manager, or the company’s CEO, depending on the structure’s components
(Rose, 1991; Thomas et al., 2006).

REFERENCES
1. Aiken, P., Gillenson, M., Zhang, X., & Rafner, D., (2013). Data
management and data administration: Assessing 25 years of practice.
In: Innovations in Database Design, Web Applications, and Information
Systems Management (Vol. 1, pp. 289–309). IGI Global.
2. Amano, K., & Maeda, T., (1987). Database management in research
environment. In: Empirical Foundations of Information and Software
Science III (Vol. 1, pp. 3–11). Springer, Boston, MA.
3. Armstrong, M. P., & Densham, P. J., (1990). Database organization
strategies for spatial decision support systems. International Journal
of Geographical Information Systems, 4(1), 3–20.
4. Barki, H., Rivard, S., & Talbot, J., (1993). A keyword classification
scheme for IS research literature: An update. MIS Quarterly, 1, 209–
226.
5. Bertino, E., & Sandhu, R., (2005). Database security-concepts,
approaches, and challenges. IEEE Transactions on Dependable and
Secure Computing, 2(1), 2–19.
6. Bhardwaj, A., Singh, A., Kaur, P., & Singh, B., (2012). Role of
fragmentation in distributed database system. International Journal of
Networking & Parallel Computing, 1(1), 2–7.
7. Buhrmester, D., (1998). Need Fulfillment, Interpersonal Competence,
and the Developmental Contexts of Early Adolescent Friendship, 1,
1–9.
8. Bullers, Jr. W. I., Burd, S., & Seazzu, A. F., (2006). Virtual machines-
an idea whose time has returned: Application to network, security, and
database courses. ACM SIGCSE Bulletin, 38(1), 102–106.
9. Chen, J., Hong, H., Huang, M., & Kubik, J. D., (2004). Does fund size
erode mutual fund performance? The role of liquidity and organization.
American Economic Review, 94(5), 1276–1302.
10. Chen, X., Yang, H., Liu, G., & Zhang, Y., (2021). NUCOME: A
comprehensive database of nucleosome organization referenced
landscapes in mammalian genomes. BMC Bioinformatics, 22(1), 1–15.
11. Dewett, T., & Jones, G. R., (2001). The role of information technology
in the organization: A review, model, and assessment. Journal of
Management, 27(3), 313–346.
12. Dunwell, J. M., Khuri, S., & Gane, P. J., (2000). Microbial relatives of
the seed storage proteins of higher plants: Conservation of structure and
diversification of function during evolution of the cupin superfamily.
Microbiology and Molecular Biology Reviews, 64(1), 153–179.
13. El-Bakry, H. M., & Hamada, M., (2010). A developed watermark
technique for distributed database security. In: Computational
Intelligence in Security for Information Systems 2010 (Vol. 1, pp. 173–
180). Springer, Berlin, Heidelberg.
14. Fry, J. P., & Sibley, E. H., (1976). Evolution of data-base management
systems. ACM Computing Surveys (CSUR), 8(1), 7–42.
15. Gillenson, M. L., (1991). Database administration at the crossroads:
The era of end-user-oriented, decentralized data processing. Journal of
Database Management (JDM), 2(4), 1–11.
16. Goldewijk, K. K., (2001). Estimating global land use change over the
past 300 years: The HYDE database. Global Biogeochemical Cycles,
15(2), 417–433.
17. Guynes, C. S., & Vanecek, M. T., (1995). Data management issues in
information systems. Journal of Database Management (JDM), 6(4),
3–13.
18. Houser, J., & Pecchioli, M., (2000). Database administration for
spacecraft operations-the integral experience. ESA Bulletin, 1, 100–
107.
19. Jain, H. K., & Ryu, H. S., (1988). The Issue of Site Autonomy in
Distributed Database Administration, 1, 2–5.
20. Kahn, B. K., & Garceau, L. R., (1985). A developmental model of the
database administration function. Journal of Management Information
Systems, 1(4), 87–101.
21. Kalmegh, P., & Navathe, S. B., (2012). Graph database design
challenges using HPC platforms. In: 2012 SC Companion: High
Performance Computing, Networking Storage and Analysis (Vol. 1,
pp. 1306–1309). IEEE.
22. Khan, S. A., Saqib, M., & Al Farsi, B., (2014). Critical Role of a
Database Administrator: Designing Recovery Solutions to Combat
Database Failures, 1, 3–9.
23. King, W. R., (2006). The critical role of information processing in
creating an effective knowledge organization. Journal of Database
Management (JDM), 17(1), 1–15.
24. Leonard, B., (1990). Quality control for a shared multidisciplinary
database. Data Quality Control: Theory and Pragmatics, 112, 43.
25. Leong-Hong, B., & Marron, B. A., (1978). Database Administration:
Concepts, Tools, Experiences, and Problems (Vol. 28, pp. 4–8).
National Bureau of Standards.
26. Mata-Toledo, R. A., & Reyes-Garcia, C. A., (2002). A model course for
teaching database administration with personal oracle 8 i. Journal of
Computing Sciences in Colleges, 17(3), 125–130.
27. Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A., & Punta, M., (2013).
Challenges in homology search: HMMER3 and convergent evolution
of coiled-coil regions. Nucleic Acids Research, 41(12), e120-e121.
28. Motomura, N., Miyata, H., Tsukihara, H., Okada, M., Takamoto, S., &
Organization, J. C. S. D., (2008). First report on 30-day and operative
mortality in risk model of isolated coronary artery bypass grafting in
Japan. The Annals of Thoracic Surgery, 86(6), 1866–1872.
29. Paul, P., & Aithal, P. S., (2019). Database security: An overview
and analysis of current trend. International Journal of Management,
Technology, and Social Sciences (IJMTS), 4(2), 53–58.
30. Rabinovitch, G., & Wiese, D., (2007). Non-linear optimization of
performance functions for autonomic database performance tuning.
In: Third International Conference on Autonomic and Autonomous
Systems (ICAS’07) (Vol. 1, pp. 48–48). IEEE.
31. Ramanujam, S., & Capretz, M. A., (2005). ADAM: A multi-agent
system for autonomous database administration and maintenance.
International Journal of Intelligent Information Technologies (IJIIT),
1(3), 14–33.
32. Roddick, J. F., (1995). A survey of schema versioning issues for
database systems. Information and Software Technology, 37(7), 383–
393.
33. Rose, E., (1991). Data modeling for non-standard data. Journal of
Database Management (JDM), 2(3), 8–21.
34. Said, H. E., Guimaraes, M. A., Maamar, Z., & Jololian, L., (2009).
Database and database application security. ACM SIGCSE Bulletin,
41(3), 90–93.
35. Schoenbachler, D. D., & Gordon, G. L., (2002). Trust and customer
willingness to provide information in database-driven relationship
marketing. Journal of Interactive Marketing, 16(3), 2–16.
36. Sherif, M. A. J., (1984). The Impact of Database Systems on
Organizations: A Survey with Special Reference to the Evolution of
the Database Administration Function (Vol. 1, pp. 3–9). Doctoral
dissertation, City University London.
37. Tahir, H., & Brézillon, P., (2013). Shared context for improving
collaboration in database administration. International Journal of
Database Management Systems, 5(2), 13.
38. Teng, J. T., & Grover, V., (1992). An empirical study on the determinants
of effective database management. Journal of Database Management
(JDM), 3(1), 22–34.
39. Thomas, P. D., Kejariwal, A., Guo, N., Mi, H., Campbell, M. J.,
Muruganujan, A., & Lazareva-Ulitsky, B., (2006). Applications for
protein sequence–function evolution data: MRNA/protein expression
analysis and coding SNP scoring tools. Nucleic Acids Research,
34(suppl_2), 645–650.
40. Weber, R., & Everest, G. C., (1979). Database administration:
Functional, organizational, & control perspectives. EDPACS: The EDP
Audit, Control, and Security Newsletter, 6(7), 1–10.
41. Yao, B., Yang, X., & Zhu, S. C., (2007). Introduction to a large-scale
general purpose ground truth database: Methodology, annotation tool
and benchmarks. In: International Workshop on Energy Minimization
Methods in Computer Vision and Pattern Recognition (Vol. 1, pp. 169–
183). Springer, Berlin, Heidelberg.
42. Yarkoni, T., Poldrack, R. A., Van Essen, D. C., & Wager, T. D., (2010).
Cognitive neuroscience 2.0: Building a cumulative science of human
brain function. Trends in Cognitive Sciences, 14(11), 489–496.
43. Yusuf, A., & Nnadi, G., (1988). Structure and functions of computer
database systems. Vikalpa, 13(4), 37–44.
44. Zygiaris, S., (2018). DataBase: Administration and security. In:
Database Management Systems (Vol. 1, pp. 3–10). Emerald Publishing
Limited.
INDEX

A
Active Server Pages (ASPs) 188
ActiveX Data Objects (ADO) 188, 200
Ada 81
advertising 225
Amazon 2
asset control 219
authenticity constraint 125
automatic transmission 56

B
basic application programming interface (API) 182
budget 140
business rules 48, 50, 51, 52, 66, 67, 68, 69, 70, 71, 72

C
C# 7, 32
cloud services 2
Cloud storage 2
COBOL 81, 84
commercial contract 52
commercial database management systems 74
competent judgment 5
complex relational data software 56
Component Object Model (COM) 186
computer-based programs 3
computer database 6
computer language 7
computer simulation 2
conceptual design 149
Conference on Data Systems and Languages 74

D
Data 45, 47, 48, 52, 54, 56, 66, 69, 70, 72
data administration 54, 219, 224, 226, 230
database architecture 47, 51
database configuration 52
Database connectivity 180, 181, 211
database design 141, 143, 146, 147, 149, 150, 151, 152, 153, 154, 155, 156, 158
Database development 143
database management systems (DBMSs) 2
database modeling 46, 62
Database planning 144
database schema 78, 80, 81, 82
database server 8, 9, 10, 29, 31, 32
Database System Development Lifecycle (DSDLC) 141
database systems 2, 10, 13, 24, 28, 31, 34, 35, 36, 37, 38, 40, 42, 43
Data Base Task Group (DBTG) 74
data communication 181, 186
data consistency 125
Data Definition Language (DDL) 162
data dictionary 144
data ecosystem 6
Data entry 219
Data Flow Diagrams (DFD) 146
Data gathering 3
data integrity 112, 133
Data manipulation 82
Data Manipulation Language (DML) 162
data modeling 46, 47, 55, 60, 70
data modification 160
data processing (DP) 224
Data quality 219, 227
data retrieval 83
data security 50
dataset 129
data source 180, 181, 184, 185, 187, 191
data source name (DSN) 184
data system 3
data transformation 82
data warehouses 219
decision-making 219, 221
deductive reasoning 114
desktop database 10
dynamic-link libraries (DLLs) 183

E
electronic data processing (EDP) 224
Extensible Markup Language (XML) 180, 209, 215

F
Facebook 2
file organization system 9
financial reporting 225
Flickr 2
Fortran 81

G
Google 2, 3
graphics generator 85

H
Hierarchical Input Process Output (HIPO) 146

I
information corruption 8
information model 46, 66
Information Resource Dictionary System (IRDS) 165
information systems (IS) 143
Information Systems Lifecycle (ISLC) 141
Interactive Graphics Retrieval System (INGRES) 113
International Organization for Standardization (ISO) 113
Internet 180, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 204, 205, 209, 211, 213, 215
inventory management system 47

J
Java 7, 32, 81

L
logical design 149

M
macro environment 3
management system 7, 8
master data management (MDM) 220
metadata 82, 92, 100
Microsoft Access reports 180
mission statement 144, 149
mobile devices 180

N
Network Database Languages (NDL) 164

O
object-oriented programming languages 185
OnLine Analytical Processing (OLAP) 165
Open Database Connectivity (ODBC) 182, 200
operating system 78, 98
Operational Logical Processing 164
operational management 221, 222
organizational sustainability 6

P
Pascal 81
personality 54
physical design 149, 151, 153
policy development 221
Privacy 222
Procedural languages 83
programming language 55, 58

Q
Query-By-Example (QBE) 84
query language 83, 85

R
referential integrity 125, 127, 138
Relational Database Language (RDL) 163
Relational Database Management System (RDBMS) 112
relational data model 112, 113, 114, 117, 120, 122, 125, 126
relational paradigm 113, 114, 126, 128
Reliable data 6
Remote Data Access (RDA) 165

S
social networks 2
software crisis 140
Software Development Lifecycle (SDLC) 141
software failure 140
software systems 52
sound judgment 46
stale information 219
Standards Planning and Requirements Committee (SPARC) 74
standard transmitting 56
statistics 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 17, 18, 19, 21, 22, 27, 28, 30, 33
stock control 145
strategy development 221
Structured Analysis and Design (SAD) 146
Structured Query Language (SQL) 59
Systems Application Architecture (SAA) 165

T
Twitter 2
Twitter posts 2

U
Universal Data Access (UDA) 181

V
vernacular 164
video files 2
Visual Basic applications 180
Visual Studio 7

W
Web development 188
Web front ends 180
Web interfaces 180, 199

Y
Yahoo 2