
Dedication
I dedicate all my efforts to my reader who gives me an urge and inspiration
to work more.

Muhammad Sharif
Author
Database Systems Handbook

Brief Contents
About This Book and Author
Index of Book Chapters
CHAPTER 1 INTRODUCTION TO DATABASE AND DATABASE MANAGEMENT SYSTEM
1.1 Data, Information, Database, Distributed Database, Database Management System
CHAPTER 2 DATA TYPES, DATABASE KEYS, SQL FUNCTIONS AND OPERATORS
2.1 Introduction
CHAPTER 3 DATA MODELS, ITS TYPES, AND MAPPING TECHNIQUES
3.1 Introduction
CHAPTER 4 DISCOVERING BUSINESS RULES AND DATABASE CONSTRAINTS
4.1 Introduction
CHAPTER 5 DATABASE DESIGN STEPS AND IMPLEMENTATIONS
5.1 Introduction
CHAPTER 6 DATABASE NORMALIZATION AND DATABASE JOINS
6.1 Introduction
CHAPTER 7 FUNCTIONAL DEPENDENCIES IN THE DATABASE MANAGEMENT SYSTEM
7.1 Introduction
CHAPTER 8 DATABASE TRANSACTION, SCHEDULES, AND DEADLOCKS
8.1 Introduction
CHAPTER 9 RELATIONAL ALGEBRA AND QUERY PROCESSING
9.1 Introduction
CHAPTER 10 FILE STRUCTURES, INDEXING, AND HASHING
10.1 Introduction
CHAPTER 11 DATABASE USERS AND DATABASE SECURITY MANAGEMENT
11.1 Introduction
CHAPTER 12 BUSINESS INTELLIGENCE TERMINOLOGIES IN DATABASE SYSTEMS
12.1 Introduction
CHAPTER 13 DBMS INTEGRATION WITH BPMS
13.1 Introduction
CHAPTER 14 RAID STRUCTURE AND MEMORY MANAGEMENT
14.1 Introduction
CHAPTER 15 ORACLE DATABASE FUNDAMENTAL AND ITS ADMINISTRATION
15.1 Introduction
CHAPTER 16 LOGS MANAGEMENT, DATABASE BACKUPS AND RECOVERY
16.1 Introduction


Acknowledgments
We are grateful to the numerous individuals who contributed to the preparation of Relational Database Systems and
Management, 1st edition. First, we wish to thank our reviewers for their detailed suggestions and insights,
characteristic of their thoughtful teaching style.

All glories, praises, and gratitude to Almighty Allah, who blessed us with a superb and unequaled Professor as 'Brain'.


CHAPTER 1 INTRODUCTION TO DATABASE AND DATABASE MANAGEMENT SYSTEM


What is Data?
Data are raw bits and pieces of facts with no context. If I told you, "15, 23, 14, 85," you would not have
learned anything, but I would have given you data. Data are facts that can be recorded and that have an implicit meaning.
Major types of data:
Data can be quantitative or qualitative.
1. Quantitative data is numeric: the result of a measurement, count, or some other mathematical calculation.
2. Qualitative data is descriptive, for example "Ruby Red".
We can also classify data as structured, unstructured, or semi-structured.
1. Structured data is generally quantitative; it usually consists of hard numbers or things that can be
counted.
2. Unstructured data is generally categorized as qualitative data and cannot be analyzed or processed using
conventional tools and methods.
3. Semi-structured data refers to data that is not captured or formatted in conventional ways. Semi-
structured data does not follow the format of a tabular data model or relational database because it does
not have a fixed schema. XML and JSON are examples of semi-structured data.
Properties:
Structured data is generally stored in data warehouses.
Unstructured data is stored in data lakes.
Structured data requires less storage space, while unstructured data requires more storage space.
Examples:
Structured data (tables, tabular formats, Excel spreadsheets, .csv files)
Unstructured data (email messages, audio/video, weather data)
Semi-structured data (web pages, résumé documents, XML)


➢ Category of Data

What is a data item?


The basic component of a file in a file system is a data item.
What are records?
A group of related data items treated as a single unit by an application is called a record.
What is a file?
A file is a collection of records of a single type. A simple file processing system refers to the first computer-based
approach to handling commercial or business applications.
Mapping from file system to Relational Databases:
In a relational database, a data item is called a column or attribute; a record is called a row or tuple, and a file is
called a table.
What is information?
When we organize data so that it carries some meaning, we call it information.
What is a database?
A database is an organized collection of related information, or a collection of related data. It is an interrelated
collection of many different types of database objects (tables, indexes, etc.).


What is Database Application?


A database application is a program, or group of programs, used to perform certain operations on the data stored in
the database. These operations may include inserting data into the database, extracting data from it based on
certain conditions, and updating data in the database.
Example: Geographic Information Systems (GIS).
What is Knowledge?
Knowledge = information + application
What is Meta Data?
The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or
dictionary; it is called meta-data. Data that describe the properties or characteristics of end-user data and the
context of those data. Information about the structure of the database.
Example: metadata for a relation Class_Roster is kept in a catalog such as Attr_Cat(attr_name, rel_name, type,
position), recording, for instance, each attribute's position (1, 2, 3, ...) within the relation and the access rights on
objects. The simplest definition of metadata is "data about data".
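As a concrete illustration, most relational systems expose this catalog through data dictionary views. A minimal
sketch, assuming an Oracle-style dictionary view (USER_TAB_COLUMNS) and a hypothetical STUDENT table:

-- Inspect metadata (data about data) for a hypothetical STUDENT table
SELECT column_name, data_type, column_id AS position
FROM   user_tab_columns
WHERE  table_name = 'STUDENT'
ORDER  BY column_id;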


What is Shared Collection?


A shared collection refers to the logical relationships among data: data that is inter-linked and held in a common
repository so that many users and applications can access it.
What is Database Management System (DBMS)?
A database management system (DBMS) is a software package, or set of programs, designed to define, retrieve,
manipulate, control, and manage data in a database.
What are database systems?
A shared collection of logically related data (comprises entities, attributes, and relationships), is designed to meet
the information needs of the organization. The database and DBMS software together is called a database system.
Components of a Database Environment
1. Hardware (Server),
2. Software (DBMS),
3. Data,
4. Procedures (govern the design and use of the database)
5. People (those who administer and use the database)
History of Databases
Between 1970 and 1972, E. F. Codd published papers proposing the relational database model; the RDBMS is based on
Codd's relational model. Before DBMSs, file-based systems were used, dating from the era of the 1950s.
Evolution of DB Systems:
➢ Flat files - 1960s - 1980s
➢ Hierarchical – 1970s - 1990s
➢ Network – 1970s - 1990s
➢ Relational – 1980s - present
➢ Object-oriented – 1990s - present
➢ Object-relational – 1990s - present
➢ Data warehousing – 1980s - present
➢ Web-enabled – 1990s – present
Here, are the important landmarks from history:
➢ 1960 – Charles Bachman designed the first DBMS
➢ 1970 – E. F. Codd published the relational model of data (IBM's hierarchical Information Management System, IMS, was already in use)
➢ 1976 – Peter Chen coined and defined the Entity-Relationship model, also known as the ER model
➢ 1980 – The relational model becomes a widely accepted database component
➢ 1985 – Object-oriented DBMSs develop
➢ 1990 – Incorporation of object orientation into relational DBMSs
➢ 1991 – Microsoft ships MS Access, a personal DBMS that displaces other personal DBMS products
➢ 1995 – First Internet database applications
➢ 1997 – XML is applied to database processing; many vendors begin to integrate XML into DBMS products

The ANSI-SPARC Database Architecture is set up into three levels.


1. The Internal Level (Physical Representation of Data)
2. The Conceptual Level (Holistic Representation of Data)
3. The External Level (User Representation of Data)
The internal level describes how data is physically stored. The conceptual level describes how the database is
structured logically. The external level provides different views of the data to different users; it is the uppermost
level of the database.
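A relational view is the usual way to realize an external schema on top of the conceptual schema. A minimal sketch,
assuming hypothetical EMPLOYEE and DEPARTMENT tables:

-- Conceptual level: base tables (hypothetical names)
CREATE TABLE department (dept_id NUMBER PRIMARY KEY, dept_name VARCHAR2(50));
CREATE TABLE employee  (emp_id NUMBER PRIMARY KEY, emp_name VARCHAR2(50),
                        salary NUMBER(8,2), dept_id NUMBER REFERENCES department);

-- External level: a user view that hides salaries and all storage details
CREATE VIEW emp_directory AS
SELECT e.emp_name, d.dept_name
FROM   employee e JOIN department d ON e.dept_id = d.dept_id;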


Database systems architecture is as follows:

Database architecture tiers


Database architectures are commonly described in terms of tiers.
Single-tier architecture: the application communicates directly with the database server/disk on the same machine;
it is also called a physically centralized architecture.
2-tier architecture: basic client-server APIs such as ODBC, JDBC, and ORDS are used; the client and the database are
connected over a network through these APIs.
3-tier architecture: used for web applications; a web/application server sits between the client and the database
server.


Advantages of the tiered ANSI-SPARC Architecture


The ANSI-SPARC standard architecture is three-tiered, although some books describe four tiers. This tiered
representation offers several advantages:
Its main objective is to provide data abstraction.
The same data can be accessed by different users through different customized views.
Users are not concerned with the physical data storage details.
The physical storage structure can be changed without requiring changes to the internal structure of the
database or to users' views.
The conceptual structure of the database can be changed without affecting end users.


It makes the database abstract.


It hides the details of how the data is stored physically in an electronic system, which makes it easier to
understand and easier to use for an average user.
It also allows the user to concentrate on the data rather than worrying about how it should be stored.
Types of databases
There are various types of databases used for storing different varieties of data in their respective DBMS data model
environment. Each database has data models except NoSQL. One is Enterprise Database Management System that
is not included in this figure. I will write details one by one in where appropriate. Sequence of details is not necessary.

Parallel database architectures


Some possible architectures are:
1. Shared-memory
2. Shared-disk
3. Shared-nothing (the most common one)
4. Shared Everything Architecture
5. Non-Uniform Memory Architecture
A hierarchical model system is a hybrid of the shared memory system, a shared disk system, and a shared-nothing
system. The hierarchical model is also known as Non-Uniform Memory Architecture (NUMA). NUMA uses local and
remote memory (memory from another group); hence communication between groups takes longer than access to
local memory.
Advantages of NUMA
Improves the scalability of the system.
Memory bottleneck (shortage of memory) problem is minimized in this architecture.


Distributed Databases
Distributed database system (DDBS) = DB + Communication
A set of databases in a distributed system that can appear to applications as a single data source.
A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites,
connected by a computer network.

Distributed DBMS architectures


Three alternative approaches are used to separate functionality across different DBMS-related
processes. These alternative distributed architectures are called
1. Client-server,
2. Collaborating server or multi-Server
3. Middleware or Peer-to-Peer
Explanation
Client-server: Clients send queries to a server for execution. There may be multiple server processes.
The two different client-server architecture models are:
1. Single Server Multiple Client
2. Multiple Server Multiple Client
Client Server architecture layers
1. Presentation layer
2. Logic layer
3. Data layer


• Presentation layer
The basic job of this layer is to provide the user interface, typically a graphical user interface (GUI) consisting of
menus, buttons, icons, etc. The presentation tier presents information related to work such as browsing, purchasing,
and shopping-cart contents, and it exchanges results with the browser/client tier and the other tiers in the network.
It is also called the external layer.
• Logic layer
The logical tier is also known as the data access tier or middle tier. It lies between the presentation tier and the
data tier, and it controls the application's functions by performing the processing. The components that make up
this layer exist on the server and assist in resource sharing; they also define the business rules (different legal and
governmental rules, data rules, and business algorithms) designed to keep the data structure consistent. It is also
known as the conceptual layer.
• Data layer
The data layer is the physical database tier where data is stored and manipulated. It corresponds to the internal
layer of the database management system, where the data is actually stored.

Collaborative/Multi server:
This is an integrated database system formed by a collection of two or more autonomous database systems.
Multi-DBMS can be expressed through six levels of schema:
Multi-database View Level − Depicts multiple user views comprising subsets of the integrated distributed database.
Multi-database Conceptual Level − Depicts integrated multi-database that comprises global logical multi-database
structure definitions.
Multi-database Internal Level − Depicts the data distribution across different sites and multi-database to local
data mapping.
Local database View Level − Depicts a public view of local data.
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
1. A model with a multi-database conceptual level.
2. Model without multi-database conceptual level.
Peer-to-Peer: In this DDBMS architecture model, each peer acts both as a client and as a server for providing
database services. The peers share their resources with other peers and coordinate their activities. The architecture
scales flexibly, growing and shrinking as peers join and leave. All nodes have the same role and functionality. It is
harder to manage because all machines are autonomous and loosely coupled.
This architecture generally has four levels of schemas:
Global Conceptual Schema − Depicts the global logical view of data.
Local Conceptual Schema − Depicts logical data organization at each site.
Local Internal Schema − Depicts physical data organization at each site.
Local External Schema − Depicts user view of data.


Example of Peer-to-peer architecture

Types of homogeneous distributed database


Autonomous − Each database is independent and functions on its own. They are integrated by a controlling
application and use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS coordinates
data updates across the sites.
Autonomous databases
1. Autonomous Transaction Processing - Serverless
2. Autonomous Transaction Processing – Dedicated
Serverless is a simple and elastic deployment choice. Oracle autonomously operates all aspects of the database
lifecycle from database placement to backup and updates.
Dedicated is a private cloud in public cloud deployment choice. A completely dedicated compute, storage, network,
and database service for only a single tenant.


Autonomous transaction processing: Architecture


Heterogeneous Distributed Databases
Types of Heterogeneous Distributed Databases
1. Federated − The heterogeneous database systems are independent and integrated so that they function as
a single database system.
2. Un-federated − The database systems employ a central coordinating module
In a heterogeneous distributed database, different sites have different operating systems, DBMS products, and data
models.
Its properties are −
Different sites use dissimilar schemas and software.
The system may be composed of a variety of DBMSs like relational, network, hierarchical, or object-
oriented.
Query processing is complex due to dissimilar schemas.
Parameters at which Distributed DBMS Architectures developed
DDBMS architectures are generally developed depending on three parameters:
1. Distribution − It states the physical distribution of data across the different sites.
2. Autonomy − It indicates the distribution of control of the database system and the degree to which each
constituent DBMS can operate independently.
3. Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system components, and
databases.


DD Design Approaches

Semi-join and Bloom join are two data-fetching techniques used in distributed databases.
Some Popular databases:
• Native XML Databases
It is not surprising that a number of start-up companies, as well as some established data management companies,
determined that XML data would be best managed by a DBMS designed specifically to deal with semi-structured
data — that is, a native XML database.
• Conceptual Database
This step is related to modeling in the Entity-Relationship (E/R) model: specifying sets of data called entities, the
relations among them called relationships, and cardinality restrictions identified by the letters N and M (in this
case, the many-to-many relationships stand out).
• Conventional Database
This step includes relational modeling, where a mapping from the ER model to relations is carried out using mapping
rules. The subsequent implementation is done in Structured Query Language (SQL).
• Non-Conventional database
This step involves object-relational modeling, which is specified in Structured Query Language. In this case, the
modeling relates the objects and their relationships to the Relational Model.
• Traditional database
• Temporal database
• Conventional Databases
• NewSQL Database
• Autonomous database
• Cloud database
• Spatiotemporal
• Enterprise Database Management System


Other popular non-relational databases:


• Google Cloud Firestore
• Cassandra
• Couchbase
• Memcached, Redis, Coherence (key-value store)
• HBase, Big Table, Accumulo (Tabular)
• Amazon DynamoDB
• MongoDB, CouchDB, Cloudant, JSON-like (Document-based)
• Neo4j (Graph Database)
Non-relational (NoSQL) Data model
Base Model:
Basically Available – Rather than enforcing immediate consistency, BASE-modelled NoSQL databases will ensure the
availability of data by spreading and replicating it across the nodes of the database cluster.
Soft State – Due to the lack of immediate consistency, data values may change over time. The BASE model breaks
off with the concept of a database that enforces its consistency, delegating that responsibility to developers.
Eventually Consistent – The fact that BASE does not enforce immediate consistency does not mean that it never
achieves it. However, until it does, data reads are still possible (even though they might not reflect the reality).
Just as SQL databases are almost uniformly ACID compliant, NoSQL databases tend to conform to BASE principles.
NewSQL Database
NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems
for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database
system.
Examples and properties of Relational Non-Relational Database:

The term NewSQL categorizes databases that combine the relational model with advances in scalability and
flexibility regarding types of data. These databases focus on features that are not present in NoSQL and offer a
strong consistency guarantee. They cover two layers of data: a relational layer and a key-value store.


Sr. No  NoSQL                                                NewSQL
1.      Schema-less (no fixed schema / unstructured          Both schema-fixed and schema-free.
        schema); the BASE data model applies.
2.      Horizontally scalable.                               Horizontally scalable.
3.      Automatically high availability.                     Built-in high availability.
4.      Fully supports cloud, on-disk, and cache storage.    Supports cloud, on-disk, and cache storage, but the
                                                             in-memory architecture may cause problems for
                                                             exceeding volumes of data.
5.      Promotes CAP properties.                             Promotes ACID properties.
6.      Online transaction processing is not supported.      Online transaction processing and integration with
                                                             traditional relational databases are fully supported.
7.      Low security concerns.                               Moderate security concerns.
8.      Use cases: Big Data, social network applications,    Use cases: e-commerce, the telecom industry, and
        and IoT.                                             gaming.
9.      Examples: DynamoDB, MongoDB, RavenDB, etc.           Examples: VoltDB, CockroachDB, NuoDB, etc.

Advantages of Database management systems:


It supports a logical view (schema, subschema),
It supports a physical view (access methods, data clustering),
It supports data definition language, data manipulation language to manipulate data,
It provides important utilities, such as transaction management and concurrency control, data integrity,
crash recovery, and security. Relational database systems, the dominant type of systems for well-formatted
business databases, also provide a greater degree of data independence.
The motivations for using databases rather than files include greater availability to a diverse set of users,
integration of data for easier access to and updating of complex transactions, and less redundancy of data.
Data consistency
Better data security
Faster development of new applications


CHAPTER 2 DATA TYPES, DATABASE KEYS, SQL FUNCTIONS AND OPERATORS


Data types Overview

Size in Memory                           Range of Values
1 byte                                   0 to 255
2 bytes                                  True or False
2 bytes                                  –32,768 to 32,767
4 bytes                                  –2,147,483,648 to 2,147,483,647
4 bytes                                  Approximately –3.4E38 to 3.4E38
8 bytes                                  Approximately –1.8E308 to 4.9E324
8 bytes                                  Approximately –922,337,203,685,477.5808 to 922,337,203,685,477.5807
8 bytes                                  1/1/100 to 12/31/9999
4 bytes                                  Any object reference
10 bytes + string length (variable);     Variable length: up to about 2 billion characters (65,400 for Win 3.1);
string length (fixed)                    fixed length: up to 65,400 characters
16 bytes for numbers;                    Variable-length binary data with a maximum length of 2^31 – 1
22 bytes + string length                 (2,147,483,647) bytes

BINARY_FLOAT 32-bit floating point number. This data type requires 4 bytes.
BINARY_DOUBLE 64-bit floating point number. This data type requires 8 bytes.

There are two classes of date- and time-related data types in PL/SQL:
1. Datetime data types:
   ➢ Date
   ➢ Timestamp
   ➢ Timestamp with time zone
   ➢ Timestamp with local time zone
2. Interval data types:
   ➢ Interval year to month
   ➢ Interval day to second
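A few hedged examples of these Oracle datetime and interval types in use, assuming a hypothetical FLIGHT_LOG table:

CREATE TABLE flight_log (
  dep_date   DATE,                         -- date only (to the second, no time zone)
  dep_ts     TIMESTAMP WITH TIME ZONE,     -- timestamp carrying a time-zone offset
  duration   INTERVAL DAY TO SECOND,       -- elapsed days/hours/minutes/seconds
  svc_period INTERVAL YEAR TO MONTH        -- elapsed years and months
);

INSERT INTO flight_log VALUES (
  DATE '2023-05-01',
  TIMESTAMP '2023-05-01 08:30:00 +05:00',
  INTERVAL '2 05:30:00' DAY TO SECOND,
  INTERVAL '1-6' YEAR TO MONTH
);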


Size in Memory                                      Range of Values

Number(p,s) data type:                              Number having precision p and scale s. The precision p can
If max_string_size = extended: 32767 bytes          range from 1 to 38. The scale s can range from -84 to 127.
or characters.                                      Both precision and scale are in decimal digits. A number
If max_string_size = standard: 4000 bytes           value requires from 1 to 22 bytes.
or characters.

LONG                                                Character data of variable length up to 2 gigabytes, or
                                                    2^31 - 1 bytes. Provided for backward compatibility.

Data Type    Maximum Size in PL/SQL                            Maximum Size in SQL

CHAR         32,767 bytes                                      2,000 bytes
NCHAR        32,767 bytes                                      2,000 bytes
RAW          32,767 bytes                                      2,000 bytes
VARCHAR2     32,767 bytes                                      4,000 bytes (1 char = 1 byte)
NVARCHAR2    32,767 bytes                                      4,000 bytes
LONG         32,760 bytes                                      2 gigabytes (GB) - 1
LONG RAW     32,760 bytes                                      2 GB
BLOB         8–128 terabytes (TB)                              (4 GB - 1) * database_block_size
CLOB         8–128 TB (used to store large blocks of           (4 GB - 1) * database_block_size
             character data in the database)
NCLOB        8–128 TB (used to store large blocks of           (4 GB - 1) * database_block_size
             NCHAR data in the database)

Database Keys: A key is a field, or set of fields, of a table that identifies a tuple in that table.
➢ Super key
An attribute, or a set of attributes, that uniquely identifies a tuple within a relation.
➢ Candidate key
A super key such that no proper subset of it is itself a super key within the relation; it contains no redundant
attributes (irreducibility). There may be several candidate keys (the extra ones are specified using UNIQUE), one of
which is chosen as the primary key, e.g. PRIMARY KEY (sid), UNIQUE (id, grade). A candidate key is unique, but its
value can be changed.

➢ Natural key
A natural key is an attribute that already exists in the business data; it is typically used as the PK in OLTP systems
and may also be a PK in OLAP.
➢ Composite key or concatenate key
A primary key that consists of two or more attributes is known as a composite key.
➢ Primary key
The candidate key selected to identify tuples uniquely within a relation. It should remain constant over the life of
the tuple: a PK is unique, not repeated, not null, and does not change during its lifetime. If the primary key must be
changed, the row is effectively dropped and a new row added. In most cases the PK is also used as a foreign key in
child tables, so you cannot simply change its value: you must first delete (or update) the child rows before modifying
the parent table.


➢ Minimal Super Key


Not every super key can be the primary key. The primary key is a minimal super key: a KEY is a minimal SUPERKEY,
that is, a minimized set of columns that can be used to identify a single row.
➢ Foreign key
An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same)
relation. Can a foreign key reference a non-primary-key column? Yes — the minimum condition is that the referenced
column is unique, i.e. it must be a candidate key.
➢ Composite Key
A composite key consists of more than one attribute: a COMPOSITE KEY is a combination of two or more columns
that uniquely identify rows in a table. The combination of columns guarantees uniqueness, even though the columns
individually do not. You can use a composite key as the PK, and the composite key then goes to other tables as a
foreign key.
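A short illustration, assuming hypothetical ENROLLMENT and GRADE tables: the composite primary key of
ENROLLMENT is carried into GRADE as a composite foreign key.

CREATE TABLE enrollment (
  student_id NUMBER,
  course_id  NUMBER,
  term       VARCHAR2(10),
  PRIMARY KEY (student_id, course_id, term)        -- composite primary key
);

CREATE TABLE grade (
  student_id NUMBER,
  course_id  NUMBER,
  term       VARCHAR2(10),
  grade      CHAR(2),
  FOREIGN KEY (student_id, course_id, term)        -- composite foreign key
    REFERENCES enrollment (student_id, course_id, term)
);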
➢ Alternate key
A relation can have only one primary key. It may contain many fields or a combination of fields that can be used as
the primary key. One field or combination of fields is used as the primary key. The fields or combinations of fields
that are not used as primary keys are known as candidate keys or alternate keys.
➢ Sort or control key
A field, or combination of fields, that is used to physically sequence the stored data is called a sort key. It is also
known as the control key.
➢ Alternate key
An alternate key is a secondary key. A simple example: a student entity may contain NAME, ROLL NO., ID, and CLASS;
if ID is chosen as the primary key, ROLL NO. can serve as an alternate key.


➢ Unique key
A unique key is a set of one or more than one field/column of a table that uniquely identifies a record in a database
table.
You can say that it is a little like a primary key but it can accept only one null value and it cannot have duplicate
values.
The unique key and primary key both provide a guarantee for uniqueness for a column or a set of columns.
There is an automatically defined unique key constraint within a primary key constraint.
There may be many unique key constraints for one table, but only one PRIMARY KEY constraint for one table.
➢ Artificial Key
The key created using arbitrarily assigned data are known as artificial keys. These keys are created when a primary
key is large and complex and has no relationship with many other relations. The data values of the artificial keys are
usually numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in employee relations.
So it would be better to add a new virtual attribute that identifies each tuple in the relation uniquely. ROWNUM and
ROWID are examples of artificial keys; an artificial key is usually a number (integer/numeric).
Format of Rowid:

➢ Surrogate key
A surrogate key is an artificial key that aims to uniquely identify each record. This kind of key is unique because it is
created when you do not have a suitable natural primary key. You do not insert values into the surrogate key
yourself; its value comes from the system automatically.
There is no business logic in the key, so it does not change with business requirements.
Surrogate keys reduce the complexity of composite keys.
Surrogate keys simplify extract, transform, and load (ETL) processing in databases.
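A minimal sketch, assuming a hypothetical CUSTOMER table and an Oracle 12c-style identity column (a sequence plus
trigger would be the older equivalent):

CREATE TABLE customer (
  customer_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key, system-generated
  name        VARCHAR2(100),
  email       VARCHAR2(100)
);

-- The surrogate key is never supplied by the application:
INSERT INTO customer (name, email) VALUES ('Ali', 'ali@example.com');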
➢ Compound Key
COMPOUND KEY has two or more attributes that allow you to uniquely recognize a specific record. It is possible that
each column may not be unique by itself within the database.


Database keys and their metadata descriptions:

Operators
Comparison operators: Equality involves comparing two values, which requires the use of a comparison operator
such as (<), (>), or (=). Does X = Y? Is Y < X? These are both questions that can be answered using a SQL comparison
operator expression.
The operator must be a standard comparison operator (=, <>, !=, >, >=, <, or <=).
Comparison Operators
=    Equal to
<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to
<>   Not equal to

Arithmetic operators: Arithmetic operators are used to perform mathematical operations. A few examples of
arithmetic operators are +, -, *, and /.
Logical operators: Logical operators operate on Boolean expressions and combine their results into a single Boolean
value. AND, OR, and NOT (written as &&, ||, and ! in some languages) are examples of logical operators.


Wildcards and Unions Operators


LIKE operator is used to filter the result set based on a string pattern. It is always used in the WHERE clause.
Wildcards are used in SQL to match a string pattern. A wildcard character is used to substitute one or more
characters in a string. Wildcard characters are used with the LIKE operator.
There are two wildcards often used in conjunction with the LIKE operator:
1. The percent sign (%) represents zero, one, or multiple characters
2. The underscore sign (_) represents one, a single character
Two main differences between LIKE and ILIKE:
1. LIKE is case-sensitive, whereas ILIKE is case-insensitive.
2. LIKE is a standard SQL operator, whereas ILIKE is only implemented in certain databases such as
PostgreSQL and Snowflake.
Example: SELECT * FROM Customers WHERE City LIKE 'ber%';
SQL UNION clause is used to select distinct values from the tables.
SQL UNION ALL clause used to select all values including duplicates from the tables
The UNION operator is used to combine the result-set of two or more SELECT statements.
Every SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in every SELECT statement must also be in the same order
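A small hedged example of the two forms, assuming hypothetical CUSTOMERS_US and CUSTOMERS_EU tables with
compatible columns:

-- UNION returns distinct rows from both result sets
SELECT customer_name, city FROM customers_us
UNION
SELECT customer_name, city FROM customers_eu;

-- UNION ALL keeps duplicates
SELECT customer_name, city FROM customers_us
UNION ALL
SELECT customer_name, city FROM customers_eu;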
The EXCEPT or MINUS operators return the records that exist in Dataset1 and not in Dataset2.
Each SELECT statement within the EXCEPT query must have the same number of fields in the result sets, with similar
data types.
The difference is only in availability: EXCEPT is available in databases such as PostgreSQL, while MINUS is the Oracle
keyword. Functionally, there is no difference between the EXCEPT clause and the MINUS clause.
The EXISTS function/operator is used to test for the existence of any record in a subquery. The EXISTS operator
returns TRUE if the subquery returns one or more records.
The IN operator allows you to specify multiple values in a WHERE clause. The IN operator is a shorthand for multiple
OR conditions.

The ANY operator

The ANY operator returns a Boolean value: it returns TRUE if any of the subquery values meet the condition.
ANY means that the condition will be true if the operation is true for any of the values in the range.
NOT IN can also take a list of literal values, whereas NOT EXISTS needs a subquery to compare the results against.
NOT EXISTS can be a good choice because it can join with the outer query and can lead to usage of an index if the
criteria use an indexed column.
Examples of NOT IN and NOT EXISTS
NOT IN
SELECT CAT_ID FROM CATEGORY_A WHERE CAT_ID NOT IN (SELECT CAT_ID FROM CATEGORY_B)
NOT EXISTS


SELECT A.CAT_ID FROM CATEGORY_A A WHERE NOT EXISTS (SELECT B.CAT_ID FROM CATEGORY_B B WHERE
B.CAT_ID = A.CAT_ID)
Supporting operators in different DBMS environments, for example:
Keyword        Database System
TOP            SQL Server, MS Access
LIMIT          MySQL, PostgreSQL, SQLite
FETCH FIRST    Oracle (12c onward), standard SQL
Oracle does not support the TOP clause; row limiting is done with ROWNUM and, from Oracle 12c onward, also with
FETCH FIRST.
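A brief sketch of both Oracle styles, assuming a hypothetical EMPLOYEES table:

-- Classic Oracle row limiting with ROWNUM
SELECT * FROM employees WHERE ROWNUM <= 10;

-- Oracle 12c+ / standard SQL row limiting
SELECT * FROM employees ORDER BY salary DESC FETCH FIRST 10 ROWS ONLY;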

SQL FUNCTIONS


Explanation of Single Row Functions
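A few hedged examples of common Oracle single-row functions, grouped by category (character, number, conversion,
and general NULL handling), using the standard DUAL dummy table:

SELECT UPPER('sharif')                 AS char_fn,   -- character function
       ROUND(125.456, 1)               AS num_fn,    -- number function
       TO_CHAR(SYSDATE, 'YYYY-MM-DD')  AS conv_fn,   -- conversion function
       NVL(NULL, 'default')            AS gen_fn     -- general (NULL-handling) function
FROM   dual;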





For assignments, Oracle can automatically convert the following:


VARCHAR2 or CHAR to NUMBER
NUMBER to VARCHAR2
VARCHAR2 or CHAR to DATE
DATE to VARCHAR2
VARCHAR2 or CHAR to ROWID
ROWID to VARCHAR2
VARCHAR2 or CHAR to MLSLABEL
MLSLABEL to VARCHAR2
VARCHAR2 or CHAR to HEX
HEX to VARCHAR2
Single row functions properties








Subquery Concept
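A minimal sketch of the idea, assuming hypothetical EMPLOYEES and DEPARTMENTS tables: a subquery (inner query)
supplies a value or row set to the outer query.

-- Single-row subquery: employees earning more than the average salary
SELECT emp_name, salary
FROM   employees
WHERE  salary > (SELECT AVG(salary) FROM employees);

-- Multi-row subquery with IN: employees working in departments located in 'Lahore'
SELECT emp_name
FROM   employees
WHERE  dept_id IN (SELECT dept_id FROM departments WHERE city = 'Lahore');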



END


CHAPTER 3 DATA MODELS, ITS TYPES, AND MAPPING TECHNIQUES


Overview of data modeling in DBMS
The semantic data model is a method of structuring data to represent it in a specific logical way.
Types of Data Models in history:

Data abstraction: the process of hiding (suppressing) unnecessary details so that the high-level concept can be made
more visible. A data model is a relatively simple representation, usually graphical, of more complex real-world data
structures.

Data model Schema and Instance

The overall design of a database is called schema.


Database Schema
A database schema is the skeleton structure of the database. It represents the logical view of the entire database.
Database Instance
The data which is stored in the database at a particular moment is called an instance of the database.
The actual data is stored in a database at a particular moment in time.
This includes the collection of all the data in the database.
Also called database state (or occurrence or snapshot).
The term instance is also applied to individual database components,
E.g., record instance, table instance, entity instance

Types of Instances
Initial Database Instance: Refers to the database instance that is initially loaded into the system.
Valid Database Instance: An instance that satisfies the structure and constraints of the database.
The database instance changes every time the database is updated.
An instance is also called an extension
A schema contains schema objects like table, foreign key, primary key, views, columns, data types, stored procedure,
etc.
A database schema can be represented by using a visual diagram. That diagram shows the database objects and
their relationship with each other.
A database schema is designed by the database designers to help programmers whose software will interact with
the database. The process of database creation is called data modeling.
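To make the schema/instance distinction concrete, here is a minimal sketch with a hypothetical STUDENT relation:
the CREATE TABLE statement defines the schema (intension), while the rows present at a given moment form an
instance (extension).

-- Schema (intension): structure and constraints
CREATE TABLE student (
  student_id NUMBER PRIMARY KEY,
  name       VARCHAR2(50) NOT NULL,
  gpa        NUMBER(3,2)
);

-- Instance (extension): the data stored at this particular moment
INSERT INTO student VALUES (1, 'Ayesha', 3.70);
INSERT INTO student VALUES (2, 'Bilal',  3.10);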
Relational Schema definition
Relational schema refers to the meta-data that describes the structure of data within a certain
domain. It is the blueprint of a database that outlines the way any database will have some
number of constraints that must be applied to ensure correct data (valid states).
Database Schema Definition
A relational schema may also refer to as a database schema. A database schema is the collection
of relation schemas for a whole database. A relational or Database schema is a collection of meta-
data. Database schema describes the structure and constraints of data represented in a particular
domain. A Relational schema can be described as a blueprint of a database that outlines the way
data is organized into tables. This blueprint will not contain any type of data. In a relational
schema, each tuple is divided into fields called Domains.
Other definitions: The overall design of the database.
Schema is also called intension.
Types of Schemas w.r.t Database environment:
DBMS Schemas: Logical/Conceptual/physical schema/external schema
Data warehouse/multi-dimensional schemas: Snowflake/star
OLAP Schemas: Fact constellation schema/galaxy
Three levels of data abstraction / The ANSI/SPARC Three Schema Architecture
External level: view level, user level, external schema, client level.
Conceptual level: community view, ER model, conceptual schema, server level. Conceptual (high-level, semantic)
data models are entity-based or object-based; they describe what data is stored and the relationships among the
data. Logical data independence deals with the external/conceptual mapping.


Logical schema: sometimes also called the conceptual schema (server level); it uses implementation
(representational) data models and DBMS-specific modeling.
Internal level: physical representation, internal schema, database level, low level. It deals with how data is stored in
the database; physical data independence deals with the conceptual/internal mapping.
Physical data level: physical storage, physical schema; it sometimes overlaps with the internal schema and is
detailed in administration manuals.

Data independence
It is the ability to make changes in either the logical or physical structure of the database without requiring
reprogramming of application programs.
Data Independence types:
Logical data independence=>Immunity of external schemas to changes in the conceptual schema
Physical data independence=>Immunity of the conceptual schema to changes in the internal schema.


There are two types of mapping in the database architecture


Conceptual/ Internal Mapping
The Conceptual/ Internal Mapping lies between the conceptual level and the internal level. Its role is to define the
correspondence between the records and fields of the conceptual level and files and data structures of the internal
level.
External/Conceptual Mapping
The external/Conceptual Mapping lies between the external level and the Conceptual level. Its role is to define the
correspondence between a particular external and conceptual view.
Detailed description:
When a schema at a lower level is changed, only the mappings between this schema and the higher-level schemas
need to be changed in a DBMS that fully supports data independence.
The higher-level schemas themselves are unchanged.


Hence, the application programs need not be changed since they refer to the external schemas.
For example, the internal schema may be changed when certain file structures are reorganized or new indexes are
created to improve database performance.
Data abstraction
Data abstraction makes complex systems more user-friendly by removing the specifics of the system mechanics.
The conceptual data model has been most successful as a tool for communication between the designer and the
end user during the requirements analysis and logical design phases. Its success is because the model, using either
ER or UML, is easy to understand and convenient to represent. Another reason for its effectiveness is that it is a top-
down approach using the concept of abstraction. In addition, abstraction techniques such as generalization provide
useful tools for integrating
end user views to define a global conceptual schema.
These differences show up in conceptual data models as different levels of abstraction; connectivity of relationships
(one-to-many, many-to-many, and so on); or as the same concept being modeled as an entity, attribute, or
relationship, depending on the user’s perspective.
Techniques used for view integration include abstraction, such as generalization and aggregation to create new
supertypes or subtypes, or even the introduction of new relationships. The higher-level abstraction, the entity
cluster, must maintain the same relationships between entities inside and outside the entity cluster as those that
occur between the same entities in the lower-level diagram.
ERD, EER terminology is not only used in conceptual data modeling but also in artificial intelligence literature when
discussing knowledge representation (KR).
The goal of KR techniques is to develop concepts for accurately modeling some domain of knowledge by creating an
ontology.
Ontology is a fundamental part of the Semantic Web. The goal of the World Wide Web Consortium (W3C) is to bring
the web to its full potential as a semantic web while reusing previous systems and artifacts. Most legacy systems
have been documented with structured analysis and structured design (SASD), especially with simple or Extended ER
Diagrams (ERD); such systems need upgrading to become part of the semantic web. ERD-to-OWL-DL ontology
transformation rules, defined at a concrete level, facilitate an easy and understandable transformation from ERD to
OWL. Ontology engineering is an important aspect of the semantic web vision for attaining a meaningful
representation of data. Although various techniques exist for the creation of ontologies, most methods involve a
number of complex phases, scenario-dependent ontology development, and poor validation of the ontology. A
lightweight approach is to build domain ontology using the Entity Relationship (ER) model.


We now discuss four abstraction concepts that are used in semantic data models, such as the EER model as well as
in KR schemes: (1) classification and instantiation, (2) identification, (3) specialization and generalization, and (4)
aggregation and association.
One ongoing project that is attempting to allow information exchange among computers on the Web is called the
Semantic Web, which attempts to create knowledge representation models that are quite general in order to allow
meaningful information exchange and search among machines.
One commonly used definition of ontology is a specification of a conceptualization. In this definition, a
conceptualization is the set of concepts that are used to represent the part of reality or knowledge that is of interest
to a community of users.
Types of Abstractions
Classification: A is a member of class B.
Aggregation: B, C, and D are aggregated into A; A is made of / composed of B, C, and D (is-made-of, is-associated-
with, is-part-of, is-component-of). Aggregation is an abstraction through which relationships are treated as
higher-level entities.
Generalization: B, C, and D can be generalized into A; B is-a / is-an A (is-like, is-kind-of).
Category or Union: a category represents a single superclass or subclass relationship with more than one
superclass.
Specialization: A can be specialized into B, C, and D, where B, C, and D are special cases of A (has-a, has-an
relationships are used in specialization).
Composition: is-made-of (like aggregation).
Identification: is-identified-by.
UML Diagrams Notations
UML stands for Unified Modeling Language. ERD stands for Entity Relationship Diagram. UML is a popular and
standardized modeling language that is primarily used for object-oriented software. Entity-Relationship diagrams
are used in structured analysis and conceptual modeling.
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class diagrams. Unified
Modeling Language (UML) is a language based on OO concepts that describes a set of diagrams and symbols that
can be used to graphically model a system. UML class diagrams are used to represent data and their relationships
within the larger UML object-oriented system’s modeling language.


Associations
UML uses Boolean attributes instead of unary relationships but allows relationships of all other degrees. Optionally,
each association may be given at most one name. Association names normally start with a capital letter. Binary
associations are depicted as lines between classes. Association lines may include elbows to assist with layout or
when needed (e.g., for ring relationships).

ER Diagram and Class Diagram Synchronization Sample


Supporting the synchronization between ERD and Class Diagram. You can transform the system design from the
data model to the Class model and vice versa, without losing its persistent logic.


Conversions of Terminology of UML and ERD


Relational Data Model and its Major Evolution


By inclusion, the ER model corresponds to the class diagram of the UML series.


➢ ER Notation Comparison with UML and Their relationship


ER Construct Notation Comparison


➢ Rest ER Construct Notation Comparison


APPROPRIATE ER MODEL DESIGN NAMING CONVENTIONS


Guideline 1
Nouns => Entity, object, relation, table_name.
Verbs => Indicate relationship_types.
Common Nouns=> A common noun (such as student and employee) in English corresponds to an entity type in an
ER diagram:
Proper Nouns=> Proper nouns are entities, not entity types, e.g. John, Singapore, New York City.

Note: A relational database uses relations or two-dimensional tables to store information.


Types of Attributes-
In ER diagram, attributes associated with an entity set may be of the following types-
1. Simple attributes/atomic attributes/Static attributes
2. Key attribute
3. Unique attributes
4. Stored attributes
5. Prime attributes
6. Derived attributes (DOB, AGE)
7. Composite attribute (Address (street, door#, city, town, country))
8. The multivalued attribute (double ellipse (Phone#, Hobby, Degrees))
9. Dynamic Attributes
10. Boolean attributes
The fundamental new idea in the MOST model is the so-called dynamic attributes. Each attribute of an object class
is classified to be either static or dynamic. A static attribute is as usual. A dynamic attribute changes its value with
time automatically.
Attributes of the database tables which are candidate keys of the database tables are called prime attributes.


Symbols of Attributes:

The Entity
The entity is the basic building block of the E-R data model. The term entity is used in three different meanings or
for three different terms and are:
Entity type
Entity instance
Entity set

Technical Types of Entity:


Tangible Entity:
Tangible Entities are those entities that exist in the real world physically. Example: Person, car, etc.
Intangible Entity:
Intangible (Concepts) Entities are those entities that exist only logically and have no physical existence. Example:
Bank Account, etc.
Classification of entity types
Strong Entity Type
Weak Entity Type
Naming Entity


Details of Classification of entity types


An entity type whose instances can exist independently, that is, without being linked to the instances of any other
entity type is called a strong entity type.
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
The owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many
weak entities).
The weak entity set must have total participation in this identifying relationship set.
Weak entities have only a "partial key" (shown with a dashed underline). When the owner entity is deleted, all owned
weak entities must also be deleted.
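A minimal sketch of an owner/weak entity pair, assuming hypothetical INVOICE (owner) and INVOICE_LINE (weak)
tables: the weak entity's key combines the owner's primary key with its own partial key, and ON DELETE CASCADE
enforces that owned rows disappear with their owner.

CREATE TABLE invoice (
  invoice_no NUMBER PRIMARY KEY,
  invoice_dt DATE
);

CREATE TABLE invoice_line (
  invoice_no NUMBER REFERENCES invoice (invoice_no) ON DELETE CASCADE,
  line_no    NUMBER,                        -- partial key of the weak entity
  amount     NUMBER(10,2),
  PRIMARY KEY (invoice_no, line_no)         -- owner PK + partial key
);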
Naming entity types: Following are some recommendations for naming entity types.
Singular nouns are recommended, but plurals can also be used.
Organization-specific names, like customer, client, or owner, will work.
Writing in capitals is generally followed, but other conventions also work.
Abbreviations can be used, but be consistent. Avoid confusing abbreviations: if they are confusing for
others today, tomorrow they will confuse you too.

Database Design Tools


Some commercial products are aimed at providing environments to support the DBA in
performing database design. These environments are provided by database design tools, or
sometimes as part of a more general class of products known as computer-aided software
engineering (CASE) tools. Such tools usually have some of the following kinds of components; it would be rare for a
single product to offer all of these capabilities.
1. ER Design Editor
2. ER to Relational Design Transformer
3. FD to ER Design Transformer
4. Design Analyzers


ER Modeling Rules to design database:


Three components:
1. Structural part - set of rules applied to the construction of the database
2. Manipulative part - defines the types of operations allowed on the data
3. Integrity rules - ensure the accuracy of the data
The first step of Data modeling:
DFD Data Flow Model
Data flow diagrams: the most common tool used for designing database systems is a data flow
diagram. It is used to design systems graphically and expresses different system detail at different
DFD levels.
Characteristics
DFDs show the flow of data between different processes or a specific system.
DFDs are simple and hide complexities.
DFDs are descriptive and links between processes describe the information flow.
DFDs are focused on the flow of information only.
Data flows are pipelines through which packets of information flow.
DBMS applications store data as files, whereas RDBMS applications store data in tabular form.
In the file-system approach there is no concept of a data model; data mostly consists of different types of
files (mp3, mp4, txt, doc, etc.) grouped into directories on a hard drive.
A data model, by contrast, is a collection of logical constructs used to represent the data structure and
relationships within the database.
A data flow diagram shows the way information flows through a process or system. It includes data inputs
and outputs, data stores, and the various subprocesses the data moves through.
Symbols used in DFD
Data flow => an arrow symbol.
Data store => a rectangle open on the right side, with the left side drawn with double lines.
Process => a circle or a rounded rectangle.
Numbered DFD process => a circle or rounded rectangle with the process number written above a line through its
center.
To create DFD following steps:
1. Create a list of activities
2. Construct Context Level DFD (external entities, processes)
3. Construct Level 0 DFD (manageable sub-process)
4. Construct Level 1- n DFD (actual data flows and data stores)
Types of DFD
1. Context diagram
2. Level 0 diagram
3. Detailed diagram


Context diagrams are the most basic data flow diagrams. They provide a broad view that is easily digestible but
offers little detail. They always consist of a single process and describe a single system. The only process displayed
in the CDFDs is the process/system being analyzed. The name of the CDFDs is generally a Noun Phrase.

Example Context DFD Diagram


In the context level, DFDs no data stores are created.
0-Level DFD: The level 0 diagram in the DFD is used to describe the working of the whole system. Once a context
DFD has been created, the level zero (level naught) diagram is created. The level zero diagram contains all
the apparent details of the system. It shows the interaction between a number of processes and may include a large
number of external entities. At this level, the designer must keep a balance in describing the system, giving proper
depth to the level 0 diagram processes.
1-level DFD In 1-level DFD, the context diagram is decomposed into multiple bubbles/processes. In this level,
we highlight the main functions of the system and breakdown the high-level process of 0-level DFD into
subprocesses.
2-level DFD In 2-level DFD goes one step deeper into parts of 1-level DFD. It can be used to plan or record the
specific/necessary detail about the system’s functioning.
Detailed DFDs are detailed enough that it doesn't usually make sense to break them down further.
Logical data flow diagrams focus on what happens in a particular information flow: what information is being
transmitted, what entities are receiving that information, what general processes occur, etc. They describe the
functionality of the processes that were shown briefly in the Level 0 diagram. In general, detailed DFDs express the
successive details of those processes for which the higher levels did not provide enough detail.
Logical DFD:
A logical data flow diagram mainly focuses on the system process. It illustrates how data flows in the system. Logical
DFDs are used in various organizations for the smooth running of a system. For example, in a banking software
system, a logical DFD is used to describe how data is moved from one entity to another.

Physical DFD:
Physical data flow diagram shows how the data flow is actually implemented in the system. Physical DFD is more
specific and closer to implementation.


➢ Conceptual models are the entity-relationship database model (ERDBD), the object-oriented model (OODBM),
and the record-based data model.
➢ Implementation models (types of record-based logical models) are the hierarchical database model (HDBM),
the network database model (NDBM), and the relational database model (RDBM).
➢ Semi-structured data model: the semi-structured data model allows data specifications at places where the
individual data items of the same type may have different attribute sets. The Extensible Markup Language, also
known as XML, is widely used for representing semi-structured data.


Evolution Records of Data model and types


ERD Modeling and Database table relationships:


What is an ERD? The structure, schema, or logical design of a database is called an Entity-Relationship Diagram.
Category of relationships
Optional relationship
Mandatory relationship


Types of relationships concerning degree.


Unary (self or recursive) relationship
A single entity participates; the relationship exists between occurrences of the same entity set.
Binary
Two entities are associated in a relationship.
Ternary
A ternary relationship is one in which three entities participate.
A ternary relationship is a relationship type that involves many-to-many relationships among three tables.
For example:
The University might need to record which teachers taught which subjects in which courses.


N-ary
N-ary (n entities involved in the relationship)
An n-ary relationship exists when there are n types of entities. One limitation of an n-ary relationship is that it
is hard to convert directly into relational tables.
A relationship between more than two entities is called an n-ary relationship.
Examples of relationships R between two entities E and F

Relationship Notations with entities:


Because it uses diamonds for relationships, Chen notation takes up more space than Crow's Foot notation and requires
more symbols; Crow's Foot has a slight learning curve.
Chen notation has the following possible cardinalities:
One-to-One, One-to-Many, Many-to-One, and Many-to-Many Relationships
One-to-one (1:1) – each instance of one entity is associated with only one instance of the other entity
One-to-many (1:N) – one instance of an entity can be associated with multiple instances of another entity
Many-to-one (N:1) – many instances of an entity are associated with only one instance of another entity
Many-to-many (M:N) – multiple instances of one entity can be associated with multiple instances of another entity
ER Design Issues
Here, we will discuss the basic design issues of an ER database schema in the following points:
1) Use of Entity Set vs Attributes
The use of an entity set or attribute depends on the structure of the real-world enterprise that is being modeled
and the semantics associated with its attributes.
2) Use of Entity Set vs. Relationship Sets
It is difficult to examine if an object can be best expressed by an entity set or relationship set.
3) Use of Binary vs n-ary Relationship Sets
Generally, the relationships described in the databases are binary relationships. However, non-binary relationships
can be represented by several binary relationships.

Transforming Entities and Attributes to Relations


Our ultimate aim is to transform the ER design into a set of definitions for relational
tables in a computerized database, which we do through a set of transformation
rules.


ER and Relational Model Mapping


The first step is to design a rough schema by analyzing the requirements.

Normalize the ERD and remove functional dependencies from the entities before entering the final steps.


Transformation Rule 1. Each entity in an ER diagram is mapped to a single table in a relational database.

Transformation Rule 2. A key attribute of the entity type is represented by the primary key.
All single-valued attributes become columns of the table.
Transformation Rule 3. Given an entity E with primary identifier, a multivalued attribute attached to E in an ER
diagram is mapped to a table of its own.
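As a minimal sketch of Rules 1-3 (table and column names here are illustrative, not taken from the text), an EMPLOYEE entity with single-valued attributes and a multivalued phone_number attribute could be mapped as:

-- Rules 1 and 2: the entity becomes a table; the key attribute becomes the
-- primary key and every single-valued attribute becomes a column.
CREATE TABLE employee (
    emp_id     NUMBER PRIMARY KEY,
    emp_name   VARCHAR2(100),
    birth_date DATE
);
-- Rule 3: the multivalued attribute phone_number gets a table of its own,
-- keyed by the owning entity's identifier plus the attribute value.
CREATE TABLE employee_phone (
    emp_id       NUMBER REFERENCES employee (emp_id),
    phone_number VARCHAR2(20),
    PRIMARY KEY (emp_id, phone_number)
);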


Transforming Binary Relationships to Relations


We are now prepared to give the transformation rule for a binary many-to-many relationship.
Transformation Rule 3.5. N – N Relationships: When two entities E and F take part in a many-to-many binary
relationship R, the relationship is mapped to a representative table T in the related relational database design. The
table contains columns for all attributes in the primary keys of both tables transformed from entities E and F, and
this set of columns form the primary key for table T.
Table T also contains columns for all attributes attached to the relationship. Relationship occurrences are
represented by rows of the table, with the related entity instances uniquely identified by their primary key values
as rows.
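As a minimal sketch of Transformation Rule 3.5, assuming illustrative STUDENT and COURSE entities in an M:N "enrolls" relationship (names are not from the text), the representative table T carries the primary keys of both entities plus any relationship attributes:

CREATE TABLE student (
    student_id   NUMBER PRIMARY KEY,
    student_name VARCHAR2(100)
);
CREATE TABLE course (
    course_id NUMBER PRIMARY KEY,
    title     VARCHAR2(100)
);
-- Representative table T: its primary key is formed from both entity keys.
CREATE TABLE enrollment (
    student_id NUMBER REFERENCES student (student_id),
    course_id  NUMBER REFERENCES course (course_id),
    grade      VARCHAR2(2),   -- an attribute attached to the relationship itself
    PRIMARY KEY (student_id, course_id)
);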
Case 1: Binary relationship with 1:1 cardinality and total participation of an entity
Total participation means the minimum occurrence is 1, shown with double lines (a dotted oval denotes a derived attribute).
A person has 0 or 1 passport number and a passport is always owned by exactly 1 person, so it is a 1:1 cardinality with a
total participation constraint from Passport. First convert each entity and the relationship to tables.
Case 2: Binary Relationship with 1:1 cardinality and partial participation of both entities
A male marries 0 or 1 female and vice versa as well. So it is a 1:1 cardinality with partial participation constraint
from both. First Convert each entity and relationship to tables. Male table corresponds to Male Entity with key as
M-Id. Similarly, the Female table corresponds to Female Entity with the key as F-Id. Marry Table represents the
relationship between Male and Female (Which Male marries which female). So it will take attribute M-Id from
Male and F-Id from Female.
Case 3: Binary Relationship with n: 1 cardinality
Case 4: Binary Relationship with m: n cardinality
Case 5: Binary Relationship with weak entity
In this scenario, an employee can have many dependents and one dependent can depend on one employee. A
dependent does not have any existence without an employee (e.g., a child is a dependent of their father in his
company's records), so it is a weak entity and its participation will always be total.

EERD Data Model


Top-down and bottom-up approach:


Generalization is the concept that some entities are subtypes of other, more general entities. They are
represented by an "is a" relationship: for example, Faculty IS-A subtype of Employee. One method of
representing subtype relationships, shown below, is also known as the top-down approach.
Exclusive Subtype
If subtypes are exclusive, one supertype relates to at most one subtype.
Inclusive Subtype
If subtypes are inclusive, one supertype can relate to one or more subtypes


Data abstraction at EERD levels:


Concepts of total and partial, subclasses and superclasses, specializations and generalizations.
View level: the highest level of data abstraction, e.g., the EERD.
Middle (logical) level: the middle level of data abstraction, e.g., the ERD.
Internal level: the lowest level of data abstraction, i.e., the physical data stored on disk.
Specialization
Subgrouping into subclasses (top-down approach)( HASA, HAS-A, HAS AN, HAS-AN)
Inheritance – Inherit attributes and relationships from the superclass (Name, Birthdate, etc.)


Generalization
The reverse process of defining subclasses (bottom-up approach): bring together the common attributes of entities
(IS-A, IS-AN).
Union
Models a class/subclass with more than one superclass of distinct entity types. Attribute inheritance is selective.


Constraints on Specialization and Generalization


We have four types of specialization/generalization constraints:
Disjoint, total
Disjoint, partial
Overlapping, total
Overlapping, partial
Multiplicity (relationship constraint)
Covering constraints whether the entities in the subclasses collectively include all entities in the superclass
Note: Generalization usually is total because the superclass is derived from the subclasses.
The term Cardinality has two different meanings based on the context you use.


Relationship Constraints types


Cardinality ratio
Specifies the maximum number of relationship instances in which each entity can participate
Types 1:1, 1:N, or M:N
Participation constraint
Specifies whether the existence of an entity depends on its being related to another entity
Types: total and partial
Thus the minimum number of relationship instances in which an entity can participate is 1 for total participation and
0 for partial participation.
Diagrammatically, a double line is used from the relationship type to the entity type.
There are two types of participation constraints:
Total participation, i.e., minimum occurrence is 1, shown with double lines (a dotted oval denotes a derived attribute)
Partial Participation
Total Participation
When we require all entities to participate in the relationship (total participation), we use double lines to specify.
(Every loan has to have at least one customer)


Cardinality

It expresses the number of entity occurrences associated with one occurrence of the related entity.
The cardinality of a relationship is the number of instances of entity B that can be associated with entity A. There is
a minimum cardinality and a maximum cardinality for each relationship, with an unspecified maximum cardinality
being shown as N. Cardinality limits are usually derived from the organization's policies or external constraints.
For Example:
At the University, each Teacher can teach an unspecified maximum number of subjects as long as his/her weekly
hours do not exceed 24 (this is an external constraint set by an industrial award). Teachers may teach 0 subjects if
they are involved in non-teaching projects. Therefore, the cardinality limits for TEACHER are (O, N).
The University's policies state that each Subject is taught by only one teacher, but it is possible to have Subjects
that have not yet been assigned a teacher. Therefore, the cardinality limits for SUBJECT are (0,1). Teacher and
subject have M: N relationship connectivity. And they are binary (two) ternary too if we break this relationship.
Such situations are modeled using a composite entity (or gerund)


Cardinality Constraint: Quantification of the relationship between two concepts or classes (a constraint on
aggregation)
Remember cardinality is always a relationship to another thing.
Max Cardinality (cardinality): always 1 or Many. If Class A has a relationship to Package B with a cardinality of one,
at most one occurrence of this class can appear in the package; the opposite would be a package with a max cardinality
of N, meaning there can be N occurrences of the class.
Min Cardinality (optionality): simply means "required". It is always 0 or 1: 0 means optional (zero or more), 1 means mandatory (one or more).
The three types of cardinality you can define for a relationship are as follows:
Minimum Cardinality. Governs whether selecting items from this relationship is optional or required. If you set the
minimum cardinality to 0, selecting items is optional; if you set it to greater than 0, the user must select that
number of items from the relationship.
Optional to Mandatory, Optional to Optional, Mandatory to Optional, Mandatory to Mandatory
Summary of ER diagram symbols
Maximum Cardinality. Sets the maximum number of items that the user can select from a relationship. If you set the
minimum cardinality to greater than 0, you must set the maximum cardinality to a number at least as large. If you do
not enter a maximum cardinality, the default is 999.
Type of Max Cardinality: 1 to 1, 1 to many, many to many, many to 1
Default Cardinality. Specifies what quantity of the default product is automatically added to the initial solution that
the user sees. Default cardinality must be equal to or greater than the minimum cardinality and must be less than
or equal to the maximum cardinality.
(min, max) notation replaces cardinality-ratio numerals and single/double line notation.
Associate a pair of integer numbers (min, max) with each participant of an entity type E in a relationship type R,
where 0 ≤ min ≤ max and max ≥ 1; max = N means finite but unbounded.
Relationship types can also have attributes.
Attributes of 1:1 or 1:N relationship types can be migrated to one of the participating entity types.
For a 1:N relationship type, the relationship attribute can be migrated only to the entity type on the N-side of the
relationship.
Attributes of M:N relationship types must be specified as relationship attributes.

In the case of Data Modelling, Cardinality defines the number of attributes in one entity set, which can be associated
with the number of attributes of other sets via a relationship set. In simple words, it refers to the relationship one
table can have with the other table. They can be One-to-one, One-to-many, Many-to-one, or Many-to-many. And
third may be the number of tuples in a relation.
In the case of SQL, cardinality refers to a number: the number of unique values that appear in the table for a
particular column. For example, a table called Person has the column Gender, and the Gender column can have the
values 'Male' or 'Female'.
Cardinality is also used for the number of tuples in a relation (the number of rows).
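As a small illustration (the Person table and Gender column follow the example above), both senses of cardinality can be checked with simple queries:

-- Column cardinality: the number of distinct values in Gender (2 here).
SELECT COUNT(DISTINCT gender) AS column_cardinality FROM person;
-- Relation cardinality: the number of tuples (rows) in the table.
SELECT COUNT(*) AS row_cardinality FROM person;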
The multiplicity of an association indicates how many objects of the opposing class an object can be associated with;
when this number is variable, a minimum and maximum are given.
Multiplicity = Cardinality + Participation. The dictionary definition of cardinality is the number of elements in a
particular set.
Multiplicity can be set for attribute operations and associations in a UML class diagram (Equivalent to ERD) and
associations in a use case diagram.
A cardinality is how many elements are in a set. Thus, a multiplicity tells you the minimum and maximum allowed
members of the set. They are not synonymous.


Given the example below:


0..1 ---------- 1..*
Multiplicities:
The first multiplicity, for the left entity: 0..1
The second multiplicity, for the right entity: 1..*
Cardinalities for the first multiplicity:
Lower cardinality: 0
Upper cardinality: 1
Cardinalities for the second multiplicity:
Lower cardinality: 1
Upper cardinality: many (unbounded)
Multiplicity is the constraint on the collection of the association objects whereas Cardinality is the count of the
objects that are in the collection. The multiplicity is the cardinality constraint.
A multiplicity of an event = Participation of an element + cardinality of an element.
UML uses the term Multiplicity, whereas Data Modelling uses the term Cardinality. They are for all intents and
purposes, the same.
Cardinality (sometimes referred to as Ordinality) is what is used in ER modeling to "describe" a relationship between
two Entities.
Cardinality and Modality
The major difference between cardinality and modality is that cardinality is defined as the metric used to specify
the number of occurrences of one object related to the number of occurrences of another object. On the contrary,
modality signifies whether a certain data object must participate in the relationship or not.
Cardinality refers to the maximum number of times an instance in one entity can be associated with instances in
the related entity. Modality refers to the minimum number of times an instance in one entity can be associated
with an instance in the related entity.
Cardinality can be 1 or Many and the symbol is placed on the outside ends of the relationship line, closest to the
entity, Modality can be 1 or 0 and the symbol is placed on the inside, next to the cardinality symbol. For a
cardinality of 1, a straight line is drawn. For a cardinality of Many a foot with three toes is drawn. For a modality of
1, a straight line is drawn. For a modality of 0, a circle is drawn.
zero or more

1 or more
1 and only 1 (exactly 1)
Multiplicity = Cardinality + Participation
Cardinality: Denotes the maximum number of possible relationship occurrences in which a certain entity can
participate (in simple terms: at most).

Note: Connectivity, modality, multiplicity, and cardinality are closely related terms for describing relationships.


Participation: Denotes if all or only some entity occurrences participate in a relationship (in simple terms: at least).

BASIS FOR COMPARISON    CARDINALITY                                               MODALITY

Basic                   The maximum number of associations between table rows.   The minimum number of row associations.

Types                   One-to-one, one-to-many, many-to-many.                    Nullable and not nullable.


Generalization is like a bottom-up approach in which two or more entities of lower levels combine to form a
higher level entity if they have some attributes in common.
Generalization is more like a subclass and superclass system, but the only difference is the approach:
generalization uses the bottom-up approach, i.e., subclasses are combined to make a superclass. The IS-A (IS-AN)
approach is used in generalization.
Generalization is the result of taking the union of two or more (lower level) entity types to produce a higher level
entity type.
Generalization is the same as UNION. Specialization is the same as ISA.
A specialization is a top-down approach, and it is the opposite of Generalization. In specialization, one higher-level
entity can be broken down into two lower-level entities. Specialization is the result of taking a subset of a higher-
level entity type to form a lower-level entity type.
Normally, the superclass is defined first, the subclass and its related attributes are defined next, and the
relationship set is then added (HAS-A, HAS-AN).

In UML, EER specialization or generalization comes in the form of a hierarchical entity set:


Transforming EERD to Relational Database Model


Specialization / Generalization Lattice Example (UNIVERSITY) EERD TO Relational Model


Mapping Process (a DDL sketch follows the steps below)
1. Create tables for all higher-level entities.
2. Create tables for lower-level entities.
3. Add primary keys of higher-level entities in the table of lower-level entities.
4. In lower-level tables, add all other attributes of lower-level entities.
5. Declare the primary key of the higher-level table and the primary key of the lower-level table.
6. Declare foreign key constraints.
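A minimal sketch of these steps, assuming an illustrative EMPLOYEE supertype with a FACULTY subtype (names are not from the text):

-- Steps 1 and 5: table and primary key for the higher-level entity.
CREATE TABLE employee (
    emp_id     NUMBER PRIMARY KEY,
    emp_name   VARCHAR2(100),
    birth_date DATE
);
-- Steps 2-6: the lower-level table repeats the supertype key, adds its own
-- attributes, and declares the foreign key back to the supertype.
CREATE TABLE faculty (
    emp_id       NUMBER PRIMARY KEY,
    faculty_rank VARCHAR2(30),
    CONSTRAINT fk_faculty_emp FOREIGN KEY (emp_id) REFERENCES employee (emp_id)
);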
This section presents the concept of entity clustering, which abstracts the ER schema to such a degree that the
entire schema can appear on a single sheet of paper or a single computer screen.

END


CHAPTER 4 DISCOVERING BUSINESS RULES AND DATABASE CONSTRAINTS


Definition of data integrity: constraints placed on the set of values allowed for the attributes of a relation are
known as relational integrity constraints.
Constraints – these are special restrictions on allowable values. For example, the passing marks for a student must
always be greater than 50%.
Categories of Constraints
Constraints on databases can generally be divided into three main categories:
1. Constraints that are inherent in the data model. We call these inherent model-based constraints or implicit
constraints.
2. Constraints that can be directly expressed in schemas of the data model, typically by specifying them in the
DDL (data definition language). We call these schema-based constraints or explicit constraints.
3. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be
expressed and enforced by the application programs. We call these application-based or semantic
constraints or business rules.
Types of data integrity
1. Physical Integrity
Physical integrity is the process of ensuring the wholeness, correctness, and accuracy of data when data is stored
and retrieved.
2. Logical integrity
Logical integrity refers to the accuracy and consistency of the data itself. Logical integrity ensures that the data
makes sense in its context.
Types of logical integrity
1. Entity integrity
2. Domain integrity

The schema-based constraints or explicit include domain constraints, key constraints, constraints on NULLs, entity
integrity constraints, and referential integrity constraints.
How insertions can violate the explicit constraints:
Insert can violate any of the four types of constraints discussed in the previous section. Domain constraints can be
violated if an attribute value is given that does not appear in the corresponding domain or is not of the appropriate
data type. Key constraints can be violated if a key value in the new tuple already exists in another tuple in the
relation r(R). Entity integrity can be violated if any part of the primary key of the new tuple t is NULL. Referential
integrity can be violated if the value of any foreign key in t refers to a tuple that does not exist in the referenced
relation.
1. Business Rule constraints
These rules are applied to the data before (i.e., at the time) the data is inserted into the table columns, for
example through Unique, Not Null, and Default constraints (a combined example follows the list below).
1. The primary key value can’t be null.
2. Not null (absence of any value (i.e., unknown or nonapplicable to a tuple)
3. Unique
4. Primary key
5. Foreign key
6. Check
7. Default
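A minimal sketch showing these constraints declared together (table and column names are illustrative, and a department table is assumed to exist for the foreign key):

CREATE TABLE student (
    student_id NUMBER        PRIMARY KEY,                        -- primary key (never null)
    email      VARCHAR2(100) UNIQUE,                             -- unique
    full_name  VARCHAR2(100) NOT NULL,                           -- not null
    marks      NUMBER        DEFAULT 0                           -- default
               CHECK (marks BETWEEN 0 AND 100),                  -- check
    dept_id    NUMBER        REFERENCES department (dept_id)     -- foreign key
);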


2. Null Constraints
Comparisons Involving NULL and Three-Valued Logic:
SQL has various rules for dealing with NULL values. Recall from Section 3.1.2 that NULL is used to represent a missing
value, but that it usually has one of three different interpretations—value unknown (exists but is not known), value
not available (exists but is purposely withheld), or value not applicable (the attribute is undefined for this tuple).
Consider the following examples to illustrate each of the meanings of NULL.
1. Unknown value. A person's date of birth is not known, so it is represented by NULL in the database.
2. Unavailable or withheld value. A person has a home phone but does not want it to be listed, so it is withheld
and represented as NULL in the database.
3. Not applicable attribute. An attribute Last_College_Degree would be NULL for a person who has no college
degrees because it does not apply to that person.

3. Enterprise Constraints
Enterprise constraints – sometimes referred to as semantic constraints – are additional rules specified by users or
database administrators and can be based on multiple tables.
Here are some examples.
A class can have a maximum of 30 students.
A teacher can teach a maximum of four classes per semester.
An employee cannot take part in more than five projects.
The salary of an employee cannot exceed the salary of the employee’s manager.
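Enterprise constraints usually cannot be expressed as a single-column constraint. As a hedged sketch of the first example above, assuming an enrollment(class_id, student_id) table, a statement-level trigger could enforce the 30-student limit:

CREATE OR REPLACE TRIGGER trg_class_capacity
AFTER INSERT OR UPDATE ON enrollment
DECLARE
    v_overfull NUMBER;
BEGIN
    -- Count the classes that would now exceed the 30-student limit.
    SELECT COUNT(*) INTO v_overfull
    FROM (SELECT class_id FROM enrollment GROUP BY class_id HAVING COUNT(*) > 30);
    IF v_overfull > 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'A class can have a maximum of 30 students.');
    END IF;
END;
/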

4. Domain integrity/ Constraints


A domain of possible values must be associated with every attribute (for example, integer types, character types,
date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can
take. Domain integrity rules govern these values. In a database system, domain integrity is defined by:
1. The data type and the length
2. The NULL value acceptance
3. The allowable values, through techniques like constraints or rules
4. The default value


Some examples of Domain Level Integrity are mentioned below;


Data Type– For example integer, characters, etc.
Date Format– For example dd/mm/yy or mm/dd/yyyy or yy/mm/dd.
Null support– Indicates whether the attribute can have null values.
Length– Represents the length of characters in a value.
Range– The range specifies the lower and upper boundaries of the values the attribute may legally have.
1. Field integrity
Values in a specific field/cell must lie within the column's domain and represent a specific location within a table.
2. Row/entity integrity
No attribute of a primary key can be null (every tuple must be uniquely identified).
5. Referential Integrity Constraints
A referential integrity constraint is famous as a foreign key constraint. The value of foreign key values is derived
from the Primary key of another table. Similar options exist to deal with referential integrity violations caused by
Update as those options discussed for the Delete operation.
There are two types of referential integrity constraints:
1. Insert constraint: we can't insert a value into the CHILD table if that value is not stored in the MASTER table.
2. Delete constraint: we can't delete a value from the MASTER table if the value still exists in the CHILD table (see the sketch below).
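Assuming illustrative MASTER and CHILD tables named department and employee, both rules can be demonstrated as follows:

CREATE TABLE department (
    dept_id   NUMBER PRIMARY KEY,
    dept_name VARCHAR2(50)
);
CREATE TABLE employee (
    emp_id  NUMBER PRIMARY KEY,
    dept_id NUMBER REFERENCES department (dept_id)   -- foreign key to the master table
);
-- Insert constraint: fails if department 99 does not exist in the master table.
INSERT INTO employee (emp_id, dept_id) VALUES (1, 99);
-- Delete constraint: fails while any employee row still references department 10.
DELETE FROM department WHERE dept_id = 10;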
6. Assertions constraints
An assertion is any condition that the database must always satisfy. Domain constraints and Integrity constraints
are special forms of assertions.


7. Authorization constraints
We may want to differentiate among the users as far as the type of access they are permitted to various data
values in the database. This differentiation is expressed in terms of Authorization. The most common being:
Read authorization – which allows reading but not the modification of data;
Insert authorization – which allows the insertion of new data but not the modification of existing data
Update authorization – which allows modification, but not deletion.
The three rules that referential integrity enforces are:
1. A foreign key must have a corresponding primary key. (“No orphans” rule.)
2. When a record in a primary table is deleted, all related records referencing the primary key must also be
deleted, which is typically accomplished by using cascade delete.
3. If the primary key for record changes, all corresponding records in other tables using the primary key as a
foreign key must also be modified. This can be accomplished by using a cascade update.
The preceding integrity constraints are included in the data definition language because they occur in most
database applications. However, they do not include a large class of general constraints, sometimes called
semantic integrity constraints, which may have to be specified and enforced on a relational database.
The types of constraints we discussed so far may be called state constraints because they define the constraints that
a valid state of the database must satisfy. Another type of constraint, called transition constraints, can be defined
to deal with state changes in the database. An example of a transition constraint is: “the salary of an employee can
only increase.”

END


CHAPTER 5 DATABASE DESIGN STEPS AND IMPLEMENTATIONS

SQL version:
➢ 1970 – Dr. Edgar F. “Ted” Codd described a relational model for databases.
➢ 1974 – Structured Query Language appeared.
➢ 1978 – IBM released a product called System/R.
➢ 1986 – IBM developed the prototype of a relational database, which is standardized by ANSI.
➢ 1989- First ever version launched of SQL
➢ 1999 – SQL 3 launched with features like triggers, object orientation, etc.
➢ SQL2003- window functions, XML-related features, etc.
➢ SQL2006- Support for XML Query Language
➢ SQL2011-improved support for temporal databases
➢ SQL-86 appeared in 1986; more recent revisions include SQL:2011 and SQL:2016. SQL was accepted by the American
National Standards Institute (ANSI) in 1986 and by the International Organization for Standardization (ISO) in 1987.
Each vendor provides its own implementation (also called an SQL dialect) of SQL.
Standard of SQL ANSI and ISO
In 1993, the ANSI and ISO development committees decided to split future SQL development into a multi-part
standard. The Parts, as of December 1995, are:
Part 1: Framework. A non-technical description of how the document is structured.
Part 2: Foundation. The core specification, including all of the new ADT and Object SQL features; it is currently over
800 pages.
Part 3: SQL/CLI. The call level interface. A version dependent only on SQL-92 was published in 1995 as ISO/IEC 9075-
3:1995. A follow-on, providing support for new features in other Parts of SQL is under development.
Part 4: SQL/PSM. The stored procedures specification, including computational completeness. Currently being
processed for DIS Ballot.


Part 5: SQL/Bindings. The Dynamic SQL and Embedded SQL bindings are taken from SQL-92. No active new work at
this time, although C++ and Java interfaces are under discussion.
Part 6: SQL/XA. An SQL specialization of the popular XA Interface developed by X/Open (see below).
Part 7: SQL/Temporal. A newly approved SQL subproject to develop enhanced facilities for temporal data
management using SQL.
Part 8: SQL Multimedia (SQL/Mm)
A new ISO/IEC international standardization project for the development of an SQL class library for multimedia
applications was approved in early 1993. This new standardization activity, named SQL Multimedia (SQL/MM), will
specify packages of SQL abstract data type (ADT) definitions using the facilities for ADT specification and invocation
provided in the emerging SQL3 specification.
Part 1 will be a Framework that specifies how the other parts are to be constructed. Each of the other parts will be
devoted to a specific SQL application package.

The following SQL/MM Part structure exists as of December 1995:


Part 1: Framework. A non-technical description of how the document is structured.
Part 2: FullText. Methods and ADTs for text data processing. Only minimal content at present.
Part 3: Spatial. Methods and ADTs for spatial data management. About 125+ pages with active contributions from
Spatial Data experts from 3 national bodies.
Part 4: Gen Purpose Facilities. Methods and ADTs for complex numbers, trig and exponential functions, vectors,
sets, etc.
Embedded SQL
Overcomes the need for pre- and post- processing of the data and provides an interface to applications.
An extension to a host language. Most languages support mechanisms to access databases
Statement Level Interface or Call Level Interface. Example: Retrieving Multiple Tuples with Embedded SQL Using
Cursors.
SLI requires a preprocessor to the compiler along with a library of routines for DB access.
CLI consists only of a library of routines.
Requires concept of a result set and cursor.
Extended SQL
Extended SQL called SQL3 OR SQL-99


Query-By-Example (QBE)
Query-By-Example (QBE) is the first interactive database query language to exploit such modes of HCI. In QBE, a
query is constructed on an interactive terminal involving two-dimensional ‘drawings’ of one or more relations,


visualized in tabular form, which are filled in selected columns with ‘examples’ of data items to be retrieved (thus
the phrase query-by-example).
It is different from SQL, and from most other database query languages, in having a graphical user interface that
allows users to write queries by creating example tables on the screen.
QBE, like SQL, was developed at IBM and QBE is an IBM trademark, but a number of other companies sell QBE-like
interfaces, including Paradox.
A convenient shorthand notation is that if we want to print all fields in some relation, we can place P. under the
name of the relation. This notation is like the SELECT * convention in SQL. It is equivalent to placing a P. in every
field:


Example of QBE:

AND, OR Conditions in QBE:


Common Table Expressions (CTE)


Common table expressions (CTEs) enable you to name subqueries temporarily for a result set. You then refer to
these like normal tables elsewhere in your query. This can make your SQL easier to write and understand later. CTEs
go in the WITH clause above the SELECT statement.
Recursive common table expression (CTE)
RCTE is a CTE that references itself. By doing so, the CTE repeatedly executes, and returns subsets of data, until it
returns the complete result set.
A recursive CTE is useful in querying hierarchical data such as organization charts where one employee reports to a
manager or a multi-level bill of materials when a product consists of many components, and each component itself
also consists of many other components.
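A hedged sketch of a recursive CTE over an assumed employee(emp_id, emp_name, manager_id) table, walking an organization chart from the top:

WITH org_chart (emp_id, emp_name, manager_id, org_level) AS (
    -- Anchor member: employees with no manager (the top of the chart).
    SELECT emp_id, emp_name, manager_id, 1
    FROM   employee
    WHERE  manager_id IS NULL
    UNION ALL
    -- Recursive member: joins back to the CTE to pull in each level of direct reports.
    SELECT e.emp_id, e.emp_name, e.manager_id, c.org_level + 1
    FROM   employee e
    JOIN   org_chart c ON e.manager_id = c.emp_id
)
SELECT * FROM org_chart;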
Key characteristics of SQL
Set-oriented and declarative
Free-form language
Case insensitive
Can be used both interactively from a command prompt or executed by a program


Rules to write commands:


➢ Table names cannot exceed 20 characters.
➢ The name of the table must be unique.
➢ Field names also must be unique.
➢ The field list and filed length must be enclosed in parentheses.
➢ The user must specify the field length and type.
➢ The field definitions must be separated with commas.
➢ SQL statements must end with a semicolon.
DBLC/DDLC and SDLC

Hoffer, Ramesh, and Topi (2019) describe the database life cycle as a set of database development activities
consisting of seven phases:


Database Design Phases/Stages


III. Physical design. The physical design step involves the selection of indexes (access methods),
partitioning, and clustering of data. The logical design methodology in step II simplifies the
approach to designing large relational databases by reducing the number of data dependencies
that need to be analyzed. This is accomplished by inserting the conceptual data modeling and
integration steps (II(a) and II(b) in the figures above) into the traditional relational design
approach.
IV. Database implementation, monitoring, and modification. Once the
design is completed, the database can be created through the implementation of the formal
schema using the data definition language (DDL) of a DBMS.


General Properties of Database Objects


Entity Distinct object, Class, Table, Relation
Entity Set A collection of similar entities. E.g., all employees. All entities in an entity set have the same set of
attributes.
Kinds of Entities
You should also be familiar with different kinds of entities including:
1. Independent entities,
2. Dependent entities
3. Characteristic entities.

Attribute Describes some aspect of the entity/object, characteristics of object. An attribute is a data item that
describes a property of an entity or a relationship
Column or field The column represents the set of values for a specific attribute. An attribute belongs to a model and a
column belongs to a table; a column is a column in a database table, whereas attributes are externally visible
facets of an object.
A relation instance is a finite set of tuples in the RDBMS system. Relation instances never have duplicate tuples.
Relationship Association between entities, connected entities are called participants, Connectivity describes the
relationship (1-1, 1-M, M-N)
The degree of a relationship refers to the=> number of entities


The relation in the above image has degree = 4, cardinality = 5, and data values/cells = 20.
Characteristics of relation
1. Distinct Relation/table name
2. Relations are unordered
3. Cells contain exactly one atomic (Single) value means Each cell (field) must contain a single value
4. No repeating groups
5. Distinct attributes name
6. Value of attribute comes from the same domain
7. Order of attribute has no significant
8. The attributes in R(A1, ...,An) and the values in t = <V1,V2, ..... , Vn> are ordered.
9. Each tuple is distinct
10. The order of tuples has no significance
11. Tuples may be stored and retrieved in an arbitrary order
12. Tables manage attributes. This means they store information in form of attributes only
13. Tables contain rows. Each row is one record only
14. All rows in a table have the same columns. Columns are also called fields
15. Each field has a data type and a name
16. A relation must contain at least one attribute (column) that identifies each tuple (row) uniquely

Database Table type


Temporary table: Here are RDBMS, which supports temporary tables. Temporary Tables are a great feature that
lets you store and process intermediate results by using the same selection, update, and join capabilities of tables.
Temporary tables store session-specific data. Only the session that adds the rows can see them. This can be handy
to store working data.
In ANSI there are two types of temp tables. There are two types of temporary tables in the Oracle Database: global
and private.
Global Temporary Tables
To create a global temporary table add the clause "global temporary" between create and table. For Example:


create global temporary table toys_gtt (
  toy_name varchar2(100)
);
A global temporary table is accessible to everyone. You create it once, it is registered in the data dictionary, and
its definition lives "forever"; "global" pertains to the schema definition.
Private/Local Temporary Tables
Starting in Oracle Database 18c, you can create private temporary tables. These tables are only visible in your
session. Other sessions can't see the table!
Temporary tables can be very useful in some cases to keep temporary data. A private (local) temporary table is created
"on the fly" and disappears after its use; you never see it in the data dictionary.
Details of temp tables:
A temporary table is owned by the person who created it and can only be accessed by that user.
A global temporary table is accessible to everyone and will contain data specific to the session using it;
multiple sessions can use the same global temporary table simultaneously. It is a global definition for a temporary
table that all can benefit from.
Local temporary table – these tables are visible only within the connection that created them and are deleted when it is closed.
Clone Table: there may be a situation when you need an exact copy of a table, and the CREATE TABLE ... or SELECT ...
commands do not suit your purposes because the copy must include the same indexes, default values, and so forth.
(Temporary tables are available in MySQL version 3.23 onwards.)
There are Magic Tables (virtual tables) in SQL Server that hold the temporal information of recently inserted and
recently deleted data in the virtual table.
The INSERTED magic table stores the new (after) version of the row, and the DELETED table stores the old (before)
version of the row for any INSERT, UPDATE, or DELETE operation.
A record is a collection of data objects that are kept in fields, each having its name and datatype. A Record can be
thought of as a variable that can store a table row or a set of columns from a table row. Table columns relate to the
fields.

External Tables
An external table is a read-only table whose metadata is stored in the database but whose data
is stored outside the database.


Process to create external table


Partitioning Tables
Partitioning logically splits up a table into smaller tables according to the partition column(s). So rows with the
same partition key are stored in the same physical location.
There are four types of partitioning available:


1. Range
2. List
3. Hash
4. Round robin
To create a partitioned table, you need to (as sketched after this list):
• Choose a partition method
• State the partition columns
• Define the initial partitions
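A hedged sketch of range partitioning in Oracle-style syntax, assuming an illustrative orders table partitioned by order date:

CREATE TABLE orders (
    order_id   NUMBER,
    order_date DATE,
    amount     NUMBER
)
PARTITION BY RANGE (order_date) (                      -- partition method and column
    PARTITION p_2021 VALUES LESS THAN (DATE '2022-01-01'),
    PARTITION p_2022 VALUES LESS THAN (DATE '2023-01-01'),
    PARTITION p_max  VALUES LESS THAN (MAXVALUE)       -- catch-all partition
);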

Table Splitting


Collections                                               Records
All items are of the same data type                       Items are of different data types
Same-data-type items are called elements                  Different-data-type items are called fields
Syntax: variable_name(index)                              Syntax: variable_name.field_name
For creating a collection variable you can use %TYPE      For creating a record variable you can use %ROWTYPE or %TYPE
Lists and arrays are examples                             Tables and columns are examples

Correlated vs. Uncorrelated SQL Expressions


A subquery is correlated when it joins to a table from the parent query. If you don't, then it's uncorrelated.
This leads to a difference between IN and EXISTS. EXISTS returns rows from the parent query, as long as the subquery
finds at least one row.
So the following uncorrelated EXISTS returns all the rows in colors:
select * from colors
where exists (
  select null from bricks);
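By contrast, a correlated version joins the subquery back to the parent table, so only the colors actually used by at least one brick are returned (the joining column names here are assumptions for illustration):

select * from colors c
where exists (
  -- The subquery references c, so it is evaluated per row of the parent query.
  select null from bricks b
  where b.colour = c.colour_name);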
Table Organizations
Create a table in Oracle Database that has an organization clause. This defines how it physically stores rows in the
table.
The options for this are:
1. Heap table organization (Some DBMS provide for tables to be created without indexes, and access
data randomly)
2. Index table organization or Index Sequential table.
3. Hash table organization (Some DBMS provide an alternative to an index to access data by trees or
hashing key or hashing function).

By default, tables are heap-organized. This means the database is free to store rows wherever there is space. You
can add the "organization heap" clause if you want to be explicit.

Big picture of database languages and command types


SQL offers a DDL and a DML.


DMLs are of two types:
Low-level or procedural DMLs: require a user to specify what data are needed and how to get those data. PL/SQL,
Java, and relational algebra are the best examples. They can be used for query optimization.
High-level or declarative DMLs (also referred to as non-procedural DMLs): require a user to specify what data are
needed without specifying how to get those data. SQL and Google Search are the best examples. They are not suitable for
query optimization. TRC and DRC are declarative languages.


• Windowing Clause When you use order by, the database adds a default windowing clause of range
between unbounded preceding and current row.
• Sliding Windows As well as running totals so far, you can change the windowing clause to be a subset of
the previous rows.
The following shows the total weight of:
1. The current row + the previous row
2. All rows with the same weight as the current + all rows with a weight one less than the current
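A hedged sketch of both windows, assuming a bricks table with brick_id and weight columns:

SELECT brick_id,
       weight,
       -- 1. The current row plus the previous row (ordered by weight).
       SUM(weight) OVER (
           ORDER BY weight
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS current_plus_previous,
       -- 2. All rows whose weight is the same as, or one less than, the current row's.
       SUM(weight) OVER (
           ORDER BY weight
           RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS same_or_one_less
FROM   bricks;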
Strategies for Schema design in DBMS
Top-down strategy –
Bottom-up strategy –
Inside-Out Strategy –
Mixed Strategy –
Identifying correspondences and conflicts among the schema integration in DBMS
Naming conflict
Type conflicts
Domain conflicts
Conflicts among constraints


Process of SQL
When we are executing the command of SQL on any Relational database management system, then the system
automatically finds the best routine to carry out our request, and the SQL engine determines how to interpret that
particular command.
Structured Query Language contains the following four components in its process:
1. Query Dispatcher
2. Optimization Engines
3. Classic Query Engine
4. SQL Query Engine, etc.

SQL Programming
Approaches to Database Programming| Comparing the Three Approaches
In this section, we briefly compare the three approaches for database programming
and discuss the advantages and disadvantages of each approach.
Several techniques exist for including database interactions in application programs.
The main approaches for database programming are the following:
1. Embedding database commands in a general-purpose programming language.
Embedded SQL Approach. The main advantage of this approach is that the query text is part of
the program source code itself, and hence can be checked for syntax errors and validated
against the database schema at compile time.


2. Using a library of database functions. A library of functions is made available to the


host programming language for database calls.
Library of Function Calls Approach. This approach provides more flexibility in that queries can
be generated at runtime if needed.
3. Designing a brand-new language. A database programming language is designed from
scratch to be compatible with the database model and query language.
Database Programming Language Approach. This approach does not suffer from the impedance
mismatch problem, as the programming language data types are the same as the database data
types.
Standard SQL order of execution


TYPES OF SUB QUERY (SUBQUERY)


Subqueries Types
1. FROM Subqueries
2. Attribute List Subqueries
3. Another example of an inline subquery
4. Correlated Subqueries
5. WHERE Subqueries
6. IN Subqueries
7. HAVING Subqueries
8. Multirow Subquery Operators: ANY and ALL
Scalar Subqueries
Scalar subqueries return one column and at most one row. You can replace a column with a scalar subquery in most
cases.
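A small sketch of a scalar subquery, assuming employee and department tables that share a dept_id column:

-- The subquery returns one column and at most one row per outer row,
-- so it can stand in the place of a column in the select list.
SELECT e.emp_name,
       (SELECT d.dept_name
        FROM   department d
        WHERE  d.dept_id = e.dept_id) AS dept_name
FROM   employee e;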


We can once again be faced with possible ambiguity among attribute names if attributes of the same name exist—
one in a relation in the FROM clause of the outer query, and another in a relation in the FROM clause of the nested
query. The rule is that a reference to an unqualified attribute refers to the relation declared in the innermost nested
query.


Some important differences in DML statements:


Difference between DELETE and TRUNCATE statements
There is a slight difference b/w delete and truncate statements. The DELETE statement only deletes the rows from
the table based on the condition defined by the WHERE clause or deletes all the rows from the table when the
condition is not specified.
But it does not free the space contained by the table.
The TRUNCATE statement: is used to delete all the rows from the table and free the containing space.
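For example, on a hypothetical orders table:

-- DELETE removes the rows matching the condition (or all rows if no WHERE clause),
-- but the space allocated to the table is kept.
DELETE FROM orders WHERE order_date < DATE '2020-01-01';
-- TRUNCATE removes all rows and frees the space the table was using.
TRUNCATE TABLE orders;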
Difference b/w DROP and TRUNCATE statements
When you use the DROP statement, it deletes the table's rows together with the table's definition, so all the
relationships of that table with other tables will no longer be valid.
When you drop a table
Table structure will be dropped
Relationships will be dropped
Integrity constraints will be dropped
Access privileges will also be dropped


On the other hand, when we TRUNCATE a table, the table structure remains the same, so you will not face any of
the above problems.
In general, ANSI SQL permits the use of ON DELETE and ON UPDATE clauses to cover
CASCADE, SET NULL, or SET DEFAULT.
MS Access, SQL Server, and Oracle support ON DELETE CASCADE.
MS Access and SQL Server support ON UPDATE CASCADE.
Oracle does not support ON UPDATE CASCADE.
Oracle supports SET NULL.
MS Access and SQL Server do not support SET NULL.
Refer to your product manuals for additional information on referential constraints.
While MS Access does not support ON DELETE CASCADE or ON UPDATE CASCADE at the SQL command-line level,

Types of Multitable INSERT statements


DML before and after processing in triggers

Database views and their types:


The definition of views is one of the final stages in database design since it relies on the logical schema being
finalized. Views are “virtual tables” that are a selection of rows and columns from one or more real tables and can
include calculated values in additional virtual columns.

A view is a virtual relation or one that does not exist but is dynamically derived it can be constructed by performing
operations (i.e., select, project, join, etc.) on values of existing base relation (a named relation that is designed in a
conceptual schema whose tuples are physically stored in the database). Views are viewable in the external
schema.


Types of View
1. User-defined view
a. Simple view (Single table view)
b. Complex View (Multiple tables having joins, group by, and functions)
c. Inline View (Based on a subquery in from clause to create a temp table and form a complex
query)
d. Materialized View (It stores physical data, definitions of tables)
e. Dynamic view
f. Static view
2. Database View
3. System Defined Views
4. Information Schema View
5. Catalog View
6. Dynamic Management View
7. Server-scoped Dynamic Management View
8. Sources of Data Dictionary Information View
a. General Views
b. Transaction Service Views
c. SQL Service Views
Advantages of View:
Provide security
Hide specific parts of the database from certain users
Customize base relations based on their needs
It supports the external model
Provide logical independence
Views don't store data in a physical location.
Views can provide Access Restriction, since data insertion, update, and deletion is not possible with the
view.
We can DML on view if it is derived from a single base relation, and contains the primary key or a
candidate key
When can a view be updated? (A sketch follows the list below.)
1. The view is defined based on one and only one table.
2. The view must include the PRIMARY KEY of the table based upon which the view has been created.
3. The view should not have any field made out of aggregate functions.
4. The view must not have any DISTINCT clause in its definition.
5. The view must not have any GROUP BY or HAVING clause in its definition.
6. The view must not have any SUBQUERIES in its definitions.
7. If the view you want to update is based upon another view, the latter should be updatable.
8. Any of the selected output fields (of the view) must not use constants, strings, or value expressions.
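A minimal sketch of a single-table view that meets these conditions, assuming an employee base table with a dept_id column; DML through it reaches the base table because it keeps the primary key and avoids aggregates, DISTINCT, GROUP BY, and subqueries:

CREATE OR REPLACE VIEW sales_employee_v AS
    SELECT emp_id,          -- primary key of the base table is included
           emp_name,
           salary
    FROM   employee
    WHERE  dept_id = 30;
-- Updates through the view modify the underlying base table.
UPDATE sales_employee_v SET salary = salary * 1.05 WHERE emp_id = 101;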

END


CHAPTER 6 DATABASE NORMALIZATION AND DATABASE JOINS


Quick Overview of 12 Codd's Rule
Not every database that has tables and constraints can be referred to as a relational database system, and a database
that merely uses a relational data model is not automatically a Relational Database Management System (RDBMS). So,
some rules define what a correct RDBMS is. These rules were developed by Dr. Edgar F. Codd (E.F. Codd) in 1985, who
did extensive research on the relational model of database systems. Codd presented 13 rules (numbered 0 to 12) for
testing a DBMS against his relational model; a database that follows them is called a truly relational database
(RDBMS). Because the numbering starts at zero, they are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form. So that the system can handle the database through its relational
capabilities.
Rule 1: Information Rule
A database contains various information, and this information must be stored in each cell of a table in the form of
rows and columns.
Rule 2: Guaranteed Access Rule
Every single or precise data (atomic value) may be accessed logically from a relational database using the
combination of primary key value, table name, and column name. Each attribute of relation has a name.
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. The null value has various meanings
in the database, like missing the data, no value in a cell, inappropriate information, unknown data, and the primary
key should not be null.
Rule 4: Active/Dynamic Online Catalog based on the relational model
It represents the entire logical structure of the descriptive database that must be stored online and is known as a
database dictionary. It authorizes users to access the database and implement a similar query language to access
the database.
Rule 5: Comprehensive Data Sublanguage Rule
The relational database may support various languages, but at least one language must be able to access the database
with a well-defined, linear syntax expressed as character strings, and it must comprehensively support data
definition, view definition, data manipulation, integrity constraints, and transaction management operations. If the
database allows access to the data without any such language, it is considered a violation of this rule.
Rule 6: View Updating Rule
All views tables can be theoretically updated and must be practically updated by the database systems.
Rule 7: Relational Level Operation (High-Level Insert, Update, and delete) Rule
A database system should follow high-level relational operations such as insert, update, and delete in each level or
a single row. It also supports the union, intersection, and minus operation in the database system.
Rule 8: Physical Data Independence Rule
All stored data in a database or an application must be physically independent to access the database. Each data
should not depend on other data or an application. If data is updated or the physical structure of the database is
changed, it will not show any effect on external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means, that if any changes occurred to the logical level (table
structures), it should not affect the user's view (application). For example, suppose a table either split into two tables,
or two table joins to create a single table, these changes should not be impacted on the user view application.
Rule 10: Integrity Independence Rule
A database must maintain integrity independence when inserting data into a table's cells using the SQL query
language. All entered values should not be changed or rely on any external factor or application to maintain integrity.
It is also helpful in making the database independent for each front-end application.


Rule 11: Distribution Independence Rule


The distribution independence rule represents a database that must work properly, even if it is stored in different
locations and used by different end-users. Suppose a user accesses the database through an application; in that case,
they should not be aware that another user uses particular data, and the data they always get is only located on one
site. The end users can access the database, and these access data should be independent for every user to perform
the SQL queries.
Rule 12: Non-Subversion Rule
The non-subversion rule assumes the RDBMS offers SQL (or a similar high-level language) to store and manipulate the
data in the database. If the system also has a low-level or separate language other than SQL to access the database,
that language must not be able to subvert or bypass the integrity rules when transforming data.
Big picture of Codd's rules:


Normalization:
Normalization is a refinement technique: it reduces redundancy and eliminates undesirable characteristics such as
insertion, update, and deletion anomalies, and it removes repetition of data.
Normalization and E-R modeling are used concurrently to produce a good database design.
Advantages of normalization
Reduces data redundancies
Expands entities
Helps eliminate data anomalies
Produces controlled redundancies to link tables
Costs more processing effort


Series steps called normal forms


1NF - First normal form
2NF - Second normal form
3NF - Third normal form
3.5NF BCNF
4NF - Fourth normal form
5NF - Fifth normal form

Big picture of Normal Forms

Anomalies of a Bad Database Design:


The table displays data redundancies which yield the following anomalies
1. Update anomalies
Changing the price of product ID 4 requires an update in several records. If data items are scattered and are not
linked to each other properly, then it could lead to strange situations.
2. Insertion anomalies
The new employee must be assigned a project (phantom project). We tried to insert data in a record that does not
exist at all.
3. Deletion anomalies
If an employee is deleted, other vital data is lost. We tried to delete a record, but parts of it were left undeleted
because, without our being aware of it, the data was also saved somewhere else. For example,
if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price.


Anomalies type w.r.t Database table constraints

In most cases, if you can place your relations in the third normal form (3NF), then you will have avoided most of
the problems common to bad relational designs. Boyce-Codd (BCNF) and the fourth normal form (4NF) handle
special situations that arise only occasionally.

➢ 1st Normal form:


Normally, before normalization, every table has repeating groups. In the conversion to first normal form we:
Eliminate repeating groups in table records
Develop a proper primary key / make all attributes depend on the primary key
Uniquely identify attribute values (rows)
Identify dependencies; allow no multivalued attributes
Ensure every attribute value is atomic
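As a small illustrative sketch (the emp/phone tables and column names here are made up, not taken from the handbook's examples), a design that packs a repeating group into one column can be brought into 1NF by giving every value its own atomic row:

-- Not in 1NF: several phone numbers packed into a single non-atomic column
CREATE TABLE emp_unnormalized (
    emp_no   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    phones   VARCHAR(200)          -- e.g. '0301-111, 0302-222' (repeating group)
);

-- 1NF: every attribute value is atomic; the repeating group moves to its own table
CREATE TABLE emp (
    emp_no   INT PRIMARY KEY,
    emp_name VARCHAR(50)
);

CREATE TABLE emp_phone (
    emp_no INT REFERENCES emp (emp_no),
    phone  VARCHAR(20),
    PRIMARY KEY (emp_no, phone)
);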
A functional dependency exists when the value of one thing is fully determined by another. For example, given the relation EMP(empNo, empName, sal), attribute empName is functionally dependent on attribute empNo: if we know empNo, we also know the empName.
Types of dependencies
Partial (Based on part of composite primary key)
Transitive (One non-prime attribute depends on another nonprime attribute)


PROJ_NUM,EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS,CHG_HOUR, HOURS

➢ 2nd Normal form:


Start with the 1NF format:
Write each key component on a separate line; each component becomes the key of a new table
Write the dependent attributes after each key; each key with its respective attributes forms a new table
Partial dependencies are ended by separating them into these new tables, keyed by part of the original composite key
The result may still exhibit transitive dependencies
A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional and dependent on the primary
key. No partial dependency should exist in the relation
1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key
➢ 3rd Normal form:
Create a separate table(s) to eliminate transitive functional dependencies
2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
In 3NF, no transitive functional dependency exists for non-prime attributes in a relation. A transitive dependency arises when a non-key attribute is dependent on another non-key attribute, i.e., a functional dependency exists between non-key attributes.
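As a hedged sketch of these two steps, using the PROJ_NUM, EMP_NUM relation shown above and assuming its usual dependencies (PROJ_NUM → PROJ_NAME; EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR; JOB_CLASS → CHG_HOUR); the column types below are invented for illustration:

-- 2NF: remove partial dependencies on parts of the composite key (PROJ_NUM, EMP_NUM)
CREATE TABLE project (
    proj_num  INT PRIMARY KEY,
    proj_name VARCHAR(50)
);

CREATE TABLE employee (
    emp_num   INT PRIMARY KEY,
    emp_name  VARCHAR(50),
    job_class VARCHAR(30),
    chg_hour  DECIMAL(8,2)
);

CREATE TABLE assignment (
    proj_num INT REFERENCES project (proj_num),
    emp_num  INT REFERENCES employee (emp_num),
    hours    DECIMAL(6,2),
    PRIMARY KEY (proj_num, emp_num)
);

-- 3NF: move the transitive dependency JOB_CLASS -> CHG_HOUR into its own table;
-- employee then keeps job_class as a foreign key and drops chg_hour
CREATE TABLE job (
    job_class VARCHAR(30) PRIMARY KEY,
    chg_hour  DECIMAL(8,2)
);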
➢ Boyce-Codd Normal Form (BCNF)
3NF table with one candidate key is already in BCNF
It contains a fully functional dependency
Every determinant in the table is a candidate key.
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

➢ 4th Fourth normal form (4NF)


A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued dependency.
For a dependency A → B, if for a single value of A, multiple values of B exist, then the relationship will be a multi-
valued dependency.


➢ 5th Fifth normal form (5NF)


A relation is in 5NF if it is in 4NF and does not contain any join dependency and joining should be lossless.
5NF is satisfied when all the tables are broken into as many tables as possible to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).

Denormalization in Databases
Denormalization is a database optimization technique in which we add redundant data to one or more tables. This
can help us avoid costly joins in a relational database. Note that denormalization does not mean not doing
normalization. It is an optimization technique that is applied after normalization.


Types of Denormalization
The two most common types of denormalization are two entities in a one-to-one relationship and two entities in a
one-to-many relationship.
Pros of Denormalization: -
Retrieving data is faster since we do fewer joins
Queries to retrieve can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.
Cons of Denormalization: -
Updates and inserts are more expensive.
Denormalization can make an update and insert code harder to write.
Data may be inconsistent. Which is the “correct” value for a piece of data?
Data redundancy necessitates more storage.
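A minimal sketch of the one-to-many case (the orders/customers tables and columns are hypothetical): a redundant copy of customer_name is added to orders so that frequent reports avoid the join, at the price of keeping the copy consistent:

-- Add the redundant column and populate it from the normalized source
ALTER TABLE orders ADD customer_name VARCHAR(50);

UPDATE orders o
SET customer_name = (SELECT c.customer_name
                     FROM customers c
                     WHERE c.customer_id = o.customer_id);

-- Reads become simpler and need no join ...
SELECT order_id, customer_name FROM orders;

-- ... but every change to customers.customer_name must now also be applied to orders,
-- otherwise the redundant data becomes inconsistent.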
Relational Decomposition
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and
redundancy.
When a relation in the relational model is not in an appropriate normal form, then decomposition of the relation is required. In a database, decomposition breaks the table into multiple tables.
Types of Decomposition
1 Lossless Decomposition
If the information is not lost from the relation that is decomposed, then the decomposition will be lossless. The
process of normalization depends on being able to factor or decompose a table into two or more smaller tables, in such a
way that we can recapture the precise content of the original table by joining the decomposed parts.
2 Lossy Decomposition
In a lossy decomposition, some information is lost when the table is decomposed and cannot be recovered by joining the parts back together.

Database SQL Joins


Join is a combination of a Cartesian product followed by a selection process.
Database join types:
➢ Non-ANSI Format Join
1. Non-Equi join
2. Self-join
3. Equi Join
➢ ANSI format join
1. Semi Join
2. Left/right Anti Semi join
3. Bloom Join
4. Natural Join(Inner join, self join, theta join, cross join/cartesian product, conditional join)


5. Inner join (Equi and theta join/self-join)


6. Theta (θ)
7. Cross join
8. Cross products
9. Multi-join operation
10. Outer
o Left outer join
o Right outer join
o Full outer join
➢ Several different algorithms can be used to implement joins (natural, condition-join)
1. Nested Loops join
o Simple nested loop join
o Block nested loop join
o Index nested loop join
2. Sort merge join/external sort join
3. Hash join
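As a hedged illustration of a few of the join types listed above (the emp and dept tables and their columns are invented for the example):

-- Equi/inner join: only the matching rows from both tables
SELECT e.emp_name, d.dept_name
FROM   emp e
JOIN   dept d ON d.dept_no = e.dept_no;

-- Left outer join: keep every emp row, even when it has no matching department
SELECT e.emp_name, d.dept_name
FROM   emp e
LEFT OUTER JOIN dept d ON d.dept_no = e.dept_no;

-- Cross join: the Cartesian product of the two tables
SELECT e.emp_name, d.dept_name
FROM   emp e
CROSS JOIN dept d;

-- Self join: a table joined with itself (employees paired with their managers)
SELECT w.emp_name AS worker, m.emp_name AS manager
FROM   emp w
JOIN   emp m ON m.emp_no = w.mgr_no;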


END


CHAPTER 7 FUNCTIONAL DEPENDENCIES IN THE DATABASE MANAGEMENT SYSTEM


Types of Dependencies Schema-bound vs Non-schema-bound
There are two types of dependencies: Schema-bound and Non-schema-bound dependencies.
A Schema-bound dependency (SCHEMABINDING) prevents referenced objects from being altered or dropped as
long as the referencing object exists
A Non-schema-bound dependency: does not prevent the referenced object from being altered or dropped.
Functional Dependency
Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency
says that if two tuples have the same values for attributes A1, A2,..., An, then those two tuples must have to have
same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The
left-hand side attributes determine the values of attributes on the right-hand side.
Inference Rule (IR)
Armstrong's axioms are the basic inference rule.
Armstrong's axioms are used to conclude functional dependencies on a relational database.
The inference rule is a type of assertion. It can apply to a set of FD (functional dependency) to derive other FD.
Armstrong’s Axioms
The inclusion rule is one rule of implication by which FDs can be generated that are guaranteed to hold for all possible
tables. It turns out that from a small set of basic rules of implication, we can derive all others. We list here three
basic rules that we call Armstrong’s Axioms
Definition: Armstrong’s Axioms. Assume in what follows that we are given a table T and that all sets of attributes X,
Y, and Z are contained in Head(T). Then we have the following rules of implication.
1. Inclusion rule: If Y ⊆ X, then X → Y.
2. Transitivity rule: If X → Y and Y → Z, then X → Z.
3. Augmentation rule: If X → Y, then X Z → Y Z.
Example: Given a relational Schema R( A, B, C, D) and set of Function Dependency FD = { B → A, AD → BC, C → ABD
}. Find the canonical cover?
Solution: Given FD = { B → A, AD → BC, C → ABD }, now decompose the FD using decomposition rule( Armstrong
Axiom ).
B→A
AD → B ( using decomposition inference rule on AD → BC)
AD → C ( using decomposition inference rule on AD → BC)
C → A ( using decomposition inference rule on C → ABD)
C → B ( using decomposition inference rule on C → ABD)
C → D ( using decomposition inference rule on C → ABD)
Now the set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }. Removing the redundant dependencies (AD → B follows from AD → C and C → B; C → A follows from C → B and B → A) leaves the canonical cover Fc = { B → A, AD → C, C → BD }.
Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute.
A functional dependency is denoted by an arrow “→”. The functional dependency of X on Y is represented by X →
Y.
In this example, if we know the value of the Employee number, we can obtain Employee Name, city, salary, etc. By
this, we can say that the city, Employee Name, and salary are functionally dependent on the Employee number.
Key Terms and Descriptions
Axiom: Axioms are a set of inference rules used to infer all the functional dependencies on a relational database.


Decomposition: A rule that suggests that if you have a table that appears to contain two entities determined by the same primary key, you should consider breaking them up into two different tables.
Dependent: It is displayed on the right side of the functional dependency diagram.
Determinant: It is displayed on the left side of the functional dependency diagram.
Union: A rule that suggests that if two tables are separate and the primary key is the same, you should consider putting them together.

The Functional dependency has 6 types of inference rules:


1. Reflexive Rule (IR1)
2. Augmentation Rule (IR2)
3. Transitive Rule (IR3)
4. Union Rule (IR4)
5. Decomposition Rule (IR5)
6. Pseudo transitive Rule (IR6)
Functional Dependency type
Dependencies in DBMS are a relation between two or more attributes. It has the following types in DBMS
Functional Dependency
If the information stored in a table can uniquely determine another information in the same table, then it is called
Functional Dependency. Consider it as an association between two attributes of the same relation.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a candidate key.
Multivalued Dependency
When the existence of one or more rows in a table implies one or more other rows in the same table, then the
Multi-valued dependencies occur.
Transitive Dependency
When an indirect relationship causes functional dependency it is called Transitive Dependency.
Fully-functionally Dependency
An attribute is fully functional dependent on another attribute if it is Functionally Dependent on that attribute and
not on any of its proper subset.
Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A → A, B → B
Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
When A ∩ B is empty (NULL), then A → B is called a completely non-trivial dependency.
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
Multivalued Dependency and its types
1. Join Dependency
2. Join decomposition is a further generalization of Multivalued dependencies.
3. Inclusion Dependency


Example:

Dependency Preserving
If a relation R is decomposed into relations R1 and R2, then the dependencies of R either must be a part of R1 or
R2 or must be derivable from the combination of functional dependencies of R1 and R2.
For example, suppose there is a relation R (A, B, C, D) with a functional dependency set (A->BC). The relational R is
decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC is a part of relation
R1(ABC)
Multivalued Dependency
Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on
a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it
always requires at least three attributes.
Join Dependency
Join decomposition is a further generalization of Multivalued dependencies.
If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.
Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D).
Alternatively, R1 and R2 are lossless decompositions of R.
A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.
The decompositions (A, B, C) and (C, D) will form a JD of R if the join over their common attribute is equal to the relation R.
Here, (R1, R2, R3) is used to indicate that relations R1, R2, R3, and so on form a JD of R.
Inclusion Dependency
Multivalued dependency and join dependency can be used to guide database design although they both are less
common than functional dependencies. The inclusion dependency is a statement in which some columns of a
relation are contained in other columns.
Canonical Cover/ irreducible
A canonical cover or irreducible set of functional dependencies FD is a simplified set of FD that has a similar closure
as the original set FD.
Extraneous attributes
An attribute of an FD is said to be extraneous if we can remove it without changing the closure of the set of FD.
Closure properties of attributes
Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally
determined from it.
Closure of a set of attributes X concerning F is the set X+ of all attributes that are functionally determined by X
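A short worked example, reusing the FD set from the canonical-cover exercise earlier in this chapter, FD = { B → A, AD → BC, C → ABD }: to compute C+, start with {C}; applying C → ABD adds A, B, and D, giving {A, B, C, D}; no remaining FD adds anything new, so C+ = {A, B, C, D}. Since C+ contains every attribute of R(A, B, C, D), C is a (candidate) key of R.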


END


CHAPTER 8 DATABASE TRANSACTION, SCHEDULES, AND DEADLOCKS


Overview: Transaction
A Transaction is an atomic sequence of actions in the Database (reads and writes, commit, and abort)
Each Transaction must be executed completely and must leave the Database in a consistent state. The transaction
is a set of logically related operations. It contains a group of tasks.
A transaction is an action or series of actions. It is performed by a single user to perform operations for accessing
the contents of the database. A transaction can be defined as a group of tasks. A single task is the minimum
processing unit which cannot be divided further.
ACID
Data concurrency means that many users can access data at the same time.
Data consistency means that each user sees a consistent view of the data, including visible
changes made by the user's transactions and transactions of other users.
The fundamental difference between ACID and BASE database models is the way they deal with
this limitation.
The ACID model provides a consistent system for Relational databases.
The BASE model provides high availability for Non-relational databases like NoSQL MongoDB
Techniques for achieving ACID properties
Write-ahead logging and checkpointing
Serializability and two-phase locking
Some important points:
Property Responsibility for maintaining Transactions:
Atomicity Transaction Manager
Consistency Application programmer / Application logic checks/ it related to rollbacks
Isolation Concurrency Control Manager/Handle concurrency
Durability Recovery Manager (Algorithms for Recovery and Isolation Exploiting Semantics (aries)
Handle failures, Logging, and recovery (A, D)
Concurrency control, rollback, application programmer (C, I)
To maintain the integrity of the data, there are four properties described in the database management system,
which are known as the ACID properties.
Atomicity: the data/operations of a transaction remain atomic; the transaction is executed completely or not executed at all, and the operation should not break in between or execute partially. Either all R(A) and W(A) are done or none is done.
Consistency: The word consistency means that the value should remain preserved always, the database remains
consistent before and after the transaction.
Isolation and levels of isolation: The term 'isolation' means separation. Any changes that occur in any particular
transaction will not be seen by other transactions until the change is not committed in the memory.
A transaction isolation level is defined in terms of the following phenomena.
These concurrency control problems are the three bad transaction dependencies; locks are often used to prevent them:


Dirty Read – A Dirty read is a situation when a transaction reads data that has not yet been committed. For example,
Let’s say transaction 1 updates a row and leaves it uncommitted, meanwhile, Transaction 2 reads the updated row.
If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
Lost Updates occur when multiple transactions select the same row and update the row based on the value
selected
Non Repeatable read – Non Repeatable read occurs when a transaction reads the same row twice and gets a
different value each time. For example, suppose transaction T1 reads data. Due to concurrency, another
transaction T2 updates the same data and commits, Now if transaction T1 rereads the same data, it will retrieve a
different value.
Phantom Read – Phantom Read occurs when two same queries are executed, but the rows retrieved by the two, are
different. For example, suppose transaction T1 retrieves a set of rows that satisfy some search criteria. Now,
Transaction T2 generates some new rows that match the search criteria for transaction T1. If transaction T1 re-
executes the statement that reads the rows, it gets a different set of rows this time.
Based on these phenomena, The SQL standard defines four isolation levels :
Read Uncommitted – Read Uncommitted is the lowest isolation level. In this level, one transaction may read not
yet committed changes made by another transaction, thereby allowing dirty reads. At this level, transactions are
not isolated from each other.
Read Committed – This isolation level guarantees that any data read is committed at the moment it is read. Thus it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating, or deleting it.
Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update, or delete these rows, it avoids non-repeatable reads.
Serializable – This is the highest isolation level. Serializable execution is defined to be an execution of operations in which concurrently executing transactions appear to be serially executing.
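As a small illustration (the accounts table is hypothetical; the statement below follows the SQL standard, but its exact placement and the supported levels vary by vendor), the isolation level is chosen at the start of the transaction:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- or READ COMMITTED / SERIALIZABLE
SELECT balance FROM accounts WHERE acc_id = 1;
UPDATE accounts SET balance = balance - 100 WHERE acc_id = 1;
COMMIT;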


Durability: Durability ensures the permanency of something. In DBMS, the term durability ensures that the data
after the successful execution of the operation becomes permanent in the database. If a transaction is committed,
it will remain even error, power loss, etc.
ACID Example:

States of Transaction
A transaction passes through the states begin, active, partially committed, failed, committed, end, and aborted.
Details of the aborted state are necessary:
If any of the checks fail and the transaction has reached a failed state then the database recovery system will make
sure that the database is in its previous consistent state. If not then it will abort or roll back the transaction to bring
the database into a consistent state.
If the transaction fails in the middle of the transaction then before executing the transaction, all the executed
transactions are rolled back to their consistent state.
After aborting the transaction, the database recovery module will select one of the two operations:
Re-start the transaction
Kill the transaction


The scheduler
A module that schedules the transaction’s actions, ensuring serializability
Two main approaches
1. Pessimistic: locks
2. Optimistic: time stamps, MV, validation
Scheduling
A scheduler is responsible for ordering jobs/transactions when many jobs are entered at the same time (by multiple users), interleaving the read/write operations performed by those jobs.
A schedule is a sequence of interleaved actions from all transactions. It is an execution of several transactions that preserves the order of the R(A) and W(A) operations within each individual transaction (Xact).
Note: Two schedules are equivalent if:
Two Schedules are equivalent if they have the same dependencies.
They contain the same transactions and operations
They order all conflicting operations of non-aborting transactions in the same way
A schedule is serializable if it is equivalent to a serial schedule

Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely before starting another
transaction.
Example of Serial Schedule


Non-Serial Schedule and its types:


If interleaving of operations is allowed, then there will be a non-serial schedule.
Serializable schedule
Serializability is a guarantee about transactions over one or more objects
Doesn’t impose real-time constraints
The schedule is serializable if the precedence graph is acyclic
The serializability of schedules is used to find non-serial schedules that allow the transaction to execute
concurrently without interfering with one another.
Example of Serializable

A serializable schedule always leaves the database in a consistent state. A serial schedule is always a serializable
schedule because, in a serial schedule, a transaction only starts when the other transaction finished execution.
However, a non-serial schedule needs to be checked for Serializability.
A non-serial schedule of n number of transactions is said to be a serializable schedule if it is equivalent to the serial
schedule of those n transactions. A serial schedule doesn’t allow concurrency, only one transaction executes at a
time, and the other stars when the already running transaction is finished.


Linearizability: a guarantee about single operations on single objects Once the write completes, all later reads (by
wall clock) should reflect that write.
Types of Serializability
There are two types of Serializability.
1. Conflict Serializability
2. View Serializability
Conflict Serializable: A schedule is conflict serializable if it is conflict-equivalent to some serial schedule.
Non-conflicting operations can be reordered to get a serial schedule.
In general, a schedule is conflict-serializable if and only if its precedence graph is acyclic
A precedence graph is used for Testing for Conflict-Serializability

View serializability/view equivalence is a concept that is used to compute whether schedules are View-
Serializable or not. A schedule is said to be View-Serializable if it is view equivalent to a Serial Schedule (where no
interleaving of transactions is possible).


Note: A schedule is view serializable if it is view equivalent to a serial schedule


If a schedule is conflict serializable, then it is also view serializable, but not vice versa.
Non Serializable Schedule

The non-serializable schedule is divided into two types, Recoverable and Non-recoverable Schedules.
1. Recoverable Schedule (with subtypes: cascading schedule, cascadeless schedule, strict schedule). In a recoverable schedule, if a transaction T commits, then any other transaction that T read from must also have committed.
A schedule is recoverable if:
It is conflict-serializable, and
Whenever a transaction T commits, all transactions that have written elements read by T have already been
committed.

Example of Recoverable Schedule


2. Non-Recoverable Schedule
The relation between various types of schedules can be depicted as:

It can be seen that:


1. Cascadeless schedules are stricter than recoverable schedules or are a subset of recoverable schedules.
2. Strict schedules are stricter than cascade-less schedules or are a subset of cascade-less schedules.
3. Serial schedules satisfy constraints of all recoverable, cascadeless, and strict schedules and hence is a subset of
strict schedules.
Note: Linearizability + serializability = strict serializability
Transaction behavior equivalent to some serial execution
And that serial execution agrees with real-time
Serializability Theorems
Wormhole Theorem: A history is isolated if, and only if, it has no wormhole transactions.
Locking Theorem: If all transactions are well-formed and two-phase, then any legal history will be isolated.
Locking Theorem (converse): If a transaction is not well-formed or is not two-phase, then it is possible to write
another transaction, such that the resulting pair is a wormhole.
Rollback Theorem: An update transaction that does an UNLOCK and then a ROLLBACK is not two-phase.
Thomas Write Rule provides the guarantee of serializability order for the protocol. It improves the Basic
Timestamp Ordering Algorithm.
The basic Thomas writing rules are as follows:


If TS(T) < R_TS(X) then transaction T is aborted and rolled back, and the operation is rejected.
If TS(T) < W_TS(X) then don't execute the W_item(X) operation of the transaction and continue
processing.
Different Types of Read/Write Conflicts in DBMS
As mentioned earlier, the read operation on its own is safe, since it does not modify any information. So, there is no Read-Read (RR) conflict in the database.
Problem 1: Reading Uncommitted Data (WR Conflicts)
Reading the value of an uncommitted object might yield an inconsistency
Dirty Reads or Write-then-Read (WR) Conflicts.
Problem 2: Unrepeatable Reads (RW Conflicts)
Reading the same object twice might yield an inconsistency
Read-then-Write (RW) Conflicts (Write-After-Read)
Problem 3: Overwriting Uncommitted Data (WW Conflicts)
Overwriting an uncommitted object might yield an inconsistency
Lost Update or Write-After-Write (WW) Conflicts.
So, there are three types of conflict in the database transaction.
Write-Read (WR) conflict
Read-Write (RW) conflict
Write-Write (WW) conflict
What is Write-Read (WR) conflict?
This conflict occurs when a transaction read the data which is written by the other transaction before committing.
What is Read-Write (RW) conflict?
Transaction T2 is Writing data that is previously read by transaction T1.
Here if you look at the diagram above, data read by transaction T1 before and after T2 commits is different.
What is Write-Write (WW) conflict?
Here Transaction T2 is writing data that is already written by other transaction T1. T2 overwrites the data written
by T1. It is also called a blind write operation.
Data written by T1 has vanished. So it is data update loss.
Phase Commit (PC)
One-phase commit
The Single Phase Commit protocol is more efficient at run time because all updates are done without any explicit
coordination.
BEGIN
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );
COMMIT;
Two-Phase Commit (2PC)
The most commonly used atomic commit protocol is the two-phase commit. It uses a two-phase approach in which the coordinator first gathers votes from the participants, and then decides whether the transaction will be committed or aborted based on the information received from the participants.
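A hedged sketch of what one participant's side can look like using PostgreSQL-style prepared transactions (the identifier 'tx1' and the accounts table are made up; many other systems expose two-phase commit through XA interfaces instead):

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE acc_id = 1;
PREPARE TRANSACTION 'tx1';  -- phase 1: the participant votes "ready" and makes its work durable

-- phase 2: the coordinator later tells every participant the global decision
COMMIT PREPARED 'tx1';      -- or, if any participant voted no: ROLLBACK PREPARED 'tx1';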
Three-phase Commit


Another real-world atomic commit protocol is a three-phase commit (3PC). This protocol can reduce the amount of
blocking and provide for more flexible recovery in the event of failure. Although it is a better choice in unusually
failure-prone environments, its complexity makes 2PC the more popular choice.
Transaction atomicity using a two-phase commit
Transaction serializability using distributed locking.
DBMS Deadlock Types or techniques
All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is
granted. A lock is a variable, associated with the data item, which controls the access of that data item. Locking is
the most widely used form of concurrency control.
Deadlock Example:

Lock modes and types


1. Binary Locks: A Binary lock on a data item can either be locked or unlocked states.
2. Shared/exclusive: This type of locking mechanism separates the locks in DBMS based on their uses. If a lock is
acquired on a data item to perform a write operation, it is called an exclusive lock.
3. Simplistic Lock Protocol: This type of lock-based protocol allows transactions to obtain a lock on every object
before beginning operation. Transactions may unlock the data item after finishing the ‘write’ operation.
4. Pre-claiming Locking / Two-Phase Locking (2PL): the 2PL protocol requires that a transaction must not acquire any new lock after it has released one of its locks. It has 2 phases, growing and shrinking.
5. Shared lock: These locks are referred to as read locks, and denoted by 'S'.
If a transaction T has obtained Shared-lock on data item X, then T can read X, but cannot write X. Multiple Shared
locks can be placed simultaneously on a data item.
A deadlock is an unwanted situation in which two or more transactions are waiting indefinitely for one another to
give up locks.

Four necessary conditions for deadlock


Mutual exclusion -- only one process at a time can use the resource
Hold and wait -- there must exist a process that is holding at least one resource and is waiting to acquire
additional resources that are currently being held by other processes.
No preemption -- resources cannot be preempted; a resource can be released only voluntarily by the
process holding it.
Circular wait – one waits for others, others wait for one.
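A minimal two-session sketch showing how hold-and-wait plus circular wait produce a deadlock (the accounts table and row ids are hypothetical):

-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE acc_id = 1;  -- S1 locks row 1

-- Session 2:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE acc_id = 2;  -- S2 locks row 2

-- Session 1:
UPDATE accounts SET balance = balance + 10 WHERE acc_id = 2;  -- S1 waits for S2's lock on row 2

-- Session 2:
UPDATE accounts SET balance = balance + 10 WHERE acc_id = 1;  -- S2 waits for S1: circular wait

-- Each session now holds one lock and waits for the other; the DBMS detects the cycle
-- and aborts one transaction so the other can proceed.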


The Bakery algorithm is one of the simplest known solutions to the mutual exclusion problem for the general case of N processes. The Bakery algorithm is a critical-section solution for N processes. The algorithm preserves the first-come, first-served property.
Before entering its critical section, the process receives a number. The holder of the smallest number enters the
critical section.
Deadlock detection
This technique allows deadlock to occur, but then, it detects it and solves it. Here, a database is periodically checked
for deadlocks. If a deadlock is detected, one of the transactions, involved in the deadlock cycle, is aborted. Other
transactions continue their execution. An aborted transaction is rolled back and restarted.
When a transaction waits more than a specific amount of time to obtain a lock (called the deadlock timeout),
Derby can detect whether the transaction is involved in a deadlock.
If deadlocks occur frequently in your multi-user system with a particular application, you might need to do some
debugging.
A deadlock where two transactions are waiting for one another to give up locks.
Deadlock detection and removal schemes
Wait-for-graph
The system maintains a graph of which transaction is waiting for which; a cycle in this graph indicates a deadlock. This scheme allows the older transaction to wait but kills the younger one.

Phantom deadlock detection is the condition where the deadlock does not exist but due to a delay in propagating
local information, deadlock detection algorithms identify the locks that have been already acquired.
There are three alternatives for deadlock detection in a distributed system, namely.
Centralized Deadlock Detector − One site is designated as the central deadlock detector.
Hierarchical Deadlock Detector − Some deadlock detectors are arranged in a hierarchy.
Distributed Deadlock Detector − All the sites participate in detecting deadlocks and removing them.
The deadlock detection algorithm uses 3 data structures –
Available
Vector of length m
Indicates the number of available resources of each type.
Allocation
Matrix of size n*m
A[i,j] indicates the number of instances of resource type j allocated to process Pi.
Request
Matrix of size n*m


Indicates the outstanding request of each process.
Request[i,j] tells the number of instances of resource type j that process Pi is requesting.
Deadlock Avoidance
Deadlock avoidance
Acquire locks in a pre-defined order
Acquire all locks at once before starting transactions
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to
detect any deadlock situation in advance.
The deadlock prevention technique avoids the conditions that lead to deadlocking. It requires that every
transaction lock all data items it needs in advance. If any of the items cannot be obtained, none of the items are
locked.
The transaction is then rescheduled for execution. The deadlock prevention technique is used in two-phase
locking.
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations, where
transactions are about to execute. If it finds that a deadlock situation might occur, then that transaction is never
allowed to be executed.
Deadlock Prevention Algo
1. Wait-Die scheme
2. Wound wait scheme
Note! Deadlock prevention is more strict than Deadlock Avoidance.
The algorithms are as follows −
Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later
restarted.
Wait-die: permit older waits for younger
Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is
allowed to wait.
Wound-wait: permit younger waits for older.
Note: In a bulky system, deadlock prevention techniques may work well.
Here, we want to develop an algorithm to avoid deadlock by making the right choice all the time
Dijkstra's Banker's Algorithm is an approach to trying to give processes as much as possible while guaranteeing no
deadlock.
safe state -- a state is safe if the system can allocate resources to each process in some order and still avoid a
deadlock.

Banker’s Algorithm
Banker's Algorithm for a single resource type is a resource allocation and deadlock avoidance algorithm. The name comes from the analogy of a banker who must decide whether loans can be granted safely, a problem banking systems face every day.
In this, as a new process P1 enters, it declares the maximum number of resources it needs.
The system looks at those and checks if allocating those resources to P1 will leave the system in a safe state or not.
If after allocation, it will be in a safe state, the resources are allocated to process P1.
Otherwise, P1 should wait till the other processes release some resources.
This is the basic idea of Banker’s Algorithm.
A state is safe if the system can allocate all resources requested by all processes ( up to their stated maximums )
without entering a deadlock state.
Resource Preemption:


To eliminate deadlocks using resource preemption, we preempt some resources from processes and give those
resources to other processes. This method will raise three issues –
(a) Selecting a victim:
We must determine which resources and which processes are to be preempted and also order to minimize the
cost.
(b) Rollback:
We must determine what should be done with the process from which resources are preempted. One simple idea
is total rollback. That means aborting the process and restarting it.
(c) Starvation:
In a system, the same process may be always picked as a victim. As a result, that process will never complete its
designated task. This situation is called Starvation and must be avoided. One solution is that a process must be
picked as a victim only a finite number of times.

Concurrent vs non-concurrent data access


Concurrent executions are done for Better transaction throughput, response time
Done via better utilization of resources:

Concurrency Control
What is Concurrency Control?
Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with one
another. Though for any practical Database, it would have a mix of READ and WRITE operations, and hence the
concurrency is a challenge. DBMS Concurrency Control is used to address such conflicts, which mostly occur with a
multi-user system.

Various concurrency control techniques/Methods are:


1. Two-phase locking Protocol
2. Time stamp ordering Protocol
3. Multi-version concurrency control
4. Validation concurrency control
5. Optimistic Methods
The Two-Phase Locking Protocol, also known as the 2PL protocol, is a method of concurrency control in DBMS that ensures serializability by applying locks to the transaction data, which blocks other transactions from accessing the same data simultaneously. The Two-Phase Locking protocol helps to eliminate the concurrency problem in DBMS. Every 2PL schedule is serializable.


Theorem: 2PL ensures/enforces a conflict-serializable schedule,
but it does not enforce recoverable schedules.
2PL rule: Once a transaction has released a lock it is not allowed to obtain any other locks
This locking protocol divides the execution phase of a transaction into three different parts.
In the first phase, when the transaction begins to execute, it requires permission for the locks it needs.
The second part is where the transaction obtains all the locks. When a transaction releases its first lock, the third
phase starts.
In this third phase, the transaction cannot demand any new locks. Instead, it only releases the acquired locks.

The Two-Phase Locking protocol allows each transaction to make lock and unlock requests in two phases: a Growing Phase and a Shrinking Phase.
2PL has the following two phases:
A growing phase, in which a transaction acquires all the required locks without unlocking any data. Once all locks
have been acquired, the transaction is in its locked
point.
A shrinking phase, in which a transaction releases all locks and cannot obtain any new lock.
In practice:
– Growing phase is the entire transaction
– Shrinking phase is during the commit
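A rough SQL-level sketch of the two phases (the accounts table is hypothetical, and SELECT ... FOR UPDATE is used here only to make the lock acquisition visible; real lock management is done internally by the DBMS):

BEGIN;
-- Growing phase: acquire every lock the transaction needs, releasing none
SELECT balance FROM accounts WHERE acc_id = 1 FOR UPDATE;   -- exclusive row lock on row 1
SELECT balance FROM accounts WHERE acc_id = 2 FOR UPDATE;   -- exclusive row lock on row 2

UPDATE accounts SET balance = balance - 100 WHERE acc_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE acc_id = 2;

-- Shrinking phase: in practice all locks are released together at the commit
COMMIT;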


The 2PL protocol indeed offers serializability. However, it does not ensure that deadlocks do not happen.
In the above-given diagram, you can see that local and global deadlock detectors are searching for deadlocks and
solving them by resuming transactions to their initial states.
Strict Two-Phase Locking Method
Strict Two-Phase Locking is almost like 2PL. The only difference is that Strict-2PL never releases a lock after using it. It holds all the locks until the commit point and releases all the locks in one go when the process is over.
Strict 2PL: All locks held by a transaction are released when the transaction is completed. Strict 2PL guarantees conflict serializability, but it does not by itself prevent deadlocks.
Centralized 2PL
In Centralized 2PL, a single site is responsible for the lock management process. It has only one lock manager for
the entire DBMS.
Primary copy 2PL
Primary copy 2PL mechanism, many lock managers are distributed to different sites. After that, a particular lock
manager is responsible for managing the lock for a set of data items. When the primary copy has been updated,
the change is propagated to the slaves.
Distributed 2PL
In this kind of two-phase locking mechanism, Lock managers are distributed to all sites. They are responsible for
managing locks for data at that site. If no data is replicated, it is equivalent to primary copy 2PL. Communication
costs of Distributed 2PL are quite higher than primary copy 2PL
Time-Stamp Methods for Concurrency control:
The timestamp is a unique identifier created by the DBMS to identify the relative starting time of a transaction.
Typically, timestamp values are assigned in the order in which the transactions are submitted to the system. So, a
timestamp can be thought of as the transaction start time. Therefore, time stamping is a method of concurrency
control in which each transaction is assigned a transaction timestamp.
Timestamps must have two properties namely
Uniqueness: The uniqueness property assures that no equal timestamp values can exist.
Monotonicity: monotonicity assures that timestamp values always increase.
Timestamps are divided into further fields:
Granule Timestamps


Timestamp Ordering
Conflict Resolution in Timestamps
Timestamp-based Protocol in DBMS is an algorithm that uses the System Time or Logical Counter as a timestamp
to serialize the execution of concurrent transactions. The Timestamp-based protocol ensures that every conflicting
read and write operation is executed in timestamp order.
The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent transactions. The
protocol uses the System Time or Logical Count as a Timestamp.
Conflict Resolution in Timestamps:
To deal with conflicts in timestamp algorithms, some transactions involved in conflicts are made to wait and abort
others.

Following are the main strategies of conflict resolution in timestamps:


Wait-die:
The older transaction waits for the younger if the younger has accessed the granule first.
The younger transaction is aborted (dies) and restarted if it tries to access a granule after an older concurrent
transaction.
Wound-wait:
The older transaction pre-empts the younger by suspending (wounding) it if the younger transaction tries to access
a granule after an older concurrent transaction.
An older transaction will wait for a younger one to commit if the younger has accessed a granule that both want.
Timestamp Ordering:
Following are the three basic variants of timestamp-based methods of concurrency control:
Total timestamp ordering
Partial timestamp ordering
Multiversion timestamp ordering
Multi-version concurrency control
Multiversion Concurrency Control (MVCC) enables snapshot isolation. Snapshot isolation means that whenever a
transaction would take a read lock on a page, it makes a copy of the page instead, and then performs its
operations on that copied page. This frees other writers from blocking due to read lock held by other transactions.
Maintain multiple versions of objects, each with its timestamp. Allocate the correct version to reads. Multiversion
schemes keep old versions of data items to increase concurrency.
The main difference between MVCC and standard locking:
read locks do not conflict with write locks ⇒ reading never blocks writing, and writing never blocks reading
Advantage of MVCC
locking needed for serializability considerably reduced
Disadvantages of MVCC
visibility-check overhead (on every tuple read/write)
Validation-Based Protocols
Validation-based Protocol in DBMS also known as Optimistic Concurrency Control Technique is a method to avoid
concurrency in transactions. In this protocol, the local copies of the transaction data are updated rather than the
data itself, which results in less interference while the execution of the transaction.
Optimistic Methods of Concurrency Control:
The optimistic method of concurrency control is based on the assumption that conflicts in database operations are
rare and that it is better to let transactions run to completion and only check for conflicts before they commit.
The Validation based Protocol is performed in the following three phases:
Read Phase
Validation Phase


Write Phase
Read Phase
In the Read Phase, the data values from the database can be read by a transaction but the write operation or
updates are only applied to the local data copies, not the actual database.
Validation Phase
In the Validation Phase, the data is checked to ensure that there is no violation of serializability while applying the
transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is successful, else; the updates are not
applied, and the transaction is rolled back.
Laws of concurrency control
First Law of Concurrency Control
Concurrent execution should not cause application programs to malfunction.
Second Law of Concurrency Control
Concurrent execution should not have lower throughput or much higher response times than serial execution.
Lock Thrashing is the point where system performance(throughput) decreases with increasing load
(adding more active transactions). It happens due to the contention of locks. Transactions waste time
on lock waits.
The default concurrency control mechanism depends on the table type:
Disk-based tables (D-tables) are by default optimistic.
Main-memory tables (M-tables) are always pessimistic.
Pessimistic locking (Locking and timestamp) is useful if there are a lot of updates and relatively high chances of
users trying to update data at the same time.
Optimistic (Validation)locking is useful if the possibility for conflicts is very low – there are many records but
relatively few users, or very few updates and mostly read-type operations.
Optimistic concurrency control is based on the idea of conflict detection and transaction restart, while pessimistic concurrency control uses locking as the basic serialization mechanism (it assumes that two or more users will want to update the same record at the same time, and then prevents that possibility by locking the record, no matter how unlikely conflicts are).
Properties
Optimistic locking is useful in stateless environments (such as mod_plsql and the like). Not only useful but critical.
optimistic locking -- you read data out and only update it if it did not change.
Optimistic locking only works when developers modify the same object. The problem occurs when multiple
developers are modifying different objects on the same page at the same time. Modifying one
object may affect the process of the entire page, which other developers may not be aware of.
pessimistic locking -- you lock the data as you read it out AND THEN modify it.
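A common way to implement the optimistic pattern is a version (or timestamp) column; this sketch assumes a hypothetical products table that carries such a column:

-- Optimistic: read the row and remember its version
SELECT price, version FROM products WHERE product_id = 4;   -- suppose version = 7 was read

-- Later, update only if nobody else changed the row in the meantime
UPDATE products
SET    price = 99, version = version + 1
WHERE  product_id = 4
AND    version = 7;
-- If 0 rows were updated, another transaction got there first: re-read and retry.

-- Pessimistic: lock the row while reading it, then modify it
SELECT price FROM products WHERE product_id = 4 FOR UPDATE;
UPDATE products SET price = 99 WHERE product_id = 4;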
Lock Granularity:
A database is represented as a collection of named data items. The size of the data item chosen as the unit of
protection by a concurrency control program is called granularity. Locking can take place at the following level :
Database level.
Table level(Coarse-grain locking).
Page level.
Row (Tuple) level.
Attributes (fields) level.
Multiple Granularity
Let's start by understanding the meaning of granularity.
Granularity: It is the size of the data item allowed to lock.


It can be defined as hierarchically breaking up the database into blocks that can be locked.
The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
It maintains the track of what to lock and how to lock.
It makes it easy to decide either to lock a data item or to unlock a data item. This type of hierarchy can be
graphically represented as a tree.
There are three additional lock modes with multiple granularities:
Intention-shared (IS): It contains explicit locking at a lower level of the tree but only with shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared mode, and some node is locked in
exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The below table describes the compatibility matrix for these lock
modes:

The phantom problem


A database is not just a static collection of elements (tuples).
If tuples are inserted/deleted, then the phantom problem appears.
A “phantom” is a tuple that is invisible during part of a transaction's execution but visible during another part of the execution.
Even transactions that lock individual data items can therefore produce a non-serializable execution.

In our example:
– T1: reads the list of products
– T2: inserts a new product
– T1: re-reads: a new product appears!
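A small SQL sketch of that products example (the table and column names are assumed):

-- T1:
BEGIN;
SELECT COUNT(*) FROM products WHERE colour = 'blue';   -- returns, say, 10

-- T2, running concurrently:
BEGIN;
INSERT INTO products (product_id, product_name, colour) VALUES (101, 'Lamp', 'blue');
COMMIT;

-- T1:
SELECT COUNT(*) FROM products WHERE colour = 'blue';   -- now returns 11: a phantom row appeared
COMMIT;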


Dealing With Phantoms


Lock the entire table, or
Lock the index entry for ‘blue’
– If the index is available
Or use predicate locks
– A lock on an arbitrary predicate
Dealing with phantoms is expensive!

END


CHAPTER 9 RELATIONAL ALGEBRA AND QUERY PROCESSING


Relational algebra is a procedural query language. It gives a step-by-step process to obtain the result of the query.
It uses operators to perform queries.
What is an “Algebra”
Answer: Set of operands and operations that are “closed” under all compositions
What is the basis of Query Languages?
Answer: Two formal Query Languages form the basis of “real” query languages (e.g., SQL) are:
1) Relational Algebra: Operational, it provides a recipe for evaluating the query. Useful for representing execution
plans. A language based on operators and a domain of values. Operators map values taken from the domain into other domain values. Domain: the set of relations/tables.
2) Relational Calculus: Let users describe what they want, rather than how to compute it. (Nonoperational, Non-
Procedural, declarative.)
SQL is an abstraction of relational algebra. It makes using it much easier than writing a bunch of math. Effectively,
the parts of SQL that directly relate to relational algebra are:
SQL -> Relational Algebra
Select columns-> Projection
Select row -> Selection (Where Clause)
UNION -> Set Union
EXCEPT (MINUS) -> Set Difference
CROSS JOIN -> Cartesian Product (also what you get when you write a join without a join condition)

Details Explanation of Relational Operators are the following:


Operation (Symbols) Purpose

Select(σ) The SELECT operation is used for selecting a subset of the tuples according
to a given selection condition (Unary operator)

Projection(π) The projection eliminates all attributes of the input relation but those
mentioned in the projection list. (Unary operator)/ Projection operator has
to eliminate duplicates!


Union Operation(∪) UNION is symbolized by the ∪ symbol. It includes all tuples that are in tables
A or B.

Set Difference(-) - Symbol denotes it. The result of A - B, is a relation that includes all tuples
that are in A but not in B.

Intersection(∩) Intersection defines a relation consisting of a set of all tuples that are in
both A and B.

Cartesian Product(X) Cartesian operation is helpful to merge columns from two relations.

Inner Join Inner join includes only those tuples that satisfy the matching criteria.

Theta Join(θ) The general case of the JOIN operation is called a Theta join. It is denoted
by the symbol θ.

EQUI Join When a theta join uses only an equivalence condition, it becomes an equi
join.

Natural Join(⋈) Natural join can only be performed if there is a common attribute (column)
between the relations.

Outer Join In an outer join, along with tuples that satisfy the matching criteria.

Left Outer Join(⟕) In the left outer join, the operation allows keeping all tuples in the left
relation.

Right Outer Join(⟖) In the right outer join, the operation allows keeping all tuples in the right
relation.

Full Outer Join(⟗) In a full outer join, all tuples from both relations are included in the result
irrespective of the matching condition.


Select Operation
Notation: σp(r), where p is called the selection predicate

Project Operation
Notation: πA1,..., Ak (r)
The result is defined as the relation of k columns obtained by deleting the columns that are not listed


Condition join/theta join


Union Operation
Notation: r ∪ s


What is the composition of operators/operations?


In general, since the result of a relational-algebra operation is of the same type (relation) as its inputs, relational-
algebra operations can be composed together into a relational-algebra expression. Composing relational-algebra
operations into relational-algebra expressions is just like composing arithmetic operations (such as −, ∗, and ÷) into
arithmetic expressions.


Examples of Relational Algebra


Aggregate operation in relational algebra with SQL comparisons as extended RA.


Relational Calculus
There is an alternate way of formulating queries known as Relational Calculus. Relational calculus is a non-procedural
query language. In the non-procedural query language, the user is concerned with the details of how to obtain the
results. The relational calculus tells what to do but never explains how to do it. Most commercial relational languages
are based on aspects of relational calculus including SQL-QBE and QUEL.
It is based on Predicate calculus, a name derived from a branch of symbolic language. A predicate is a truth-valued
function with arguments.


Sr. No. | Key | Relational Algebra | Relational Calculus
1 | Language Type | Relational Algebra is a procedural query language. | Relational Calculus is a non-procedural or declarative query language.
2 | Objective | Relational Algebra targets how to obtain the result. | Relational Calculus targets what result to obtain.
3 | Order | Relational Algebra specifies the order in which operations are to be performed. | Relational Calculus specifies no such order of execution for its operations.
4 | Dependency | Relational Algebra is domain-independent. | Relational Calculus can be domain dependent.
5 | Programming Language | Relational Algebra is close to programming language concepts. | Relational Calculus is not related to programming language concepts.

Notations of RC

Types of Relational calculus:


TRC: Variables range over (i.e., get bound to) tuples.
DRC: Variables range over domain elements (= field values).
Tuple Relational Calculus (TRC)
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal Quantifiers (∀)
Domain Relational Calculus (DRC)
Domain relational calculus uses the same operators as tuple calculus. It uses logical connectives ∧ (and), ∨ (or), and
┓ (not). It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable. The QBE or Query by example is a
query language related to domain relational calculus.


Differences between TRC and DRC

TRC: In TRC, the variables represent tuples from specified relations. DRC: In DRC, the variables represent values drawn from a specified domain.
TRC: A tuple is a single element of a relation; in database terms, it is a row. DRC: A domain is equivalent to a column data type and any constraints on the value of the data.
TRC: The filtering variable ranges over the tuples of the relation. DRC: Filtering is done based on the domain of attributes.
TRC: A query cannot be expressed using a membership condition. DRC: A query can be expressed using a membership condition.
TRC: The QUEL (Query Language) is a query language related to it. DRC: The QBE (Query-By-Example) is a query language related to it.
TRC: It reflects traditional pre-relational file structures. DRC: It is more similar to logic as a modeling language.
TRC notation: {T | P(T)} or {T | Condition(T)}. DRC notation: {<a1, a2, a3, …, an> | P(a1, a2, a3, …, an)}
TRC example: {T | EMPLOYEE(T) AND T.DEPT_ID = 10}. DRC example: {<…> | <…> ∈ EMPLOYEE ∧ DEPT_ID = 10}
Examples of RC:


Query Block in RA


Query tree plan


SQL, Relational Algebra, Tuple Calculus, and domain calculus examples: Comparisons
Select Operation
R = (A, B)
Relational Algebra: σB=17 (r)
Tuple Calculus: {t | t ∈ r ∧ B = 17}
Domain Calculus: {<a, b> | <a, b> ∈ r ∧ b = 17}
Project Operation
R = (A, B)


Relational Algebra: ΠA(r)


Tuple Calculus: {t | ∃ p ∈ r (t[A] = p[A])}
Domain Calculus: {<a> | ∃ b ( <a, b> ∈ r )}
Combining Operations
R = (A, B)
Relational Algebra: ΠA(σB=17 (r))
Tuple Calculus: {t | ∃ p ∈ r (t[A] = p[A] ∧ p[B] = 17)}
Domain Calculus: {<a> | ∃ b ( <a, b> ∈ r ∧ b = 17)}
Natural Join
R = (A, B, C, D) S = (B, D, E)
Relational Algebra: r ⋈ s
Πr.A,r.B,r.C,r.D,s.E(σr.B=s.B ∧ r.D=s.D (r × s))
Tuple Calculus: {t | ∃ p ∈ r ∃ q ∈ s (t[A] = p[A] ∧ t[B] = p[B] ∧
t[C] = p[C] ∧ t[D] = p[D] ∧ t[E] = q[E] ∧
p[B] = q[B] ∧ p[D] = q[D])}
Domain Calculus: {<a, b, c, d, e> | <a, b, c, d> ∈ r ∧ <b, d, e> ∈ s}
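The same four operations can also be written directly in SQL. The following is an illustrative sketch assuming two hypothetical tables r(A, B, C, D) and s(B, D, E) matching the schemas above:

-- Select operation: σB=17(r)
SELECT * FROM r WHERE b = 17;
-- Project operation: ΠA(r)  (DISTINCT mirrors the set semantics of the algebra)
SELECT DISTINCT a FROM r;
-- Combined operations: ΠA(σB=17(r))
SELECT DISTINCT a FROM r WHERE b = 17;
-- Natural join r ⋈ s on the common attributes B and D
SELECT r.a, r.b, r.c, r.d, s.e
FROM   r JOIN s ON r.b = s.b AND r.d = s.d;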


Query Processing in DBMS


Query processing is the activity of extracting data from the database. Query processing takes several steps to fetch
the data from the database. The steps involved are:
Parsing and translation
Optimization
Evaluation

The query processing works in the following way:


Parsing and Translation
Query processing begins with parsing and translation of the user's query. Consider the following query:
select emp_name from Employee where salary>10000;


Thus, to make the system understand the user query, it needs to be translated into relational algebra.
The same query can be expressed in more than one equivalent relational algebra form, for example:
Πemp_name (σsalary>10000 (Employee))
Πemp_name (σsalary>10000 (Πemp_name, salary (Employee)))
After translating the given query, we can execute each relational algebra operation by using different algorithms.
So, in this way, query processing begins its working.
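In Oracle, the plan chosen for such a query can be inspected with EXPLAIN PLAN and DBMS_XPLAN. This is only an illustrative sketch; the employee table and its columns are the hypothetical ones used above:

EXPLAIN PLAN FOR
  SELECT emp_name FROM employee WHERE salary > 10000;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);  -- shows the row-source tree the optimizer chose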
Query processor

The query processor assists in the execution of database queries such as retrieval, insertion, update, or removal of data.

Key components:
Data Manipulation Language (DML) compiler
Query parser
Query rewriter
Query optimizer
Query executor

Query Processing Workflow

Right from the moment the query is written and submitted by the user, to the point of its execution and the
eventual return of the results, there are several steps involved. These steps are outlined below in the following
diagram.

What Does Parsing a Query Mean?


The parsing of a query is performed within the database using the Optimizer component. Taking all of these inputs
into consideration, the Optimizer decides the best possible way to execute the query. This information is stored
within the SGA in the Library Cache – a sub-pool within the Shared Pool.

The memory area within the Library Cache in which the information about a query’s processing is kept is called the
Cursor. Thus, if a reusable cursor is found within the library cache, it’s just a matter of picking it up and using it to
execute the statement. This is called Soft Parsing. If it’s not possible to find a reusable cursor or if the query has
never been executed before, query optimization is required. This is called Hard Parsing.


Understanding Hard Parsing

Hard parsing means that either the cursor was not found in the library cache or it was found but was invalidated for
some reason. For whatever reason, Hard Parsing would mean that work needs to be done by the optimizer to ensure
the most optimal execution plan for the query.

Before the process of finding the best plan is started for the query, some tasks are completed. These tasks are
repeatedly executed even if the same query executes in the same session for N number of times:

1. Syntax Check
2. Semantics Check
3. Hashing the query text and generating a hash key-value pair
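The practical difference between hard and soft parsing is easiest to see with bind variables. The sketch below is illustrative only; the employee table, the emp_id column, and the SQL*Plus bind-variable commands are assumptions, not part of the original text:

-- Literal values make every statement text unique, so each one may be hard parsed:
SELECT emp_name FROM employee WHERE emp_id = 101;
SELECT emp_name FROM employee WHERE emp_id = 102;

-- A bind variable keeps the text identical, so repeated executions can reuse the
-- cursor already in the library cache (soft parse):
VARIABLE v_id NUMBER
EXEC :v_id := 101
SELECT emp_name FROM employee WHERE emp_id = :v_id;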

Various phases of query execution in the system

The query first travels from the client process to the server process and into the SQL work area of the PGA; then the
following phases start:


1. Parsing (parse the query tree: syntax check, semantic check, and shared pool check; a successful shared pool check allows a soft parse)
2. Transformation (binding)
3. Estimation / query optimization
4. Plan generation, row source generation
5. Query execution and plan
6. Query result
Index and Table scan in the query execution process


Query Evaluation

The logic applied to the evaluation of SELECT statements, as described here, does not precisely reflect how the
DBMS Server evaluates your query to determine the most efficient way to return results. However, by applying this
logic to your queries and data, the results of your queries can be anticipated.
1. Evaluate the FROM clause. Combine all the sources specified in the FROM clause to create a Cartesian product (a
table composed of all the rows and columns of the sources). If joins are specified, evaluate each join to obtain its
results table, and combine it with the other sources in the FROM clause. If SELECT DISTINCT is specified, discard
duplicate rows.
2. Apply the WHERE clause. Discard rows in the result table that do not fulfill the restrictions specified in the
WHERE clause.
3. Apply the GROUP BY clause. Group results according to the columns specified in the GROUP BY clause.


4. Apply the HAVING clause. Discard rows in the result table that do not fulfill the restrictions specified in the HAVING
clause.
5. Evaluate the SELECT clause. Discard columns that are not specified in the SELECT clause. (In case of SELECT FIRST
n… UNION SELECT …, the first n rows of the result from the union are chosen.)
6. Perform any unions. Combine result tables as specified in the UNION clause. (In case of SELECT FIRST n… UNION
SELECT …, the first n rows of the result from the union are chosen.)
7. Apply the ORDER BY clause. Sort the result rows as specified.
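The evaluation order above can be annotated directly on a query. A minimal sketch, assuming a hypothetical employee table with dept_id and salary columns (step 6, UNION, does not apply here):

SELECT   dept_id, COUNT(*) AS emp_count   -- 5. evaluate the SELECT list
FROM     employee                         -- 1. FROM: choose the source rows
WHERE    salary > 10000                   -- 2. WHERE: discard non-matching rows
GROUP BY dept_id                          -- 3. GROUP BY: form groups
HAVING   COUNT(*) > 5                     -- 4. HAVING: discard non-matching groups
ORDER BY emp_count DESC;                  -- 7. ORDER BY: sort the final rows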
Query Evaluation Techniques for Large Databases
Steps to process a query: parsing, validation, resolution, optimization, plan compilation, execution.
The architecture of query engines:
Query processing algorithms iterate over members of input sets; algorithms are algebra operators. The physical
algebra is the set of operators, data representations, and associated cost functions that the database execution
engine supports, while the logical algebra is more related to the data model and expressible queries of the data
model (e.g. SQL).
Synchronization and transfer between operators are key. Naïve query plan methods include the creation of
temporary files/buffers, using one process per operator, and using IPC. The practical method is to implement all
operators as a set of procedures (open, next, and close), and have operators schedule each other within a single
process via simple function calls. Each time an operator needs another piece of data ("granule"), it calls its data
input operator's next function to produce one. Operators structured in such a manner are called iterators.
(Figure: three SQL relational algebra query plans, one pushed and one nearly fully pushed.)
Query plans are algebra expressions and can be represented as trees. Left-deep (every right subtree is a leaf),
right-deep (every left subtree is a leaf), and bushy (arbitrary) are the three common structures. In a left-deep tree,
each operator draws one of its inputs from the operator below it while an inner loop iterates over the other input.
Evaluation Plan


Cost Estimation
The cost estimation of a query evaluation plan is calculated in terms of various resources, including the number of
disk accesses and the execution time taken by the CPU to execute the query.
Query Optimization
Summary of steps of processing an SQL query:
Lexical analysis, parsing, validation, Query Optimizer, Query Code Generator, Runtime Database Processor
The term optimization here has the meaning “choose a reasonably efficient strategy” (not necessarily the best
strategy)
Query optimization: choosing a suitable strategy to execute a particular query more efficiently
An SQL query undergoes several stages: lexical analysis (scanning, LEX), parsing (YACC), validation
Scanning: identify SQL tokens
Parser: check the query syntax according to the SQL grammar
Validation: check that all attributes/relation names are valid in the particular database being queried
Then create the query tree or the query graph (these are internal representations of the query)
Main techniques to implement query optimization
• Heuristic rules (to order the execution of operations in a query)
• Computing cost estimates of different execution strategies
Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation;


2. Apply heuristics rules to optimize the internal


representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
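A common heuristic rule is to push selections as close to the base tables as possible so that joins operate on fewer rows. The following is a sketch of the idea in SQL, using hypothetical employee and department tables:

-- Initial formulation: join everything first, filter afterwards
SELECT e.emp_name
FROM   employee e JOIN department d ON e.dept_id = d.dept_id
WHERE  d.dept_name = 'SALES';

-- After the heuristic rewrite, the selection on dept_name is applied before the join,
-- so only SALES departments take part in it:
SELECT e.emp_name
FROM   employee e
JOIN   (SELECT dept_id FROM department WHERE dept_name = 'SALES') d
       ON e.dept_id = d.dept_id;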


Query optimization Example:

Basic algorithms for executing query operations/ query optimization


External sorting is a basic ingredient of relational operators that use sort-merge strategies
Sorting is used implicitly in SQL in many situations:
Order by clause, joins, union, intersection, and duplicate elimination (distinct).
Sorting can be avoided if we have an index (ordered access to the data)
External Sorting: (sorting large files of records that don’t fit entirely in the main memory)
Internal Sorting: (sorting files that fit entirely in the main memory)
Sorting:


All sorting in "real" database systems uses merging techniques since very large data sets are expected. Sorting
modules' interfaces should follow the structure of iterators.
Sorting algorithms exploit the duality of quicksort and mergesort: a sort proceeds in a divide phase and a combine
phase. One of the two phases works on logical keys (indexes) while the other physically arranges the data items
(which phase is the logical one is particular to each algorithm). There are two sub-algorithms: one for sorting a run
within main memory, and another for managing runs on disk or tape. The degree of fan-in (the number of runs
merged in a given step) is a key parameter.
External sorting:
External sorting is the first step in bulk loading a B+ tree index (i.e., sorting the data entries and records), and it is
also useful for eliminating duplicate copies in a collection of records.
Sort-merge join algorithm involves sorting.
Hashing:
Hashing should be considered for equality matches, in general.
Hashing-based query processing algorithms use an in-memory hash table of database objects; if the data in the hash table
is bigger than the main memory (common case), then hash table overflow occurs. Three techniques for overflow
handling exist:
Avoidance: input set is partitioned into F files before any in-memory hash table is built. Partitions can be dealt with
independently. Partition sizes must be chosen well, or recursive partitioning will be needed.
Resolution: assume overflow won't occur; if it does, partition dynamically.
Hybrid: like resolution, but when partition, only write one partition to disk, keep the rest in memory.

END


CHAPTER 10 FILE STRUCTURES, INDEXING, AND HASHING


Overview: Related data and information are stored collectively in file formats. A file is a sequence of records stored
in binary format.

File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of file organization to
organize file records:

Sorted Files: Best if records must be retrieved in some order, or if only a 'range' of records is needed.

Sequential File Organization
Records are stored in sequential order based on the value of the search key of each record. Organizing records by an
index or key in this way is called sequential file organization, and it makes it much faster to find records based on
that key.
Hashing File Organization
A hash function is computed on some attribute of each record; the result specifies in which block of the file the
record is placed. Organizing records via a hash function on some key is called hashing file organization.

Heap File Organization
A record can be placed anywhere in the file where there is space; there is no ordering in the file. Records are placed
wherever there is free space, so this is called heap file organization.
Every record can be placed anywhere in the table file, wherever there is space for the record. Virtually all databases
provide heap file organization.
Note: Generally, each relation is stored in a separate file.

Clustered File Organization


Clustered file organization is not considered good for large databases. In this mechanism, related records from one
or more relations are kept in the same disk block, that is, the ordering of records is not based on the primary key
or search key.
File Operations
Operations on database files can be broadly classified into two categories −
Update Operations
Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval operations, on the other
hand, do not alter the data but retrieve them after optional conditional filtering. In both types of operations,
selection plays a significant role. Other than the creation and deletion of a file, there could be several operations,
which can be done on files.
Open − A file can be opened in one of the two modes, read mode or write mode. In read mode, the operating
system does not allow anyone to alter data. In other words, data is read-only. Files opened in reading mode can be
shared among several entities. Write mode allows data modification. Files opened in write mode can be read but
cannot be shared.
Locate − Every file has a file pointer, which tells the current position where the data is to be read or written. This
pointer can be adjusted accordingly. Using the find (seek) operation, it can be moved forward or backward.
Read − By default, when files are opened in reading mode, the file pointer points to the beginning of the file. There
are options where the user can tell the operating system where to locate the file pointer at the time of opening a
file. The very next data to the file pointer is read.
Write − Users can select to open a file in write mode, which enables them to edit its contents. It can be deletion,
insertion, or modification. The file pointer can be located at the time of opening or can be dynamically changed if
the operating system allows it to do so.
Close − This is the most important operation from the operating system’s point of view. When a request to close a
file is generated, the operating system removes all the locks (if in shared mode).


Indexing
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes
on which the indexing has been done. Indexing in database systems is like what we see in books.
Indexing is defined based on its indexing attributes.


Indexing can be of the following types −


1. Primary Index: A primary index is defined on an ordered data file. The data file is ordered on a key field.
The key field is generally the primary key of the relation.
2. Secondary Index: A secondary index may be generated from a field that is a candidate key and has a
unique value in every record, or from a non-key field with duplicate values.
3. Clustering Index: A clustering index is defined on an ordered data file. The data file is ordered on a non-
key field. In a clustering index, the search key order corresponds to the sequential order of the records in
the data file. If the search key is a candidate key (and therefore unique) it is also called a primary index.
4. Non-Clustering Index: Non-clustering indexes are used to quickly find all records whose values in a certain
field satisfy some condition. A non-clustering index has a different order of data and index; its search key
specifies an order different from the sequential order of the file. Non-clustering indexes are also called
secondary indexes.

Depending on what we put into the index, we have:

Sparse index (an index entry for some tuples only)
Dense index (an index entry for each tuple)
A clustering index is usually sparse (clustering indexes can be dense or sparse).
A non-clustering index must be dense.

Ordered Indexing is of two types −


1. Dense Index
2. Sparse Index


Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster
but requires more space to store index records themselves. Index records contain a search key value and a pointer
to the actual record on the disk.

Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key
and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach the
actual location of the data. If the data we are looking for is not where we directly reach by following the index,
then the system starts a sequential search until the desired data is found.


Multilevel Index
Index records comprise search-key values and data pointers. The multilevel index is stored on the disk along with
the actual database files. As the size of the database grows, so does the size of the indices. There is an immense
need to keep the index records in the main memory to speed up the search operations. If the single-level index is
used, then a large size index cannot be kept in memory which leads to multiple disk accesses.

A multi-level Index helps in breaking down the index into several smaller indices to make the outermost level so
small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.

B+ Tree
A B+ tree is a balanced, multi-way (n-ary) search tree that follows a multi-level index format. The leaf nodes of a B+ tree
denote actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, thus staying balanced.


Additionally, the leaf nodes are linked using a link list; therefore, a B+ tree can support random access as well as
sequential access.
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is of the order n where n is fixed for every
B+ tree.

Internal nodes −
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes −
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to the next leaf node and forms a linked list.

Hash Organization
Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Bucket − A hash file stores data in bucket format. The bucket is considered a unit of storage. A bucket typically
stores one complete disk block, which in turn can store one or more records.


Hash Function − A hash function, h, is a mapping function that maps all the set of search keys K to the address
where actual records are placed. It is a function from search keys to bucket addresses.
Types of Hashing Techniques
There are mainly two types of SQL hashing methods/techniques:
1 Static Hashing
2 Dynamic Hashing/Extendible hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address.
Static hashing is further divided into:
1. Open hashing
2. Closed hashing

Dynamic Hashing or Extendible hashing


Dynamic hashing offers a mechanism in which data buckets are added and removed dynamically and on demand.
In this hashing, the hash function helps you to create a large number of values.
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows
or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and
on-demand. Dynamic hashing is also known as extended hashing.
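As a concrete SQL-level use of hashing, Oracle can hash-partition a table so that each row's partition is chosen by a hash function on a key column, much like placing records into buckets. A minimal sketch with a hypothetical sales table:

-- h(cust_id) decides which of the 4 partitions stores each row
CREATE TABLE sales_hashed (
  sale_id  NUMBER,
  cust_id  NUMBER,
  amount   NUMBER
)
PARTITION BY HASH (cust_id) PARTITIONS 4;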
Key terms when dealing with hashing the records:
Bucket Overflow
The condition of bucket-overflow is known as a collision. This is a fatal state for any static hash function. In this
case, overflow chaining can be used.
Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash result and is linked after
the previous one. This mechanism is called Closed Hashing.

Linear Probing − When a hash function generates an address at which data is already stored, the next free bucket
is allocated to it. This mechanism is called Open Hashing.


Data bucket – Data buckets are memory locations where the records are stored. It is also known as a Unit of
storage.
Key: A DBMS key is an attribute or set of an attribute that helps you to identify a row(tuple) in a relation(table).
This allows you to find the relationship between two tables.
Hash function: A hash function, is a mapping function that maps all the set of search keys to the address where
actual records are placed.
Linear Probing – Linear probing uses a fixed interval between probes. In this method, the next available data block is
used to enter the new record, instead of overwriting the older record.
Quadratic probing – Quadratic probing determines the new bucket address by adding the successive outputs of a
quadratic polynomial to the starting value given by the original hash computation.
Hash index – It is an address of the data block. A hash function could be a simple mathematical function to even a
complex mathematical function.
Double Hashing –Double hashing is a computer programming method used in hash tables to resolve the issues of a
collision.
Bucket Overflow: The condition of bucket overflow is called a collision. This is a fatal stage for any static hash
function.
Hashing function h(r) Mapping from the index’s search key to a bucket in which the (data entry for) record r
belongs.
What is Collision?
Hash collision is a state when the resultant hashes from two or more data in the data set, wrongly map the same
place in the hash table.
How to deal with Hashing Collision?
There is two technique that you can use to avoid a hash collision:
1. Rehashing: This method, invokes a secondary hash function, which is applied continuously until an empty slot is
found, where a record should be placed.
2. Chaining: The chaining method builds a Linked list of items whose key hashes to the same value. This method
requires an extra link field to each table position.


Indexing
An index is an on-disk structure associated with a table or view that speeds the retrieval of rows from the table or
view. An index contains keys built from one or more columns in the table or view. Indexes are automatically created
when PRIMARY KEY and UNIQUE constraints are defined on table columns. An index on a file speeds up selections
on the search key fields for the index.
The index is a collection of buckets.
Bucket = primary page plus zero or more overflow pages. Buckets contain data entries.

Types of Indexes
1 Clustered Index
2 Non-Clustered Index
3 Column Store Index
4 Filtered Index
5 Hash-based Index
6 Dense primary index
7 sparse index
8 b or b+ tree index
9 FK index
10 Secondary index
11 File Indexing – B+ Tree
12 Bitmap Indexing
13 Inverted Index
14 Forward Index
15 Function-based index
16 Spatial index
17 Bitmap Join Index
18 Composite index


19 Primary key index If the search key contains a primary key, then it is called a primary index.
20 Unique index: Search key contains a candidate key.
21 Multilevel index(A multilevel index considers the index file, which we will now refer to as the first (or
base) level of a multilevel index, as an ordered file with a distinct value for each K(i))
22 Inner index: The main index file for the data
23 Outer index: A sparse index on the index
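Several of the index types listed above can be created directly in Oracle SQL. The statements below are an illustrative sketch on a hypothetical employee table; the table and column names are assumptions:

-- B-tree index (the default) on a single column
CREATE INDEX emp_salary_idx ON employee (salary);

-- Composite index on two columns
CREATE INDEX emp_dept_job_idx ON employee (dept_id, job_id);

-- Unique index (the search key is a candidate key)
CREATE UNIQUE INDEX emp_email_uk ON employee (email);

-- Bitmap index, suited to low-cardinality columns
CREATE BITMAP INDEX emp_status_bix ON employee (status);

-- Function-based index
CREATE INDEX emp_upper_name_idx ON employee (UPPER(emp_name));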

END


CHAPTER 11 DATABASE USERS AND DATABASE SECURITY MANAGEMENT


Overview of User and Schema in Oracle DBMS environment
A schema is a collection of database objects, including logical structures such as tables, views, sequences, stored
procedures, synonyms, indexes, clusters, and database links.
A user owns a schema.
A user and a schema have the same name.


DBA basic roles and responsibilities:


Duties of the DBA: A database administrator has some precisely defined duties which need to be performed
diligently. A short account of these jobs is listed below:
1. Schema definition
2. Granting data access
3. Routine Maintenance
4. Backups Management
5. Monitoring jobs running
6. Installation and integration
7. Configuration and migration
8. Optimization and maintenance
9. administration and Customization
10. Upgradation and backup recovery
11. Database storage reorganization
12. Performance monitoring
13. Tablespace and Monitoring disk storage space
Roles Category
Normally Organization hires DBA in three roles:
1. L1=Junior/fresher dba, having 1–2-year exp.
2. L2=Intermediate dba, having 2+ to 4-year exp.
3. L3=Advanced/Expert dba, having 4+ to 6-year exp.
Component modules of a DBMS and their interactions.


Database users:
The CREATE USER command creates a user. It also automatically creates a schema for that user.
The schema is also a logical structure used to organize and process the data in the database; it is created
automatically by Oracle when the user is created.
Create Profile
SQL> Create profile clerk limit
sessions_per_user 1
idle_time 30
connect_time 600;
Create User
SQL> Create user dcranney
identified by bedrock
default tablespace users
temporary tablespace temp_ts
profile clerk

quota 500k on users1


quota 0 on test_ts
quota unlimited on users;

Roles And Privileges


What Is a Role
Roles are groupings of SYSTEM PRIVILEGES and/or OBJECT PRIVILEGES. Managing and controlling privileges is much
easier when using roles. You can create roles, grant system and object privileges to the roles, and grant roles to
users.
Examples of roles: CONNECT, RESOURCE, and DBA are pre-defined roles. These are created by Oracle when the
database is created. You can grant these roles when you create a user.
To check the privileges contained in a role, use the following commands:
SYS> select * from ROLE_SYS_PRIVS where role='CONNECT';
SYS> select * from ROLE_SYS_PRIVS where role = 'DBA';
Note: The DBA role does NOT include the right to start up and shut down the database.
Roles are groups of privileges under a single name, and those privileges are assigned to users through roles.
When you add or delete a privilege from a role, all users and roles that are assigned that role automatically
receive or lose that privilege. Assigning a password to a role is optional.
Whenever you create a role that is NOT IDENTIFIED, or IDENTIFIED EXTERNALLY, or IDENTIFIED BY PASSWORD, Oracle
grants you the role WITH ADMIN OPTION. If you create a role IDENTIFIED GLOBALLY, the database does NOT grant
you the role. If you omit both the NOT IDENTIFIED and IDENTIFIED clauses, the default is NOT IDENTIFIED.

CREATE A ROLE
SYS> create role SHARIF IDENTIFIED BY devdb;
GRANTING SYSTEM PRIVILEGES TO A ROLE
SYS> GRANT create table, create view, create synonym, create sequence, create trigger to SHARIF;
Grant succeeded
GRANT A ROLE TO USERS
SYS> grant SHARIF to sony, scott;


ACTIVATE A ROLE
SCOTT> set role SHARIF identified by devdb;
TO DISABLING ALL ROLE
SCOTT> set role none;
GRANT A PRIVILEGE
SYS> grant create any table to SHARIF;
REVOKE A PRIVILEGE
SYS> revoke create any table from SHARIF;
SET ALL ROLES ASSIGNED TO scott AS DEFAULT
SYS> alter user scott default role all;
SYS> alter user scott default role SHARIF;


Grants and revoke Privileges/Role/Objects to users


Sql> grant insert, update, delete, select on hr.employees to Scott;
Grant succeeded.
Sql> grant insert, update, delete, select on hr.departments to Scott;
Grant succeeded.
Sql> grant flashback on hr.employees to Scott;
Grant succeeded.


Sql> grant flashback on hr.departments to Scott;
Grant succeeded.
Sql> grant select any transaction to Scott;
Sql> grant create any table, alter any table, select any table, insert any table, update any table, delete any table, drop any table to sharif;
Grant succeeded.
SHAM> grant all on EMP to SCOTT;
Grant succeeded.
SHAM> grant references on EMP to SCOTT;
Grant succeeded.
Sql> revoke all on suppliers from public;
SHAM> revoke all on EMP from SCOTT;
SHAM> revoke references on EMP from SCOTT CASCADE CONSTRAINTS;
Revoke succeeded.
SHAM> grant select on EMP to PUBLIC;
SYS> grant create session to PUBLIC;
Grant succeeded.
Note: If a privilege has been granted to PUBLIC, all users in the database can use it.
Note: PUBLIC sometimes acts like a ROLE and sometimes acts like a USER.
Note: Is there a DROP TABLE privilege in Oracle? No, DROP TABLE is NOT a privilege.
What is a Privilege
A privilege is a special right or permission. Privileges are granted to perform operations in a database.
Example of a privilege: the CREATE SESSION privilege allows a user to connect to the Oracle database.
The syntax for revoking privileges on a table in Oracle is:
Revoke privileges on the object from a user;
Privileges can be assigned to a user or a role. Privileges are given to users with the GRANT command and taken away
with the REVOKE command.
There are two distinct types of privileges:
1. SYSTEM PRIVILEGES (granted by the DBA, like ALTER DATABASE, ALTER SESSION, ALTER SYSTEM, CREATE USER)
2. SCHEMA OBJECT PRIVILEGES
SYSTEM privileges are NOT directly related to any specific object or schema.
Two types of users can GRANT or REVOKE SYSTEM PRIVILEGES to others:
• Users who have been granted a specific SYSTEM PRIVILEGE WITH ADMIN OPTION.
• Users who have been granted GRANT ANY PRIVILEGE.
You can GRANT and REVOKE system privileges to users and roles.
Powerful system privileges and roles: DBA, SYSDBA, SYSOPER (roles or privileges); SYS, SYSTEM (tablespace or user).


OBJECT privileges are directly related to a specific object or schema.

• GRANT: to assign privileges or roles to a user, use the GRANT command.
• REVOKE: to remove privileges or roles from a user, use the REVOKE command.
An object privilege is the permission to perform certain actions on specific schema objects, including tables, views,
sequences, procedures, functions, and packages.
Some common system and object Privileges


• SYSTEM PRIVILEGES can be granted WITH ADMIN OPTION.


• OBJECT PRIVILEGES can be granted WITH GRANT OPTION.
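A short sketch of the two grant forms mentioned above (the grantee and object names are hypothetical):

-- System privilege WITH ADMIN OPTION: scott may grant CREATE TABLE onward
GRANT CREATE TABLE TO scott WITH ADMIN OPTION;

-- Object privilege WITH GRANT OPTION: scott may grant SELECT on hr.employees onward
GRANT SELECT ON hr.employees TO scott WITH GRANT OPTION;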


Admin And Grant Options


With ADMIN Option (to USER, Role)


SYS> select * from dba_sys_privs where grantee in('A','B','C');


GRANTEE PRIVILEGE ADM
------------------------------------------------
C CREATE SESSION YES
Note: By default, the ADM column in DBA_SYS_PRIVS is NO. If you revoke a SYSTEM PRIVILEGE from a user, it has NO
IMPACT on grants that user has made.
With GRANT Option (to USER, Role)


SONY can access the sham.emp table because the SELECT privilege was given to PUBLIC, so sham.emp is available
to every user of the database. SONY has created a view EMP_VIEW based on sham.emp.
Note: If you revoke OBJECT PRIVILEGE from a user, that privilege also revoked to whom it was granted.


Note: If you grant RESOURCE role to the user, this privilege overrides all explicit tablespace quotas. The UNLIMITED
TABLESPACE system privilege lets the user allocate as much space in any tablespaces that make up the database.
Database account locks and unlock
Alter user admin identified by admin account lock;
Select u.username from all_users u where u.username like 'info';
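For completeness, the matching unlock statement (a minimal sketch using the same hypothetical admin account as above):

Alter user admin account unlock;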

END


CHAPTER 12 BUSINESS INTELLIGENCE TERMINOLOGIES IN DATABASE SYSTEMS


Overview: Database systems are used for processing day-to-day transactions, such as sending a text or booking a
ticket online. This is also known as online transaction processing (OLTP). Databases are good for storing
information about and quickly looking up specific transactions.
Decision support systems (DSS) are generally defined as the class of warehouse system that deals with solving a
semi-structured problem.
DSS
DSS helps businesses make sense of data so they can undergo more informed management decision-making. It has
three branches DWH, OLAP, and DM. I will discuss this in detail below.
Characteristics of a decision support system
DSS frameworks typically consist of three main components or characteristics:
The model management system: Uses various algorithms in creating, storing, and manipulating data models
The user interface: The front-end program enables end users to interact with the DSS
The knowledge base: A collection or summarization of all information including raw data, documents, and
personal knowledge

What is a data warehouse?


A data warehouse is a collection of multidimensional, organization-wide data, typically used in business decision-
making.
Data warehouse toolkits for building out these large repositories generally use one of two architectures.
Different approaches to building a data warehouse concentrate on the data storage layer:
Inmon's approach – design the centralized storage first and then create data marts from the summarized data
warehouse data and metadata.
Type is Normalized.
Focuses on data reorganization using relational database management systems (RDBMS).
Holds simple relational data between a core data repository and data marts, or subject-oriented databases.
The ad-hoc SQL queries needed to access data are simple.


Kimball’s approach – creating data marts first and then developing a data warehouse database incrementally from
independent data marts.
Type is Denormalized.
Focuses on infrastructure functionality using multidimensional database management systems (MDBMS) like star
schema or snowflake schema

Data Warehouse vs. Transactional System


Following are a few differences between Data Warehouse and Operational Database (Transaction System)
A transactional system is designed for known workloads and transactions like updating a user record, searching a
record, etc. However, DW transactions are more complex and present a general form of data.
A transactional system contains the current data of an organization whereas DW normally contains historical data.
The transactional system supports the parallel processing of multiple transactions. Concurrency control and
recovery mechanisms are required to maintain the consistency of the database.
An operational database query allows read and modify operations (delete and update), while an OLAP query
needs only read-only access to stored data (select statements).
DW involves data cleaning, data integration, and data consolidations.
DW has a three-layer architecture − Data Source Layer, Integration Layer, and Presentation Layer. The following
diagram shows the common architecture of a Data Warehouse system.


Types of Data Warehouse System


Following are the types of DW systems −
1. Data Mart
2. Online Analytical Processing (OLAP)
3. Online Transaction Processing (OLTP)
4. Predictive Analysis
Three-Tier Data Warehouse Architecture
Generally, a data warehouse adopts a three-tier architecture. Following are the three tiers of the data warehouse
architecture.
Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is a relational database
system. We use the back-end tools and utilities to feed data into the bottom tier. These back-end tools and utilities
perform the Extract, Clean, Load, and refresh functions.
Middle Tier − In the middle tier, we have the OLAP Server that can be implemented in either of the following ways.
By Relational OLAP (ROLAP), which is an extended relational database management system; ROLAP maps the
operations on multidimensional data to standard relational operations.
By the Multidimensional OLAP (MOLAP) model, which directly implements multidimensional data and operations.
Top-Tier − This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools,
and data mining tools.
The following diagram depicts the three-tier architecture of the data warehouse −


Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse models:
1. Data mart
2. Enterprise Warehouse
3. Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a virtual
warehouse, but building one requires excess capacity on operational database servers.


Building A Data Warehouse From Scratch: A Step-By-Step Plan


Step 1. Goals elicitation
Step 2. Conceptualization and platform selection
Step 3. Business case and project roadmap
Step 4. System analysis and data warehouse architecture design
Step 5. Development and stabilization
Step 6. Launch
Step 7. After-launch support


Data Mart
A data mart(s) can be created from an existing data warehouse—the top-down approach—or other sources, such as
internal operational systems or external data. Similar to a data warehouse, it is a relational database that stores
transactional data (time value, numerical order, reference to one or more objects) in columns and rows making it
easy to organize and access.
Data marts and data warehouses are both highly structured repositories where data is stored and managed until it
is needed. Data marts are designed for a specific line of business, whereas a DWH is designed for enterprise-wide
use. A data mart is typically under 100 GB in size while a DWH is typically 100 GB or larger; a data mart covers a
single subject while a DWH is a multiple-subject repository. Data marts can be independent or dependent data marts.
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an
organization.


Fact and Dimension Tables:


Types of facts:
Additive: measures that can be added across all dimensions.
Semi-Additive: measures that may be added across some dimensions and not others.
Non-Additive: measures that cannot be added across dimensions.
A fact table stores the basic units of measurement of a business process; some real-world examples include sales,
phone calls, and orders.

Types of dimensions and their definitions:
Conformed Dimensions: a conformed dimension means the same thing for every fact table to which it relates. This
dimension is used in more than one star schema or data mart.
Outrigger Dimensions: a dimension may have a reference to another dimension table. These secondary dimensions
are called outrigger dimensions. This kind of dimension should be used carefully.
Shrunken Rollup Dimensions: a subdivision of the rows and columns of a base dimension. These kinds of dimensions
are useful for developing aggregated fact tables.
Dimension-to-Dimension Table Joins: dimensions may have references to other dimensions. However, these
relationships can be modeled with outrigger dimensions.
Role-Playing Dimensions: a single physical dimension can be referenced multiple times in a fact table, with each
reference linking to a logically distinct role for the dimension.
Junk Dimensions: a collection of random transactional codes, flags, or text attributes. They may not logically belong
to any specific dimension.
Degenerate Dimensions: a degenerate dimension is without a corresponding dimension table. It is used in
transaction and collecting-snapshot fact tables; this kind of dimension does not have its own table as it is derived
from the fact table.
Swappable Dimensions: used when the same fact table is paired with different versions of the same dimension.
Step Dimensions: sequential processes, like web page events, mostly have a separate row in the fact table for every
step in the process. A step dimension tells where the specific step should be used in the overall session.
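A minimal star-schema sketch in SQL showing a fact table with additive measures and two dimension tables; all table and column names here are hypothetical:

CREATE TABLE dim_date (
  date_key   NUMBER PRIMARY KEY,
  cal_date   DATE,
  month_name VARCHAR2(20),
  year_no    NUMBER
);

CREATE TABLE dim_product (
  product_key  NUMBER PRIMARY KEY,
  product_name VARCHAR2(100),
  category     VARCHAR2(50)
);

CREATE TABLE fact_sales (
  date_key    NUMBER REFERENCES dim_date (date_key),
  product_key NUMBER REFERENCES dim_product (product_key),
  quantity    NUMBER,   -- additive measure
  amount      NUMBER    -- additive measure
);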


Extract Transform Load Tool configuration (ETL/ELT)


Successful data migration includes:
Extracting the existing data.
Transforming data so it matches the new formats.
Cleansing the data to address any quality issues.
Validating the data to make sure the move goes as planned.
Loading the data into the new system.
Staging area

ETL Cycle Flow


ETL to Data warehouse, OLAP, Business Reporting Tiers

Types of Data Warehouse Extraction Methods


There are two types of data warehouse extraction methods: Logical and Physical extraction methods.
1. Logical Extraction
The logical Extraction method in turn has two methods:
i) Full Extraction


For example, exporting a complete table in the form of a flat file.


ii) Incremental Extraction
In incremental extraction, the changes in source data need to be tracked since the last successful extraction.
2. Physical Extraction
Physical extraction has two methods: Online and Offline extraction:
i) Online Extraction
In this process, the extraction process directly connects to the source system and extracts the source data.
ii) Offline Extraction
The data is not extracted directly from the source system but is staged explicitly outside the source system.
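The full and incremental extraction methods described above can be sketched in SQL. This is illustrative only; the orders source table, the last_updated column, and the etl_control bookkeeping table are assumptions:

-- Full extraction: copy the whole source table into a staging table
CREATE TABLE stg_orders AS SELECT * FROM orders;

-- Incremental extraction: pull only rows changed since the last successful run
INSERT INTO stg_orders
SELECT o.*
FROM   orders o
WHERE  o.last_updated > (SELECT MAX(last_extract_time)
                         FROM   etl_control
                         WHERE  table_name = 'ORDERS');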
Data Capture
Data capture is an advanced extraction process. It enables the extraction of data from documents, converting it
into machine-readable data. This process is used to collect important organizational information when the source
systems are in the form of paper/electronic documents (receipts, emails, contacts, etc.)

OLAP Model and Its types:


Online Analytical Processing (OLAP) is a tool that enables users to perform data analysis from various database
systems simultaneously. Users can use this tool to extract, query, and retrieve data. OLAP enables users to analyze
the collected data from diverse points of view.
There are three main types of OLAP servers as follows:
ROLAP stands for Relational OLAP, an application based on relational DBMSs.
MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.


HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional techniques.
OLAP Architecture has these three components of each type:
Database server.
Rolap/molap/holap server.
Front-end tool.


Characteristics of OLAP
FASMI summarizes the characteristics of OLAP methods; the term is derived from the first letters of the following
characteristics:
Fast
The system is targeted to deliver most responses to users within about five seconds, with the most elementary
analyses taking no more than one second and very few taking more than 20 seconds.
Analysis
The system can cope with any business logic and statistical analysis that is relevant for the application and the user,
and keeps it easy enough for the target user. Although some pre-programming may be needed, it is not acceptable
if every application definition has to be programmed: the user must be able to define new ad hoc calculations as
part of the analysis and to report on the data in any desired way without having to program. Products that do not
allow adequate end-user-oriented calculation flexibility (such as Oracle Discoverer) are therefore excluded.
Share
The system implements all the security requirements for confidentiality and, if multiple write access is needed,
concurrent update locking at an appropriate level. Not all applications need users to write data back, but for the
growing number that do, the system should be able to handle multiple updates in a timely, secure manner.
Multidimensional
This is the basic requirement. OLAP system must provide a multidimensional conceptual view of the data, including
full support for hierarchies, as this is certainly the most logical method to analyze businesses and organizations.


OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations in
multidimensional data.
Here is the list of OLAP operations −
1. Roll-up
2. Drill-down
3. Slice and dice


4. Pivot (rotate)


Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −
By climbing up a concept hierarchy for a dimension
By dimension reduction
The following diagram illustrates how roll-up works.


Roll-up is performed by climbing up a concept hierarchy for the dimension location.


Initially the concept hierarchy was "street < city < province < country".
On rolling up, the data is aggregated by ascending the location hierarchy from the level of the city to the level of
the country.
The data is grouped into cities rather than countries.
When roll-up is performed, one or more dimensions from the data cube are removed.
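Roll-up corresponds to aggregating at a coarser level of the dimension hierarchy, which in SQL is simply a GROUP BY at the higher level. A sketch using hypothetical sales and dim_location tables:

-- City-level sales rolled up to country level
SELECT l.country, SUM(s.amount) AS total_sales
FROM   sales s JOIN dim_location l ON s.location_key = l.location_key
GROUP BY l.country;

-- GROUP BY ROLLUP produces every level of the hierarchy in one pass
SELECT l.country, l.city, SUM(s.amount) AS total_sales
FROM   sales s JOIN dim_location l ON s.location_key = l.location_key
GROUP BY ROLLUP (l.country, l.city);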
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways −
By stepping down a concept hierarchy for a dimension
By introducing a new dimension.
The following diagram illustrates how drill-down works −


Drill-down is performed by stepping down a concept hierarchy for the dimension time.
Initially, the concept hierarchy was "day < month < quarter < year."
On drilling down, the time dimension descended from the level of the quarter to the level of the month.
When drill-down is performed, one or more dimensions from the data cube are added.
It navigates the data from less detailed data to highly detailed data.
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the
following diagram that shows how a slice works.

Here Slice is performed for the dimension "time" using the criterion time = "Q1".


It will form a new sub-cube by selecting one or more dimensions.


Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following
diagram that shows the dice operation.

The dice operation on the cube based on the following selection criteria involves three dimensions.
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item =" Mobile" or "Modem")
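The same dice can be expressed as a SQL query over a star schema; the fact and dimension tables below are hypothetical stand-ins for the cube:

SELECT l.city, t.quarter, i.item_name, SUM(f.amount) AS sales
FROM   fact_sales f
JOIN   dim_location l ON f.location_key = l.location_key
JOIN   dim_time     t ON f.time_key     = t.time_key
JOIN   dim_item     i ON f.item_key     = i.item_key
WHERE  l.city      IN ('Toronto', 'Vancouver')
AND    t.quarter   IN ('Q1', 'Q2')
AND    i.item_name IN ('Mobile', 'Modem')
GROUP BY l.city, t.quarter, i.item_name;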

Pivot
The pivot operation is also known as rotation. It rotates the data axes in view to provide an alternative
presentation of data. Consider the following diagram that shows the pivot operation.


Data marts also include hybrid data marts


A hybrid data mart combines data from an existing data warehouse and other operational source systems. It unites
the speed and end-user focus of a top-down approach with the benefits of the enterprise-level integration of the
bottom-up method.
Data mining techniques
There are many techniques used by data mining technology to make sense of your business data. Here are a few of
the most common:
Association rule learning:
Also known as market basket analysis, association rule learning looks for interesting relationships between variables
in a dataset that might not be immediately apparent, such as determining which products are typically purchased
together. This can be incredibly valuable for long-term planning.
Classification: This technique sorts items in a dataset into different target categories or classes based on common
features. This allows the algorithm to neatly categorize even complex data cases.
Clustering:
This approach groups similar data in a cluster. The outliers may be undetected or they will fall outside the clusters.
To help users understand the natural groupings or structure within the data, you can apply the process of partitioning
a dataset into a set of meaningful sub-classes called clusters. This process looks at all the objects in the dataset and
groups them together based on similarity to each other, rather than on predetermined features.
Modeling is what people often think of when they think of data mining. Modeling is the process of taking some data
(usually) and building a model that reflects that data. Usually, the aim is to address a specific problem through
modeling the world in some way and from the model develop a better understanding of the world.
Decision tree: Another method for categorizing data is the decision tree. This method asks a series of cascading
questions to sort items in the dataset into relevant classes.
Regression: This technique is used to predict a range of numeric values, such as sales, temperatures, or stock prices,
based on a particular data set.


Here data can be made smooth by fitting it to a regression function. The regression used may be linear (having one
independent variable) or multiple (having multiple independent variables).
Regression is a technique that conforms data values to a function. Linear regression involves finding the “best” line
to fit two attributes (or variables) so that one attribute can be used to predict the other.
Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset that do not match an
expected pattern or expected behavior. This technique can be used in a variety of domains, such as intrusion
detection, fraud or fault detection, etc. Outlier detection is also called outlier analysis or outlier mining.
Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction data for a certain
period.
Prediction:
Where the end user can predict the most repeated things.


Steps/tasks Involved in Data Preprocessing:


1 Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part, data cleaning is done. It involves
handling missing data, noisy data, etc.
Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
2 Data Transformation:
This step is taken to transform the data into appropriate forms suitable for the mining process.
3 Data discretization
Part of data reduction but with particular importance especially for numerical data
4 Data Reduction:
Since data mining is a technique that is used to handle a huge amount of data. While working with a huge volume
of data, analysis became harder in such cases. To get rid of this, we use the data reduction technique. It aims to
increase storage efficiency and reduce data storage and analysis costs.
5 Data integration
Integration of multiple databases, data cubes, or files
Method of treating missing data
1 Ignoring and discarding data
2 Fill in the missing value manually
3 Use the global constant to fill the mission values
4 Imputation using the mean, median, or mode
5 Replace missing values using a prediction/ classification model
6 K-Nearest Neighbor (k-NN) approach (The best approach)

Difference between Data steward and Data curator:


Information Retrieval (IR) can be defined as a software program that deals with the organization, storage, retrieval,
and evaluation of information from document repositories, particularly textual information.
An Information Retrieval (IR) model selects and ranks the document that is required by the user or the user has
asked for in the form of a query.

Information Retrieval vs Data Retrieval:

Information retrieval (IR) is a software program that deals with the organization, storage, retrieval, and evaluation of
information from document repositories, particularly textual information. Data retrieval (DR) deals with obtaining
data from a database management system such as an ODBMS; it is the process of identifying and retrieving data
from the database based on the query provided by the user or application.

IR retrieves information about a subject. DR determines the keywords in the user query and retrieves the data.

In IR, small errors are likely to go unnoticed. In DR, a single erroneous object means total failure.

IR content is not always well structured and is semantically ambiguous. DR data has a well-defined structure and
semantics.

IR does not provide a solution to the user of the database system. DR provides solutions to the user of the database
system.

IR results are approximate matches. DR results are exact matches.

IR results are ordered by relevance. DR results are not ordered by relevance.

IR uses a probabilistic model. DR uses a deterministic model.

Techniques of Information retrieval:


1. Traditional system
2. Non-traditional system.
There are three types of Information Retrieval (IR) models:
1. Classical IR Model
2. Non-Classical IR Model
3. Alternative IR Model
Let’s understand the classical IR models in further detail:
1. Boolean Model — This model required information to be translated into a Boolean expression and Boolean
queries. The latter is used to determine the information needed to be able to provide the right match when
the Boolean expression is found to be true. It uses Boolean operations AND, OR, NOT to create a
combination of multiple terms based on what the user asks.
2. Vector Space Model — This model takes documents and queries denoted as vectors and retrieves
documents depending on how similar they are. The document and query vectors are compared, and the
resulting similarity scores are used to rank the search results.
3. Probability Distribution Model — In this model, the documents are considered as distributions of terms,
and queries are matched based on the similarity of these representations. This is made possible using
entropy or by computing the probable utility of the document.
Probability distribution model types:
• Similarity-based Probability Distribution Model
• Expected-utility-based Probability Distribution Model


END


CHAPTER 13 DBMS INTEGRATION WITH BPMS


Overview: DBMS and BPMS give better performance when used together. A BPMS holds operational data while the
DBMS holds transactional and log data, but all the transactional data pass through the BPMS. The BPMS runs at the
execution level and also holds document-flow data.
Every member of the DBMS-ECM-BPMS trio can replace either of the two remaining brothers, but only in some more
or less nasty way:
DBMS can store documents in text and binary fields yet it’s far from ECM functionality e.g. in navigation, content
search, versioning, access control, and MS Office integration.
DBMS is sometimes used to store process information, usually in a table where each row defines the next step of
document processing. Of course, it’s very primitive if compared with the BPMS process diagram.
BPMS has typed attributes for structured data but clearly, it’s for limited use only - the performance is nothing
compared to that of DBMS.
BPMS lets you attach documents to a process instance yet as well as in the DBMS case, the service will be very
primitive if compared with ECM.
The BPM lifecycle is considered to have five stages: design, model, execute, monitor, and optimize; process reengineering is sometimes added as a further stage.
The difference between BPM and BPMS: BPM is a discipline that uses various methods to discover, model, analyze, measure, improve, and optimize business processes.
BPM is a method, technique, or way of being/doing and BPMS is a collection of technologies to help build software
systems or applications to automate processes.
BPMS is a software tool used to improve an organization’s business processes through the definition, automation,
and analysis of business processes. It also acts as a valuable automation tool for businesses to generate a competitive
advantage through cost reduction, process excellence, and continuous process improvement. As BPM is a discipline
used by organizations to identify, document, and improve their business processes; BPMS is used to enable aspects
of BPM.

Business Process Modeling Notation (BPMN)


BPMN has elements such as labels, tokens, activities, cases, events, processes, and sequence symbols.

BPMN Task
A logical unit of work that is carried out as a single whole


Resource
A person or a machine that can perform specific tasks
Activity -the performance of a task by a resource
Case
A sequence of activities performed to achieve some goal, an order, an insurance claim, a car assembly
Work item
The combination of a case and a task that is just to be carried out
Process
Describes how a particular category of cases shall be managed
Control flow construct ->sequence, selection, iteration, parallelisation
BPMN concepts
Events
Things that happen instantaneously (e.g., an invoice has arrived)
Activities
Units of work that have a duration (e.g., an activity to check an invoice)
Processes, events, and activities are logically related
Sequence
The most elementary form of relation is Sequence, which implies that one event or activity A is followed by
another event or activity B.
Start event
Circles used with a thin border
End event
Circles used with a thick border
Label
Give a name or label to each activity and event
Token
Once a process instance has been spawned/born, we use a token to identify the progress (or state) of that
instance.
Gateway
There is a gating mechanism that either allows or disallows the passage of tokens through the gateway
Split gateway
A point where the process flow diverges
Have one incoming sequence flow and multiple outgoing sequence flows (representing the branches that diverge)
Join gateway
A point where the process flow converges
Mutually exclusive
Only one of them can be true every time the XOR split is reached by a token

Exclusive (XOR) split


To model the relation between two or more alternative activities, like in the case of the approval or rejection of a
claim.
Exclusive (XOR) join
To merge two or more alternative branches that may have previously been forked with an XOR-split
Indicated with an empty diamond or empty diamond marked with an “X”
Naming/Label Conventions in BPMN:
The label will begin with a verb followed by a noun.
The noun may be preceded by an adjective


The verb may be followed by a complement to explain how the action is being done.
The flow of a process

END


CHAPTER 14 RAID STRUCTURE AND MEMORY MANAGEMENT


Redundant Arrays of Independent Disks
RAID, or "Redundant Arrays of Independent Disks", is a technique that uses a combination of multiple disks instead of a single disk for increased performance, data redundancy, reliability, or a combination of these. The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.
Disk Array: Arrangement of several disks that gives abstraction of a single, large disk.
RAID techniques:

Different RAID levels


RAID-0 (Striping): blocks are striped across disks; there is no redundancy.
RAID-1 (Mirroring): blocks are duplicated (mirrored) on a second disk.
RAID-2 (Bit-Level Striping with Hamming-Code ECC): consists of bit-level striping; RAID 2 records an Error Correction Code (ECC) using Hamming-code parity.
RAID-3 (Bit-Interleaved Parity; striping unit: one bit, with one dedicated check disk): stripes the data at a fine (bit/byte) granularity onto multiple disks, with a single parity disk.
RAID-4 (Block-Level Striping with Dedicated Parity): block-interleaved parity; striping unit: one disk block, with one check disk.
RAID-5 (Block-Level Striping with Distributed Parity): similar to RAID Level 4, but parity blocks are distributed over all disks.
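As an illustration of how the parity-based levels (RAID 3, 4, and 5) can survive the loss of a disk, the short Python sketch below computes an XOR parity block over three data blocks and then rebuilds one lost block from the parity and the surviving blocks. It demonstrates only the parity idea, not any particular RAID implementation.

def xor_blocks(*blocks):
    # Byte-wise XOR of equally sized blocks.
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three data blocks striped across three disks; a fourth disk holds the parity block.
d0, d1, d2 = b"DATA-BLK", b"BLOCK-#1", b"BLOCK-#2"
parity = xor_blocks(d0, d1, d2)

# Suppose the disk holding d1 fails: rebuild it from the surviving blocks and the parity.
rebuilt_d1 = xor_blocks(d0, d2, parity)
assert rebuilt_d1 == d1
print("Recovered block:", rebuilt_d1)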

Storage manager Components


Hard disk vs. Oracle database storage


1. Physical Storage Media
2. Cache
3. Main memory
4. Flash memory (SSD-solid state disk) (Also known as EEPROM (Electrically Erasable Programmable Read-
Only Memory))
5. Magnetic disk (Hard disks vs. floppy disks)
6. Optical disk (CD-ROM, CD-RW, DVD-RW, and DVD-RAM)
7. Tape storage


Performance measures of hard disks/ Accessing a Disk Page


1. Access time: the time from when a read or write request is issued until the data transfer begins. The time to access (read/write) a disk block is composed of:
• Seek time (moving the arm to position the disk head on the track)
• Rotational delay/latency (waiting for the block to rotate under the head)
• Data transfer time (moving data to/from the disk surface)
Seek time and rotational delay dominate.
• Seek time varies from about 2 to 15 ms
• Rotational delay ranges from 0 to 8.3 ms (about 4.2 ms on average)
• Transfer time is about 3.5 ms per 256 KB page
Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
2. Data-transfer rate: the rate at which data can be retrieved from or stored on the disk (e.g., 25-100 MB/s)
3. Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure
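As a rough worked example of these measures, the snippet below adds up the three components for one 256 KB page read and derives the effective transfer rate; the seek, latency, and transfer figures are the illustrative numbers quoted above, not measurements of any particular drive.

# Illustrative figures, in milliseconds (taken from the ranges quoted above).
seek_time_ms = 9.0          # an average seek somewhere between 2 and 15 ms
rotational_delay_ms = 4.2   # average latency, roughly half of an 8.3 ms revolution
transfer_time_ms = 3.5      # time to transfer one 256 KB page

access_time_ms = seek_time_ms + rotational_delay_ms + transfer_time_ms
page_kb = 256

print(f"Access time for one page: {access_time_ms:.1f} ms")
print(f"Effective rate: {page_kb / access_time_ms * 1000 / 1024:.1f} MB/s")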
Block
A block is also a sequence of bits and bytes and is made up of sectors. A sector is a physical spot on a formatted disk that holds information. A block is also called a physical record. On hard drives and floppies, each sector can hold 512 bytes of data.
A page is made up of unit blocks or groups of blocks. Pages have fixed sizes, usually 2 KB or 4 KB. For example, 4 tuples fit in one block if the block size is 2 KB, and 30 tuples fit in one block if the block size is 8 KB (the exact number depends on the tuple size). A block is made up of either one sector or many sectors (2, 4, 6, ...), which means blocks can vary in size. A hard disk platter has many concentric circles on it, called tracks. Every track is further divided into sectors.
Example: We have 10 000 000 records. Each record is 80 bytes long. Each record contains a unique key.
A block is a contiguous sequence of sectors from a single track. Blocks are separated by interblock gaps, which hold control information created during disk initialization.
A block stores table rows and records logically (a virtual unit), whereas a page is a physical memory unit.
Pinned block: Memory block that is not allowed to be written back to disk.


Toss immediate strategy: Frees the space occupied by a block as soon as the final tuple of that block has been
processed
Example: Suppose the employee table has columns such as empid, name, email, and CNIC, where empid = 12 bytes, name = 59 bytes, CNIC = 15 bytes, and so on, so that all the employee table columns together take 230 bytes. Each row in the employee table is therefore 230 bytes, which means we can store around 2 rows in one block (assuming a 512-byte block). As another example, say your hard drive has a block size of 4K and you have a 4.5K file. This requires 8K to store on your hard drive (2 whole blocks), but only 4.5K on a floppy (9 floppy-size blocks of 512 bytes each).
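A small calculation in the same spirit as the examples above shows how a blocking factor and file size are derived; the 80-byte record and 10,000,000-record figures come from the text, while the 512-byte block size is an assumption made only for this sketch.

record_size = 80            # bytes per record (from the example above)
block_size = 512            # assumed block size: one 512-byte sector
num_records = 10_000_000    # records in the file (from the example above)

blocking_factor = block_size // record_size            # whole records per block
blocks_needed = -(-num_records // blocking_factor)     # ceiling division

print(f"Records per block: {blocking_factor}")                            # 6
print(f"Blocks needed for {num_records:,} records: {blocks_needed:,}")    # 1,666,667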


Buffer Manager/Buffer management


Buffer: Portion of main memory available to store copies of disk blocks.
Buffer Manager: Subsystem that is responsible for buffering disk
blocks in main memory.
The overall goal is to minimize the number of disk accesses.
A buffer manager is similar to a virtual memory manager of an operating system.

Architecture: The buffer manager stages pages from external storage to the main memory buffer pool. File and
index layers make calls to the buffer manager.
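The sketch below is a deliberately minimal buffer manager in Python: callers pin a page (reading it from a simulated disk on a miss and evicting an unpinned page when the pool is full) and unpin it when finished. The class and method names are invented for the example; a real buffer manager also tracks dirty pages, writes them back, and uses more refined replacement policies.

from collections import OrderedDict

class BufferPool:
    # Toy buffer manager: a fixed number of frames with LRU replacement of unpinned pages.
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                 # simulated disk: page_id -> page contents
        self.frames = OrderedDict()      # page_id -> (contents, pin_count), kept in LRU order

    def pin(self, page_id):
        if page_id in self.frames:
            contents, pins = self.frames.pop(page_id)
            self.frames[page_id] = (contents, pins + 1)   # move to most-recently-used position
            return contents
        if len(self.frames) >= self.num_frames:
            self._evict()
        contents = self.disk[page_id]    # "disk read" on a miss
        self.frames[page_id] = (contents, 1)
        return contents

    def unpin(self, page_id):
        contents, pins = self.frames[page_id]
        self.frames[page_id] = (contents, max(pins - 1, 0))

    def _evict(self):
        for page_id, (contents, pins) in self.frames.items():
            if pins == 0:                # only unpinned pages may be replaced
                del self.frames[page_id]
                return
        raise RuntimeError("all frames are pinned; nothing can be evicted")

disk = {i: f"page-{i}" for i in range(10)}
pool = BufferPool(num_frames=3, disk=disk)
pool.pin(1); pool.unpin(1)
pool.pin(2); pool.unpin(2)
pool.pin(3)
print(pool.pin(4))   # the pool is full, so an unpinned page is evicted to make room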


What is the steal approach in DBMS? What are the Buffer Manager Policies/Roles? Data
storage on disk?
Note: Buffer manager moves pages between the main memory buffer pool (volatile memory) from the external
storage disk (in non-volatile storage). When execution starts, the file and index layer make the call to the buffer
manager.
The steal approach is Used when the buffer manager replaces an existing page in the cache, that has been updated
by a transaction not yet committed, by another page requested by another transaction.
No-force is the opposite of the force rule. The force rule means that REDO will never be needed during recovery, since any committed transaction will have all its updates on disk before it is committed.
The deferred update (NO-UNDO) recovery scheme uses a no-steal approach. However, typical database systems employ a steal/no-force strategy. The advantage of steal is that it avoids the need for a very large buffer space to hold all updated pages of active transactions in memory.
Steal/No-Steal
Similarly, it would be easy to ensure atomicity with a no-steal policy. The no-steal policy states
that pages cannot be evicted from memory (and thus written to disk) until the transaction commits.
Need support for undo: removing the effects of an uncommitted transaction on the disk
Force/No Force
Durability can be a very simple property to ensure if we use a force policy. The force policy states that when a transaction executes, all modified data pages are forced to disk before the transaction commits.

BY: MUHAMMAD SHARIF 259


Database Systems Handbook

Preferred Policy: Steal/No-Force


This combination is most complicated but allows for the highest flexibility/performance.
STEAL (complicates enforcing Atomicity)
NO FORCE (complicates enforcing Durability)
With no-force, we need support for redo: completing a committed transaction's writes on disk.
Disk Access and file organization:


File: A file is logically a sequence of records, where a record is a sequence of fields.
The hard disk is also called secondary memory; it is used to store data permanently and is non-volatile.
File scans can be made fast with read-ahead (track-at-a-crack). This requires contiguous file allocation, so it may need to bypass the OS/file system.
A heap-file-organized table can be searched by scanning the entire table file, looking for all rows where the value of account_id is A-591. This is called a file scan.
Sorted files: records are sorted by search key. Good for equality and range search.
Hashed files: records are grouped into buckets by search key. Good for equality search.
Disks: Can retrieve random page at a fixed cost
Tapes: Can only read pages sequentially
Database tables and indexes may be stored on a disk in one of some forms, including ordered/unordered flat files,
ISAM, heap files, hash buckets, or B+ trees. The most used forms are B-trees and ISAM.
Data on a hard disk is stored in microscopic areas called magnetic domains on the magnetic material. Each domain
stores either 1 or 0 values.
When the computer is switched off, then the head is lifted to a safe zone normally termed a safe parking zone to
prevent the head from scratching against the data zone on a platter when the air bearing subsides. This process is
called parking. The basic difference between the magnetic tape and magnetic disk is that magnetic tape is used for
backups whereas, the magnetic disk is used as secondary storage.

Data storage in the file system:


The database stores data logically in its segments and physically in a disk file that consists of pages or blocks of a fixed size, up to 8 KB or 16 KB. A disk reads and writes one page at a time. Each block/page consists of some records, and each record has its own size depending on the data types of its column fields. If I insert a new row/record, it will go into an existing block/page if that block/page has space; otherwise, it is assigned to a new block within the file.
Dynamic Storage-Allocation Problem/Algorithms
How to satisfy a request of size n from a list of free holes
Memory allocation is a process by which computer programs are assigned memory or space. It is of four types:
First Fit Allocation
The first hole that is big enough is allocated to the program. In this type fit, the partition is allocated, which is the
first sufficient block from the beginning of the main memory.
Best Fit Allocation
The smallest hole that is big enough is allocated to the program. It allocates the process to the partition that is the
first smallest partition among the free partitions.
Worst Fit Allocation
The largest hole that is big enough is allocated to the program. It allocates the process to the partition, which is the
largest sufficient freely available partition in the main memory.
Next Fit allocation: It is mostly similar to the first Fit, but this Fit, searches for the first sufficient partition from the
last allocation point.
Note: First-fit and best-fit better than worst-fit in terms of speed and storage utilization
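The following sketch contrasts the first-fit, best-fit, and worst-fit strategies on a small list of free holes; the hole sizes and the request size are made-up numbers used only for illustration.

def first_fit(holes, request):
    # Index of the first hole that is big enough, or None if none fits.
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    # Index of the smallest hole that still fits the request, or None.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    # Index of the largest hole that fits the request, or None.
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None

free_holes = [100, 500, 200, 300, 600]   # hole sizes in KB
request = 212                            # KB requested

print("first fit  ->", first_fit(free_holes, request))   # index 1 (500 KB hole)
print("best fit   ->", best_fit(free_holes, request))    # index 3 (300 KB hole)
print("worst fit  ->", worst_fit(free_holes, request))   # index 4 (600 KB hole)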


Static and Dynamic Loading:


Loading a process into the main memory is done by a loader. There are two different types of loading:
Static loading: the entire program and all data of a process are loaded into physical memory at a fixed address before the process executes, so more memory space is required and the size of a process is limited to the size of physical memory.
Dynamic loading: a routine is not loaded until it is called, so memory is used more efficiently because unused routines are never loaded.
Methods Involved in Memory Management
There are various methods and with their help Memory Management can be done intelligently by the Operating
System:

➢ Fragmentation
As processes are loaded and removed from memory, the free memory space is broken into little pieces. It happens
after sometimes that processes cannot be allocated to memory blocks considering their small size and memory
blocks remain unused. This problem is known as Fragmentation.
Fragmentation Category −
1. External fragmentation
Total memory space is enough to satisfy a request or to reside a process in it, but it is not contiguous, so it cannot
be used.
2. Internal fragmentation
The memory block assigned to the process is bigger. Some portion of memory is left unused, as it cannot be used
by another process.
Two types of fragmentation are possible in distributed databases:
1. Horizontal fragmentation
2. Vertical fragmentation
In addition:
3. Hybrid fragmentation can be achieved by performing horizontal and vertical partitions together.
4. Mixed fragmentation is a group of rows and columns in a relation.
Reconstruction of hybrid fragmentation: the original relation in hybrid fragmentation is reconstructed by performing union and full outer join.


Reduce external fragmentation by compaction


● Shuffle memory contents to place all free memory together in
one large block
● Compaction is possible only if relocation is dynamic, and is
done at execution time
● I/O problem
- Latch job in memory while it is involved in I/O
- Do I/O only into OS buffers

➢ Segmentation
Segmentation is a memory management technique in which each job is divided into several segments of different sizes, one for each module that contains pieces that perform related functions. Each segment occupies a different logical address space of the program; a segment is a logical unit.


Segmentation with Paging


Both paging and segmentation have their advantages and disadvantages, it is better to combine these two
schemes to improve on each. The combined scheme is known as 'Page the Elements'. Each segment in this scheme
is divided into pages and each segment is maintained in a page table. So the logical address is divided into the
following 3 parts :
Segment numbers(S)
Page number (P)
The displacement or offset number (D)

As shown in the following diagram, the Intel 386 uses segmentation with paging for memory management with a
two-level paging scheme


➢ Swapping
Swapping is a mechanism in which a process can be swapped temporarily out of the main memory (or move) to
secondary storage (disk) and make that memory available to other processes. At some later time, the system
swaps back the process from the secondary storage to the main memory.
Though performance is usually affected by the swapping process it helps in running multiple and big processes in
parallel and that's the reason Swapping is also known as a technique for memory compaction.
Note: Bring a page into memory only when it is needed. The same page may be brought into memory several times
➢ Paging
A page is also a unit of data storage. A page is loaded into the processor from the main memory. A page is made up
of unit blocks or groups of blocks. Pages have fixed sizes, usually 2k or 4k. A page is also called a virtual page or
memory page. When the transfer of pages occurs between main memory and secondary memory it is known as
paging.
Paging is a memory management technique in which process address space is broken into blocks of the same size
called pages (size is the power of 2, between 512 bytes and 8192 bytes). The size of the process is measured in the
number of pages.
Divide logical memory into blocks of the same size called pages.
Similarly, main memory is divided into small fixed-sized blocks of (physical) memory called frames and the size of a
frame is kept the same as that of a page to have optimum utilization of the main memory and to avoid external
fragmentation.
Divide physical memory into fixed-sized blocks called frames (size is the power of 2, between 512 bytes and 8192
bytes)


Hard disk stores information in the form of magnetic fields. Data is stored digitally in the form of tiny magnetized
regions on the platter where each region represents a bit.
Microsoft SQL Server databases are stored on disk in two files: a data file and a log file
Note: To run a program of size n pages, need to find n free frames and load the program
Implementation of Page Table
The page table is kept in the main memory
• Page-table base register (PTBR) points to the page table
• Page-table length register (PTLR) indicates the size of the page table
In this scheme, every data/instruction access requires two memory accesses. One for the page table and one for
the data/instruction.
The two memory access problems can be solved by the use of a special fast-lookup hardware cache called
associative memory or translation look-aside buffers (TLBs)
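To make the page-table lookup concrete, the sketch below splits a logical address into a page number and an offset and maps it to a physical address through a small, made-up page table; the 4 KB page size and the frame numbers are arbitrary example values.

PAGE_SIZE = 4096                        # 4 KB pages (a power of two)
page_table = {0: 5, 1: 9, 2: 1, 3: 7}   # page number -> frame number (example values)

def translate(logical_address):
    # Split the logical address into (page number, offset) and map it to a physical address.
    page_number = logical_address // PAGE_SIZE
    offset = logical_address % PAGE_SIZE
    frame_number = page_table[page_number]   # the extra memory access (or a TLB hit)
    return frame_number * PAGE_SIZE + offset

logical = 2 * PAGE_SIZE + 123               # byte 123 within page 2
print(hex(translate(logical)))              # page 2 maps to frame 1, so this prints 0x107b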

The flow of Tasks in memory


The program must be brought into memory and placed within a process for it to be run.


Collection of processes on the disk that are waiting to be brought into memory to run the program.
Binding of Instructions and Data to Memory
Address binding of instructions and data to memory addresses can
happen at three different stages
Compile time: If the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes
Load time: Must generate relocatable code if memory location is not known at compile time
Execution time: Binding delayed until run time if the process can be moved during its execution from one memory
segment to another. Need hardware support for address maps (e.g., base and limit registers). Multistep Processing
of a User Program In memory is as follows:

The concept of a logical address space that is bound to separate physical address space is central to proper
memory management
Logical address – generated by the CPU; also referred to as virtual address
Physical address – address seen by the memory unit
Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical
(virtual) and physical addresses differ in the execution-time address-binding scheme
The user program deals with logical addresses; it never sees the real physical addresses
The logical address space of a process can be noncontiguous; the process is allocated physical memory whenever
the latter is available


Address Translation Architecture

END


CHAPTER 15 ORACLE DATABASE FUNDAMENTAL AND ITS ADMINISTRATION

Oracle Database History


I will use the Oracle tool in this book. Oracle versions and their meaning:

1. Oracle Database 11g


2. Oracle Database 12c
3. Oracle 18c (new name) = Oracle Database 12c Release 2 12.2.0.2 (Patch Set for 12c Release 2).
4. Oracle 19c (new name) = Oracle Database 12c Release 2 12.2.0.3 (Terminal Patch Set for 12c Release 2).
Oracle releases in Oracle history:


Oracle Database Releases Features

Oracle DB editions are hierarchically broken down as follows:


Enterprise Edition: Offers all features, including superior performance and security, and is the most robust
Personal Edition: Nearly the same as the Enterprise Edition, except it does not include the Oracle Real Application
Clusters option
Standard Edition: Contains base functionality for users that do not require Enterprise Edition’s robust package
Express Edition (XE): The lightweight, free, and limited edition for Windows and Linux
Oracle Lite: For mobile devices


A Database Instance is an interface between client applications (users) and the database. An Oracle instance
consists of three main parts: System Global Area (SGA), Program Global Area (PGA), and background processes.
When an instance starts, Oracle Database:
1. Searches for a server parameter file in a platform-specific default location and, if not found, for a text initialization parameter file (specifying STARTUP with the SPFILE or PFILE parameters overrides the default behavior)
2. Reads the parameter file to determine the values of initialization parameters
3. Allocates the SGA based on the initialization parameter settings
4. Starts the Oracle background processes
5. Opens the alert log and trace files and writes all explicit parameter settings to the alert log in valid parameter syntax

Oracle Database creates server processes to handle the requests of user processes connected to an instance. A server process can be either of the following: a dedicated server process, which services only one user process, or a shared server process, which can service multiple user processes.
We can see the listener has the default name of "LISTENER" and is listening for TCP connections on port 1521.

The listener process is started when the server is started (or whenever the instance is started). The listener is only
required for connections from other servers, and the DBA performs the creation of the listener process. When a
new connection comes in over the network, the listener passes the connection to Oracle.


Major Database shutting down conditions


Shutdown Normal | Transactional | Immediate | Abort
Database startup conditions:
Startup restrict | Startup mount restrict | Startup force |Startup nomount |Startup mount | Open
Read only modes:
Alter database open read-only
Alter database open;


Details of shutting down conditions:


Shutdown /shut/shutdown normal:
1. New connections are not allowed
2. Connected users can perform an ongoing transaction
3. Idle sessions will not be disconnected
4. When connected users log out manually then the database gets shut down.
5. It is also a graceful shutdown, so it doesn't require ICR (instance crash recovery) in the next startup.
6. A common scn number will be updated to control files and data files before the database shutdown.
Shutdown Transactional:
1. New connections are not allowed
2. Connected users can perform an ongoing transaction
3. Idle sessions will be disconnected
4. The database gets shut down once the ongoing transactions are completed (commit/rollback).
Hence, it is also a graceful shutdown, so it doesn't require ICR in the next startup.
Shutdown immediate:
1. New connections are not allowed
2. Connected users can't perform an ongoing transaction
3. Idle sessions will be disconnected
4. Oracle rolls back the ongoing (uncommitted) transactions, and the database gets shut down.
5. A common scn number will be updated to control files and data files before the database shutdown.
Hence, It is also a graceful shutdown, So it doesn’t require ICR in the next startup.
Shutdown Abort:
1. New connections are not allowed
2. Connected users can't perform an ongoing transaction
3. Idle sessions will be disconnected
4. Db gets shutdown abruptly (NO Commit /No Rollback)
Hence, It is an abrupt shutdown, So it requires ICR in the next startup.


Types of Standby Databases


1. Physical Standby Database
2. Snapshot Standby Database
3. Logical Standby Database


Physical Standby Database


A physical standby database is physically identical to the primary database, with on-disk database structures that
are identical to the primary database on a block-for-block basis. The physical standby database is updated by
performing recovery using redo data that is received from the primary database. Oracle Database12c enables a
physical standby database to receive and apply redo while it is open in read-only mode.
Logical Standby Database
A logical standby database contains the same logical information (unless configured to skip certain objects) as the
production database, although the physical organization and structure of the data can be different. The logical
standby database is kept synchronized with the primary database by transforming the data in the redo received from
the primary database into SQL statements and then executing the SQL statements on the standby database. This is
done with the use of LogMiner technology on the redo data received from the primary database. The tables in a
logical standby database can be used simultaneously for recovery and other tasks such as reporting, summations,
and queries.

Snapshot Standby Database


A snapshot standby database is a database that is created by converting a physical standby database into a snapshot
standby database. The snapshot standby database receives redo from the primary database but does not apply the
redo data until it is converted back into a physical standby database. The snapshot standby database can be used
for updates, but those updates are discarded before the snapshot standby database is converted back into a physical
standby database. The snapshot standby database is appropriate when you require a temporary, updatable version
of a physical standby
database.
What is Cloning?
Database Cloning is a procedure that can be used to create an identical copy of the existing Oracle database. DBAs
occasionally need to clone databases to test backup and recovery strategies or export a table that was dropped from


the production database and import it back into the production database. Cloning can be done on a different host or on the same host, even if it is different from the standby database.
Database Cloning can be done using the following methods,
Cold Cloning
Hot Cloning
RMAN Cloning

The basic memory structures associated with Oracle Database include:


System global area (SGA)
The SGA is a group of shared memory structures, known as SGA components, that contain data and control
information for one Oracle Database instance. All server and background processes share the SGA. Examples of
data stored in the SGA include cached data blocks and shared SQL areas.
Program global area (PGA)
A PGA is a nonshared memory region that contains data and control information exclusively for use by an Oracle
process. Oracle Database creates the PGA when an Oracle process starts.
One PGA exists for each server process and background process. The collection of individual PGAs is the total
instance PGA or instance PGA. Database initialization parameters set the size of the instance PGA, not individual
PGAs.

User global area (UGA)


The UGA is memory associated with a user session.
Software code areas
Software code areas are portions of memory used to store code that is being run or can be run. Oracle Database
code is stored in a software area that is typically at a different location from user programs—a more exclusive or
protected location.


Oracle Database Logical Storage Structure


Oracle allocates logical database space for all data in a database. The units of database space allocation are data
blocks, extents, and segments.
The Relationships Among Segments, Extents, Data Blocks in the data file, Oracle block, and OS block:


Oracle Block: At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle
blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on a disk.
Oracle Extent: The next level of logical database space is an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information. It cannot be spread over two tablespaces (all of its blocks come from a single data file).
Oracle Segment: The level of logical database storage greater than an extent is called a segment. A segment is a set
of extents, each of which has been allocated for a specific data structure and all of which are stored in the same
tablespace. For example, each table's data is stored in its data segment, while each index's data is stored in its index
segment. If the table or index is partitioned, each partition is stored in its segment.


Data block: Oracle manages the storage space in the data files of a database in units called data blocks. A data
block is the smallest unit of data used by a database.
Oracle block and data block refer to the same unit of storage, viewed logically and physically respectively, just as a table's (logical) data is stored in its data segment.
The high water mark is the boundary between used and unused space in a segment.
Operating system block: The data consisting of the data block in the data files are stored in operating system
blocks.
OS Page: The smallest unit of storage that can be atomically written
to non-volatile storage is called a page
Details of Data storage in Oracle Blocks:
An extent is a set of logically contiguous data blocks allocated for storing a specific type of information. In the
Figure above, the 24 KB extent has 12 data blocks, while the 72 KB extent has 36 data blocks.
A segment is a set of extents allocated for a specific database object, such as a table. For example, the data for the
employee's table is stored in its data segment, whereas each index for employees is stored in its index segment.
Every database object that consumes storage consists of a single segment.
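A quick back-of-the-envelope check of the extent example above can be written as follows; the 2 KB data block size is an assumption implied by the quoted figures, used here only for the sketch.

block_size_kb = 2                     # assumed Oracle data block size implied by the example
for extent_kb in (24, 72):
    print(f"A {extent_kb} KB extent holds {extent_kb // block_size_kb} data blocks")
# Prints 12 and 36 blocks, matching the figures quoted above.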


Oracle Tablespaces and their Definitions


A big file tablespace eases database administration because it consists of only one data file. The single data file can be up to 128 TB (terabytes) in size if the tablespace block size is 32 KB; if you use the more common 8 KB block size, 32 TB is the maximum size of a big file tablespace.
Broad View of Logical and Physical Structure of Database System in Oracle.

Oracle Database must use logical space management to track and allocate the extents in a tablespace. When a
database object requires an extent, the database must have a method of finding and providing it. Similarly, when
an object no longer requires an extent, the database must have a method of making the free extent available.
Oracle Database manages space within a tablespace based on the type that you create. You can create either of
the following types of tablespaces:
Locally managed tablespaces (default)
The database uses bitmaps in the tablespaces themselves to manage extents. Thus, locally managed tablespaces
have a part of the tablespace set aside for a bitmap. Within a tablespace, the database can manage segments with
automatic segment space management (ASSM) or manual segment space management (MSSM).
Dictionary-managed tablespaces
The database uses the data dictionary to manage the extents.


Oracle Physical Storage Structure

Oracle Database Memory Management


Memory management involves maintaining optimal sizes for the Oracle instance memory structures as demands
on the database change. Oracle Database manages memory based on the settings of memory-related initialization
parameters.

The basic options for memory management are as follows:


Automatic memory management
You specify the target size for the database instance memory. The instance automatically tunes to the target
memory size, redistributing memory as needed between the SGA and the instance PGA.
Automatic shared memory management
This management model is partially automated. You set a target size for the SGA and then have the option of
setting an aggregate target size for the PGA or managing PGA work areas individually.
Manual memory management
Instead of setting the total memory size, you set many initialization parameters to manage components of the SGA
and instance PGA individually.


SGA (System Global Area) is an area of memory (RAM) allocated when an Oracle Instance starts up. The SGA's size
and function are controlled by initialization (INIT.ORA or SPFILE) parameters.
In general, the SGA consists of the following subcomponents, as can be verified by querying the V$SGAINFO:
SELECT * FROM v$sgainfo;
The common components are:
Data buffer cache - cache data and index blocks for faster access.
Shared pool - cache parsed SQL and PL/SQL statements.
Dictionary Cache - information about data dictionary objects.
Redo Log Buffer - committed transactions that are not yet written to the redo log files.
JAVA pool - caching parsed Java programs.
Streams pool - cache Oracle Streams objects.
Large pool - used for backups, UGAs, etc.


Automatic Shared Memory Management simplifies the configuration of the SGA and is the recommended
memory configuration. To use Automatic Shared Memory Management, set the SGA_TARGET initialization
parameter to a nonzero value and set the STATISTICS_LEVEL initialization parameter to TYPICAL or ALL. The value
of the SGA_TARGET parameter should be set to the amount of memory that you want to dedicate to the SGA. In
response to the workload on the system, the automatic SGA management distributes the memory appropriately
for the following memory pools:
1. Database buffer cache (default pool)
2. Shared pool
3. Large pool
4. Java pool
5. Streams pool


END


CHAPTER 16 LOGS MANAGEMENT, DATABASE BACKUPS AND RECOVERY


Overview of Backup Solutions in Oracle
There are three ways to perform a data backup in Oracle:
1. Oracle Recovery Manager (RMAN): backups are performed by server sessions (restore files, back up data files, recover data files). This is the recommended approach. It is also called a physical backup; its types are cold, hot, full, and incremental.
2. User-managed backup and recovery: SQL*Plus and OS commands.
3. Exporting and importing data: SQL commands.
Oracle Database XE provides the following command-line utilities for exporting and importing data:
i) Data Pump Export and Data Pump Import (These are called Logical backup)
ii) Export and Import (These are called Logical backup)
Hot backup, also known as dynamic or online backup, is a backup performed on data while the database is actively
online and accessible to users.
A user can log in to RMAN and command it to back up a database. RMAN can write backup sets to disk and tape.
Cold backup (offline backup)
A cold backup also called an offline backup, is a database backup during which the database is offline and not
accessible to users.
Physical backups are copies of physical database files. For example, a physical backup might copy database
content from a local disk drive to another secure location.
During an Oracle tablespace hot backup, you (or your script) puts a tablespace into backup mode, then copy the
data files to disk or tape, then take the tablespace out of backup mode.
A physical backup can be hot or cold
Hot backup—Users can modify the database during a hot backup.
Cold backup—Users cannot modify the database during a cold backup, so the database and the backup copy are
always synchronized. Cold backup is used only when the service level allows for the required system downtime.
Full—Creates a copy of data that can include parts of a database such as the control file, transaction files (redo logs),
tablespaces, archive files, and data files. Regular cold full physical backups are recommended. The database must
be in archive log mode for a full physical backup.
Incremental—Captures only changes made after the last full physical backup. Incremental backup can be done
with a hot backup.
Cold-full backup - A cold-full backup is when the database is shut down, all of the
physical files are backed up, and the database is started up again.
Cold-partial backup - A cold-partial backup is used when a full backup is not possible due to some physical
constraints.
Hot-full backup - A hot-full backup is one in which the database is not taken off-line during the backup process.
Rather, the tablespace and data files are put into a backup state.
Hot-partial backup - A hot-partial backup is one in which the database is not taken
off-line during the backup process, plus different tablespaces are backed up on different nights.

Consistent and Inconsistent Backups


A consistent backup is one in which the files being backed up contain all changes up to the same system change
number (SCN). This means that the files in the backup contain all the data taken from the same point in time.
Unlike an inconsistent backup, a consistent whole database backup does not require recovery after it is restored.
An inconsistent backup is a backup of one or more database files that you make while the database is open or after
the database has shut down abnormally.


Note:
A system change number (SCN) is a logical, internal time stamp used by Oracle Database. SCNs order events that
occur within the database, which is necessary to satisfy the ACID properties of a transaction. Oracle Database uses
SCNs to mark the SCN before which all changes are known to be on disk so that recovery avoids applying
unnecessary redo. The database also uses SCNs to mark the point at which no redo exists for a set of data so that
recovery can stop.
SCNs occur in a monotonically increasing sequence. Oracle Database increments SCNs in the system global area
(SGA).
Image Backup/mirror backup
A full image backup, or mirror backup, is a replica of everything on your computer's hard drive, from the operating
system, boot information, apps, and hidden files to your preferences and settings. Imaging software not only
captures individual files but everything you need to get your system running again.
Backup sets are logical entities produced by the RMAN BACKUP command.
Image copies are exact byte-for-byte copies of files. RMAN prefers to use an image copy over a backup set
Restore Database backup by:
Flashback in Oracle is a set of tools that allow System Administrators and users to view and even manipulate the
past state of data without having to recover to a fixed point in time. Using the flashback command, we can pull a
table out of the recycle bin; once the flashback completes, the table is restored. At the physical level, Oracle
Flashback Database provides a more efficient data protection alternative to database point-in-time recovery
(DBPITR). If the current data files have unwanted changes, then you can use the RMAN command FLASHBACK
DATABASE to revert the data files to their contents at a past time.
Database Exports/Imports: with Data Pump, export the HR schema to a dump file named schema.dmp by issuing the following command at the system command prompt:
EXPDP SYSTEM/PASSWORD SCHEMAS=HR DIRECTORY=DMPDIR DUMPFILE=SCHEMA.DMP LOGFILE=EXPSCHEMA.LOG
To import a schema into a different one, Data Pump uses REMAP_SCHEMA (the legacy FROMUSER/TOUSER parameters belong to the original imp utility, not to impdp):
IMPDP USER/PASSWORD@DB_NAME DIRECTORY=DATA_PUMP_DIR DUMPFILE=DUMP_NAME.DMP SCHEMAS=MIS REMAP_SCHEMA=MIS:EMR
Crash recovery and Log-Based Recovery
The log is a sequence of records. The log of each transaction is maintained in some stable storage so that if any
failure occurs, then it can be recovered from there.


Log management and its type


Log: An ordered list of REDO/UNDO actions
Log record contains:
<XID, pageID, offset, length, old data, new data> and additional control info.
The fields are:
XID: transaction ID - tells us which transaction did this operation
pageID: what page has been modified
offset: where on the page the data started changing (typically in bytes)
length: how much data was changed (typically in bytes)
old data: what the data was originally (used for undo operations)
new data: what the data has been updated to (used for redo operations)
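As an illustration of how such log records drive undo and redo, the Python sketch below defines a minimal record structure and applies it to an in-memory "page"; the class, the page contents, and the field values are invented for the example and are far simpler than a real write-ahead log.

from dataclasses import dataclass

@dataclass
class LogRecord:
    xid: int          # transaction ID
    page_id: int      # which page was modified
    offset: int       # where on the page the change starts (bytes)
    length: int       # how many bytes were changed
    old_data: bytes   # before-image, used for UNDO
    new_data: bytes   # after-image, used for REDO

def redo(page, record):
    # Reapply the change described by the log record.
    page[record.offset:record.offset + record.length] = record.new_data

def undo(page, record):
    # Remove the change described by the log record.
    page[record.offset:record.offset + record.length] = record.old_data

page = bytearray(b"balance=0100;....")
record = LogRecord(xid=7, page_id=42, offset=8, length=4,
                   old_data=b"0100", new_data=b"0250")

redo(page, record)    # the page now reads balance=0250
undo(page, record)    # back to balance=0100
print(page.decode())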


Checkpoint
The checkpoint is like a bookmark. While the execution of the transaction, such checkpoints are marked, and the
transaction is executed then using the steps of the transaction, the log files will be created.
A checkpoint declares a point before which all the logs are stored permanently on the storage disk and the database is in a consistent state. In the case of a crash, work and time are saved because the system can restart from the checkpoint. Checkpointing is a quick way to limit the number of logs to scan on recovery.


Store the LSN of the most recent checkpoint at a master record on a disk
System Catalog
A repository of information describing the data in the database (metadata, data about data)

Data Replication
Replication is the process of copying and maintaining database objects in multiple databases that make up a
distributed database system. Replication can improve the performance and protect the availability of applications
because alternate data access options exist.


Oracle provides its own set of tools to replicate Oracle and integrate it with other databases. In this section, you will
explore the tools provided by Oracle as well as open-source tools that can be used for Oracle database replication
by implementing custom code.
The catalog is needed to keep track of the location of each fragment & replica
Data replication techniques
Synchronous vs. asynchronous
Synchronous: all replicas are up-to-date
Asynchronous: cheaper but delay in synchronization
Regarding the timing of data transfer, there are two types of data replication:
Asynchronous replication is when the data is sent to the model server -- the server where the replicas take data from
the client. Then, the model server pings the client with a confirmation saying the data has been received. From there,
it goes about copying data to the replicas at an unspecified or monitored pace.
Synchronous replication is when data is copied from the client-server to the model server and then replicated to all
the replica servers before the client is notified that data has been replicated. This takes longer to verify than the
asynchronous method, but it presents the advantage of knowing that all data was copied before proceeding.
Asynchronous database replication offers flexibility and ease of use, as replications happen in the background.

Methods to Setup Oracle Database Replication


You can easily set up the Oracle Database Replication using the following methods:
Method 1: Oracle Database Replication Using Hevo Data
Method 2: Oracle Database Replication Using A Full Backup And Load Approach
Method 3: Oracle Database Replication Using a Trigger-Based Approach
Method 4: Oracle Database Replication Using Oracle Golden Gate CDC
Method 5: Oracle Database Replication Using Custom Script-Based on Binary Log
Oracle Types of data replication and integration in OLAP
Three main architectures:
Consolidation database: All data is moved into a single database and managed from a central location. Oracle Real
Application Clusters (Oracle RAC), Grid computing, and Virtual Private Database (VPD) can help you consolidate
information into a single database that is highly available, scalable, and secure.


Federation: Data appears to be integrated into a single virtual database while remaining in its current distributed
locations. Distributed queries, distributed SQL, and Oracle Database Gateway can help you create a federated
database.
Sharing: Multiple copies of the same information are maintained in multiple databases and application data stores. Data replication and messaging can help you share information across multiple databases.

===========================END=========================➔
