Database Systems Handbook
==============
Dedication
I dedicate all my efforts to my reader who gives me an urge and inspiration
to work more.
Muhammad Sharif
Author
Acknowledgments
We are grateful to numerous individuals who contributed
Categories of Data
What is information?
Data that has been organized so that it carries some meaning is called information.
What is the database?
A database is an organized collection of related information or collection of related data. It is an interrelated
collection of many different types of database objects (tables, indexes).
2-tier architecture: the basic client-server model, in which the client talks to the database server directly over the network through APIs such as ODBC, JDBC, and ORDS.
3-tier architecture: used for web applications; a web server sits between the client and the database server.
Uniform Memory Access uses three types of buses: single, multiple, and crossbar.
Non-Uniform Memory Access uses two types of buses: tree and hierarchical.
Advantages of NUMA
Improves the scalability of the system.
Memory bottleneck (shortage of memory) problem is minimized in this architecture.
NUMA machines provide a linear address space, allowing all processors to directly address all memory.
Distributed Databases
Distributed database system (DDBS) = DB + Communication.
A set of databases in a distributed system that can appear to applications as a single data source.
A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites,
connected by a computer network.
Distributed DBMS architectures
Three alternative approaches are used to separate functionality across different DBMS-related processes. These
alternative distributed architectures are called
1. Client-server,
2. Collaborating server or multi-Server
3. Middleware or Peer-to-Peer
Client-server: A client sends a query to a server, which executes it. There may be multiple server processes. The two
client-server architecture models are:
1. Single Server Multiple Client
2. Multiple Server Multiple Client
Client Server architecture layers
1. Presentation layer
2. Logic layer
3. Data layer
Presentation layer
The basic work of this layer is to provide a user interface. The interface is a graphical user interface consisting
of menus, buttons, icons, etc. The presentation tier presents information related to such work as browsing,
purchasing, and shopping cart contents. It communicates with the other tiers by passing results to the
browser/client tier and to all other tiers in the network. It is also called the external layer.
Logic layer
The logic tier is also known as the data access tier and the middle tier. It lies between the presentation tier and
the data tier, and it controls the application’s functions by performing processing. The components that build this
layer exist on the server and assist resource sharing. These components also define the business rules, such as
government legal rules, data rules, and business algorithms, which are designed to keep the data structure
consistent. This layer is also known as the conceptual layer.
Data layer
The data layer is the physical database tier where data is stored and manipulated. It is the internal layer of the
database management system.
Collaborative/Multi server: This is an integrated database system formed by a collection of two or more
autonomous database systems. Multi-DBMS can be expressed through six levels of schema:
1. Multi-database View Level − Depicts multiple user views comprising subsets of the integrated distributed
database.
2. Multi-database Conceptual Level − Depicts integrated multi-database that comprises global logical multi-
database structure definitions.
3. Multi-database Internal Level − Depicts the data distribution across different sites and multi-database to
local data mapping.
4. Local database View Level − Depicts a public view of local data.
5. Local database Conceptual Level − Depicts local data organization at each site.
6. Local database Internal Level − Depicts physical data organization at each site.
Autonomous databases
1. Autonomous Transaction Processing - Serverless
2. Autonomous Transaction Processing – Dedicated
Serverless is a simple and elastic deployment choice. Oracle autonomously operates all aspects of the database
lifecycle, from database placement to backup and updates.
Dedicated is a private-cloud-in-public-cloud deployment choice: a completely dedicated compute, storage, network,
and database service for a single tenant.
Note: Semi-join and Bloom join are two data-fetching techniques used in distributed databases.
Some Popular databases
Native XML Databases
It is not surprising that a number of start-up companies, as well as some established data management
companies, determined that XML data would be best managed by a DBMS designed specifically to deal with
semi-structured data, that is, a native XML database.
Conceptual Database
This step covers modeling in the Entity-Relationship (E/R) Model: it specifies sets of data called entities,
relations among them called relationships, and cardinality restrictions identified by the letters N and M.
Many-to-many relationships stand out in this case.
Conventional Database
This step covers Relational Modeling, in which the E/R model is mapped to relations using mapping rules.
The subsequent implementation is done in Structured Query Language (SQL).
Non-Conventional database
This step covers Object-Relational Modeling, which is specified in Structured Query Language. In
this case, the modeling relates objects and their relationships to the Relational Model.
Traditional database
Temporal database
Conventional Databases
NewSQL Database
Autonomous database
Cloud database
Spatiotemporal
Enterprise Database Management System
Other popular non-relational databases
Google Cloud Firestore
Cassandra
Couchbase
Memcached, Redis, Coherence (key-value store)
HBase, Big Table, Accumulo (Tabular)
Amazon DynamoDB
MongoDB, CouchDB, Cloudant, JSON-like (Document-based)
Neo4j (Graph Database)
Non-relational (NoSQL) Data model
BASE Model:
Basically Available – Rather than enforcing immediate consistency, BASE-modelled NoSQL databases will ensure the
availability of data by spreading and replicating it across the nodes of the database cluster.
Soft State – Due to the lack of immediate consistency, data values may change over time. The BASE model breaks
with the concept of a database that enforces its own consistency, delegating that responsibility to developers.
Eventually Consistent – The fact that BASE does not enforce immediate consistency does not mean that it never
achieves it. However, until it does, data reads are still possible (even though they might not reflect the latest writes).
Just as SQL databases are almost uniformly ACID compliant, NoSQL databases tend to conform to BASE principles.
NewSQL Database
NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems
for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database
system.
Examples and properties of Relational Non-Relational Database:
The term NewSQL categorizes databases that combine the relational model with advances in scalability and
flexibility in the types of data handled. These databases focus on features that are not present in NoSQL,
such as a strong consistency guarantee. They cover two layers of data: a relational layer and a key-value store.
Use Cases: Big Data, Social Network Applications, and IoT.
Use Cases: E-Commerce, Telecom industry, and Gaming.
Storage: 1 byte. Range: 0 to 255.
Storage: 8 bytes. Range: approximately –922,337,203,685,477.5808 to 922,337,203,685,477.5807.
Storage: variable length, 10 bytes + string length; fixed length, string length. Range: variable length, <= about 2 billion (65,400 for Win 3.1); fixed length, up to 65,400.
Storage: 16 bytes for numbers; 22 bytes + string length.
Variable-length binary data with a maximum length of 2^31 – 1 (2,147,483,647) bytes.
BINARY_FLOAT 32-bit floating point number. This data type requires 4 bytes.
BINARY_DOUBLE 64-bit floating point number. This data type requires 8 bytes.
Number(p,s) data type: Number having precision p and scale s. The precision p can range from 1 to 38. The scale s
can range from -84 to 127. Both precision and scale are in decimal digits. A number value requires from 1 to 22 bytes.
If max_string_size = extended: 32767 bytes or characters.
If max_string_size = standard: 4000 bytes or characters.
Character data types: The character data types represent alphanumeric text. PL/SQL uses the SQL character data
types such as CHAR, VARCHAR2, LONG, RAW, LONG RAW, ROWID, and UROWID.
CHAR(n) is a fixed-length character type whose length is from 1 to 32,767 bytes.
VARCHAR2(n) is varying length character data from 1 to 32,767 bytes.
Database Key
A key is a field of a table that identifies the tuple in that table.
Super key
An attribute or a set of attributes that uniquely identifies a tuple within a relation.
Candidate key
A super key such that no proper subset is a super key within the relation; that is, it contains no smaller unique
subset (irreducibility). A relation may have many candidate keys (specified using UNIQUE), one of which is chosen as
the primary key, for example PRIMARY KEY (sid), UNIQUE (id, grade). A candidate key must be unique, but its value can be changed.
Composite Key
A composite key consists of more than one attribute: it is a combination of two or more columns that uniquely
identifies rows in a table. The combination of columns guarantees uniqueness, though individually the columns do
not. You can use a composite key as the primary key, but the whole composite key then has to be carried into other
tables as a foreign key.
Alternate key
A relation can have only one primary key, but it may contain several fields or combinations of fields that could
serve as the primary key (the candidate keys). One of them is chosen as the primary key. The candidate keys
that are not chosen as the primary key are known as alternate keys.
Sort Or control key
A field or combination of fields that is used to physically sequence the stored data is called a sort key. It is also
known as the control key.
Alternate key
An alternate key is also called a secondary key. A simple example is a student relation, which can contain
NAME, ROLL NO., ID, and CLASS.
Unique key
A unique key is a set of one or more fields/columns of a table that uniquely identifies a record in a database
table.
It is a little like a primary key, but it can accept NULL (only one NULL value in some systems, such as SQL Server)
and it cannot have duplicate values.
The unique key and primary key both guarantee uniqueness for a column or a set of columns.
A primary key constraint automatically includes a unique key constraint.
There may be many unique key constraints for one table, but only one PRIMARY KEY constraint for one table.
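The difference between the primary key and the other candidate keys can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table name enrolled and the sample rows are assumptions made for illustration, while PRIMARY KEY (sid) and UNIQUE (id, grade) follow the candidate-key example given earlier.

```python
import sqlite3

# In-memory database; the table name and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrolled (
        sid   INTEGER,
        id    TEXT NOT NULL,
        grade TEXT NOT NULL,
        PRIMARY KEY (sid),      -- the candidate key chosen as primary key
        UNIQUE (id, grade)      -- another candidate key (an alternate key)
    )
    """
)
conn.execute("INSERT INTO enrolled VALUES (1, 'A10', 'B')")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert((1, "A11", "C"))      # same sid as an existing row
duplicate_unique = try_insert((2, "A10", "B"))  # same (id, grade) pair
fresh_row = try_insert((3, "A10", "C"))         # both keys are new
print(duplicate_pk, duplicate_unique, fresh_row)
```

Only the last row is accepted; the first two are rejected by the primary key and by the unique constraint respectively.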
Artificial Key
Keys created from arbitrarily assigned data are known as artificial keys. They are created when a primary
key is large and complex and has no relationship with many other relations. The data values of artificial keys are
usually numbered in serial order.
For example, the primary key of the employee relation, which is composed of Emp_ID, Emp_role, and Proj_ID, is large.
So it would be better to add a new virtual attribute to identify each tuple in the relation uniquely. Rownum and
rowid are artificial keys. An artificial key should be numeric (a number or integer).
Format of Rowid:
Surrogate key
A surrogate key is an artificial key that aims to uniquely identify each record. This kind of key is created when
you don’t have any natural primary key. You can't insert values of the surrogate key; its value comes from the
system automatically.
There is no business logic in the key, so it does not change when business requirements change.
Surrogate keys reduce the complexity of the composite key.
Surrogate keys ease integration in extract, transform, and load (ETL) processes.
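A system-generated surrogate key can be sketched with SQLite's AUTOINCREMENT through Python's sqlite3 module. The table name assignment and the column assignment_id are hypothetical; Emp_ID, Emp_role, and Proj_ID come from the artificial-key example above, and the natural composite key is kept as a UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (
        assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        emp_id   INTEGER NOT NULL,
        emp_role TEXT    NOT NULL,
        proj_id  INTEGER NOT NULL,
        UNIQUE (emp_id, emp_role, proj_id)                -- natural composite key
    )
    """
)

# The application never supplies assignment_id; the system generates it.
cur = conn.execute(
    "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (7, 'Analyst', 42)"
)
first_id = cur.lastrowid
cur = conn.execute(
    "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (7, 'Tester', 42)"
)
second_id = cur.lastrowid
print(first_id, second_id)
```

Other tables can then reference the single assignment_id column instead of carrying all three natural-key columns as a foreign key.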
Compound Key
A compound key has two or more attributes that together allow you to uniquely recognize a specific record. Each
column may not be unique by itself within the database.
Operators
The EXISTS function/operator is used to test for the existence of any record in a subquery. The EXISTS operator
returns TRUE if the subquery returns one or more records.
The IN operator allows you to specify multiple values in a WHERE clause. The IN operator is a shorthand for multiple
OR conditions.
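Both operators can be tried with Python's sqlite3 module. In this sketch the customer and orders tables and their rows are made up for illustration; the second pair of queries checks that IN returns the same rows as the equivalent chain of OR conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Lahore'), (2, 'Karachi'), (3, 'Multan');
    INSERT INTO orders   VALUES (10, 1), (11, 1), (12, 3);
    """
)

# EXISTS is TRUE for a customer when the subquery returns at least one order.
with_orders = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.id FROM customer c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY c.id
        """
    )
]

# IN is shorthand for several OR conditions on the same column.
in_list = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customer WHERE city IN ('Lahore', 'Karachi') ORDER BY id"
    )
]
or_list = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM customer WHERE city = 'Lahore' OR city = 'Karachi' ORDER BY id"
    )
]
print(with_orders, in_list, or_list)
```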
Subquery Concept
END
Data abstraction: the process of hiding (suppressing) unnecessary details so that the high-level concept can be made
more visible. A data model is a relatively simple representation, usually graphical, of more complex real-world data
structures.
Types of Instances
Initial Database Instance: Refers to the database instance that is initially loaded into the system.
Valid Database Instance: An instance that satisfies the structure and constraints of the database.
The database instance changes every time the database is updated.
An instance is also called an extension
A schema contains schema objects like table, foreign key, primary key, views, columns, data types, stored procedure,
etc.
A database schema can be represented by using a visual diagram. That diagram shows the database objects and
their relationship with each other.
A database schema is designed by the database designers to help programmers whose software will interact with
the database. The process of database creation is called data modeling.
Data independence
It is the ability to make changes in either the logical or physical structure of the database without requiring
reprogramming of application programs.
Data Independence types:
Logical data independence=>Immunity of external schemas to changes in the conceptual schema
Physical data independence=>Immunity of the conceptual schema to changes in the internal schema.
Hence, the application programs need not be changed since they refer to the external schemas.
For example, the internal schema may be changed when certain file structures are reorganized or new indexes are
created to improve database performance.
Data abstraction
Data abstraction makes complex systems more user-friendly by removing the specifics of the system mechanics.
The conceptual data model has been most successful as a tool for communication between the designer and the
end user during the requirements analysis and logical design phases. Its success is because the model, using either
ER or UML, is easy to understand and convenient to represent. Another reason for its effectiveness is that it is a
top-down approach using the concept of abstraction. In addition, abstraction techniques such as generalization
provide useful tools for integrating end user views to define a global conceptual schema.
These differences show up in conceptual data models as different levels of abstraction; connectivity of relationships
(one-to-many, many-to-many, and so on); or as the same concept being modeled as an entity, attribute, or
relationship, depending on the user’s perspective.
Techniques used for view integration include abstraction, such as generalization and aggregation to create new
supertypes or subtypes, or even the introduction of new relationships. The higher-level abstraction, the entity
cluster, must maintain the same relationships between entities inside and outside the entity cluster as those that
occur between the same entities in the lower-level diagram.
ERD, EER terminology is not only used in conceptual data modeling but also in artificial intelligence literature when
discussing knowledge representation (KR).
The goal of KR techniques is to develop concepts for accurately modeling some domain of knowledge by creating an
ontology.
Ontology is a fundamental part of the Semantic Web. The goal of the World Wide Web Consortium (W3C) is to bring the
web to its full potential as a semantic web while reusing previous systems and artifacts. Most legacy systems have
been documented using structured analysis and structured design (SASD), especially with simple or Extended ER
Diagrams (ERD). Such systems need upgrading to become part of the semantic web. Transformation rules from ERD to
OWL-DL ontology at the concrete level facilitate an easy and understandable transformation from ERD to OWL.
Ontology engineering is an important aspect of the semantic web vision for attaining a meaningful
representation of data. Although various techniques exist for creating ontologies, most of the methods involve
a number of complex phases, scenario-dependent ontology development, and poor validation of the ontology. A
lightweight alternative is to build the domain ontology from the Entity Relationship (ER) model.
We now discuss four abstraction concepts that are used in semantic data models, such as the EER model as well as
in KR schemes: (1) classification and instantiation, (2) identification, (3) specialization and generalization, and (4)
aggregation and association.
One ongoing project that is attempting to allow information exchange among computers on the Web is called the
Semantic Web, which attempts to create knowledge representation models that are quite general in order to allow
meaningful information exchange and search among machines.
One commonly used definition of ontology is a specification of a conceptualization. In this definition, a
conceptualization is the set of concepts that are used to represent the part of reality or knowledge that is of interest
to a community of users.
Types of Abstractions
Classification: A is a member of class B.
Aggregation: B, C, and D are aggregated into A; A is made of/composed of B, C, and D (Is-Made-Of,
Is-Associated-With, Is-Part-Of, Is-Component-Of). Aggregation is an abstraction through which relationships are
treated as higher-level entities.
Generalization: B, C, and D can be generalized into A; B is-a/is-an A (is-as-like, is-kind-of).
Category or Union: A category represents a single superclass/subclass relationship with more than one
superclass.
Specialization: A can be specialized into B, C, or D (special cases of A). The Has-A (Has-An) approach is used
in specialization.
Composition: IS-MADE-OF (like aggregation).
Identification: IS-IDENTIFIED-BY.
UML Diagrams Notations
UML stands for Unified Modeling Language. ERD stands for Entity Relationship Diagram. UML is a popular and
standardized modeling language that is primarily used for object-oriented software. Entity-Relationship diagrams
are used in structured analysis and conceptual modeling.
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class diagrams. Unified
Modeling Language (UML) is a language based on OO concepts that describes a set of diagrams and symbols that
can be used to graphically model a system. UML class diagrams are used to represent data and their relationships
within the larger UML object-oriented system’s modeling language.
Associations
UML uses Boolean attributes instead of unary relationships but allows relationships of all other arities. Optionally,
each association may be given at most one name. Association names normally start with a capital letter. Binary
associations are depicted as lines between classes. Association lines may include elbows to assist with layout or
when needed (e.g., for ring relationships).
Types of Attributes-
In ER diagram, attributes associated with an entity set may be of the following types-
1. Simple attributes/atomic attributes/Static attributes
2. Key attribute
3. Unique attributes
4. Stored attributes
5. Prime attributes
6. Derived attributes (DOB, AGE)
7. Composite attribute (Address (street, door#, city, town, country))
8. The multivalued attribute (double ellipse (Phone#, Hobby, Degrees))
9. Dynamic Attributes
10. Boolean attributes
The fundamental new idea in the MOST model is the so-called dynamic attributes. Each attribute of an object class
is classified to be either static or dynamic. A static attribute is as usual. A dynamic attribute changes its value with
time automatically.
Attributes of the database tables which are candidate keys of the database tables are called prime attributes.
Symbols of Attributes:
The Entity
The entity is the basic building block of the E-R data model. The term entity is used with three different
meanings:
Entity type
Entity instance
Entity set
Context diagrams are the most basic data flow diagrams. They provide a broad view that is easily digestible but
offers little detail. They always consist of a single process and describe a single system. The only process displayed
in the CDFDs is the process/system being analyzed. The name of the CDFDs is generally a Noun Phrase.
Physical DFD:
Physical data flow diagram shows how the data flow is actually implemented in the system. Physical DFD is more
specific and closer to implementation.
N-ary
N-ary (n entities involved in the relationship)
An N-ary relationship exists when there are n types of entities involved. One limitation of an N-ary relationship is
that, with many entities involved, it is very hard to convert into a relational table.
A relationship between more than two entities is called an n-ary relationship.
Examples of relationships R between two entities E and F
Normalize the ERD and remove FD from Entities to enter the final steps
Transformation Rule 1. Each entity in an ER diagram is mapped to a single table in a relational database.
Transformation Rule 2. A key attribute of the entity type is represented by the primary key.
Every single-valued attribute becomes a column of the table.
Transformation Rule 3. Given an entity E with a primary identifier, a multivalued attribute attached to E in an ER
diagram is mapped to a table of its own.
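The three transformation rules can be sketched with Python's sqlite3 module. The student entity, its name attribute, and its multivalued phone attribute are assumptions chosen for illustration; they are not taken from a specific diagram in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Rule 1: the entity becomes one table.
    -- Rule 2: its key attribute becomes the primary key, and each
    --         single-valued attribute becomes a column.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );

    -- Rule 3: the multivalued attribute gets a table of its own, keyed by
    --         the owner's identifier plus the attribute value.
    CREATE TABLE student_phone (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        phone      TEXT    NOT NULL,
        PRIMARY KEY (student_id, phone)
    );

    INSERT INTO student VALUES (1, 'Ayesha');
    INSERT INTO student_phone VALUES (1, '555-0100'), (1, '555-0101');
    """
)

phones = [
    row[0]
    for row in conn.execute(
        "SELECT phone FROM student_phone WHERE student_id = 1 ORDER BY phone"
    )
]
print(phones)
```

Keeping the phone numbers in their own table lets one student have any number of values without repeating the student row.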
Generalization
The reverse process of defining subclasses (bottom-up approach): bring together the common attributes of entities
(ISA, IS-A, IS AN, IS-AN).
Union
Models a class/subclass with more than one superclass of distinct entity types. Attribute inheritance is selective.
Cardinality
It expresses the specific number of entity occurrences associated with one occurrence of the related entity.
The cardinality of a relationship is the number of instances of entity B that can be associated with entity A. There is
a minimum cardinality and a maximum cardinality for each relationship, with an unspecified maximum cardinality
being shown as N. Cardinality limits are usually derived from the organization's policies or external constraints.
For Example:
At the University, each Teacher can teach an unspecified maximum number of subjects as long as his/her weekly
hours do not exceed 24 (this is an external constraint set by an industrial award). Teachers may teach 0 subjects if
they are involved in non-teaching projects. Therefore, the cardinality limits for TEACHER are (0, N).
The University's policies state that each Subject is taught by only one teacher, but it is possible to have Subjects
that have not yet been assigned a teacher. Therefore, the cardinality limits for SUBJECT are (0,1). With these
limits the relationship is binary (two entity types) and its connectivity is 1:N: one teacher relates to many
subjects, while each subject relates to at most one teacher. An M:N connectivity would arise only if a subject
could be taught by several teachers; such situations are modeled using a composite entity (or gerund).
Cardinality Constraint: Quantification of the relationship between two concepts or classes (a constraint on
aggregation)
Remember cardinality is always a relationship to another thing.
Max Cardinality (Cardinality): always 1 or many. If Class A has a relationship to Package B with a cardinality of
one, there can be at most one occurrence of this class in the package. Conversely, a Package with a Max
Cardinality of N can contain N classes.
Min Cardinality (Optionality): simply means "required." It is always 0 or 1: 0 means the relationship is optional
(0 or more), and 1 means it is required (1 or more).
The three types of cardinality you can define for a relationship are as follows:
Minimum Cardinality. Governs whether or not selecting items from this relationship is optional or required. If you
set the minimum cardinality to 0, selecting items is optional. If you set the minimum cardinality to greater than 0,
the user must select that number of items from the relationship.
Optional to Mandatory, Optional to Optional, Mandatory to Optional, Mandatory to Mandatory
Summary Of ER Diagram Symbols
Maximum Cardinality. Sets the maximum number of items that the user can select from a relationship. If you set the
minimum cardinality to greater than 0, you must set the maximum cardinality to a number at least as large. If you do
not enter a maximum cardinality, the default is 999.
Type of Max Cardinality: 1 to 1, 1 to many, many to many, many to 1
Default Cardinality. Specifies what quantity of the default product is automatically added to the initial solution that
the user sees. Default cardinality must be equal to or greater than the minimum cardinality and must be less than
or equal to the maximum cardinality.
The (min, max) notation replaces cardinality ratio numerals and single/double line notation.
Associate a pair of integer numbers (min, max) with each participation of an entity type E in a relationship type R,
where 0 ≤ min ≤ max and max ≥ 1. max = N means finite, but unbounded.
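A (min, max) pair can be checked mechanically by counting how often each entity participates in the relationship. The sketch below is a plain Python illustration, not part of any DBMS: the function name min_max_violations and the sample teachers and subjects are assumptions, and the limits (0, N) for TEACHER and (0, 1) for SUBJECT come from the University example above.

```python
from collections import Counter


def min_max_violations(instances, participants, position, minimum, maximum=None):
    """Return the participants whose participation count breaks (minimum, maximum).

    instances: relationship instances as tuples.
    position: index of the tuple slot that holds this participant.
    maximum=None stands for N (finite but unbounded).
    """
    counts = Counter(instance[position] for instance in instances)
    return [
        p
        for p in participants
        if counts[p] < minimum or (maximum is not None and counts[p] > maximum)
    ]


teachers = ["T1", "T2", "T3"]
subjects = ["S1", "S2", "S3", "S4"]
teaches = [("T1", "S1"), ("T1", "S2"), ("T2", "S3")]

# TEACHER participates with (0, N); SUBJECT participates with (0, 1).
teacher_violations = min_max_violations(teaches, teachers, 0, 0, None)
subject_violations = min_max_violations(teaches, subjects, 1, 0, 1)

# Assigning a second teacher to S1 breaks the (0, 1) limit for SUBJECT.
teaches_bad = teaches + [("T2", "S1")]
subject_violations_bad = min_max_violations(teaches_bad, subjects, 1, 0, 1)
print(teacher_violations, subject_violations, subject_violations_bad)
```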
Relationship types can also have attributes
Attributes of 1:1 or 1:N relationship types can be migrated to one of the participating entity types
For a 1:N relationship type, the relationship attribute can be migrated only to the entity type on the N-side of the
relationship
Attributes on M: N relationship types must be specified as relationship attributes
In data modelling, cardinality defines the number of entities in one entity set that can be associated with
entities of another set via a relationship set. In simple words, it refers to the relationship one table can have
with another table: One-to-one, One-to-many, Many-to-one, or Many-to-many. A third meaning is the number of
tuples in a relation.
In SQL, cardinality refers to a number: the number of unique values that appear in a particular column of a
table. For example, suppose you have a table called Person with the column Gender. The Gender column can have the
values 'Male' or 'Female', so its cardinality is 2.
In the relational model, cardinality is the number of tuples in a relation (number of rows).
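The two numeric senses of cardinality can be compared side by side with Python's sqlite3 module. The Person table and Gender column follow the example above; the four sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (name TEXT, gender TEXT);
    INSERT INTO person VALUES
        ('Ali', 'Male'), ('Sara', 'Female'), ('Omar', 'Male'), ('Hina', 'Female');
    """
)

# Column cardinality: number of distinct values in one column.
column_cardinality = conn.execute(
    "SELECT COUNT(DISTINCT gender) FROM person"
).fetchone()[0]

# Relation cardinality: number of tuples (rows) in the relation.
relation_cardinality = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(column_cardinality, relation_cardinality)
```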
The Multiplicity of an association indicates how many objects the opposing class of an object can be instantiated.
When this number is variable then the.
Multiplicity = Cardinality + Participation. The dictionary definition of cardinality is the number of elements in a
particular set or other grouping.
Multiplicity can be set for attributes, operations, and associations in a UML class diagram (equivalent to an ERD)
and for associations in a use case diagram.
A cardinality is how many elements are in a set. Thus, a multiplicity tells you the minimum and maximum allowed
members of the set. They are not synonymous.
1 or more
1 and only 1 (exactly 1)
Multiplicity = Cardinality + Participation
Cardinality: Denotes the maximum number of possible relationship occurrences in which a certain entity can
participate (in simple terms: at most).
Note: Connectivity and Modality/ multiplicity/ Cardinality and Relationship are same terms.
Participation: Denotes if all or only some entity occurrences participate in a relationship (in simple terms: at least).
Basis for comparison: Cardinality versus Modality
Generalization is like a bottom-up approach in which two or more entities of lower levels combine to form a
higher level entity if they have some attributes in common.
Generalization is like a subclass and superclass system; the only difference is the approach.
Generalization uses the bottom-up approach: subclasses are combined to make a superclass. The IS-A (ISA, IS-AN)
approach is used in generalization.
Generalization is the result of taking the union of two or more (lower level) entity types to produce a higher level
entity type.
Generalization is the same as UNION. Specialization is the same as ISA.
Specialization is a top-down approach, the opposite of generalization. In specialization, one higher-level
entity is broken down into two or more lower-level entities. Specialization is the result of taking a subset of a
higher-level entity type to form a lower-level entity type.
Normally, the superclass is defined first, the subclass and its related attributes are defined next, and the
relationship set is then added (HASA, HAS-A, HAS AN, HAS-AN).
UML to EER specialization or generalization comes in the form of hierarchical entity set:
Mapping Process
1. Create tables for all higher-level entities.
2. Create tables for lower-level entities.
3. Add primary keys of higher-level entities in the table of lower-level entities.
4. In lower-level tables, add all other attributes of lower-level entities.
5. Declare the primary key of the higher-level table and the primary key of the lower-level table.
6. Declare foreign key constraints.
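The six mapping steps can be sketched with Python's sqlite3 module. The higher-level entity person and the lower-level entity teacher are hypothetical examples; the lower-level table reuses the higher-level primary key as both its own primary key and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    -- Step 1: table for the higher-level entity, with its primary key (step 5).
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );

    -- Steps 2-4: table for the lower-level entity, carrying the higher-level
    -- primary key plus its own attributes.
    -- Steps 5-6: that column is declared as primary key and as foreign key.
    CREATE TABLE teacher (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        department TEXT NOT NULL
    );

    INSERT INTO person  VALUES (1, 'Bilal');
    INSERT INTO teacher VALUES (1, 'Physics');
    """
)

# Inherited and specific attributes are recombined with a join.
teacher_row = conn.execute(
    "SELECT p.name, t.department FROM teacher t JOIN person p USING (person_id)"
).fetchone()

# A teacher row without a matching person row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO teacher VALUES (2, 'Maths')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(teacher_row, orphan_rejected)
```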
This section presents the concept of entity clustering, which abstracts the ER schema to such a degree that the
entire schema can appear on a single sheet of paper or a single computer screen.
END
The schema-based (or explicit) constraints include domain constraints, key constraints, constraints on NULLs, entity
integrity constraints, and referential integrity constraints.
Insertion constraints are also called explicit constraints.
Insert can violate any of the four types of constraints discussed in the previous section. Domain constraints can be
violated if an attribute value is given that does not appear in the corresponding domain or is not of the appropriate
data type. Key constraints can be violated if a key value in the new tuple already exists in another tuple in the
relation r(R). Entity integrity can be violated if any part of the primary key of the new tuple t is NULL. Referential
integrity can be violated if the value of any foreign key in t refers to a tuple that does not exist in the referenced
relation.
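Each of the four insert violations described above can be provoked with Python's sqlite3 module. This is a sketch under assumptions: the employee and department tables, the age range used as the domain, and the sample values are all made up for illustration. The primary key is declared NOT NULL explicitly because SQLite does not otherwise reject NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for referential integrity checks
conn.executescript(
    """
    CREATE TABLE department (dno INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY NOT NULL,               -- key + entity integrity
        age INTEGER CHECK (age BETWEEN 16 AND 70),   -- domain constraint
        dno INTEGER REFERENCES department(dno)       -- referential integrity
    );
    INSERT INTO department VALUES (5);
    INSERT INTO employee VALUES ('111', 30, 5);
    """
)


def violates(row):
    """Return True if inserting the row breaks a constraint."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "domain": violates(("222", 12, 5)),                 # age outside the domain
    "key": violates(("111", 40, 5)),                    # ssn already exists
    "entity integrity": violates((None, 40, 5)),        # NULL primary key
    "referential integrity": violates(("333", 40, 9)),  # no department 9
    "valid": violates(("444", 40, 5)),                  # satisfies everything
}
print(results)
```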
1. Business Rule constraints
These rules are applied to data before the data is inserted into the table columns, for example Unique, Not
NULL, and Default constraints.
1. The primary key value can’t be null.
2. Not null (NULL is the absence of any value, i.e., unknown or not applicable to a tuple)
3. Unique
4. Primary key
5. Foreign key
6. Check
7. Default
2. Null Constraints
Comparisons Involving NULL and Three-Valued Logic:
SQL has various rules for dealing with NULL values. Recall from Section 3.1.2 that NULL is used to represent a missing
value, but that it usually has one of three different interpretations—value unknown (exists but is not known), value
not available (exists but is purposely withheld), or value not applicable (the attribute is undefined for this tuple).
Consider the following examples to illustrate each of the meanings of NULL.
1. Unknown value. A person’s date of birth is not known, so it is represented by NULL in the database.
2. Unavailable or withheld value. A person has a home phone but does not want it to be listed, so it is withheld
and represented as NULL in the database.
3. Not applicable attribute. An attribute Last_College_Degree would be NULL for a person who has no college
degrees because it does not apply to that person.
3. Enterprise Constraints
Enterprise constraints – sometimes referred to as semantic constraints – are additional rules specified by users or
database administrators and can be based on multiple tables.
Here are some examples.
A class can have a maximum of 30 students.
A teacher can teach a maximum of four classes per semester.
An employee cannot take part in more than five projects.
The salary of an employee cannot exceed the salary of the employee’s manager.
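Rules like these usually cannot be expressed as a simple column constraint, so one common option is a trigger. The sketch below enforces the "no more than five projects" rule with an SQLite trigger through Python's sqlite3 module. The works_on table and the trigger name are hypothetical, and other DBMSs use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works_on (emp_id INTEGER, project_id INTEGER,
                       PRIMARY KEY (emp_id, project_id));
-- Reject the insert when the employee already has five projects.
CREATE TRIGGER max_five_projects BEFORE INSERT ON works_on
WHEN (SELECT COUNT(*) FROM works_on WHERE emp_id = NEW.emp_id) >= 5
BEGIN
    SELECT RAISE(ABORT, 'an employee cannot take part in more than five projects');
END;
""")

def assign(emp_id, project_id):
    """Return 'ok' or the error text produced by the trigger."""
    try:
        conn.execute("INSERT INTO works_on VALUES (?, ?)", (emp_id, project_id))
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)

# Projects 1..5 are accepted; the sixth assignment is rejected.
outcomes = [assign(1, project) for project in range(1, 7)]
print(outcomes)
```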
7. Authorization constraints
We may want to differentiate among the users as far as the type of access they are permitted to various data
values in the database. This differentiation is expressed in terms of Authorization. The most common being:
Read authorization – which allows reading but not the modification of data;
Insert authorization – which allows the insertion of new data but not the modification of existing data
Update authorization – which allows modification, but not deletion.
The three rules that referential integrity enforces are:
1. A foreign key must have a corresponding primary key. (“No orphans” rule.)
2. When a record in a primary table is deleted, all related records referencing the primary key must also be
deleted, which is typically accomplished by using cascade delete.
3. If the primary key of a record changes, all corresponding records in other tables that use the primary key as a
foreign key must also be modified. This can be accomplished by using a cascade update.
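Rules 2 and 3 can be tried out with a small sketch in Python's sqlite3 module. The customer and orders tables are hypothetical, and SQLite needs foreign key enforcement switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id)
                ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO customer VALUES (1);
INSERT INTO customer VALUES (2);
INSERT INTO orders VALUES (10, 1);
INSERT INTO orders VALUES (11, 1);
INSERT INTO orders VALUES (12, 2);
""")

# Rule 3: changing the primary key is propagated to the referencing rows.
conn.execute("UPDATE customer SET id = 100 WHERE id = 1")
# Rule 2: deleting the parent row deletes the rows that reference it.
conn.execute("DELETE FROM customer WHERE id = 2")

remaining = conn.execute(
    "SELECT order_id, customer_id FROM orders ORDER BY order_id").fetchall()
print(remaining)
```

Orders 10 and 11 now point at customer 100, and order 12 is gone together with customer 2.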
The preceding integrity constraints are included in the data definition language because they occur in most
database applications. However, they do not include a large class of general constraints, sometimes called
semantic integrity constraints, which may have to be specified and enforced on a relational database.
The types of constraints we discussed so far may be called state constraints because they define the constraints that
a valid state of the database must satisfy. Another type of constraint, called transition constraints, can be defined
to deal with state changes in the database. An example of a transition constraint is: “the salary of an employee can
only increase.”
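A transition constraint compares the old and the new state of a row, which is what the OLD and NEW row references of a trigger provide. The following sketch, using Python's sqlite3 module with a hypothetical employee table, is one possible way to enforce the salary example; it is not the only mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, salary INTEGER NOT NULL);
-- The trigger sees both states: OLD (before the update) and NEW (after it).
CREATE TRIGGER salary_only_increases BEFORE UPDATE OF salary ON employee
WHEN NEW.salary < OLD.salary
BEGIN
    SELECT RAISE(ABORT, 'salary can only increase');
END;
INSERT INTO employee VALUES (1, 3000);
""")

def set_salary(emp_id, new_salary):
    """Return 'ok' or the error text raised by the trigger."""
    try:
        conn.execute("UPDATE employee SET salary = ? WHERE id = ?", (new_salary, emp_id))
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)

raise_result = set_salary(1, 3500)  # allowed: 3000 -> 3500
cut_result = set_salary(1, 2000)    # rejected: 3500 -> 2000
final_salary = conn.execute("SELECT salary FROM employee WHERE id = 1").fetchone()[0]
print(raise_result, "|", cut_result, "|", final_salary)
```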
END
SQL version:
1970 – Dr. Edgar F. “Ted” Codd described a relational model for databases.
1974 – Structured Query Language appeared.
1978 – IBM was developing System R, its research prototype of a relational database.
1986 – SQL was standardized by ANSI as SQL-86.
1989 – SQL-89, a revision of the first standard, was published.
1999 – SQL 3 launched with features like triggers, object orientation, etc.
2003 – SQL:2003 added window functions, XML-related features, etc.
2006 – SQL:2006 added support for the XML Query Language.
2011 – SQL:2011 improved support for temporal databases.
The first standard was SQL-86 in 1986; later revisions include SQL:2011 and SQL:2016. SQL was accepted by the
American National Standards Institute (ANSI) in 1986 and by the International Organization for Standardization
(ISO) in 1987. Each vendor provides its own implementation (also called an SQL dialect) of SQL.
Standard of SQL ANSI and ISO
In 1993, the ANSI and ISO development committees decided to split future SQL development into a multi-part
standard. The Parts, as of December 1995, are:
Part 1: Framework. A non-technical description of how the document is structured.
Part 2: Foundation. The core specification, including all of the new ADT and Object SQL features; it is currently over
800 pages.
Part 3: SQL/CLI. The call level interface. A version dependent only on SQL-92 was published in 1995 as ISO/IEC 9075-
3:1995. A follow-on, providing support for new features in other Parts of SQL is under development.
Part 4: SQL/PSM. The stored procedures specification, including computational completeness. Currently being
processed for DIS Ballot.
Part 5: SQL/Bindings. The Dynamic SQL and Embedded SQL bindings are taken from SQL-92. No active new work at
this time, although C++ and Java interfaces are under discussion.
Part 6: SQL/XA. An SQL specialization of the popular XA Interface developed by X/Open (see below).
Part 7: SQL/Temporal. A newly approved SQL subproject to develop enhanced facilities for temporal data
management using SQL.
Part 8: SQL Multimedia (SQL/Mm)
A new ISO/IEC international standardization project for the development of an SQL class library for multimedia
applications was approved in early 1993. This new standardization activity, named SQL Multimedia (SQL/MM), will
specify packages of SQL abstract data type (ADT) definitions using the facilities for ADT specification and invocation
provided in the emerging SQL3 specification.
Part 1 will be a Framework that specifies how the other parts are to be constructed. Each of the other parts will be
devoted to a specific SQL application package.
Query-By-Example (QBE)
Query-By-Example (QBE) is the first interactive database query language to exploit visual, example-based modes of
human-computer interaction (HCI). In QBE, a query is constructed on an interactive terminal using two-dimensional
‘drawings’ of one or more relations, visualized in tabular form, in which selected columns are filled with ‘examples’
of the data items to be retrieved (thus the phrase query-by-example).
It is different from SQL, and from most other database query languages, in having a graphical user interface that
allows users to write queries by creating example tables on the screen.
QBE, like SQL, was developed at IBM and QBE is an IBM trademark, but a number of other companies sell QBE-like
interfaces, including Paradox.
A convenient shorthand notation is that if we want to print all fields in some relation, we can place P. under the
name of the relation. This notation is like the SELECT * convention in SQL. It is equivalent to placing a P. in every
field:
Example of QBE:
Hoffer, Ramesh, and Topi (2019) named the database life cycle as the database development activities, consisting of
seven phases:
III. Physical design. The physical design step involves the selection of indexes (access methods),
partitioning, and clustering of data. The logical design methodology in step II simplifies the
approach to designing large relational databases by reducing the number of data dependencies
that need to be analyzed. This is accomplished by inserting the conceptual data modeling and
integration steps (II(a) and II(b) in the figure) into the traditional relational design
approach.
IV. Database implementation, monitoring, and modification. Once the
design is completed, the database can be created through the implementation of the formal
schema using the data definition language (DDL) of a DBMS.
Attribute: Describes some aspect or characteristic of the entity/object. An attribute is a data item that
describes a property of an entity or a relationship.
Column or field: The column represents the set of values for a specific attribute. An attribute belongs to a model and a
column belongs to a table; a column is a column in a database table, whereas attributes are externally visible
facets of an object.
A relation instance is a finite set of tuples in the RDBMS. Relation instances never have duplicate tuples.
Relationship: An association between entities. The connected entities are called participants, and connectivity describes the
relationship (1-1, 1-M, M-N).
The relation in the image above has degree = 4, cardinality = 5, and 20 data values (cells).
Characteristics of relation
1. Distinct Relation/table name
2. Relations are unordered
3. Cells contain exactly one atomic (single) value, meaning each cell (field) must contain a single value
4. No repeating groups
5. Distinct attribute names
6. Values of an attribute come from the same domain
7. The order of attributes has no significance
8. The attributes in R(A1, ...,An) and the values in t = <V1,V2, ..... , Vn> are ordered.
9. Each tuple is distinct
10. The order of tuples has no significance.
11. tuples may be stored and retrieved in an arbitrary order
12. Tables manage attributes. This means they store information in form of attributes only
13. Tables contain rows. Each row is one record only
14. All rows in a table have the same columns. Columns are also called fields
15. Each field has a data type and a name
16. A relation must contain at least one attribute (column) that identifies each tuple (row) uniquely
External Tables
An external table is a read-only table whose metadata is stored in the database but whose data
is stored outside the database.
Partitioning Tables
Partitioning logically splits up a table into smaller tables according to the partition column(s). So rows with the
same partition key are stored in the same physical location.
Table Splitting
Collections | Records
All items are of the same data type | All items are different data types
Same data type items are called elements | Different data type items are called fields
For creating a collection variable you can use %TYPE | For creating a record variable you can use %ROWTYPE or %TYPE
Lists and arrays are examples | Tables and columns are examples
By default, tables are heap-organized. This means the database is free to store rows wherever there is space. You
can add the "organization heap" clause if you want to be explicit.
Windowing Clause: When you use ORDER BY, the database adds a default windowing clause of RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Sliding Windows: As well as running totals so far, you can change the windowing clause to cover a subset of
the previous rows.
The following shows the total weight of:
1. The current row + the previous row
2. All rows with the same weight as the current + all rows with a weight one less than the current
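The difference between the two sliding windows, and the default running total, can be emulated in a few lines of plain Python. This is only a sketch of the frame semantics, not database code: the list of weights is invented, and the SQL frame each line mimics is given in the comments.

```python
# Rows already sorted as if by ORDER BY weight (invented sample data).
weights = [1, 1, 2, 3, 3, 5]

# ROWS BETWEEN 1 PRECEDING AND CURRENT ROW:
# the current row plus the one physical row before it.
rows_frame = [sum(weights[max(0, i - 1): i + 1]) for i in range(len(weights))]

# RANGE BETWEEN 1 PRECEDING AND CURRENT ROW:
# every row whose weight lies between (current weight - 1) and the current weight.
range_frame = [sum(w for w in weights if cur - 1 <= w <= cur) for cur in weights]

# Default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
# a running total in which rows with equal weight share the same result.
running_total = [sum(w for w in weights if w <= cur) for cur in weights]

print(rows_frame)     # [1, 2, 3, 5, 6, 8]
print(range_frame)    # [2, 2, 4, 8, 8, 5]
print(running_total)  # [2, 2, 4, 10, 10, 15]
```

Note how ROWS counts physical rows, while RANGE works on values, so tied rows always receive the same total.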
Strategies for Schema design in DBMS
Top-down strategy –
Bottom-up strategy –
Inside-Out Strategy –
Mixed Strategy –
Identifying correspondences and conflicts among the schema integration in DBMS
Naming conflict
Type conflicts
Domain conflicts
Conflicts among constraints
Process of SQL
When we execute an SQL command on any relational database management system, the system
automatically finds the best routine to carry out our request, and the SQL engine determines how to interpret that
particular command.
Structured Query Language contains the following four components in its process:
1. Query Dispatcher
2. Optimization Engines
3. Classic Query Engine
4. SQL Query Engine, etc.
SQL Programming
Approaches to Database Programming| Comparing the Three Approaches
In this section, we briefly compare the three approaches for database programming
and discuss the advantages and disadvantages of each approach.
Several techniques exist for including database interactions in application programs.
The main approaches for database programming are the following:
1. Embedding database commands in a general-purpose programming language.
Embedded SQL Approach. The main advantage of this approach is that the query text is part of
the program source code itself, and hence can be checked for syntax errors and validated
against the database schema at compile time.
We can once again be faced with possible ambiguity among attribute names if attributes of the same name exist—
one in a relation in the FROM clause of the outer query, and another in a relation in the FROM clause of the nested
query. The rule is that a reference to an unqualified attribute refers to the relation declared in the innermost nested
query.
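The scoping rule can be checked with a small sketch using Python's sqlite3 module. Both hypothetical tables have a column called name, and one employee is deliberately named 'Research' so that the two possible readings of the unqualified name give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dno INTEGER, name TEXT);
CREATE TABLE employee (name TEXT, dno INTEGER);
INSERT INTO department VALUES (1, 'Research');
INSERT INTO department VALUES (2, 'Sales');
INSERT INTO employee VALUES ('Alice', 1);
INSERT INTO employee VALUES ('Research', 2);
""")

# The unqualified "name" inside the nested query refers to department.name,
# the relation declared in the innermost query, not to employee.name.
rows = conn.execute("""
    SELECT name FROM employee
    WHERE dno IN (SELECT dno FROM department WHERE name = 'Research')
""").fetchall()
print(rows)  # [('Alice',)]
```

If the inner name had been bound to employee.name instead, the result would have been the employee called 'Research'. Qualifying attributes (employee.name, department.name) avoids the ambiguity altogether.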
On the other hand, when we TRUNCATE a table, the table structure remains the same, so you will not face any of
the above problems.
In general, ANSI SQL permits the use of ON DELETE and ON UPDATE clauses to cover
CASCADE, SET NULL, or SET DEFAULT.
MS Access, SQL Server, and Oracle support ON DELETE CASCADE.
MS Access and SQL Server support ON UPDATE CASCADE.
Oracle does not support ON UPDATE CASCADE.
Oracle supports SET NULL.
MS Access and SQL Server do not support SET NULL.
Refer to your product manuals for additional information on referential constraints.
While MS Access does not support ON DELETE CASCADE or ON UPDATE CASCADE at the SQL command-line level,
A view is a virtual relation: it does not physically exist but is derived dynamically. It can be constructed by performing
operations (i.e., select, project, join, etc.) on the values of existing base relations (a base relation is a named relation
that is defined in the conceptual schema and whose tuples are physically stored in the database). Views are visible in the
external schema.
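To make "dynamically derived" concrete, here is a minimal sketch with Python's sqlite3 module. The employee table and the rd_staff view are hypothetical. The view stores no rows of its own, so a change to the base relation shows up the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES ('Ann', 'R&D', 5000);
INSERT INTO employee VALUES ('Bob', 'Sales', 4000);
-- The view selects some rows and projects away the salary column.
CREATE VIEW rd_staff AS SELECT name FROM employee WHERE dept = 'R&D';
""")

before = conn.execute("SELECT name FROM rd_staff ORDER BY name").fetchall()
conn.execute("INSERT INTO employee VALUES ('Cy', 'R&D', 4500)")
after = conn.execute("SELECT name FROM rd_staff ORDER BY name").fetchall()
print(before, after)  # [('Ann',)] [('Ann',), ('Cy',)]
```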
Types of View
1. User-defined view
a. Simple view (Single table view)
b. Complex View (Multiple tables having joins, group by, and functions)
c. Inline View (Based on a subquery in from clause to create a temp table and form a complex
query)
d. Materialized View (It stores physical data, definitions of tables)
e. Dynamic view
f. Static view
2. Database View
3. System Defined Views
4. Information Schema View
5. Catalog View
6. Dynamic Management View
7. Server-scoped Dynamic Management View
8. Sources of Data Dictionary Information View
a. General Views
b. Transaction Service Views
c. SQL Service Views
Advantages of View:
Provide security
Hide specific parts of the database from certain users
Customize base relations based on their needs
It supports the external model
Provide logical independence
Views don't store data in a physical location.
Views can provide access restriction, since data insertion, update, and deletion are not possible through a
view that is not updatable.
We can perform DML on a view if it is derived from a single base relation and contains the primary key or a
candidate key.
When can a view be updated?
1. The view is defined based on one and only one table.
2. The view must include the PRIMARY KEY of the table based upon which the view has been created.
3. The view should not have any field made out of aggregate functions.
4. The view must not have any DISTINCT clause in its definition.
5. The view must not have any GROUP BY or HAVING clause in its definition.
6. The view must not have any SUBQUERIES in its definitions.
7. If the view you want to update is based upon another view, the latter should be updatable.
8. Any of the selected output fields (of the view) must not use constants, strings, or value expressions.
END
Normalizations:
Normalization is a refinement technique. It reduces redundancy and eliminates undesirable characteristics such as
insertion, update, and deletion anomalies, as well as repetitions.
Normalization and E-R modeling are used concurrently to produce a good database design.
Advantages of normalization
Reduces data redundancies
Expending entities
Helps eliminate data anomalies
Produces controlled redundancies to link tables
Cost more processing efforts
Normalization proceeds through a series of steps called normal forms:
1NF - First normal form
2NF - Second normal form
3NF - Third normal form
BCNF - Boyce-Codd normal form (sometimes called 3.5NF)
4NF - Fourth normal form
5NF - Fifth normal form
2. Insertion anomalies
The new employee must be assigned a project (phantom project). We tried to insert data in a record that does not
exist at all.
3. Deletion anomalies
If an employee is deleted, other vital data is lost. Conversely, we may try to delete a record but leave parts of it undeleted
because we are unaware that the data is also stored somewhere else.
If we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price.
In most cases, if you can place your relations in the third normal form (3NF), then you will have avoided most of
the problems common to bad relational designs. Boyce-Codd (BCNF) and the fourth normal form (4NF) handle
special situations that arise only occasionally.
Denormalization in Databases
Denormalization is a database optimization technique in which we add redundant data to one or more tables. This
can help us avoid costly joins in a relational database. Note that denormalization does not mean not doing
normalization. It is an optimization technique that is applied after normalization.
Types of Denormalization
The two most common types of denormalization are two entities in a one-to-one relationship and two entities in a
one-to-many relationship.
Pros of Denormalization: -
Retrieving data is faster since we do fewer joins
Queries to retrieve can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.
Cons of Denormalization: -
Updates and inserts are more expensive.
Denormalization can make an update and insert code harder to write.
Data may be inconsistent. Which is the “correct” value for a piece of data?
Data redundancy necessitates more storage.
Relational Decomposition
Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and
redundancy.
When a relation in the relational model is not in an appropriate normal form, decomposition of the relation
is required. In a database, it breaks the table into multiple tables.
Types of Decomposition
1 Lossless Decomposition
If the information is not lost from the relation that is decomposed, then the decomposition will be lossless. The
process of normalization depends on being able to factor or decompose a table into two or more smaller tables, in such a
way that we can recapture the precise content of the original table by joining the decomposed parts.
2 Lossy Decomposition
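If information is lost when the relation is decomposed, so that joining the decomposed parts does not reproduce exactly the
original table, the decomposition is lossy.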
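The join test that separates the two kinds of decomposition can be sketched in plain Python. The relation R(A, B, C) and its two tuples are invented for illustration, and the helper names project and natural_join are not from the text.

```python
def project(rows, schema, attrs):
    """Projection: keep only the named attributes (duplicates collapse in the set)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(r1, s1, r2, s2):
    """Natural join of two relations on the attributes their schemas share."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                out.add(t1 + tuple(t2[s2.index(a)] for a in extra))
    return out

schema = ("A", "B", "C")
R = {(1, "x", "p"), (2, "x", "q")}

# Decomposition 1: (A, B) and (A, C). A identifies each tuple, so the join restores R.
lossless = natural_join(project(R, schema, ("A", "B")), ("A", "B"),
                        project(R, schema, ("A", "C")), ("A", "C"))

# Decomposition 2: (A, B) and (B, C). B does not identify anything, so joining on it
# produces spurious tuples, and the original content can no longer be recovered.
lossy = natural_join(project(R, schema, ("A", "B")), ("A", "B"),
                     project(R, schema, ("B", "C")), ("B", "C"))

print(lossless == R)  # True
print(sorted(lossy))  # four tuples, two of them spurious
```

Note that a lossy decomposition shows up as extra (spurious) tuples after the join; the information that is lost is which of the tuples were really in R.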
END
Example:
Dependency Preserving
If a relation R is decomposed into relations R1 and R2, then the dependencies of R either must be a part of R1 or
R2 or must be derivable from the combination of functional dependencies of R1 and R2.
For example, suppose there is a relation R (A, B, C, D) with a functional dependency set (A->BC). The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is part of relation
R1(ABC).
Multivalued Dependency
Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on
a third attribute.
A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it
always requires at least three attributes.
Join Dependency
Join dependency is a further generalization of multivalued dependency.
If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists,
where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R (A, B, C, D).
Alternatively, R1 and R2 are a lossless decomposition of R.
A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.
The pair (A, B, C), (C, D) will be a JD of R if the join over the common attribute C is equal to the relation R.
Here, ⋈ {R1, R2, R3, ...} is used to indicate that relations R1, R2, R3, and so on are a JD of R.
Inclusion Dependency
Multivalued dependency and join dependency can be used to guide database design although they both are less
common than functional dependencies. The inclusion dependency is a statement in which some columns of a
relation are contained in other columns.
Canonical Cover/ irreducible
A canonical cover or irreducible set of functional dependencies FD is a simplified set of FD that has the same closure
as the original set FD.
Extraneous attributes
An attribute of an FD is said to be extraneous if we can remove it without changing the closure of the set of FD.
Closure properties of attributes
Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally
determined from it.
The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
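The closure X+ is usually computed by repeatedly applying the functional dependencies until nothing new can be added. The following Python sketch shows that loop; the function name closure and the sample dependency set are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs, each side an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left side is already determined,
            # the right side is determined too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Sample dependencies: A -> BC, B -> D, CD -> E
fds = [("A", "BC"), ("B", "D"), ("CD", "E")]

print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("B", fds)))  # ['B', 'D']
```

Because the closure of A contains every attribute, A would be a candidate key of a relation R(A, B, C, D, E) under these dependencies.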
END
Dirty Read – A Dirty read is a situation when a transaction reads data that has not yet been committed. For example,
Let’s say transaction 1 updates a row and leaves it uncommitted, meanwhile, Transaction 2 reads the updated row.
If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
Lost Updates occur when multiple transactions select the same row and update the row based on the value
selected
Non Repeatable read – Non Repeatable read occurs when a transaction reads the same row twice and gets a
different value each time. For example, suppose transaction T1 reads data. Due to concurrency, another
transaction T2 updates the same data and commits, Now if transaction T1 rereads the same data, it will retrieve a
different value.
Phantom Read – Phantom Read occurs when two same queries are executed, but the rows retrieved by the two, are
different. For example, suppose transaction T1 retrieves a set of rows that satisfy some search criteria. Now,
Transaction T2 generates some new rows that match the search criteria for transaction T1. If transaction T1 re-
executes the statement that reads the rows, it gets a different set of rows this time.
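Of the phenomena described above, the lost update is the easiest to reproduce without a database. The sketch below is a simulation in plain Python under simple assumptions: two hypothetical transactions T1 and T2 each read a value x and later write back the value they read plus their own change.

```python
def run(schedule, deltas, start=100):
    """Replay a schedule of (transaction, operation) steps on one data item x."""
    db = {"x": start}
    seen = {}  # the value each transaction read
    for txn, op in schedule:
        if op == "read":
            seen[txn] = db["x"]
        else:  # "write": based on the value read earlier, not the current one
            db["x"] = seen[txn] + deltas[txn]
    return db["x"]

deltas = {"T1": +50, "T2": -30}

# Serial execution: T2 sees T1's update.
serial = run([("T1", "read"), ("T1", "write"),
              ("T2", "read"), ("T2", "write")], deltas)

# Interleaved execution: both read 100, so T2's write overwrites T1's update.
interleaved = run([("T1", "read"), ("T2", "read"),
                   ("T1", "write"), ("T2", "write")], deltas)

print(serial, interleaved)  # 120 70
```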
Based on these phenomena, The SQL standard defines four isolation levels :
Read Uncommitted – Read Uncommitted is the lowest isolation level. In this level, one transaction may read not
yet committed changes made by another transaction, thereby allowing dirty reads. At this level, transactions are
not isolated from each other.
Read Committed – This isolation level guarantees that any data read is committed at the moment it is read. Thus it
does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents
other transactions from reading, updating, or deleting it.
Repeatable Read – This is a more restrictive isolation level than Read Committed. The transaction holds read locks on all
rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot update
or delete these rows, it avoids non-repeatable reads.
Serializable – This is the highest isolation level. Execution at this level is guaranteed to be serializable. Serializable
execution is defined to be an execution of operations in which concurrently executing transactions appear to be
serially executing.
Durability: Durability ensures permanence. In a DBMS, durability means that the data becomes permanent in the
database after the successful execution of the operation. If a transaction is committed,
its effects will remain even after an error, power loss, etc.
ACID Example:
States of Transaction
Begin, active, partially committed, failed, committed, end, aborted
Aborted details are necessary
If any of the checks fail and the transaction has reached a failed state, the database recovery system makes
sure that the database is in its previous consistent state. If it is not, the recovery system aborts or rolls back the
transaction to bring the database into a consistent state.
If the transaction fails partway through, all the operations it has already executed are rolled back so that the
database returns to the consistent state it had before the transaction started.
After aborting the transaction, the database recovery module will select one of the two operations:
Re-start the transaction
Kill the transaction
The scheduler
A module that schedules the transaction’s actions, ensuring serializability
Two main approaches
1. Pessimistic: locks
2. Optimistic: time stamps, MV, validation
Scheduling
A scheduler is responsible for ordering jobs/transactions, and the read/write operations those jobs perform, when
many jobs are submitted at the same time (by multiple users).
A schedule is a sequence of interleaved actions from all transactions: an execution of several Xacts (transactions)
that preserves the order of R(A) and W(A) within any one Xact.
Note: Two schedules are equivalent if they have the same dependencies, that is:
They contain the same transactions and operations
They order all conflicting operations of non-aborting transactions in the same way
A schedule is serializable if it is equivalent to a serial schedule
Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely before starting another
transaction.
A serializable schedule always leaves the database in a consistent state. A serial schedule is always a serializable
schedule because, in a serial schedule, a transaction only starts when the other transaction finished execution.
However, a non-serial schedule needs to be checked for Serializability.
A non-serial schedule of n number of transactions is said to be a serializable schedule if it is equivalent to the serial
schedule of those n transactions. A serial schedule doesn’t allow concurrency: only one transaction executes at a
time, and the next starts when the already running transaction is finished.
Linearizability: a guarantee about single operations on single objects. Once a write completes, all later reads (by
wall clock) should reflect that write.
Types of Serializability
There are two types of Serializability.
1. Conflict Serializability
2. View Serializability
Conflict Serializable: A schedule is conflict serializable if it is conflict equivalent to some serial schedule.
Non-conflicting operations can be reordered to get a serial schedule.
In general, a schedule is conflict-serializable if and only if its precedence graph is acyclic
A precedence graph is used for Testing for Conflict-Serializability
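The precedence-graph test can be sketched in Python as follows. The schedule encoding (transaction, operation, item) and the function names are assumptions made for this example. Two operations conflict when they come from different transactions, touch the same item, and at least one of them is a write.

```python
def precedence_graph(schedule):
    """Build edges Ti -> Tj for each conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in graph[u]:
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(conflict_serializable(s1))  # True: the only edge is T1 -> T2
print(conflict_serializable(s2))  # False: T1 -> T2 and T2 -> T1 form a cycle
```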
View serializability/view equivalence is a concept that is used to compute whether schedules are View-
Serializable or not. A schedule is said to be View-Serializable if it is view equivalent to a Serial Schedule (where no
interleaving of transactions is possible).
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable Schedules.
1. Recoverable Schedule (cascading schedule, cascadeless schedule, strict schedule). In a recoverable schedule, if a
transaction T commits, then any other transaction that T read from must also have committed.
A schedule is recoverable if:
It is conflict-serializable, and
Whenever a transaction T commits, all transactions that have written elements read by T have already been
committed.
2. Non-Recoverable Schedule
The relation between various types of schedules can be depicted as:
Three-phase Commit
Another real-world atomic commit protocol is a three-phase commit (3PC). This protocol can reduce the amount of
blocking and provide for more flexible recovery in the event of failure. Although it is a better choice in unusually
failure-prone environments, its complexity makes 2PC the more popular choice.
Transaction atomicity using a two-phase commit
Transaction serializability using distributed locking.
DBMS Deadlock Types or techniques
All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is
granted. A lock is a variable, associated with the data item, which controls the access of that data item. Locking is
the most widely used form of concurrency control.
Deadlock Example:
1. Binary Locks: A binary lock on a data item can be in one of two states, locked or unlocked.
2. Shared/exclusive: This type of locking mechanism separates the locks in DBMS based on their uses. If a lock is
acquired on a data item to perform a write operation, it is called an exclusive lock.
3. Simplistic Lock Protocol: This type of lock-based protocol allows transactions to obtain a lock on every object
before beginning operation. Transactions may unlock the data item after finishing the ‘write’ operation.
4. Pre-claiming Locking: The Two-Phase locking protocol, also known as the 2PL protocol, requires that a transaction
not acquire any lock after it has released one of its locks. It has two phases: growing and shrinking.
5. Shared lock: These locks are referred to as read locks, and denoted by 'S'.
If a transaction T has obtained Shared-lock on data item X, then T can read X, but cannot write X. Multiple Shared
locks can be placed simultaneously on a data item.
A deadlock is an unwanted situation in which two or more transactions are waiting indefinitely for one another to
give up locks.
The Bakery algorithm is one of the simplest known solutions to the mutual exclusion problem for the general case
of N processes. The Bakery algorithm is a critical section solution for N processes. The algorithm preserves the
first-come, first-served property.
Before entering its critical section, the process receives a number. The holder of the smallest number enters the
critical section.
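The ticket idea can be sketched with Python threads. This is a teaching sketch rather than production code: it relies on CPython executing simple list reads and writes one at a time, the names choosing, number, lock, and unlock follow the usual textbook presentation rather than this text, and time.sleep(0) is only used to let other threads run while one is waiting.

```python
import threading
import time

N = 3        # number of threads ("processes")
ROUNDS = 20  # critical-section entries per thread
choosing = [False] * N
number = [0] * N  # 0 means "not interested"
counter = 0       # shared data protected by the bakery lock

def lock(i):
    choosing[i] = True
    number[i] = 1 + max(number)  # take a ticket larger than any seen so far
    choosing[i] = False
    for j in range(N):
        if j == i:
            continue
        while choosing[j]:  # wait until j has finished picking its ticket
            time.sleep(0)
        # Smaller ticket goes first; ties are broken by the thread index.
        while number[j] != 0 and (number[j], j) < (number[i], i):
            time.sleep(0)

def unlock(i):
    number[i] = 0

def worker(i):
    global counter
    for _ in range(ROUNDS):
        lock(i)
        current = counter  # critical section: a read-modify-write that
        time.sleep(0)      # would lose updates without mutual exclusion
        counter = current + 1
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # N * ROUNDS
```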
Deadlock detection
This technique allows deadlock to occur, but then, it detects it and solves it. Here, a database is periodically checked
for deadlocks. If a deadlock is detected, one of the transactions, involved in the deadlock cycle, is aborted. Other
transactions continue their execution. An aborted transaction is rolled back and restarted.
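Detection is commonly done by looking for a cycle in a wait-for graph, in which an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The Python sketch below finds one cycle and then picks a victim. Choosing the youngest transaction in the cycle is an assumption made here for illustration; real systems use various victim-selection policies. The transaction names and start times are invented.

```python
def find_cycle(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    def dfs(node, path):
        if node in path:  # we came back to a transaction already on the path
            return path[path.index(node):]
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, T3 for T1 (a deadlock); T4 merely waits for T1.
wait_for = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
start_time = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}  # larger value = younger

cycle = find_cycle(wait_for)
victim = max(cycle, key=start_time.get)  # abort the youngest one in the cycle

# After the victim is rolled back, its edges disappear from the graph.
remaining = {t: {w for w in waits if w != victim}
             for t, waits in wait_for.items() if t != victim}

print(sorted(cycle), victim, find_cycle(remaining))
```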
When a transaction waits more than a specific amount of time to obtain a lock (called the deadlock timeout),
Derby can detect whether the transaction is involved in a deadlock.
If deadlocks occur frequently in your multi-user system with a particular application, you might need to do some
debugging.
A deadlock where two transactions are waiting for one another to give up locks.
Deadlock detection and removal schemes
Wait-for-graph
This scheme allows the older transaction to wait but kills the younger one.
Phantom deadlock detection is the condition where a deadlock does not actually exist but, due to a delay in propagating
local information, the deadlock detection algorithm reports one.
There are three alternatives for deadlock detection in a distributed system, namely.
Centralized Deadlock Detector − One site is designated as the central deadlock detector.
Hierarchical Deadlock Detector − Some deadlock detectors are arranged in a hierarchy.
Distributed Deadlock Detector − All the sites participate in detecting deadlocks and removing them.
The deadlock detection algorithm uses 3 data structures –
Available
Vector of length m
Indicates the number of available resources of each type.
Allocation
Matrix of size n*m
Allocation[i,j] indicates the number of instances of the j-th resource type allocated to the i-th process.
Request
Matrix of size n*m
Banker’s Algorithm
Banker's Algorithm for Single Resource Type is a resource allocation and deadlock avoidance algorithm. The name
comes from the way a banker decides whether a loan can be granted safely.
In this, as a new process P1 enters, it declares the maximum number of resources it needs.
The system looks at those and checks if allocating those resources to P1 will leave the system in a safe state or not.
If after allocation, it will be in a safe state, the resources are allocated to process P1.
Otherwise, P1 should wait till the other processes release some resources.
This is the basic idea of Banker’s Algorithm.
A state is safe if the system can allocate all resources requested by all processes ( up to their stated maximums )
without entering a deadlock state.
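The safe-state check at the heart of the Banker's algorithm can be sketched as below. The function name is_safe and the sample numbers are assumptions for illustration. The sketch handles several resource types at once; a single resource type is the special case of vectors of length one.

```python
def is_safe(available, allocation, maximum):
    """Return (safe, order): whether every process can finish, and one finishing order."""
    work = list(available)
    # What each process may still request: its declared maximum minus what it holds.
    need = [[m - a for m, a in zip(mx, al)] for mx, al in zip(maximum, allocation)]
    finished = [False] * len(allocation)
    order = []
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Process i can run to completion and then returns its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                progress = True
    return all(finished), order

allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maximum = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]

print(is_safe([3, 3, 2], allocation, maximum))  # (True, [1, 3, 4, 0, 2])
print(is_safe([0, 0, 0], allocation, maximum))  # (False, [])
```

To decide on a new request, the system can pretend to grant it, run this check on the resulting state, and grant the request only if the state is still safe.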
Resource Preemption:
To eliminate deadlocks using resource preemption, we preempt some resources from processes and give those
resources to other processes. This method will raise three issues –
(a) Selecting a victim:
We must determine which resources and which processes are to be preempted, and in which order, so as to minimize the
cost.
(b) Rollback:
We must determine what should be done with the process from which resources are preempted. One simple idea
is total rollback. That means aborting the process and restarting it.
(c) Starvation:
In a system, the same process may be always picked as a victim. As a result, that process will never complete its
designated task. This situation is called Starvation and must be avoided. One solution is that a process must be
picked as a victim only a finite number of times.
Concurrent executions are done for Better transaction throughput, response time
Done via better utilization of resources:
Concurrency Control
What is Concurrency Control?
Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with one
another. Though for any practical Database, it would have a mix of READ and WRITE operations, and hence the
concurrency is a challenge. DBMS Concurrency Control is used to address such conflicts, which mostly occur with a
multi-user system.
data simultaneously. Two Phase Locking protocol helps to eliminate the concurrency problem in DBMS. Every 2PL
schedule is serializable.
Theorem: 2PL ensures conflict-serializable schedules,
but it does not enforce recoverable schedules.
2PL rule: Once a transaction has released a lock it is not allowed to obtain any other locks
This locking protocol divides the execution phase of a transaction into three different parts.
In the first phase, when the transaction begins to execute, it requires permission for the locks it needs.
The second part is where the transaction obtains all the locks. When a transaction releases its first lock, the third
phase starts.
In this third phase, the transaction cannot demand any new locks. Instead, it only releases the acquired locks.
The Two-Phase Locking protocol allows each transaction to make lock and unlock requests in two phases: the
Growing Phase and the Shrinking Phase.
2PL has the following two phases:
A growing phase, in which a transaction acquires all the required locks without unlocking any data. Once all locks
have been acquired, the transaction is at its lock point.
A shrinking phase, in which a transaction releases all locks and cannot obtain any new lock.
In practice:
– Growing phase is the entire transaction
– Shrinking phase is during the commit
The 2PL protocol indeed offers serializability. However, it does not ensure that deadlocks do not happen.
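The 2PL rule can be checked mechanically for a single transaction. The sketch below assumes the transaction is given as a list of ("lock" or "unlock", item) pairs; the function name and this representation are illustrative assumptions.

```python
def follows_2pl(operations):
    """Return True if no lock is requested after the first unlock."""
    shrinking = False
    held = set()
    for action, item in operations:
        if action == "lock":
            if shrinking:
                return False  # acquiring after releasing violates 2PL
            held.add(item)
        elif action == "unlock":
            if item not in held:
                raise ValueError(f"unlock of {item!r} without a lock")
            held.discard(item)
            shrinking = True  # the shrinking phase has started
        else:
            raise ValueError(f"unknown action {action!r}")
    return True


ok = follows_2pl([("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")])
bad = follows_2pl([("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")])
```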
Local and global deadlock detectors search for deadlocks and resolve them by rolling the affected transactions
back to their initial states.
Strict Two-Phase Locking Method
The strict two-phase locking system is almost like 2PL. The only difference is that Strict-2PL does not release a
lock immediately after using it. It holds all the locks until the commit point and releases them all at once when
the transaction is over.
Strict 2PL: All locks held by a transaction are released when the transaction is completed. Strict 2PL guarantees
conflict serializability (and therefore serializability) and avoids cascading rollbacks, but it does not prevent deadlocks.
Centralized 2PL
In Centralized 2PL, a single site is responsible for the lock management process. It has only one lock manager for
the entire DBMS.
Primary copy 2PL
In the primary copy 2PL mechanism, lock managers are distributed to different sites. Each lock manager is
responsible for managing the locks for a set of data items. When the primary copy has been updated,
the change is propagated to the slaves.
Distributed 2PL
In this kind of two-phase locking mechanism, Lock managers are distributed to all sites. They are responsible for
managing locks for data at that site. If no data is replicated, it is equivalent to primary copy 2PL. Communication
costs of distributed 2PL are higher than those of primary copy 2PL.
Time-Stamp Methods for Concurrency control:
The timestamp is a unique identifier created by the DBMS to identify the relative starting time of a transaction.
Typically, timestamp values are assigned in the order in which the transactions are submitted to the system. So, a
timestamp can be thought of as the transaction start time. Therefore, time stamping is a method of concurrency
control in which each transaction is assigned a transaction timestamp.
Timestamps must have two properties namely
Uniqueness: The uniqueness property assures that no equal timestamp values can exist.
Monotonicity: monotonicity assures that timestamp values always increase.
Timestamps are divided into further fields:
Granule Timestamps
Timestamp Ordering
Conflict Resolution in Timestamps
The timestamp-based protocol in a DBMS is an algorithm that uses the system time or a logical counter as a
timestamp to serialize the execution of concurrent transactions. The protocol ensures that every pair of conflicting
read and write operations is executed in timestamp order.
Conflict Resolution in Timestamps:
To deal with conflicts in timestamp algorithms, some of the transactions involved in a conflict are made to wait
while others are aborted.
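One common textbook variant, basic timestamp ordering, resolves every conflict by aborting the late transaction (it never waits). The sketch below shows only that variant; the class and function names are assumptions made for the example.

```python
class Item:
    """A data item with the largest read and write timestamps seen so far."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    """Return True if the read is allowed, False if the reader must abort."""
    if ts < item.write_ts:
        return False  # a younger transaction already wrote this item
    item.read_ts = max(item.read_ts, ts)
    return True


def write(item, ts):
    """Return True if the write is allowed, False if the writer must abort."""
    if ts < item.read_ts or ts < item.write_ts:
        return False  # a younger transaction already read or wrote this item
    item.write_ts = ts
    return True


x = Item()
results = [read(x, 2), write(x, 1), write(x, 3), read(x, 2), read(x, 3)]
```

An aborted transaction would normally be restarted with a new, larger timestamp.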
Optimistic (validation-based) methods execute a transaction in three phases:
Read Phase
In the Read Phase, the data values from the database can be read by a transaction but the write operation or
updates are only applied to the local data copies, not the actual database.
Validation Phase
In the Validation Phase, the data is checked to ensure that there is no violation of serializability while applying the
transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is successful; otherwise, the updates
are not applied, and the transaction is rolled back.
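The read, validation, and write phases can be sketched with per-item version numbers. This is one simple way to validate (checking that nothing read has changed), assumed here for illustration; the Store and OptimisticTxn classes are hypothetical.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # versions seen during the read phase
        self.local_writes = {}   # updates kept in a private copy

    def read(self, key):
        if key in self.local_writes:
            return self.local_writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.local_writes[key] = value  # not yet visible in the database

    def commit(self):
        # Validation phase: everything we read must be unchanged.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                self.local_writes.clear()  # roll back the local copies
                return False
        # Write phase: apply the local updates to the database.
        for key, value in self.local_writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store({"x": 10})
t1, t2 = OptimisticTxn(store), OptimisticTxn(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 2)
first, second = t1.commit(), t2.commit()
```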
Laws of concurrency control
First Law of Concurrency Control
Concurrent execution should not cause application programs to malfunction.
Second Law of Concurrency Control
Concurrent execution should not have lower throughput or much higher response times than serial execution.
Lock Thrashing is the point where system performance(throughput) decreases with increasing load
(adding more active transactions). It happens due to the contention of locks. Transactions waste time
on lock waits.
In some systems (for example, IBM solidDB), the default concurrency control mechanism depends on the table type:
Disk-based tables (D-tables) are by default optimistic.
Main-memory tables (M-tables) are always pessimistic.
Pessimistic locking (Locking and timestamp) is useful if there are a lot of updates and relatively high chances of
users trying to update data at the same time.
Optimistic (Validation)locking is useful if the possibility for conflicts is very low – there are many records but
relatively few users, or very few updates and mostly read-type operations.
Optimistic concurrency control is based on the idea of conflicts and transaction restart, while pessimistic
concurrency control uses locking as the basic serialization mechanism. Pessimistic control assumes that two or more
users will want to update the same record at the same time, and prevents that possibility by locking the record, no
matter how unlikely conflicts are.
Properties
Optimistic locking is useful in stateless environments (such as mod_plsql and the like). It is not only useful but critical.
Optimistic locking: you read data out and only update it if it did not change.
Optimistic locking only works when developers modify the same object. A problem occurs when multiple
developers modify different objects on the same page at the same time: modifying one
object may affect the processing of the entire page, which other developers may not be aware of.
Pessimistic locking: you lock the data as you read it out and then modify it.
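Optimistic locking is often implemented with a version column: the update succeeds only if the row still carries the version that was read. The sketch below uses Python's built-in sqlite3 module; the product table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL, version INTEGER)"
)
conn.execute("INSERT INTO product VALUES (1, 10.0, 1)")


def update_price(conn, product_id, new_price, version_read):
    """Update only if nobody changed the row since version_read was read."""
    cur = conn.execute(
        "UPDATE product SET price = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_price, product_id, version_read),
    )
    return cur.rowcount == 1  # 0 rows means another writer got there first


# Two users both read version 1; only the first update is accepted.
first = update_price(conn, 1, 12.0, 1)
second = update_price(conn, 1, 15.0, 1)
price = conn.execute("SELECT price FROM product WHERE id = 1").fetchone()[0]
```

The user whose update is rejected re-reads the row and retries, which is the "restart" that optimistic methods rely on.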
Lock Granularity:
A database is represented as a collection of named data items. The size of the data item chosen as the unit of
protection by a concurrency control program is called granularity. Locking can take place at the following levels:
Database level.
Table level(Coarse-grain locking).
Page level.
Row (Tuple) level.
Attributes (fields) level.
Multiple Granularity
Let's start by understanding the meaning of granularity.
Granularity: It is the size of the data item that can be locked.
It can be defined as hierarchically breaking up the database into blocks that can be locked.
The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
It maintains the track of what to lock and how to lock.
It makes it easy to decide either to lock a data item or to unlock a data item. This type of hierarchy can be
graphically represented as a tree.
There are three additional lock modes with multiple granularities:
Intention-shared (IS): It indicates explicit locking at a lower level of the tree, but only with shared locks.
Intention-Exclusive (IX): It indicates explicit locking at a lower level with exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the subtree rooted at the node is locked in shared mode, and some
lower-level nodes are locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The below table describes the compatibility matrix for these lock
modes:
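The standard compatibility matrix for the five modes (IS, IX, S, SIX, X) can be written as a lookup table. The sketch below encodes the usual textbook matrix; the helper name can_grant is an assumption.

```python
# For each held mode, the set of modes another transaction may still acquire.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


grant_ix = can_grant("IX", ["IS", "IX"])  # intention locks coexist
grant_s = can_grant("S", ["IX"])          # S conflicts with IX
grant_x = can_grant("X", ["IS"])          # X conflicts with everything
```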
In our example:
– T1: reads the list of products
– T2: inserts a new product
– T1: re-reads: a new product appears!
END
Select (σ): The SELECT operation selects a subset of the tuples according to a given selection condition (unary
operator).
Projection (π): The projection eliminates all attributes of the input relation except those mentioned in the
projection list (unary operator). The projection operator has to eliminate duplicates.
Union (∪): UNION is denoted by the symbol ∪. It includes all tuples that are in tables A or B.
Set Difference (-): Denoted by the symbol -. The result of A - B is a relation that includes all tuples that are in
A but not in B.
Intersection (∩): Intersection defines a relation consisting of the set of all tuples that are in both A and B.
Cartesian Product (×): The Cartesian product combines columns from two relations.
Inner Join: An inner join includes only those tuples that satisfy the matching criteria.
Theta Join (θ): The general case of the JOIN operation is called a theta join. It is denoted by the symbol θ.
Equi Join: When a theta join uses only an equivalence condition, it becomes an equi join.
Natural Join (⋈): A natural join can only be performed if there is a common attribute (column) between the
relations.
Outer Join: An outer join includes, along with the tuples that satisfy the matching criteria, some or all tuples
that do not satisfy them.
Left Outer Join (⟕): The left outer join keeps all tuples in the left relation.
Right Outer Join (⟖): The right outer join keeps all tuples in the right relation.
Full Outer Join (⟗): In a full outer join, all tuples from both relations are included in the result irrespective
of the matching condition.
Select Operation
Notation: σp(r), where p is called the selection predicate
Project Operation
Notation: πA1,..., Ak (r)
The result is defined as the relation of k columns obtained by deleting the columns that are not listed
Union Operation
Notation: r ∪ s
Relational Calculus
There is an alternate way of formulating queries known as Relational Calculus. Relational calculus is a non-procedural
query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the
results. The relational calculus tells what to do but never explains how to do it. Most commercial relational languages
are based on aspects of relational calculus, including SQL, QBE, and QUEL.
It is based on predicate calculus, a name derived from a branch of symbolic logic. A predicate is a truth-valued
function with arguments.
Objective: Relational algebra targets how to obtain the result; relational calculus targets what result to obtain.
Notations of RC
Tuple relational calculus (TRC): the variables represent the tuples from specified relations. A tuple is a single
element of a relation; in database terms, it is a row.
Notation: {T | P (T)} or {T | Condition (T)}
Example: {T | EMPLOYEE (T) AND T.DEPT_ID = 10}
Domain relational calculus (DRC): the variables represent values drawn from the specified domain. A domain is
equivalent to a column data type and any constraints on the value of data.
Notation: { a1, a2, a3, …, an | P (a1, a2, a3, …, an)}
Example: { | < EMPLOYEE > DEPT_ID = 10 }
Examples of RC:
Query Block in RA
SQL, Relational Algebra, Tuple Calculus, and domain calculus examples: Comparisons
Select Operation
R = (A, B)
Relational Algebra: σB=17 (r)
Tuple Calculus: {t | t ∈ r ∧ t[B] = 17}
Domain Calculus: {<a, b> | <a, b> ∈ r ∧ b = 17}
Project Operation
R = (A, B)
Relational Algebra: ΠA(r)
Tuple Calculus: {t | ∃ p ∈ r (t[A] = p[A])}
Domain Calculus: {<a> | ∃ b ( <a, b> ∈ r )}
Combining Operations
R = (A, B)
Relational Algebra: ΠA(σB=17 (r))
Tuple Calculus: {t | ∃ p ∈ r (t[A] = p[A] ∧ p[B] = 17)}
Domain Calculus: {<a> | ∃ b ( <a, b> ∈ r ∧ b = 17)}
Natural Join
R = (A, B, C, D) S = (B, D, E)
Relational Algebra: r ⋈ s =
Πr.A,r.B,r.C,r.D,s.E(σr.B=s.B ∧ r.D=s.D (r × s))
Tuple Calculus: {t | ∃ p ∈ r ∃ q ∈ s (t[A] = p[A] ∧ t[B] = p[B] ∧
t[C] = p[C] ∧ t[D] = p[D] ∧ t[E] = q[E] ∧
p[B] = q[B] ∧ p[D] = q[D])}
Domain Calculus: {<a, b, c, d, e> | <a, b, c, d> ∈ r ∧ <b, d, e> ∈ s}
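The select, project, and natural join operations compared above can be sketched on small in-memory relations. Here a relation is assumed to be a list of dictionaries, which is an illustrative representation and not how a DBMS stores tuples.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """π: keep only the listed attributes and eliminate duplicates."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result


def natural_join(r, s):
    """⋈: combine tuples that agree on all common attributes."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**p, **q} for p in r for q in s
            if all(p[a] == q[a] for a in common)]


r = [{"A": 1, "B": 17}, {"A": 2, "B": 5}, {"A": 3, "B": 17}]
combined = project(select(r, lambda t: t["B"] == 17), ["A"])  # ΠA(σB=17(r))

r2 = [{"A": 1, "B": 2, "C": 3, "D": 4}]
s2 = [{"B": 2, "D": 4, "E": 9}, {"B": 2, "D": 5, "E": 8}]
joined = natural_join(r2, s2)
```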
Key components:
Right from the moment the query is written and submitted by the user, to the point of its execution and the
eventual return of the results, there are several steps involved. These steps are outlined below in the following
diagram.
The parsing of a query is performed within the database using the Optimizer component. Taking all of these inputs
into consideration, the Optimizer decides the best possible way to execute the query. This information is stored
within the SGA in the Library Cache – a sub-pool within the Shared Pool.
The memory area within the Library Cache in which the information about a query’s processing is kept is called the
Cursor. Thus, if a reusable cursor is found within the library cache, it’s just a matter of picking it up and using it to
execute the statement. This is called Soft Parsing. If it’s not possible to find a reusable cursor or if the query has
never been executed before, query optimization is required. This is called Hard Parsing.
Hard parsing means that either the cursor was not found in the library cache or it was found but was invalidated for
some reason. For whatever reason, Hard Parsing would mean that work needs to be done by the optimizer to ensure
the most optimal execution plan for the query.
Before the process of finding the best plan is started for the query, some tasks are completed. These tasks are
repeatedly executed even if the same query executes in the same session for N number of times:
1. Syntax Check
2. Semantics Check
3. Hashing the query text and generating a hash key-value pair
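The difference between soft and hard parsing can be illustrated with a toy cursor cache keyed by a hash of the query text. This is a simplified model, not Oracle's actual library cache; the class name and the placeholder plan string are assumptions.

```python
import hashlib


class LibraryCache:
    """Toy model: reuse a cached plan when the exact query text was seen before."""

    def __init__(self):
        self.cursors = {}
        self.hard_parses = 0
        self.soft_parses = 0

    def parse(self, sql_text):
        key = hashlib.sha256(sql_text.encode("utf-8")).hexdigest()
        plan = self.cursors.get(key)
        if plan is not None:
            self.soft_parses += 1  # reusable cursor found
            return plan
        self.hard_parses += 1      # the optimizer must build a plan
        plan = f"plan for: {sql_text}"  # placeholder for a real execution plan
        self.cursors[key] = plan
        return plan


cache = LibraryCache()
cache.parse("SELECT * FROM emp WHERE dept_id = :d")
cache.parse("SELECT * FROM emp WHERE dept_id = :d")  # same text: soft parse
cache.parse("SELECT * FROM emp WHERE dept_id = 10")  # different text: hard parse
```

Because the key is the query text, using bind variables instead of literals lets many executions share one cursor.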
Various phases of query execution in the system: first, the query goes from the client process to the server
process and into the SQL area of the PGA; then the following phases start:
1 Parsing (parse query tree; syntax check, semantic check, and shared pool check, which is used for soft parse)
2 Transformation (Binding)
3 Estimation/query optimization
4 Plan generation, row source generation
5 Query Execution & plan
6 Query result
Index and Table scan in the query execution process
Query Evaluation
The logic applied to the evaluation of SELECT statements, as described here, does not precisely reflect how the
DBMS Server evaluates your query to determine the most efficient way to return results. However, by applying this
logic to your queries and data, the results of your queries can be anticipated.
1. Evaluate the FROM clause. Combine all the sources specified in the FROM clause to create a Cartesian product (a
table composed of all the rows and columns of the sources). If joins are specified, evaluate each join to obtain its
results table, and combine it with the other sources in the FROM clause.
2. Apply the WHERE clause. Discard rows in the result table that do not fulfill the restrictions specified in the
WHERE clause.
3. Apply the GROUP BY clause. Group results according to the columns specified in the GROUP BY clause.
4. Apply the HAVING clause. Discard rows in the result table that do not fulfill the restrictions specified in the HAVING
clause.
5. Evaluate the SELECT clause. Discard columns that are not specified in the SELECT clause. If SELECT DISTINCT is
specified, discard duplicate rows.
6. Perform any unions. Combine result tables as specified in the UNION clause. (In case of SELECT FIRST n… UNION
SELECT …, the first n rows of the result from the union are chosen.)
7. Apply the ORDER BY clause. Sort the result rows as specified.
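The logical order of WHERE, GROUP BY, and HAVING can be observed on a small table. The sketch below uses Python's built-in sqlite3 module; the emp table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", "hr", 50), ("bob", "hr", 70), ("cat", "it", 90),
     ("dan", "it", 35), ("eve", "ops", 30)],
)

# WHERE removes dan and eve before grouping, so 'it' has one row left
# and is then discarded by HAVING.
rows = conn.execute(
    "SELECT dept, COUNT(*) FROM emp "
    "WHERE salary >= 40 "
    "GROUP BY dept "
    "HAVING COUNT(*) >= 2 "
    "ORDER BY dept"
).fetchall()
```

Had HAVING been applied before WHERE, the 'it' group (two employees) would have survived, which shows why the order matters when anticipating results.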
Query Evaluation Techniques for Large Databases
Steps to process a query: parsing, validation, resolution, optimization, plan compilation, execution.
The architecture of query engines:
Query processing algorithms iterate over members of input sets; algorithms are algebra operators. The physical
algebra is the set of operators, data representations, and associated cost functions that the database execution
engine supports, while the logical algebra is more related to the data model and expressible queries of the data
model (e.g. SQL).
Synchronization and transfer between operators are key. Naïve query plan methods include the creation of
temporary files/buffers, using one process per operator, and using IPC. The practical method is to implement all
operators as a set of procedures (open, next, and close), and have operators schedule each other within a single
process via simple function calls. Each time an operator needs another piece of data ("granule"), it calls its data
input operator's next function to produce one. Operators structured in such a manner are called iterators.
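The open/next/close interface can be sketched with two operators, a scan and a filter, that call each other through plain function calls. The class names and the use of None as an end-of-input marker are assumptions made for this example.

```python
class Scan:
    """Leaf operator: produces the rows of an in-memory table one at a time."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of input (rows themselves must not be None)
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Filter:
    """Pulls rows from its input operator and passes on those that match."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


def run(plan):
    plan.open()
    output = []
    while (row := plan.next()) is not None:
        output.append(row)
    plan.close()
    return output


even_rows = run(Filter(Scan([1, 2, 3, 4]), lambda r: r % 2 == 0))
```

No temporary files or extra processes are needed: each call to next produces one more row on demand.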
Note: Three SQL relational algebra query plans one pushed, nearly fully pushed
Query plans are algebra expressions and can be represented as trees. Left-deep (every right subtree is a leaf),
right-deep (every left-subtree is a leaf), and bushy (arbitrary) are the three common structures. In a left-deep tree,
each operator draws input from one input and an inner loop iterates over the other input.
Evaluation Plan
Cost Estimation
The cost of a query evaluation plan is estimated in terms of various resources, including the number of disk
accesses and the CPU time taken to execute the query.
Query Optimization
Summary of steps of processing an SQL query:
Lexical analysis, parsing, validation, Query Optimizer, Query Code Generator, Runtime Database Processor
The term optimization here has the meaning “choose a reasonably efficient strategy” (not necessarily the best
strategy)
Query optimization: choosing a suitable strategy to execute a particular query more efficiently
An SQL query undergoes several stages: lexical analysis (scanning, LEX), parsing (YACC), validation
Scanning: identify SQL tokens
Parser: check the query syntax according to the SQL grammar
Validation: check that all attributes/relation names are valid in the particular database being queried
Then create the query tree or the query graph (these are internal representations of the query)
Main techniques to implement query optimization
Heuristic rules (to order the execution of operations in a query)
Computing cost estimates of different execution strategies
Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
Sorting
External sorting is a basic ingredient of relational operators that use sort-merge strategies
Sorting is used implicitly in SQL in many situations:
ORDER BY clause, joins, union, intersection, and duplicate elimination (DISTINCT).
Sorting can be avoided if we have an index (ordered access to the data)
External Sorting: (sorting large files of records that don’t fit entirely in the main memory)
Internal Sorting: (sorting files that fit entirely in the main memory)
All sorting in "real" database systems uses merging techniques since very large data sets are expected. Sorting
modules' interfaces should follow the structure of iterators.
Exploit the duality of quicksort and mergesort. Sort proceeds in a divide phase and a combine phase. One of the two
phases is based on logical keys (indexes); the other physically arranges data items (which phase is logical is
particular to an algorithm). Two sub-algorithms are needed: one for sorting a run within main memory, another for
managing runs on disk or tape. The degree of fan-in (number of runs merged in a given step) is a key parameter.
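The two sub-algorithms can be sketched as run generation followed by a single merge step. In this simulation the input is a Python list and run_size stands in for the amount of main memory; the fan-in is simply the number of runs, which are assumptions made to keep the example short.

```python
import heapq
import tempfile


def external_sort(values, run_size):
    """Sort integers using sorted runs on disk and one merge pass."""
    runs = []
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])  # in-memory sort of one run
        run_file = tempfile.TemporaryFile(mode="w+")
        run_file.writelines(f"{v}\n" for v in run)
        run_file.seek(0)
        runs.append(run_file)
    # Merge step: fan-in equals the number of runs; each run is read sequentially.
    streams = [(int(line) for line in run_file) for run_file in runs]
    merged = list(heapq.merge(*streams))
    for run_file in runs:
        run_file.close()
    return merged


result = external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)
```

With more runs than the merge can handle at once, the merge step would be repeated over several passes.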
External sorting:
Sorting is the first step in bulk loading a B+ tree index (i.e., sort data entries and records). It is also useful
for eliminating duplicate copies in a collection of records (Why?)
Sort-merge join algorithm involves sorting.
Hashing:
Hashing should be considered for equality matches, in general.
Hashing-based query processing algorithms use an in-memory hash table of database objects; if the data in the hash
table is bigger than the main memory (the common case), hash table overflow occurs. Three techniques for overflow
handling exist:
Avoidance: input set is partitioned into F files before any in-memory hash table is built. Partitions can be dealt with
independently. Partition sizes must be chosen well, or recursive partitioning will be needed.
Resolution: assume overflow won't occur; if it does, partition dynamically.
Hybrid: like resolution, but when partitioning, write only one partition to disk and keep the rest in memory.
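Overflow avoidance can be sketched as a partitioned hash join: both inputs are split into F partitions on the join key, and each pair of partitions is joined with its own small hash table. The partitions here are in-memory lists standing in for files, and the function names are assumptions.

```python
def partition(rows, key, fan_out):
    """Split rows into fan_out partitions by hashing the join key."""
    parts = [[] for _ in range(fan_out)]
    for row in rows:
        parts[hash(row[key]) % fan_out].append(row)
    return parts


def partitioned_hash_join(r, s, key, fan_out=4):
    result = []
    # Matching keys always land in partitions with the same number,
    # so each pair can be joined independently.
    for r_part, s_part in zip(partition(r, key, fan_out),
                              partition(s, key, fan_out)):
        table = {}
        for row in r_part:  # build phase on one partition of r
            table.setdefault(row[key], []).append(row)
        for row in s_part:  # probe phase with the matching partition of s
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result


r = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 5, "a": "z"}]
s = [{"id": 1, "b": 10}, {"id": 5, "b": 50}, {"id": 7, "b": 70}]
joined = sorted(partitioned_hash_join(r, s, "id"), key=lambda row: row["id"])
```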
END
File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of File Organization to
organize file records −
Sorted Files: Best if records must be retrieved in some order, or only a 'range' of records is needed.
Indexing
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes
on which the indexing has been done. Indexing in database systems is like what we see in books.
Indexing is defined based on its indexing attributes.
Non-clustering index: Non-clustering indexes are used to quickly find all records whose values in a certain
field satisfy some condition. The search key of a non-clustering index specifies an order different from the
sequential order of the file (the data and the index are ordered differently). Non-clustering indexes
are also called secondary indexes.
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster
but requires more space to store index records themselves. Index records contain a search key value and a pointer
to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key
and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach the
actual location of the data. If the data we are looking for is not where we directly reach by following the index,
then the system starts a sequential search until the desired data is found.
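The two lookup styles can be compared on a small sorted file. In this sketch the file is a list of (key, value) records split into blocks of two; the block size and all names are assumptions for illustration.

```python
import bisect

# A sorted data file split into blocks of two records each.
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f")]
BLOCK_SIZE = 2
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search-key value.
dense_index = {key: pos for pos, (key, _) in enumerate(records)}

# Sparse index: one entry per block (the first key in the block).
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    pos = dense_index.get(key)
    return None if pos is None else records[pos][1]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key ...
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    # ... then search sequentially inside that block.
    for k, value in blocks[i]:
        if k == key:
            return value
    return None
```

The sparse index holds three entries instead of six, at the price of a short sequential scan inside the block.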
Multilevel Index
Index records comprise search-key values and data pointers. The multilevel index is stored on the disk along with
the actual database files. As the size of the database grows, so does the size of the indices. There is an immense
need to keep the index records in the main memory to speed up the search operations. If the single-level index is
used, then a large size index cannot be kept in memory which leads to multiple disk accesses.
A multi-level Index helps in breaking down the index into several smaller indices to make the outermost level so
small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.
B+ Tree
A B+ tree is a balanced multiway search tree that follows a multi-level index format. The leaf nodes of a B+ tree
hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced.
Additionally, the leaf nodes are linked using a linked list; therefore, a B+ tree can support random access as well as
sequential access.
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is of the order n where n is fixed for every
B+ tree.
Internal nodes −
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes −
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to the next leaf node and forms a linked list.
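Search and range scan over this structure can be sketched on a small hand-built tree. Insertion and node splitting are omitted, and the class names and the convention that keys equal to a separator go to the right child are assumptions of this example.

```python
import bisect


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values
        self.next = None  # pointer to the next leaf (linked list)


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def find_leaf(node, key):
    """Descend from the root to the leaf that may contain the key."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, low, high):
    """Random access to the first leaf, then sequential access along the links."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append((k, v))
        leaf = leaf.next
    return out


l1 = Leaf([5, 10], ["r5", "r10"])
l2 = Leaf([15, 20], ["r15", "r20"])
l3 = Leaf([25, 30], ["r25", "r30"])
l1.next, l2.next = l2, l3
root = Internal([15, 25], [l1, l2, l3])
```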
Hash Organization
Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Bucket − A hash file stores data in bucket format. The bucket is considered a unit of storage. A bucket typically
stores one complete disk block, which in turn can store one or more records.
Hash Function − A hash function, h, is a mapping function that maps all the set of search keys K to the address
where actual records are placed. It is a function from search keys to bucket addresses.
Types of Hashing Techniques
There are mainly two types of hashing methods/techniques:
1 Static Hashing
2 Dynamic Hashing/Extendible hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address.
Static hashing is further divided into:
1. Open hashing
2. Closed hashing.
Linear Probing − When a hash function generates an address at which data is already stored, the next free bucket
is allocated to it. This mechanism is called Open Hashing.
Data bucket – Data buckets are memory locations where the records are stored. It is also known as a Unit of
storage.
Key: A DBMS key is an attribute or set of an attribute that helps you to identify a row(tuple) in a relation(table).
This allows you to find the relationship between two tables.
Hash function: A hash function, is a mapping function that maps all the set of search keys to the address where
actual records are placed.
Linear Probing – Linear probing uses a fixed interval between probes. In this method, the next available data block
is used to enter the new record, instead of overwriting the older record.
Quadratic probing – It helps you to determine the new bucket address. The interval between probes is increased
by adding the successive outputs of a quadratic polynomial to the starting value given by the original computation.
Hash index – It is an address of the data block. A hash function can range from a simple mathematical function to a
complex one.
Double Hashing – Double hashing is a method used in hash tables to resolve collisions: a second hash function
determines the interval between probes.
Bucket Overflow: The condition of bucket overflow is called a collision. This is a fatal state for any static hash
function.
Hashing function h(r) Mapping from the index’s search key to a bucket in which the (data entry for) record r
belongs.
What is Collision?
A hash collision is a state in which the hashes of two or more items in the data set map to the same
place in the hash table.
How to deal with Hashing Collision?
There are two techniques that you can use to resolve a hash collision:
1. Rehashing: This method invokes a secondary hash function, which is applied repeatedly until an empty slot is
found, where the record is placed.
2. Chaining: The chaining method builds a linked list of items whose keys hash to the same value. This method
requires an extra link field for each table position.
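Chaining, together with the linear probing described earlier, can be sketched as two small tables. The table size of 8 and the class names are assumptions; deletion and resizing are left out.

```python
class ChainedTable:
    """Each bucket holds a list of (key, value) pairs that hash to it."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None


class LinearProbingTable:
    """On a collision, try the next slot until a free one is found."""
    def __init__(self, size=8):
        self.slots = [None] * size

    def put(self, key, value):
        n = len(self.slots)
        for step in range(n):
            i = (hash(key) + step) % n
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        n = len(self.slots)
        for step in range(n):
            i = (hash(key) + step) % n
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


chained, probed = ChainedTable(), LinearProbingTable()
for table in (chained, probed):
    table.put(1, "one")
    table.put(9, "nine")  # 9 % 8 == 1, so it collides with key 1
```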
Indexing
An index is an on-disk structure associated with a table or view that speeds the retrieval of rows from the table or
view. An index contains keys built from one or more columns in the table or view. Indexes are automatically created
when PRIMARY KEY and UNIQUE constraints are defined on table columns. An index on a file speeds up selections
on the search key fields for the index.
The index is a collection of buckets.
Bucket = primary page plus zero or more overflow pages. Buckets contain data entries.
Types of Indexes
1 Clustered Index
2 Non-Clustered Index
3 Column Store Index
4 Filtered Index
5 Hash-based Index
6 Dense primary index
7 sparse index
8 b or b+ tree index
9 FK index
10 Secondary index
11 File Indexing – B+ Tree
12 Bitmap Indexing
13 Inverted Index
14 Forward Index
15 Function-based index
16 Spatial index
17 Bitmap Join Index
18 Composite index
19 Primary key index If the search key contains a primary key, then it is called a primary index.
20 Unique index: Search key contains a candidate key.
21 Multilevel index(A multilevel index considers the index file, which we will now refer to as the first (or
base) level of a multilevel index, as an ordered file with a distinct value for each K(i))
22 Inner index: The main index file for the data
23 Outer index: A sparse index on the index
END
Database users:
The CREATE USER command creates a user. It also automatically creates a schema for that user.
The schema is also the logical structure used to process the data in the database (memory component). It is created
automatically by Oracle when the user is created.
Create Profile
SQL> Create profile clerk limit
sessions_per_user 1
idle_time 30
connect_time 600;
Create User
SQL> Create user dcranney
identified by bedrock
default tablespace users
temporary tablespace temp_ts
profile clerk;
CREATE A ROLE
SYS> create role SHARIF IDENTIFIED BY devdb;
GRANTING SYSTEM PRIVILEGES TO A ROLE
SYS> GRANT create table, create view, create synonym, create sequence, create trigger to SHARIF;
Grant succeeded
GRANT A ROLE TO USERS
SYS> grant SHARIF to sony, scott;
ACTIVATE A ROLE
SCOTT> set role SHARIF identified by devdb;
TO DISABLE ALL ROLES
SCOTT> set role none;
GRANT A PRIVILEGE
SYS> grant create any table to SHARIF;
REVOKE A PRIVILEGE
SYS> revoke create any table from SHARIF;
SET ALL ROLES ASSIGNED TO scott AS DEFAULT
SYS> alter user scott default role all;
SYS> alter user scott default role SHARIF;
SONY can access the sham.emp table because the SELECT privilege on it was granted to PUBLIC, so sham.emp is
available to every user of the database. SONY has created a view EMP_VIEW based on sham.emp.
Note: If you revoke an object privilege from a user, that privilege is also revoked from everyone to whom that user
granted it.
Note: If you grant the RESOURCE role to a user, the user may also receive the UNLIMITED TABLESPACE system privilege,
depending on the Oracle release; this privilege overrides all explicit tablespace quotas. The UNLIMITED
TABLESPACE system privilege lets the user allocate as much space as needed in any tablespace of the database.
Database account locks and unlock
Alter user admin identified by admin account lock;
Select u.username from all_users u where u.username like 'info';
END
Kimball’s approach – creating data marts first and then developing a data warehouse database incrementally from
independent data marts.
Type is Denormalized.
Focuses on infrastructure functionality using multidimensional database management systems (MDBMS) like star
schema or snowflake schema
Data Mart
A data mart(s) can be created from an existing data warehouse—the top-down approach—or other sources, such as
internal operational systems or external data. Similar to a data warehouse, it is a relational database that stores
transactional data (time value, numerical order, reference to one or more objects) in columns and rows making it
easy to organize and access.
Data marts and data warehouses are both highly structured repositories where data is stored and managed until it
is needed. Data marts are designed for a specific line of business, while a DWH is designed for enterprise-wide
use. A data mart is typically under 100 GB and a DWH is typically over 100 GB; a data mart covers a single subject,
while a DWH is a multiple-subject repository. Data marts are classified as independent or dependent data marts.
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an
organization.
Types of Dimensions
Conformed Dimensions: A conformed dimension has the same meaning for every fact to which it relates. This dimension
is used in more than one star schema or data mart.
Outrigger Dimensions: A dimension may have a reference to another dimension table. These secondary dimensions
are called outrigger dimensions. This kind of dimension should be used carefully.
Shrunken Rollup Dimensions: Shrunken rollup dimensions are a subset of the rows and columns of a base dimension.
These kinds of dimensions are useful for developing aggregated fact tables.
Dimension-to-Dimension Table Joins: Dimensions may have references to other dimensions. However, these relationships
can be modeled with outrigger dimensions.
Role-Playing Dimensions: A single physical dimension can be referenced multiple times in a fact table, as each
reference links to a logically distinct role for the dimension.
Junk Dimensions: A junk dimension is a collection of random transactional codes, flags, or text attributes. It may
not logically belong to any specific dimension.
Degenerate Dimensions: A degenerate dimension has no corresponding dimension table. It is used in transaction
and accumulating snapshot fact tables. This kind of dimension does not have its own dimension table, as it
is derived from the fact table.
Swappable Dimensions: They are used when the same fact table is paired with different versions of the same
dimension.
1. Logical Extraction
The logical Extraction method in turn has two methods:
i) Full Extraction
For example, exporting a complete table in the form of a flat file.
ii) Incremental Extraction
In incremental extraction, the changes in source data need to be tracked since the last successful extraction.
2. Physical Extraction
Physical extraction has two methods: Online and Offline extraction:
i) Online Extraction
In this process, the extraction process directly connects to the source system and extracts the source data.
ii) Offline Extraction
The data is not extracted directly from the source system but is staged explicitly outside the source system.
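The difference between full and incremental extraction can be sketched as follows. This is a minimal illustration, not a production ETL job: the `orders` table, its `updated_at` change-tracking column, and the sample dates are assumptions made for the example, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3

def extract_incremental(conn, last_extracted_at):
    """Return only the rows changed since the last successful extraction."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    )
    return cur.fetchall()

# Hypothetical source table with a timestamp column used to track changes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 25.5, "2024-01-03"), (3, 7.0, "2024-01-05")],
)

# Full extraction: export the complete table.
full = conn.execute("SELECT id, amount, updated_at FROM orders").fetchall()

# Incremental extraction: only rows changed after the previous run.
delta = extract_incremental(conn, "2024-01-02")

# The latest change seen becomes the starting point for the next run.
new_mark = max(row[2] for row in delta)
```

Storing `new_mark` after each successful run is one simple way to track "the last successful extraction"; real systems may use change logs or triggers instead.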
Data Capture
Data capture is an advanced extraction process. It enables the extraction of data from documents, converting it
into machine-readable data. This process is used to collect important organizational information when the source
systems are in the form of paper/electronic documents (receipts, emails, contacts, etc.)
Characteristics of OLAP
FASMI is a term formed from the first letters of the characteristics of OLAP methods:
Fast
The system is targeted to deliver most responses to the client within about five seconds, with the
simplest analyses taking no more than one second and very few taking more than 20 seconds.
Analysis
The system can cope with any business logic and statistical analysis that is relevant for the application and
the user, and keep it easy enough for the target client. Although some preprogramming may be needed, it is not
acceptable if all application definitions have to be preprogrammed. The user must be able to define new ad hoc
calculations as part of the analysis and to report on the data in any desired way, without having to program. This
excludes products (like Oracle Discoverer) that do not allow adequate end-user-oriented calculation flexibility.
Share
The system implements all the security requirements for confidentiality and, if multiple write access
is needed, concurrent update locking at an appropriate level. Not all applications need the user to write data
back, but for the increasing number that do, the system should be able to manage multiple updates in a timely,
secure manner.
Multidimensional
This is the basic requirement. OLAP system must provide a multidimensional conceptual view of the data, including
full support for hierarchies, as this is certainly the most logical method to analyze businesses and organizations.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations in
multidimensional data.
Here is the list of OLAP operations −
1. Roll-up
2. Drill-down
3. Slice and dice
4. Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −
By climbing up a concept hierarchy for a dimension
By dimension reduction
The following diagram illustrates how roll-up works.
Drill-down
Drill-down is performed by stepping down a concept hierarchy for the dimension time.
Initially, the concept hierarchy was "day < month < quarter < year."
On drilling down, the time dimension descended from the level of the quarter to the level of the month.
When drill-down is performed, one or more dimensions from the data cube are added.
It navigates the data from less detailed data to highly detailed data.
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the
following diagram that shows how a slice works.
Here Slice is performed for the dimension "time" using the criterion time = "Q1".
Dice
The dice operation on the cube, based on the following selection criteria, involves three dimensions.
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view to provide an alternative
presentation of data. Consider the following diagram that shows the pivot operation.
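The operations above can be sketched on a tiny cube held in a Python dictionary. This is only an illustration: the cell values, the city-to-country hierarchy, and the function names are assumptions made for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube cells: (location, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Modem"): 825,
    ("Vancouver", "Q1", "Mobile"): 400,
    ("Vancouver", "Q3", "Phone"): 512,
    ("Chicago", "Q1", "Modem"): 440,
}
# Assumed concept hierarchy for the location dimension: city < country.
COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def roll_up(cells):
    """Climb the location hierarchy from city to country, summing the measure."""
    out = defaultdict(int)
    for (city, quarter, item), value in cells.items():
        out[(COUNTRY[city], quarter, item)] += value
    return dict(out)

def slice_(cells, quarter):
    """Fix the time dimension to one value; the result is a 2-D sub-cube."""
    return {(loc, item): v for (loc, q, item), v in cells.items() if q == quarter}

def dice(cells, locations, quarters, items):
    """Select a sub-cube by restricting three dimensions at once."""
    return {
        key: v for key, v in cells.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }

def pivot(table):
    """Rotate a 2-D view by swapping its two axes."""
    return {(b, a): v for (a, b), v in table.items()}

by_country = roll_up(cube)
q1 = slice_(cube, "Q1")
sub_cube = dice(cube, {"Toronto", "Vancouver"}, {"Q1", "Q2"}, {"Mobile", "Modem"})
rotated = pivot(q1)
```

Drill-down is the reverse of `roll_up` and is not shown, because it requires the more detailed cells to be available in the first place.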
Here data can be made smooth by fitting it to a regression function. The regression used may be linear (having one
independent variable) or multiple (having multiple independent variables).
Regression is a technique that conforms data values to a function. Linear regression involves finding the “best” line
to fit two attributes (or variables) so that one attribute can be used to predict the other.
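As a small worked example of fitting the "best" line, the sketch below uses ordinary least squares with one independent variable. The sample values are made up for illustration; the function name `fit_line` is not taken from any library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical noisy observations of two attributes.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)

# Smoothing: replace each observed value with the value on the fitted line.
smoothed = [slope * x + intercept for x in xs]
```

Once the line is fitted, one attribute can be used to predict the other by evaluating `slope * x + intercept` for a new `x`.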
Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset which do not match an
expected pattern or expected behavior. This technique can be used in a variety of domains, such as intrusion
detection, fraud detection, or fault detection. Outlier detection is also called Outlier Analysis or Outlier mining.
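One simple way to flag items that do not match the expected pattern is a z-score test, sketched below. The threshold of 2 standard deviations and the sample amounts are assumptions for the example; many other outlier tests exist.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - m) / s > threshold]

# Hypothetical transaction amounts with one suspicious value.
amounts = [10, 12, 11, 13, 12, 11, 95]
suspicious = find_outliers(amounts)
```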
Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction data for a certain
period.
Prediction:
Where the end user can predict the most repeated things.
Information Retrieval (IR) can be defined as a software program that deals with the organization, storage, retrieval,
and evaluation of information from document repositories, particularly textual information.
An Information Retrieval (IR) model selects and ranks the document that is required by the user or the user has
asked for in the form of a query.
Information retrieval: The software program deals with the organization, storage, retrieval, and evaluation
of information from document repositories, particularly textual information. Small errors are likely to go
unnoticed.
Data retrieval: Data retrieval deals with obtaining data from a database management system such as ODBMS. It is
a process of identifying and retrieving the data from the database, based on the query provided by the user or
application. A single error object means total failure.
END
BPMN Task
A logical unit of work that is carried out as a single whole
Resource
A person or a machine that can perform specific tasks
Activity
The performance of a task by a resource
Case
A sequence of activities performed to achieve some goal, an order, an insurance claim, a car assembly
Work item
The combination of a case and a task that is just to be carried out
Process
Describes how a particular category of cases shall be managed
Control flow construct ->sequence, selection, iteration, parallelisation
BPMN concepts
Events
Things that happen instantaneously (e.g. an invoice
Activities
Units of work that have a duration (e.g. an activity to
Process, events, and activities are logically related
Sequence
The most elementary form of relation is Sequence, which implies that one event or activity A is followed by
another event or activity B.
Start event
Circles used with a thin border
End event
Circles used with a thick border
Label
Give a name or label to each activity and event
Token
Once a process instance has been spawned/born, we use a token to identify the progress (or state) of that
instance.
Gateway
There is a gating mechanism that either allows or disallows the passage of tokens through the gateway
Split gateway
A point where the process flow diverges
Have one incoming sequence flow and multiple outgoing sequence flows (representing the branches that diverge)
Join gateway
A point where the process flow converges
Mutually exclusive
Only one of them can be true every time the XOR split is reached by a token
END
Toss immediate strategy: Frees the space occupied by a block as soon as the final tuple of that block has been
processed
Example: Suppose we have an employee table with columns such as email, name, CNIC... Empid = 12 bytes, name = 59
bytes, CNIC = 15 bytes.... so that all employee table columns total 230 bytes. This means each row in the employee
table takes 230 bytes, so we can store around 2 rows in one block. For example, say your hard drive has a
block size of 4K, and you have a 4.5K file. This requires 8K to store on your hard drive (2 whole blocks), but only 4.5K
on a floppy (9 floppy-size blocks).
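The arithmetic in this example can be checked with a short sketch. The 512-byte block size is an assumption inferred from the example itself (9 floppy-size blocks for a 4.5K file); the function names are made up for illustration.

```python
import math

def rows_per_block(block_size, row_size):
    """Whole rows that fit in one block (rows are assumed not to span blocks)."""
    return block_size // row_size

def storage_used(file_size, block_size):
    """Blocks needed for a file and the space they occupy (whole blocks only)."""
    blocks = math.ceil(file_size / block_size)
    return blocks, blocks * block_size

ROW_SIZE = 230          # bytes per employee row, from the example
FILE_SIZE = 4608        # a 4.5K file, in bytes

rows = rows_per_block(512, ROW_SIZE)           # assumed 512-byte block
hard_drive = storage_used(FILE_SIZE, 4096)     # 4K blocks
floppy = storage_used(FILE_SIZE, 512)          # 512-byte blocks
```

The unused space in the last hard-drive block (8K allocated for a 4.5K file) is the cost of allocating storage in whole blocks.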
Example:
Architecture: The buffer manager stages pages from external storage to the main memory buffer pool. File and
index layers make calls to the buffer manager.
What is the steal approach in DBMS? What are the Buffer Manager Policies/Roles? Data storage on disk?
Note: The buffer manager moves pages between the main memory buffer pool (volatile memory) and the external
storage disk (non-volatile storage). When execution starts, the file and index layers make calls to the buffer
manager.
The steal approach is used when the buffer manager replaces an existing page in the cache, one that has been updated
by a transaction not yet committed, with another page requested by another transaction.
No-force. The force rule means that REDO will never be needed during recovery since any committed transaction
will have all its updates on disk before it is committed.
The deferred update (NO-UNDO) recovery scheme follows a no-steal approach. However, typical database systems
employ a steal/no-force strategy. The advantage of steal is that it avoids the need for very large buffer space to
Steal/No-Steal
Similarly, it would be easy to ensure atomicity with a no-steal policy. The no-steal policy states
that pages cannot be evicted from memory (and thus written to disk) until the transaction commits.
With a steal policy, support for undo is needed: removing the effects of an uncommitted transaction from the disk
Force/No Force
Durability can be a very simple property to ensure if we use a force policy. The force policy states
that all data pages modified by a transaction are forced to disk before the transaction commits.
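The four policy combinations can be illustrated with a toy buffer pool. This is a simplified sketch under stated assumptions: pages are plain values, a page is owned by at most one uncommitted transaction, and no log is kept, so the comments only point out where UNDO or REDO information would be required. The class and method names are hypothetical.

```python
class BufferPool:
    def __init__(self, steal, force):
        self.steal = steal    # may dirty pages of uncommitted transactions be evicted?
        self.force = force    # must modified pages reach disk before commit completes?
        self.pages = {}       # page_id -> {"value", "dirty", "owner"}
        self.disk = {}        # page_id -> value (non-volatile storage)

    def write(self, txn, page_id, value):
        """A transaction updates a page in the buffer pool."""
        self.pages[page_id] = {"value": value, "dirty": True, "owner": txn}

    def evict(self, page_id):
        """Try to remove a page from the pool; return False if the policy forbids it."""
        page = self.pages[page_id]
        if page["owner"] is not None and not self.steal:
            return False      # no-steal: uncommitted changes must stay in memory
        if page["dirty"]:
            # With steal this may put uncommitted data on disk, so UNDO is needed.
            self.disk[page_id] = page["value"]
        del self.pages[page_id]
        return True

    def commit(self, txn):
        """Commit a transaction, flushing its pages only under the force policy."""
        for page_id, page in self.pages.items():
            if page["owner"] == txn:
                page["owner"] = None
                if self.force:
                    self.disk[page_id] = page["value"]
                    page["dirty"] = False
                # no-force: the page stays dirty in memory, so REDO is needed
                # if the system crashes before it is written.

# No-steal/force: simplest recovery, but eviction is refused before commit.
strict = BufferPool(steal=False, force=True)
strict.write("T1", "P1", "new")
refused = strict.evict("P1")
strict.commit("T1")

# Steal/no-force: flexible buffering, but recovery needs both UNDO and REDO.
flexible = BufferPool(steal=True, force=False)
flexible.write("T2", "P2", "new")
flexible.commit("T2")
```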
Note: First-fit and best-fit better than worst-fit in terms of speed and storage utilization
Static and Dynamic Loading:
Loading a process into the main memory is done by a loader. There are two different types of loading:
Static loading:- loading the entire program into a fixed address. It requires more memory space.
Dynamic loading:- Without dynamic loading, the entire program and all data of a process must be in physical memory
for the process to execute, so the size of a process is limited to the size of physical memory. With dynamic loading,
a routine is not loaded until it is called, which gives better memory-space utilization.
Methods Involved in Memory Management
There are various methods and with their help Memory Management can be done intelligently by the Operating
System:
Fragmentation
As processes are loaded and removed from memory, the free memory space is broken into little pieces. After some
time, processes cannot be allocated to memory blocks because the blocks are too small, and the memory
blocks remain unused. This problem is known as Fragmentation.
Fragmentation Category −
1. External fragmentation
Total memory space is enough to satisfy a request or to reside a process in it, but it is not contiguous, so it cannot
be used.
2. Internal fragmentation
The memory block assigned to the process is bigger. Some portion of memory is left unused, as it cannot be used
by another process.
Two types of fragmentation are possible
1. Horizontal fragmentation
2. Vertical Fragmentation
Reconstruction of Hybrid Fragmentation
The original relation in hybrid fragmentation is reconstructed by performing union and full outer join.
3. Hybrid fragmentation can be achieved by performing horizontal and vertical partitions together.
4. Mixed fragmentation is a group of rows and columns in relation.
Segmentation
Segmentation is a memory management technique in which each job is divided into several segments of different
sizes, one for each module that contains pieces that perform related functions. Each segment is a different logical
address space of the program or A segment is a logical unit.
As shown in the following diagram, the Intel 386 uses segmentation with paging for memory management with a
two-level paging scheme
Swapping
Swapping is a mechanism in which a process can be swapped temporarily out of the main memory (or move) to
secondary storage (disk) and make that memory available to other processes. At some later time, the system
swaps back the process from the secondary storage to the main memory.
Though performance is usually affected by the swapping process it helps in running multiple and big processes in
parallel and that's the reason Swapping is also known as a technique for memory compaction.
Note: Bring a page into memory only when it is needed. The same page may be brought into memory several times
Paging
A page is also a unit of data storage. A page is loaded into the processor from the main memory. A page is made up
of unit blocks or groups of blocks. Pages have fixed sizes, usually 2k or 4k. A page is also called a virtual page or
memory page. When the transfer of pages occurs between main memory and secondary memory it is known as
paging.
Paging is a memory management technique in which process address space is broken into blocks of the same size
called pages (size is the power of 2, between 512 bytes and 8192 bytes). The size of the process is measured in the
number of pages.
Divide logical memory into blocks of the same size called pages.
Similarly, main memory is divided into small fixed-sized blocks of (physical) memory called frames and the size of a
frame is kept the same as that of a page to have optimum utilization of the main memory and to avoid external
fragmentation.
Divide physical memory into fixed-sized blocks called frames (size is the power of 2, between 512 bytes and 8192
bytes)
The basic difference between the magnetic tape and magnetic disk is that magnetic tape is used for backups
whereas, the magnetic disk is used as secondary storage.
Hard disk stores information in the form of magnetic fields. Data is stored digitally in the form of tiny magnetized
regions on the platter where each region represents a bit.
Microsoft SQL Server databases are stored on disk in two files: a data file and a log file
Note: To run a program of size n pages, need to find n free frames and load the program
Implementation of Page Table
The page table is kept in the main memory
Page-table base register (PTBR) points to the page table
Page-table length register (PTLR) indicates the size of the page table
In this scheme, every data/instruction access requires two memory accesses. One for the page table and one for
the data/instruction.
The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called
associative memory or translation look-aside buffers (TLBs)
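The effect of a TLB on address translation can be sketched as follows. This is a software model for illustration only: the 4 KB page size, the sample page table, and the class name `MMU` are assumptions, and a real TLB is hardware with a limited number of entries.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (a power of 2)

class MMU:
    def __init__(self, page_table):
        self.page_table = page_table   # page number -> frame number, kept in main memory
        self.tlb = {}                  # small cache of recent page -> frame translations
        self.memory_accesses = 0

    def translate(self, logical_address):
        """Map a logical address to a physical address, counting memory accesses."""
        page, offset = divmod(logical_address, PAGE_SIZE)
        if page in self.tlb:
            frame = self.tlb[page]           # TLB hit: no page-table access needed
        else:
            self.memory_accesses += 1        # TLB miss: read the page table in memory
            frame = self.page_table[page]
            self.tlb[page] = frame
        self.memory_accesses += 1            # the data/instruction access itself
        return frame * PAGE_SIZE + offset

mmu = MMU({0: 5, 1: 2})
first = mmu.translate(4100)    # page 1, offset 4: TLB miss, two memory accesses
second = mmu.translate(4200)   # page 1 again: TLB hit, one memory access
```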
Input queue: the collection of processes on the disk that are waiting to be brought into memory to run the program.
Binding of Instructions and Data to Memory
Address binding of instructions and data to memory addresses can
happen at three different stages
Compile time: If the memory location is known a priori, absolute code can be generated; the code must be recompiled
if the starting location changes
Load time: Relocatable code must be generated if the memory location is not known at compile time
Execution time: Binding is delayed until run time if the process can be moved during its execution from one memory
segment to another. Hardware support for address maps is needed (e.g., base and limit registers).
Multistep Processing of a User Program in memory is as follows:
The concept of a logical address space that is bound to separate physical address space is central to proper
memory management
Logical address – generated by the CPU; also referred to as virtual address
Physical address – address seen by the memory unit
Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical
(virtual) and physical addresses differ in the execution-time address-binding scheme
The user program deals with logical addresses; it never sees the real physical addresses
The logical address space of a process can be noncontiguous; the process is allocated physical memory whenever
the latter is available
END
Oracle Database creates server processes to handle the requests of user processes connected to an instance. A
server process can be either of the following: a dedicated server process, which services only one user process, or a
shared server process, which can service multiple user processes.
We can see the listener has the default name of "LISTENER" and is listening for TCP connections on port 1521.
The listener process is started when the server is started (or whenever the instance is started). The listener is only
required for connections from other servers, and the DBA performs the creation of the listener process. When a
new connection comes in over the network, the listener passes the connection to Oracle.
the production database and import it back into the production databases. Cloning can be done on a different host
or the same host even if it is different from the standby database.
Database Cloning can be done using the following methods,
Cold Cloning
Hot Cloning
RMAN Cloning
Oracle allocates logical database space for all data in a database. The units of database space allocation are data
blocks, extents, and segments.
The Relationships Among Segments, Extents, Data Blocks in the data file, Oracle block, and OS block:
Oracle Block: At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle
blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on a disk.
Oracle Extent: The next level of logical database space is an extent. An extent is a specific number of contiguous
data blocks allocated for storing a specific type of information. An extent cannot span data files.
Oracle Segment: The level of logical database storage greater than an extent is called a segment. A segment is a set
of extents, each of which has been allocated for a specific data structure and all of which are stored in the same
tablespace. For example, each table's data is stored in its data segment, while each index's data is stored in its index
segment. If the table or index is partitioned, each partition is stored in its segment.
Data block: Oracle manages the storage space in the data files of a database in units called data blocks. A data
block is the smallest unit of data used by a database.
Oracle block and data block are equal in data storage by logical and physical respectively like table's (logical) data is
stored in its data segment.
The high water mark is the boundary between used and unused space in a segment.
Operating system block: The data consisting of the data block in the data files are stored in operating system
blocks.
OS Page: The smallest unit of storage that can be atomically written
to non-volatile storage is called a page
Details of Data storage in Oracle Blocks:
An extent is a set of logically contiguous data blocks allocated for storing a specific type of information. In the
Figure above, the 24 KB extent has 12 data blocks, while the 72 KB extent has 36 data blocks.
A segment is a set of extents allocated for a specific database object, such as a table. For example, the data for the
employee's table is stored in its data segment, whereas each index for employees is stored in its index segment.
Every database object that consumes storage consists of a single segment.
A big file tablespace eases database administration because it consists of only one data file. The
single data file can be up to 128TB (terabytes) in size if the tablespace block size is 32KB; if you
use the more common 8KB block size, 32TB is the maximum size of a big file tablespace.
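The sizes quoted above follow from simple arithmetic, sketched below. The limit of 2^32 blocks per big file data file is an assumption used here because it reproduces the 32TB and 128TB figures; the 2 KB block size for the extents is likewise inferred from the 24 KB / 12 blocks example.

```python
MAX_BLOCKS_PER_BIGFILE = 2 ** 32   # assumed block limit of a single big file data file
KB_PER_TB = 1024 ** 3

def max_bigfile_size_tb(block_size_kb):
    """Maximum big file tablespace size in TB for a given block size in KB."""
    return MAX_BLOCKS_PER_BIGFILE * block_size_kb // KB_PER_TB

def blocks_in_extent(extent_kb, block_size_kb):
    """Number of data blocks that make up an extent."""
    return extent_kb // block_size_kb

size_8k = max_bigfile_size_tb(8)     # common 8KB block size
size_32k = max_bigfile_size_tb(32)   # 32KB block size
small_extent = blocks_in_extent(24, 2)
large_extent = blocks_in_extent(72, 2)
```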
Broad View of Logical and Physical Structure of Database System in Oracle.
Oracle Database must use logical space management to track and allocate the extents in a tablespace. When a
database object requires an extent, the database must have a method of finding and providing it. Similarly, when
an object no longer requires an extent, the database must have a method of making the free extent available.
Oracle Database manages space within a tablespace based on the type that you create.
SGA (System Global Area) is an area of memory (RAM) allocated when an Oracle Instance starts up. The SGA's size
and function are controlled by initialization (INIT.ORA or SPFILE) parameters.
In general, the SGA consists of the following subcomponents, as can be verified by querying the V$SGAINFO:
SELECT * FROM v$sgainfo;
The common components are:
Data buffer cache - cache data and index blocks for faster access.
Shared pool - cache parsed SQL and PL/SQL statements.
Dictionary Cache - information about data dictionary objects.
Redo Log Buffer - redo entries (changes made to the database) that are not yet written to the redo log files.
JAVA pool - caching parsed Java programs.
Streams pool - cache Oracle Streams objects.
Large pool - used for backups, UGAs, etc.
Automatic Shared Memory Management simplifies the configuration of the SGA and is the recommended
memory configuration. To use Automatic Shared Memory Management, set the SGA_TARGET initialization
parameter to a nonzero value and set the STATISTICS_LEVEL initialization parameter to TYPICAL or ALL. The value
of the SGA_TARGET parameter should be set to the amount of memory that you want to dedicate to the SGA. In
response to the workload on the system, the automatic SGA management distributes the memory appropriately
for the following memory pools:
1. Database buffer cache (default pool)
2. Shared pool
3. Large pool
4. Java pool
5. Streams pool
END
SCNs to mark the SCN before which all changes are known to be on disk so that recovery avoids applying
unnecessary redo. The database also uses SCNs to mark the point at which no redo exists for a set of data so that
recovery can stop.
SCNs occur in a monotonically increasing sequence. Oracle Database increments SCNs in the system global area
(SGA).
Image Backup/mirror backup
A full image backup, or mirror backup, is a replica of everything on your computer's hard drive, from the operating
system, boot information, apps, and hidden files to your preferences and settings. Imaging software not only
captures individual files but everything you need to get your system running again.
Backup sets are logical entities produced by the RMAN BACKUP command.
Image copies are exact byte-for-byte copies of files. RMAN prefers to use an image copy over a backup set
Restore Database backup by:
Flashback in Oracle is a set of tools that allow System Administrators and users to view and even manipulate the
past state of data without having to recover to a fixed point in time. Using the flashback command, we can pull a
table out of the recycle bin. The Flashback is complete; this way, we restore the table. At the physical level, Oracle
Flashback Database provides a more efficient data protection alternative to database point-in-time recovery
(DBPITR). If the current data files have unwanted changes, then you can use the RMAN command FLASHBACK
DATABASE to revert the data files to their contents at a past time.
Database Exports/Imports
Data Pump: Export the HR schema to a dump file named SCHEMA.DMP by issuing the
following command at the system command prompt:
EXPDP SYSTEM/PASSWORD SCHEMAS=HR DIRECTORY=DMPDIR DUMPFILE=SCHEMA.DMP
LOGFILE=EXPSCHEMA.LOG
IMPDP USER/PASSWORD@DB_NAME DIRECTORY=DATA_PUMP_DIR DUMPFILE=DUMP_NAME.DMP
SCHEMAS=EMR FROMUSER=MIS TOUSER=EMR
Crash recovery and Log-Based Recovery
The log is a sequence of records. The log of each transaction is maintained in some stable storage so that if any
failure occurs, then it can be recovered from there.
Checkpoint
The checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked, and log
records are created as the steps of each transaction are executed.
A checkpoint declares a point before which all the log records are stored permanently on the storage disk and the
database is in a consistent state. In the case of a crash, work and time are saved because the system can restart
from the checkpoint. Checkpointing is a quick way to limit the number of log records to scan on recovery.
Store the LSN of the most recent checkpoint at a master record on a disk
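How a checkpoint limits the log scan can be sketched with a toy recovery routine. This is a simplified illustration, assuming the checkpoint is taken when no transaction is active, so only records after the last checkpoint are examined. The log record layout and the function name `recover` are made up for the example.

```python
def recover(log, db):
    """Redo committed and undo uncommitted writes found after the last checkpoint."""
    start = 0
    for i, record in enumerate(log):
        if record[0] == "checkpoint":
            start = i + 1              # everything before this point is already on disk
    tail = log[start:]

    committed = {record[1] for record in tail if record[0] == "commit"}

    # REDO: reapply writes of committed transactions in log order.
    for record in tail:
        if record[0] == "write" and record[1] in committed:
            _, _, item, _old, new = record
            db[item] = new

    # UNDO: roll back writes of uncommitted transactions in reverse order.
    for record in reversed(tail):
        if record[0] == "write" and record[1] not in committed:
            _, _, item, old, _new = record
            db[item] = old
    return db

# Write records are ("write", transaction, item, old_value, new_value).
log = [
    ("write", "T1", "A", 100, 90),
    ("commit", "T1"),
    ("checkpoint",),
    ("write", "T2", "B", 50, 60),
    ("commit", "T2"),
    ("write", "T3", "A", 90, 0),       # T3 never committed
]

# Disk state at the crash: T3's write reached disk, T2's write did not.
disk = {"A": 0, "B": 50}
recovered = recover(log, disk)
```

Only the three records after the checkpoint are processed; T1's records are skipped because its changes were already on disk when the checkpoint was taken.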
System Catalog
A repository of information describing the data in the database (metadata, data about data)
Data Replication
Replication is the process of copying and maintaining database objects in multiple databases that make up a
distributed database system. Replication can improve the performance and protect the availability of applications
because alternate data access options exist.
Oracle provides its own set of tools to replicate Oracle and integrate it with other databases.
The catalog is needed to keep track of the location of each fragment & replica
Data replication techniques
Synchronous vs. asynchronous
Synchronous: all replicas are up-to-date
END
CPU, RAM, Heap Size, and Hard Disk Space Requirements for OMS
CPU, RAM, and Hard Disk Space Requirements for Standalone Management Agent
CPU, RAM, and Hard Disk Space Requirements for Management Repository
CPU, RAM, Heap Size, and Hard Disk Space Requirements for OMS
The following table lists the CPU, physical memory (RAM), heap size, and hard disk space requirements for
installing an OMS (including a Management Agent that comes with it).
                                 Small                   Medium                       Large
                                 (1 OMS, <=1000          (2 OMSes for <=10,000        (> 2 OMSes, >=10,000
                                 targets, <100 agents)   targets and <1000 agents)    targets, >=1000 agents)
CPU Cores/Host                   2                       4                            8
RAM                              4 GB                    6 GB                         8 GB
RAM with ADP, JVMD               6 GB                    10 GB                        14 GB
Oracle WebLogic Server
JVM Heap Size                    512 MB                  1 GB                         2 GB
Hard Disk Space                  7 GB                    7 GB                         7 GB
Hard Disk Space with ADP, JVMD   10 GB                   12 GB                        14 GB
Note:
While installing an additional OMS (by cloning an existing one), if you have installed BI publisher on the source
host, then ensure that you have 7 GB of additional hard disk space on the destination host, so a total of 14 GB.
CPU, RAM, and Hard Disk Space Requirements for Standalone Management Agent
For a standalone Oracle Management Agent, ensure that you have 2 CPU cores per host, 512 MB of RAM, and 1 GB
of hard disk space.
CPU, RAM, and Hard Disk Space Requirements for Management Repository
describes the RAM and the hard disk space requirements for configuring a Management Repository:
In this table RAM and Hard Disk Space Requirements for Management Repository
The following table lists the hardware components that are required for Oracle Database on Windows x64.
Requirement Value
Virtual memory (swap) If physical memory is between 2 GB and 16 GB, then set
virtual memory to 1 times the size of the RAM
If physical memory is more than 16 GB, then set virtual
memory to 16 GB
SYSTEM_DRIVE:\
TEMP Program Oracle Data
Installation Type Space Files\Oracle\Inventory Home Files * Total
Check Task
Server Make and Architecture Confirm that server make, model, core architecture,
and host bus adaptors (HBA) or network interface
controllers (NICs) are supported to run with Oracle
Database and Oracle Grid Infrastructure.
Runlevel 3 or 5
Linux 4 GB 8 GB
UNIX 4 GB 8 GB
Windows 4 GB 8 GB
Bitmap Compression
Index-Value Compression
RLE Compression
zlib Compression
Index Files
The calculation for the space required to store the index files (essxxxxx.ind) uses the following factors:
Number of existing blocks (value DB from Table 238, Worksheet: List of Factors That Affect Disk Space
Requirements of a Database)
112 bytes—the size of an index entry
The minimum size for the index is 8,216,576 bytes (8 MB). To calculate the size of a database index,
including all index files, perform the following calculation:
number of existing blocks * 112 bytes = the size of database index
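The calculation can be written as a small helper. Treating the stated minimum as a floor on the result is an assumption made for this sketch, and the block counts are hypothetical values rather than figures from a real database.

```python
INDEX_ENTRY_BYTES = 112        # size of an index entry
MIN_INDEX_BYTES = 8_216_576    # minimum size for the index (8 MB)

def index_size_bytes(existing_blocks):
    """Estimated size of the database index, including all index files."""
    # Assumption: the minimum acts as a lower bound on the computed size.
    return max(existing_blocks * INDEX_ENTRY_BYTES, MIN_INDEX_BYTES)

small_db = index_size_bytes(1_000)      # 112,000 bytes computed, so the minimum applies
large_db = index_size_bytes(100_000)    # 100,000 * 112 bytes
```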
Type                                                  Size
Maximum possible file size with 16 K sized blocks     64 gigabytes (GB) (4,194,304 * 16,384 bytes)
2 KB 20,000
4 KB 40,000
8 KB 65,536
16 KB 65,536
Type Size
In this section, you will be installing the Oracle Database and creating an Oracle Home User account.
1. Expand the database folder that you extracted in the previous section. Double-click setup.
2. Click Yes in the User Account Control window to continue with the installation.
3. The Configure Security Updates window appears. Enter your email address and My Oracle Support
password to receive security issue notifications via email. If you do not wish to receive notifications via
email, deselect "I wish to receive security updates via My Oracle Support". Click Next to continue. Click
"Yes" in the confirmation window to confirm your preference.
4. The Download Software Updates window appears with the following options:
Select "Use My Oracle Support credentials for download" to download and apply the latest software
updates.
Select "Use pre-downloaded software updates" to apply software updates that you previously
downloaded.
5. The Select Installation Option window appears with the following options:
Select "Create and configure a database" to install the database, create database instance and configure
the database.
Select "Install database software only" to only install the database software.
Select "Upgrade an existing database" to upgrade the database that is already installed.
In this OBE, we create and configure the database. Select the Create and configure a database option
and click Next.
6. The System Class window appears. Select Desktop Class or Server Class depending on the type of
system you are using. In this OBE, we will perform the installation on a desktop/laptop. Select Desktop
class and click Next.
7. The Oracle Home User Selection window appears. Starting with Oracle Database 12c Release 1 (12.1),
Oracle Database on Microsoft Windows supports the use of an Oracle Home User, specified at the time
of installation. This Oracle Home User is used to run the Windows services for an Oracle Home, and is
similar to the Oracle User on Oracle Database on Linux. This user is associated with an Oracle Home and
cannot be changed to a different user post installation.
Note:
Different Oracle homes on a system can share the same Oracle Home User or use different Oracle Home
Users.
The Oracle Home User is different from an Oracle Installation User. The Oracle Installation User is the
user who requires administrative privileges to install Oracle products. The Oracle Home User is used to
run the Windows services for the Oracle Home.
If you select "Use Existing Windows User", the user credentials provided must be a standard Windows
user account (not an administrator).
If this is a single instance database installation, the user can be a local user, a domain user, or a managed
services account.
If this is an Oracle RAC database installation, the existing user must be a Windows domain user. The
Oracle installer will display an error if this user has administrator privileges.
If you select "Create New Windows User", the Oracle installer will create a new standard Windows user
account. This user will be assigned as the Oracle Home User. Please note that this user will not have
login privileges. This option is not available for an Oracle RAC Database installation.
If you select "Use Windows Built-in Account", the system uses the Windows Built-in account as the
Oracle Home User.
Select the Create New Windows User option. Enter the user name as OracleHomeUser1 and password
as Welcome1. Click Next.
Note: Remember the Windows User password. It will be required later to administer or manage
database services.
8. The Typical Install Configuration window appears. Click on a text field and then the balloon icon to
know more about the field. Note that by default, the installer creates a container database along with a
pluggable database called "pdborcl". The pluggable database contains the sample HR schema. Change
the Global database name to orcl. Enter the "Administrative password" as Oracle_1. This password will
be used later to log into administrator accounts such as SYS and SYSTEM. Click Next.
9. The prerequisite checks are performed and a Summary window appears. Review the settings and click
Install.
Note: Depending on your firewall settings, you may need to grant permissions to allow java to access
the network.
12. After the Database Configuration Assistant creates the database, you can navigate to
https://localhost:5500/em as a SYS user to manage the database using Enterprise Manager Database
Express. You can click "Password Management..." to unlock accounts. Click OK to continue.
13. The Finish window appears. Click Close to exit the Oracle Universal Installer.
14. To verify the installation, navigate to C:\Windows\system32 using Windows Explorer. Double-click
services. The Services window appears, displaying a list of services.
There is no need to spend time on the GUI at the very beginning. Thus, the developer can directly
start with implementing the business logic.
This is why Oracle APEX is well suited to creating rapid GUI prototypes without logic. Thus,
prospective customers can get an idea of how their future application will look.
“APEX has allowed us to migrate several disparate Excel and MS Access applications to a
consistent, secure, web-based environment. The speed and concurrency offered by APEX have
been exceptionally valuable.”
Oracle APEX
Database-centric web application development framework
Distinguishing Characteristics
Apex history
APEX is a very powerful development tool, which is used to create web-based database-centric
applications. The tool itself consists of a schema in the database with a lot of tables, views, and
PL/SQL code. It's available for every edition of the database. The techniques that are used with
this tool are PL/SQL, HTML, CSS, and JavaScript.
Before APEX there was WebDB, which was based on the same techniques. WebDB became part
of Oracle Portal and disappeared in silence. The difference between APEX and WebDB is that
WebDB generates packages that generate the HTML pages, while APEX generates the HTML
pages at runtime from the repository. Despite this approach APEX is amazingly fast.
Because the database is doing all the hard work, the architecture is fairly simple. We only have
to add a web server. We can choose one of the following web servers:
Oracle HTTP Server (OHS)
Embedded PL/SQL Gateway (EPG)
APEX Listener
APEX became available to the public in 2004 and then it was part of version 10g of the
database. At that time it was called HTMLDB and the first version was 1.5. Before HTMLDB, it
was called Oracle Flows, Oracle Platform, and Project Marvel.
About Installing the Oracle Application Express Release Included with the Oracle Database
Learn about the Oracle Application Express release included with Oracle Database releases.
Note:
Starting with Oracle Database 12c Release 2 (12.2), Oracle Application Express is included in the
Oracle Home on disk and is no longer installed by default in the database.
Oracle Application Express is included with the following Oracle Database releases:
Oracle Database 19c - Oracle Application Express Release 18.1.
Oracle Database 18c - Oracle Application Express Release 5.1.
Oracle Database 12c Release 2 (12.2)- Oracle Application Express Release 5.0.
Oracle Database 12c Release 1 (12.1) - Oracle Application Express Release 4.2.
Oracle Database 11g Release 2 (11.2) - Oracle Application Express Release 3.2.
Oracle Database 11g Release 1 (11.1) - Oracle Application Express Release 3.0.
The Oracle Database releases less frequently than Oracle Application Express. Therefore, Oracle
recommends updating to the latest Oracle Application Express release available on Oracle
Technology Network. To learn more, see the installation instructions for the appropriate Web
Listener in your environment.
Within each application, you can also specify a Compatibility Mode in the Application
Definition. The Compatibility Mode attribute controls the compatibility mode of the Application
Express runtime engine. Certain runtime behaviors change from release to release. You can use
this attribute to obtain specific application behavior. Compatibility Mode options include Pre
4.1, 4.1, 4.2, 5.0, 5.1/18.1, 18.2, 19.1, and 19.2.
Version 22.1
This release of Oracle APEX introduces Approvals and the Unified Task List, Simplified Create Page
wizards, Readable Application Export formats, and Data Generator. APEX 22.1 also brings several
enhancements to existing components, such as tokenized row search, an easy way to sort regions,
improvements to faceted search, additional customization of the PWA service worker, a more
streamlined developer experience, and much more!
Version 21.2
Released November 2021
This release of Oracle APEX introduces Smart Filters, Progressive Web Apps, and REST Service Catalogs.
APEX 21.2 also brings greater UI flexibility with Universal Theme, new and updated page components,
numerous improvements to the developer experience, and a whole lot more!
Version 21.1
Released May 12, 2021
This release of Oracle APEX introduces a number of exciting new features and enhancements to help
deliver a functionally rich and modern user experience.
Maps Region, New Application Data Loading
Version 20.2
Released October 21, 2020
Automations
Faceted Search Enhancements, Report Printing, REST Data Source Connector Plug-Ins, REST Data Source
Synchronization
Redwood UI
Universal Theme now supports a new Redwood Light theme style, available via Theme Roller. Refresh
your existing apps to uptake the latest version of Universal Theme and this new theme style.
Version 20.1
Released April 23, 2020
APEX + Redwood
The user interface of APEX and the App Builder has been refreshed to align with Redwood, Oracle's new
user experience design system. The new design and color scheme extends across the full developer
experience and provides refreshing new visuals. The appearance of APEX can now automatically switch
between the dark and light appearance modes based on your OS or platform setting, enabling APEX to
seamlessly integrate with your workflow.
Faceted Search Enhancements
Friendly URLs
The URL syntax for APEX apps has been simplified to allow for friendlier URLs at runtime. The new
syntax provides a Search Engine Optimization (SEO)-friendly URL structure which is far easier to
understand and provides immediate context as to where you are within an app. The URL no longer
features application or page numbers, and instead, uses the workspace path prefix, application and page
aliases, and standard web parameter syntax for its URL structure.
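As a rough illustration of the two URL styles described above, the following Python sketch builds both forms from the same inputs. It is only a sketch under stated assumptions: the function names, the base URL, the aliases, and the item names are hypothetical, and the `/r/` path segment and the `session` parameter reflect how friendly URLs are commonly seen at runtime rather than anything stated in this section.

```python
from urllib.parse import urlencode


def legacy_url(base, app_id, page_id, session, item_names=(), item_values=()):
    """Colon-delimited style: f?p=App:Page:Session:Request:Debug:ClearCache:Names:Values."""
    names = ",".join(item_names)
    values = ",".join(str(v) for v in item_values)
    # Request, Debug and ClearCache are left empty in this sketch.
    return f"{base}/f?p={app_id}:{page_id}:{session}::::{names}:{values}"


def friendly_url(base, workspace_path, app_alias, page_alias, session, items=None):
    """Friendly style: workspace path prefix, app and page aliases, then ?name=value pairs."""
    query = {"session": session}
    query.update(items or {})
    # The "/r/" segment is an assumption about the runtime path layout.
    return f"{base}/r/{workspace_path}/{app_alias}/{page_alias}?{urlencode(query)}"


if __name__ == "__main__":
    base = "https://host/ords"  # hypothetical host
    print(legacy_url(base, 100, 2, 12345, ["P2_ID"], [7]))
    print(friendly_url(base, "hr", "staff", "employee", 12345, {"p2_id": 7}))
```

The friendly form carries the same information as the colon-delimited form, but each value is named, which is what makes it easier to read and to index.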
Native PDF Printing
You can now print PDF files directly from Interactive Grids. This feature produces a PDF file which
includes formatting options such as highlighting, column grouping, and column breaks.
Mega Menus were also introduced in this version.
Version 19.1
Released March 29, 2019
This is the first release of 2019 and includes a number of new features, bug fixes, and general
improvements.
Dark Mode
The development environment of APEX can now be rendered with a darker color scheme, which reduces
eye strain and is especially helpful for those who are developing late into the night.
Create App from File
The data upload functionality in SQL Workshop and Create Application From a File have been
modernized with a new drag & drop user interface and support for native Excel, CSV, XML and JSON
documents. A new public data loading PL/SQL API is also available.
REST Enabled Forms
The built-in support for REST Enabled SQL and Web Sources has been extended to Form regions,
allowing read and write access to remote data sources.
Now that Oracle has positioned APEX as one of the important tools for building
applications in its Oracle Database Cloud Service, this interest will only grow. APEX shared
many of the characteristics of cloud computing even before cloud computing became popular.
These characteristics include:
Elasticity
Browser-based development and runtime
RESTful web services (REST stands for Representational State Transfer)
App builder utility and locking Page/Workspace/App Screen Shot
Workspace utility
Application Components
Supporting objects
Utility components
Remote development
Like below:
These include OAuth client, APEX User, Database Schema User, and OS User. While it is important
to ensure your ORDS web services are secured, you also need to consider what a client has access
to once authenticated. As a quick reminder, authentication confirms your identity and allows you
into the system, authorization decides what you can do once you are in.
Oracle REST Data Services is a Java EE-based alternative for Oracle HTTP Server and mod_plsql.
The Java EE implementation offers increased functionality including a command-line based
configuration, enhanced security, file caching, and RESTful web services.
Oracle REST Data Services also provides increased flexibility by supporting deployments using
Oracle WebLogic Server, GlassFish Server, Apache Tomcat, and a standalone mode.
The Oracle Application Express architecture requires some form of web server to proxy
requests between a web browser and the Oracle Application Express engine. Oracle REST Data
Services satisfies this need but its use goes beyond that of Oracle Application Express
configurations.
Oracle REST Data Services simplifies the deployment process because there is no Oracle home
required, as connectivity is provided using an embedded JDBC driver.
Oracle REST Data Services is a Java Enterprise Edition (Java EE) based data service that provides
enhanced security, file caching features, and RESTful Web Services. Oracle REST Data Services
also increases flexibility through support for deployment in standalone mode, as well as using
servers like Oracle WebLogic Server and Apache Tomcat.
ORDS
ORDS, a Java-based application, enables developers with SQL and database skills to
develop REST APIs for Oracle Database. You can deploy ORDS on web and application
servers, including WebLogic®, Tomcat®, and Glassfish®, as shown in the following
image:
ORDS is a middle-tier Java application that allows you to access your Oracle Database
resources via REST APIs. Use standard HTTP(S) calls (GET|POST|PUT|DELETE) via URIs that ORDS
makes available (/ords/database123/user3/module5/something/).
ORDS will route your request to the appropriate database, call the appropriate query or
PL/SQL anonymous block, and return the output and HTTP codes.
For most calls, that’s going to be the results of a SQL statement – paginated and formatted as
JSON.
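To make the "paginated and formatted as JSON" behavior more concrete, here is a minimal Python sketch of a client loop that walks such a result set page by page. It assumes each page is a JSON object with an `items` array and a `hasMore` flag, which is the collection layout ORDS commonly returns; treat those field names as assumptions and check them against your own endpoint. `fake_fetch` is a hypothetical stand-in for a real HTTP GET with an `offset` query parameter, so the example runs without a database.

```python
from typing import Callable, Dict, Iterator


def iter_rows(fetch_page: Callable[[int], Dict], start: int = 0) -> Iterator[Dict]:
    """Yield every row of a paginated collection, requesting one page at a time."""
    offset = start
    while True:
        page = fetch_page(offset)
        items = page.get("items", [])
        yield from items
        # Stop when the server reports no further pages (or sends an empty page).
        if not page.get("hasMore") or not items:
            break
        offset += len(items)


def fake_fetch(offset, rows=tuple(range(5)), limit=2):
    """Hypothetical stand-in for GET <uri>?offset=N; returns one page of at most `limit` rows."""
    chunk = [{"id": r} for r in rows[offset:offset + limit]]
    return {"items": chunk, "hasMore": offset + limit < len(rows),
            "limit": limit, "offset": offset, "count": len(chunk)}


if __name__ == "__main__":
    print([row["id"] for row in iter_rows(fake_fetch)])  # prints [0, 1, 2, 3, 4]
```

In a real client, `fetch_page` would issue the HTTP request and decode the JSON body; keeping it as a parameter makes the paging logic easy to test in isolation.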
In order for an Instance administrator to assign most Oracle default schemas to workspaces, a DBA
must explicitly grant the privilege.
When Oracle Application Express installs, the Instance administrator does not have the ability to
assign Oracle default schemas to workspaces. Default schemas such as SYS, SYSTEM, and RMAN are
reserved by Oracle for various product features and for internal use. Access to a default schema can
be a very powerful privilege. For example, a workspace with access to the default
schema SYSTEM can run applications that parse as the SYSTEM user.
In order for an Instance administrator to have the ability to assign most Oracle default schemas to
workspaces, the DBA must explicitly grant the privilege using SQL*Plus to run a procedure within
the APEX_INSTANCE_ADMIN package.
DBAs can grant an Instance administrator the ability to assign Oracle schemas to workspaces.
A DBA grants an Instance administrator the ability to assign Oracle schemas to workspaces by using
SQL*Plus to run the APEX_INSTANCE_ADMIN.UNRESTRICT_SCHEMA procedure from within the
Application Express engine schema. For example:
EXEC APEX_INSTANCE_ADMIN.UNRESTRICT_SCHEMA(p_schema => 'RMAN');
COMMIT;
A DBA revokes the privilege to assign default schemas using SQL*Plus to run the
APEX_INSTANCE_ADMIN.RESTRICT_SCHEMA procedure from within the Application Express engine
schema. For example:
EXEC APEX_INSTANCE_ADMIN.RESTRICT_SCHEMA(p_schema => 'RMAN');
COMMIT;
This example would prevent the Instance administrator from assigning the RMAN schema to any
workspace. It does not, however, prevent workspaces that have already had the RMAN schema
assigned to them from using the RMAN schema.
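The DBA can grant an Oracle Application Express administrator the ability to assign Oracle default
schemas to workspaces by using SQL*Plus to run the
APEX_SITE_ADMIN_PRIVS.UNRESTRICT_SCHEMA procedure from within the Application Express
engine schema. For example:
EXEC APEX_SITE_ADMIN_PRIVS.UNRESTRICT_SCHEMA(p_schema => 'SYSTEM');
COMMIT;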
This example would enable the Oracle Application Express administrator to assign the SYSTEM
schema to any workspace.
The DBA can revoke this privilege using SQL*Plus to run the
APEX_SITE_ADMIN_PRIVS.RESTRICT_SCHEMA procedure from within the Application Express
engine schema. For example:
EXEC APEX_SITE_ADMIN_PRIVS.RESTRICT_SCHEMA(p_schema => 'SYSTEM');
COMMIT;
This example would display the text of a query that dumps the tables that define the schema and
workspace restrictions.
APEX dictionary views (view, description, parent view):
APEX_APPLICATION_COMPUTATIONS - Identifies application computations, which can run for every page or on login. Parent view: APEX_APPLICATIONS.
APEX_APPLICATION_PAGE_IR - Identifies attributes of an interactive report. Parent view: APEX_APPLICATION_PAGE_REGIONS.
APEX_APPLICATION_PAGE_IR_SUB - Identifies subscriptions scheduled in saved reports for an interactive report. Parent view: APEX_APPLICATION_PAGE_IR_RPT.
APEX_APPL_DEVELOPER_COMMENTS - Developer comments of an application. Parent view: APEX_APPLICATIONS.
APEX_APPL_LOAD_TABLE_RULES - Identifies a collection of transformation rules that are to be used on the load tables. Parent view: APEX_APPLICATIONS.
Setup (download both software packages in matching versions and unzip the files to the same
directory)
Installation
Embedded PL/SQL Gateway (EPG) Configuration
Oracle REST Data Services (ORDS) Configuration
Oracle HTTP Server (OHS) Configuration
Network ACLs
Step One
Create a new tablespace to act as the default tablespace for APEX.
-- For Oracle Managed Files (OMF).
CREATE TABLESPACE apex DATAFILE SIZE 100M AUTOEXTEND ON NEXT 1M;
-- For non-OMF.
CREATE TABLESPACE apex DATAFILE '/path/to/datafiles/apex01.dbf' SIZE 100M AUTOEXTEND
ON NEXT 1M;
CREATE TABLESPACE lmtbsb DATAFILE '/u02/oracle/data/lmtbsb01.dbf' SIZE 50M
EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
Tablespaces allocate space in extents. Tablespaces can use two different methods to keep track of
their free and used space:
Locally managed tablespaces: extent management by the tablespace
Dictionary-managed tablespaces: extent management by the data dictionary
When you create a tablespace, you choose one of these methods of space management. Later, you
can change the management method with the DBMS_SPACE_ADMIN PL/SQL package.
Step two
Installation
Change directory to the directory holding the unzipped APEX software.
$ cd /home/oracle/apex
In this directory there are 3 important files:
apexins.sql – installs APEX in the database
apxchpwd.sql – changes the password for the main APEX user ADMIN
apex_rest_config.sql – configures ORDS in the database
Step three
Option 1: Connect to SQL*Plus as the SYS user and run the "apexins.sql" script, specifying the relevant
tablespace names and image URL.
SQL> CONN sys@pdb1 AS SYSDBA
SQL> -- @apexins.sql tablespace_apex tablespace_files tablespace_temp images
SQL> @apexins.sql APEX APEX TEMP /i/
Option 2: Log on to the database as SYSDBA, switch to the pluggable database orclpdb1, and run the
installation script. You can install APEX in dedicated tablespaces if required.
sqlplus / as sysdba
alter session set container=orclpdb1;
@apexins.sql SYSAUX SYSAUX TEMP /i/
Description of the command:
@apexins.sql tablespace_apex tablespace_files tablespace_temp images
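BEGIN
-- The opening lines of this block are missing in the source. Setting the
-- security group to 10 (the INTERNAL workspace) is assumed here.
APEX_UTIL.set_security_group_id( 10 );
APEX_UTIL.create_user(
p_user_name => 'ADMIN',
p_email_address => 'me@example.com',
p_web_password => 'PutPasswordHere',
p_developer_privs => 'ADMIN' );
APEX_UTIL.set_security_group_id( null );
COMMIT;
END;
/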
Note:
Oracle Application Express is installed in the APEX_210200 schema.
The structure of the link to the Application Express
administration services is as follows:
http://host:port/ords/apex_admin
The structure of the link to the Application Express
development interface is as follows:
http://host:port/ords
Or
When Oracle Application Express installs, it creates three new database accounts, all with status
LOCKED in the database:
APEX_210200 – The account that owns the Oracle Application Express schema and metadata.
FLOWS_FILES – The account that owns the Oracle Application Express uploaded files.
APEX_PUBLIC_USER – The minimally privileged account used for Oracle Application Express
configuration with ORDS.
Create the ADMIN account and set its password. When prompted, enter a password for the
ADMIN account.
sqlplus / as sysdba
alter session set container=orclpdb1;
@apxchpwd.sql
output
SQL> @apxchpwd.sql
This script can be used to change the password of an Application Express
instance administrator. If the user does not yet exist, a user record will be
created.
Enter the administrator's username [ADMIN]
User "ADMIN" does not yet exist and will be created.
Enter ADMIN's email [ADMIN]
Enter ADMIN's password []
Created instance administrator ADMIN.
Step Five
Create the APEX_LISTENER and APEX_REST_PUBLIC_USER users by running the
"apex_rest_config.sql" script.
SQL> CONN sys@pdb1 AS SYSDBA
SQL> @apex_rest_config.sql
Configure RESTful Services. When prompted, enter a password for the APEX_LISTENER and
APEX_REST_PUBLIC_USER accounts.
sqlplus / as sysdba
alter session set container=orclpdb1;
@apex_rest_config.sql
output
SQL> @apex_rest_config.sql
Enter a password for the APEX_LISTENER user []
Enter a password for the APEX_REST_PUBLIC_USER user []
...set_appun.sql
...setting session environment
...create APEX_LISTENER and APEX_REST_PUBLIC_USER users
...grants for APEX_LISTENER and ORDS_METADATA user
As a last step, you can change the passwords for these three users and unlock their accounts:
ALTER USER apex_public_user IDENTIFIED BY Dbaora$ ACCOUNT UNLOCK;
ALTER USER apex_listener IDENTIFIED BY Dbaora$ ACCOUNT UNLOCK;
ALTER USER apex_rest_public_user IDENTIFIED BY Dbaora$ ACCOUNT UNLOCK;
Install and configure
You can install and configure APEX and ORDS by using the following methods:
For this post, I chose the first option, which Oracle recommends: Install APEX and ORDS and
configure ORDS.
Step Six
Now you need to decide which gateway to use to access APEX. The Oracle recommendation is
ORDS.
Note: Oracle REST Data Services (ORDS), formerly known as the APEX Listener, allows APEX
applications to be deployed without the use of Oracle HTTP Server (OHS) and mod_plsql or the
Embedded PL/SQL Gateway. ORDS version 3.0 onward also includes JSON API support to work
in conjunction with the JSON support in the database. ORDS can be deployed on WebLogic,
Tomcat or run in standalone mode. This article describes the installation of ORDS on Tomcat 8
and 9.
For Lone-PDB installations (a CDB with one PDB), or for CDBs with small numbers of PDBs, ORDS
can be installed directly into the PDB.
If you are using many PDBs per CDB, you may prefer to install ORDS into the CDB to allow all
PDBs to share the same connection pool.
Create the directory /home/oracle/ords for the ORDS software and unzip it there:
mkdir /home/oracle/ords
cp ords-21.4.2.062.1806.zip /home/oracle/ords
cd /home/oracle/ords
unzip ords-21.4.2.062.1806.zip
Create configuration directory /home/oracle/ords/conf for ords standalone
mkdir /home/oracle/ords/conf
The first time you run ORDS, you are asked for:
directory to save configuration: /home/oracle/ords/conf
password for ORDS_PUBLIC_USER (to be created): Dbaora$
administrator user: SYS
password for SYS AS SYSDBA: !!! you must know it from your DBA !!!
use PL/SQL Gateway or not: 1 for yes
password for APEX_PUBLIC_USER: Dbaora$
password for APEX_LISTENER: Dbaora$
feature to enable: 1 for SQL Developer Web (Enables all features)
wish to start in standalone mode: 1 for standalone mode
[oracle@oel8 ords]$ java -jar ords.war
This Oracle REST Data Services instance has not yet been configured.
Please complete the following prompts
Enter the location to store configuration data: /home/oracle/ords/conf
Enter the database password for ORDS_PUBLIC_USER:
Confirm password:
Requires to login with administrator privileges to verify Oracle REST Data Services schema.
Change the password and unlock the APEX_PUBLIC_USER account. This will be used for any
Database Access Descriptors (DADs).
SQL> ALTER USER APEX_PUBLIC_USER IDENTIFIED BY myPassword ACCOUNT UNLOCK;
Step Seven
Unlock the ANONYMOUS account.
SQL> CONN sys@cdb1 AS SYSDBA
DECLARE
l_passwd VARCHAR2(40);
BEGIN
l_passwd := DBMS_RANDOM.string('a',10) || DBMS_RANDOM.string('x',10) || '1#';
-- Remove CONTAINER=ALL for non-CDB environments.
EXECUTE IMMEDIATE 'ALTER USER anonymous IDENTIFIED BY ' || l_passwd || ' ACCOUNT UNLOCK CONTAINER=ALL';
END;
/
Check the port setting for XML DB Protocol Server.
Workspace administrators have all the rights and privileges available to developers and also manage
administrator tasks specific to a workspace.
In Oracle Application Express, users sign in to a shared work area called a workspace. A workspace
enables multiple users to work within the same Oracle Application Express installation while keeping
their objects, data and applications private. This flexible architecture enables a single database
instance to manage thousands of applications.
Within a workspace, end users can only run existing database or Websheet
applications. Developers can create and edit applications, monitor workspace activity, and view
dashboards. Oracle Application Express includes two administrator roles:
Workspace administrators can reset passwords, view product and environment information, manage
the Export repository, manage saved interactive reports, view the workspace summary report, and
manage Websheet database objects. Additionally, workspace administrators manage service
requests, configure workspace preferences, manage user accounts, monitor workspace activity, and
view log files.
When developing applications with Application Builder, you must find a balance between two
dramatically different development methodologies:
Iterative, rapid application development
Planned, linear style development
The first approach offers so much flexibility that you run the risk of never completing your
project. In contrast, the second approach can yield applications that do not meet the needs of
end users even if they meet the stated requirements on paper.
System Development Life Cycle Methodologies to Consider
The system development life cycle (SDLC) is the overall process of developing software using a
series of defined steps. There are several SDLC models that work well for developing applications
in Oracle Application Express.
The waterfall model assumes that all requirements can be established in advance. Unfortunately,
requirements often change and evolve during the development process.
The Oracle Application Express development environment enables developers to take a more
iterative approach to development. Unlike many other development environments, creating
prototypes is easy. With Oracle Application Express, developers can:
Use built-in wizards to quickly design an application user interface
Make prototypes available to users and gather feedback
Implement changes in real time, creating new prototypes instantly
Other methodologies that work well with Oracle Application Express include:
Spiral - This approach is actually a series of short waterfall cycles. Each waterfall cycle yields new
requirements and enables the development team to create a robust series of prototypes.
Rapid application development (RAD) life cycle - This approach has a heavy emphasis on creating
a prototype that closely resembles the final product. The prototype is an essential part of the
requirements phase. One disadvantage of this model is that the emphasis on creating the
prototype can cause scope creep; developers can lose sight of their initial goals in the attempt to
create the perfect application.
Apex Development
Use a different workspace and different schema. Export and then import the application into a
different workspace and install it so that it uses a different schema. This new schema needs to
have the database objects required by your application. See "Using the Database Object
Dependencies Report".
Use a different database with all its variations. Export and then import the application into a
different Oracle Application Express instance and install it using a different workspace, schema,
and database.
Migration of Applications
Now let's review each component in the upload forms to determine proper regions to use in the
APEX Application. Also, let's review the Triggers and Program Units in order to identify the
business logic in your Forms Application and determine if it will need to be replicated or not.
Oracle Forms applications still play a vital role, but many are looking for ways to modernize their
applications. Modernize your Oracle Forms applications by migrating them to Oracle Application
Express (Oracle APEX) in the cloud.
Your stored procedures and PL/SQL packages work natively in Oracle APEX, making it the clear
platform of choice for easily transitioning Oracle Forms applications to modern web applications
with more capabilities, less complexity, and lower development and maintenance costs.
Oracle APEX is a low-code development platform that enables you to build scalable, secure
enterprise apps, with world-class features, that you can deploy anywhere. You can quickly
develop and deploy compelling apps that solve real problems and provide immediate value. You
won't need to be an expert in a vast array of technologies to deliver sophisticated solutions.
Architecture
This architecture shows the process of migrating on-premises Oracle Forms applications
to Oracle Application Express (APEX) applications with the help of an XML converter,
and then moving them to the cloud. The following diagram illustrates this reference
architecture.
Recommendations
Use the following recommendations as a starting point to plan your migration to Oracle Application
Express. Your requirements might differ from the architecture described here.
VCN
When you create a VCN, determine how many IP addresses your cloud resources
in each subnet require. Using Classless Inter-Domain Routing (CIDR) notation,
specify a subnet mask and a network address range large enough for the
required IP addresses. Use CIDR blocks that are within the standard private IP
address space.
After you create a VCN, you can change, add, and remove its CIDR blocks.
When you design the subnets, consider functionality and security requirements.
All compute instances within the same tier or role should go into the same
subnet.
Use regional subnets.
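The CIDR sizing advice above can be checked with a few lines of Python. The sketch below is illustrative only: it assumes three addresses per subnet are reserved by the platform and that usable block sizes run from /16 to /30, both of which are assumptions to confirm against the current Oracle Cloud Infrastructure documentation. The function names and the sample address ranges are hypothetical.

```python
import ipaddress

# The three standard private IPv4 ranges (RFC 1918).
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def smallest_prefix(hosts_needed: int, reserved: int = 3) -> int:
    """Return the longest prefix (smallest block) that fits the hosts plus reserved addresses."""
    needed = hosts_needed + reserved  # reserved=3 is an assumption, see lead-in
    for prefix in range(30, 15, -1):  # /30 down to /16 (assumed allowed range)
        if 2 ** (32 - prefix) >= needed:
            return prefix
    raise ValueError("requirement does not fit in a /16 block")


def in_private_space(cidr: str) -> bool:
    """True if the whole block lies inside one of the RFC 1918 private ranges."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(block) for block in RFC1918)


if __name__ == "__main__":
    print(smallest_prefix(50))               # prints 26
    print(in_private_space("10.0.0.0/16"))   # prints True
    vcn = ipaddress.ip_network("10.0.0.0/16")
    print(next(vcn.subnets(new_prefix=24)))  # prints 10.0.0.0/24
```

Sizing each subnet from the number of addresses it needs, and carving the subnets out of one private VCN block, keeps room for later growth without overlapping ranges.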
Security lists
Use security lists to define ingress and egress rules that apply to the entire
subnet.
Cloud Guard
Clone and customize the default recipes provided by Oracle to create custom
detector and responder recipes. These recipes enable you to specify what type of
security violations generate a warning and what actions are allowed to be
performed on them. For example, you might want to detect Object Storage
buckets that have visibility set to public.
Apply Cloud Guard at the tenancy level to cover the broadest scope and to
reduce the administrative burden of maintaining multiple configurations.
You can also use the Managed List feature to apply certain configurations to
detectors.
Security Zones
For resources that require maximum security, Oracle recommends that you use
security zones. A security zone is a compartment associated with an Oracle-
defined recipe of security policies that are based on best practices. For example,
the resources in a security zone must not be accessible from the public internet
and they must be encrypted using customer-managed keys. When you create
and update resources in a security zone, Oracle Cloud Infrastructure validates the
operations against the policies in the security-zone recipe, and denies operations
that violate any of the policies.
Schema
Retain the database structure that Oracle Forms was built on, as is, and use that
as the schema for Oracle APEX.
Business Logic
Most of the business logic for Oracle Forms is in triggers, program units, and
events. Before starting the migration of Oracle Forms to Oracle APEX, migrate the
business logic to stored procedures, functions, and packages in the database.
Considerations
Consider the following key items when migrating Oracle Forms Object navigator components to Oracle
Application Express (APEX):
Data Blocks
A data block from Oracle Forms maps to Oracle APEX pages, each of which is broken up
into several regions and components. Review the Oracle APEX Component
Templates available in the Universal Theme.
Triggers
Most messages in Oracle APEX are generated when you submit a page.
Attached Libraries
Oracle APEX takes care of the JavaScript and CSS libraries that support the
Universal Theme, which supports all of the components that you need for flexible,
dynamic applications. You can include your own JavaScript and CSS in several
ways, mostly through page attributes. You can choose to add inline code or to
reference files that exist either in the database as a BLOB (#APP_IMAGES#) or sit
on the middle tier, typically served by Oracle REST Data Services (ORDS). When a
reference file is on an Oracle WebLogic Server, the file location is prefixed with
#IMAGE_PREFIX#.
Editors
Oracle APEX has a text area and a rich text editor, which are equivalent to Editors in
Oracle Forms.
In APEX, the LOV is coupled with the Item type. A radio group works well with a
small handful of values. Use Select Lists for medium-sized sets and Popup LOVs
for large data sets. You can use the queries from Record Group in Oracle Forms
for the LOV query in Oracle APEX. LOVs in Oracle APEX can be dynamically
driven by a SQL query, or be statically defined. A static definition allows a variety
of conditions to be applied to each entry. These LOVs can then be associated
with Items such as Radio Groups & Select Lists, or with a column in a report, to
translate a code to a label.
Parameters
Page Items in Oracle APEX are populated between pages to pass information to
the next page, such as the selected record in a report. Larger forms with a
number of items are generally submitted as a whole, where the page process
handles the data, and branches to the next page. These values can be protected
from URL tampering by session state security, at item, page, and application
levels, often by default.
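The idea behind protecting values from URL tampering can be sketched with a keyed checksum: the server signs the values it puts in a link and rejects the request if the values no longer match the checksum. The Python sketch below uses HMAC-SHA256 purely for illustration. It is not the algorithm Oracle APEX uses, and the function names, the secret, and the item name are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, session: str, item: str, value: str) -> str:
    """Compute a checksum that binds an item value to a session."""
    # A separator that cannot appear in normal values keeps the fields unambiguous.
    message = "\x1f".join((session, item, value)).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, session: str, item: str, value: str, checksum: str) -> bool:
    """Accept the value only if the checksum matches; uses a constant-time comparison."""
    return hmac.compare_digest(sign(secret, session, item, value), checksum)


if __name__ == "__main__":
    secret = b"server-side-secret"  # hypothetical; never sent to the browser
    cs = sign(secret, "12345", "P2_ID", "7")
    print(verify(secret, "12345", "P2_ID", "7", cs))  # prints True
    print(verify(secret, "12345", "P2_ID", "8", cs))  # prints False
```

Because the secret stays on the server, a user who edits the item value in the address bar cannot produce a matching checksum.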
Popup Menus
Popup Menus are not available out of the box in Oracle APEX, but you can build
them by using Lists and associating a button with the menu.
Program Units
Migrate the Stored procedures and functions defined in program units in Oracle
Forms into Database Stored Procedures/Functions and use Database Stored
procedures/functions in Oracle APEX processes/validations/computations.
Property Classes
Classic Reports are simple reports that don't provide runtime manipulation
options, but are based on SQL.
Menus
Oracle Forms have specific menu files, controlled by database roles. Updating
the .mmx file required that there be no active users. The menu in Oracle APEX can
either be across the top, or down the left side. These menus can be statically
defined, or dynamically driven. Static navigation entries can be controlled by
authorization schemes, or custom conditions. Dynamic menus can have security
tables integrated within the SQL.
Properties
The Application Express engine uses two logs to track user activity. At any given time,
one log is designated as current. For each rendered page view, the Application Express
engine inserts one row into the log file. A log switch occurs at the interval listed on the
Page View Activity Logs page. At that point, the Application Express engine removes all
entries in the noncurrent log and designates it as current.
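The two-log scheme described above can be modeled in a few lines of Python. This is a toy model of the behavior as stated in the paragraph, not the engine's implementation: the class and method names are hypothetical, and the log switch is triggered by hand instead of by the configured interval.

```python
class ActivityLog:
    """Two logs, one of which is current. A switch empties the other log and makes it current."""

    def __init__(self):
        self.logs = ([], [])
        self.current = 0

    def record(self, page_view):
        """Insert one row for a rendered page view into the current log."""
        self.logs[self.current].append(page_view)

    def switch(self):
        """Remove all entries from the noncurrent log, then designate it as current."""
        other = 1 - self.current
        self.logs[other].clear()
        self.current = other

    def all_entries(self):
        """Entries still retained, older (noncurrent) log first."""
        return self.logs[1 - self.current] + self.logs[self.current]


if __name__ == "__main__":
    log = ActivityLog()
    log.record("page 1")
    log.switch()
    log.record("page 2")
    print(log.all_entries())  # prints ['page 1', 'page 2']
    log.switch()
    print(log.all_entries())  # prints ['page 2']
```

The effect is that activity is retained for between one and two switch intervals, and old rows are discarded in bulk rather than one at a time.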
Delete SQL Workshop log entries. The SQL Workshop maintains a history of SQL
statements run in the SQL Commands.
For Delete entries this number of days and older - Select the number of
days.
Click one of the following:
Truncate Log - To delete all entries, click Truncate Log.
Delete Entries - To delete entries by age, specify the age of the
entries to be deleted and click Delete Entries.
Workspace administrators are users who perform administrator tasks specific to a workspace and have
the access to various types of activity reports.
Page Views - Contains reports of page views organized by view, user, application, application
and page, day, hour, and by interactive report.
Developer Activity - Offers reports of developer activity organized by developer, day,
application, application changes, and day or month.
Page View Analysis - Contains reports analyzing page views, such as most viewed pages, page
views by day, usage by day (chart), weighted page performance, and Websheet page views.
Sessions - Lists active sessions in the current workspace (report or chart).
Login Attempts - Offers reports listing login attempts, login attempts by authentication result,
and a developer login summary.
Environment - Contains reports of environments organized by user agent, browser, external
clicks, or operating system.
Application Errors - Contains a report of application errors.
Workspace Schema Reports - Offers summaries of schema tablespace utilization and database
privileges by schema, workspace schemas, and report tablespace utilization.
Instance administrators are superusers that manage an entire hosted instance using the Application
Express Administration Services application.
Page Views - View activity by application, user, workspace, day, or REST access.
Workspace Purge - View a dashboard summary, inactive workspaces, workspaces purged,
workspaces that became active, or a workspace purge log.
Environment Reports - View a summary of used operating systems, browser types, user agent,
or external sites.
Calendar Reports - View workspaces by date last used, page views by day and then by
application and user, or by hour.
Service Requests - View new service requests or sign up survey activity.
Logs - View the mail log, jobs log, automatic delete log, or monitor productivity or sample
application installations.
Login Attempts - View login attempts or developer last login.
Developer Activity - View application changes by developer or workspace.
Recovering an Oracle tablespace (APEX) from file system backups of database tablespace files
Clarification
I am assuming you only lost the APEX tablespace but your database is currently functioning.
If this is the case, and assuming your APEX tablespace does not span multiple datafiles, you
can attempt to swap out the datafile. Please force a backup in RMAN before trying any of this.
There are a few different options here. All you really need are the following:
1. Datafile
2. Control file
3. Archive / redo logs (if you want to move forward or backward in time)
I'm going to outline two options because I don't have all the pertinent information. The first
option attempts to actually restore the datafiles through RMAN; the second one simply swaps
the datafile out. The first is obviously preferable but may not be achievable.
RMAN Restore
First set the following parameter in your init.ora file:
_allow_resetlogs_corruption=TRUE
Move your entire oradata backup directory to /tmp/oradata. Locate your dbf and ctl files in
that directory.
Then run rman target / from a bash terminal. In RMAN, run the following:
spool '/tmp/spool.out'
select value from v$parameter where name = 'db_create_file_dest';
select tablespace_name from dba_data_files;
View the spool.out file, then check the status of the datafile:
select file_name, status from dba_data_files WHERE tablespace_name = '<name>';
You want your datafile to be available. Then set the tablespace to read only and take it
offline.
The APEX option uses storage on the DB instance class for your DB instance.
Following are the supported versions and approximate storage requirements for Oracle
APEX.
SELECT PATCH_VERSION,
PATCH_NUMBER
FROM APEX_PATCHES;
Custom Authentication
Create a custom authentication scheme from scratch to have complete control over your
authentication interface.
Database Accounts
Database Account Credentials authentication utilizes database schema accounts to authenticate
users.
LDAP Directory
Authenticates a user name and password with an authentication request to an LDAP server.
SAML Sign-In
Delegates authentication to the Security Assertion Markup Language (SAML) Sign In
authentication scheme.
Social Sign-In
Social Sign-In supports authentication with Google, Facebook, and other social networks that
support the OpenID Connect or OAuth2 standards.
When you create an authorization scheme you select an authorization scheme type. The
authorization scheme type determines how an authorization scheme is applied. Developers can
create new authorization type plug-ins to extend this list.
Exists SQL Query - Enter a query that causes the authorization scheme to pass if it
returns at least one row and to fail if it returns no rows.
NOT Exists SQL Query - Enter a query that causes the authorization scheme to pass if it
returns no rows and to fail if it returns one or more rows.
PL/SQL Function Returning Boolean - Enter a function body. If the function returns true,
the authorization succeeds.
Item in Expression 1 is NULL - Enter an item name. If the item is null, the authorization
succeeds.
Item in Expression 1 is NOT NULL - Enter an item name. If the item is not null, the
authorization succeeds.
Value of Item in Expression 1 Equals Expression 2 - Enter an item name and a value. The
authorization succeeds if the item's value equals the authorization value.
Value of Item in Expression 1 Does NOT Equal Expression 2 - Enter an item name and a value.
The authorization succeeds if the item's value is not equal to the authorization value.
Value of Preference in Expression 1 Does NOT Equal Expression 2 - Enter a preference name
and a value. The authorization succeeds if the preference's value is not equal to the
authorization value.
Value of Preference in Expression 1 Equals Expression 2 - Enter a preference name and a
value. The authorization succeeds if the preference's value equals the authorization value.
Is Not In Group - Enter a group name. The authorization succeeds if the group is
not enabled as a dynamic group for the session.
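The pass and fail rules of several of these scheme types can be written out as plain predicates. The sketch below only models the rules as described in the list; it is not APEX code, the function names are hypothetical, query results are represented as a list of rows, and session items are represented as a dictionary.

```python
from typing import Any, Mapping, Optional, Sequence


def exists_sql_query(rows: Sequence[Any]) -> bool:
    # Passes if the query returned at least one row.
    return len(rows) > 0


def not_exists_sql_query(rows: Sequence[Any]) -> bool:
    # Passes if the query returned no rows.
    return len(rows) == 0


def item_is_null(items: Mapping[str, Optional[str]], name: str) -> bool:
    # Passes if the named item has no value.
    return items.get(name) is None


def item_is_not_null(items: Mapping[str, Optional[str]], name: str) -> bool:
    # Passes if the named item has a value.
    return items.get(name) is not None


def value_equals(values: Mapping[str, Optional[str]], name: str, expected: str) -> bool:
    # Passes if the item's (or preference's) value equals the authorization value.
    return values.get(name) == expected


def value_does_not_equal(values: Mapping[str, Optional[str]], name: str, expected: str) -> bool:
    # Passes if the item's (or preference's) value differs from the authorization value.
    return values.get(name) != expected


# Hypothetical session state used only for illustration.
session_items = {"P1_DEPT": "SALES", "P1_MANAGER": None}
```

With this sample state, item_is_null(session_items, "P1_MANAGER") succeeds, while value_equals(session_items, "P1_DEPT", "HR") fails.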
1. Run the APEX installation script against the target database. The same script is used for new
installations and upgrades. The script automatically senses whether there is a version of APEX
present and automatically takes the appropriate action.
2. Update the existing /i/ virtual directory with the current versions of the images, JavaScript, CSS,
etc. from the APEX installation media.
3. For the standard HTTP Server installations, this is just a simple copy command.
4. For the Embedded PL/SQL Gateway (EPG), the script apxldimg.sql is used to load the images
into the database.
5. For the APEX Listener / Oracle REST Data Services (ORDS), recreate the i.jar file that contains
the references to the images, javascript, css, etc. from the APEX installation media OR copy the
new versions of the files to the existing location referenced by the current APEX Listener / ORDS /
web server.
Prior to the Application Express (APEX) upgrade, begin by identifying the version of APEX
currently installed and the database prerequisites. To do this, run the following query in SQL*Plus
as SYS or SYSTEM:
Where <SCHEMA> represents the current version of APEX and is one of the following:
For APEX (HTML DB) versions 1.5 - 3.1, the schema name is: FLOWS_XXXXXX.
For example: FLOWS_010500 for HTML DB version 1.5.x
For APEX (HTML DB) versions 3.2.x and above, the schema name is: APEX_XXXXXX.
For example: APEX_210100 for APEX version 21.1.
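The naming rule can be sketched as a small helper. This is an illustration built only from the two examples given above; it assumes that the last two digits of the schema name are always 00, and the function name apex_schema_name is hypothetical.

```python
def apex_schema_name(major: int, minor: int) -> str:
    """Build the schema name for an APEX (HTML DB) version such as 21.1."""
    # Versions 1.5 through 3.1 use the FLOWS_ prefix; 3.2 and above use APEX_.
    prefix = "FLOWS" if (major, minor) <= (3, 1) else "APEX"
    # Two digits for the major version, two for the minor version, then "00"
    # (assumed from the examples FLOWS_010500 and APEX_210100).
    return f"{prefix}_{major:02d}{minor:02d}00"


html_db_schema = apex_schema_name(1, 5)
apex_schema = apex_schema_name(21, 1)
```

Here html_db_schema is FLOWS_010500 and apex_schema is APEX_210100, matching the examples in the text.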
If the query returns 0, it is a runtime only installation, and apxrtins.sql should be used for the
upgrade. If the query returns 1, this is a development install and apexins.sql should be used.
The full download is needed if the first two digits of the APEX version are different. For example,
the full Application Express download is needed to go from 20.0 to 21.1. See Note 752705.1,
"ORA-1435: User Does not Exist" When Upgrading APEX Using apxpatch.sql, for more information.
The patch is needed if only the third digit of the version changes. For example, a patch is used
to upgrade from 21.1.0 to 21.1.2.
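The two decisions described above (which installation script to run, and whether a full download or a patch is needed) can be summarised in a short sketch. It assumes a version is written as three numbers (major, minor, patch) and reads "first two digits" as the major and minor parts; the function names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (major, minor, patch), e.g. (21, 1, 2)


def install_script(query_result: int) -> str:
    # 0 means a runtime-only installation, 1 means a development installation.
    if query_result == 0:
        return "apxrtins.sql"
    if query_result == 1:
        return "apexins.sql"
    raise ValueError("expected the query to return 0 or 1")


def upgrade_kind(current: Version, target: Version) -> str:
    # A change in the first two parts of the version needs the full download.
    if current[:2] != target[:2]:
        return "full download"
    # A change in the third part only needs the patch.
    if current[2] != target[2]:
        return "patch"
    return "no upgrade needed"


full = upgrade_kind((20, 0, 0), (21, 1, 0))
patch = upgrade_kind((21, 1, 0), (21, 1, 2))
```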
===========================END=========================