
ADVANCED

DATABASE CONCEPTS
Submission 2
Table of Contents
Optimisation and monitoring of centralised CCP database
Database architecture
    Distributed Database System (DDS)
        DDS query processing
        DDS Indexing
        DDS Security
        DDS Backup and Recovery
SQL Statements
    Query 1
    Query 2
    Query 3
    Query 4
    Query 5
    Query 6
Bibliography
Appendix

Crazy Cat Pubs Database

Optimisation and monitoring of centralised CCP database

The CCP database is potentially a huge inter-organisational database. In Figure 1 we can observe that the DBMS environment involves more than one element, and it is possible for all of them to operate at the same time. For example, queries, forms and reports may be generated while the interface used to access the database (e.g. a web interface) is invoking stored procedures.

Figure 1 - Database Processing Environment (Abstract)

CCP must ensure concurrency control, whose purpose is to ensure that one user's work does not inappropriately influence another user's work. In some cases, these measures ensure that a user gets the same result when processing alongside other users as that person would have received if processing alone. For example, a CCP user should be able to enter an order and get the same result whether there are no other users or hundreds of other users accessing the database.
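As a minimal illustration of this idea (a hedged sketch: the table and column names are taken from the CCP schema used later in this document, and the inserted values are purely illustrative), an order entry can be wrapped in an explicit transaction so that concurrent users never observe a partially entered order:

-- Hypothetical sketch: enter a new order atomically so that other users
-- either see the complete order or none of it.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

INSERT INTO `order` (order_id, order_date, location_id, rota_id)
VALUES (601, CURDATE(), 1, 12);

INSERT INTO order_details (order_id, beverage_id, qty)
VALUES (601, 3, 2);

COMMIT; -- the whole order becomes visible to other users only at this point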
SQL VIEWs will be used to hide columns, to simplify results or to prevent the display of sensitive data. They can also be used to display the results of computed columns, to hide complicated SQL syntax, and to layer the use of built-in functions to create results that are not possible with a single SQL statement. In addition, using VIEWs will help with query optimisation by reducing the number of JOINs necessary in a query or subquery. SQL views are also used to assign different processing permissions and different triggers to different views of the same table.
For example, a view created for staff (Figure 2) with the associated position (Figure 3) is limited to showing just the basic contact details for staff instead of returning all records; it returns only selected values. The output can be observed in Figure 4 and we will make use of this view later in the query samples. Please see Appendix B for a report sample output of the staff availability.

Figure 2 - Simple VIEW creation example of staff position, contact details and availability
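A minimal sketch of how such a view could be defined follows (a hedged example: the base-table column names and the staff_position table name are assumptions based on attributes used elsewhere in this document, and the Available flag is illustrative):

-- Hypothetical sketch of the staff contact-details view used in later queries.
CREATE VIEW view_staff AS
  SELECT
    staff.stf_id AS _id,
    CONCAT(staff.stf_name, ' ', staff.stf_surname) AS `Staff Full Name`,
    staff_position.position_name AS `Position`,
    staff.stf_city AS `Staff City`,
    staff.stf_phone AS Phone,
    staff.stf_email AS eMail,
    staff.stf_available AS Available
  FROM staff
  INNER JOIN staff_position
    ON staff.position_id = staff_position.position_id;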

Figure 3 - STAFF > Staff Position relationship and table structure

Figure 4 - VIEW output

To improve security, simplify derived queries and increase overall performance and query speed, CCP should use virtual tables (views) wherever possible.
Normalisation to 3NF for this case study is a straightforward process. The entities identified in the ERD are already designed to limit data redundancy: each table contains facts about just one entity type, and every record or row contains facts about one entity. For example, the STAFF table contains only the details necessary for staff. In Figure 5 we can see a 3NF analysis of the BEVERAGE table. However, a few aspects of the data types must be highlighted.
Figure 5 - Normalisation example for BEVERAGE table
The first step in deciding what data type to use for a given column is to determine what general class of types is appropriate: numeric, string, boolean, temporal and so on. This is usually straightforward, but sometimes having multiple data type options for the same data (e.g. VARCHAR vs CHAR or INT vs TINYINT) can make a notable difference in overall performance.

For the CCP database, the LOCATION, MENU, FOOD and BEVERAGE PKs/FKs were selected as UNSIGNED SMALLINT, with a maximum value of 65535, rather than as INT, because I am assuming that CCP will never have more than 65535 locations, menus or beverages recorded in a table. An INT takes 4 bytes of storage whereas SMALLINT takes 2 bytes. This may not look like a big saving, but considering that millions of records could be recorded over the years, it will save a considerable amount of disk space. For the UNITS and STAFF_POSITION PKs/FKs, an UNSIGNED TINYINT with a maximum value of 255 and 1 byte of storage was selected. The general rule followed for integer data types is to use the smallest type that can reliably contain all possible values.
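A minimal sketch of how these sizing choices translate into column definitions (a hedged example; the full column lists of the real tables are not reproduced here, and AUTO_INCREMENT is illustrative):

-- Hypothetical sketch: integer key sizing for the CCP tables.
CREATE TABLE location (
  loc_id   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 2 bytes, up to 65535 venues
  loc_name VARCHAR(50)       NOT NULL,
  PRIMARY KEY (loc_id)
);

CREATE TABLE staff_position (
  position_id   TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 1 byte, up to 255 positions
  position_name VARCHAR(30)      NOT NULL,
  PRIMARY KEY (position_id)
);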
Another step taken to improve speed was to limit reads/inserts, where possible, by using the ENUM data type, defining a table field with the smallest possible set of values (a predefined set of distinct string values). For demonstration purposes we can look at Figure 6, where a micro-experiment was created. The experiment used exactly the same data sample and joins for the attribute CONTRACT_TYPE (run 1000 times, with the reported time being the average of those 1000 runs), but with VARCHAR(10) as the data type for the first run and ENUM('Full-Time', 'Part-Time') for the second run.

Figure 6 - Using ENUM data type micro-experiment


We can observe that with the VARCHAR data type the query took 0.063 s to run, whereas with ENUM it took 0.002 s. This may not look like a huge difference, but with more complex queries and potentially thousands of concurrent read/write transactions, every millisecond counts. However, we must consider that this micro-benchmark was run in memory, and in real life there could be many extra implications in terms of performance. The biggest downside of ENUM is that the list of strings is fixed: adding or removing strings requires ALTER TABLE, and it can be a challenging task if the definition of the field needs to change in any way or another value needs to be added at a later time (e.g. a third form of employment). Therefore, ENUM should be used only when the values are unlikely to ever change. Please see Appendix A for an extended description of the data types used.
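A minimal sketch of the ENUM definition used in the micro-experiment, together with the ALTER TABLE that would be needed if a third contract type were ever introduced (a hedged example; the table holding CONTRACT_TYPE and the 'Casual' value are illustrative):

-- Hypothetical sketch: ENUM column as used in the CONTRACT_TYPE micro-experiment.
ALTER TABLE staff_position
  MODIFY COLUMN contract_type ENUM('Full-Time', 'Part-Time') NOT NULL;

-- Adding a third form of employment later requires redefining the whole list:
ALTER TABLE staff_position
  MODIFY COLUMN contract_type ENUM('Full-Time', 'Part-Time', 'Casual') NOT NULL;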

Indexes
For the CCP database an index will be added for each foreign key. This can optimise performance when accessing and enforcing referential constraints (RI - referential integrity) (Mullins, 2014) and can improve overall performance, a practice also supported by Kroenke and Auer, who highlight the importance of indexing FKs for query processing, a central activity during the parsing phase when performing join operations (Kroenke & Auer, 2016).

Query optimisation is a critical performance issue in any centralised database management system, because the main difficulty is choosing an execution strategy that minimises the consumption of computing resources. Another objective of query optimisation is to reduce the total execution time of the query, which is the sum of the execution times of all the individual operations that make up the query.
For the CCP database, in addition to the PK/FK indexes, a few other attributes will be considered for indexing:
ALTER TABLE `menu` ADD INDEX `menu_idx_id_type_name` (`menu_name`);

ALTER TABLE `beverage` ADD INDEX `beverage_idx_bev_name` (`bev_name`);

ALTER TABLE `food_item` ADD INDEX `food_item_idx_food_name` (`food_name`);

ALTER TABLE `view_staff` ADD INDEX `view_staff_idx_available_staff_city` (`Available`, `Staff City`);

ALTER TABLE `location` ADD INDEX `location_idx_name` (`loc_name`);

ALTER TABLE `order` ADD INDEX `order_idx_date` (`order_date`);

Query optimisation
Probably one of the most difficult problems in query optimisation is accurately estimating the costs of alternative query plans. For CCP the most effective method will be to use indexes on all the predicates in JOIN, WHERE, ORDER BY and GROUP BY clauses, along with the already predefined indexes. Another recommendation from IBM (IBM, n.d.) for query optimisation is to avoid, as much as possible, queries using a wildcard (%) at the beginning of a predicate, this being a known performance limitation in all databases.
Most queries for CCP will use INNER JOIN rather than OUTER JOIN (where possible) because using OUTER JOIN limits the database optimisation options, which typically results in slower SQL execution (IBM, n.d.).

Another step taken to optimise queries is to avoid SELECT * when there are unnecessary columns in the SELECT clause. However, for Query 3 a SELECT * was used because the actual selection is done through three subqueries and the main query returns all results from those subqueries.
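A minimal sketch of how such decisions can be checked (a hedged example; EXPLAIN output varies with the MySQL version and the data actually loaded, and the literal search values are illustrative):

-- Hypothetical sketch: verify whether a predicate can use the bev_name index.
EXPLAIN SELECT bev_name, bev_price
FROM beverage
WHERE bev_name LIKE 'Coca%';   -- trailing wildcard: the bev_name index can be used

EXPLAIN SELECT bev_name, bev_price
FROM beverage
WHERE bev_name LIKE '%Cola';   -- leading wildcard: typically forces a full table scan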
Stored Procedures / Triggers
One of the issues that will affect CCP database consistency over time (consistency being one of the ACID properties) is the change of prices. Like any other business, CCP will update its prices from time to time (up or down), but changing the prices in the menu, food or beverage tables will affect the correct calculation of past sales. For example, a bottle of Coca-Cola on 01/01/2019 was £1, but on 01/02/2019 it had an increase of 5%, so the new price is £1.05. Queries that calculate values based only on the new current price would therefore misrepresent all past records. To protect the database against such inconsistencies, the price at the date of sale must be stored in the ORDER_DETAILS table. This can be done using stored procedures or triggers: when a new order is created and the items are added to ORDER_DETAILS, the stored procedure will INSERT the price of that specific item at that date.

This method will keep the database consistent over time and all queries regarding orders will be output correctly.
NOTE: The queries in this document were created without using stored procedures or triggers.
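A minimal sketch of how this mechanism could look as a trigger (a hedged example: it assumes ORDER_DETAILS has an item_price column for the price at the date of sale, and that exactly one of menu_id, food_id or beverage_id is set per detail row; neither assumption is part of the schema shown in this document):

-- Hypothetical sketch: copy the current item price into the order detail row
-- at the moment it is inserted, so later price changes do not affect past orders.
DELIMITER //
CREATE TRIGGER trg_order_details_price
BEFORE INSERT ON order_details
FOR EACH ROW
BEGIN
  IF NEW.menu_id IS NOT NULL AND NEW.menu_id <> 0 THEN
    SET NEW.item_price = (SELECT menu_price FROM menu WHERE menu_id = NEW.menu_id);
  ELSEIF NEW.food_id IS NOT NULL AND NEW.food_id <> 0 THEN
    SET NEW.item_price = (SELECT food_price FROM food_item WHERE food_id = NEW.food_id);
  ELSEIF NEW.beverage_id IS NOT NULL AND NEW.beverage_id <> 0 THEN
    SET NEW.item_price = (SELECT bev_price FROM beverage WHERE bev_id = NEW.beverage_id);
  END IF;
END //
DELIMITER ;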
Monitoring Tools
Using third-party software for complete monitoring of database servers and databases - including availability, database and table sizes, cache ratios, and other key metrics - will allow the DBAs to anticipate any issues that may affect the database, or to troubleshoot and diagnose the root cause in a live production environment. For CCP an Oracle database was recommended, therefore the most suitable monitoring tool would be Oracle Enterprise Monitoring (Oracle, n.d.). With its complete integration into the Oracle database and its capabilities for functional testing, load testing, test scripts for performance optimisation, performance management using the Automatic Workload Repository (AWR), real-time SQL monitoring, DB operations monitoring and many other features, it is the ideal candidate for monitoring.
However, if the CCP business decides to use other monitoring tools, a wide range of software is available, each with its own advantages and disadvantages. A few worth mentioning are SolarWinds Database Performance Analyzer, Idera SQL Diagnostic Manager and Red Gate SQL Monitor. Most of these applications offer free trials, so the DBAs have a good opportunity to test the software's capabilities before committing to a purchase.

Database architecture
Distributed Database System (DDS)

As a general simplified definition, we can say that a distributed database system consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network. Each site is capable of independently processing user requests that require access to local data, and is also capable of processing data stored on other computers in the network (Oracle, n.d.). A representation of how the Crazy Cat Pubs database is distributed over multiple locations can be seen in Figure 8 and Figure 9.

Figure 8 - CCP Offices Locations

CCP database users will access the distributed database through:

• Local applications: applications which do not require data from other sites.
• Global applications: applications which do require data from other sites.
Because the CCP database design is done from scratch and all sites offer the same services, an autonomous homogeneous distributed database is recommended, in order to reduce the risk of hardware/software incompatibilities or costly bridge connections. A homogeneous distributed database runs identical software and hardware for all database instances, and the instances are aware of each other and agree to cooperate in processing user requests. The system will appear through a single interface as if it were a single database (e.g. when a shift is created for staff 1 at location 1, that shift will be available to any query request from any other location). As an autonomous system, each database is independent and functions on its own (Oracle, n.d.). The databases are integrated by a controlling application and use message passing to share data updates (e.g. each location being able to store orders independently of other locations reduces the risk of a single point of failure when the connection to any other node fails).

Figure 9 - CCP Distributed Database System (Abstract)
Similar to a centralised database, for a DDS an interface needs to be designed for users to input/read data rather than allowing any direct access to the local database. However, the development of the user interface will increase considerably in difficulty, because the developer(s) must take into account several database connections rather than only one; the end user, however, will not see any difference because the database will act as a single relational database.
In order to implement a distributed relational database for Crazy Cat Pubs, a series of factors must be considered for the redesign.
Fragmentation
A relation may be divided into a number of sub-relations, called fragments, which are distributed across locations. The relation r is fragmented into several relations r1, r2, r3, ..., rn in such a way that the actual relation can be reconstructed from the fragments; the fragments are then scattered to different locations. There are three types of fragmentation:

1. Horizontal fragmentation splits the relation by assigning each tuple of R to one or more fragments. To reconstruct the relation R from the various horizontal fragments, a UNION operation can be performed on the fragments. A fragmentation containing all the tuples of relation R is called a complete horizontal fragmentation. To be more specific to the CCP DDS, we can look at the example below, where STAFF is fragmented over 4 sites (not pub locations (venues) but office locations, where an office serves a region and has several venues affiliated to it) and each staff member belongs to that specific region.

STAF1 = σ location = 1 (STF)   (fragment r1)
STAF2 = σ location = 2 (STF)   (fragment r2)
STAF3 = σ location = 3 (STF)   (fragment r3)
STAF4 = σ location = 4 (STF)   (fragment r4)

The reconstruction of the relation r by taking the union of all fragments is:

R = r1 ∪ r2 ∪ r3 ∪ r4

For more about the reconstruction process, please see the distributed query processing section of this document.

Due to the structure of the Crazy Cat Pubs office distribution, horizontal fragmentation is the optimal approach, where each location (site) holds the fragment related to its coverage area. An example of fragmentation can be observed in Figure 10, where staff belonging to a specific location are stored in their respective fragment.

Figure 10 - CCP Fragmentation example

r1 fragment of LOCATION = 1

STF_ID  STF_NIN    STF_NAME  STF_SURNAME  STF_ADDRESS  LOCATION_ID
7782    LM267246B  Smith     Scott        Address 1    1
7839    ZS941326   Miller    Clark        Address 2    1
7934    NH774897D  Michelle  Adams        Address 3    1

r2 fragment of LOCATION = 2

STF_ID  STF_NIN    STF_NAME  STF_SURNAME  STF_ADDRESS  LOCATION_ID
7369    EP367063A  Smith     Scott        Address 1    2
7566    TB943169F  Miller    Clark        Address 2    2
7788    MS127906B  Michelle  Adams        Address 3    2
7876    HP428104P  Thomas    Howard       Address 4    2
7902    CZ547348D  Kian S    O'Brien      Address 5    2

Table 1 - Staff table horizontal fragmentation example
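A minimal sketch of how these fragments could be materialised and the global relation reconstructed (a hedged example; a real distributed DBMS handles allocation and reconstruction transparently, so the explicit tables below are purely illustrative):

-- Hypothetical sketch: materialise the LOCATION 1 and LOCATION 2 fragments of STAFF.
CREATE TABLE staff_loc1 AS
  SELECT * FROM staff WHERE location_id = 1;

CREATE TABLE staff_loc2 AS
  SELECT * FROM staff WHERE location_id = 2;

-- Reconstruction of the global relation is the union of all fragments:
SELECT * FROM staff_loc1
UNION ALL
SELECT * FROM staff_loc2;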

2. Vertical fragmentation splits the relation by decomposing the schema r of relation R. In other words, in vertical fragmentation some of the columns (attributes) are stored on one computer and the rest are stored on other computers. This is done because each site may not need all the attributes of a relation; put differently, a vertical fragment keeps only certain attributes of the relation. Vertical fragmentation is not recommended for CCP.
The fragmentation should be done in such a way that relation r can be reconstructed from the fragments by taking the natural join:

r = r1 * r2 * r3 * r4

3. Mixed fragmentation (an intermix of horizontal and vertical fragmentation), also known as hybrid fragmentation. Each fragment is obtained as the result of applying either the horizontal or the vertical fragmentation scheme to relation r, or to a fragment of r that was obtained previously.
Replication
In replication, the system maintains several identical replicas of the same relation r at different sites. The advantage of replication is that data are available faster, but as a downside it increases the overhead on update operations, since every location containing a replica needs to be updated in order to maintain consistency. The definition and allocation of fragments must be based on how the database is to be used. This involves a thorough analysis of transactions, where both quantitative and qualitative information is taken into account (e.g. quantitative: the frequency with which a transaction is run; the site from which a transaction is run; the performance criteria for transactions; qualitative: locality of reference; improved reliability and availability; acceptable performance; balanced storage capacities and costs; minimal communication costs).
CCP could use selective (partial) replication, a strategy that combines fragmentation and replication. Some data items are fragmented to achieve high locality of reference, and others, which are used at many sites and are not frequently updated, are replicated (otherwise, the data items are centralised). The objective of this strategy is to have all the advantages of the other approaches but none of the disadvantages, and it is the most commonly used strategy because of its flexibility (Hiremath & Kishor, 2016). For example, BEVERAGE, MENU and FOOD could be replicated at all sites because, according to the scenario, "...menus may change over time but they are not expecting them to change every week. The menu and prices are the same in every location.", so they will not be frequently updated, whereas STAFF, SHIFT and ORDERS could be fragmented over the locations.
In Figure 11 we can observe how the CCP database architecture is distributed between locations, using the global and local schemas.

- Global Conceptual Schema: the logical description of the DB as if it were not distributed. It contains definitions of entities, relationships, constraints, security and integrity information.
- Fragmentation and Allocation Schemas: describe how data are logically partitioned and where they are located, taking replication into account.
- Local Schemas: the logical descriptions of the local DBs.

Figure 11 - CCP Reference Architecture for DDS

DDS Data distribution (allocation)
The allocation problem involves finding the "optimal" distribution of fragments across sites (Özsu & Valduriez, 2011). Optimality can be defined with respect to two measures: minimal cost and performance. The cost function consists of the cost of storing each fragment at a site, the cost of querying a fragment at a site, the cost of updating a fragment at all sites where it is stored, and the cost of data communication. The allocation problem attempts to find an allocation scheme that minimises the combined cost function, and the allocation strategy should be designed to maintain the performance criteria. In Figure 12 we can see how data allocation (fragments and replicas) is done across the locations. Each fragment at a location contains only data relevant to that specific location and is updated often (e.g. ORDERS), whereas replicated data are identical at every location because updates happen very rarely. A solution for replicated data is to use snapshots, which are brought up to date periodically.
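A minimal sketch of such a snapshot, assuming the Oracle environment recommended earlier and a hypothetical database link named ccp_hq pointing at the site that holds the master copy of MENU (the weekly refresh interval is illustrative):

-- Hypothetical sketch: local read-only snapshot of the replicated MENU relation,
-- refreshed in full once a week.
CREATE MATERIALIZED VIEW menu_snapshot
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 7
AS
  SELECT * FROM menu@ccp_hq;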
Figure 12 - CCP Data allocation (fragments: green; replicas: blue)

DDS query processing

There are two aspects to query processing in a DDS: query transformation and query optimisation. The query transformation process transforms a global query into equivalent fragment queries, while the query optimisation process optimises the query with respect to the cost of executing it.
To understand the process involved in query execution in a DDS, we must compare and analyse the process flow of a centralised query against a distributed one.
Let us consider the following two relations, which are stored in the centralised Crazy Cat Pubs DBMS...

STAFF (staff_id, staff_name, staff_surname, position_id)
POSITION (position_id, position_name, contract_type)

...and assume the accounting department runs the following query: [Retrieve the names of all staff whose position is 'full-time'], where staff_id and position_id are the PKs of the relations STAFF and POSITION respectively, and position_id is an FK in the relation STAFF. The query can be expressed as:

SELECT staff_name, staff_surname
FROM staff, position
WHERE staff.position_id = position.position_id AND contract_type = 'full-time'

The equivalent RA expressions that correspond to the above SQL statement are as follows:

1. ∏ staff_name, staff_surname (σ (contract_type = 'full-time') ∧ (Staff.position_id = Position.position_id) (Staff × Position))
2. ∏ staff_name, staff_surname (Staff ⋈ Staff.position_id = Position.position_id (σ contract_type = 'full-time' (Position)))

Distributed query processing, however, involves retrieving data from physically distributed databases while still providing users with the view of a single logical database. In Figure 13 we can observe the difference in the process between a centralised DBMS and a distributed context. Query processing is significantly more difficult because the choice of the optimum execution strategy depends on additional factors, such as data transfer among sites and the selection of the best site for query execution.

Figure 13 - Query processing steps for centralised DBMS (Left) compared with steps in Distributed Query Processing (Right)

Let us now analyse the same query in a distributed database environment where the STAFF and POSITION relations are fragmented and stored at different sites. Assume that the STAFF relation has been horizontally fragmented into two partitions, STF1 and STF2, stored at Location 1 and Location 2 respectively, and that the POSITION relation has been horizontally fragmented into two relations, POS1 and POS2, stored at Location 3 and Location 4 respectively, as listed below.
STF1 = σ position_id ≤ L2 (Staff)
STF2 = σ position_id > L2 (Staff)
POS1 = σ position_id ≤ L2 (Position)
POS2 = σ position_id > L2 (Position)
For example, assuming that the accounting department wants to see the entire full-time staff list, that the accounting company is at a different location, and that the data are transferred to the accounting company's location (e.g. Location 5) before the query is processed, the RA expression would be:

Result = ∏ staff_name, staff_surname ((STF1 ∪ STF2) ⋈ position_id (σ contract_type='full-time' (POS1 ∪ POS2)))

A second strategy (Figure 14), however, is to perform the selection operations individually on the fragmented relations POS1 and POS2 at Location 3 and Location 4 respectively, and then transfer the resulting data to Location 1 and Location 2 respectively. After evaluating the join operations at Location 1 and Location 2, the results are transmitted to Location 5, where the final projection operation is performed.

Figure 14 - Distributed query execution:
  Location 3: P1 = σ contract_type='full-time' (POS1)      Location 4: P2 = σ contract_type='full-time' (POS2)
  Location 1: S1 = STF1 ⋈ position_id (P1)                 Location 2: S2 = STF2 ⋈ position_id (P2)
  Location 5: Result = ∏ staff_name, staff_surname (S1 ∪ S2)

Although both strategies provide the same results, the second strategy will be faster; however, we must be aware that slow communication between locations and a higher degree of fragmentation can have a performance impact on query processing execution strategies.
Query optimisation for distributed databases is one of the most important and challenging tasks and has been under discussion and research for years, and with the increasing number of interconnected databases at the global level it becomes ever more difficult to optimise them (Haroon, 2018). To characterise query optimisers it is useful to concentrate on join trees, which are operator trees whose operators are join or Cartesian product. Every SQL query has a run time, but the biggest trade-off comes when we use JOINs: very useful but very expensive in terms of performance. As in the example above, both queries give the same result; professionally, however, the query with the best average time and space is the one that should be chosen.
The above examples are based on an excessive degree of fragmentation in order to illustrate distributed query execution. For CCP, using horizontal fragmentation, the fragmentation will be smaller and each location will perform the selection operations locally, pushing the results to the location that initiated the query (similar to the second strategy described).

DDS Indexing

For a DDS, several additional factors further complicate the process of query execution. In general, the relations in a distributed database are fragmented; therefore, the distributed query processor transforms a high-level query on a logical distributed database (the entire relation) into a sequence of database operations (relational algebra) on relation fragments. The data accessed by the query in a distributed database must be localised so that the database operations can be performed on local data or relation fragments. Finally, the query on fragments must be extended with communication operations, and optimisation should be done with respect to a cost function that minimises the use of computing resources such as disk I/O, CPU and communication networks.

DDS Security

Addressing data security is an imperative aspect of any database system, but it is of particular importance in distributed systems because of fragmented and replicated data, the large number of users, multiple sites and distributed control, and different interacting systems. It is probably safe to summarise the main threats to the database as loss of availability, integrity and/or confidentiality. In general, two types of security architecture are used for a DDS: one where control is centralised and data are distributed, and another where both data and control are distributed among locations.
Following the earlier recommendation for an autonomous homogeneous environment, where all of the nodes are designed identically and each node is capable of handling multilevel data, a multilevel security (MLS) architecture is recommended, in which data and control are distributed among the locations. Each node has an MLS/DBMS, which manages the local multilevel database. Each node also has a distributed processing component called the Secure Distributed Processor (SDP), because multilevel security must be taken into consideration during all of the processing.
The distributed DBMS must thus be able to restrict the access of a subset of the database to a subset of the users. For CCP the admin privileges must be divided into two categories, such as Main Admin (global) and sub-admins or local DBAs with only local access. The admin access levels will differ from those of the centralised database, where local admins are not required. The other users (e.g. venue managers, waiters etc.) should have access only to the local (fragment/replica) data, but the final decision on user privileges (what they can access) can be made after a clarification with CCP to understand their business model in depth. A possible solution for a fully workable database, independent of the business model, would be an enhancement to the database design (Figure 15) in which several other tables are added (USER, ROLES, PERMISSIONS and ROLE_PERMISSION), so that a DBA can set up, through user group roles, specific privileges according to business needs. This security approach is a multilevel or mandatory access control (Lunt & Fernandez, 1990). We must be aware that additional security measures will bring additional problems as well, such as remote user authentication, management of discretionary access rules, handling of views and of user groups, and enforcing multilevel access control. A short example of a possible solution would be to maintain the authentication information at a single site for global users, who can then be authenticated only once before accessing multiple sites, or to replicate the information for authenticating users (user name and password) at all sites.

Figure 15 - Role permissions example
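A minimal sketch of the proposed role/permission tables (a hedged example: the table and column names below are illustrative, slightly renamed to avoid reserved words, and the actual privilege list would be agreed with CCP):

-- Hypothetical sketch of the role/permission enhancement described above.
CREATE TABLE app_role (
  role_id   TINYINT UNSIGNED PRIMARY KEY,
  role_name VARCHAR(30) NOT NULL           -- e.g. 'Main Admin', 'Local DBA', 'Venue Manager'
);

CREATE TABLE app_permission (
  permission_id   TINYINT UNSIGNED PRIMARY KEY,
  permission_name VARCHAR(50) NOT NULL     -- e.g. 'read local orders', 'edit menu'
);

CREATE TABLE role_permission (
  role_id       TINYINT UNSIGNED NOT NULL,
  permission_id TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (role_id, permission_id),
  FOREIGN KEY (role_id) REFERENCES app_role (role_id),
  FOREIGN KEY (permission_id) REFERENCES app_permission (permission_id)
);

CREATE TABLE app_user (
  user_id  SMALLINT UNSIGNED PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  role_id  TINYINT UNSIGNED NOT NULL,
  FOREIGN KEY (role_id) REFERENCES app_role (role_id)
);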
Another aspect of security is the data protection approach and, implicitly, data encryption (Fernandez, Summers, & Wood, 1981), which is useful both for information stored on disk and for information exchanged over the network. In a centralised DBMS the risk of unauthorised access is smaller than in a DDBMS, where there are multiple access points with different vulnerabilities.
Last but not least, physical security measures must be considered for CCP as well (e.g. locked cabinets/doors, closing unused ports, restricting USB connections where necessary, etc.). In a report published by the OTA, they found that the number of cyberattacks doubled in 2017 compared with 2016, and that 11% of attacks were due to a lack of any internal controls or protections against physical attacks (OTA, 2018).

DDS Backup and Recovery

The two terms are tightly related, because making regular backups is key to minimising the time needed to recover from failures. As database architectures have fundamentally changed to meet new application requirements, backup and recovery methods need to be redefined and re-architected as well.
Referring back to the initial recommendation of an Oracle database, the importance of using an all-in-one solution must be highlighted once more. Along with full support for a DDS, Oracle also offers an integrated backup and recovery engine, known as Oracle Recovery Manager (RMAN).
However, having automated software that helps the DBA achieve a good RTO (recovery time objective) is only half of the process. The whole backup and recovery strategy also includes regular backup policies and an off-site storage location. For the CCP database a selective backup is probably the best strategy, where ORDERS has a real-time backup function while a daily or even weekly backup of the other data in the database will probably be enough. The policies must be enforced and followed at all locations, and the global configurations must be taken into account for backups as well.
The backup procedure can include a full backup (a complete copy of the entire database, ensuring a full recovery of all data after a database integrity failure or physical disaster), an incremental backup (a backup only of the data changed since the last complete or incremental backup) and a concurrent backup (similar to a real-time backup, where the backup takes place while one or more users are working on the database) (Coronel & Morris, 2017). The backups must be performed at regular intervals, and it is worth mentioning that the backed-up data must be regularly checked for corruption and to confirm that they can be used at any time. Please see Appendix C for a general policy guidance example.
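A minimal sketch of what such a schedule could look like in RMAN (a hedged example; the exact retention, channel and scheduling settings would be defined in the backup policy):

# Hypothetical RMAN sketch (run at the RMAN prompt, not in SQL*Plus).
# Weekly level 0 (full) backup of the database plus archived redo logs:
BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;

# Daily level 1 backup of only the blocks changed since the last level 0/1 backup:
BACKUP INCREMENTAL LEVEL 1 DATABASE;

# Periodically confirm that existing backups are usable, without restoring them:
RESTORE DATABASE VALIDATE;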

SQL Statements
The queries presented in this document are based on a sample data insert of:
▪ 3 Venues (Locations)
▪ 30 Staff (an average of 10 staff per location; some of them are not allocated)
▪ 30 Shifts (covering 3 locations over 10 days, where 1 shift (day) = 1 location ROTA)
▪ 260 Rota entries (staff allocations over 10 days in 3 locations, where each rota belongs to only 1 staff member)
▪ 10 Beverages
▪ 10 Menus
▪ 10 Food Items
▪ 600 Orders (an average of 20 orders per day, over 10 days in 3 locations)
▪ 2400 Order details (with an average of 4 items per order, e.g. 1 menu, 1 food item, 2 different beverages). Please note that the quantity of the same item is not counted as an individual insert; QTY is an attribute indicating the number of the same item ordered (see Figure 16).

Figure 16 - Order & Order Details Example
Note: Some symbols in relational algebra for aggregate functions (e.g. SUM, grouping, renaming) are used as presented in the book 'Fundamentals of Database Systems' (6th Edition) by Ramez Elmasri & Shamkant B. Navathe, in the chapter "The Relational Algebra and Relational Calculus". All queries contain one or more aggregate functions, and the RA was enhanced to reflect the aggregation as closely as possible.

Query 1

The following query can be useful for the business to calculate the percentage difference between the total sales of alcoholic and non-alcoholic drinks. The formula used to calculate the difference is:

Difference % = |N1 − N2| / ((N1 + N2) / 2) × 100

where N1 and N2 are the total amounts of alcoholic and non-alcoholic drinks sold. This query is useful to see which type of drink customers order most. If we apply the formula we get ((3765.5 − 3246.15) ÷ ((3765.5 + 3246.15) ÷ 2) × 100) = 14.8139168384.... The correct query output can be seen in Figure 17, with the result limited to 2 decimals.

SELECT
  CONCAT('£ ', order_details.n1) AS 'Alc Total Sale',
  CONCAT('£ ', order_details.n2) AS 'Non-Alc Total Sale',
  CONCAT(ROUND(((N1 - N2) / ((N1 + N2) / 2)) * 100, 2), ' %') AS 'Sale Difference'
FROM (SELECT
        SUM(CASE WHEN beverage.bev_type = 'alc' THEN order_details.qty * beverage.bev_price ELSE 0 END) AS n1,
        SUM(CASE WHEN beverage.bev_type = 'non-alc' THEN order_details.qty * beverage.bev_price ELSE 0 END) AS n2
      FROM order_details
      INNER JOIN beverage
        ON order_details.beverage_id = beverage.bev_id) order_details;

Figure 17 - Query 1 output

The RA can be expressed as below:

Π σ (((n1, n2, d) ℑ (SUM(beverage.bev_price * order_details.qty = 'alc') ρ n1 ← alc), ℑ SUM(beverage.bev_price * order_details.qty = 'non-alc') ρ n2 ← non-alc) (σ ℑ (((n1 - n2) / ((n1 + n2) / 2)) * 100) ρ ← d) (beverage ⋈ order_details))

Query 2

The following query returns the top 5 most popular menus in the last month from the current date, along with the total sales and the price per menu. The query can easily be modified for the last day, week or year just by changing the interval in the WHERE clause
(WHERE order.order_date >= (CURDATE() - INTERVAL 1 MONTH)).

This query may be useful for the business if they want to see the most sold menus or, by changing the ORDER BY statement to ASC, the least popular ones.

SELECT
menu.menu_name AS 'Menu Name',
menu.menu_type AS 'Menu Type',
SUM(order_details.qty) AS 'Total Sale',
CONCAT('£ ', menu.menu_price) AS 'Price'
FROM order_details
INNER JOIN menu
ON order_details.menu_id = menu.menu_id
INNER JOIN `order`
ON order_details.order_id = `order`.order_id
WHERE `order`.order_date >=(CURDATE() - INTERVAL 1 MONTH)
GROUP BY menu.menu_type,
menu.menu_name
ORDER BY `Total Sale` DESC
LIMIT 5;

Figure 18 - Query 2 output

The RA can be expressed as below:

π ((menu_name, menu_type, ℑ SUM(order_details.qty), menu_price) σ order_date ≥ date('CURDATE') ℑ (menu_type, menu_name) (order_details ⋈ menu ⋈ order))

Query 3
This query will return all the income across all locations, categorised by the type of sale (individual food items, full menus, all drinks). The query can easily be modified to return a specific date range or location by adding a WHERE clause.

SELECT *
FROM (SELECT
CONCAT('£ ',SUM(CASE WHEN order_details.food_id <> 0 THEN order_details.qty *
fi.food_price ELSE 0 END)) AS Food
FROM order_details
INNER JOIN food_item fi
ON order_details.food_id = fi.food_id) F,
(SELECT
CONCAT('£ ',SUM(CASE WHEN order_details.menu_id <> 0 THEN order_details.qty *
m.menu_price ELSE 0 END)) AS Menu
FROM order_details
INNER JOIN menu m
ON order_details.menu_id = m.menu_id) M,
(SELECT
CONCAT('£ ',SUM(CASE WHEN order_details.beverage_id <> 0 THEN order_details.qty *
b.bev_price ELSE 0 END)) AS Beverage
FROM order_details
INNER JOIN beverage b
ON order_details.beverage_id = b.bev_id) B;

Figure 19 - Query 3 output

The RA can be expressed as below:

Π (((F, M, B) σ ((SUM(order_details.food_id = order_details.qty * food_item.food_price) ρ ← F), (SUM(order_details.menu_id = order_details.qty * menu.menu_price) ρ ← M), (SUM(order_details.bev_id = order_details.qty * beverage.bev_price) ρ ← B)) (order_details ⋈ food_item ⋈ menu ⋈ beverage))

Query 4

The query below is a 'zoom-in' on Query 3, where the total income from menus is calculated for each location, along with the average price paid by a customer per menu at each location. The query also calculates the grand total across all locations, along with the average price per menu across all locations. The ORDER table is written between `` (backticks) because MySQL interprets the word order as the keyword for ORDER BY.

SELECT
COALESCE(location.loc_name, 'Grand Total') AS 'Location Name',
CONCAT('£ ',
SUM(CASE WHEN order_details.menu_id <> 0
THEN order_details.qty * menu.menu_price
ELSE 0 END)) AS 'Total Income Menus',
CONCAT('£ ',
ROUND(AVG(CASE WHEN order_details.menu_id <> 0
THEN menu.menu_price
ELSE 0 END), 2)) AS 'Average Price/Menu'
FROM order_details
INNER JOIN menu
ON order_details.menu_id = menu.menu_id
INNER JOIN `order`
ON order_details.order_id = `order`.order_id
INNER JOIN location
ON `order`.location_id = location.loc_id
GROUP BY location.loc_name
WITH ROLLUP;

Figure 20 - Query 4 output

The RA can be expressed as below:

Π (((location_name, T, A) σ ((SUM(order_details.menu_id = order_details.qty * menu.menu_price) ρ ← T), (AVG(order_details.menu_id = order_details.qty * menu.menu_price) ρ ← A)) σ (location.loc_name)) (order_details ⋈ menu ⋈ location)


Query 5

The business scenario states that CCP keeps records of "...employee shifts to show who is on duty at what time; ...all other staff are part time and have casual contracts". The following query was therefore designed to see whether there is any available staff without assigned working hours, for the busiest days. The query makes use of a VIEW in which staff availability, name, position, city and contact details are stored. Using the staff VIEW reduces the number of necessary JOINs.

SELECT DISTINCT
view_staff.`Staff Full Name`,
view_staff.`Position`,
view_staff.`Staff City`,
view_staff.Phone,
view_staff.eMail,
view_staff.Available
FROM view_staff,
rota
INNER JOIN shift
ON rota.shift_id = shift.shift_id
WHERE NOT EXISTS
(SELECT *
FROM rota r
WHERE view_staff._id = r.staff_id)
ORDER BY view_staff.`Staff City`;

Figure 21 - Query 5 output


The query can be modified to return only the available staff by extending the WHERE clause as follows:

WHERE NOT EXISTS


(SELECT *
FROM rota r
WHERE view_staff._id = r.staff_id)
AND view_staff.Available = 'yes'
ORDER BY view_staff.`Staff City`;

Figure 22 - Query 5 output with modified WHERE clause

The RA can be expressed as below:

Π ((Staff_Full_Name, Position, Staff_City, Phone, eMail) ∩ σ (view_staff._id ≠ rota.staff_id) (view_staff ⋈ rota))

Query 6

The following query can be useful for a general manager (not a venue manager) to see whether a location is understaffed and to estimate how many orders a staff member handled on a specific day. The query returns the name of the staff member, their position, the location where they were working and the total number of orders handled on the specified day.

SELECT
view_staff.`Staff Full Name`,
view_staff.`Position`,
location.loc_name AS 'Location',
COUNT(`order`.order_id) AS 'Total No of Orders'
FROM rota
INNER JOIN view_staff
ON rota.staff_id = view_staff._id
INNER JOIN `order`
ON `order`.rota_id = rota.rota_id
INNER JOIN location
ON `order`.location_id = location.loc_id
WHERE `order`.order_date = '2019-03-02'
GROUP BY location.loc_name,
view_staff._id
ORDER BY loc_name;

The RA can be expressed as below:

Π (((location_name, T, staff_name, staff_position) σ (COUNT(order_id) ρ ← T), (location_name, staff_name, staff_position)) σ (location.loc_name)) σ (staff_name, staff_position) (staff ⋈ position ⋈ location ⋈ order)
Bibliography
Coronel, C., & Morris, S. (2017). Database Systems: A Practical Approach to Design, Implementation and
Management. Essex: Pearson Education Ltd.

Fernandez, E. B., Summers, R. C., & Wood, C. (1981). Database Security and Integrity. Addison: Wesley.

Haroon, M. (2018). Query Processing and Optimization in Distributed Database Systems. International Journal of Modern Computation, Information and Communication Technology, 83-87.

Hiremath, D. S., & Kishor, D. S. (2016). Distributed Database Problem areas and Approaches. IOSR Journal of
Computer Engineering, 15-18.

IBM. (n.d.). Techniques for improving the performance of SQL queries under workspaces in the Data Service Layer. Retrieved from IBM.com: https://www.ibm.com/support/knowledgecenter/en/SSZLC2_9.0.0/com.ibm.commerce.developer.doc/refs/rsdperformanceworkspaces.htm

Kroenke, D. M., & Auer, D. J. (2016). Database Processing: Fundamentals, Design and Implementation. Harlow:
Pearson.

Lunt, T., & Fernandez, E. B. (1990). Database security. ACM SIGMOD Rec., 90-97.

Mullins, C. S. (2014, Nov 12). Top 10 Steps to Building Useful Database Indexes. Retrieved from Database Trends and Applications: http://www.dbta.com/Columns/DBA-Corner/Top-10-Steps-to-Building-Useful-Database-Indexes-100498.aspx

Oracle. (n.d.). Distributed Database Architecture. Retrieved from Oracle.com: https://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_concepts001.htm#ADMIN12074

Oracle. (n.d.). Distributed Database Concepts. Retrieved from Oracle.com: https://docs.oracle.com/cd/B19306_01/server.102/b14231/ds_concepts.htm

Oracle. (n.d.). Enterprise Monitoring. Retrieved from Oracle.com: https://www.oracle.com/technetwork/oem/sys-mgmt/index.html

OTA. (2018, Jan 25). Cyber Incident & Breach Trends Report - Review and analysis of 2017 cyber incidents, trends and key issues to address. Retrieved from OTAlliance.org: https://otalliance.org/system/files/files/initiative/documents/ota_cyber_incident_trends_report_jan2018.pdf

Özsu, M. T., & Valduriez, P. (2011). Principles of Distributed Database Systems. New York: Springer.

Appendix

Appendix A - Datatype schema

Appendix B - Report sample

Appendix C - Database Backup and Recovery Policy example (abstract)
