
CHINHOYI UNIVERSITY OF TECHNOLOGY

SCHOOL OF ENTREPRENEURSHIP AND BUSINESS SCIENCES

GRADUATE BUSINESS SCHOOL

MASTER OF SCIENCE IN DATA ANALYTICS EXAMINATION

ASSIGNMENT Details

COURSE NAME : DATABASE MANAGEMENT AND WAREHOUSING

STUDENT NAME : Tongai Mukarati

STUDENT NUMBER : C21143509N

Section A

Question 1 [Out of 50]

Suppose that we have a relational database, dbdvdclub, with the following tables.

Table Name      Field Name(s)                                               Primary Key
tblmovies       movie_id, movie_title, year, genre_id                       movie_id
tblactor        actor_id, first_name, surname, national_id, date_of_birth   actor_id
tblgenres       genre_id, genre_name                                        genre_id
tblactor_acts   movie_id, actor_id                                          movie_id, actor_id
tblactor_role   actor_id, movie_id, role_id                                 actor_id, movie_id, role_id
tblrole         role_id, role_name                                          role_id

a) Write an SQL to create the Database dbdvdclub [2]

CREATE DATABASE dbdvdclub;

b) Write an SQL to create the tables below


i) tblmovies [4]
USE dbdvdclub;

CREATE TABLE tblmovies (
    movie_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    movie_title CHAR(50) DEFAULT NULL,
    year        SMALLINT UNSIGNED,
    genre_id    SMALLINT,
    PRIMARY KEY (movie_id)
);

ii) tblactor [4]

USE dbdvdclub;

CREATE TABLE tblactor (
    actor_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    first_name    CHAR(50) DEFAULT NULL,
    surname       CHAR(50) DEFAULT NULL,
    national_id   CHAR(12),
    date_of_birth DATE,
    PRIMARY KEY (actor_id)
);

iii) tblactor_role [3]

USE dbdvdclub;

CREATE TABLE tblactor_role (
    actor_id INT UNSIGNED NOT NULL,
    movie_id INT UNSIGNED NOT NULL,
    role_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (actor_id, movie_id, role_id),
    CONSTRAINT fk_actorrole_movies FOREIGN KEY (movie_id) REFERENCES tblmovies (movie_id)
        ON UPDATE CASCADE ON DELETE RESTRICT,
    CONSTRAINT fk_actorrole_actor FOREIGN KEY (actor_id) REFERENCES tblactor (actor_id)
        ON UPDATE CASCADE ON DELETE RESTRICT,
    CONSTRAINT fk_actorrole_role FOREIGN KEY (role_id) REFERENCES tblrole (role_id)
        ON UPDATE CASCADE ON DELETE RESTRICT
);

c) i) Using the table tblactor_acts, carefully explain what is meant by saying that
actor_id is a foreign key. [3]

A foreign key is a column (or set of columns) in one table that points to the candidate key of another table, where a candidate key is a set of attributes that uniquely identifies tuples in a table (Hernandez, 2003). The table that contains the foreign key is called the child table, and the table that holds the referenced key is called the referenced or parent table. Foreign keys act as a cross-reference between tables. For example, the actor_id column in the tblactor_acts table is a foreign key because it points to the primary key of the tblactor table, which is actor_id: every actor_id value stored in tblactor_acts must already exist in tblactor.
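A minimal sketch of how tblactor_acts could be declared so that this foreign key is enforced (the constraint names are illustrative, and the column types are assumed to match the parent tables created above):

CREATE TABLE tblactor_acts (
    movie_id INT UNSIGNED NOT NULL,
    actor_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (movie_id, actor_id),
    -- actor_id must match an existing actor_id in the parent table tblactor
    CONSTRAINT fk_actoracts_actor FOREIGN KEY (actor_id) REFERENCES tblactor (actor_id),
    -- movie_id must match an existing movie_id in the parent table tblmovies
    CONSTRAINT fk_actoracts_movies FOREIGN KEY (movie_id) REFERENCES tblmovies (movie_id)
);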

ii) Using table tblactor compare and contrast Candidate Key and Alternate Key [6]
A candidate key is a set of attributes that uniquely identifies tuples in a table; equivalently, it is a super key with no redundant attributes (Hernandez, 2003). The primary key must be selected from among the candidate keys, and every table must have at least one candidate key. A table may have multiple candidate keys but only a single primary key. For example, in the tblactor table both actor_id and national_id are candidate keys, since either one uniquely identifies an actor in the table.

Alternate keys, by contrast, are the candidate keys that were not chosen to be the primary key of the table (Kahate, 2004). In this case the national_id column in the tblactor table is an alternate key, since it was not chosen as the primary key.
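In SQL an alternate key is typically enforced with a UNIQUE constraint; a minimal sketch (the constraint name is illustrative):

-- Enforce the alternate key: no two actors may share a national_id
ALTER TABLE tblactor ADD CONSTRAINT uq_actor_national_id UNIQUE (national_id);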

iii) Using tables a) tblactor_acts and b) tblactor_role differentiate between a
Foreign Key and a Composite Key [6]

A composite key is a key composed of two or more attributes that collectively uniquely identify each record (Kahate, 2004). In the table tblactor_acts, neither movie_id nor actor_id on its own uniquely identifies a row: the same movie has many actors, and the same actor appears in many movies. The combination (movie_id, actor_id), however, does identify a single row, so it is declared as a composite primary key. When someone searches using both a movie_id and an actor_id, at most one record should be returned.

Meanwhile, a foreign key refers to a column that creates a relationship between two tables. A foreign key is used to maintain data integrity and to allow navigation between two different instances of an entity (Hernandez, 2003). It acts as a cross-reference between two tables, since it references the primary key of another table (Hernandez, 2003). However, a foreign key does not always have to be linked to a primary key constraint in another table; it can also reference the columns of a UNIQUE constraint in another table (Hernandez, 2003). So in our case the actor_id and movie_id columns in tblactor_role are foreign keys linking it to tblactor and tblmovies, while the full set (actor_id, movie_id, role_id) is that table's composite key.

d) The DVD rental business with the database above has a policy whereby movies which
are 15 years of age (i.e. 15 years post production) are donated to the society. Write an
SQL statement to retrieve all movies which are due for donations. [3]

SELECT movie_title, YEAR(CURDATE()) - year AS age
FROM tblmovies
WHERE YEAR(CURDATE()) - year >= 15;

e) Write an SQL statement to compute the average age of Actors who acted in movies which belong to the "Horror" genre. [6]

Joining through the genre table (tblgenres) lets us filter on the genre name directly and average each qualifying actor's age:

SELECT AVG(TIMESTAMPDIFF(YEAR, a.date_of_birth, CURDATE())) AS average_age
FROM tblactor a
WHERE a.actor_id IN (
    SELECT r.actor_id
    FROM tblactor_role r
    JOIN tblmovies m ON m.movie_id = r.movie_id
    JOIN tblgenres g ON g.genre_id = m.genre_id
    WHERE g.genre_name = 'Horror'
);

f) Write an SQL statement to retrieve actors who have acted in movies and played the
“Main Actor” role. [4]

SELECT actor_id, first_name, surname, national_id, date_of_birth
FROM tblactor
WHERE actor_id IN (SELECT actor_id FROM tblactor_role
                   WHERE role_id IN (SELECT role_id FROM tblrole
                                     WHERE role_name = 'Main Actor'));

g) Explain the principle of entity integrity using the tblactor_role table. [4]

Entity integrity is concerned with ensuring that the rows in a table have no duplicate records and that the field (or fields) identifying each record within the table is unique and never null. That is, the table should have one column or a set of columns that provides a unique identifier for its rows (Taylor, 2011). For example, in tblactor_role the columns actor_id, movie_id and role_id together act as this unique identifier: they form the table's primary key.

A primary key acts as a unique identifier for rows in the table. Entity Integrity ensures
two properties for primary keys:
- It ensures that the primary key for a row is unique and it does not match
the primary key of any other row in the table.
- It also ensures that the primary key is not null and no component of the
primary key may be set to null.

A system enforces entity integrity by ensuring that any operation that creates a
duplicate primary key or one containing nulls is rejected.
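As an illustration (assuming the tblactor_role definition from part b), and assuming actor 1, movie 1 and role 1 exist in the parent tables), the following inserts would be rejected under entity integrity:

-- Rejected: a component of the primary key is NULL
INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (1, 1, NULL);

-- The second statement is rejected: it would duplicate an existing primary key
INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (1, 1, 1);
INSERT INTO tblactor_role (actor_id, movie_id, role_id) VALUES (1, 1, 1);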

h) There is a missing constraint "minimum_age" in the table tblactor where no one below the age of 5 (five years) is allowed to Act, write an SQL statement to effect this constraint. [5]

MySQL CHECK constraints cannot call non-deterministic functions such as CURDATE(), so a fixed cutoff date is used, assuming the constraint is added on 2021-04-01. The comparison must be <=, since an actor who is at least five years old was born on or before the cutoff:

ALTER TABLE tblactor
ADD CONSTRAINT minimum_age CHECK (date_of_birth <= '2016-04-01');

Section B

2 a) An SQL developer is bidding for a new contract with a prestigious blue-chip organisation. Part of the selection process is a technical interview. Answer the following questions from the interview panel: For each of the following terms, explain what the term stands for, the essence of the functions it provides and a set of example SQL statements (at least TWO for each) that implement these functions:
(i) Data Definition Language (DDL) [6]

The commands used to create or modify database objects such as tables, indexes and users are called Data Definition Language (DDL) commands. DDL commands are used to alter the database structure, for example creating new tables or objects along with all their attributes (data type, table name, etc.) (Casteel, 2016). The most commonly used DDL statements are CREATE, ALTER, DROP, and TRUNCATE.

For example:

The CREATE command - This command builds a new table, as shown below:

CREATE TABLE Employee (
    Employee_Id INTEGER PRIMARY KEY,
    First_name  CHAR(50) NULL,
    Last_name   CHAR(75) NOT NULL
);

The Alter Command

An ALTER command modifies an existing database table. This command can add additional columns, drop existing columns and even change the data type of columns in a database table.

For example:

ALTER TABLE Employee ADD CONSTRAINT employee_pk PRIMARY KEY (Employee_Id);

In this example (assuming Employee was created without a key), we added a primary key to the table to add a constraint and enforce unique, non-null values. The constraint "employee_pk" is a primary key on the Employee table's Employee_Id column.
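DROP and TRUNCATE were also listed above; two short illustrative statements reusing the same Employee table:

TRUNCATE TABLE Employee;   -- removes all rows but keeps the table definition
DROP TABLE Employee;       -- removes the table itself from the database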

(ii) Data Manipulation Language (DML) [6]


Data Manipulation Language (DML) refers to the set of syntax elements used to manage the data in a database. DML commands are not auto-committed, so modifications made by them are not permanent until committed (Thakur, 2018). DML statements specify how data can be inserted, updated, retrieved and deleted in the objects defined using the Data Definition Language (DDL). Data Manipulation Language is responsible for data modification in a database.

Data Manipulation Language Example

INSERT INTO Student VALUES (78, 'Nicole', 8);

The above command inserts a record into the Student table.

For example

UPDATE tutorials SET Author = 'webmaster' WHERE Author = 'anonymous';

The above command updates matching rows in the tutorials table.
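DML also covers retrieval and deletion; two further illustrative statements against the same tables (the Student_Id column name is an assumption):

SELECT * FROM Student WHERE Student_Id = 78;        -- retrieve matching rows
DELETE FROM tutorials WHERE Author = 'webmaster';   -- delete matching rows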

(iii) Data Control Language (DCL) [6]


Data Control Language (DCL) is used to control access to data stored in a database; that is, it is the syntax concerned with authorization. DCL provides the means to maintain the database effectively so that no users can make changes that do not concern their role or that might impact the security of the database (Pedamkar, 2020).

For example:

GRANT SELECT ON STUDENTS TO Tongai123;


The above command grants user Tongai123 the ability to use the SELECT command on the STUDENTS table.

The REVOKE command

This command takes back or cancels privileges previously granted, in this case the SELECT privilege previously granted to user Tongai123:

REVOKE SELECT ON STUDENTS FROM Tongai123;

b) Relational databases are very effective in situations for which they are appropriate. In
other situations, simpler file-based solutions may be sufficient. Suppose you are required to
implement a system for storing information about a library’s books, borrowers, and loans.
Give FOUR reasons why a database system is superior to a file-based system for this task.
Illustrate the answer with suitable examples. [4]

Data Security
A database system makes it easy to apply access constraints so that only authorized users are able to access the data. Each user has a different set of access rights, so data is protected from issues such as identity theft, data leaks and misuse of data (Shukla, 2020). In the library system, for example, librarians could be allowed to update loan records while borrowers can only view their own loans.

Data Searching
A database management system provides inbuilt searching operations, making it easier and faster for users to search for data. Users only have to write a small query to retrieve data from the database (Shukla, 2020). For example, listing all books currently on loan to a given borrower is a single SELECT query rather than a program that scans every file.

Data Integrity
In some cases constraints need to be applied to the data before it is inserted into the database. A file system does not provide any mechanism to check these constraints automatically, whereas a DBMS maintains data integrity by enforcing user-defined constraints on the data by itself (Shukla, 2020). For example, the database can reject a loan record that refers to a borrower who is not on file.

Easy Recovery
Database systems are able to keep backups of data, making it easy to fully recover data in case of a failure. With file systems, however, it is not easy to keep backups, so once the system crashes the lost data may be unrecoverable. A database system normally has a recovery manager which restores the data, another advantage over file systems (Shukla, 2020). For example, the library's loan history could be restored to its state just before a crash.

c) A company wants to move its current file-based system to a database system. In many
ways, this can be seen as a good decision. Identify and describe three disadvantages in
adopting a database approach. [3]

High Cost of Hardware and Software


The costs of implementing a database system are higher compared to those of a file system. This is because a database system requires computers with high-speed processors and large memory in order to run successfully, and these are costly to acquire.

Complexity
Database systems are very complex, and user training is required before users can work with them. In order for the database system to run properly, it is very important for developers, database administrators, designers, and also the end users to have a good knowledge of it. If users are not properly trained, this may lead to data loss or database failure.

Higher Impact of a Failure

A database system stores all the data in a centralized location, and this increases the vulnerability of the system. Since all users and applications access data from the centralized database system, the failure of any component can bring operations to a halt.

3. a) Using relevant examples define the following terms as used in database normalisation

i) Insert anomaly [3]

An insertion anomaly occurs when data cannot be added to the database because other data is absent (Ricardo, 2004). For example, if a system is designed to require that a customer be on file before a sale can be made to that customer, but a customer cannot be added until they have bought something, then you have an insert anomaly.

ii) Update anomaly [3]

An update anomaly is a data inconsistency that results from data redundancy and a partial update (Ricardo, 2004). For example, suppose the person charged with keeping all records current and accurate is asked to change an employee's title due to a promotion. If the title is stored redundantly in multiple rows of the same table and the person misses any of them, there will be multiple titles associated with the employee, and the end user has no way of knowing which is correct.

iii) Delete anomaly [3]

A deletion anomaly is the unintended loss of data due to the deletion of other data; that is, deletion anomalies happen when deleting unwanted information causes desired information to be deleted as well (Ricardo, 2004). For example, if a single database record contains information about a particular product along with information about a salesperson for the company, and the salesperson quits, then information about the product is deleted along with the salesperson information.
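All three anomalies can be seen on a single denormalized table; a hypothetical sketch:

-- Hypothetical denormalized table exhibiting all three anomalies
CREATE TABLE product_sales (
    salesperson   VARCHAR(50),
    product_name  VARCHAR(50),
    product_price DECIMAL(8, 2)   -- repeated on every row for the same product
);
-- Insert anomaly: a new product cannot be recorded until someone sells it.
-- Update anomaly: changing product_price must touch every matching row.
-- Delete anomaly: deleting a salesperson's only row loses the product data too.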

b) Identify and describe the 8 (eight) major duties of a database administrator. [16]

Database Design
Database design refers to the set of steps that help with designing, creating, implementing, and maintaining a business's database systems. The database administrator determines what data must be stored and how the data elements interrelate; that is, the database administrator is responsible for producing the physical and logical models for the proposed database system design.

Performance Monitoring
Database monitoring is a very important part of application maintenance. The ability to discover database issues in time helps ensure that an application remains healthy and accessible. Without good monitoring in place, database outages or issues can go unnoticed until it is too late and the business is losing money and customers.

Capacity Planning
Capacity planning is the process of determining the production capacity needed by an organization to meet changing demands for its products. All databases have limits on the amount of data they can store and the amount of physical memory they can use. It is therefore the job of the database administrator to decide the limit and capacity of the database and to handle all the issues related to it.

Decides Data Recovery and Backup Methods

A database may fail at any time, so a database administrator is required to take backups of the entire database on a regular basis. The database administrator has to decide how much data should be backed up and how frequently the backups should be taken, and is also responsible for data recovery in the event that data is lost.

Database accessibility
It is the job of the database administrator to decide on the accessibility of the database. The database administrator determines which users can access the database and which data each user is able to access. No user has the power to access the entire database without the permission of the database administrator.

Managing Data Security and Privacy


The database administrator is also responsible for protecting the database against accidental or intentional data loss, destruction, or misuse, and for establishing user privileges.

Managing Data Integrity


Data integrity protects database data from unauthorized use and should therefore be managed carefully. It is the job of the database administrator to manage the relationships between data so as to maintain data consistency.

Making Decisions on Hardware and Software


It is the job of the database administrator to decide which hardware will suit the company's requirements. Hardware acts as the interface between end users and the database, so the database administrator must ensure that it is the right fit for the job.

Improve query processing performance


When a user submits a query to the database, it must be executed quickly, because users require fast retrieval of answers from the database. It is the job of the database administrator to improve query execution and performance when processing database queries.

4. a) Using examples explain the Data Warehousing ETL Process [13]

The ETL process encompasses three steps: extraction, transformation, and loading. It takes large volumes of raw data from multiple sources, converts it for analysis, and loads that data into the warehouse, bringing it all together to build a unified source of information for business intelligence (Tobin, 2020).

The ETL process is a three-step process; the three primary steps are described below. ETL is a key design concept in data warehouse architecture, because it ensures that all the processes connect seamlessly and data continues to flow as defined by the business, shaping and modifying itself where and when needed according to the workflow.

Step 1 - Extraction
In the first step, data is extracted from the source systems into the staging area. Data may be extracted from multiple sources, for example Excel, Pastel, Sage ERP, Facebook or text files. The staging area works like a buffer between the data warehouse and the source data. Since data may come from multiple different sources in various formats, transferring it directly to the warehouse may result in corrupted data, so the staging area is used for data cleansing and for organizing the data (Guru99, 2020). Hence one needs a logical data map before data is extracted and loaded physically; this data map describes the relationship between source and target data.

For example, Chicken Inn has many shops in Zimbabwe and the region. Say Chicken Inn Blantyre, Malawi, has its own system for recording customer visits and product purchase history, with the data stored in Excel and in the point-of-sale system. As part of the extraction process, purchase history data will be collected from the point-of-sale system and client visit information will be collected from the Excel file.

Step 2 – Transformation

This stage is closely associated with the data extraction stage. It is mainly concerned with converting data to a format that conforms to the standard schema the data warehouse uses for storage. During this stage, data cleaning and organization take place. All the data from the multiple source systems is normalized and converted to a single system format, improving data quality and compliance. ETL yields transformed data through these methods (Tobin, 2020):

- Filtering – loading only specific attributes into the data warehouse.
- Cleaning – replacing NULL values with default values, etc.
- Joining – multiple attributes are joined into one.
- Splitting – a single attribute is split into multiple attributes.
- Sorting – sorting tuples on the basis of some attribute (generally a key attribute).
Keeping our Chicken Inn example in mind, the amounts in the point-of-sale system are captured in Malawi Kwacha; as part of the transformation process the amounts would need to be converted to USD, since the information has to be submitted to the Harare headquarters, and all the client names will also be converted to uppercase.

Step 3 – Loading

At this stage the data that has been extracted and transformed is now written into the
target database. Depending upon the requirements of the business, data can be loaded
into the target database in batches or all at once (Tobin, 2020).

Loading can be carried out in two ways:

1. Refresh: In this case the data in the data warehouse is completely rewritten.
That is older files are completely replaced.

2. Update: In this case only those changes applied to our source information are
added to the Data Warehouse. An update is typically carried out without
deleting or modifying pre-existing data.

Continuing with our Chicken Inn example, at this stage the data that has been converted and formatted in the transformation stage is loaded onto the Chicken Inn servers in Harare for storage and for use in business intelligence by management.
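A minimal SQL sketch of the transform-and-load step, assuming hypothetical staging and warehouse tables (stg_sales, dw_sales) and an illustrative exchange rate:

-- Load transformed rows from the staging area into the warehouse.
-- All table and column names are illustrative; 0.00124 stands in for the
-- Malawi Kwacha to USD exchange rate on the day of the load.
INSERT INTO dw_sales (sale_date, customer_name, amount_usd)
SELECT sale_date,
       UPPER(customer_name),      -- transformation: client names to uppercase
       amount_mwk * 0.00124       -- transformation: Kwacha converted to USD
FROM stg_sales;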

b) Differentiate the following terms as used in Data Warehousing:

i) Fact Table and Dimension Table [4]

Fact table
A fact table is the primary table in a dimensional model; it holds the measurable values associated with the attributes of the dimension tables. It contains quantitative information in a denormalized form and is essentially the data that needs to be analysed. Fact tables mostly have two kinds of columns: foreign keys that join them to the dimension tables, and columns that contain the values to be analysed. A fact table mostly contains numeric data; it grows vertically, containing more records but fewer attributes (Pedamkar, 2020).

Dimension Table
In data warehousing, a dimension table is a collection of reference information about a measurable event; such events are known as facts and are stored in a fact table. Dimensions categorize and describe data warehouse facts and measures in a way that supports meaningful answers to business questions. Dimension tables form the very core of dimensional modelling (Pedamkar, 2020).

KEY DIFFERENCES Between Fact Table and Dimension Table

- The fact table contains measurements, metrics, and facts about a business process, while the dimension table is a companion to the fact table containing descriptive attributes used for query constraining.
- The fact table is located at the centre of a star or snowflake schema, whereas the dimension tables are located at its edges.
- A fact table is defined by its grain, or its most atomic level, whereas a dimension table should be wordy, descriptive, complete, and quality assured.
- The dimension table helps to store report labels, whereas the fact table contains the detailed measurement data.
- A fact table does not contain a hierarchy, whereas a dimension table may contain hierarchies.
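A minimal sketch of the two table types (all names are hypothetical): a dimension table with descriptive attributes, and a fact table whose foreign key references it.

-- Dimension table: descriptive attributes about a product
CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)
);

-- Fact table: numeric measures plus a foreign key into the dimension table
CREATE TABLE fact_sales (
    product_key INT,
    sale_date   DATE,
    quantity    INT,               -- measure
    amount      DECIMAL(10, 2),    -- measure
    FOREIGN KEY (product_key) REFERENCES dim_product (product_key)
);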

ii) OLAP and OLTP [4]

Online Transaction Processing – OLTP


OLTP refers to a system that supports transaction-oriented applications in a 3-tier architecture. It administers the day-to-day transactions of an organization. A good example of an OLTP system is an ATM, in which we modify the status of an account using short transactions. The emphasis in an OLTP system is on fast processing, because these databases are read, written and updated frequently. In the event that a transaction fails, OLTP systems use built-in logic to ensure that data integrity is maintained. OLTP systems act as the source of data for OLAP (techdifferences, 2018).

Online Analytical Processing – OLAP
OLAP, by contrast, refers to an Online Analytical Processing system. An OLAP database stores historical data that has been fed in by OLTP systems. It allows a user to view different summaries of multi-dimensional data. Using OLAP, you can extract information from a large database and analyse it for decision making (techdifferences, 2018).

OLAP also allows a user to execute complex queries to extract multidimensional data. In OLAP, even if a transaction fails midway, data integrity is not harmed, since the user is only retrieving data from a large database for analysis; the user can simply fire the query again and extract the data.

Transactions in OLAP are long and hence take comparatively more time for processing, and they require large space. Transactions in OLAP are also less frequent than in OLTP, and the tables in an OLAP database may not be normalized. Examples of OLAP uses are viewing a financial report, budgeting, marketing management, sales reports, etc.

iii) Snow Flake Schema and Star Schema [4]

The star schema is the simplest and most common modelling paradigm, in which the data warehouse comprises a fact table with a single table for each dimension. The schema imitates a star, with the dimension tables presented in an outspread pattern encircling the central fact table. The dimensions in the fact table are connected to the dimension tables through primary key and foreign key relationships (techdifferences, 2018).

The snowflake schema is a variant of the star schema in which the dimension tables take a hierarchical form. In this schema, a fact table is connected to various dimension and sub-dimension tables through primary and foreign keys. It is named the snowflake schema because its structure is similar to a snowflake. The crucial difference between the star schema and the snowflake schema is that the star schema does not use normalization, whereas the snowflake schema uses normalization to eliminate redundancy of data. Fact and dimension tables are essential requisites for creating either schema (techdifferences, 2018).
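As an illustrative sketch (all names hypothetical), snowflaking splits a hierarchy out of a denormalized star dimension into its own table:

-- Star schema: one denormalized dimension table
CREATE TABLE dim_store (
    store_key  INT PRIMARY KEY,
    store_name VARCHAR(50),
    country    VARCHAR(50)        -- repeated for every store in that country
);

-- Snowflake schema: the country level becomes a sub-dimension table
CREATE TABLE dim_country (
    country_key INT PRIMARY KEY,
    country     VARCHAR(50)
);

CREATE TABLE dim_store_sf (
    store_key   INT PRIMARY KEY,
    store_name  VARCHAR(50),
    country_key INT,
    FOREIGN KEY (country_key) REFERENCES dim_country (country_key)
);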

List of References

Casteel, J. (2016). Oracle 12c: SQL. United States: Cengage Learning.

Guru99. (2020, December 12). ETL (Extract, Transform, and Load) Process in Data Warehouse. Retrieved from https://www.guru99.com/etl-extract-load-process.html

Hernandez, M.J. (2003). Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design. Retrieved from https://books.google.co.zw/books?id=dkxsjXNayHQC&dq=candidate+key&source=gbs_navlinks_s

Kahate, A. (2004). Introduction to Database Management Systems. Pearson Education India.

Pedamkar, P. (2020, December 12). Data Control Language. Retrieved from https://www.educba.com/data-control-language/

Ricardo, C.M. (2004). Databases Illuminated. Retrieved from https://books.google.co.zw/books?id=zjbyULlyIYIC&pg=PA228&dq=Update+Anomalies+example&hl=en&sa=X&ved=2ahUKEwjR4fHL2uzvAhUNQUEAHUuBDksQ6AEwAXoECAEQAg#v=onepage&q=Update%20Anomalies%20example&f=false

Shukla, S. (2020, October 20). Advantages of DBMS over File system. Retrieved from https://www.geeksforgeeks.org/advantages-of-dbms-over-file-system/

Taylor, A.G. (2011). SQL for Dummies. John Wiley & Sons.

Tech Differences. (2018, December 9). Difference Between OLTP and OLAP. Retrieved from https://techdifferences.com/difference-between-oltp-and-olap.html#KeyDifferences

Thakur, S. (2018, May 7). What is DBMS? Let's Define DBMS. Retrieved from https://whatisdbms.com/explain-data-manipulation-language-with-examples-in-dbms/

Tobin, D. (2020, September 28). ETL & Data Warehousing Explained: ETL Tool Basics. Retrieved from https://www.xplenty.com/blog/etl-data-warehousing-explained-etl-tool-basics/#:~:text=ETL%20(or%20Extract%2C%20Transform%2C%20Load)%20is%20a%20process,that%20data%20into%20your%20warehouse.

*****END OF PAPER*****

