
MSc DATA SCIENCE

SECTION A

Question 1:

Draw a logical Entity-Relationship Model derived from an invoice of ZZ Trading of Bristol. Aspects such as keys, relationships, attributes, and entities have to be fully described.
The above invoice has three major entities: Customer, Order, and Product. An entity is a distinct object or concept about which data is stored, and in a relational database system each entity becomes a table. Every entity in a database system must have features that describe it (Amran, Mohammed, and Diana, 2018); these features are known as attributes. The following table shows the entities of the database system and their attributes:

Entity      Attributes
Customer    Customer ID, name, address, and order number
Order       Order ID, date, time, and product ID
Product     Product ID, name, description, price, quantity, and line total

In a database system, certain special attributes uniquely identify each entry; these are called primary keys. By convention, the primary key is listed first when describing an entity. Customer ID is the primary key of the Customer entity, Order ID is the primary key of the Order entity, and Product ID is the primary key of the Product entity. There is also another special type of key, the foreign key, which is found in any relational database system. A foreign key is an attribute in one table that references the primary key of another table in the same database system. The attributes order number and product ID, in the Customer and Order entities respectively, are the foreign keys in the given database system.

Relationships define the manner in which participating entities relate to one another in a database system. An "orders" relationship exists between the Customer and Order entities: a customer places an order with the company. Similarly, a customer purchases a product from the organization. Another aspect worth mentioning is cardinality. Cardinality refers to the number of instances of one entity that can be mapped onto instances of another entity. The cardinality found in this database system is one-to-many, where a single instance of one entity is mapped to many instances of another. For instance, a customer can place more than one order with the company.
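As an illustration, the following is a minimal sketch of the model above using SQLite through Python's sqlite3 module. The table and column names follow the entity table given earlier, while the data values are hypothetical; to express the one-to-many cardinality, the sketch places the customer ID as a foreign key in the Order table, which is the conventional way of implementing such a relationship and may differ slightly from the attribute placement shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this pragma is on

cur.executescript("""
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key
    name        TEXT NOT NULL,
    address     TEXT
);

CREATE TABLE "Order" (
    order_id    INTEGER PRIMARY KEY,      -- primary key
    order_date  TEXT,
    order_time  TEXT,
    customer_id INTEGER NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)  -- foreign key
);

CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,      -- primary key
    name        TEXT NOT NULL,
    description TEXT,
    price       REAL,
    quantity    INTEGER
);

-- one-to-many cardinality: a single customer places two orders (hypothetical values)
INSERT INTO Customer VALUES (1, 'A. Customer', 'Bristol');
INSERT INTO "Order"  VALUES (101, '2021-05-01', '09:30', 1);
INSERT INTO "Order"  VALUES (102, '2021-05-08', '14:15', 1);
""")

# List every order placed by customer 1, illustrating the one-to-many relationship.
cur.execute("""
    SELECT c.name, o.order_id, o.order_date
    FROM Customer AS c
    JOIN "Order"  AS o ON o.customer_id = c.customer_id
    WHERE c.customer_id = 1
""")
print(cur.fetchall())
```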

There are several advantages to employing logical entity-relationship models in database systems. The pros include the following:

i. ERD diagrams make concepts easier to understand. They allow database designers and engineers to gain a complete understanding of the entire system design.

ii. The visual representation provides designers with a clear view of the systems they are creating, and it serves as a guide for them during the design process.

iii. Database designers can quickly turn ERD diagrams into other types of diagrams, so ERD images offer a great deal of flexibility.

iv. ERD images help a team of database engineers create a database system more effectively (Amran, Mohammed, and Diana, 2018).

v. ERD diagrams give database designers a clear picture of the kind of systems they are creating, which serves as a guiding concept throughout the process.

vi. ERD diagrams serve as a link between customers or clients and the designers, allowing the two parties to communicate with one another.

Question 2:

The relational model of databases is based on a formal foundation using relational algebra, which is called the Relational Model of Data. It is often described as having three related aspects: the Structural aspect, the Integrity aspect, and the Manipulative aspect. Describe and explain these three aspects of the relational model.


i. Structural – The structural aspect of the relational data model describes the architecture of the database system. It deals with the manner in which a database system is made up of tables (Amran, Mohammed, and Diana, 2018). The tables are the entities of the database system, and each table must have a unique name. These tables comprise features that define them, and the features constitute the attributes of the database system. By convention, the primary key is listed as the first attribute of a table. The primary key is a special attribute that uniquely identifies every entry in a particular table. In a relational data model there is also another special key, called a foreign key. It is these keys that bring about the idea of relationships among entities in a database system. A foreign key is an attribute that references the primary key of another table in the same database system.

ii. Integrity – Integrity can be defined as the overall consistency and accuracy of information or data. Integrity ensures that data in a database system remains unchanged and well protected. There are standards and rules that ensure data achieves a high degree of integrity, and these rules and standards are usually implemented during the design phase. With regard to the integrity of data in a database system, the following are the common types of integrity:

a. Physical integrity – This type of integrity safeguards data in its wholeness as it is being retrieved or stored. Factors that can compromise physical integrity include natural disasters such as earthquakes, malicious acts by hackers, power outages, and storage erosion, among others. The purpose of this integrity is to ensure that data remains unchanged even if such calamities occur.

b. Logical integrity – This type of integrity ensures that data is not changed as it is used in various ways in a relational database system. Its purpose is to protect data from human error as well as from malicious actors such as hackers. There are four main types of logical integrity:

i. Entity integrity – This type of logical integrity depends on primary keys. Its purpose is to ensure that each entry is recorded in a table only once.

ii. Referential integrity – This kind of integrity ensures uniformity in the storage of data in a database system: a foreign key value must refer to an existing entry in the referenced table (Amran, Mohammed, and Diana, 2018).

iii. Domain integrity – This category of integrity ensures that the values stored in each domain (column) remain as accurate as possible.

iv. User-defined integrity – This kind of integrity consists of constraints and rules created by the users of a system to suit their particular needs.

iii. Manipulative – Data manipulation refers to the process of changing information with the main purpose of making it more readable and organized. The tool used for data manipulation is the Data Manipulation Language, abbreviated as DML. DML is the part of a database language, such as the INSERT, UPDATE, and DELETE statements of SQL, used to add, alter, and remove data in a database, with the major intention of making the information more useful to users (a short sketch illustrating both the integrity and manipulative aspects follows this list). The following are some of the uses of data manipulation:

a. Data manipulation ensures that data achieves a high degree of consistency. Consistent data gives users better readability and understanding of the data.

b. Data manipulation is essential for data projections. Data analysts can use historical data to project various aspects, such as future expected profits (Amran, Mohammed, and Diana, 2018).

c. Data manipulation ensures that data retains great value. The fact that operations such as deletion, addition, and updating can be carried out easily on a database system adds value to organizational data.
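To make the integrity and manipulative aspects more concrete, here is a minimal sketch using SQLite through Python's sqlite3 module. The table names, columns, and constraint values are hypothetical and chosen only for illustration; the constraints stand in for entity, referential, and domain/user-defined integrity, while the INSERT, UPDATE, and DELETE statements are the core DML operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # enable referential-integrity enforcement in SQLite

cur.executescript("""
CREATE TABLE Department (
    dept_id INTEGER PRIMARY KEY,                 -- entity integrity: a unique, non-null key
    name    TEXT NOT NULL UNIQUE
);

CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,                 -- entity integrity
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary >= 0),            -- domain / user-defined integrity rule
    dept_id INTEGER NOT NULL
            REFERENCES Department(dept_id)       -- referential integrity
);
""")

# Manipulative aspect: DML statements add, alter, and remove data.
cur.execute("INSERT INTO Department (dept_id, name) VALUES (1, 'Sales')")
cur.execute("INSERT INTO Employee (emp_id, name, salary, dept_id) VALUES (10, 'A. Jones', 30000, 1)")
cur.execute("UPDATE Employee SET salary = 32000 WHERE emp_id = 10")
cur.execute("DELETE FROM Employee WHERE emp_id = 10")

# A statement that violates a constraint is rejected rather than silently corrupting the data.
try:
    cur.execute("INSERT INTO Employee (emp_id, name, salary, dept_id) VALUES (11, 'B. Smith', -5, 1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # e.g. CHECK constraint failed
```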

Question 3:

The three major manipulative operators in relational algebra are the Restrict, Project, and Product operators. Describe and explain these operators.


i. Restrict – The restrict operator produces a copy of a table containing only the rows that satisfy a particular condition; rows that fail the condition are excluded.

ii. Project – The project operator produces a copy of a table with some attributes (columns) excluded.

iii. Product – The product operator combines every tuple of one table with every tuple of another table in the same database system (a sketch illustrating all three operators follows this list).
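The following is a minimal sketch of the three operators using pandas DataFrames to stand in for relations (assuming pandas 1.2+ for the cross join); the table contents are hypothetical.

```python
import pandas as pd

# Two small relations with hypothetical contents.
customer = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cara"],
    "city": ["Bristol", "Bath", "Bristol"],
})
product = pd.DataFrame({
    "product_id": [10, 20],
    "description": ["Widget", "Gadget"],
})

# Restrict (selection): keep only the rows that satisfy a condition.
bristol_customers = customer[customer["city"] == "Bristol"]

# Project: keep only some attributes, dropping duplicate rows as relational algebra requires.
cities_only = customer[["city"]].drop_duplicates()

# Product (Cartesian product): pair every customer tuple with every product tuple.
cross = customer.merge(product, how="cross")

print(bristol_customers, cities_only, cross, sep="\n\n")
```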

SECTION B

Question 5:
a) Identify and describe at least six factors that can be used to evaluate the quality and veracity of a data model.

The following are six factors that can be used to evaluate data veracity and quality:

i. Accuracy – Accuracy is the degree to which particular data is correct. The database system has to portray real-world information. For instance, if a student scores 90% in a Mathematics test and the system records this score as 60%, that information is not accurate (Cappiello, Sama, and Vitali, 2018). In such a case, the system should record the 90% the student actually scored.

ii. Completeness – Completeness is the aspect of data veracity that ensures the data is comprehensive, meaning all required values are present. Optional data is not critical here. For example, if a system requires a user to supply a first name and last name, the middle name is optional, and its absence does not make the record incomplete.

iii. Consistency – The consistency aspect of data ensures that data stored in various storage facilities is the same. In the contemporary world, organizations often find it necessary to store their data in different places, such as a physical location and the cloud, among others. In order to achieve better functionality, the data has to be consistent across all the storage facilities.

iv. Timeliness – Timeliness refers to the availability of information whenever it is needed. For instance, if the data of a company is ready the moment the managers want to use it for decision making, then that data is said to be timely.

v. Validity – The validity of data is the degree to which data follows the required rules or format. An example is birthdays: a system might specify the format in which the date must be entered, and if that format is not followed, the user gets a validity error.

vi. Uniqueness – Uniqueness is the aspect that ensures information appears only once in the database system. Data duplication is a common error that database programmers meet in their daily operations. Programmers are required to be careful while entering data into the system, and they are expected to review data frequently to ensure that uniqueness is maintained (a short sketch of such checks follows this list).
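As a minimal sketch of how some of these factors can be checked in practice, the following uses pandas on a small, hypothetical set of customer records; the column names and the YYYY-MM-DD birthday format are assumptions made only for this example.

```python
import pandas as pd

# Hypothetical customer records used only to illustrate the checks.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "first_name":  ["Ann", "Ben", "Ben", None],
    "birthday":    ["1990-04-12", "12/04/1990", "12/04/1990", "1985-07-30"],
})

# Completeness: how many required values are missing?
missing_names = df["first_name"].isna().sum()

# Uniqueness: does any customer_id appear more than once?
duplicate_ids = df["customer_id"].duplicated().sum()

# Validity: do birthdays follow the required YYYY-MM-DD format?
parsed = pd.to_datetime(df["birthday"], format="%Y-%m-%d", errors="coerce")
invalid_birthdays = parsed.isna().sum()

print(f"missing names: {missing_names}, duplicate ids: {duplicate_ids}, "
      f"invalid birthdays: {invalid_birthdays}")
```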

b) Identify and explain at least eight factors that can negatively impact the veracity of data.

The following are some of the factors that can negatively affect data veracity:

i. Human error – Human beings are prone to making mistakes, and some of these mistakes can be serious. For instance, an employee of an organization can unintentionally delete essential data. Other human errors that can negatively affect data veracity include overwriting crucial files and otherwise contributing to data loss (Cappiello, Sama, and Vitali, 2018).

ii. Malware and viruses – Viruses and malware are among the most common causes of data loss in any organization. They are capable of deleting and stealing organizational data files. If data is lost, the employees of the company will find it hard to carry out the various functions expected of them.

iii. Damage to hard drives – Hard drive malfunction is a major cause of data loss. The hard drive is one of the most fragile parts of a computer system, and it is the part responsible for data storage. Statistically, around a hundred and forty hard drives crash each week, so many businesses are brought to a standstill every week.

iv. Power outages – When there is a power outage, a business is brought to a halt for some time. In the process, software systems that were running are shut down without any warning, so any unsaved data or information is lost. Moreover, the data on the computer system is at great risk of corruption because the machines were not properly shut down.

v. Computer theft – In the contemporary world, the workplace has become mobile, so employees are often required to own a laptop in order to work from any place, including their homes. When such laptops are stolen, the workers lose crucial data they might have been working on; it is worse still if the data was not stored anywhere else.

vi. Liquid damage – Spilling water or tea on a laptop can cause a short circuit. This might make the laptop unrecoverable, so the data inside it is lost.

vii. Disasters – Some disasters can cause organizational data to be lost. An example is a fire at a workplace that burns down every computing device.

viii. Software corruption – Improper shutdown of software programs can lead to data loss by corrupting or discarding any work in progress.

Question 6:

a) Explain the intrinsic value of "data" and how organizations can make use of that value.

The intrinsic value of data refers to the degree of importance of the particular information. It can be quantified through complicated financial models and calculations. Organizations can make use of data for four main purposes, which include the following:

i. Description – Data is used to describe an occurrence that has happened (Li and Snavely, 2018). For example, a blog received 50,000 viewers in a week's time.

ii. Diagnosing – Data is used to explain the reason why something happened. For example, the blog received 50,000 viewers that week because an email encouraging customers to visit had been sent to 100,000 potential customers.

iii. Predicting – Data is used to predict what will happen in the coming days. For instance, if an email is sent to 200,000 potential customers, then the blog is likely to receive 100,000 viewers in a week's time.

iv. Prescribing – Data is used to prescribe a particular action that will be taken. For example, 200,000 emails will be sent to potential customers every week.

b) Outline and discuss the concept of "data modelling." What features characterize a good data model?

Data modelling refers to the process of building data models for particular information that needs to be stored in a database system. The data models in this case are a representation of the data objects, the relationships between those objects, and the rules that govern them. Data modelling is helpful for the visual depiction of information and for implementing governmental policies, regulatory compliance, and business rules for data. The following are attributes of a good data model:

i. Consumable – The data in a data model should be easily consumed by users; in other words, users should be able to understand the data in the model with ease.

ii. Scalable – The data in the data model should not be static; the model should give users the ability to expand it (Li and Snavely, 2018).

iii. Predictable performance – A user should be able to make reliable predictions based on the data given in the data model.

iv. Adaptability – A user should be able to make changes to the data given in the data model.
