
MSc DATA SCIENCE

SECTION A

Question 1:

Draw a logical Entity-Relationship Model derived from an invoice of ZZ Trading of Bristol. Aspects such as keys, relationships, attributes, and entities have to be fully described.
The above invoice has three major entities: Customer, Order, and Product. An entity is a distinct object or concept about which data is stored, and in a relational database system each entity becomes a table. Every entity in a database system must have features that describe it (Amran, Mohammed, and Diana, 2018); these features are known as attributes. The following table shows the entities of the database system and their attributes:

Entity      Attributes
Customer    Customer ID, name, address, and order number
Order       Order ID, date, time, and product ID
Product     Product ID, name, description, price, quantity, and line total

In a database system, certain special attributes uniquely identify each entry; these are called primary keys. By convention, the primary key is listed first when describing an entity. Customer ID is the primary key of the Customer entity, Order ID is the primary key of the Order entity, and Product ID is the primary key of the Product entity. There is also another special type of key, the foreign key, which is found in any relational database system. A foreign key is an attribute in one table that references the primary key of another table in the same database system. The attributes order number and product ID, in the Customer and Order entities respectively, are the foreign keys in the given database system.

Relationships define the manner in which participating entities relate to one another in a database system. An "orders" relationship exists between the Customer and Order entities: a customer places an order with the company. Similarly, a customer purchases a product from the organization. Another aspect worth mentioning is cardinality. Cardinality refers to the number of instances of one entity that can be mapped onto instances of another entity. The cardinality found in this database system is one-to-many, where a single instance of one entity is mapped to many instances of another. For instance, a customer can place more than one order with the company.
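As an illustration, the following is a minimal sketch of the model above using SQLite through Python's sqlite3 module. The table and column names follow the entity table given earlier, while the data values are hypothetical; to express the one-to-many cardinality, the sketch places the customer ID as a foreign key in the Order table, which is the conventional way of implementing such a relationship and may differ slightly from the attribute placement shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this pragma is on

cur.executescript("""
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key
    name        TEXT NOT NULL,
    address     TEXT
);

CREATE TABLE "Order" (
    order_id    INTEGER PRIMARY KEY,      -- primary key
    order_date  TEXT,
    order_time  TEXT,
    customer_id INTEGER NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)  -- foreign key
);

CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,      -- primary key
    name        TEXT NOT NULL,
    description TEXT,
    price       REAL,
    quantity    INTEGER
);

-- one-to-many cardinality: a single customer places two orders (hypothetical values)
INSERT INTO Customer VALUES (1, 'A. Customer', 'Bristol');
INSERT INTO "Order"  VALUES (101, '2021-05-01', '09:30', 1);
INSERT INTO "Order"  VALUES (102, '2021-05-08', '14:15', 1);
""")

# List every order placed by customer 1, illustrating the one-to-many relationship.
cur.execute("""
    SELECT c.name, o.order_id, o.order_date
    FROM Customer AS c
    JOIN "Order"  AS o ON o.customer_id = c.customer_id
    WHERE c.customer_id = 1
""")
print(cur.fetchall())
```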

There are several advantages to employing logical entity-relationship models in database systems. The pros include the following:

i. ERD diagrams make concepts easier to understand. They allow database designers and engineers to gain a complete understanding of the entire system design.

ii. The visual representation provides designers with a clear view of the systems they are creating, and it serves as a guide for them during the design process.

iii. Database designers can quickly turn ERD diagrams into other types of diagrams, so ERD images offer a great deal of flexibility.

iv. ERD images help a team of database engineers create a database system more effectively (Amran, Mohammed, and Diana, 2018).

v. ERD diagrams give database designers a clear picture of the kind of systems they are creating, which serves as a guiding concept throughout the process.

vi. ERD diagrams serve as a link between customers or clients and the designers, allowing the two parties to communicate with one another.

Question 2:

The relational model of databases is based on a formal foundation using relational algebra, which is called the Relational Model of Data. It is often described as having three related aspects: the Structural aspect, the Integrity aspect, and the Manipulative aspect. Describe and explain these three aspects of the relational model.


i. Structural – The structural aspect of the relational data model describes the architecture of the database system. It deals with the manner in which a database system is made up of tables (Amran, Mohammed, and Diana, 2018). The tables are the entities of the database system, and each table must have a unique name. These tables comprise features that define them, and the features constitute the attributes of the database system. By convention, the primary key is listed as the first attribute of a table. The primary key is a special attribute that uniquely identifies every entry in a particular table. In a relational data model there is also another special key, called a foreign key. It is these keys that bring about the idea of relationships among entities in a database system. A foreign key is an attribute that references the primary key of another table in the same database system.

ii. Integrity – Integrity can be defined as the overall consistency and accuracy of information or data. Integrity ensures that data in a database system remains unchanged and well protected. There are standards and rules that ensure data achieves a high degree of integrity, and these rules and standards are usually implemented during the design phase. With regard to the integrity of data in a database system, the following are the common types of integrity:

a. Physical integrity – This type of integrity safeguards data in its wholeness as it is being retrieved or stored. Factors that can compromise physical integrity include natural disasters such as earthquakes, malicious acts by hackers, power outages, and storage erosion, among others. The purpose of this integrity is to ensure that data remains unchanged even if such calamities occur.

b. Logical integrity – This type of integrity ensures that data is not changed as it is used in various ways in a relational database system. Its purpose is to protect data from human error as well as from malicious actors such as hackers. There are four main types of logical integrity:

i. Entity integrity – This type of logical integrity depends on primary keys. Its purpose is to ensure that each entry is recorded in a table only once.

ii. Referential integrity – This kind of integrity ensures uniformity in the storage of data in a database system: a foreign key value must refer to an existing entry in the referenced table (Amran, Mohammed, and Diana, 2018).

iii. Domain integrity – This category of integrity ensures that the values stored in each domain (column) remain as accurate as possible.

iv. User-defined integrity – This kind of integrity consists of constraints and rules created by the users of a system to suit their particular needs.

iii. Manipulative – Data manipulation refers to the process of changing information with the main purpose of making it more readable and organized. The tool used for data manipulation is the Data Manipulation Language, abbreviated as DML. DML is the part of a database language, such as the INSERT, UPDATE, and DELETE statements of SQL, used to add, alter, and remove data in a database, with the major intention of making the information more useful to users (a short sketch illustrating both the integrity and manipulative aspects follows this list). The following are some of the uses of data manipulation:

a. Data manipulation ensures that data achieves a high degree of consistency. Consistent data gives users better readability and understanding of the data.

b. Data manipulation is essential for data projections. Data analysts can use historical data to project various aspects, such as future expected profits (Amran, Mohammed, and Diana, 2018).

c. Data manipulation ensures that data retains great value. The fact that operations such as deletion, addition, and updating can be carried out easily on a database system adds value to organizational data.
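To make the integrity and manipulative aspects more concrete, here is a minimal sketch using SQLite through Python's sqlite3 module. The table names, columns, and constraint values are hypothetical and chosen only for illustration; the constraints stand in for entity, referential, and domain/user-defined integrity, while the INSERT, UPDATE, and DELETE statements are the core DML operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # enable referential-integrity enforcement in SQLite

cur.executescript("""
CREATE TABLE Department (
    dept_id INTEGER PRIMARY KEY,                 -- entity integrity: a unique, non-null key
    name    TEXT NOT NULL UNIQUE
);

CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,                 -- entity integrity
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary >= 0),            -- domain / user-defined integrity rule
    dept_id INTEGER NOT NULL
            REFERENCES Department(dept_id)       -- referential integrity
);
""")

# Manipulative aspect: DML statements add, alter, and remove data.
cur.execute("INSERT INTO Department (dept_id, name) VALUES (1, 'Sales')")
cur.execute("INSERT INTO Employee (emp_id, name, salary, dept_id) VALUES (10, 'A. Jones', 30000, 1)")
cur.execute("UPDATE Employee SET salary = 32000 WHERE emp_id = 10")
cur.execute("DELETE FROM Employee WHERE emp_id = 10")

# A statement that violates a constraint is rejected rather than silently corrupting the data.
try:
    cur.execute("INSERT INTO Employee (emp_id, name, salary, dept_id) VALUES (11, 'B. Smith', -5, 1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # e.g. CHECK constraint failed
```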

Question 3:

The three major manipulative operators in relational algebra are the Restrict, Project, and Product operators. Describe and explain these operators.


i. Restrict – The restrict operator produces a copy of a table containing only the rows that satisfy a particular condition; rows that fail the condition are excluded.

ii. Project – The project operator produces a copy of a table with some attributes (columns) excluded.

iii. Product – The product operator combines every tuple of one table with every tuple of another table in the same database system (a sketch illustrating all three operators follows this list).
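The following is a minimal sketch of the three operators using pandas DataFrames to stand in for relations (assuming pandas 1.2+ for the cross join); the table contents are hypothetical.

```python
import pandas as pd

# Two small relations with hypothetical contents.
customer = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cara"],
    "city": ["Bristol", "Bath", "Bristol"],
})
product = pd.DataFrame({
    "product_id": [10, 20],
    "description": ["Widget", "Gadget"],
})

# Restrict (selection): keep only the rows that satisfy a condition.
bristol_customers = customer[customer["city"] == "Bristol"]

# Project: keep only some attributes, dropping duplicate rows as relational algebra requires.
cities_only = customer[["city"]].drop_duplicates()

# Product (Cartesian product): pair every customer tuple with every product tuple.
cross = customer.merge(product, how="cross")

print(bristol_customers, cities_only, cross, sep="\n\n")
```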

SECTION B

Question 5:
a) Identify and describe at least six factors that can be used to evaluate the quality and veracity of a data model.

The following are six factors that can be used to evaluate data veracity and quality:

i. Accuracy – Accuracy is the degree to which particular data is correct. The database system has to portray real-world information. For instance, if a student scores 90% in a Mathematics test and the system records this score as 60%, that information is not accurate (Cappiello, Sama, and Vitali, 2018). In such a case, the system should record the 90% the student actually scored.

ii. Completeness – Completeness is the aspect of data veracity that ensures the data is comprehensive, meaning all required values are present. Optional data is not critical here. For example, if a system requires a user to supply a first name and last name, the middle name is optional, and its absence does not make the record incomplete.

iii. Consistency – The consistency aspect of data ensures that data stored in various storage facilities is the same. In the contemporary world, organizations often find it necessary to store their data in different places, such as a physical location and the cloud, among others. In order to achieve better functionality, the data has to be consistent across all the storage facilities.

iv. Timeliness – Timeliness refers to the availability of information whenever it is needed. For instance, if the data of a company is ready the moment the managers want to use it for decision making, then that data is said to be timely.

v. Validity – The validity of data is the degree to which data follows the required rules or format. An example is birthdays: a system might specify the format in which the date must be entered, and if that format is not followed, the user gets a validity error.

vi. Uniqueness – Uniqueness is the aspect that ensures information appears only once in the database system. Data duplication is a common error that database programmers meet in their daily operations. Programmers are required to be careful while entering data into the system, and they are expected to review data frequently to ensure that uniqueness is maintained (a short sketch of such checks follows this list).
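As a minimal sketch of how some of these factors can be checked in practice, the following uses pandas on a small, hypothetical set of customer records; the column names and the YYYY-MM-DD birthday format are assumptions made only for this example.

```python
import pandas as pd

# Hypothetical customer records used only to illustrate the checks.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "first_name":  ["Ann", "Ben", "Ben", None],
    "birthday":    ["1990-04-12", "12/04/1990", "12/04/1990", "1985-07-30"],
})

# Completeness: how many required values are missing?
missing_names = df["first_name"].isna().sum()

# Uniqueness: does any customer_id appear more than once?
duplicate_ids = df["customer_id"].duplicated().sum()

# Validity: do birthdays follow the required YYYY-MM-DD format?
parsed = pd.to_datetime(df["birthday"], format="%Y-%m-%d", errors="coerce")
invalid_birthdays = parsed.isna().sum()

print(f"missing names: {missing_names}, duplicate ids: {duplicate_ids}, "
      f"invalid birthdays: {invalid_birthdays}")
```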

b) Identify and explain at least eight factors that can negatively impact the veracity of data.

The following are some of the factors that can negatively affect data veracity:

i. Human error – Human beings are prone to making mistakes, and some of these mistakes can be serious. For instance, an employee of an organization can unintentionally delete essential data. Other human errors that can negatively affect data veracity include overwriting crucial files and otherwise contributing to data loss (Cappiello, Sama, and Vitali, 2018).

ii. Malware and viruses – Viruses and malware are among the most common causes of data loss in any organization. They are capable of deleting and stealing organizational data files. If data is lost, the employees of the company will find it hard to carry out the various functions expected of them.

iii. Damage to hard drives – Hard drive malfunction is a major cause of data loss. The hard drive is one of the most fragile parts of a computer system, and it is the part responsible for data storage. Statistically, around a hundred and forty hard drives crash each week, so many businesses are brought to a standstill every week.

iv. Power outages – When there is a power outage, a business is brought to a halt for some time. In the process, software systems that were running are shut down without any warning, so any unsaved data or information is lost. Moreover, the data on the computer system is at great risk of corruption because the machines were not properly shut down.

v. Computer theft – In the contemporary world, the workplace has become mobile, so employees are often required to own a laptop in order to work from any place, including their homes. When such laptops are stolen, the workers lose crucial data they might have been working on; it is worse still if the data was not stored anywhere else.

vi. Liquid damage – Spilling water or tea on a laptop can cause a short circuit. This might make the laptop unrecoverable, so the data inside it is lost.

vii. Disasters – Some disasters can cause organizational data to be lost. An example is a fire at a workplace that burns down every computing device.

viii. Software corruption – Improper shutdown of software programs can lead to data loss by corrupting or discarding any work in progress.

Question 6:

a) Explain the intrinsic value of "data" and how organizations can make use of that value.

The intrinsic value of data refers to the degree of importance of the particular information. It can be quantified through complicated financial models and calculations. Organizations can make use of data for four main purposes, which include the following:

i. Description – Data is used to describe an occurrence that has happened (Li and Snavely, 2018). For example, a blog received 50,000 viewers in a week's time.

ii. Diagnosing – Data is used to explain the reason why something happened. For example, the blog received 50,000 viewers that week because an email encouraging customers to visit had been sent to 100,000 potential customers.

iii. Predicting – Data is used to predict what will happen in the coming days. For instance, if an email is sent to 200,000 potential customers, then the blog is likely to receive 100,000 viewers in a week's time.

iv. Prescribing – Data is used to prescribe a particular action that will be taken. For example, 200,000 emails will be sent to potential customers every week.

b) Outline and discuss the concept of "data modelling." What features characterize a good data model?

Data modelling refers to the process of building data models for particular information that needs to be stored in a database system. The data models in this case are a representation of the data objects, the relationships between those objects, and the rules that govern them. Data modelling is helpful for the visual depiction of information and for implementing governmental policies, regulatory compliance, and business rules for data. The following are attributes of a good data model:

i. Consumable – The data in a data model should be easily consumed by users; in other words, users should be able to understand the data in the model with ease.

ii. Scalable – The data in the data model should not be static; the model should give users the ability to expand it (Li and Snavely, 2018).

iii. Predictable performance – A user should be able to make reliable predictions based on the data given in the data model.

iv. Adaptability – A user should be able to make changes to the data given in the data model.
