BInDM Demo


Business Intelligence (bussinNM)

Final Exam (VERSION) YEAR-SEMESTER


Time allowed: 75 minutes

Please note that the purpose here is to give you an idea about the level of detail of the questions on the exam.
These sample questions are not meant to be exhaustive and you may certainly find topics on the exam that are
not covered here at all.
Important notes:
Use this document as it is; do not make a copy of it. It contains special personalized watermarks.
Enter your answers in the marked placeholders only. Do not modify other parts of the document.
Before uploading, export this document to Portable Document Format (PDF) and submit that PDF file.

INTRODUCTION

This first question must be answered for you to get credit for this exam.

I certify that I am taking this exam independently and that I have neither received nor given unauthorized help
on this exam, which would be a violation of the Academic Integrity Policy and subject to the penalties described
in the syllabus and the program handbook. By entering my first timestamp below I affirm that I am an honest
student who completed this exam with integrity.

This exam will be closed according to the PTE Moodle server's time, and it cannot be extended or reopened. In
order to compare the server time with your local clock, please visit the ➥ exam timestamp maker link. Enter
your Neptun code into the input box and press the button. Copy the generated line to the clipboard and paste
it here as your answer.

The timestamp format is alias@yyyymmddThhmmssTZ::checksum, where the alias is your anonymous
name for this exam, followed by the current time in the server's local timezone (TZ). Compare this time with
the time on your local device as a check, and remember your alias. Results will be published using this
randomly generated anonymous name.
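
The format described above can be illustrated with a short Python sketch that splits a timestamp line into its parts. The regular expression, the function name parse_exam_timestamp and the sample line are assumptions made for illustration only; the character sets allowed in the alias, timezone and checksum are guessed from the description, and the checksum is not verified because its algorithm is not stated.

```python
import re
from datetime import datetime

# Assumed pattern for alias@yyyymmddThhmmssTZ::checksum.
# The allowed characters for alias, TZ and checksum are guesses.
TIMESTAMP_RE = re.compile(
    r"^(?P<alias>[A-Za-z0-9]+)@"
    r"(?P<time>\d{8}T\d{6})"
    r"(?P<tz>[A-Za-z]+)::"
    r"(?P<checksum>\S+)$"
)


def parse_exam_timestamp(line):
    """Split a timestamp line into alias, local time, timezone and checksum."""
    match = TIMESTAMP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid timestamp line: {line!r}")
    return {
        "alias": match["alias"],
        # Naive datetime: the timezone abbreviation is kept separately.
        "time": datetime.strptime(match["time"], "%Y%m%dT%H%M%S"),
        "tz": match["tz"],
        "checksum": match["checksum"],  # returned as-is, not validated
    }


# Hypothetical sample line, not a real exam timestamp.
parsed = parse_exam_timestamp("demo@20220101T120000CET::abc123")
print(parsed["alias"], parsed["time"].isoformat(), parsed["tz"])
```

Comparing the parsed time with the local clock then only requires reading the time field next to the timezone abbreviation.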

Before you read on, paste your first timestamp here:

chulex@20220508T170347CEST::fbFzUCyD8UBKLv4q7MmAonQ9wUnt

Do not close the timestamp maker tab or window. Whenever you want to know the exact time according to the
server, just press the button again. If you are asked to generate a new, current timestamp, do the same: copy
the whole new timestamp line and paste it where it is required. You will get several aliases, but you need to
remember only this very first one.

QUESTIONS

Q1 [1 point for 1 minute]


This kind of database integrity constraint is a rule that validates the data as being of the appropriate range
and type.
(Write only one or a few words.)

domain integrity

Q2 [1 point for 1 minute]
What is an entity called if it depends on another entity for its existence (e.g., users and passwords)?
If a user is removed, then the dependent data must also be removed.
(Write only one or a few words.)

weak entity

Q3 [1 point for 1 minute]
Any use of inaccurate or corrupted data in an analysis is known by this four-letter abbreviation.
(Write only one or a few words.)

GIGO (garbage in, garbage out)

Q4 [1 point for 1 minute]
What kind of tools are used to analyse data cubes for historical reporting and predictive data mining purposes?
(Write only one or a few words.)

OLAP (Online Analytical Processing) Tools


Q5 [1 point for 1 minute]
What is the modern database approach called that provides flexible schemas for the storage and
retrieval of data beyond the traditional table structures found in relational databases?
(Write only one or a few words.)

NoSQL

Q6 [1 point for 1 minute]
Name at least two principles of effective visualization and give a counterexample for each of them.
(Write one or a few sentences.)

Q7 [1 point for 1 minute]


Select two of the key elements of the typical data warehouse architecture and describe how they relate to
data cubes.
(Write one or a few sentences.)

Q8 [1 point for 1 minute]


What are binary splitting and multiway splitting?
(Write one or a few sentences.)

Q9 [1 point for 1 minute]


Explain how a text may present either a clear and consistent sentiment or a mixed sentiment.
(Write one or a few sentences.)

Q10 [1 point for 1 minute]
Name two distances which can be used in clustering algorithms and define how they are calculated.
(Write one or a few sentences.)
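
As an illustration of the level of detail expected, the following Python sketch computes two commonly used distances. Choosing the Euclidean and Manhattan distances is an assumption; other distances would answer the question equally well, and the function names and sample points are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical sample points with two features each.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean_distance(p, q))  # 5.0
print(manhattan_distance(p, q))  # 7.0
```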

Q11 [3 points for 9 minutes]


Under what conditions can a linear regression method be used for modeling? What kind of specific data
preparation is needed? How should the model results be interpreted and used?
(List the main input-output requirements of the method, and not the details of the algorithm.)

Q12 [3 points for 9 minutes]


Compare and contrast the situations in which SOM-type and k-means-type clustering methods are best used.
(Write a structured, parallel comparison.)

Q13 [3 points for 9 minutes]


Describe the steps required to build a Self-Organizing Map (SOM) model in an enumerated list or pseudo-code
format. Also add an explanatory sentence to each item on your list.
(Write a list with comments.)

Q14 [4 points for 9 minutes]
Why was CRISP-DM designed to be a cyclical and iterative process? Select one of the backward arrows on its
diagram and explain when it should be followed.
(Put concepts into context, give examples.)

Q15 [4 points for 9 minutes]


Here are a few comments from Twitter about COVID-19:
 Just saw the Coronavirus referred to as Captain Trumps.
 New: Captain Crozier has tested positive for Coronavirus.
 NEW: CA has 12,026 confirmed positive cases of #COVID19.
 2,300 of those who have tested positive are in our hospitals.
 Stay home - take this seriously.
 Everyone must stay home on Sunday.
Create a term-document matrix (TDM) with no more than six key terms, treating each comment as a document.
(Draw the table, and explain how to interpret it.)
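
One possible way to build such a table is sketched below in Python. The six key terms and the simple lowercase, letters-only tokenization are assumptions chosen for illustration; a different choice of terms is equally acceptable. Each row is a term, each column is a comment (D1 to D6), and each cell counts how often the term occurs in that comment.

```python
import re

documents = [
    "Just saw the Coronavirus referred to as Captain Trumps.",
    "New: Captain Crozier has tested positive for Coronavirus.",
    "NEW: CA has 12,026 confirmed positive cases of #COVID19.",
    "2,300 of those who have tested positive are in our hospitals.",
    "Stay home - take this seriously.",
    "Everyone must stay home on Sunday.",
]

# Assumed key terms; picking them is part of the exercise.
terms = ["coronavirus", "captain", "positive", "tested", "stay", "home"]


def term_document_matrix(docs, vocabulary):
    """Return {term: [count in D1, count in D2, ...]}."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    return {
        term: [tokens.count(term) for tokens in tokenized]
        for term in vocabulary
    }


tdm = term_document_matrix(documents, terms)

# Print the matrix as a plain-text table.
print("term".ljust(12) + " ".join(f"D{i}" for i in range(1, len(documents) + 1)))
for term, counts in tdm.items():
    print(term.ljust(12) + " ".join(str(c).rjust(2) for c in counts))
```

Rows with similar patterns (for example "stay" and "home") indicate terms that occur together, and columns with similar patterns indicate comments about the same topic.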

Q16 [4 points for 10 minutes]


An email SPAM detection system was tested on several messages in order to determine its quality. In SPAM
detection, unsolicited messages are considered the positive class even though they are viewed very
negatively. After the experiment, the elements of the confusion matrix are the following:
there were 10 true positive, 4 false positive, 5 false negative and 3 true negative cases.
Calculate the following numbers:
 Step 1: The accuracy of the SPAM detection system.
 Step 2: The precision of the SPAM class.
 Step 3: The recall of the non-SPAM class.
Calculations and explanations are needed for the answer!
(Both decimal and percentage forms are accepted if they are accurate to 4 or 2 decimal places, respectively.)
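
The arithmetic behind the three steps can be checked with a short Python sketch. It uses the standard definitions and assumes that the recall of the non-SPAM class means the share of actual non-SPAM messages that were correctly identified, that is TN / (TN + FP); the variable names are illustrative.

```python
# Confusion matrix counts from the question (SPAM is the positive class).
tp, fp, fn, tn = 10, 4, 5, 3

total = tp + fp + fn + tn            # 22 messages in total

accuracy = (tp + tn) / total         # correct decisions / all decisions
precision_spam = tp / (tp + fp)      # true SPAM among messages flagged as SPAM
recall_non_spam = tn / (tn + fp)     # correctly passed non-SPAM among all non-SPAM

print(f"Accuracy:            {accuracy:.4f}")         # 0.5909
print(f"Precision (SPAM):    {precision_spam:.4f}")   # 0.7143
print(f"Recall (non-SPAM):   {recall_non_spam:.4f}")  # 0.4286
```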
Q17 [4 points for 10 minutes]
A business scenario in which data mining might be applied is described below. Follow the steps of CRISP-DM
and give an example task for each step in this scenario. Indicate what kind of data mining technique or machine
learning method you would apply, and explain how and why.
For the purpose of enhancing direct sales transaction revenue and profit, the Analytics team at HP was asked to
execute a cross-sell/up-sell project for the Small and Medium Business (SMB) store’s online site and call
center. Cross-selling includes, among others, adding a monitor, docking station, or a digital camera to a
notebook purchase. An up-sell is loosely defined as “inside the box.” For example, up-selling would include
adding anything that enhances value of a PC, such as upgraded memory, hard drive, or a DVD drive. The
pilot’s overall goal was to increase the revenue and margin of the store by increasing average order value
(AOV) and attach rate per product by implementing an analytic solution.
Answer: (Outline and explain how you would handle the problem in this scenario.)

OUTRODUCTION

As a final step, visit the ➥ exam timestamp maker link again, enter your Neptun code into the input box and
press the button. Copy the generated line to the clipboard and paste it here as your answer.

Right before uploading this PDF file, paste your last timestamp here:

xukes@20220508T173404CEST::ZIN94LPoHZMnJXSRCeZZFxSfsGez
The first timestamp will be compared to the opening time, and the last one to the closing time of the exam.

Thank you for your hard and honest work! If you have found errors in the questions or have any comments
on the exam, please feel free to write them down here:
