
Student Name: ………………………………………………………………..

AIN SHAMS UNIVERSITY


FACULTY OF ENGINEERING
Computer and Systems Engineering Department
4th Year, Computer and Systems Engineering
2nd Semester 2019/2020 Course Code: CSE412 Time Allowed: 2 Hrs.
Selected Topics in Computer Engineering (Elective Course (4))
The Exam Consists of Three Questions in Two Pages. Maximum Marks: 90 Marks
Important Instructions
• Having a switched-on mobile phone inside the examination hall is considered cheating and is punishable. If bringing a mobile phone is necessary, it must be switched off and kept in a bag.
• Earphones and Bluetooth devices are not allowed.
• No books, notes, or papers are allowed inside the examination hall, and any violation is considered cheating. Open Book exams are excepted.

ANSWER ALL QUESTIONS

Question 1: [30 Marks]


Assume we have a set of databases for a set of research institutes with a database for each institute, and
we need to build a data warehouse for this set of databases. Each database contains information about each
researcher, who can be a chief researcher, researcher, or co-researcher. Also, it contains information about
post-graduate students who work to earn Master's or Ph.D. degrees in any of the departments in the institute.
Additionally, it contains information about the employees in the institute. Each database stores several
kinds of information such as the biographical information for the employees, research staff, and students (e.g.,
name, date of birth, address ... etc.), information about the departments, and the facilities in each
department (e.g., labs, equipment … etc.). Employment history of the research staff and employees is stored
in the database as well. Moreover, each database stores information about the degrees (Master’s, PhD …
etc.) including the department that offers the degree for each post-graduate student for the corresponding
institute. Information about the research projects in each department is also stored in the database, such
as project title, start date, end date, total budget, the researchers/co-researchers involved in it, and the chief
researcher who is responsible for the project. Moreover, each database contains information about the
research publications of each researcher in the corresponding institute such as article title, place of
publishing it, date of publication, and the impact factor of the journal in which he published his article.
Additionally, each database contains the biographical information for each student, his specialization, the
research projects he currently works on, his role in each research project, the time he dedicates for each
research project, and the prior certificates. Additionally, each database contains information about the
research directions in each department in the corresponding institute and who participates in each direction
either from the research staff or from the post-graduate students.
a. Sketch the snowflake schema for the data warehouse.
b. For the developed schema, provide 1D, 2D, and 3D cuboids for the main tables in the data warehouse.
c. Write the corresponding SQL statements that use CUBE and/or ROLLUP to get the 1D, 2D, and 3D
cuboids.

Question 2: [30 Marks]


a. Assume we have four items, X, Y, U, and V, and we collected 9 transactions as follows: UV,
VX, XU, YV, XYU, XYV, YUV, XUVY, and XUV. Assume also that the minimum support is 40% and the minimum
confidence is 65%. Use the Apriori algorithm to find all association rules among X, Y, U, and V, and compute the lift
and leverage for these rules.
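The Apriori procedure asked for in part (a) can be sketched as follows. This is a minimal illustration, not a model answer: it assumes each letter in a transaction is a single item, that support is the fraction of the 9 transactions containing an itemset, and that lift and leverage use the usual definitions (confidence divided by the consequent's support, and joint support minus the product of the two supports). The function names are hypothetical.

```python
from itertools import combinations

# Transactions exactly as listed in part (a); each letter is one item.
TRANSACTIONS = [frozenset(t) for t in
                ["UV", "VX", "XU", "YV", "XYU", "XYV", "YUV", "XUVY", "XUV"]]
MIN_SUPPORT = 0.40
MIN_CONFIDENCE = 0.65


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def apriori():
    """Return {frequent itemset: support}, growing itemsets level by level."""
    items = sorted(set().union(*TRANSACTIONS))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def association_rules(frequent):
    """Return (lhs, rhs, confidence, lift, leverage) for each confident rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                confidence = sup / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    lift = confidence / frequent[rhs]
                    leverage = sup - frequent[lhs] * frequent[rhs]
                    rules.append(("".join(sorted(lhs)), "".join(sorted(rhs)),
                                  confidence, lift, leverage))
    return rules


frequent = apriori()
found_rules = association_rules(frequent)
for lhs, rhs, conf, lift, lev in sorted(found_rules):
    print(f"{lhs} -> {rhs}: conf={conf:.3f} lift={lift:.3f} leverage={lev:.3f}")
```

With a 40% threshold over 9 transactions, an itemset needs to appear in at least 4 transactions to be frequent, which is why the sketch compares fractions rather than rounding a count.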

b. Consider the dataset given in the following table, which describes a set of purchase orders in the information
system of an online store.

Purchase Order ID | Number of Items | Unit Price | Total Price
79356             | 17              | 10         | 170
52107             | 93              | 33         | 3069
97825             | 15              | 42         | 630
81459             | 931             | 11         | 9543
82673             | 85              | 68         | 5780
59424             | 55              | 91         | 5005

i. Find all outliers (if any) in field “Number of Items”.
ii. Normalize field “Unit Price” using decimal scaling z-score method.
iii. Identify all noisy data in the field “Total Price”.
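The three checks in part (b) can be sketched in a few lines. The choices below are assumptions rather than the examiners' intended method: outliers are flagged with the 1.5 × IQR rule using the default quartile method of Python's `statistics.quantiles`, "Unit Price" is normalized both by decimal scaling and by z-score (using the population standard deviation) because the wording of item ii names both, and a "Total Price" value is treated as noisy when it differs from Number of Items × Unit Price. All function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles

# Rows of the purchase-order table:
# (order id, number of items, unit price, total price).
ORDERS = [
    (79356, 17, 10, 170),
    (52107, 93, 33, 3069),
    (97825, 15, 42, 630),
    (81459, 931, 11, 9543),
    (82673, 85, 68, 5780),
    (59424, 55, 91, 5005),
]


def iqr_outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |v| below 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def z_scores(values):
    """Standardize with the mean and the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def inconsistent_totals(orders):
    """Order IDs whose total price is not items * unit price."""
    return [oid for oid, n, price, total in orders if n * price != total]


item_counts = [row[1] for row in ORDERS]
unit_prices = [row[2] for row in ORDERS]

outliers = iqr_outliers(item_counts)
scaled = decimal_scaling(unit_prices)
standardized = z_scores(unit_prices)
noisy_orders = inconsistent_totals(ORDERS)

print("Outliers in Number of Items:", outliers)
print("Decimal-scaled Unit Price:", scaled)
print("Z-scored Unit Price:", [round(z, 3) for z in standardized])
print("Orders with inconsistent Total Price:", noisy_orders)
```

With only six rows, the quartile values depend on the interpolation method, so a different quartile convention gives different fences; the flagged value should still be checked by hand.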

Question 3: [30 Marks]


a. The following table shows a set of examples to be used by a classification system for a car insurance
company. The attributes used by this system are age, car status, housing type, and education, while the
corresponding classification is the car insurance category.
Age           | Car Status       | Housing Type | Education      | Car Insurance Category
≥ 20 and < 40 | New              | Free Housing | College Degree | Medium
≥ 40 and < 60 | New              | Free Housing | High School    | High
≥ 60          | Crashed before   | Free Housing | College Degree | Medium
≥ 60          | Old but no crash | Renter       | High School    | Medium
≥ 40 and < 60 | Old but no crash | Renting      | College Degree | Small
≥ 20 and < 40 | Crashed before   | Owner        | College Degree | Small
≥ 20 and < 40 | Old but no crash | Renting      | High School    | Medium
≥ 60          | Crashed before   | Owner        | High School    | Medium
≥ 20 and < 40 | Crashed before   | Owner        | High School    | Medium
≥ 40 and < 60 | Crashed before   | Free Housing | Doctorate      | High
≥ 40 and < 60 | New              | Renter       | College Degree | Small
≥ 60          | Crashed before   | Free Housing | High School    | High
a. Compute the Entropy for the given dataset.
b. Construct the decision tree for this dataset.
c. Design a MapReduce algorithm to compute the decision tree.
d. Suppose a 30-year-old client holds a college degree, has a car that crashed before, and lives in free
housing. Find the car insurance category to which this client is entitled when using each of the following
methods, then compare the results you get from each method:
i. Naïve Bayesian classifier method
ii. 1NN approach
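The entropy of the dataset, and the information gain that an ID3-style decision tree would use to pick its root attribute, can be sketched as below. Several things here are assumptions: the age bands are encoded with the hypothetical labels "20-39", "40-59", and "60+", the housing values "Renter" and "Renting" are treated as one value, and information gain is used as the splitting criterion, which the question does not prescribe.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("age", "car_status", "housing", "education")

# One tuple per table row; the last field is the car insurance category.
# "Renter"/"Renting" are merged into "Renting" (assumption).
ROWS = [
    ("20-39", "New", "Free Housing", "College Degree", "Medium"),
    ("40-59", "New", "Free Housing", "High School", "High"),
    ("60+", "Crashed before", "Free Housing", "College Degree", "Medium"),
    ("60+", "Old but no crash", "Renting", "High School", "Medium"),
    ("40-59", "Old but no crash", "Renting", "College Degree", "Small"),
    ("20-39", "Crashed before", "Owner", "College Degree", "Small"),
    ("20-39", "Old but no crash", "Renting", "High School", "Medium"),
    ("60+", "Crashed before", "Owner", "High School", "Medium"),
    ("20-39", "Crashed before", "Owner", "High School", "Medium"),
    ("40-59", "Crashed before", "Free Housing", "Doctorate", "High"),
    ("40-59", "New", "Renting", "College Degree", "Small"),
    ("60+", "Crashed before", "Free Housing", "High School", "High"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, index):
    """Entropy reduction obtained by splitting `rows` on column `index`."""
    base = entropy([row[-1] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


dataset_entropy = entropy([row[-1] for row in ROWS])
gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
root_attribute = max(gains, key=gains.get)

print(f"Entropy of the dataset: {dataset_entropy:.4f} bits")
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
print("Attribute with the highest gain:", root_attribute)
```

Applying `information_gain` recursively to each subset produced by the chosen attribute gives the remaining levels of the tree; the gains for age and education are close, so the choice of root is sensitive to the encoding assumptions above.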

b. Consider the dataset shown in the given table. Each sample consists of three attributes X, Y, and Z.
Assume we need to group the given dataset into three clusters. Use K-means method to find the
Centroid of each cluster and the samples belonging to each cluster assuming samples 1, 3, and 7 are the
initial Centroids.

Sample | X  | Y  | Z
1      | 11 | 33 | 1
2      | 49 | 78 | 27
3      | 25 | 13 | 10
4      | 14 | 29 | 4
5      | 28 | 11 | 11
6      | 9  | 37 | 6
7      | 58 | 97 | 33
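The K-means iteration for part (b) can be sketched as follows, assuming Euclidean distance (the question does not name a distance measure) and stopping when the assignments no longer change. The function name `kmeans` and its parameters are hypothetical.

```python
from math import dist

# Samples from the table: sample number -> (X, Y, Z).
SAMPLES = {
    1: (11, 33, 1),
    2: (49, 78, 27),
    3: (25, 13, 10),
    4: (14, 29, 4),
    5: (28, 11, 11),
    6: (9, 37, 6),
    7: (58, 97, 33),
}
INITIAL_IDS = [1, 3, 7]  # samples used as the initial centroids


def kmeans(samples, initial_centroids, max_iter=100):
    """Plain K-means; returns (centroids, clusters as lists of sample ids)."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_assignment = {
            sid: min(range(len(centroids)),
                     key=lambda c: dist(point, centroids[c]))
            for sid, point in samples.items()
        }
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [samples[s] for s, a in assignment.items() if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    clusters = [[s for s, a in sorted(assignment.items()) if a == c]
                for c in range(len(centroids))]
    return centroids, clusters


centroids, clusters = kmeans(SAMPLES, [SAMPLES[i] for i in INITIAL_IDS])
for centroid, members in zip(centroids, clusters):
    print("centroid:", tuple(round(v, 3) for v in centroid), "samples:", members)
```

Each returned cluster is listed in the same order as the initial centroids, so the first cluster is the one seeded by sample 1.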

Wishing you the best of luck…


Examination Committee:
Dr. Ashraf Salem, Dr. Gamal A. Ebrahim, and Dr. Mahmoud I. Khalil
Exam Date: Aug. 04, 2020
