Data Mining and Warehousing

Individual Assignment
Algorithm and Mathematics of Data Mining
Submission Deadline: June 24, 2022
Note: Only handwritten answers are accepted.

Introduction

This assignment is designed to let you assess how much you know about the algorithms and
mathematics of clustering, classification and association rule mining. You are therefore asked to show
the details (including the intermediate steps) for each of the following questions.

Task 1: Clustering

1) Use the k-means clustering algorithm with Euclidean distance to cluster the following 8 points
into 3 clusters:

Point  X1  X2
P1     10   2
P2      5   2
P3      4   8
P4      8   5
P5      5   7
P6      4   6
P7      2   1
P8      9   4

Assume that points P1, P4 and P7 are initially selected as the cluster centers. The stopping
criterion is that there is no further movement (the cluster assignments and centers no longer change).
a) Show all the necessary steps until convergence.
b) Show the grouping after convergence.
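
A handwritten trace for Question 1 can be cross-checked with a short script. The following is a minimal sketch, not a prescribed solution: it assumes the standard assign-then-update form of k-means, takes "no movement" to mean that no point changes cluster, and uses names (`kmeans`, `history`) that are chosen here for illustration only.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# The eight points from the table in Question 1.
points = {
    "P1": (10, 2), "P2": (5, 2), "P3": (4, 8), "P4": (8, 5),
    "P5": (5, 7), "P6": (4, 6), "P7": (2, 1), "P8": (9, 4),
}


def kmeans(points, centers):
    """Assign/update k-means; stops when no point changes cluster."""
    assignment = None
    history = []  # one (assignment, centers) snapshot per iteration
    while True:
        # Assignment step: each point joins its nearest center.
        new_assignment = {
            name: min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            return assignment, centers, history
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for i in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        centers = new_centers
        history.append((dict(assignment), list(centers)))


initial = [points["P1"], points["P4"], points["P7"]]
assignment, centers, history = kmeans(points, initial)
clusters = [sorted(n for n, c in assignment.items() if c == i) for i in range(3)]

for step, (_, ctrs) in enumerate(history, start=1):
    print(f"iteration {step}: centers = {ctrs}")
print("final clusters:", clusters)
```

Printing the centers after every iteration makes it easy to compare each handwritten step against the script rather than only the final grouping.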

2) Apply agglomerative clustering to group the data in the table above (Question 1) and
show the resulting dendrogram.

3) Plot all the data points from Question 1 on a 10×10 grid. Then, if the radius is 2 and the
minimum number of points is 2, which clusters would DBSCAN discover? Show your
answer by circling all points that belong to the same cluster on the 10×10 grid that you
created.
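
The DBSCAN question can likewise be checked programmatically. The sketch below makes two assumptions that the question does not state: a point counts as its own neighbour when comparing against the minimum number of points, and a neighbour at distance exactly 2 is inside the radius. If the lecture uses a different convention, the result may differ. The function names are illustrative.

```python
from collections import deque

# The eight points from the table in Question 1.
points = {
    "P1": (10, 2), "P2": (5, 2), "P3": (4, 8), "P4": (8, 5),
    "P5": (5, 7), "P6": (4, 6), "P7": (2, 1), "P8": (9, 4),
}
EPS = 2      # radius
MIN_PTS = 2  # minimum neighbourhood size (assumed to include the point itself)


def neighbours(name):
    """All points within EPS of `name`, including `name` itself."""
    x, y = points[name]
    # Squared distances keep the comparison exact for integer coordinates.
    return [m for m, (u, v) in points.items()
            if (x - u) ** 2 + (y - v) ** 2 <= EPS ** 2]


def dbscan():
    labels = {}  # name -> cluster id, or None for noise
    cluster_id = 0
    for name in points:
        if name in labels:
            continue
        if len(neighbours(name)) < MIN_PTS:
            labels[name] = None  # noise for now; may become a border point
            continue
        cluster_id += 1
        labels[name] = cluster_id
        queue = deque(neighbours(name))
        while queue:
            q = queue.popleft()
            if labels.get(q) is None:  # unvisited, or previously marked noise
                unvisited = q not in labels
                labels[q] = cluster_id
                # Only unvisited core points expand the cluster further.
                if unvisited and len(neighbours(q)) >= MIN_PTS:
                    queue.extend(neighbours(q))
    return labels


labels = dbscan()
noise = sorted(n for n, c in labels.items() if c is None)
found = {}
for n, c in labels.items():
    if c is not None:
        found.setdefault(c, []).append(n)
found = {c: sorted(members) for c, members in found.items()}
print("clusters:", found)
print("noise:", noise)
```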

SET BY GADISA OLANI (PHD), JUNE 2022 1


Task 2: Association Rule Mining

1) Consider two association rules:

Rule 1: B → C
Rule 2: C → D

Suppose that both rules satisfy the minimum support and minimum confidence requirements.
Will the rule B → D also satisfy the minimum confidence and minimum support
requirements? If your answer is yes, prove it; otherwise, give a counterexample.

2) Consider the confidence scores of the following rules:

C1 = Confidence(B → C)
C2 = Confidence(B → CD)
C3 = Confidence(BD → C)

If C1 = 0.6, determine the possible values of C2 and C3.
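
As a reminder of the definitions this question relies on (the standard ones, assumed here to match those used in the lecture), confidence is a ratio of supports, and adding items to an itemset can never increase its support:

```latex
\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},
\qquad
\mathrm{supp}(B \cup C \cup D) \le \mathrm{supp}(B \cup C)
```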

3) Given the transactions in the table below, show the results of applying the Apriori algorithm with
support threshold s = 33.34% and confidence threshold c = 60%. Enumerate all the final frequent
itemsets, following the example provided in the lecture.

Transaction ID  Items
T1              H, B, K
T2              H, B
T3              H, C, D
T4              D, C
T5              D, K
T6              H, C, D
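
The level-wise search that Apriori performs on these transactions can be sketched as follows. This is an illustrative outline rather than the lecture's exact procedure: it assumes support is the fraction of transactions containing an itemset and that a threshold is met when the value is greater than or equal to it. Under that reading, 2 of 6 transactions (about 33.33%) falls just short of 33.34%, so an itemset needs at least 3 of the 6 transactions. The function names are chosen here for illustration.

```python
from itertools import combinations

# Transactions from the table in Question 3.
transactions = {
    "T1": {"H", "B", "K"}, "T2": {"H", "B"}, "T3": {"H", "C", "D"},
    "T4": {"D", "C"}, "T5": {"D", "K"}, "T6": {"H", "C", "D"},
}
MIN_SUPPORT = 0.3334
MIN_CONFIDENCE = 0.60


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(itemset <= t for t in transactions.values())
    return hits / len(transactions)


def apriori():
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        previous = set(level)
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def rules(frequent):
    """Rules lhs -> rhs drawn from frequent itemsets that meet MIN_CONFIDENCE."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    found.append((set(lhs), set(itemset - lhs), confidence))
    return found


frequent = apriori()
strong_rules = rules(frequent)
for itemset, supp in frequent.items():
    print(sorted(itemset), round(supp, 4))
for lhs, rhs, confidence in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```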

4) Construct the frequent pattern (FP) tree for the transaction data given in Question 3.



Task 3: Classification

The goal of this task is to classify the data using a decision tree classifier.

a) Given the training data below, construct the rule set or decision tree classifier. Draw the flow chart
of the resulting rule at the end.

Training data
Point  x1  x2  y
P1      1   0  0
P2      1   1  0
P3      0   0  1
P4      0   0  1
P5      1   1  0
P6      1   1  0
P7      1   0  0

b) Using the rule that you extracted, predict the class label of P9 (0, 1).

Point  x1  x2  y
P9      0   1  ?
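
The attribute selection behind the tree can be checked with a few lines of code. The assignment does not name a split criterion, so this sketch assumes ID3-style information gain (entropy reduction). The helper names are illustrative.

```python
from collections import Counter
from math import log2

# Training rows from the table above, as (x1, x2, y).
training = [
    (1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1),
    (1, 1, 0), (1, 1, 0), (1, 0, 0),
]
ATTRIBUTES = ["x1", "x2"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy of y minus the weighted entropy after splitting on column `attr`."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: information_gain(training, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
best_index = ATTRIBUTES.index(best)

# One-level rule: majority class for each value of the chosen attribute.
rule = {
    value: Counter(r[-1] for r in training if r[best_index] == value).most_common(1)[0][0]
    for value in {r[best_index] for r in training}
}


def predict(x1, x2):
    """Apply the one-level rule to a new point."""
    return rule[(x1, x2)[best_index]]


print("gains:", {k: round(v, 4) for k, v in gains.items()})
print("split on:", best, "rule:", rule)
print("P9 ->", predict(0, 1))
```

A single split is enough here because both branches of the chosen attribute are pure in the training data; on data where they are not, the same gain computation would be repeated on each branch.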



Task 4: General Questions

1) When is it necessary to use data mining?
2) What is the key weakness of DBSCAN clustering? Suggest a potential strategy to address this
weakness.
3) Compared to other classification models, why are Random Forest and SVM so successful? Explain
your answer scientifically.
4) What is the elbow method? Explain how it can be used to find the optimal number of clusters.
