
Data Mining(21CS63)

UNIT 2: PRACTICE QUESTIONS


Course Coordinator: Sandhya B R (Sec A)

1. Consider the training examples shown in Table 4a for a binary classification problem.
a) What is the entropy of this collection of training examples with respect to the
positive class?
b) What are the information gains of A1 and A2 relative to these training examples?
c) What is the best split (between A1 and A2) according to the Gini index?

Table 4a
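For questions like 1(a) and 1(b), the entropy and information-gain computations can be checked with a short helper. This is a minimal sketch with generic class-count inputs (the actual counts come from Table 4a, which must be read off separately); the function names are illustrative, not from the question sheet.

```python
import math

def entropy(counts):
    """Entropy (base 2) of a class-count distribution, e.g. [positives, negatives]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Gain = entropy(parent) - weighted average of child-node entropies."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted
```

For example, a node with 5 positive and 5 negative examples has entropy 1.0, and a split producing the pure children [5, 0] and [0, 5] has an information gain of 1.0.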

2. Consider the training examples shown in Table 4.1 for a binary classification problem.

(a) Compute the Gini index for the overall collection of training examples.
(b) Compute the Gini index for the Customer ID attribute.
(c) Compute the Gini index for the Gender attribute.

(d) Compute the Gini index for the Car Type attribute using multiway split.
(e) Compute the Gini index for the Shirt Size attribute using multiway split.
(f) Which attribute is better, Gender, Car Type, or Shirt Size?
(g) Explain why Customer ID should not be used as the attribute test condition even
though it has the lowest Gini.
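Parts (a)-(e) all reduce to the Gini index of a class-count distribution and its weighted average over the partitions of a (possibly multiway) split. A minimal sketch, with the per-partition class counts for Table 4.1 to be filled in by the reader:

```python
def gini(counts):
    """Gini index of a class-count distribution, e.g. [c0_count, c1_count]."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(child_counts_list):
    """Weighted Gini of a split; one count list per partition (works for multiway splits)."""
    total = sum(sum(child) for child in child_counts_list)
    return sum(sum(child) / total * gini(child) for child in child_counts_list)
```

A perfectly balanced node such as [10, 10] has Gini 0.5, and a split into pure partitions such as [[10, 0], [0, 10]] has weighted Gini 0.0 — the pattern behind part (g): Customer ID produces all-pure singleton partitions and hence the lowest possible Gini, yet generalizes to nothing.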

3. Consider the following data set for a binary class problem:

a) Calculate the information gain when splitting on A and B. Which attribute would the
decision tree induction algorithm choose?
b) Calculate the gain in the Gini index when splitting on A and B. Which attribute would
the decision tree induction algorithm choose?
4. Consider a training set that contains 100 positive examples and 400 negative
examples. For each of the following candidate rules,

R1: A → + (covers 4 positive and 1 negative example),
R2: B → + (covers 30 positive and 10 negative examples),
R3: C → + (covers 100 positive and 90 negative examples),
determine which is the best and worst candidate rule according to:
(a) Rule accuracy.
(b) FOIL’s information gain.
(c) The likelihood ratio statistic.
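The three metrics in question 4 can be computed directly from the coverage counts given above. A sketch (base-2 logarithms throughout; some texts use natural logs for the likelihood ratio statistic, which only rescales it):

```python
import math

P, N = 100, 400  # positive / negative examples in the training set

# (positives covered, negatives covered) for each candidate rule
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

def accuracy(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)

def foil_gain(p, n):
    """FOIL's information gain relative to the empty rule, which covers P and N."""
    return p * (math.log2(p / (p + n)) - math.log2(P / (P + N)))

def likelihood_ratio(p, n):
    """2 * sum of f_i * log2(f_i / e_i), with expected counts from the class priors."""
    covered = p + n
    e_pos = covered * P / (P + N)
    e_neg = covered * N / (P + N)
    stat = 0.0
    if p > 0:
        stat += p * math.log2(p / e_pos)
    if n > 0:
        stat += n * math.log2(n / e_neg)
    return 2 * stat

for name, (p, n) in rules.items():
    print(name, accuracy(p, n), foil_gain(p, n), likelihood_ratio(p, n))
```

For R1, for instance, accuracy is 4/5 = 0.8, FOIL's gain is 4 × (log2(4/5) − log2(100/500)) = 8, and the expected counts under the priors are 1 positive and 4 negatives, giving a likelihood ratio of 2 × (4·log2(4) + 1·log2(1/4)) = 12. Ranking all three rules under each metric is left as the exercise.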

5. Briefly outline the major steps of decision tree classification.
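The major steps — check the stopping conditions, select an attribute test, partition the records, and recurse on each partition (Hunt's algorithm) — can be sketched as follows. This is a simplified illustration: the attribute is chosen by position rather than by an impurity measure such as information gain or Gini, and the record format is a hypothetical (attribute-dict, label) pair.

```python
from collections import Counter

def build_tree(records, attributes):
    """Recursive decision-tree induction sketch.
    `records` is a list of (attribute_dict, class_label) pairs."""
    labels = [y for _, y in records]
    # Stopping conditions: node is pure, or no attributes remain -> majority-class leaf
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select an attribute test (placeholder: first attribute; a real inducer
    # would pick the attribute minimizing impurity)
    attr = attributes[0]
    # Partition the records on the attribute's values and recurse
    branches = {}
    for value in {x[attr] for x, _ in records}:
        subset = [(x, y) for x, y in records if x[attr] == value]
        branches[value] = build_tree(subset, attributes[1:])
    return (attr, branches)
```

On a toy set where attribute A perfectly separates the classes, the sketch returns the single-split tree ("A", {"T": "+", "F": "-"}).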


6. Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of tuples to evaluate pruning?
7. Given a decision tree, you have the option of (a) converting the decision tree to rules
and then pruning the resulting rules, or (b) pruning the decision tree and then
converting the pruned tree to rules. What advantage does (a) have over (b)?
8. Discuss the difference between rule-based and class-based ordering schemes.
9. What is the significance of the gain ratio?
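The gain ratio (used by C4.5) normalizes information gain by the split information, penalizing attributes that fragment the data into many small partitions — the Customer ID problem from question 2(g). A minimal sketch, with illustrative function names:

```python
import math

def entropy(counts):
    """Entropy (base 2) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(parent_counts, child_counts_list):
    """Gain ratio = information gain / split information (C4.5)."""
    total = sum(parent_counts)
    gain = entropy(parent_counts) - sum(
        sum(child) / total * entropy(child) for child in child_counts_list)
    # Split information: entropy of the partition *sizes*, ignoring class labels
    split_info = entropy([sum(child) for child in child_counts_list])
    return gain / split_info if split_info > 0 else 0.0
```

A pure two-way split of a balanced node gives gain 1.0 and split information 1.0, hence gain ratio 1.0; a split that fragments the data without separating the classes gets gain 0 and hence ratio 0.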
10. Discuss the working of a rule-based classifier with a suitable example.
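The working of an ordered rule-based classifier can be sketched as a decision list: rules are tried in order, the first rule whose condition matches fires, and a default class handles uncovered records. The rule set below is a hypothetical illustration (vertebrate-style rules), not taken from the question sheet.

```python
def classify(record, rules, default):
    """Ordered rule-based classifier: the first matching rule decides the class."""
    for condition, label in rules:
        if condition(record):
            return label
    return default  # default rule for records no rule covers

# Hypothetical ordered rule list for illustration
rules = [
    (lambda r: r["blood_type"] == "warm" and r["gives_birth"] == "yes", "mammal"),
    (lambda r: r["blood_type"] == "cold", "reptile"),
]
```

Here {"blood_type": "warm", "gives_birth": "yes"} fires the first rule and is classified as mammal; a warm-blooded, non-birth-giving record matches no rule and falls through to the default class.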
