Automatic Knowledge Acquisition
Contents
5.1 The Bottleneck of Knowledge Acquisition
5.2 Inductive Learning: Decision Trees
5.3 Converting Decision Trees into Rules
5.4 Generating Decision Trees: Information Gain
[Figures: case table for the weather data (attributes Outlook, Temperature, Humidity, Wind; class Play?) and candidate decision trees splitting on Humidity, Wind, and Outlook, with branch values such as high, weak, strong, overcast, and rain.]
Smaller than the expert's tree (this is an advantage: Occam's Razor). Both trees agree on the root and two branches. The best (the only?) measure of quality is prediction rate on unseen instances. Later, we will use Information Theory to obtain trees as good as this one.
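The evaluation idea above can be sketched as follows (a minimal illustration; `predict` is a hypothetical stand-in for whatever tree learner is used, not a method from the slides):

```python
# Minimal sketch: the quality of a learned tree is measured by its
# prediction rate on cases held out from training (unseen instances).
# `predict` is a hypothetical stand-in for any tree classifier.

def prediction_rate(predict, tree, unseen_cases):
    """Fraction of held-out cases the tree classifies correctly."""
    correct = sum(1 for attrs, label in unseen_cases
                  if predict(tree, attrs) == label)
    return correct / len(unseen_cases)

# Toy check with a degenerate "tree" that always answers yes.
always_yes = lambda tree, attrs: "yes"
cases = [({}, "yes"), ({}, "yes"), ({}, "no"), ({}, "yes")]
print(prediction_rate(always_yes, None, cases))  # 0.75
```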
What's wrong with this tree? Experts notice it; non-experts do not, nor would a program.
The problem is that the training data contained no cases of diabetic women on their first pregnancy who were renally insufficient. The tree did cover all observed cases, but not all possible cases! The wrong recommendation would be given in these cases.
We focus on rules for the class NO because there are fewer of them. The other class is defined using negation by default (as in Prolog).
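A minimal sketch of negation by default: only the NO rules are stored, and any case no NO rule covers is classified YES. Rule conditions here are sets of attribute names assumed present in a case; this representation is an illustration, not the slides' exact notation.

```python
# Negation by default: store rules only for class NO; everything else
# defaults to YES. A rule fires when all of its conditions hold.

NO_RULES = [
    {"renal-insuff", "high-press"},
    {"diabetes", "first-preg"},
]

def classify(case_attrs):
    """Return 'no' if any NO rule's conditions all hold, else 'yes'."""
    for conditions in NO_RULES:
        if conditions <= case_attrs:   # all conditions present in the case
            return "no"
    return "yes"                       # default: not provably NO, so YES

print(classify({"diabetes", "first-preg", "pregnant"}))  # no
print(classify({"pregnant"}))                            # yes
```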
Simplification of a Rule
Simplified rules:
pregnant is implied by this being the patient's first pregnancy, so it can be dropped. Dropping renal insufficiency actually improves the working of the rules, because those with renal insufficiency and a diabetic first pregnancy should also not be treated.
(renal insuff. & pregnant & diabetes & first-preg) -> no
(renal insuff. & high press. & pregnant & diabetes & first-preg) -> no
(renal insuff. & high press.) -> no
Gives:
(renal insuff. & diabetes & first-preg) -> no
(renal insuff. & high press. & diabetes & first-preg) -> no
(renal insuff. & high press.) -> no
Looking at predictive accuracy, we see that deleting renal insuff. from the first rule does not change the predictions.
(diabetes & first-preg) -> no
(renal insuff. & high press. & diabetes & first-preg) -> no
(renal insuff. & high press.) -> no
Now, the second rule cannot fire unless the first does. So, we can delete the second rule (it covers a subset of the cases of the first).
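This deletion can be checked mechanically: a rule is redundant when another rule with the same conclusion has a strict subset of its conditions, because the smaller rule fires whenever the larger one would. A sketch, using the three rules above as condition-set/conclusion pairs:

```python
# Drop rules subsumed by a more general rule with the same conclusion.
# Rules are (condition-set, conclusion) pairs.

rules = [
    ({"diabetes", "first-preg"}, "no"),
    ({"renal-insuff", "high-press", "diabetes", "first-preg"}, "no"),
    ({"renal-insuff", "high-press"}, "no"),
]

def drop_subsumed(rules):
    kept = []
    for conds, concl in rules:
        # Subsumed if some other rule has strictly fewer conditions
        # (a strict subset) and the same conclusion.
        subsumed = any(c2 < conds and k2 == concl
                       for c2, k2 in rules if (c2, k2) != (conds, concl))
        if not subsumed:
            kept.append((conds, concl))
    return kept

for conds, concl in drop_subsumed(rules):
    print(sorted(conds), "->", concl)   # second rule is gone
```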
As with deleting conditions from a rule, we can apply the same methods to deleting rules:
1. The logical approach: where one rule is logically implied by another, the implied rule can be dropped.
2. The statistical approach: where one rule can be dropped without worsening the predictive accuracy of the rule-set as a whole, delete the rule.
Selecting the best attribute for the root
Random-Tree selects an attribute at random for the root of the tree. This approach instead tries to select the best attribute for the root: we seek the attribute which most determines the expert's decision. ID-3 assesses each attribute in terms of how much it helps to make a decision. Using the attribute splits the cases into smaller subsets; the closer these subsets are to being purely one of the decision classes, the better. The formula used is called Information Gain.
Now, in each subset, we have more information as to what decision to make. -> Information Gain.
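A sketch of the information-gain computation ID-3 uses to pick the root: the entropy of the whole case set minus the size-weighted entropy of the subsets a split produces. The four cases below are a toy slice in the spirit of the weather example, not the slides' actual table.

```python
# Information gain for choosing a split attribute (ID-3 style).
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(cases, attr):
    """Entropy of the whole set minus the size-weighted entropy of the
    subsets produced by splitting on `attr`."""
    labels = [y for _, y in cases]
    splits = {}
    for attrs, y in cases:
        splits.setdefault(attrs[attr], []).append(y)
    remainder = sum(len(s) / len(cases) * entropy(s) for s in splits.values())
    return entropy(labels) - remainder

cases = [
    ({"humidity": "high",   "wind": "weak"},   "no"),
    ({"humidity": "high",   "wind": "strong"}, "no"),
    ({"humidity": "normal", "wind": "weak"},   "yes"),
    ({"humidity": "normal", "wind": "strong"}, "yes"),
]
# Humidity splits these cases into pure subsets, so its gain is maximal
# (1 bit); wind leaves the subsets as mixed as the whole set (0 bits).
print(info_gain(cases, "humidity"), info_gain(cases, "wind"))  # 1.0 0.0
```

The attribute with the highest gain becomes the root; ID-3 then recurses on each subset.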