Unit-8 - Chapters 18 & 27: Stuart Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, 1995
Inductive learning method
• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting (see the sketch below)
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
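To make the curve-fitting point concrete, here is a minimal sketch (hypothetical data; NumPy used for illustration): polynomials of increasing degree are fit to the same eight points, and the highest-degree one agrees with every training example, i.e., it is consistent, yet it is the one most likely to generalize poorly.

```python
import numpy as np

# Hypothetical 1-D training set: x values and noisy targets f(x).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

# Fit hypotheses h of increasing complexity (polynomial degree).
for degree in (1, 3, 7):
    coeffs = np.polyfit(x, y, degree)     # least-squares fit
    h = np.poly1d(coeffs)
    train_err = np.mean((h(x) - y) ** 2)  # error on the training set
    print(f"degree {degree}: training MSE = {train_err:.6f}")

# degree 7 passes (up to rounding) through all 8 points, so it is
# consistent with the examples, but it typically generalizes worse
# than the degree-3 fit on new x values.
```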
A set of examples X1, ..., X12 for the restaurant domain is shown in Figure 18.5. The positive examples are those where the goal WillWait is true (X1, X3, ...) and the negative examples are those where it is false (X2, X5, ...).
Patrons has the highest information gain (IG) of all the attributes and so is chosen by the DTL (decision-tree learning) algorithm as the root (a sketch of the IG computation follows).
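A sketch of the information-gain computation the DTL algorithm uses to rank attributes (a minimal illustration; the counts below are an assumption chosen to resemble a three-way split like Patrons' None/Some/Full values, not a reading of Figure 18.5):

```python
import math

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(parent, splits):
    """parent: (pos, neg) of the full set; splits: one (pos, neg) per branch."""
    total = sum(p + n for p, n in splits)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(*parent) - remainder

# Hypothetical counts: 12 examples (6 positive, 6 negative) split three ways.
parent = (6, 6)
splits = [(0, 2), (4, 0), (2, 4)]
print(f"IG = {information_gain(parent, splits):.3f} bits")  # IG = 0.541 bits
```

Two of the three branches are pure (all-negative or all-positive), which is why a split like this scores so well.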
Example contd.
• Decision tree learned from the 12 examples
• Analogies:
  – Elections combine voters’ choices to pick a good candidate
  – Committees combine experts’ opinions to make better decisions
• Intuitions:
  – Individuals often make mistakes, but the “majority” is less likely to make mistakes
  – Individuals often have partial knowledge, but a committee can pool expertise to make better decisions
Ensemble Learning
• Definition: a method to select and combine an ensemble of hypotheses into a (hopefully) better hypothesis
• Can enlarge the hypothesis space:
  – Perceptron (a simple kind of neural network): a linear separator
  – Ensemble of perceptrons: a polytope (see the sketch below)
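A minimal sketch of that last point (hypothetical weights; Python used for illustration): three linear separators combined by unanimous vote classify a point as positive only if it lies inside the intersection of three half-planes, a convex polytope that no single perceptron could express.

```python
import numpy as np

def perceptron(w, b):
    """Linear separator: +1 on one side of the hyperplane w.x + b = 0, -1 on the other."""
    return lambda x: 1 if np.dot(w, x) + b > 0 else -1

# Three hypothetical separators whose positive half-planes
# intersect in a bounded triangle around the origin.
ensemble = [
    perceptron(np.array([0.0, 1.0]), 1.0),    # y > -1
    perceptron(np.array([1.0, -1.0]), 1.0),   # x - y > -1
    perceptron(np.array([-1.0, -1.0]), 1.0),  # -x - y > -1
]

def ensemble_predict(x):
    # Unanimous vote: positive only inside the polytope (here, a triangle).
    return 1 if all(h(x) == 1 for h in ensemble) else -1

print(ensemble_predict(np.array([0.0, 0.0])))  # inside  -> 1
print(ensemble_predict(np.array([3.0, 3.0])))  # outside -> -1
```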
Bagging
• Bagging (bootstrap aggregating): train each hypothesis hi on a bootstrap resample of the training set, then combine the hypotheses by majority vote
• Assumptions:
  – Each hypothesis hi makes an error with probability p
  – The hypotheses are independent
• Majority voting of n hypotheses:
  – Probability that exactly k hypotheses make an error: C(n,k) p^k (1-p)^(n-k)
  – The majority makes an error when more than half do: error(majority) = Σ_{k>n/2} C(n,k) p^k (1-p)^(n-k)
  – With n = 5, p = 0.1: error(majority) ≈ 0.0086 < 0.01 (checked in the sketch below)
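A quick check of that claim, under the independence assumption stated above:

```python
from math import comb

def majority_error(n, p):
    """Probability that more than half of n independent hypotheses,
    each erring with probability p, make an error simultaneously."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_error(5, 0.1))  # ~0.00856 < 0.01
```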
Weighted Majority
• In practice:
  – Hypotheses are rarely independent
  – Some hypotheses make fewer errors than others
• So let’s take a weighted majority (see the sketch below)
• Intuition:
  – Decrease the weight of correlated hypotheses
  – Increase the weight of good hypotheses
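A minimal sketch of weighted majority voting (hypothetical weights and votes; the update rule shown, halving a hypothesis’s weight on each mistake, is one standard choice and an assumption here, not something the slides specify):

```python
def weighted_majority(votes, weights):
    """votes: one +1/-1 prediction per hypothesis; weights: matching non-negative weights."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score > 0 else -1

# Hypothetical ensemble state: three hypotheses with unequal weights.
weights = [2.0, 0.5, 0.5]
print(weighted_majority([1, -1, -1], weights))  # 1: the heavily weighted hypothesis wins

# Assumed update: shrink the weight of each hypothesis that voted
# against the observed label, so bad/correlated hypotheses fade.
def update(votes, weights, label, beta=0.5):
    return [w * (beta if v != label else 1.0) for w, v in zip(weights, votes)]

print(update([1, -1, -1], weights, label=1))  # [2.0, 0.25, 0.25]
```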
Boosting
• There are N examples and M “columns” (hypotheses), each of which has weight zm
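In code, that picture is just a weighted majority over the M columns (a minimal sketch; only the names h and z come from the slide, the hypotheses and weights below are hypothetical):

```python
import numpy as np

def boosted_predict(x, hypotheses, z):
    """Weighted-majority combination of M hypotheses, each weighted by z[m]:
    H(x) = sign(sum_m z[m] * h_m(x)), with each h_m(x) in {-1, +1}."""
    score = sum(zm * h(x) for zm, h in zip(z, hypotheses))
    return 1 if score > 0 else -1

# Hypothetical "rules of thumb" on a 2-D input, with weights z.
hypotheses = [
    lambda x: 1 if x[0] > 0 else -1,
    lambda x: 1 if x[1] > 0 else -1,
    lambda x: 1 if x[0] + x[1] > 1 else -1,
]
z = [0.8, 0.6, 0.4]
print(boosted_predict(np.array([2.0, -1.0]), hypotheses, z))  # 0.8 - 0.6 - 0.4 < 0 -> -1
```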
What can we boost?
• Advantages:
  – No need to learn a perfect hypothesis
  – Can boost any weak learning algorithm
  – Boosting is very simple to program
  – Good generalization
• Paradigm shift:
  – Don’t try to learn a perfect hypothesis
  – Just learn simple rules of thumb and boost them (see the sketch under “Boosting Paradigm” below)
Boosting Paradigm
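A minimal AdaBoost-style sketch of this paradigm (assumptions: labels in {-1, +1} and a hypothetical weak_learner(X, y, w) that fits a weak hypothesis to the weighted examples; the update below follows standard AdaBoost, which the slides do not spell out):

```python
import numpy as np

def adaboost(X, y, weak_learner, M):
    """AdaBoost: after each round, reweight the N examples so the next
    "rule of thumb" focuses on the ones the previous rounds got wrong."""
    y = np.asarray(y)                 # labels in {-1, +1}
    N = len(y)
    w = np.full(N, 1.0 / N)           # example weights, initially uniform
    hypotheses, z = [], []
    for _ in range(M):
        h = weak_learner(X, y, w)     # fit a weak hypothesis to the weighted data
        pred = np.array([h(x) for x in X])
        err = max(np.sum(w[pred != y]), 1e-12)  # weighted error, clamped to avoid log(0)
        if err >= 0.5:                # no better than chance: stop boosting
            break
        zm = 0.5 * np.log((1 - err) / err)      # hypothesis weight
        w *= np.exp(-zm * y * pred)             # up-weight the mistakes
        w /= w.sum()
        hypotheses.append(h)
        z.append(zm)
    # Final hypothesis: weighted majority of the learned rules of thumb.
    return lambda x: 1 if sum(zm * h(x) for zm, h in zip(z, hypotheses)) > 0 else -1
```

Note that the weak learner only has to beat chance on the reweighted data in each round; the weighted combination does the rest.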
• Important topics:
  1) Components of agents
  2) Agent architectures
  3) Future directions