AI & ML Unit 4


UNIT IV

Introduction to Machine Learning: Well-Posed Learning Problem, Designing a Learning System, Perspectives and Issues in Machine Learning.

Concept Learning and The General-to-Specific Ordering: Introduction, A Concept Learning Task, Concept Learning as Search, FIND-S: Finding a Maximally Specific Hypothesis, Version Spaces and the Candidate Elimination Algorithm, Remarks on Version Spaces and Candidate-Elimination, Inductive Bias.

What is Machine Learning?

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. (Tom Mitchell)

or

Machine Learning is the study of computer algorithms that improve automatically through experience. (Tom Mitchell)

Traditional Programming vs. Machine Learning

• Traditional programming: Data + Program → Computer → Output
• Machine learning: Data + Output → Computer → Program

Applying AI, we want to build better and more intelligent machines.

Working Applications of ML
• Classification of mortgages
• Predicting portfolio performance
• Electrical power control
• Chemical process control
• Character recognition
• Face recognition
• DNA classification
• Credit card fraud detection
• Cancer cell detection

Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging
• [Your favorite area]

WEBSITES
http://videolectures.net/mlas06_mitchell_itm/

Machine learning draws on results from many fields, including statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity, and control theory.

[Figure: machine learning at the intersection of AI, philosophy, statistics, information theory, biology, cognitive science, computational complexity, and control theory.]

WELL-POSED LEARNING PROBLEMS

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

For example, a computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself.

1. A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself

2. A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

3. A robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance travelled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering commands recorded while observing a human driver

We can specify many learning problems in this fashion, such as learning to recognize handwritten words, or learning to drive a robotic automobile autonomously.

• The goal is to define precisely a class of problems that encompasses interesting forms of learning, to explore algorithms that solve such problems, and to understand the fundamental structure of learning problems and processes.
DESIGNING A LEARNING SYSTEM

Let us consider designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament. Designing such a system involves five choices:

1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
5. The Final Design

1. Choosing the Training Experience

The first design choice we face is the type of training experience from which our system will learn. The type of training experience available can have a significant impact on the success or failure of the learner. Three attributes of the training experience matter:

a) Direct or indirect training. The training experience provides direct or indirect feedback regarding the choices made by the performance system. For example, in learning to play checkers, the system might learn from direct training examples consisting of individual checkers board states and the correct move for each. Alternatively, it might have available only indirect information consisting of the move sequences and final outcomes of various games played. Learning from direct training feedback is typically easier than learning from indirect feedback.

b) Whether a teacher is available. The learner might rely on a teacher to select informative board states and to provide the correct move for each, or it might generate its own training experience.

c) How well the training experience represents the distribution of examples over which the final system performance P is measured. In our checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.

A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won in the world tournament
• Training experience E: games played against itself
2. Choosing the Target Function

The next design choice is what type of knowledge will be learned. The program needs only to learn how to choose the best move from among the legal moves, so one obvious choice is a function that chooses the best move for any given board state. Let us call this function ChooseMove:

ChooseMove : B → M

That is, ChooseMove maps a legal board state (from the set B) to a legal move (from the set M).

An alternative is an evaluation function that assigns a numerical score to any board state. Let us call this target function V, and use the notation V : B → ℝ to denote that V maps any legal board state from the set B to some real value (ℝ denotes the set of real numbers). If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position.

The target value V(b) for an arbitrary board state b in B is defined as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.

[Figure: state-space search. On our own moves m_i : b → b_i, V(b) = max_i V(b_i) over the successor boards; on the opponent's moves, V(b1) = min_i V(b_i).]

3. Choosing a Representation for the Target Function

[Figure: final board states. Black wins: V(b) = 100; black loses: V(b) = -100; draw: V(b) = 0.]

Simple representation: for any given board state b, the learned function V'(b) will be calculated as a linear combination of the following board features:

• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red
• x6: the number of red pieces threatened by black

Thus, our learning program will represent V'(b) as a linear function of the form

V'(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
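As a small illustration, this linear representation can be written directly in Python. The weight values below are made-up numbers for demonstration, not learned ones:

# A sketch of the linear representation V'(b); feature extraction from a real
# board is assumed, so the six feature values are passed in directly.

def v_prime(w, features):
    """V'(b) = w0 + w1*x1 + ... + w6*x6 for features (x1..x6)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))

w = [5, 2, -2, 4, -4, -1, 1]               # illustrative weights w0..w6
print(v_prime(w, (12, 12, 0, 0, 1, 1)))    # opening-like board: 5+24-24-1+1 = 5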

Partial design of a checkers learning program:
• Task T: playing checkers
• Performance measure P: percent of games won in the world tournament
• Training experience E: games played against itself
• Target function: V : Board → ℝ
• Target function representation: V'(b) = w0 + w1·x1 + ... + w6·x6

The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.

4. Choosing a Function Approximation Algorithm

To learn V' we require a set of training examples, each describing a specific board state b together with a training value Vtrain(b). For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100:

<<x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0>, +100>

Below we describe a procedure that first derives such training examples from the indirect training experience available to the learner, then adjusts the weights wi to best fit these training examples:

a) Estimating Training Values
b) Adjusting the Weights

a) ESTIMATING TRAINING VALUES

Notation:
• V(b): the true target function
• V̂(b) (also written V'(b)): the learned target function
• Vtrain(b): the training value

Rule for estimating training values:

Vtrain(b) ← V̂(Successor(b))

where Successor(b) is the next board state following b for which it is again the program's turn to move.

b) ADJUSTING THE WEIGHTS

Choose the weights wi to best fit the set of training examples, i.e., minimize the squared error E between the training values and the values predicted by the hypothesis:

E ≡ Σ_{<b, Vtrain(b)> ∈ training examples} (Vtrain(b) − V̂(b))²

We require an algorithm that:
• will incrementally refine the weights as new training examples become available, and
• will be robust to errors in these estimated training values.

The Least Mean Squares (LMS) algorithm is one such algorithm.

LMS weight update rule:

For each training example <b, Vtrain(b)>:
• Use the current weights to calculate V̂(b)
• For each weight wi, update it as

wi ← wi + η (Vtrain(b) − V̂(b)) xi

where η is a small constant (e.g., 0.1) that moderates the size of the weight update.
5. The Final Design

The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components of many learning systems:

1. The Performance System: takes as input a new problem (an initial game board) and outputs a trace of the game it played against itself, using the learned V̂ to choose its moves.
2. The Critic: takes as input the trace of a game and outputs a set of training examples of the target function, <b1, Vtrain(b1)>, <b2, Vtrain(b2)>, ...
3. The Generalizer: takes as input the training examples and outputs a hypothesis V̂ that estimates the target function.
4. The Experiment Generator: takes as input the current hypothesis and outputs a new problem (an initial board state) for the Performance System to explore.

[Figure: the final design. Experiment Generator → new problem (initial game board) → Performance System → solution trace (game history) → Critic → training examples → Generalizer → hypothesis V̂ → back to the Experiment Generator.]

PERSPECTIVES AND ISSUES IN MACHINE LEARNING

One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner.

[Figure: summary of choices in designing the checkers learning program.
• Determine type of training experience: games against experts / games against itself / table of correct moves / ...
• Determine target function: Board → move / Board → value / ...
• Determine representation of learned function: polynomial / linear function of six features / artificial neural network / ...
• Determine learning algorithm: gradient descent / linear programming / ...
→ Complete design]
Issues in Machine Learning
• What algorithms can approximate functions well, and when?
• How much training data is sufficient?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

Concept Learning and The General-to-Specific Ordering: Introduction, A Concept Learning Task, Concept Learning as Search, FIND-S: Finding a Maximally Specific Hypothesis, Version Spaces and the Candidate Elimination Algorithm, Remarks on Version Spaces and Candidate-Elimination, Inductive Bias.

A Concept Learning Task

• Target concept, for example: "Days on which my friend Aldo enjoys his favorite water sport" (or, similarly: "Days on which the beach will be crowded").
• Task: learn to predict the value of EnjoySport (or Crowded) for an arbitrary day.
• Training examples for the target concept:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

TABLE 2.1: Positive and negative training examples for the target concept EnjoySport.

• 6 nominal-valued (symbolic) attributes: Sky (Sunny, Rainy, Cloudy), AirTemp (Warm, Cold), Humidity (Normal, High), Wind (Strong, Weak), Water (Warm, Cool), Forecast (Same, Change).

For each attribute, the hypothesis will either:
• indicate by a "?" that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a "Ø" that no value is acceptable.

For example, the hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression

<?, Cold, High, ?, ?, ?>

The most general hypothesis, under which every day is a positive example of this concept:

<?, ?, ?, ?, ?, ?>

The most specific possible hypothesis, under which no day is a positive example of this concept:

<Ø, Ø, Ø, Ø, Ø, Ø>
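Under this representation, a hypothesis classifies an instance as follows (a sketch using Python tuples; encoding "?" and "Ø" as strings is our own choice):

# A sketch of how a hypothesis classifies an instance in this representation.

def h_classifies(h, x):
    """True (positive) iff every constraint is '?' or equals the instance
    value; a 'Ø' constraint can never be satisfied."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

h = ("?", "Cold", "High", "?", "?", "?")   # the example hypothesis above
print(h_classifies(h, ("Rainy", "Cold", "High", "Strong", "Warm", "Change")))   # True
print(h_classifies(("Ø",) * 6, ("Rainy", "Cold", "High", "Strong", "Warm", "Change")))  # False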
1. Notation

Example hypothesis:

<Sunny, ?, ?, Strong, ?, Same>

corresponding to the boolean function

Sunny(Sky) ∧ Strong(Wind) ∧ Same(Forecast)

In the current example, X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

The concept or function to be learned is called the target concept, which we denote by c. In general, c can be any boolean-valued function defined over the instances X; that is, c : X → {0, 1}.

In the current example, the target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).

• Instances x from X for which c(x) = 1 are called positive examples, or members of the target concept.
• Instances for which c(x) = 0 are called negative examples, or nonmembers of the target concept.
• We use the symbol D to denote the set of available training examples.
• Each hypothesis h in H represents a boolean-valued function defined over X; that is, h : X → {0, 1}.
• The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.

2. The Inductive Learning Hypothesis

Definition: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Or: a hypothesis that approximates the training data well will also approximate the target function over unobserved examples.
Concept Learning As Search

• The goal of the concept learning search is to find the hypothesis that best fits the training examples.

EnjoySport learning task:

<Sky: Sunny/Rainy/Cloudy, AirTemp: Warm/Cold, Humidity: Normal/High, Wind: Weak/Strong, Water: Warm/Cool, Forecast: Change/Same>

• Size of the instance space X: 3 × 2 × 2 × 2 × 2 × 2 = 96
• Syntactically distinct hypotheses (including ? and Ø): 5 × 4 × 4 × 4 × 4 × 4 = 5120
• Semantically distinct hypotheses (a Ø anywhere denotes the empty set of instances and classifies every possible instance as a negative example): 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973

1. General-to-Specific Ordering of Hypotheses

Consider the two hypotheses:

h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>

h1 versus h2:
• h2 imposes fewer constraints
• h2 classifies more examples as positive
• any instance classified as positive by h1 is classified as positive by h2
• h2 is more general than h1

Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]

FIGURE 2.1 Instances, hypotheses, and the more - general - than relation. Find-S: Finding a maximally specific hypothesis

Instances X Hypotheses H
 Find-S is guaranteed to output the most specific hypothesis within H
Specific that is consistent with the positive training examples.

 The final hypothesis will also be consistent with the negative


examples
h1 h3
x1
h2
x2 Method
 Begin with the most specific possible hypothesis in H
 Generalize this hypothesis each time it fails to cover an
General
observed positive training example.
x1 = <Sunny, Warm, High, Strong, Cool, Same> h1 = <Sunny, ?, ?, Strong, ?, ?>
x2 = <Sunny, Warm, High, Light, Warm, Same> h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>

 hypothesis h2 is more general than hl because every instance that


satisfies hl also satisfies h2. Similarly, h2 is more general than h3.
Note that neither hl nor h3 is more general than the other;
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
     For each attribute constraint ai in h:
       If the constraint ai is satisfied by x, then do nothing;
       Else replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.

Tracing FIND-S on the training examples of Table 2.1:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

• The first step of FIND-S is to initialize h to the most specific hypothesis in H:
  h ← <Ø, Ø, Ø, Ø, Ø, Ø>
• Observing the first training example, we replace each Ø by the next more general constraint that fits the example, namely the attribute values of that example:
  h ← <Sunny, Warm, Normal, Strong, Warm, Same>
• The second training example (also positive) forces the algorithm to generalize h further, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example:
  h ← <Sunny, Warm, ?, Strong, Warm, Same>
• Upon encountering the third training example, in this case a negative example, the algorithm makes no change to h. In fact, the FIND-S algorithm simply ignores every negative example!
• To complete the trace, the fourth (positive) example leads to a further generalization of h:
  h ← <Sunny, Warm, ?, Strong, ?, ?>

A runnable sketch of this algorithm follows below.
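A runnable sketch of FIND-S on Table 2.1; the tuple encoding of hypotheses is our own illustration:

# A sketch of FIND-S for conjunctive hypotheses over nominal attributes.

def find_s(examples):
    h = None                  # stands in for <Ø,Ø,Ø,Ø,Ø,Ø> before any positive
    for x, positive in examples:
        if not positive:
            continue          # FIND-S ignores every negative example
        if h is None:
            h = list(x)       # first positive example: adopt its values
        else:                 # generalize: keep matching values, else '?'
            h = [hv if hv == xv else "?" for hv, xv in zip(h, x)]
    return tuple(h) if h is not None else None

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
     (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')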

Hypothesis Space Search by FIND-S

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +    →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, −  →  h3 = h2 (negative example; unchanged)
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM

• A second approach to concept learning is the CANDIDATE-ELIMINATION algorithm.
• The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples.

Algorithm               Order                 Strategy        Examples used
FIND-S                  Specific-to-general   Top-down        Positive
LIST-THEN-ELIMINATE     General-to-specific   Bottom-up       Negative
CANDIDATE-ELIMINATION   Bi-directional        Bi-directional  Both

Consistent: a hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D:

Consistent(h, D) ≡ ∀<x, c(x)> ∈ D : h(x) = c(x)

• The CANDIDATE-ELIMINATION algorithm finds all describable hypotheses that are consistent with the observed training examples.
• This subset of all hypotheses is called the version space with respect to the hypothesis space H.

Version space: the version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H that are consistent with the training examples in D:

VS_H,D ≡ {h ∈ H | Consistent(h, D)}

The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example.

The LIST-THEN-ELIMINATE Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace.

Problems:
• The hypothesis space must be finite.
• It enumerates all hypotheses, which is rather inefficient.
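Because the EnjoySport hypothesis space is small (973 semantically distinct hypotheses), LIST-THEN-ELIMINATE can actually be run. The sketch below enumerates the space and filters it against Table 2.1, recovering the six-hypothesis version space shown in the next section:

# A sketch of LIST-THEN-ELIMINATE on the EnjoySport task.
from itertools import product

DOMAINS = [("Sunny", "Rainy", "Cloudy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
     (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True)]

def h_classifies(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

# Every semantically distinct hypothesis: the single all-Ø hypothesis plus all
# combinations of a concrete value or '?' per attribute (1 + 4*3^5 = 973).
hypotheses = [("Ø",) * len(DOMAINS)] + list(product(*[vals + ("?",) for vals in DOMAINS]))

version_space = [h for h in hypotheses
                 if all(h_classifies(h, x) == label for x, label in D)]
print(len(version_space))   # 6, matching the S/G diagram below
for h in version_space:
    print(h)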
A More Compact Representation for Version Spaces

A version space can be represented compactly by its most specific (S) and most general (G) boundary sets; every other member lies between these boundaries in the general-to-specific ordering. For the four training examples:

x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, −
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +

the final version space contains exactly six hypotheses:

S4: {<Sunny, Warm, ?, Strong, ?, ?>}

      <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

How can all h in the version space be generated, given G and S? Enumerate the hypotheses that are more general than or equal to some member of S and more specific than or equal to some member of G; reporting S and G thus reports the entire version space.
Candidate-Elimination Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.

Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific hypotheses in H:

G0 ← {<?, ?, ?, ?, ?, ?>}
S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}

For each training example d, do:
• If d is a positive example:
  - Remove from G any hypothesis inconsistent with d.
  - For each hypothesis s in S that is inconsistent with d:
    · Remove s from S.
    · Add to S all minimal generalizations h of s such that (1) h is consistent with d, and (2) some member of G is more general than h.
    · Remove from S any hypothesis that is more general than another hypothesis in S.
• If d is a negative example:
  - Remove from S any hypothesis inconsistent with d.
  - For each hypothesis g in G that is inconsistent with d:
    · Remove g from G.
    · Add to G all minimal specializations h of g such that (1) h is consistent with d, and (2) some member of S is more specific than h.
    · Remove from G any hypothesis that is less general than another hypothesis in G.
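A compact sketch of this algorithm for the conjunctive hypothesis language; the helper functions for minimal generalizations and specializations are our own encoding of the steps above:

# A sketch of CANDIDATE-ELIMINATION for conjunctive hypotheses over the
# EnjoySport attributes.

DOMAINS = [("Sunny", "Rainy", "Cloudy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general(hj, hk):
    if any(v == "Ø" for v in hk):
        return True                      # hk covers no instance at all
    return all(a == "?" or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """The minimal generalization of s that covers positive example x."""
    return tuple(xv if sv == "Ø" else (sv if sv == xv else "?")
                 for sv, xv in zip(s, x))

def min_specializations(g, x):
    """All minimal specializations of g that exclude negative example x."""
    out = []
    for i, (gv, xv) in enumerate(zip(g, x)):
        if gv == "?":
            out += [g[:i] + (v,) + g[i + 1:] for v in DOMAINS[i] if v != xv]
    return out

def candidate_elimination(examples):
    S = {("Ø",) * len(DOMAINS)}
    G = {("?",) * len(DOMAINS)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {min_generalization(s, x) for s in S}   # S stays a singleton here
            S = {s for s in S if any(more_general(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}
            newG = set()
            for g in G:
                if matches(g, x):        # g is inconsistent with the negative d
                    newG |= {h for h in min_specializations(g, x)
                             if any(more_general(h, s) for s in S)}
                else:
                    newG.add(g)
            G = {g for g in newG
                 if not any(g2 != g and more_general(g2, g) for g2 in newG)}
    return S, G

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
     (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True)]
S, G = candidate_elimination(D)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}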
REMARKS ON VERSION SPACES AND CANDIDATE-ELIMINATION

1) Will the Candidate-Elimination Algorithm Converge to the Correct Hypothesis?
• The version space learned by the candidate-elimination algorithm will converge toward the hypothesis that correctly describes the target concept, provided (1) there are no errors in the training examples, and (2) there is some hypothesis in H that correctly describes the target concept.
• The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis.

2) What Training Example Should the Learner Request Next?
We use the term query to refer to instances constructed by the learner, which are then classified by an external oracle (e.g., nature or a teacher).

3) How Can Partially Learned Concepts Be Used?
• Suppose that no additional training examples are available beyond the four in our example above, but that the learner is now required to classify new instances that it has not yet observed.
• Even though the version space still contains multiple hypotheses, a new instance can be classified with confidence if it satisfies every hypothesis in S (classify positive) or satisfies no hypothesis in G (classify negative); otherwise its classification is ambiguous.

Inductive Bias

a) A Biased Hypothesis Space

Extending the hypothesis space: consider the following training examples.

   Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1  Sunny   Warm     Normal    Strong  Cool   Change    Yes
2  Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3  Rainy   Warm     Normal    Strong  Cool   Change    No

• No hypothesis is consistent with these three examples under the assumption that the target is a conjunction of constraints: <?, Warm, Normal, Strong, Cool, Change> is too general, since it also covers the third (negative) example.
• The target concept exists in a different hypothesis space H' that includes disjunctions, in particular the hypothesis Sky=Sunny or Sky=Cloudy.
b) An Unbiased Learner

Taking the power set of the instance space as the hypothesis space gives an unbiased learner. In the EnjoySport learning task, the size of the instance space X of days described by the six available attributes is 96, so this unbiased hypothesis space contains 2^96 distinct target concepts.

For instance, the target concept "Sky = Sunny or Sky = Cloudy" could then be described as

<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>

Suppose we present three positive examples (x1, x2, x3) and two negative examples (x4, x5) to the learner. At this point, the S boundary of the version space will contain the hypothesis which is just the disjunction of the positive examples,

S: {(x1 ∨ x2 ∨ x3)}

and the G boundary will consist of the hypothesis that rules out only the observed negative examples,

G: {¬(x4 ∨ x5)}

c) The Futility of Bias-Free Learning

After training on examples Dc, the learner L is asked to classify a new instance xi; we denote this classification by L(xi, Dc) (here, EnjoySport = yes or no). With the unbiased hypothesis space, any unobserved instance is classified positive by exactly half of the version-space hypotheses and negative by the other half, so the learner has no rational basis for either classification: a learner that makes no prior assumptions cannot generalize beyond the observed examples.

[Figure: an inductive system modelled by an equivalent deductive system; the inductive bias is the set of additional assertions that would allow the system's classifications to be deduced from the training examples.]

Three learners with three different inductive biases, listed from weakest to strongest bias:
1. Rote learner: no inductive bias; it simply stores the training examples and can classify only previously observed instances.
2. Candidate-Elimination: its bias is that the target concept is representable in its hypothesis space, i.e., is a conjunction of attribute constraints.
3. Find-S: the concept is in H (a conjunction of constraints), plus "all instances are negative unless shown to be positive examples" (a stronger bias).

• The stronger the bias, the greater the ability to generalize and classify new instances (the greater the inductive leaps).
