COLT Tutorial
Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
honavar@cs.iastate.edu
Iowa State University, Ames, Iowa 50011
http://www.cs.iastate.edu/~honavar/aigroup.html
[Figure: the PAC learning setup. An oracle draws samples from the instance distribution and supplies labeled examples of the target concept to the learner.]
PAC Learning
The oracle samples the instance space according to
D and provides labeled examples of an unknown
concept C to the learner
The learner is tested on samples drawn from the
instance space according to the same probability
distribution D
The learner's task is to output a hypothesis h from
H that closely approximates the unknown concept
C based on the examples it has encountered
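The setup above can be sketched as a small simulation. Everything concrete here (the threshold concept, the uniform distribution, the learner) is an illustrative assumption, not from the tutorial:

```python
import random

# A toy simulation of the PAC setup: D is uniform on [0, 1] and the unknown
# concept C is a threshold function (both are illustrative assumptions).
random.seed(0)

def concept(x):
    return x >= 0.5              # assumed target concept C

def example_oracle():
    x = random.random()          # the oracle samples the instance space per D
    return x, concept(x)         # ...and labels the instance with C

def learn(m):
    examples = [example_oracle() for _ in range(m)]
    positives = [x for x, label in examples if label]
    theta = min(positives) if positives else 1.0
    return lambda x: x >= theta  # hypothesis h from the threshold class H

def error(h, n=100_000):
    # Test h on fresh samples drawn from the SAME distribution D.
    xs = (random.random() for _ in range(n))
    return sum(h(x) != concept(x) for x in xs) / n

h = learn(1000)
print(error(h))                  # typically well below 0.05 for m = 1000
```

Note that the learner is evaluated against the same distribution D it was trained under; evaluating under a different distribution would void the PAC guarantee.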
In the PAC setting, exact learning (zero error
approximation) cannot be guaranteed
In the PAC setting, even approximate learning
(with bounded non-zero error) cannot be
guaranteed 100% of the time
Definition: The error of a hypothesis h with respect
to a target concept C and an instance distribution D
is given by Prob_D[ C(X) ≠ h(X) ]
Definition: A concept class C is said to be PAC-learnable using a
hypothesis class H if there exists a learning algorithm L such that
for all concepts in C, for all instance distributions D on an
instance space X, and for all ε, δ (0 < ε, δ < 1), L, when given
access to the Example oracle, produces, with probability at least
(1 − δ), a hypothesis h from H with error no more than ε
(Valiant, 1984)
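To make the quantifiers concrete, here is a small empirical check of the guarantee; the target concept, distribution, and learner are all illustrative assumptions, not from the tutorial:

```python
import random

# Toy empirical check of the PAC guarantee: target concept C(x) = [x >= 0.3]
# on X = [0, 1], D uniform. The learner returns the threshold at the smallest
# positive example, so its true error equals the probability mass of the
# interval [0.3, theta) under D.
random.seed(1)
epsilon, delta = 0.05, 0.1
m, trials = 200, 200             # m chosen generously for this toy problem

def run_trial():
    xs = [random.random() for _ in range(m)]
    positives = [x for x in xs if x >= 0.3]
    theta = min(positives) if positives else 1.0
    return theta - 0.3           # exact error of h(x) = [x >= theta] under D

errors = [run_trial() for _ in range(trials)]
success_rate = sum(e <= epsilon for e in errors) / trials
print(success_rate)              # should be at least 1 - delta
```

The fraction of trials with error at most ε comes out well above 1 − δ, matching the "probably approximately correct" reading of the definition.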
Sample Complexity
Theorem: Suppose that, given m examples of a target concept c,
a learning algorithm outputs a hypothesis h that is consistent with
the examples and satisfies size(h) ≤ N·size(c). Then

m = O( (1/ε) ( lg(1/δ) + N·size(c) ) )

examples suffice for PAC learning.
Proof: omitted.
Theorem: If the hypothesis class has VC dimension d, then

m = O( (1/ε) ( lg(1/δ) + d·lg(1/ε) ) )

examples suffice for PAC learning.
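As a rough illustration of how the two sample-size bounds above scale, they can be evaluated with the hidden O(·) constant set to 1; the actual constants depend on the analysis, so these numbers indicate growth rates only:

```python
from math import ceil, log2

# Sample-size bounds with the O(.) constant taken as 1 -- an illustration
# of scaling, not a tight guarantee.
def occam_bound(epsilon, delta, N, size_c):
    # m = O((1/eps)(lg(1/delta) + N*size(c)))
    return ceil((1 / epsilon) * (log2(1 / delta) + N * size_c))

def vc_bound(epsilon, delta, d):
    # m = O((1/eps)(lg(1/delta) + d*lg(1/eps)))
    return ceil((1 / epsilon) * (log2(1 / delta) + d * log2(1 / epsilon)))

print(occam_bound(0.1, 0.05, 10, 4))  # N = 10, size(c) = 4
print(vc_bound(0.1, 0.05, 8))         # VC dimension d = 8
```

Note the characteristic 1/ε and lg(1/δ) dependence: halving the error tolerance roughly doubles the number of examples, while shrinking the failure probability δ costs only logarithmically.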
Corollary: An acyclic, layered network of s threshold logic units,
each with r inputs, has VC dimension at most 2(r + 1)·s·lg(es)
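Plugging numbers into the corollary's bound gives a feel for its size; the network shape here (s and r) is chosen arbitrarily for illustration:

```python
from math import e, log2

# The corollary's VC-dimension bound 2(r + 1) s lg(es), evaluated for a
# small example network.
def network_vc_bound(s, r):
    return 2 * (r + 1) * s * log2(e * s)

print(network_vc_bound(s=10, r=5))   # bound for 10 units, 5 inputs each
```

Combined with the VC sample-complexity theorem, this says the number of examples needed grows only roughly linearly in the number of weights of the network.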
Kolmogorov Complexity
Definition: The Kolmogorov complexity of an object x
relative to a universal Turing machine M is the
length (measured in number of bits) of the
shortest program p which, when executed on M,
prints out x and halts.

K(x) = min { l(p) | M(p) = x }
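K itself is uncomputable, but any lossless compressor yields a computable upper bound on it, up to the additive constant for a fixed decompressor. A sketch using Python's zlib (the compressor choice is an assumption; any lossless compressor works):

```python
import os
import zlib

# Upper-bounding K(x) by compressed length: "decompress this blob" is a
# program printing x, whose length is the blob plus a fixed-size
# decompressor. This gives an UPPER bound only -- K is uncomputable.
def k_upper_bound_bits(data: bytes) -> int:
    return 8 * len(zlib.compress(data, 9))

regular = b"ab" * 500          # highly regular: short description exists
random_ish = os.urandom(1000)  # incompressible with high probability
print(k_upper_bound_bits(regular), k_upper_bound_bits(random_ish))
```

The regular string's bound is far below its raw length, while the random string's bound is close to (or above) its raw length, matching the intuition that K measures descriptive regularity.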
Definition: The conditional Kolmogorov complexity
of x given y is the length of the shortest
program for a universal Turing machine M
which, given y, outputs x.
Remark: K(x | y) ≤ K(x)
Remark: Kolmogorov complexity is machine-independent (modulo an additive constant).
Universal Distribution
Definition: The universal probability distribution over an
instance space X is defined by:

DU(X) = (1/c) · 2^(−K(X)) for all X ∈ X,

where c is a normalization constant.
Definition: A distribution D is simple if it is multiplicatively
dominated by the universal distribution, that is, there exists
a constant c such that c·DU(X) ≥ D(X)
Remark: All computable distributions (including Gaussian,
Poisson, etc., with finite-precision parameters) are simple.
∀X ∈ S, K(X | r) ≤ lg(sizeof(r))
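As a toy illustration of the universal-distribution definition, with the uncomputable K replaced by a compression-length upper bound and the instance space truncated to four strings (both simplifications are assumptions of this sketch):

```python
import zlib

# Toy truncated "universal" distribution: weight each instance by
# 2^(-K_hat(X)) and normalize. K_hat is a compression-based upper bound
# on K, since K itself is uncomputable; simpler strings get more mass.
def k_hat(x: bytes) -> int:
    return 8 * len(zlib.compress(x, 9))

space = [b"a" * 64, b"ab" * 32, b"abcdefgh" * 8, bytes(range(64))]
weights = {x: 2.0 ** -k_hat(x) for x in space}
c = sum(weights.values())            # normalization constant
d_u = {x: w / c for x, w in weights.items()}
for x, p in d_u.items():
    print(x[:8], p)
```

The highly regular strings receive almost all the mass, while the 64 distinct bytes (nearly incompressible) receive an exponentially smaller share, which is the qualitative behavior the definition captures.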
Concluding remarks
PAC-Easy learning problems lend themselves to a variety
of efficient algorithms.
PAC-Hard learning problems can often be made PAC-easy
through appropriate instance transformation and choice of
hypothesis space
Occam's razor often helps
Weak learning algorithms can often be used for strong
learning
Learning under restricted classes of instance distributions
(e.g., universal distribution) offers new possibilities
Bibliography
1 Honavar, V. http://www.cs.iastate.edu/~honavar/cs673.s96.html
2 Kearns, M.J. & Vazirani, U.V. An Introduction to Computational
Learning Theory. Cambridge, MA: MIT Press. 1994.
3 Langley, P. Elements of Machine Learning. Palo Alto, CA: Morgan
Kaufmann. 1995.
4 Li, M. & Vitanyi, P. Kolmogorov Complexity and its Applications.
New York: Springer-Verlag. 1997.
5 Mitchell, T. Machine Learning. New York: McGraw Hill. 1997.
6 Natarajan, B.K. Machine Learning: A Theoretical Approach. Palo
Alto, CA: Morgan Kaufmann, 1992.
7 Valiant, L.G. A Theory of the Learnable. Communications of the ACM
27(11): 1134-1142. 1984.