Graphical Models - Learning

Based on: J. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan and T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman and D. Koller's NIPS'99 tutorial.
Advanced WS 06/07 I

Outline

[Figure: the ALARM Bayesian network for patient monitoring. Nodes include MINVOLSET, PULMEMBOLUS, INTUBATION, KINKEDTUBE, VENTMACH, DISCONNECT, PAP, SHUNT, VENTLUNG, PRESS, VENTTUBE, MINVOL, FIO2, VENTALV, ANAPHYLAXIS, PVSAT, ARTCO2, TPR, SAO2, INSUFFANESTH, EXPCO2, HYPOVOLEMIA, LVFAILURE, CATECHOL, LVEDVOLUME, CVP, PCWP, CO, HRBP, HREKG, HRSAT, and BP.]

Parameter Estimation

What is Learning?
Agents are said to learn if they improve their performance over time based on experience.

"The problem of understanding intelligence is said to be the greatest problem in science today and 'the problem' for this century - as deciphering the genetic code was for the second half of the last one. ... the problem of learning represents a gateway to understanding intelligence in man and machines." -- Tomaso Poggio and Steven Smale, 2003.
Why learn?

Knowledge acquisition is a bottleneck: it is expensive and difficult, and normally no expert is around. Data, in contrast, is cheap: huge amounts of data are available, e.g. from clinical tests, web mining (log files), ...

Outline: Introduction; Reminder: probability theory; Basics of Bayesian networks; Modeling Bayesian networks; Inference (VE, junction tree); [Excursus: Markov networks]; Learning Bayesian networks; Relational models.
What do we gain by learning?

- Conditional independencies and the graphical language capture the structure of many real-world distributions.
- The graph structure provides much insight into the domain and allows knowledge discovery.
- The learned model can be used for many tasks and supports all the features of probabilistic learning: model selection criteria, dealing with missing data and hidden variables.

Data + prior information --> learning algorithm --> Bayesian network, e.g. with CPT P(A | E,B):

  E   B    P(a | E,B)   P(~a | E,B)
  e   b       .9           .1
  e   ~b      .7           .3
  ~e  b       .8           .2
  ~e  ~b      .99          .01
Learning with missing values

  A1     A2     A3   A4     A5     A6
  true   true   ?    true   false  false
  ...    ...    ...  ...    ...    ...
  true   false  ?    false  true   true

"?" marks a missing value (e.g. the state of A3 was never observed).
1. Parameter reduction:

[Figure: two networks over X1, X2, X3 and Y1, Y2, Y3. Without a hidden variable the CPTs require 118 parameters in total (2 + 2 + 2 + 16 + 32 + 64); introducing a hidden variable H between the Xs and the Ys (X1, X2, X3 -> H -> Y1, Y2, Y3) reduces this to 34 parameters (2 + 2 + 2 + 16 + 4 + 4 + 4).]
Real-world data: the states of some random variables are missing, e.g. in medical diagnosis not all patients are subject to all tests. A second source of hidden variables is parameter reduction, e.g. clustering, ...
2. Clustering:

[Figure: a naive Bayes clustering model. A hidden Cluster variable (the cluster assignment) is the parent of the observed attributes X1, X2, ..., Xn.]
[Figure: EM-based clustering of 2-D points over successive iterations 1, 2, 5, and 25: the cluster centers move and the assignments sharpen. Slides due to Eamonn Keogh.]

Clustering is subjective:

[Figure: the same group of people can be clustered as "Simpson's family" vs. "school employees", or as "females" vs. "males".]
Parameter Estimation

Let X1, ..., Xn be a set of RVs and let D = {d[1], ..., d[M]} be a set of data; each d[m] is called a data case.

iid assumption: all data cases are independently sampled from identical distributions.

Find: the parameters of the CPDs which match the data best.

Overview of learning tasks:

                      structure known                 structure unknown
  fully observed      MLE (counting)                  selection of arcs: new domain
                                                      with no domain expert, data
                                                      mining, scientific discovery
  partially observed  numerical, nonlinear            encompasses both difficult
                      optimization; multiple calls    subproblems; only Structural
                      to BN inference; difficult      EM is known
                      for large networks
Find: ML parameters

The likelihood of the parameters given the data:

  L(theta : D) = P(D | theta)

Assumptions behind using the likelihood as the score:
- the data is equally likely for all parameters, and
- all parameters are a priori equally likely.

Maximum Likelihood

This is one of the most commonly used estimators in statistics.
- Intuitively appealing
- Consistent: the estimate converges to the best possible value as the number of examples grows
- Asymptotically efficient: the estimate is as close to the true value as possible given a particular training set
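To make the ML idea concrete, here is a minimal sketch (not from the slides; the data set and the grid search are made up for illustration) for a single binary variable: the likelihood L(theta : D) is a product of per-case probabilities, and its maximizer is just the fraction of true cases.

```python
import math

# Likelihood of iid binary data under one parameter theta = P(X = true).
data = [True, True, False, True, False, True]  # made-up data: 4 true, 2 false

def log_likelihood(theta, data):
    # log L(theta : D) = sum_m log P(d[m] | theta)   (iid assumption)
    return sum(math.log(theta if x else 1.0 - theta) for x in data)

# Closed-form MLE: the fraction of true cases.
mle = sum(data) / len(data)  # 4/6

# Sanity check by grid search over theta in (0, 1).
best = max((log_likelihood(t / 100, data), t / 100) for t in range(1, 100))
print(mle, best[1])  # grid maximum lands at 0.67, next to the exact 2/3
```

The grid search is only a check; for multinomial CPTs the same counting argument gives the closed form directly.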
ML Parameter Estimation

The network structure is specified; the learner needs to estimate the parameters, i.e. the entries of CPTs such as P(A | E,B), which are initially unknown ("?").

Data (one row per case):

  A1     A2     A3     A4     A5     A6
  true   true   false  true   false  false
  false  true   true   true   false  false
  ...    ...    ...    ...    ...    ...
  true   false  false  false  true   true

Data --> learning algorithm --> Bayesian network with estimated CPTs.
ML Parameter Estimation

  L(theta : D) = P(D | theta)
               = prod_m P(d[m] | theta)                        (iid)
               = prod_m prod_i P(x_i[m] | pa_i[m], theta_i)    (BN semantics)
               = prod_i prod_m P(x_i[m] | pa_i[m], theta_i)
               = prod_i L_i(theta_i : D)

The likelihood decomposes into one local likelihood L_i per family (a variable X_i together with its parents).
Decomposability of Likelihood

If the data set is complete (no question marks), we can maximize each local likelihood function independently, and then combine the solutions to get an MLE solution.

This decomposition of the global problem into independent, local sub-problems allows efficient solutions to the MLE problem.
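The decomposition can be checked numerically; the sketch below (parameters and data made up) verifies that for a two-node network A -> B the global log-likelihood equals the sum of the two family-wise local log-likelihoods.

```python
import math

# Decomposability check for a two-node network A -> B.
theta_a = 0.6                          # P(A = true)
theta_b = {True: 0.9, False: 0.2}      # P(B = true | A)

data = [(True, True), (True, False), (False, False), (True, True)]

def lp(p, x):
    # log P(x) for a Bernoulli with parameter p
    return math.log(p if x else 1.0 - p)

# Global log-likelihood: sum over cases of log P(a) + log P(b | a).
global_ll = sum(lp(theta_a, a) + lp(theta_b[a], b) for a, b in data)

# Sum of local (per-family) log-likelihoods: family of A, then family of B.
local_ll = (sum(lp(theta_a, a) for a, _ in data)
            + sum(lp(theta_b[a], b) for a, b in data))

assert abs(global_ll - local_ll) < 1e-12  # the two agree exactly
```

Each local term can therefore be maximized on its own and the solutions combined.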
For multinomial CPTs the parameters of each distribution are constrained to sum to one:

  sum_k theta_{k|u} = 1   for each assignment u to the parents of V

This constraint implies that the choice of theta_i influences the choice of theta_j (i != j) within one distribution; maximizing under the constraint nevertheless has a closed form.

=> MLE is obtained by counting:

  theta-hat_{v|u} = N(v, u) / N(u)

where N(v, u) is the number of data cases with V = v and parents(V) = u, and N(u) = sum_v N(v, u).
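A minimal counting sketch of this estimator (the E/B/A naming follows the earlier example network; the data cases are made up):

```python
from collections import Counter

# MLE for a CPT by counting: theta-hat(a | e, b) = N(a, e, b) / N(e, b).
data = [  # made-up (e, b, a) triples
    (True, True, True), (True, False, True), (False, True, True),
    (False, False, False), (False, False, False), (True, False, False),
]

joint = Counter((e, b, a) for e, b, a in data)   # N(e, b, a)
parent = Counter((e, b) for e, b, _ in data)     # N(e, b)

# Estimated P(A = true | E, B) for every observed parent configuration.
cpt = {(e, b): joint[(e, b, True)] / parent[(e, b)] for (e, b) in parent}
print(cpt)  # e.g. P(A=true | E=true, B=false) = 1/2
```

With complete data this is the whole learning algorithm: one pass of counting per family.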
EM Idea

Given incomplete data and a network whose CPT entries are unknown ("?"), EM proceeds as follows:

1. Complete the data using the current parameters
2. Count
3. Iterate

The goal is to recover the true CPT, e.g. P(A | E,B) = (.9, .7, .8, .99).
EM Idea: worked example

Incomplete data (5 cases; "?" = missing):

  A      B
  true   true
  true   ?
  false  true
  true   false
  false  ?

Complete: each case with B = ? is split into two fractional cases, ?=true and ?=false, weighted by the current model (initially 0.5 / 0.5).

Expected counts (complete data, first iteration):

  A      B      N
  true   true   1.5
  true   false  1.5
  false  true   1.5
  false  false  0.5

Maximize: estimate new parameters from the expected counts as if they were real counts.

Iterate: re-complete the data with the new parameters and re-count. After a few iterations the expected counts move to

  A      B      N
  true   true   1.5
  true   false  1.5
  false  true   1.875
  false  false  0.125

Complete-data likelihood

Data with missing values (extract; "?" = missing):

  A1    A2     A3   A4     A5     A6
  true  true   ?    true   false  false
  ?     true   ?    ?      false  false
  ...   ...    ...  ...    ...    ...
  true  false  ?    false  true   ?

The incomplete-data likelihood is hard to maximize directly. For every completion of the data there exists a complete-data likelihood, which decomposes into local terms and can be maximized by counting.
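The iteration above can be reproduced in a few lines; this sketch (assuming, as in the example, a joint multinomial model over A and B and a uniform start) recovers the expected counts 1.5/1.5/1.5/0.5 in the first iteration and 1.875/0.125 by the third.

```python
# EM for a two-variable joint distribution P(A, B) with missing B values,
# reproducing the expected counts from the slides.
cases = [(True, True), (True, None), (False, True), (True, False), (False, None)]

# Start from uniform parameters theta(a, b).
theta = {(a, b): 0.25 for a in (True, False) for b in (True, False)}

for iteration in range(3):
    # E-step: a case with missing B is split between B=true and B=false
    # in proportion to the current P(B | A).
    N = {key: 0.0 for key in theta}
    for a, b in cases:
        if b is None:
            p_a = theta[(a, True)] + theta[(a, False)]
            for b2 in (True, False):
                N[(a, b2)] += theta[(a, b2)] / p_a
        else:
            N[(a, b)] += 1.0
    # M-step: re-estimate the parameters from the expected counts.
    total = sum(N.values())
    theta = {key: n / total for key, n in N.items()}
    # iteration 1: N = 1.5/1.5/1.5/0.5; iteration 3: N[(False, True)] = 1.875
```

After iteration 3 the fractional mass on (A=false, B=false) has shrunk to 0.125, converging towards assigning both incomplete A=false cases to B=true.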
EM Algorithm - Abstract

Expectation step: complete the data using the current parameters.
Maximization step: re-estimate the parameters from the completed data.
Iterate.

EM Algorithm - Principle

Construct a new function Q(theta, theta_k) based on the current point theta_k (which behaves well). Property: the maximum of the new function has a better score than the current point.

[Figure: the incomplete-data likelihood P(Y | theta) as a curve over theta. The auxiliary function Q(theta, theta_k) is constructed at the current point theta_k; its maximum theta_{k+1} lies at a point of higher likelihood.]

EM for Multinomials
MLE from expected counts: for a multinomial variable X (per assignment u to the parents of X in the current model),

  theta_k = EN_k / (sum_j EN_j)

where EN_k are the expected counts of X in state k in the data, i.e. the posterior probabilities of state k summed over all data cases under the current model.
Expectation

For each data case with missing values, run inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), and enter the resulting posterior probabilities into the expected counts:

  S  X  D  C  B
  1  0  1  0  1
  1  1  1  0  1
  0  0  0  0  0
  1  0  0  0  1
  ..

Maximization

Re-estimate the CPTs from the expected counts, exactly as in the fully observed case.
Monotonicity

Theorem (Dempster, Laird & Rubin, 1977): the incomplete-data likelihood function is not decreased after an EM iteration.
EM in Practice

Initial parameters:
- Random parameter setting
- Best guess from another source

Stopping criteria:
- Small change in the likelihood of the data
- Small change in the parameter values

Speed-up: various methods exist to speed convergence.

For (discrete) Bayesian networks: for any initial, non-uniform value the EM algorithm converges to a (local or global) maximum.
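These practical points can be sketched on a toy model; below is EM for a two-component Bernoulli mixture (model, data, and initial guess are all made up) with a likelihood-change stopping criterion and an assertion of the monotonicity guarantee.

```python
import math

# EM for a two-component Bernoulli mixture: a minimal "clustering" model.
data = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8   # made-up binary observations

pi, p = 0.4, [0.6, 0.3]   # initial guess: mixing weight, per-cluster P(x=1)

def loglik():
    return sum(math.log(pi * (p[0] if x else 1 - p[0])
                        + (1 - pi) * (p[1] if x else 1 - p[1])) for x in data)

prev = loglik()
for _ in range(100):
    # E-step: responsibility of cluster 0 for each point.
    r = [pi * (p[0] if x else 1 - p[0])
         / (pi * (p[0] if x else 1 - p[0])
            + (1 - pi) * (p[1] if x else 1 - p[1])) for x in data]
    # M-step: re-estimate parameters from the expected counts.
    n0 = sum(r)
    pi = n0 / len(data)
    p = [sum(ri * x for ri, x in zip(r, data)) / n0,
         sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n0)]
    cur = loglik()
    assert cur >= prev - 1e-9     # monotonicity (Dempster, Laird & Rubin, 1977)
    if cur - prev < 1e-6:         # stopping criterion: small likelihood change
        break
    prev = cur
```

The same loop structure (E-step, M-step, monotonicity check, likelihood-based stopping) carries over unchanged to CPT estimation in Bayesian networks.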
Gradient Ascent

Cons:
- The gradient must be projected onto the space of legal parameters
- To get reasonable convergence, it must be combined with smart optimization techniques

Main result

- Parameter estimation is a basic task for learning with Bayesian networks
- Fully observed data: counting
- Partially observed data: missing values make it a non-linear optimization problem; use EM (counting with pseudo/expected counts), gradient ascent, ...
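For the gradient approach, one common way to avoid the projection step is to reparameterize so the constraints hold automatically; the sketch below (a single binary parameter, made-up data) uses a sigmoid reparameterization and plain gradient ascent on the log-likelihood.

```python
import math

# Gradient ascent for one binary parameter theta = P(X = true), using the
# reparameterization theta = sigmoid(w) so that 0 < theta < 1 always holds
# (no projection onto the legal parameter space is needed).
data = [True, True, True, False]   # made-up data; the MLE is 0.75

w = 0.0
for _ in range(2000):
    theta = 1.0 / (1.0 + math.exp(-w))
    # d/dw log L = sum_m (x_m - theta): Bernoulli log-likelihood gradient
    # after the chain rule through the sigmoid.
    grad = sum((1.0 if x else 0.0) - theta for x in data)
    w += 0.1 * grad   # fixed step size; smarter optimizers converge faster

theta = 1.0 / (1.0 + math.exp(-w))
print(round(theta, 3))  # → 0.75
```

For full networks the same trick (softmax per CPT row) keeps every distribution normalized while the optimizer works in unconstrained space.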