2-BP Basics


Section 2:

Belief Propagation Basics

1
Outline
• Do you want to push past the simple NLP models (logistic
regression, PCFG, etc.) that we've all been using for 20 years?
• Then this tutorial is extremely practical for you!
1. Models: Factor graphs can express interactions among linguistic
structures.
2. Algorithm: BP estimates the global effect of these interactions on
each variable, using local computations.
3. Intuitions: What’s going on here? Can we trust BP’s estimates?
4. Fancier Models: Hide a whole grammar and dynamic programming
algorithm within a single factor. BP coordinates multiple factors.
5. Tweaked Algorithm: Finish in fewer steps and make the steps faster.
6. Learning: Tune the parameters. Approximately improve the true
predictions -- or truly improve the approximate predictions.
7. Software: Build the model you want!

2
Factor Graph Notation
• Variables: X1, X2, X3, …, X9

• Factors:
  – unary factors ψ{1}, ψ{2}, ψ{3}, …
  – binary factors ψ{1,2}, ψ{2,3}, ψ{3,4}, …
  – ternary factors ψ{1,8,9}, ψ{2,7,8}, ψ{3,6,7}

• Joint Distribution:
  p(X = x) = (1/Z) ∏α ψα(xα)
  (the normalized product of all the factors, each applied to its own variables)

(Figure: a factor graph over the sentence “time flies like an arrow”, with the
variables and factors listed above connecting the words.)

4
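To make the notation concrete, here is a minimal sketch (in Python, not the tutorial's own software) of a factor graph as a set of factor tables whose normalized product is the joint distribution. The two-variable graph and every number below are made up for illustration.

```python
# A minimal sketch: each factor is a table over the values of the variables it
# touches, and the joint distribution is the normalized product of all factor
# values.  The tiny two-variable graph and all numbers are hypothetical.
TAGS = ["v", "n", "p", "d"]

psi_1  = {"v": 3.0, "n": 4.0, "p": 0.1, "d": 0.1}                 # unary factor on X1
psi_2  = {"v": 1.0, "n": 2.0, "p": 2.0, "d": 0.5}                 # unary factor on X2
psi_12 = {(a, b): (2.0 if (a, b) == ("v", "n") else 1.0)          # binary factor on (X1, X2)
          for a in TAGS for b in TAGS}

def unnormalized_p(x1, x2):
    """p(X = x) up to the constant Z: the product of all the factor values."""
    return psi_1[x1] * psi_2[x2] * psi_12[(x1, x2)]

# Z sums the unnormalized score over every joint assignment.
Z = sum(unnormalized_p(a, b) for a in TAGS for b in TAGS)
print(unnormalized_p("v", "n") / Z)        # p(X1 = v, X2 = n)
```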
Factors are Tensors
• A unary factor (e.g., on X1) is a vector:
        v 3    n 4    p 0.1    d 0.1

• A binary factor (e.g., between two tag variables) is a matrix:
             v     n     p     d
        v    1     6     3     4
        n    8     4     2     0.1
        p    1     3     1     3
        d    0.1   8     0     0

• A ternary factor (e.g., ψ{1,8,9}) is a third-order tensor; each slice over the
  constituent labels (s, vp, pp, …) looks like:
             s     vp    pp
        s    0     2     .3
        vp   3     4     2
        pp   .1    2     1

(Figure: the same factor graph over “time flies like an arrow”, with these tensors
attached to its unary, binary, and ternary factors.)

5
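The same idea in code: each factor is literally an array with one axis per variable it touches. The axis orderings, value sets, and numpy representation below are illustrative assumptions, not anything the tutorial prescribes.

```python
import numpy as np

# Sketch of the "factors are tensors" idea; axis orderings are assumptions.
TAGS   = ["v", "n", "p", "d"]          # values for tag variables
LABELS = ["s", "vp", "pp"]             # values for constituent-label variables

# A unary factor (e.g., psi_{1}(x1)): a vector with one entry per tag.
psi_unary = np.array([3.0, 4.0, 0.1, 0.1])

# A binary factor (e.g., psi_{1,2}(x1, x2)): a matrix, rows = x1, columns = x2.
psi_binary = np.array([[1.0, 6.0, 3.0, 4.0],
                       [8.0, 4.0, 2.0, 0.1],
                       [1.0, 3.0, 1.0, 3.0],
                       [0.1, 8.0, 0.0, 0.0]])

# A ternary factor (e.g., psi_{1,8,9}): a third-order tensor.  Here every slice
# along the first axis repeats the 3x3 table from the slide, purely for illustration.
slice_ = np.array([[0.0, 2.0, 0.3],
                   [3.0, 4.0, 2.0],
                   [0.1, 2.0, 1.0]])
psi_ternary = np.stack([slice_] * len(LABELS))        # shape (3, 3, 3)

# Looking up a factor value is just tensor indexing:
print(psi_binary[TAGS.index("n"), TAGS.index("v")])   # psi(n, v) = 8.0
```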
Inference
Given a factor graph, two common tasks …
– Compute the most likely joint assignment,
  x* = argmaxx p(X=x)
– Compute the marginal distribution of variable Xi:
  p(Xi=xi) for each value xi
  (p(Xi=xi) = sum of p(X=x) over joint assignments with Xi=xi)

Both tasks consider all joint assignments.
Both are NP-hard in general.
So, we turn to approximations.

6
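For a tiny graph, both tasks can still be done exactly by brute-force enumeration, which is what BP will later approximate. A sketch, reusing the made-up two-variable factors from above:

```python
from itertools import product

# Exact (exponential-time) inference on a hypothetical two-variable factor graph.
TAGS = ["v", "n", "p", "d"]
psi_1  = {"v": 3.0, "n": 4.0, "p": 0.1, "d": 0.1}
psi_2  = {"v": 1.0, "n": 2.0, "p": 2.0, "d": 0.5}
psi_12 = {(a, b): (2.0 if (a, b) == ("v", "n") else 1.0) for a in TAGS for b in TAGS}

def score(x):                       # unnormalized p(X = x)
    x1, x2 = x
    return psi_1[x1] * psi_2[x2] * psi_12[(x1, x2)]

assignments = list(product(TAGS, repeat=2))
Z = sum(score(x) for x in assignments)

# Most likely joint assignment: argmax over all joint assignments.
x_star = max(assignments, key=score)

# Marginal of X1: sum p(X = x) over the joint assignments that agree with X1 = x1.
marginal_X1 = {t: sum(score(x) for x in assignments if x[0] == t) / Z for t in TAGS}
print(x_star, marginal_X1)
```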
Marginals by Sampling on Factor Graph
Suppose we took many samples from the distribution over
taggings:

Sample 1: n v p d n

Sample 2: n n v d n

Sample 3: n v p d n

Sample 4: v n p d n

Sample 5: v n v d n

Sample 6: n v p d n

(Figure: a chain factor graph over “time flies like an arrow”: tag variables X1–X5
plus X0 = <START>, transition factors ψ0, ψ2, ψ4, ψ6, ψ8 between adjacent tags,
and emission factors ψ1, ψ3, ψ5, ψ7, ψ9 linking each tag to its word.)
7
Marginals by Sampling on Factor Graph
The marginal p(Xi = xi) gives the probability that variable Xi
takes value xi in a random sample

Sample 1: n v p d n

Sample 2: n n v d n

Sample 3: n v p d n

Sample 4: v n p d n

Sample 5: v n v d n

Sample 6: n v p d n

(Figure: the same chain factor graph over “time flies like an arrow”.)
8
Marginals by Sampling on Factor Graph
Estimate the marginals as:
  X1: n 4/6, v 2/6    X2: v 3/6, n 3/6    X3: p 4/6, v 2/6    X4: d 6/6    X5: n 6/6

Sample 1: n v p d n

Sample 2: n n v d n

Sample 3: n v p d n

Sample 4: v n p d n

Sample 5: v n v d n

Sample 6: n v p d n

(Figure: the same chain factor graph over “time flies like an arrow”.)
9
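A small sketch of the counting being done here, using the six sample taggings from the slide (the code itself is illustrative only):

```python
from collections import Counter

# Estimate each variable's marginal from sampled taggings (the six samples above).
samples = [
    "n v p d n".split(),
    "n n v d n".split(),
    "n v p d n".split(),
    "v n p d n".split(),
    "v n v d n".split(),
    "n v p d n".split(),
]

n = len(samples)
for i in range(5):                                    # X1 ... X5
    counts = Counter(sample[i] for sample in samples)
    estimate = {tag: c / n for tag, c in counts.items()}
    print(f"X{i+1}", estimate)                        # e.g., X1 -> {n: 4/6, v: 2/6}
```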
How do we get marginals without sampling?

That’s what Belief Propagation is all about!

Why not just sample?


• Sampling one joint assignment is also NP-hard in general.
– In practice: Use MCMC (e.g., Gibbs sampling) as an anytime algorithm.
– So draw an approximate sample fast, or run longer for a “good” sample.

• Sampling finds the high-probability values xi efficiently.
  But it takes too many samples to see the low-probability ones.
  – How do you find p(“The quick brown fox …”) under a language model?
    • Draw random sentences to see how often you get it? Takes a long time.
    • Or multiply factors (trigram probabilities)? That’s what BP would do.
10
Great Ideas in ML: Message Passing
Count the soldiers

(Figure: soldiers standing in a line. One soldier knows “there's 1 of me.” The
counts passed from the front read “1, 2, 3, 4, 5 before you”; the counts passed
from the back read “5, 4, 3, 2, 1 behind you.”)
11
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Count the soldiers

(Figure: one soldier hears “2 before you” and “3 behind you”, and knows
“there's 1 of me.” He only sees his incoming messages.)

Belief: Must be 2 + 1 + 3 = 6 of us

12
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Count the soldiers

(Figure: the next soldier hears “1 before you” and “4 behind you”, and knows
“there's 1 of me.” He only sees his incoming messages.)

Belief: Must be 1 + 1 + 4 = 6 of us
(and the previous soldier’s belief, 2 + 1 + 3 = 6, agrees)
13
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree

3 here

7 here
1 of me

11 here
(= 7+3+1)

14
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree

3 here

7 here
(= 3+3+1)

3 here

15
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree
11 here
(= 7+3+1)

7 here

3 here

16
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree

3 here

7 here

Belief:
Must be
3 here 14 of us

17
adapted from MacKay (2003) textbook
Great Ideas in ML: Message Passing
Each soldier receives reports from all branches of tree

3 here

7 here

Belief:
Must be
3 here 14 of us

wouldn't work correctly with a 'loopy' (cyclic) graph

18
adapted from MacKay (2003) textbook
Message Passing in Belief Propagation

(Figure: a variable X connected to a factor Ψ, with messages flowing both ways.)

Message from X to Ψ (“My other factors think I’m a noun …”):                v 1, n 6, a 3
Message from Ψ to X (“But my other variables and I think you’re a verb”):   v 6, n 1, a 3
Belief at X (their pointwise product):                                      v 6, n 6, a 9

Both of these messages judge the possible values of variable X.
Their product = belief at X = product of all 3 messages to X.
19
Sum-Product Belief Propagation
          Variables                                   Factors

Beliefs:
  belief at a variable Xi  =  product of all the messages sent to Xi by its factors
  belief at a factor ψα    =  ψα itself, times all the messages sent to ψα by its variables

Messages:
  message from Xi to a factor ψα  =  product of the messages Xi receives from its other factors
  message from ψα to Xi           =  sum, over the other variables’ values, of ψα times the
                                     messages ψα receives from its other variables

(Figure: a variable X1 with neighboring factors ψ1, ψ2, ψ3, and a factor ψ1 with
neighboring variables X1, X2, X3, illustrating each of these quantities.)
20
Sum-Product Belief Propagation
Variable Belief

Messages arriving at X1:
  from ψ1:  v 0.1,  n 3,  p 1
  from ψ2:  v 1,    n 2,  p 2
  from ψ3:  v 4,    n 1,  p 0

Belief at X1 (pointwise product of all incoming messages):
  v 0.4,  n 6,  p 0

21
Sum-Product Belief Propagation
Variable Message

Messages arriving at X1:
  from ψ1:  v 0.1,  n 3,  p 1
  from ψ2:  v 1,    n 2,  p 2

Message from X1 to ψ3 (product of the messages from X1’s other factors, ψ1 and ψ2):
  v 0.1,  n 6,  p 2

22
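A small sketch of the two variable-side computations from the last two slides, using the message tables shown there. Values index the tags (v, n, p); the variable and message names are just illustrative.

```python
import numpy as np

# Variable-side computations in sum-product BP, with the numbers from the slides.
msg_from_psi1 = np.array([0.1, 3.0, 1.0])
msg_from_psi2 = np.array([1.0, 2.0, 2.0])
msg_from_psi3 = np.array([4.0, 1.0, 0.0])

# Variable belief: pointwise product of ALL messages arriving at X1.
belief_X1 = msg_from_psi1 * msg_from_psi2 * msg_from_psi3      # -> [0.4, 6.0, 0.0]

# Variable-to-factor message: product of the messages from the OTHER factors,
# i.e., leave out the factor we are sending to (here psi3).
msg_X1_to_psi3 = msg_from_psi1 * msg_from_psi2                 # -> [0.1, 6.0, 2.0]

print(belief_X1, msg_X1_to_psi3)
```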
Sum-Product Belief Propagation
Factor Belief

Factor ψ1 over (X1, X3), with X1 ∈ {p, d, n} and X3 ∈ {v, n}:
        v     n
   p    0.1   8
   d    3     0
   n    1     1

Incoming messages:
  from X1:  p 4,  d 1,  n 0
  from X3:  v 8,  n 0.2

Belief at ψ1 (the factor times both incoming messages):
        v     n
   p    3.2   6.4
   d    24    0
   n    0     0

23
Sum-Product Belief Propagation
Factor Belief

Belief at ψ1 over (X1, X3):
        v     n
   p    3.2   6.4
   d    24    0
   n    0     0

24
Sum-Product Belief Propagation
Factor Message

Factor ψ1 over (X1, X3):                Incoming message from X3:  v 8,  n 0.2
        v     n
   p    0.1   8
   d    3     0
   n    1     1

Message from ψ1 to X1 (sum over X3 of the factor times the message from X3):
   p: 0.1·8 + 8·0.2 = 0.8 + 1.6
   d: 3·8   + 0·0.2 = 24 + 0
   n: 1·8   + 1·0.2 = 8 + 0.2

25
Sum-Product Belief Propagation
Factor Message

matrix-vector product
(for a binary factor)

ψ1
X1 X3

26
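And a sketch of the factor-side computations from slides 23-25, again with the tables from the slides. For a binary factor, the outgoing message is exactly the matrix-vector product noted above.

```python
import numpy as np

# Factor-side computations in sum-product BP, with the tables from the slides.
# Rows of psi1 index X1's values (p, d, n); columns index X3's values (v, n).
psi1 = np.array([[0.1, 8.0],
                 [3.0, 0.0],
                 [1.0, 1.0]])

msg_from_X1 = np.array([4.0, 1.0, 0.0])     # over (p, d, n)
msg_from_X3 = np.array([8.0, 0.2])          # over (v, n)

# Factor belief: the factor table times every incoming message
# (broadcast the X1 message over rows and the X3 message over columns).
belief_psi1 = psi1 * msg_from_X1[:, None] * msg_from_X3[None, :]
# -> [[3.2, 6.4], [24, 0], [0, 0]]

# Factor-to-variable message to X1: multiply in the message from the OTHER
# variable (X3) and sum X3 out -- a matrix-vector product for a binary factor.
msg_psi1_to_X1 = psi1 @ msg_from_X3         # -> [2.4, 24.0, 8.2]

print(belief_psi1, msg_psi1_to_X1)
```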
Sum-Product Belief Propagation
Input: a factor graph with no cycles
Output: exact marginals for each variable and factor

Algorithm:
1. Initialize the messages to the uniform distribution.

2. Choose a root node.


3. Send messages from the leaves to the root.
Send messages from the root to the leaves.

4. Compute the beliefs (unnormalized marginals).

5. Normalize beliefs and return the exact marginals.

27
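A minimal end-to-end sketch of this acyclic algorithm on a tiny chain-shaped factor graph. All factor tables are made up and the variable/factor names are illustrative; this is a sketch of the procedure above, not the tutorial's software.

```python
import numpy as np

# Tiny chain-shaped factor graph:
#   X1 -- psi_a -- X2 -- psi_b -- X3      (plus a unary factor on each variable).
# Tag order is (v, n, p); every number is hypothetical.
unary = {1: np.array([3.0, 4.0, 0.1]),
         2: np.array([1.0, 2.0, 2.0]),
         3: np.array([0.5, 1.0, 1.0])}
psi_a = np.array([[1.0, 6.0, 3.0],      # rows: X1, cols: X2
                  [8.0, 4.0, 2.0],
                  [1.0, 3.0, 1.0]])
psi_b = np.array([[2.0, 1.0, 0.1],      # rows: X2, cols: X3
                  [1.0, 5.0, 1.0],
                  [0.3, 1.0, 2.0]])

# Choose X3 as the root.  Leaves-to-root pass (left to right):
m_X1_to_a = unary[1]                        # X1's only other factor is its unary
m_a_to_X2 = psi_a.T @ m_X1_to_a             # sum X1 out
m_X2_to_b = unary[2] * m_a_to_X2
m_b_to_X3 = psi_b.T @ m_X2_to_b             # sum X2 out

# Root-to-leaves pass (right to left):
m_X3_to_b = unary[3]
m_b_to_X2 = psi_b @ m_X3_to_b               # sum X3 out
m_X2_to_a = unary[2] * m_b_to_X2
m_a_to_X1 = psi_a @ m_X2_to_a               # sum X2 out

# Beliefs = product of all incoming messages (including the unary factors),
# then normalize to get exact marginals.
def normalize(b): return b / b.sum()
marginals = {1: normalize(unary[1] * m_a_to_X1),
             2: normalize(unary[2] * m_a_to_X2 * m_b_to_X2),
             3: normalize(unary[3] * m_b_to_X3)}
print(marginals)
```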
CRF Tagging Model

X1 X2 X3

find preferred tags

Could be verb or noun Could be adjective or verb Could be noun or verb

30
CRF Tagging by Belief Propagation

Forward algorithm = message passing         Backward algorithm = message passing
(matrix-vector products)                    (matrix-vector products)

(Figure: the chain CRF over “find preferred tags”. A forward message α and a
backward message β arrive at the middle variable, each computed by a
matrix-vector product with the tag-transition matrix over (v, n, a); combining
them with the local factor gives the belief at that variable: v 1.8, n 0, a 4.2.)

find preferred tags

• Forward-backward is a message passing algorithm.


• It’s the simplest case of belief propagation.
31
So Let’s Review Forward-Backward …

X1 X2 X3

find preferred tags

Could be verb or noun Could be adjective or verb Could be noun or verb

32
So Let’s Review Forward-Backward …
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• Show the possible values for each variable
33
So Let’s Review Forward-Backward …
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• Let’s show the possible values for each variable
• One possible assignment
34
So Let’s Review Forward-Backward …
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• Let’s show the possible values for each variable
• One possible assignment
• And what the 7 factors think of it … 35
Viterbi Algorithm: Most Probable Assignment
X1 X2 X3

(Figure: the tag lattice over “find preferred tags”, with the path
START → v → a → n → END highlighted. The 7 numbers on this path are
ψ{0,1}(START,v), ψ{1}(v), ψ{1,2}(v,a), ψ{2}(a), ψ{2,3}(a,n), ψ{3}(n), ψ{3,4}(n,END).)

find preferred tags

• So p(v a n) = (1/Z) * product of 7 numbers
• Numbers associated with edges and nodes of path
• Most probable assignment = path with highest product
36
Viterbi Algorithm: Most Probable Assignment
X1 X2 X3

(Figure: the same lattice and the same highlighted path START → v → a → n → END,
with its 7 factor values.)

find preferred tags

• So p(v a n) = (1/Z) * product weight of one path
37
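A sketch of this path-scoring view in code: the score of one tag path is the product of its 7 numbers, and Viterbi finds the highest-scoring path by dynamic programming. All factor tables below are made up.

```python
import numpy as np

# Path scoring and Viterbi for a 3-word lattice with tags (v, n, a).
TAGS = ["v", "n", "a"]
start = np.array([1.0, 2.0, 0.5])                 # psi_{0,1}(START, tag)
end   = np.array([0.5, 2.0, 1.0])                 # psi_{3,4}(tag, END)
unary = np.array([[3.0, 2.0, 0.3],                # psi_{i}(tag), one row per word
                  [0.5, 1.0, 4.0],
                  [1.0, 5.0, 0.5]])
trans = np.array([[0.1, 2.0, 3.0],                # psi_{i,i+1}(prev, next)
                  [2.0, 0.5, 1.0],
                  [1.0, 3.0, 0.2]])

def path_score(tags):
    """Unnormalized p(tags): the product of the 7 numbers on this path."""
    i = [TAGS.index(t) for t in tags]
    score = start[i[0]] * end[i[-1]]
    for k, ik in enumerate(i):
        score *= unary[k, ik]
    for a, b in zip(i, i[1:]):
        score *= trans[a, b]
    return score

# Viterbi: keep, for each tag of each word, the best score of any path prefix.
best = start * unary[0]
for k in range(1, 3):
    best = (best[:, None] * trans).max(axis=0) * unary[k]
best_final = (best * end).max()
print(path_score("v a n".split()), best_final)
```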
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = a)
  = (1/Z) * total weight of all paths through a
38
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = n)
  = (1/Z) * total weight of all paths through n
39
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = v)
  = (1/Z) * total weight of all paths through v
40
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


• So p(v a n) = (1/Z) * product weight of one path
• Marginal probability p(X2 = n)
  = (1/Z) * total weight of all paths through n
41
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


α2(n) = total weight of these
path prefixes

42
(found by dynamic programming: matrix-vector products)
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


β2(n) = total weight of these
path suffixes

43
(found by dynamic programming: matrix-vector products)
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

v v v

START n n n END

a a a

find preferred tags


α2(n) = total weight of these path prefixes (a + b + c)
β2(n) = total weight of these path suffixes (x + y + z)
44
Product gives ax+ay+az+bx+by+bz+cx+cy+cz = total weight of paths
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

(Figure: the lattice over “find preferred tags”, highlighting the n state of X2
together with its forward weight α2(n), its backward weight β2(n), and the
unigram factor ψ{2}(n).)

Oops! The weight of a path through a state also includes a weight at that state.
So α2(n)∙β2(n) isn’t enough.
The extra weight is the opinion of the unigram factor at this variable, ψ{2}(n).

“belief that X2 = n”
  = total weight of all paths through n
  = α2(n) ∙ ψ{2}(n) ∙ β2(n)

45
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

(Figure: the same lattice, now highlighting the v state of X2 with α2(v), ψ{2}(v), β2(v).)

“belief that X2 = v”
  = total weight of all paths through v
  = α2(v) ∙ ψ{2}(v) ∙ β2(v)

46
Forward-Backward Algorithm: Finds Marginals
X1 X2 X3

(Figure: the same lattice, now highlighting the a state of X2 with α2(a), ψ{2}(a), β2(a).)

“belief that X2 = a”
  = total weight of all paths through a
  = α2(a) ∙ ψ{2}(a) ∙ β2(a)

Beliefs at X2:  v 1.8,  n 0,  a 4.2     (their sum = Z = 6, the total probability weight of all paths)
Divide by Z = 6 to get the marginal probabilities:  v 0.3,  n 0,  a 0.7

47
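A sketch of forward-backward as just described: α and β are built up by matrix-vector products, and the belief at each position is α ∙ ψ ∙ β, normalized by Z. Only the recurrences follow the slides; the tables themselves are made up.

```python
import numpy as np

# Forward-backward on a 3-word lattice ("find preferred tags") with tags (v, n, a).
start = np.array([1.0, 2.0, 0.5])                  # psi_{0,1}(START, tag)
end   = np.array([0.5, 2.0, 1.0])                  # psi_{3,4}(tag, END)
unary = np.array([[3.0, 2.0, 0.3],                 # psi_{i}(tag), one row per word
                  [0.5, 1.0, 4.0],
                  [1.0, 5.0, 0.5]])
trans = np.array([[0.1, 2.0, 3.0],                 # psi_{i,i+1}(prev, next)
                  [2.0, 0.5, 1.0],
                  [1.0, 3.0, 0.2]])
n_words = unary.shape[0]

# Forward pass: alpha_i(t) = total weight of path prefixes arriving at tag t.
alpha = [start]
for i in range(1, n_words):
    alpha.append((alpha[-1] * unary[i - 1]) @ trans)

# Backward pass: beta_i(t) = total weight of path suffixes leaving tag t.
beta = [None] * n_words
beta[-1] = end
for i in range(n_words - 2, -1, -1):
    beta[i] = trans @ (unary[i + 1] * beta[i + 1])

# Belief at word i: alpha_i * psi_{i} * beta_i.  Normalizing by Z gives marginals.
beliefs = [alpha[i] * unary[i] * beta[i] for i in range(n_words)]
Z = beliefs[0].sum()                               # the same Z at every position
marginals = [b / Z for b in beliefs]
print(marginals)
```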
(Acyclic) Belief Propagation
In a factor graph with no cycles:
1. Pick any node to serve as the root.
2. Send messages from the leaves to the root.
3. Send messages from the root to the leaves.
A node computes an outgoing message along an edge
only after it has received incoming messages along all its other edges.

X8

ψ12
X7

ψ11

X9 X6

ψ13 ψ10

X1 X2 X3 X4 X5

ψ1 ψ3 ψ5 ψ7 ψ9
48
time flies like an arrow
Acyclic BP as Dynamic Programming

Subproblem:
ψ12 Inference using just the
Xi factors in subgraph H
F
ψ14 ψ11

X9 X6
H
ψ13
G ψ10

X1 X2 X3 X4 X5

ψ1 ψ3 ψ5 ψ7 ψ9
time flies like an arrow
Figure adapted from Burkett & Klein (2012)
50
Acyclic BP as Dynamic Programming

Subproblem:
Inference using just the
Xi factors in subgraph H
ψ11
The marginal of Xi in
X9 X6
H
that smaller model is the
ψ10 message sent to Xi from
subgraph H
X1 X2 X3 X4 X5

Message to
a variable
ψ7 ψ9
51
time flies like an arrow
Acyclic BP as Dynamic Programming

Subproblem:
Inference using just the
Xi factors in subgraph H

ψ14
The marginal of Xi in
X9 X6 that smaller model is the
G message sent to Xi from
subgraph H
X1 X2 X3 X4 X5

Message to
a variable
ψ5
52
time flies like an arrow
Acyclic BP as Dynamic Programming

X8
Subproblem:
ψ12 Inference using just the
Xi factors in subgraph H
F

The marginal of Xi in
X9 X6 that smaller model is the
ψ13 message sent to Xi from
subgraph H
X1 X2 X3 X4 X5

Message to
a variable
ψ1 ψ3
53
time flies like an arrow
Acyclic BP as Dynamic Programming

Subproblem:
ψ12 Inference using just the
Xi factors in subgraph FH
F
ψ14 ψ11
The marginal of Xi in
X9 X6
H
that smaller model is the
ψ13 ψ10 message sent by Xi
out of subgraph FH
X1 X2 X3 X4 X5

Message from
a variable
ψ1 ψ3 ψ5 ψ7 ψ9
54
time flies like an arrow
Acyclic BP as Dynamic Programming
• If you want the marginal p(Xi = xi) where Xi has degree k, you can think of that
  summation as a product of k marginals computed on smaller subgraphs.
• Each subgraph is obtained by cutting some edge of the tree.
• The message-passing algorithm uses dynamic programming to compute the marginals
  on all such subgraphs, working from smaller to bigger. So you can compute all
  the marginals.
X8

ψ12

X7

ψ14 ψ11

X9 X6

ψ13 ψ10

X1 X2 X3 X4 X5

ψ1 ψ3 ψ5 ψ7 ψ9

time flies like an arrow 55


Loopy Belief Propagation
What if our graph has cycles?

• Messages from different subgraphs are no longer independent!
  – Dynamic programming can’t help. It’s now #P-hard in general to compute
    the exact marginals.
• But we can still run BP -- it's a local algorithm, so it doesn't "see the cycles."

(Figure: the factor graph over “time flies like an arrow”, now with factors
ψ2, ψ4, ψ6, ψ8 between adjacent tag variables as well as the higher-level
factors, so the graph contains cycles.)
62
What can go wrong with loopy BP?

(Figure: a cycle of four variables, each currently shown as F.)

All 4 factors on the cycle enforce equality.
63
What can go wrong with loopy BP?

(Figure: the same cycle of four variables, now each shown as T.)

All 4 factors on the cycle enforce equality.

This factor says the upper variable is twice as likely to be true as false
(and that’s the true marginal!)
64
What can go wrong with loopy BP?

(Figure: the same cycle. The unary factor says T 2, F 1; the messages travelling
around the cycle grow from T 2, F 1 to T 4, F 1, and so on.)

All 4 factors on the cycle enforce equality. This factor says the upper variable
is twice as likely to be T as F (and that’s the true marginal!)

• Messages loop around and around … 2, 4, 8, 16, 32, ... More and more
  convinced that these variables are T!
• So beliefs converge to the marginal distribution (1, 0) rather than (2/3, 1/3).
• BP incorrectly treats each returning message as separate evidence that the
  variable is T.
  – It multiplies these two messages as if they were independent.
  – But they don’t actually come from independent parts of the graph.
  – One influenced the other (via a cycle).

This is an extreme example. Often in practice, the cyclic influences are weak
(cycles are long or include at least one weak correlation).
65
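A tiny caricature of this failure mode in code. The update schedule is simplified to following a single message around the equality cycle, which is enough to reproduce the 2, 4, 8, 16, … growth described above; it is not a full BP implementation.

```python
import numpy as np

# A cycle whose pairwise factors all enforce equality, plus one unary factor (T 2, F 1).
unary = np.array([2.0, 1.0])          # the only real evidence
equality = np.eye(2)                  # pairwise factor: 1 if the two values agree, else 0

msg = np.array([1.0, 1.0])            # message circulating around the cycle, initially uniform
for sweep in range(5):
    # One trip around the cycle: the equality factors pass the message through
    # unchanged, and the unary evidence gets multiplied in once more.
    msg = equality @ (unary * msg)
    belief = unary * msg              # belief at the variable carrying the unary factor
    print(sweep, msg, belief / belief.sum())

# The returning message keeps repeating the same evidence (T:F ratios 2, 4, 8, 16, ...),
# so the normalized belief drifts toward (1, 0) instead of the true (2/3, 1/3).
```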
What can go wrong with loopy BP?

A rumor is circulating that Obama secretly owns an insurance company.
(Obamacare is actually designed to maximize his profit.)

Your prior doesn’t think Obama owns it. But everyone’s saying he does.
Under a Naïve Bayes model, you therefore believe it.

(Figure: a Naïve Bayes model. The “Obama owns it” variable has prior T 1, F 99.
Each person who “says so” -- Alice, Bob, Charlie, …, Kathy -- contributes a
factor of T 2, F 1, so the belief ends up at T 2048, F 99.)

A lie told often enough becomes truth.  -- Lenin
66
What can go wrong with loopy BP?

Better model: Rush can influence the conversation.
– Now there are 2 ways to explain why everyone’s repeating the story:
  it’s true, or Rush said it was.
  (Actually 4 ways, but “both” has a low prior and “neither” has a low
  likelihood, so only 2 good ways.)
– The model favors one solution (probably Rush).
– Yet BP has 2 stable solutions. Each solution is self-reinforcing around
  cycles; no impetus to switch.

(Figure: the graph now has an “Obama owns it” variable with prior T 1, F 99 and
unknown belief T ???, F ???, plus a “Rush says so” variable with factor T 1, F 24.
Alice, Bob, Charlie, …, Kathy all “say so.”)

If everyone blames Obama, then no one has to blame Rush.
But if no one blames Rush, then everyone has to continue to blame Obama
(to explain the gossip).

A lie told often enough becomes truth.  -- Lenin
67
Loopy Belief Propagation Algorithm
• Run the BP update equations on a cyclic graph
– Hope it “works” anyway (good approximation)
• Though we multiply messages that aren’t independent
• No interpretation as dynamic programming
– If largest element of a message gets very big or small,
• Divide the message by a constant to prevent over/underflow
• Can update messages in any order
– Stop when the normalized messages converge
• Compute beliefs from final messages
– Return normalized beliefs as approximate marginals

68
e.g., Murphy, Weiss & Jordan (1999)
Loopy Belief Propagation
Input: a factor graph with cycles
Output: approximate marginals for each variable and factor

Algorithm:
1. Initialize the messages to the uniform distribution.

2. Send messages until convergence.
   Normalize them when they grow too large.

3. Compute the beliefs (unnormalized marginals).

4. Normalize beliefs and return the approximate marginals.

69
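A minimal sketch of this loopy algorithm with a synchronous (“flooding”) update schedule. The toy cyclic graph, the message bookkeeping, and the convergence test are all illustrative choices, not the tutorial's software.

```python
import numpy as np

# Loopy BP sketch: recompute every message from the previous round's messages
# until nothing changes, then read off normalized beliefs as approximate marginals.

# Factor graph: each factor maps a tuple of variable names to a table whose
# axes follow that tuple's order.  The scopes below form a cycle A-B-C-A.
factors = {
    ("A", "B"): np.array([[1.0, 2.0], [2.0, 1.0]]),
    ("B", "C"): np.array([[1.0, 3.0], [3.0, 1.0]]),
    ("C", "A"): np.array([[1.0, 2.0], [2.0, 1.0]]),   # closes the cycle
    ("A",):     np.array([2.0, 1.0]),                 # unary evidence on A
}
sizes = {"A": 2, "B": 2, "C": 2}

# messages[(factor, var, dir)]: 'to_var' = factor -> variable, 'to_fac' = variable -> factor.
msgs = {(f, v, d): np.ones(sizes[v])
        for f in factors for v in f for d in ("to_var", "to_fac")}

for iteration in range(50):
    new = {}
    for f, table in factors.items():
        for v in f:
            # Variable -> factor: product of messages from the variable's other factors.
            m = np.ones(sizes[v])
            for g in factors:
                if g != f and v in g:
                    m = m * msgs[(g, v, "to_var")]
            new[(f, v, "to_fac")] = m / m.sum()        # normalize to avoid over/underflow
            # Factor -> variable: multiply in the other variables' messages, sum them out.
            t = table.copy()
            for axis, u in enumerate(f):
                if u != v:
                    shape = [-1 if a == axis else 1 for a in range(len(f))]
                    t = t * msgs[(f, u, "to_fac")].reshape(shape)
            m = t.sum(axis=tuple(a for a, u in enumerate(f) if u != v))
            new[(f, v, "to_var")] = m / m.sum()
    if all(np.allclose(new[k], msgs[k]) for k in msgs):   # converged?
        break
    msgs = new

# Beliefs: product of all factor -> variable messages, normalized to sum to 1.
for v in sizes:
    b = np.ones(sizes[v])
    for f in factors:
        if v in f:
            b = b * msgs[(f, v, "to_var")]
    print(v, b / b.sum())
```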
