CS19M016 PGM Assignment1


IITM-CS6730 : Probabilistic Graphical Models Release Date: Jan 25, 2020

Assignment 1 Due Date : Feb 3, 23:59

Roll No: CS19M016 Name: AVINASH KUMAR SINGH


Collaborators (if any): (CS19S017) SUDHA S
References (if any): various sites

• Use LaTeX to write up your solutions (in the solution blocks of the source LaTeX file of this assignment), and
submit the resulting single pdf file at GradeScope by the due date. (Note: As always, no late submissions
will be allowed, other than one-day late submission with 10% penalty! Within GradeScope, indicate the
page number where your solution to each question starts, else we won’t be able to grade it! You can join
GradeScope using course entry code MG8P8R).

• Collaboration is encouraged, but all write-ups must be done individually and independently, and mention
your collaborator(s) if any. Same rules apply for codes written for any programming assignments (i.e.,
write your own code; we will run plagiarism checks on codes).

• If you have referred to a book or any other online material for obtaining a solution, please cite the source.
Again, don’t copy the source as is; you may use the source to understand the solution, but write up the
solution in your own words.

1. (7 points) [PROBABILITY RAPID-FIRE (via David Blei’s course)]


(a) (1 point) Consider a probability density p(x | µ, σ²) = N(x | µ, σ²), where µ ∈ R and σ ∈ R⁺.
(i) For some x ∈ R, can p(x) < 0? (ii) For some x ∈ R, can p(x) > 1?

Solution: (i) No. A probability density is non-negative everywhere, so p(x) < 0 is impossible.

(ii) Yes. A density can exceed 1 (only its integral must equal 1); e.g., for small σ, p(µ | µ, σ²) = 1/(σ√(2π)) > 1.

(b) (1 point) In the exponential family of distributions, p(x | θ) = h(x) exp{η(θ)ᵀ t(x) − a(θ)}, for
x ∈ R^n, θ ∈ R^d, auxiliary measure h(x) : R^n → R, natural parameter function η(θ) : R^d → R^p,
sufficient statistics t(x) : R^n → R^p, and a(θ) : R^d → R. What must a(θ) be for p(x | θ) to be
a valid probability distribution? Why?

Solution: For p(x | θ) to be a valid distribution it must integrate to one:
∫_{−∞}^{+∞} h(x) exp{η(θ)ᵀ t(x) − a(θ)} dx = 1
⇒ ∫_{−∞}^{+∞} h(x) exp{η(θ)ᵀ t(x)} dx = exp(a(θ))
⇒ a(θ) = log ∫_{−∞}^{+∞} h(x) exp{η(θ)ᵀ t(x)} dx
So a(θ) must be the log-normalizer (log-partition function) of the family; any other choice would make the density integrate to something other than 1.
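As a sanity check on this expression for a(θ), here is a minimal numeric sketch (an illustration, not part of the graded solution) for a finite-support example with h(x) = 1 and t(x) = x:

```python
import numpy as np

# With a(eta) defined as the log of the normalizer, the density sums to one.
# Finite-support example: h(x) = 1, t(x) = x, support {0, 1} (the Bernoulli case).
eta = 1.7                                   # an arbitrary natural parameter
support = np.array([0, 1])
a = np.log(np.sum(np.exp(eta * support)))   # a(eta) = log(1 + e^eta)
p = np.exp(eta * support - a)
print(p.sum())                              # -> 1.0, i.e. properly normalized
```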

(c) (3 points) The exponential family defined above can be used to represent many distributions.
Show how to represent the following: Bernoulli, Poisson, and univariate Gaussian.

Solution:
For the Bernoulli distribution:
P(x | π) = π^x (1 − π)^(1−x)
= exp{ x log(π/(1−π)) + log(1 − π) }
η(θ) = log(π/(1−π))
t(x) = x
a(θ) = −log(1 − π)
h(x) = 1.

For the Poisson distribution:
p(x | λ) = λ^x e^(−λ) / x!
= (1/x!) exp{ x log λ − λ }
η(θ) = log λ
t(x) = x
a(θ) = λ
h(x) = 1/x!

For the univariate Gaussian:
p(x | µ, σ²) = (1/(σ√(2π))) exp{ −(x − µ)²/(2σ²) }
= (1/√(2π)) exp{ (µ/σ²) x − (1/(2σ²)) x² − µ²/(2σ²) − log σ }
η(θ) = ( µ/σ², −1/(2σ²) )ᵀ
t(x) = ( x, x² )ᵀ
a(θ) = µ²/(2σ²) + log σ
h(x) = 1/√(2π)
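The three representations above can be checked numerically; the sketch below (assuming SciPy is available) evaluates each exponential-family form and compares it against the corresponding scipy.stats pmf/pdf:

```python
import math
import numpy as np
from scipy import stats

# Bernoulli: h(x) = 1, eta = log(pi/(1-pi)), t(x) = x, a = -log(1-pi)
pi_ = 0.3
eta = np.log(pi_ / (1 - pi_))
for x in (0, 1):
    ef = np.exp(eta * x + np.log(1 - pi_))
    assert np.isclose(ef, stats.bernoulli.pmf(x, pi_))

# Poisson: h(x) = 1/x!, eta = log(lam), t(x) = x, a = lam
lam = 2.5
for x in range(10):
    ef = np.exp(x * np.log(lam) - lam) / math.factorial(x)
    assert np.isclose(ef, stats.poisson.pmf(x, lam))

# Univariate Gaussian: h(x) = 1/sqrt(2*pi), eta = (mu/s2, -1/(2*s2)),
# t(x) = (x, x^2), a = mu^2/(2*s2) + log(sigma)
mu, sigma = 1.0, 0.7
s2 = sigma ** 2
for x in (-1.0, 0.0, 2.3):
    ef = np.exp((mu / s2) * x - x ** 2 / (2 * s2)
                - (mu ** 2 / (2 * s2) + np.log(sigma))) / np.sqrt(2 * np.pi)
    assert np.isclose(ef, stats.norm.pdf(x, mu, sigma))

print("all three exponential-family forms match scipy")
```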

(d) (2 points) You have a jar of 1, 000 coins. 999 are fair coins, and the remaining coin will always
land heads. You take a single coin out of the jar and flip it 10 times in a row, all of which land
heads. What is the probability your next toss with the same coin will land heads? Explain your
answer. How would you call this probability in Bayesian jargon?

Solution:
The drawn coin is either fair (F) or the always-heads coin (U). Given the observed ten heads (Flips), we need:
1. the probability the coin is fair, P(F | Flips)
2. the probability the coin is unfair, P(U | Flips)

Using Bayes’ rule,

P(F | Flips) = P(Flips | F) ∗ P(F) / P(Flips)
= (0.5^10 ∗ .999) / (0.5^10 ∗ .999 + 1 ∗ .001) ≈ .494

P(U | Flips) = 1 − P(F | Flips) ≈ .506

The probability that the next toss with the same coin lands heads is then

P(H | Flips) = P(H | F) ∗ P(F | Flips) + P(H | U) ∗ P(U | Flips) = 0.5 ∗ .494 + 1 ∗ .506 ≈ .753,

i.e., about 75.3%. In Bayesian jargon this is the posterior predictive probability of heads.
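A quick numeric check of the posterior and the predictive probability computed above (values taken directly from the problem statement):

```python
# coin-jar posterior and posterior predictive (a sanity check, not part of the graded solution)
p_fair, p_unfair = 0.999, 0.001
lik_fair, lik_unfair = 0.5 ** 10, 1.0 ** 10        # P(10 heads | coin type)

post_fair = lik_fair * p_fair / (lik_fair * p_fair + lik_unfair * p_unfair)
post_unfair = 1 - post_fair
p_next_head = 0.5 * post_fair + 1.0 * post_unfair
print(round(post_fair, 3), round(p_next_head, 3))  # 0.494 0.753
```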

2. (7 points) [CONDITIONAL FREEDOM] Let P be a joint distribution defined over n random variables
X = {X1, X2, . . . , Xn}. Let X, Y, Z, W be sets of random variables, each being a subset of X.
(a) (1 point) Prove from first principles (definition of conditional independence in KF book) that
(X ⊥ Y | Y)?

Solution: By the definition of conditional independence (KF book), (X ⊥ Y | Z) holds if
P(X | Y, Z) = P(X | Z) for all assignments with P(Y, Z) > 0.
Now consider P(X | Y, Y): conditioning on Y twice is the same as conditioning on Y once, so
P(X | Y, Y) = P(X | Y) always holds.
Taking Z = Y in the definition, this is exactly the statement (X ⊥ Y | Y).

(b) (1 point) Are the statements (X ⊥ Y, Z | Z) and (X ⊥ Y | Z) equivalent? Explain your answer.

Solution: Yes, the two statements are equivalent.
(⇒) Assume (X ⊥ Y, Z | Z) holds. By the decomposition property (dropping Z from the right-hand set) we get (X ⊥ Y | Z).
(⇐) Assume (X ⊥ Y | Z) holds. Conditioned on Z = z, the event Z = z has probability 1, so
P(X, Y, Z = z | Z = z) = P(X, Y | Z = z) = P(X | Z = z) ∗ P(Y | Z = z) = P(X | Z = z) ∗ P(Y, Z = z | Z = z),
which is exactly the statement (X ⊥ Y, Z | Z).

(c) (3 points) Prove the intersection property holds for any positive distribution P, assuming X, Y, Z, W
are disjoint. Show where you used positivity of the distribution and set disjoint requirements
in the proof.

Solution: Consider the two premises (X ⊥ Y | Z, W) and (X ⊥ W | Z, Y).
From these we can write:
P(X | Y, Z, W) = P(X | Z, W)   and   P(X | Y, Z, W) = P(X | Z, Y)
so P(X | Z, W) = P(X | Z, Y), i.e.
P(X, W | Z) / P(W | Z) = P(X, Y | Z) / P(Y | Z)
Positivity of the distribution is used here: it guarantees P(W | Z) > 0 and P(Y | Z) > 0, so all of the above conditionals are defined and we may cross-multiply:
P(X, W | Z) ∗ P(Y | Z) = P(X, Y | Z) ∗ P(W | Z)
Summing both sides over the values of W (this is where the disjointness requirement is used: W shares no variables with X, Y or Z, so it can be summed out on its own):
Σ_W P(X, W | Z) ∗ P(Y | Z) = P(X, Y | Z) ∗ Σ_W P(W | Z)
P(X | Z) ∗ P(Y | Z) = P(X, Y | Z)
which is (X ⊥ Y | Z).
Finally, combining (X ⊥ Y | Z) with the given (X ⊥ W | Y, Z) via the contraction property gives (X ⊥ Y, W | Z), the conclusion of the intersection property.
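As a numeric sanity check (not a substitute for the proof), the sketch below builds a random positive joint over four binary variables of the form p(z) p(x|z) p(y,w|z); such a joint satisfies both premises by construction, so the conclusion of the intersection property should also hold:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Random positive joint p(z) p(x|z) p(y,w|z) over binary (x, y, w, z).
p_z = rng.dirichlet(np.ones(2))
p_x_z = rng.dirichlet(np.ones(2), size=2)                    # p(x | z), row index z
p_yw_z = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)  # p(y, w | z), index [z, y, w]

joint = {}                                                   # keys are (x, y, w, z)
for x, y, w, z in itertools.product((0, 1), repeat=4):
    joint[(x, y, w, z)] = p_z[z] * p_x_z[z][x] * p_yw_z[z][y][w]

idx = {"x": 0, "y": 1, "w": 2, "z": 3}

def prob(fixed):
    """Marginal probability of a partial assignment, e.g. {'x': 0, 'z': 1}."""
    return sum(p for assign, p in joint.items()
               if all(assign[idx[v]] == val for v, val in fixed.items()))

def indep(a, b, cond):
    """Check (a ⊥ b | cond) via P(a,b,c) * P(c) == P(a,c) * P(b,c) for all assignments."""
    names = list(a) + list(b) + list(cond)
    for vals in itertools.product((0, 1), repeat=len(names)):
        asg = dict(zip(names, vals))
        c = {v: asg[v] for v in cond}
        lhs = prob(asg) * prob(c)
        rhs = prob({v: asg[v] for v in a + cond}) * prob({v: asg[v] for v in b + cond})
        if not np.isclose(lhs, rhs):
            return False
    return True

print(indep(["x"], ["y"], ["z", "w"]),   # premise 1
      indep(["x"], ["w"], ["z", "y"]),   # premise 2
      indep(["x"], ["y", "w"], ["z"]))   # conclusion -> True True True
```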

(d) (2 points) Provide an example non-positive distribution P where the intersection property
doesn’t hold.
Note that P(X|Z = z) is undefined if P(Z = z) = 0, so be sure to consider the definition of
conditional independence when proving these properties.

Solution:

3. (10 points) [TAKING A CHANCE WITH GRAPHS]


(a) (3 points) Prove that every Directed Acyclic Graph (DAG) G admits at least one topological
ordering.

Solution: Step 1. First show that every DAG has at least one node with no incoming edge.
(By contradiction) Suppose every node has an incoming edge. Starting from any node and repeatedly walking backwards along an incoming edge, we must eventually revisit a node (there are only finitely many), i.e., the graph contains a cycle. This contradicts the graph being a DAG.
Step 2. Induction on the number of nodes.
Base case: for n = 1, a topological ordering trivially exists.
Hypothesis: every DAG with n nodes has a topological ordering.
Induction step: given a DAG G with n + 1 nodes,
– find a node v with no incoming edges (it exists by Step 1);
– G − {v} is a DAG, since deleting v cannot create cycles;
– by the inductive hypothesis, G − {v} has a topological ordering;
– place v first, then append the topological ordering of G − {v}.
This ordering is valid: v has no incoming edges, and every remaining edge is respected by the ordering of G − {v}. This completes the proof.
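The constructive argument above is essentially Kahn’s algorithm; a minimal Python sketch (with a hypothetical toy edge list, just to exercise the function) is:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: repeatedly remove a node with no incoming edges,
    mirroring the constructive induction in the proof above."""
    indeg = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    queue = deque(v for v in nodes if indeg[v] == 0)  # sources (exist in any DAG)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)

    if len(order) != len(nodes):                      # leftover nodes -> a cycle
        raise ValueError("graph is not a DAG")
    return order

print(topological_order(["a", "b", "c", "d"],
                        [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
```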

(b) (3 points) Prove that any P that factorizes according to a DAG G is in fact a valid probability
distribution (i.e., is non-negative and adds up to one over all values that its variables can take,
given that the local conditional probability distributions are all valid).

Solution:

(c) (4 points) Prove for a Bayesian network in general that the factorization view implies the (local
conditional) independences view.

Solution:

4. (3 points) [TRYING OUT D-SEPARATION] Consider a distribution over 8 random variables X1, ..., X8
for the Bayesian network given below:

Give the largest set of random variables that is independent of the random variable:
(a) (1 point) X3 .

Solution:
(X3 ⊥ X2 , X5 , X7 )

(b) (1 point) X3 , conditioned on X1 .

Solution:
(X3 ⊥ X2, X4, X5, X7, X8 | X1)

(c) (1 point) X3 , conditioned on X1 and X4 .

Solution:
(X3 ⊥ X2, X5, X7, X8 | X1, X4)

5. (3 points) [FROM DOMAIN TO STRUCTURE] Construct a Bayesian network for the following signaling
pathway. “The enzyme E1 is capable of phosphorylation and activates the lipid L1 without any
signalling from protein P1. P1 is encoded by the PTEN gene. Mutations of this gene are a major
step in the development of many cancers. L1 recruits oncoprotein P2, which is activated by the
kinases PK1 and PK2. PK1 and PK2 are generated by L1. P2 activates downstream anabolic signaling
pathways required for cell growth and survival. Once activated, P2 inactivates the pro-apoptotic
proteins C and B, whose correct functioning is required for the normal growth of cells in the
central nervous system.” Provide the BN network DAG, and write down the expression for the joint
probability density of random variables {E1, L1, P1, P2, PK1, PK2, C, B, CellGrowth}.

Solution:

P(E1, L1, P1, P2, PK1, PK2, C, B, CellGrowth) =

P(E1) ∗ P(P1) ∗ P(L1 | E1) ∗ P(PK1 | L1) ∗ P(PK2 | L1) ∗ P(P2 | PK1, PK2) ∗ P(C | P2) ∗ P(B | P2) ∗ P(CellGrowth | B, C, P1)
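For reference, a small sketch that encodes the same parent sets (taken from the factorization above) as a dictionary and regenerates the joint-probability expression:

```python
# Parent sets follow the factorization written above; this simply reprints it.
parents = {
    "E1": [], "P1": [],
    "L1": ["E1"],
    "PK1": ["L1"], "PK2": ["L1"],
    "P2": ["PK1", "PK2"],
    "C": ["P2"], "B": ["P2"],
    "CellGrowth": ["B", "C", "P1"],
}

factors = [f"P({v})" if not ps else f"P({v} | {', '.join(ps)})"
           for v, ps in parents.items()]
print("P(" + ", ".join(parents) + ") = " + " * ".join(factors))
```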

6. (6 points) [GRAPH ↔ CIs ↔ DISTRIBUTION] Consider the directed graphs and the probability
distribution shown below. A, B, C, D are all binary random variables.
(a) (3 points) For each graph shown below, list all possible conditional independence statements
implied by it (Note: no need to write CIs trivially implied by the Decomposition property i.e.,
given two CIs that hold (X ⊥ Y, W | Z) and (X ⊥ Y | Z), simply write only the first one).

Solution:
for fig1:
(C ⊥ D | A, B), (A ⊥ B | φ)
for fig2:
(A ⊥ B | φ), (A ⊥ D | φ), (C ⊥ D | φ), (B ⊥ C | φ)

(b) (3 points) For the distribution given below (as a table of the joint probabilities of all the 16
configurations) list all conditional independences satisfied by it. Which of the graphs G above
is an I-map for this distribution P( i.e., I(G) ⊆ I(P))?

C, D = 0, 0 C, D = 0, 1 C, D = 1, 0 C, D = 1, 1
A, B = 0, 0 0.5 0 0 0
A, B = 0, 1 0 0 0 0
A, B = 1, 0 0 0 0 0
A, B = 1, 1 0 0 0 0.5

Solution: The distribution puts probability 0.5 on A = B = C = D = 0 and 0.5 on A = B = C = D = 1, so the four variables are perfectly correlated. It satisfies no marginal independences (in particular (A ⊥ B | φ) fails), but it does satisfy every CI of the form (X ⊥ Y | Z) with Z non-empty, e.g. (C ⊥ D | A, B), since conditioning on any assignment of Z with positive probability makes all remaining variables deterministic. Both graphs above imply (A ⊥ B | φ), which does not hold in P, so neither graph is an I-map for this distribution.
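These claims can be verified directly from the joint table; the sketch below enumerates assignments and tests (A ⊥ B | φ) and (C ⊥ D | A, B):

```python
import itertools
import numpy as np

names = ["A", "B", "C", "D"]
joint = np.zeros((2, 2, 2, 2))
joint[0, 0, 0, 0] = 0.5       # the two non-zero cells of the given table
joint[1, 1, 1, 1] = 0.5

def prob(assign):
    """P(assign) for a partial assignment like {'A': 0, 'C': 1}."""
    total = 0.0
    for vals in itertools.product((0, 1), repeat=4):
        full = dict(zip(names, vals))
        if all(full[k] == v for k, v in assign.items()):
            total += joint[vals]
    return total

def ci_holds(x, y, z):
    """(x ⊥ y | z): P(x,y,z) P(z) == P(x,z) P(y,z) whenever P(z) > 0."""
    for vals in itertools.product((0, 1), repeat=2 + len(z)):
        asg = dict(zip([x, y] + z, vals))
        zasg = {k: asg[k] for k in z}
        if prob(zasg) == 0:
            continue
        lhs = prob(asg) * prob(zasg)
        rhs = prob({x: asg[x], **zasg}) * prob({y: asg[y], **zasg})
        if not np.isclose(lhs, rhs):
            return False
    return True

print(ci_holds("A", "B", []))          # False: no marginal independences
print(ci_holds("C", "D", ["A", "B"]))  # True: conditioning makes the rest deterministic
```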

7. (4 points) [REASONING/INFERENCE BY PENCIL] Write the expression for the joint probability density of {I, D, G, S, L} for the BN below and use it to find:
(a) (1 point) What is the probability of a low IQ student to get a strong recommendation by taking
an easy course?

Solution:
P(I, D, G, S, L) = P(I) ∗ P(D) ∗ P(G | D, I) ∗ P(S | I) ∗ P(L | G)
P(I = i0, L = l1, D = d0) = Σ_G Σ_S P(I = i0, D = d0, G, S, L = l1)
= 0.7 ∗ 0.6 ∗ 1 ∗ (.3 ∗ .9 + .4 ∗ .24 + .3 ∗ .01) = .51786

(b) (1 point) Given that a student has a weak recommendation letter and g3 grade, what is the
probability for him/her to have high IQ?

Solution:
P(I = i1 | L = l0, G = g3) = P(I = i1, L = l0, G = g3) / P(L = l0, G = g3)
= Σ_D Σ_S P(I = i1, D, G = g3, S, L = l0) / Σ_I Σ_D Σ_S P(I, D, G = g3, S, L = l0)
= [.3 ∗ (.6 ∗ .02 + .4 ∗ .2) ∗ .99] / [.99 ∗ (.42 ∗ .3 + .28 ∗ .7 + .18 ∗ .02 + .12 ∗ .2)]
= .0276 / .3496
= .07894

(c) (2 points) Find probability of IQ being i1 given that grade is g2 . What will be this probability
if you also know that the course difficulty is d1 .
Note that these queries are examples of causal, evidential and intercausal reasoning respectively.

Solution:
P(I = i1 | G = g2) = P(I = i1, G = g2) / P(G = g2)
= Σ_D Σ_S Σ_L P(I = i1, D, G = g2, S, L) / Σ_I Σ_D Σ_S Σ_L P(I, D, G = g2, S, L)
= [.3 ∗ 1 ∗ (.6 ∗ .08 + .4 ∗ .3)] / (.42 ∗ .4 + .28 ∗ .25 + .18 ∗ .08 + .12 ∗ .3)
= .0504 / .2884
= .17475

If we also know D = d1, then

P(I = i1 | G = g2, D = d1) = P(I = i1, G = g2, D = d1) / P(G = g2, D = d1)
= Σ_S Σ_L P(I = i1, D = d1, G = g2, S, L) / Σ_I Σ_S Σ_L P(I, D = d1, G = g2, S, L)
= (.3 ∗ .4 ∗ .3 ∗ 1 ∗ 1) / (1 ∗ .4 ∗ (.7 ∗ .25 + .3 ∗ .3))
= .036 / .106
= .3396
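The pencil-and-paper queries in parts (a)–(c) can be reproduced by brute-force enumeration over the joint. The sketch below illustrates the technique; the CPD numbers are placeholder values (the standard textbook student network), not necessarily those in the assignment’s figure, so the printed numbers will differ from the ones above:

```python
import itertools

# Placeholder CPDs (illustrative only, NOT the values from the assignment figure).
P_I = {0: 0.7, 1: 0.3}
P_D = {0: 0.6, 1: 0.4}
P_G = {  # P(G | I, D), with G in {1, 2, 3}
    (0, 0): {1: 0.3, 2: 0.4, 3: 0.3}, (0, 1): {1: 0.05, 2: 0.25, 3: 0.7},
    (1, 0): {1: 0.9, 2: 0.08, 3: 0.02}, (1, 1): {1: 0.5, 2: 0.3, 3: 0.2},
}
P_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}                        # P(S | I)
P_L = {1: {0: 0.1, 1: 0.9}, 2: {0: 0.4, 1: 0.6}, 3: {0: 0.99, 1: 0.01}}   # P(L | G)

def joint(i, d, g, s, l):
    """P(I, D, G, S, L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)."""
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def query(target, evidence):
    """P(target | evidence) by summing the joint over all consistent assignments."""
    num = den = 0.0
    for i, d, s, l in itertools.product((0, 1), repeat=4):
        for g in (1, 2, 3):
            asg = {"I": i, "D": d, "G": g, "S": s, "L": l}
            if any(asg[k] != v for k, v in evidence.items()):
                continue
            p = joint(i, d, g, s, l)
            den += p
            if all(asg[k] == v for k, v in target.items()):
                num += p
    return num / den

print(query({"L": 1}, {"I": 0, "D": 0}))   # part (a): P(l1 | i0, d0)
print(query({"I": 1}, {"L": 0, "G": 3}))   # part (b): P(i1 | l0, g3)
print(query({"I": 1}, {"G": 2}))           # part (c): P(i1 | g2)
print(query({"I": 1}, {"G": 2, "D": 1}))   # part (c): P(i1 | g2, d1)
```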

(d) (3 points) (Bonus) As seen in class, “explaining away” (in c above) is a special case of the
more general intercausal reasoning where different causes of the same effect can interact in
any general way. Can you come up with examples, either in this BN or any other BN that is
toy yet realistic, where the reverse of “explaining away” is true? To be precise, solve Exercise
3.4 in KF book to claim the bonus.
