
Université de Liège

2022 - 2023

ELEN0060-2 Information and coding theory

Project 1:

Clancy Turlagh - s220024


Petrenko Maxime - s225365

March 15, 2023


Contents
1 Part 1
  1.1 Question 1 - Entropy
  1.2 Question 2 - Joint Entropy
  1.3 Question 3 - Conditional Entropy
  1.4 Question 4 - Mutual Information
  1.5 Question 5 - X, Y and Z

2 Part 2
  2.1 Question 6
  2.2 Question 7
  2.3 Question 8
  2.4 Question 9
  2.5 Question 10
  2.6 Question 11

3 Part 3
  3.1 Question 12
  3.2 Question 13
  3.3 Question 14
  3.4 Question 15
  3.5 Question 16
1 Part 1
1.1 Question 1 - Entropy
The mathematical formula we use to measure the entropy of X in bits (or shannons) is the following:

H(X) = -\sum_x P(x) \log_2 P(x)

To implement this in numpy with an input of P(X), the probability distribution of a discrete variable X, we use numpy's built-in vectorized operations to:

1. compute log2 P(X = x) for each value in the vector

2. multiply each log value by the corresponding probability of x

3. sum the results of the previous multiplications

However, there is one issue we have to address in the code that is not a problem in the formula. If the distribution contains zero probabilities, numpy evaluates log2(0), which it cannot do; but these terms appear in the context 0 log2(0), which should always give 0 and not a NaN.
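A minimal numpy sketch of this approach (the function and argument names here are our own illustrative choices) could look like:

```python
import numpy as np

def entropy(Px):
    """Entropy in bits (shannons) of a discrete distribution given as an array of probabilities."""
    Px = np.asarray(Px, dtype=float)
    nz = Px > 0                      # keep only non-zero entries: 0 * log2(0) is taken to be 0
    return -np.sum(Px[nz] * np.log2(Px[nz]))
```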
Intuitively, the entropy measures the amount of randomness in a probability distribution. A distribution with P(X=a) = 0.99 and P(X=b) = 0.01 is not very random, as the outcome of a sample is highly predictable. A high-entropy distribution such as P(X=a1) = 0.1, P(X=a2) = 0.1, ..., P(X=a10) = 0.1 is far less predictable: sampling it tells you little in advance about which value you will get.

1.2 Question 2 - Joint Entropy


The formula for the joint entropy (in shannons) is:

H(X, Y) = -\sum_x \sum_y P(x, y) \log_2 P(x, y)

To do this in python with numpy, we use the exact same computations as in Question 1; the only difference is that the input in Question 1 was a 1-D vector of size n, whereas here the input is a 2-D matrix of size n x m. Conveniently, since there is no restriction on the input dimension, in our code: entropy(Pxy) = joint_entropy(Pxy).
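A sketch of this reuse, assuming the entropy function sketched above:

```python
def joint_entropy(Pxy):
    """Joint entropy in bits of a 2-D joint distribution P(X, Y)."""
    # entropy() already multiplies, sums and negates element-wise over every entry of the
    # array regardless of its shape, so the joint case needs no extra work.
    return entropy(Pxy)
```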

The key difference between the joint entropy and the entropy of a single variable is that the joint
entropy takes into account the dependence between the variables. If the variables are independent,
their joint entropy is the sum of their individual entropies. However, if they are dependent, their joint
entropy will be lower than the sum of their individual entropies, reflecting the fact that some of the
uncertainty in the joint distribution can be explained by the dependence between the variables.

1.3 Question 3 - Conditional Entropy


The mathematical formulation we use for the conditional entropy (in shannons) is:

H(X|Y) = -\sum_y P_Y(y) \sum_x P_{X|Y}(x|y) \log_2 P_{X|Y}(x|y)

To do this in numpy we:

1. Get P(Y) from P(X,Y) by summing P(X,Y) across all values of x, leaving us with P(Y)

2. For each row of P(X, Y=y), divide by the corresponding value of P(Y=y) to get the conditional probability (by the definition of conditional probability). A naive Pxy / Py does not broadcast the way we want, because numpy would pair the values of Py with the wrong axis of the matrix. This could be done with for loops, but that is inelegant, so instead we reshape Py into a column vector, which makes the broadcasting line up correctly.

3. Take log2 of the conditional probability P(x,y)/P(y)

4. Then proceed as before: multiply each log value by the corresponding P(x,y) in the matrix, and sum over the whole matrix

An equivalent way to do this computation is to use the joint entropy and entropy:

H(X|Y ) = H(X, Y ) − H(Y )
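A sketch based on this identity, assuming Pxy is laid out with rows indexing X and columns indexing Y (rather than the row-by-row division described above):

```python
def conditional_entropy(Pxy):
    """H(X|Y) in bits, computed as H(X,Y) - H(Y)."""
    Py = Pxy.sum(axis=0)             # marginalise X out to obtain P(Y)
    return joint_entropy(Pxy) - entropy(Py)
```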

1.4 Question 4 - Mutual Information


The mathematical formulation we use to define mutual information is:

I(X; Y) = \sum_{x \in X} \sum_{y \in Y} P(x, y) \log_2 \frac{P(x, y)}{P(x) P(y)}

However, there is an equivalent formulation:

I(X; Y ) = H(X) + H(Y ) − H(X, Y )

Since we are lazy computer scientists, we can use the functions we created earlier to have a simpler
computation of the mutual information.

To do this in our code we did the following:

1. Marginalise P(X,Y) over Y to get P(X), and over X to get P(Y)

2. Input P(X,Y) into the joint_entropy to get H(X,Y)

3. Input P(X) and P(Y) into entropy function to get H(X) and H(Y)

4. Compute H(X) + H(Y) - H(X,Y)
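A sketch of these steps, under the same layout assumption as before (rows index X, columns index Y):

```python
def mutual_information(Pxy):
    """I(X;Y) in bits, using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    Px = Pxy.sum(axis=1)             # marginalise Y out to obtain P(X)
    Py = Pxy.sum(axis=0)             # marginalise X out to obtain P(Y)
    return entropy(Px) + entropy(Py) - joint_entropy(Pxy)
```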

Since all of the coding subtleties were handled in the earlier functions, the code itself is simple. The mutual information can be written in many other ways, such as:

I(X; Y ) = H(X) − H(X|Y ) = H(Y ) − H(Y |X) = H(X, Y ) − H(X|Y ) − H(Y |X)

These expressions show that mutual information is symmetric, i.e. I(X;Y) = I(Y;X). It measures how much knowing X tells us about Y, and equally how much knowing Y tells us about X.

1.5 Question 5 - X, Y and Z


The conditional joint entropy can be computed as follows:
H(X, Y|Z) = -\sum_x \sum_y \sum_z P(x, y, z) \log_2 P(x, y|z)

The method is essentially the same as for the conditional entropy H(X|Y), with Y replaced by Z and X replaced by the pair (X, Y). Here is how we computed it in numpy:

1. Separate the XYZ array into the Z and XY marginal matrices

2. Compute P(x,y,z)/P(z) and take log2 of each value in the array

3. Multiply each log term by the corresponding P(x,y,z), sum all the values, and negate to get the entropy
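A sketch of these steps, assuming Pxyz is a 3-D array indexed as Pxyz[x, y, z] (Z on the last axis, so the division by P(z) broadcasts correctly):

```python
def cond_joint_entropy(Pxyz):
    """H(X,Y|Z) in bits from the joint distribution P(X, Y, Z)."""
    Pz = Pxyz.sum(axis=(0, 1))                       # marginal P(Z)
    Pxy_given_z = Pxyz / np.where(Pz > 0, Pz, 1.0)   # P(x,y|z) = P(x,y,z) / P(z)
    nz = Pxyz > 0                                    # zero-probability triples contribute 0
    return -np.sum(Pxyz[nz] * np.log2(Pxy_given_z[nz]))
```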

The conditional mutual information can be computed as follows:

I(X; Y |Z) = H(X|Z) + H(Y |Z) − H(X, Y |Z)

While there are many ways to compute it, this way relies on functions we have already created,
making it easier for us. How we implemented it in the code was as follows:

1. Decompose Pxyz into Pxz, Pyz

2. Use the previously written conditional entropy and conditional joint entropy functions to get the values of H(X|Z), H(Y|Z) and H(X,Y|Z)

3. Combine these entropies to get the conditional mutual information
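A sketch of these three steps, reusing the functions above and the same Pxyz[x, y, z] axis convention:

```python
def cond_mutual_information(Pxyz):
    """I(X;Y|Z) in bits, using I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)."""
    Pxz = Pxyz.sum(axis=1)           # marginalise Y out -> P(X, Z)
    Pyz = Pxyz.sum(axis=0)           # marginalise X out -> P(Y, Z)
    return (conditional_entropy(Pxz) + conditional_entropy(Pyz)
            - cond_joint_entropy(Pxyz))
```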

2 Part 2
2.1 Question 6
2.2 Question 7
2.3 Question 8
2.4 Question 9
2.5 Question 10
2.6 Question 11

3 Part 3
3.1 Question 12
The entropy for the first slot would be:
H(X) = -\sum_x P(x) \log_2 P(x)

Where P(X=x) is 1/6 for each colour, and there are in total 6 possible values.
H(X) = -\sum_{i=1}^{6} (1/6) \log_2(1/6) = -\log_2(1/6)

which is approximately 2.585 bits.

For all 4 slots, there are 6^4 = 1296 possible colour combinations in total, since the same colour can appear more than once. As each colour has the same probability of being selected, every individual combination of 4 colours has a probability of 1/1296. Using the formula above, the entropy is:

H(X) = -\sum_{i=1}^{1296} (1/1296) \log_2(1/1296) = -\log_2(1/1296)

which is exactly 4 times the single-peg value and is approximately 10.34.
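As a quick numerical sanity check (reusing the entropy function sketched in Part 1), both values can be reproduced from the corresponding uniform distributions:

```python
import numpy as np

print(entropy(np.full(6, 1/6)))          # one slot: log2(6) ≈ 2.585 bits
print(entropy(np.full(6**4, 1/6**4)))    # full 4-slot code: log2(1296) ≈ 10.34 bits
```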

Since the entropy of the 4 pegs together is exactly 4 times the entropy of a single peg, the mutual information I(slot 1; slot 2; slot 3; slot 4) = 0: there is no information gained about one peg from knowing the other pegs. This means that the 4 slots are independent of each other.

This stands to reason: when playing the game, knowing one peg does not help you predict the other pegs. However, if the game were played without replacement, so that choosing a colour for the first slot left fewer colours available for the remaining slots, then there would be some information gain from knowing the other slots.

3.2 Question 13
Even though this feedback does not say which peg was correctly placed, it does not matter, as it affects each peg in the same way.

For the peg that was correctly placed, the entropy is now 0, as its outcome is certain. For the other pegs, the entropy decreases, as one colour has essentially been ruled out, leaving fewer possible choices.

The new entropy for each incorrectly placed peg is:

H(Peg) = -\sum_{i=1}^{5} (1/5) \log_2(1/5) = -\log_2(1/5) \approx 2.322

since there are now only 5 colours to choose from, each with a new probability of 1/5.

The number of possible combinations for the rest of the game is 5^3 = 125: only 5 colours remain possible, and since we know that one of the pegs is correct, only 3 pegs are still unknown. The entropy of the game is therefore:

H(Game) = -\sum_{i=1}^{125} (1/125) \log_2(1/125) = -\log_2(1/125) \approx 6.966

This guess has brought the entropy from 10.34 down to about 6.966, a reduction of roughly 3.374 bits in the total entropy of the game.
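These figures can be checked in a couple of lines:

```python
import numpy as np

h_before = 4 * np.log2(6)    # ≈ 10.34 bits before any guess
h_after = 3 * np.log2(5)     # ≈ 6.966 bits: 3 unknown slots, 5 remaining colours each
print(h_before - h_after)    # ≈ 3.374 bits removed by this feedback
```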

3.3 Question 14
In this formulation, the entropy of the correctly placed pin is 0. The entropy of the slot where the incorrectly placed pin was put is:

H(Peg) = -\sum_{i=1}^{5} (1/5) \log_2(1/5) = -\log_2(1/5) \approx 2.322

since we know that this slot can only take one of five colours.


For the final two slots, we still have the same entropy as before, 2.322 bits each, as we cannot be certain which colour they hold.

While the entropy of each individual slot is the same, the entropy of the game is reduced, because we now have some mutual information. In the previous question there were 125 remaining code possibilities after the results of the first guess, since 3 slots could each take 5 colours. After this guess there are fewer: more precisely, 125 minus the number of combinations that contain none of red, blue or yellow (which is 3^3 = 27).

For the entropy of the game, this means there are 98 combinations, each occurring with probability 1/98, which gives an entropy of:

H(Game) = -\sum_{i=1}^{98} (1/98) \log_2(1/98) = -\log_2(1/98) \approx 6.615

This is a larger information gain than in the previous question, which makes sense: we have more information about the outcome of the game, so the remaining entropy should be lower.
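Following the counting argument above, the numbers can be verified directly:

```python
import numpy as np

remaining = 5**3 - 3**3      # 125 candidates minus the 27 containing none of the three colours
print(remaining)             # 98
print(np.log2(remaining))    # ≈ 6.615 bits left in the game
```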

3.4 Question 15
The maximum entropy of the system occurs when we have no information about the game at all, i.e. before a single move has been played.
In this case, with C colours and S slots, there are C^S possible secret codes in total. The maximum entropy therefore corresponds to C^S different outcomes, each with probability P(X=x) = 1/C^S.
Therefore we can calculate the entropy as:
H(Game) = -\sum_{i=1}^{C^S} (1/C^S) \log_2(1/C^S) = -\log_2(1/C^S) = \log_2(C^S)

Since \log_2(C^S) can be rewritten as S \log_2(C), we can easily infer that if the number of slots increases by 1, the entropy of the game increases by \log_2(C), while if the number of colours increases by 1, the entropy of the game increases by \log_2((C+1)^S) - \log_2(C^S) = S \log_2(1 + 1/C).
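A small sketch illustrating these formulas (the helper name is our own):

```python
import numpy as np

def max_game_entropy(C, S):
    """Maximum entropy in bits of a game with C colours and S slots: S * log2(C)."""
    return S * np.log2(C)

print(max_game_entropy(6, 4))                           # ≈ 10.34, as in Question 12
print(max_game_entropy(6, 5) - max_game_entropy(6, 4))  # one extra slot adds log2(6) ≈ 2.585
print(max_game_entropy(7, 4) - max_game_entropy(6, 4))  # one extra colour adds 4*log2(7/6) ≈ 0.89
```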

3.5 Question 16
If the goal is to solve the game in the minimum number of guesses, we should find a strategy that maximises the reduction in entropy at each guess. One way to do this is similar to how some people derived a strategy to minimise the number of guesses in the word game Wordle. The procedure is the following (a code sketch of it is given after the list):

1. First determine the number of possible codes the secret could still be, based on the previous guesses (when no guess has been made yet, there are 6^4 possible codes).

2. For each of these possible combinations, we select it as a candidate guess and calculate the entropy it would leave against all possible hidden codes

3. We take the expected value of the entropy over all possible outcomes (as we assume all codes are equally likely)

4. We store that expected entropy, and repeat with all possible candidate guesses

5. At the end, we take the guess with the lowest possible expected entropy, as this would maximise
information gain from our guess.

6. We repeat the process at each stage until we reach a guess with 0 expected entropy, which gives us the correct answer.
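The sketch below illustrates the core of this procedure. The feedback function is a placeholder for the scoring of a guess against a hidden code (black and white pegs), which we do not spell out here; the other names are our own illustrative choices.

```python
import numpy as np
from collections import Counter
from itertools import product

def expected_remaining_entropy(guess, candidates, feedback):
    """Expected entropy (in bits) of the candidate set left after playing `guess`,
    assuming every remaining code is equally likely."""
    # Group the candidate codes by the feedback `guess` would receive against them.
    buckets = Counter(feedback(guess, code) for code in candidates)
    n = len(candidates)
    # A feedback bucket of size k occurs with probability k/n and leaves log2(k) bits.
    return sum((k / n) * np.log2(k) for k in buckets.values())

def best_guess(candidates, feedback):
    """Guess minimising the expected remaining entropy, i.e. maximising expected information gain."""
    return min(candidates, key=lambda g: expected_remaining_entropy(g, candidates, feedback))

# All 6**4 possible codes before any guess has been played.
candidates = list(product(range(6), repeat=4))
```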
