Reasoning About Uncertainty: Entropy

Reasoning about uncertainty

How much information is in the data? How uncertain are we about the value?

Example 1: coin toss ⇒ “heads” or “tails” – or just 1 or 0 (a bit).

Example 2: roll of a fake die (always 6) – no uncertainty, no information gained by rolling.

Example 3: K with Pr[x] = 1/2, Pr[y] = Pr[z] = 1/4 – how much information/uncertainty is there about K? Compact representation: x by 0, y by 10, z by 11. Short codes for frequent messages. On average, 1/2 · 1 + 1/4 · 2 + 1/4 · 2 = 3/2 bits are needed to represent a value.

Entropy

Use this to measure information/uncertainty: the entropy of a variable is the weighted average of the optimal bit representation size, i.e. the average size of an optimally encoded message:

H(X) = − Σ_{x∈X} Pr[x] · log2(Pr[x])

Note:
• The base of the log is not important – it changes the value by a constant factor.
• Pr[x] = 0 is OK (lim_{y→0} y · log2(y) = 0).
• Higher entropy makes frequency analysis harder.
• Compression increases entropy.
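
A minimal sketch (not part of the original slides) that computes these entropies in Python, using the distributions from the examples above:

    # Entropy sketch: H(X) = -sum Pr[x] * log2(Pr[x]); Pr[x] = 0 terms contribute 0.
    from math import log2

    def entropy(dist):
        return -sum(p * log2(p) for p in dist if p > 0)

    print(entropy([1/2, 1/2]))        # coin toss: 1 bit
    print(entropy([1.0]))             # fake die (always 6): 0 bits
    print(entropy([1/2, 1/4, 1/4]))   # K from Example 3: 1.5 bits, matching 3/2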


Entropy: example

Use the silly crypto system example and calculate the entropies of M, K, C: recall H(X) = − Σ_{x∈X} Pr[x] · log2(Pr[x]), and
M = {a, b} with Pr[a] = 1/4, Pr[b] = 3/4,
K = {x, y, z} with Pr[x] = 1/2, Pr[y] = Pr[z] = 1/4,
C = {1, 2, 3, 4}.

H(M) ≈ 0.81
H(K) = 1.5
H(C) ≈ 1.85

Reflect: what would be ideal?
• m ∈ {a, b} – ideally H(M) should be 1?
• c ∈ {1, 2, 3, 4} – ideally H(C) should be 2.

Entropy: fun facts

For X with values in a set of n elements:
• 0 ≤ H(X) ≤ log2(n)
• H(X) = log2(n) only when Pr[x_i] = 1/n for all x_i ∈ X.
• H(X) = 0 only when Pr[x_i] = 1 for some x_i ∈ X (and 0 for the rest).

Joint entropy: the uncertainty of simultaneous values.
H(X, Y) ≤ H(X) + H(Y), with equality only when X and Y are independent.
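
A sketch verifying these values and the bounds above. The ciphertext probabilities used below are an assumption: they are the distribution induced by the silly crypto encryption table, which is not repeated in this section.

    # Verify H(M), H(K), H(C) and the fun-fact bounds.
    # The ciphertext distribution is assumed (derived from the encryption table
    # of the silly crypto example, not shown in this section).
    from math import log2

    def entropy(dist):
        return -sum(p * log2(p) for p in dist if p > 0)

    H_M = entropy([1/4, 3/4])              # ≈ 0.81
    H_K = entropy([1/2, 1/4, 1/4])         # = 1.5
    H_C = entropy([1/8, 7/16, 1/4, 3/16])  # ≈ 1.85 (assumed Pr[c])

    print(H_M, H_K, H_C)
    assert 0 <= H_M <= log2(2) and 0 <= H_C <= log2(4)  # 0 ≤ H(X) ≤ log2(n)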

Conditional entropy

Entropy of X given Y = y:

H(X|y) = − Σ_{x∈X} Pr[x|y] · log2(Pr[x|y])

Conditional entropy (or equivocation): the weighted average of the above over all y ∈ Y:

H(X|Y) = Σ_{y∈Y} Pr[y] · H(X|y)
       = Σ_{y∈Y} Pr[y] · (− Σ_{x∈X} Pr[x|y] · log2(Pr[x|y]))
       = − Σ_{y∈Y} Σ_{x∈X} Pr[y] · Pr[x|y] · log2(Pr[x|y])

i.e. the average amount of information about X which is not revealed by Y.

Conditional entropy: fun facts

• H(X, Y) = H(Y) + H(X|Y).
• H(X|Y) ≤ H(X), with equality iff X and Y are independent.

We cannot increase uncertainty about X by extra knowledge of Y.

Example: a random 32-bit integer has H(X) = 32, but if we know that it is odd, the uncertainty is reduced by 1 bit.
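
A small sketch (the joint distribution below is made up for illustration, not from the slides) that computes H(X|Y) from the definition and checks the chain rule H(X, Y) = H(Y) + H(X|Y):

    # Conditional entropy from a joint distribution (illustrative values only).
    from math import log2
    from collections import defaultdict

    joint = {('x1', 'y1'): 1/2, ('x2', 'y1'): 1/4, ('x2', 'y2'): 1/4}  # Pr[x, y]

    def H(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_y[y] += p

    # H(X|Y) = - sum_{x,y} Pr[x, y] * log2(Pr[x|y]), with Pr[x|y] = Pr[x, y] / Pr[y]
    H_X_given_Y = -sum(p * log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)

    # Chain rule: H(X, Y) = H(Y) + H(X|Y)
    assert abs(H(joint.values()) - (H(p_y.values()) + H_X_given_Y)) < 1e-9
    print(H_X_given_Y)

The 32-bit example works the same way: H(X) = 32, and knowing the parity fixes exactly one of the 32 bits, so H(X | "X is odd") = 31.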


Use entropy in cryptanalysis

For a cryptosystem ⟨M, C, K, E, D⟩, the key equivocation H(K|C) is the uncertainty about the key when knowing the ciphertext.

• if H(K|C) = 0: there is no uncertainty, and the cipher can be broken
• usually lim_{n→∞} H(K|C^n) = 0: the longer the message, the easier it is to break

Key equivocation

H(K|C) = H(K) + H(M) − H(C).

Proof: H(K, M, C) = H(C|(K, M)) + H(K, M) (by the first fun fact). The key and plaintext determine the ciphertext uniquely (since c = E_k(m)), so H(C|(K, M)) = 0. Thus H(K, M, C) = H(K, M), and since K and M are independent, H(K, M) = H(K) + H(M). So H(K, M, C) = H(K) + H(M). Similarly, the key and ciphertext determine the plaintext uniquely (m = D_k(c)), so H(K, M, C) = H(K, C) (but K and C are not independent). So

H(K|C) = H(K, C) − H(C)       (by fun fact 1)
       = H(K, M, C) − H(C)    (by the above)
       = H(K) + H(M) − H(C)   (by the above)

Silly crypto revisited

For the silly crypto example, H(M) ≈ 0.81, H(K) = 1.5, H(C) ≈ 1.85, so H(K|C) ≈ 0.46 (by the above).

Exercise: verify this, using the definitions (see the sketch below).

Spurious keys

A spurious key is a key which is not the correct one but which produces a “meaningful” message.

Example: shift cipher, c = ”WNAJW”. Shift 5 gives “river”, shift 22 gives “arena”.

Goal: find a bound on the expected number of spurious keys for a cipher.
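
A sketch for the exercise above: it recomputes H(K|C) directly from the joint distribution of key and ciphertext. The encryption table below is an assumption (it belongs to the silly crypto example but is not shown in this section); it is the table that yields H(C) ≈ 1.85.

    # Verify H(K|C) ≈ 0.46 from the definitions (assumed encryption table).
    from math import log2
    from collections import defaultdict

    p_m = {'a': 1/4, 'b': 3/4}
    p_k = {'x': 1/2, 'y': 1/4, 'z': 1/4}
    enc = {('x', 'a'): 1, ('x', 'b'): 2,   # assumed silly crypto table
           ('y', 'a'): 2, ('y', 'b'): 3,
           ('z', 'a'): 3, ('z', 'b'): 4}

    def H(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    p_kc, p_c = defaultdict(float), defaultdict(float)  # Pr[k, c] and Pr[c]
    for (k, m), c in enc.items():
        p_kc[(k, c)] += p_k[k] * p_m[m]
        p_c[c] += p_k[k] * p_m[m]

    H_K_given_C = H(p_kc.values()) - H(p_c.values())            # H(K|C) = H(K, C) - H(C)
    print(H_K_given_C)                                          # ≈ 0.46
    print(H(p_k.values()) + H(p_m.values()) - H(p_c.values()))  # same, by the theorem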

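A short sketch for the shift cipher example: decrypting c = ”WNAJW” under every key shows both “river” (shift 5) and “arena” (shift 22) as plausible plaintexts, so at least one of those keys is spurious.

    # Brute-force the shift cipher example: decrypt "WNAJW" under all 26 shifts.
    c = "WNAJW"
    for shift in range(26):
        m = "".join(chr((ord(ch) - ord('A') - shift) % 26 + ord('a')) for ch in c)
        print(shift, m)   # shift 5 -> "river", shift 22 -> "arena"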

Language entropy and redundancy

For a natural language L, and P^n a random variable over language strings of length n (i.e. n-grams):

• the entropy (per letter) of L is

H_L = lim_{n→∞} H(P^n) / n

• the redundancy of L is

R_L = 1 − H_L / log2|P|

Note:
• a random language would have entropy log2|P| (e.g. log2 26 ≈ 4.7), and thus redundancy 0.
• for English, H(P) ≈ 4.19 and H(P^2)/2 ≈ 3.9, but for longer texts 1.0 ≤ H_L ≤ 1.5; H_L = 1.25 gives R_L ≈ 0.75.

Crypto

The expected number of spurious keys s̄_n is the average (over ciphertexts of length n) number of keys which relate the given ciphertext to a plaintext with non-zero probability, minus one for the correct key:

s̄_n = (Σ_{c∈C^n} Pr[c] · |K(c)|) − 1

where K(c) = {k ∈ K : ∃m ∈ M^n s.t. E_k(m) = c and Pr[m] > 0}.

H(K|C^n) = H(K) + H(P^n) − H(C^n).

H(P^n) ≈ n · H_L = n(1 − R_L) · log2|P| (for large n), and H(C^n) ≤ n · log2|C|, so if |C| = |P| we have

H(K|C^n) ≥ H(K) − n · R_L · log2|P|.
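
A quick check of the redundancy arithmetic quoted in the note above (a sketch; the exact value rounds to roughly the 0.75 used on the slides):

    # Redundancy of English from the quoted per-letter entropy H_L = 1.25.
    from math import log2
    H_L = 1.25
    R_L = 1 - H_L / log2(26)
    print(log2(26), R_L)   # log2(26) ≈ 4.70, R_L ≈ 0.73 (rounded to 0.75 on the slides)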
Crypto cont

H(K|C^n) ≥ H(K) − n · R_L · log2|P|.

And

H(K|C^n) = Σ_{c∈C^n} Pr[c] · H(K|c)
         ≤ Σ_{c∈C^n} Pr[c] · log2|K(c)|
         ≤ log2(Σ_{c∈C^n} Pr[c] · |K(c)|)    (by Jensen's inequality, since log2 is concave)
         = log2(s̄_n + 1)

So log2(s̄_n + 1) ≥ H(K) − n · R_L · log2|P|, and if the keys are equiprobable,

s̄_n ≥ |K| / |P|^(n·R_L) − 1

Unicity distance

The unicity distance n_0 is the value of n at which the expected number of spurious keys s̄_n becomes zero, i.e. the average amount of ciphertext required to uniquely determine the key given enough computing time.

n_0 ≈ log2|K| / (R_L · log2|P|)

Examples:
• substitution cipher with |P| = 26 and |K| = 26!; using R_L = 0.75 for English, n_0 ≈ 25 (but R_L is valid only for long messages).
• one-time pad: s̄_n never approaches 0! Unconditionally secure!
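
A sketch computing the unicity distance of the substitution cipher from the approximation above, plus the lower bound on the expected number of spurious keys for a few ciphertext lengths:

    # Unicity distance of the substitution cipher: n0 ≈ log2|K| / (R_L * log2|P|).
    from math import log2, factorial

    key_space = factorial(26)   # |K| = 26!
    R_L, alphabet = 0.75, 26

    n0 = log2(key_space) / (R_L * log2(alphabet))
    print(round(n0, 1))         # ≈ 25 characters of ciphertext

    # Lower bound s_n >= |K| / |P|^(n*R_L) - 1 on the expected number of spurious keys:
    for n in (10, 25, 50):
        bound = key_space / alphabet ** (n * R_L) - 1
        print(n, max(bound, 0.0))   # drops to about 0 near n = n0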

