Introduction To Information Theory: Part 4-A
Assignment #2 Results
Adventures of Tom Sawyer (Mark Twain), 374,590 characters
[Bar chart: relative frequency of each character, vertical axis from 0 to 0.25, characters in the order SPC e t a o n h i s r d l u w m y g c f b p k v j x q z]
9/30/2019 2
H = 0.8118
L = 1, L/2 = 0.84375
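The code lengths above are consistent with a binary source whose two symbols have probabilities 3/4 and 1/4. That source is an assumption here, since the slide text does not state it. The sketch below computes the entropy of such a source and the average Huffman codeword length per symbol when symbols are coded one at a time and in pairs. All function names are illustrative.

```python
import heapq
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_average_length(probs):
    """Expected codeword length of a binary Huffman code.

    Uses the fact that the expected length equals the sum of the
    probabilities of all merged (internal) nodes of the code tree.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def block_probs(probs, n):
    """Probabilities of all length-n blocks of independent symbols."""
    blocks = []
    for combo in product(probs, repeat=n):
        p = 1.0
        for q in combo:
            p *= q
        blocks.append(p)
    return blocks

source = [0.75, 0.25]  # assumed distribution, not stated on the slide

H = entropy(source)
L1 = huffman_average_length(source)
L2 = huffman_average_length(block_probs(source, 2))

print(f"H   = {H:.4f} bits/symbol")
print(f"L   = {L1}")
print(f"L/2 = {L2 / 2}")
```

Under this assumption the code lengths match the slide (L = 1 and L/2 = 0.84375), while the entropy comes out near 0.8113 rather than 0.8118, so the slide's source may differ slightly or its last digit may be a typo. Coding longer blocks moves the per-symbol length closer to H.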
Lempel-Ziv Coding
• Sequences of text repeat patterns (words, phrases, etc)
Message: THIS-THESIS-IS-THE-THESIS.

Back  Copy  Next  Phrase
0     0     T     T
0     0     H     H
0     0     I     I
0     0     S     S
0     0     -     -
5     2     E     THE
5     1     I     SI
7     2     I     S-I
10    5     -     S-THE-
14    6     .     THESIS.
• Achieves the optimal rate of transmission in the long run without using a probability distribution
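The encoding of THIS-THESIS-IS-THE-THESIS. can be sketched as a greedy search for the longest match in the text already encoded. This is a minimal LZ77-style sketch under stated assumptions: each step emits a triple (positions to go back, characters to copy, next character), the search window is the entire prefix, and the name `lz_encode` is illustrative rather than taken from the slides.

```python
def lz_encode(message):
    """Greedy LZ77-style encoder producing (back, copy, next_char) triples."""
    triples = []
    pos = 0
    while pos < len(message):
        best_back, best_len = 0, 0
        # Always leave one character to serve as next_char.
        max_len = len(message) - pos - 1
        for start in range(pos):
            length = 0
            while (length < max_len
                   and message[start + length] == message[pos + length]):
                length += 1
            if length > best_len:
                best_back, best_len = pos - start, length
        triples.append((best_back, best_len, message[pos + best_len]))
        pos += best_len + 1
    return triples

for triple in lz_encode("THIS-THESIS-IS-THE-THESIS."):
    print(triple)
```

The encoder never consults symbol probabilities; it only looks for repeats, which is why no probability distribution is needed.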
Decode

Back  Copy  Next  Message
0     0     I     I
0     0     -     I-
0     0     M     I-M
3     1     S     I-MIS
1     1     -     I-MISS-
5     5     L     I-MISS-MISS-L
5     3     Y     I-MISS-MISS-LISSY
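The decoding steps can be sketched in a few lines. This assumes each triple means: go back `back` characters in the message so far, copy `copy` characters, then append `next_char`. The name `lz_decode` is illustrative.

```python
def lz_decode(triples):
    """Rebuild a message from (back, copy, next_char) triples."""
    out = []
    for back, copy, next_char in triples:
        start = len(out) - back
        # Copy one character at a time so a match may overlap its own output.
        for i in range(copy):
            out.append(out[start + i])
        out.append(next_char)
    return "".join(out)

triples = [
    (0, 0, "I"), (0, 0, "-"), (0, 0, "M"), (3, 1, "S"),
    (1, 1, "-"), (5, 5, "L"), (5, 3, "Y"),
]
print(lz_decode(triples))  # I-MISS-MISS-LISSY
```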
CHANNEL
• Information Source
• Transmitter
• Channel
• Receiver
• Destination
Information Channel
Input X → Channel → Output Y

Input X → Output Y
• Cholesterol levels → Condition of arteries
• Symptoms or test results → Diagnosis
• Geological structure → Presence of oil deposits
• Opinion poll → Next president
Perfect Communication
(Discrete Noiseless Channel)
X (transmitted symbol) → Y (received symbol)
0 → 0
1 → 1
NOISE
Motivating Noise…
f = 0.1, n ≈ 10,000
X (transmitted symbol) → Y (received symbol)
0 → 0 with probability 1 − f
0 → 1 with probability f
1 → 0 with probability f
1 → 1 with probability 1 − f
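A small simulation makes the effect of f = 0.1 concrete. The sketch assumes n is the number of bits transmitted and that each bit is flipped independently; the function name and the fixed seed are illustrative choices.

```python
import random

def binary_symmetric_channel(bits, f, rng):
    """Flip each bit independently with probability f."""
    return [b ^ 1 if rng.random() < f else b for b in bits]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n, f = 10_000, 0.1      # n is assumed to be the number of transmitted bits

sent = [rng.randint(0, 1) for _ in range(n)]
received = binary_symmetric_channel(sent, f, rng)

flipped = sum(s != r for s, r in zip(sent, received))
print(f"{flipped} of {n} bits flipped ({flipped / n:.3f})")
```

On average about f · n = 1,000 of the 10,000 bits arrive wrong.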
Motivating Noise…
Message: $5213.75
Received: $5293.75
Message:        $5213.75
Transmission 1: $5293.75
Transmission 2: $5213.75
Transmission 3: $5213.11
Transmission 4: $5443.75
Transmission 5: $7218.75
1. Guesswork is involved.
But it will almost never be wrong!
2. There is overhead.
A LOT of it!
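One way to make the guess is a majority vote at each character position across the repeated transmissions. This is a minimal sketch of that idea; the slides do not name the decoding rule, so majority voting is an assumption, and `majority_decode` is an illustrative name.

```python
from collections import Counter

def majority_decode(copies):
    """Pick the most common character at each position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )

received = [
    "$5293.75",
    "$5213.75",
    "$5213.11",
    "$5443.75",
    "$7218.75",
]
print(majority_decode(received))  # $5213.75
```

Four of the five copies contain at least one error, yet the vote recovers the original message. The cost is sending everything five times.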
X (transmitted symbol) → Y (received symbol)
0 → 0 with probability 1 − p
0 → 1 with probability p
1 → 0 with probability p
1 → 1 with probability 1 − p
x_1, …, x_s: s input symbols
y_1, …, y_r: r output symbols
p(y_j | x_i), for example p(y_1 | x_1): s × r transition probabilities
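The s × r transition probabilities can be stored as a table with one row per input symbol, each row summing to 1. A minimal sketch follows; the example channel (s = 2, r = 3) and the input distribution are invented for illustration and do not come from the slides.

```python
def output_distribution(p_x, channel):
    """Compute p(y_j) = sum over i of p(x_i) * p(y_j | x_i)."""
    r = len(channel[0])
    return [
        sum(p_x[i] * channel[i][j] for i in range(len(p_x)))
        for j in range(r)
    ]

# Hypothetical channel: row i holds p(y_1 | x_i), ..., p(y_r | x_i).
channel = [
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
p_x = [0.5, 0.5]  # hypothetical input distribution

# Every row is a probability distribution over the outputs.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in channel)

p_y = output_distribution(p_x, channel)
print(p_y)
```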
Two channels in series, with crossover probabilities p and q, act as a single channel:
p(0|0) = p(1|1) = (1 − p)(1 − q) + pq
p(1|0) = p(0|1) = (1 − p)q + (1 − q)p
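The combined probabilities are what one gets by multiplying the two channels' transition matrices. The sketch below checks this numerically; the values of p and q and the helper names `bsc` and `cascade` are illustrative assumptions.

```python
def bsc(p):
    """Transition matrix of a binary channel that flips a bit with probability p."""
    return [[1 - p, p], [p, 1 - p]]

def cascade(first, second):
    """Transition matrix of `first` followed by `second` (matrix product)."""
    return [
        [
            sum(first[i][k] * second[k][j] for k in range(len(second)))
            for j in range(len(second[0]))
        ]
        for i in range(len(first))
    ]

p, q = 0.1, 0.2  # example values, not from the slides
combined = cascade(bsc(p), bsc(q))

print(combined[0][0], (1 - p) * (1 - q) + p * q)  # both close to 0.74
print(combined[0][1], (1 - p) * q + (1 - q) * p)  # both close to 0.26
```

The result is again symmetric, so two such channels in series behave like one channel with crossover probability (1 − p)q + (1 − q)p.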