
COE 343 – Information theory and coding

Lecture 4: Coding of Information

Dr. E. T. Tchao
Quotes about Shannon

• "What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."

• "Today, Shannon's insights help shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep-space probes."

• "Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."
Information Sources

[Block diagram: an information source emits symbols s1, …, sq; a source/channel encoder maps them to a signal; the signal passes through a channel where noise introduces errors; a channel/source decoder recovers symbols s1, …, sq for the destination.]
Example: Morse Code

[Diagram: letters A, …, Z are encoded by a keyer into dots, dashes, and spaces, sent by a transmitter over a telegraph wire or shortwave radio, and decoded by a recognizer at the receiver back into A, …, Z.]
Example: ASCII Code

[Diagram: characters typed at a keyboard are encoded as seven-bit blocks, sent by a modem over a telephone wire to another modem, and displayed as characters on a terminal screen.]
Why Code Information?

• The general reasons for coding information are:

 Coding for compressing data

 Coding for ensuring the quality of the transmission in noisy conditions

 Coding for secrecy

Stochastic Sources

• A source outputs symbols X1, X2, …

• Each symbol takes its value from an alphabet A = (a1, a2, …).

• Model: P(X1, …, XN) assumed to be known for all combinations.

Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).

Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).

Source → X1, X2, …

Two Special Cases
1. The Memoryless Source
 Each symbol is independent of the previous ones.
 P(X1, X2, …, Xn) = P(X1) · P(X2) · … · P(Xn)
2. The Markov Source
 Each symbol depends only on the previous one.
 P(X1, X2, …, Xn) = P(X1) · P(X2|X1) · P(X3|X2) · … · P(Xn|Xn-1)
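The two factorizations can be sketched directly in code. The alphabet and the probability values below are illustrative assumptions, not values from the lecture:

```python
def p_memoryless(seq, p):
    """P(X1,...,Xn) = P(X1) * P(X2) * ... * P(Xn)."""
    prob = 1.0
    for s in seq:
        prob *= p[s]
    return prob

def p_markov(seq, p_init, p_trans):
    """P(X1,...,Xn) = P(X1) * P(X2|X1) * ... * P(Xn|Xn-1)."""
    prob = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= p_trans[(prev, cur)]
    return prob

# Hypothetical binary source with a strong tendency to repeat symbols.
p = {"a": 0.5, "b": 0.5}
p_trans = {("a", "a"): 0.9, ("a", "b"): 0.1,
           ("b", "a"): 0.1, ("b", "b"): 0.9}

print(p_memoryless("aab", p))       # 0.5 * 0.5 * 0.5 = 0.125
print(p_markov("aab", p, p_trans))  # 0.5 * 0.9 * 0.1 ≈ 0.045
```

Note how the same sequence gets very different probabilities under the two models: the Markov model "knows" that a→b transitions are rare.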
The Markov Source

• A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.

[State diagram: a ternary source with alphabet A = (a, b, c); the arcs between the states a, b, and c are labelled with transition probabilities (0.7, 0.5, 1.0, 0.3, 0.2, 0.3).]
The Markov Source

• Assume we are in state a, i.e., Xk = a.

• The probabilities for the next symbol are:

  P(Xk+1 = a | Xk = a) = 0.3
  P(Xk+1 = b | Xk = a) = 0.7
  P(Xk+1 = c | Xk = a) = 0
The Markov Source

• So, if Xk+1 = b, we know that Xk+2 will equal c:

  P(Xk+2 = a | Xk+1 = b) = 0
  P(Xk+2 = b | Xk+1 = b) = 0
  P(Xk+2 = c | Xk+1 = b) = 1
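A small simulator for this source. The transition rows for states a and b match the probabilities stated on the slides; the row for state c is an assumed reading of the remaining diagram labels (0.5, 0.3, 0.2) and should be treated as illustrative:

```python
import random

# Transition table for the ternary Markov source with alphabet (a, b, c).
# Rows "a" and "b" are from the slides; row "c" is an assumption.
P = {
    "a": {"a": 0.3, "b": 0.7, "c": 0.0},
    "b": {"a": 0.0, "b": 0.0, "c": 1.0},
    "c": {"a": 0.5, "b": 0.3, "c": 0.2},
}

def step(state, rng=random):
    """Draw the next symbol given the current state."""
    r = rng.random()
    acc = 0.0
    for sym, p in P[state].items():
        acc += p
        if r < acc:
            return sym
    return sym  # guard against floating-point round-off

# From state b, the next symbol is c with certainty, as on the slide.
print(step("b"))  # always "c"
```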
Definition of Terms
Non-singular code: A code of a discrete information source is said to be non-singular when different source symbols map to different codewords.

Non-ambiguous code: A code of a discrete source is said to be non-ambiguous if and only if each sequence of codewords uniquely corresponds to a single message.
Prefix-free Set

• Let T be a subset of {0,1}*.

• Definition:
• T is prefix-free if for any distinct x, y ∈ T,
• if |x| < |y|, then x is not a prefix of y.

• Example:
• {000, 001, 1, 01} is prefix-free
• {0, 01, 10, 11, 101} is not (0 is a prefix of 01).
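The definition translates directly into a check; a minimal sketch:

```python
def is_prefix_free(codewords):
    """A set is prefix-free iff no codeword is a prefix of a different one."""
    for x in codewords:
        for y in codewords:
            if x != y and y.startswith(x):
                return False
    return True

print(is_prefix_free({"000", "001", "1", "01"}))       # True
print(is_prefix_free({"0", "01", "10", "11", "101"}))  # False: 0 prefixes 01
```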
Prefix-free Code for S
• Let S be any set.

• Definition: A prefix-free code for S is a prefix-free set T together with a 1-1 "encoding" function f: S → T.

• The inverse function f⁻¹ is called the "decoding function".

• Example: S = {apple, orange, mango}.
  T = {0, 110, 1111}.
  f(apple) = 0, f(orange) = 1111, f(mango) = 110.
What is so cool about prefix-free codes?

Sending sequences of elements of S over a communications channel:

Let T be prefix-free and f be an encoding function. We wish to send <x1, x2, x3, …>

Sender: sends f(x1) f(x2) f(x3) …

Receiver: breaks the bit stream into elements of T and decodes using f⁻¹.
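The receiver's procedure can be sketched as a greedy left-to-right scan: because the code is prefix-free, the first time the accumulated bits match a codeword, that match is the correct one.

```python
def decode(bits, inverse):
    """Break a bit stream into codewords by scanning left to right.
    Correct for prefix-free codes: the first match ends a codeword."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # end of a codeword reached
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out

f = {"apple": "0", "orange": "1111", "mango": "110"}
inv = {v: k for k, v in f.items()}
print(decode("00011011111100", inv))
# ['apple', 'apple', 'apple', 'mango', 'orange', 'mango', 'apple']
```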
Sending information on a channel
• Example: S = {apple, orange, mango}.
• T = {0, 110, 1111}.
f(apple) = 0, f(orange) = 1111, f(mango) = 110.

• If we see
• 00011011111100…
• we know it must be
• 0 0 0 110 1111 110 0 …
• and hence
• apple apple apple mango orange mango
apple …
Morse Code is not Prefix-free!

• SOS encodes as ...---...

  A .-    F ..-.   K -.-   P .--.   U ..-    Z --..
  B -...  G --.    L .-..  Q --.-   V ...-
  C -.-.  H ....   M --    R .-.    W .--
  D -..   I ..     N -.    S ...    X -..-
  E .     J .---   O ---   T -      Y -.--
Morse Code is not Prefix-free!

• SOS encodes as ...---...

• Could decode as: IAMIE (.. .- -- .. .)
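The ambiguity is easy to verify: encoding both words with the standard Morse table, with no pauses between letters, yields exactly the same string. A minimal sketch:

```python
# Just the letters needed for this example, from the standard Morse table.
MORSE = {"A": ".-", "E": ".", "I": "..", "M": "--", "O": "---", "S": "..."}

def morse_encode(word):
    """Concatenate letter codes with no separating pauses."""
    return "".join(MORSE[c] for c in word)

print(morse_encode("SOS"))    # ...---...
print(morse_encode("IAMIE"))  # ...---...  (same string, so decoding is ambiguous)
```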
Unless you use pauses

• SOS encodes as ... --- ...

• The pause after each letter acts as an extra symbol that marks codeword boundaries, restoring unique decodability.
Properties
 Any prefix-free code is non-ambiguous.

 There exist non-ambiguous codes which are not prefix-free (for example, {1, 10}).

 A codeword is said to be instantaneously decodable if and only if each codeword in any string of codewords can be decoded as soon as its end is reached.

 A code is instantaneously decodable if and only if it is prefix-free.
Coding Tree

• A coding tree is an n-ary tree, the arcs of which are labelled with letters of a given alphabet of size n, in such a way that each letter appears at most once at a given node.
N-ARY TREES FOR CODING

[Figure: a binary coding tree with the arcs out of each node labelled 0 and 1.]

An n-ary tree is a tree in which each interior node has arity n.
Representing Prefix-free Codes

A = 100
B = 010
C = 101
D = 011
É = 00
F = 11

[Figure: the corresponding binary coding tree; the leaves É and F sit at depth 2, and B, D, A, C at depth 3, matching their codeword lengths.]

"CAFÉ" would encode as 1011001100
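The encoding can be checked mechanically; a minimal sketch using the slide's code table:

```python
code = {"A": "100", "B": "010", "C": "101", "D": "011", "É": "00", "F": "11"}

def encode(word, code):
    """Concatenate the codeword for each symbol of the word."""
    return "".join(code[c] for c in word)

print(encode("CAFÉ", code))  # 1011001100
```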


Example

• If you see: 1000101000111011001100

• read the bits left to right, emitting a symbol each time a codeword is completed:

  100 | 010 | 100 | 011 | 101 | 100 | 11 | 00
   A     B     A     D     C     A     F    É

• can decode as: ABADCAFÉ
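The step-by-step decoding above amounts to walking the coding tree bit by bit and emitting a symbol each time a leaf is reached; a minimal sketch:

```python
code = {"A": "100", "B": "010", "C": "101", "D": "011", "É": "00", "F": "11"}
inverse = {v: k for k, v in code.items()}

def tree_decode(bits):
    """Walk the coding tree: each bit follows one arc; a leaf emits a symbol."""
    out, buf = "", ""
    for b in bits:
        buf += b                  # follow one arc down the tree
        if buf in inverse:        # reached a leaf
            out += inverse[buf]
            buf = ""
    return out

print(tree_decode("1000101000111011001100"))  # ABADCAFÉ
```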


Prefix-free codes are yet another representation of a decision tree.

Theorem:

S has a decision tree of depth d
if and only if
S has a prefix-free code with all codewords bounded by length d.

[Figure: two coding trees illustrating the correspondence; in each, the leaves (É, F, B, D, A, C, and additionally G, H in the second tree) appear at depths equal to their codeword lengths.]
Let S be any D-ary prefix-free code with codeword lengths l1, l2, …, lN.

Kraft Inequality:
∑i D^(-li) ≤ 1

There exists a D-ary prefix-free code of N codewords whose codeword lengths are the positive integers l1, l2, …, lN if and only if ∑i D^(-li) ≤ 1.
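The inequality is easy to evaluate for a given set of codeword lengths; a minimal sketch. For the six-codeword binary code used earlier ({00, 010, 011, 100, 101, 11}) the sum is exactly 1, reflecting a full coding tree:

```python
def kraft_sum(lengths, D=2):
    """Sum of D^(-l_i) over the codeword lengths; ≤ 1 for a prefix-free code."""
    return sum(D ** (-l) for l in lengths)

# Lengths of the code {00, 010, 011, 100, 101, 11}:
print(kraft_sum([2, 3, 3, 3, 3, 2]))  # 1.0 -- equality: every leaf is used
```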
