2020 - 2 - Info Theoretic Models - Notes


Information theoretic models
MIE523H1F
Anthony Soung Yee
anthony.soungyee@utoronto.ca
Topics covered
• Human as an information processing channel
• Quantification of information

[Photo: Warren Weaver and Claude Shannon]
Communication theory and human information processing
Basic principle
The Human Information Processor can be modelled as:

A noisy, limited-capacity communication system
Basic principle
• For any task, there is an absolute limit to the capacity to “transmit” information:
• Maximum absolute quantity of information
• Maximum rate of information transmission
• When working at capacity, one can increase speed only at the expense of accuracy (and vice versa) – a.k.a. the “speed-accuracy tradeoff”
in·for·ma·tion /ˌinfərˈmāSH(ə)n/: the reduction of uncertainty
“The weather will be sunny tomorrow”

Image source: https://thenounproject.com/ratch0013/collection/weather/?oq=weather&cidx=0
Factors that influence amount of information in source
1. Number of possible events
2. Probability of events occurring
3. Sequential constraints
1. Number of possible events

The higher the number of possible events, the more information is contained within each event.

Image source: https://thenounproject.com/ratch0013/collection/weather/?oq=weather&cidx=0
2. Probabilities of those events
It will rain today in...
• Toronto, Canada
• Amsterdam, Netherlands
• Death Valley, USA
3. Sequential constraints (context)
• The statement “It snowed in Calgary on September 9, 2014” might contain a lot of information.

• But knowing that it did snow, the statement “It snowed in Calgary for the third day in a row” contains relatively less information.
Definition of “information”
• Shannon defined it as the reduction of uncertainty
• Factors that influence amount of information in source:
1. Number of possible events
2. Probability of events occurring
3. Sequential constraints
More formally…
Factors that influence amount of information in source:
1. Number of possible events (N)
2. Probability of events occurring (pi) (a.k.a. distributional constraints)
3. Sequential constraints (a.k.a. redundancy)

Average amount of information in source for a set of events:
Have = Σ pi × log2(1/pi) bits
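As a quick illustration of this formula, here is a minimal Python sketch (the function name and example values are mine, not from the slides) that computes Have for an arbitrary probability distribution:

```python
import math

def h_ave(probs):
    """Average information of a source, in bits:
    Have = sum over events of p_i * log2(1 / p_i)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Coin flip: two equally likely events -> 1 bit
print(h_ave([0.5, 0.5]))      # 1.0

# Four equally likely events -> log2(4) = 2 bits
print(h_ave([0.25] * 4))      # 2.0
```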
1. Number of possible events
• For N equally likely events, Have = log2(N) bits
• E.g. coin flip: Have = log2(2) = 1 bit
1. Number of possible events

Have is the average minimum number of True-False questions that would have to be asked in order to remove all uncertainty about which of the equally likely events has occurred.

e.g. For N = 8: Have = log2(8) = 3 bits, i.e. three True-False questions
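A small sketch of this idea (not from the slides): a halving strategy over N = 8 equally likely items needs exactly log2(8) = 3 yes/no questions.

```python
import math

def questions_needed(n_events):
    """Minimum number of True/False questions needed to identify
    one of n equally likely events (ceiling of log2 n)."""
    return math.ceil(math.log2(n_events))

def guess(secret, low=0, high=7):
    """Identify `secret` in {low..high} by halving the range with each question."""
    asked = 0
    while low < high:
        mid = (low + high) // 2
        asked += 1                # one True/False question: "Is it <= mid?"
        if secret <= mid:
            high = mid
        else:
            low = mid + 1
    return low, asked

print(questions_needed(8))   # 3
print(guess(5))              # (5, 3): found in 3 questions
```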
Pop quiz
Which information source contains a greater Have?

[Two candidate sources, A and B, were shown as figures on the slide]
2. Probabilities of those events
• Define the “surprisal” of event i as Hi = log2(1/pi)
• The weighted average information from all N events is:
Have = Σ pi × Hi = Σ pi × log2(1/pi)

[Figure: surprisal Hi plotted against pi, for pi between 0 and 1]

In other words
• High-probability events → low information content
• Low-probability events → high information content
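A minimal sketch (not from the slides; the probabilities are arbitrary examples) showing that low-probability events carry more surprisal:

```python
import math

def surprisal(p):
    """Surprisal (information content) of a single event: Hi = log2(1/p)."""
    return math.log2(1.0 / p)

# High-probability event -> little information
print(surprisal(0.9))    # ~0.15 bits
# Low-probability event -> a lot of information
print(surprisal(0.01))   # ~6.64 bits
```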
Pop quiz
• For N = 4, with equally likely outcomes, Have = log2(N) = log2(4) = 2 bits

• If the outcomes are not equally likely, is Have:
• Higher
• Lower
• The same?
Weighted average
• Average information, Have, conveyed by a group of events
• Example: 4 events with equal probabilities (N = 4)
• Have = log2(N) = log2(4) = 2 bits

• When event probabilities are unequal
• Example: p = {1/2, 1/4, 1/8, 1/8}
• Have = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits
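A quick check of this example (a sketch, not part of the slides), repeating the entropy helper from above so the snippet is self-contained:

```python
import math

def h_ave(probs):
    """Have = sum(p_i * log2(1 / p_i)) in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(h_ave([0.25, 0.25, 0.25, 0.25]))    # 2.0 bits (equal probabilities)
print(h_ave([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits (unequal probabilities)
```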
For two possible outcomes (N = 2)

[Figure: Have plotted against p; maximum entropy of 1 bit occurs when the two outcomes are equally likely, p = 0.5]
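A small sketch (my assumption is that the slide's figure plotted this binary entropy curve) computing Have for two outcomes as p varies:

```python
import math

def binary_entropy(p):
    """Have for two outcomes with probabilities p and 1 - p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  Have = {binary_entropy(p):.3f} bits")
# Maximum entropy (1 bit) occurs at p = 0.5, i.e. equally likely outcomes.
```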
Information redundancy
Redundancy: percentage reduction of information
• % redundancy = (1 – Have/Hmax) × 100
• Hmax is the theoretical maximum, based on equal probabilities
• Have is reduced compared to Hmax
Information redundancy
• From the earlier example of N = 4:
• Hmax = 2 bits
• Have = 1.75 bits

• % redundancy = (1 – 1.75/2) × 100 = 12.5%
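A one-line check of this calculation (a sketch, not part of the slides):

```python
def redundancy_pct(h_ave, h_max):
    """Percentage redundancy = (1 - Have / Hmax) * 100."""
    return (1 - h_ave / h_max) * 100

print(redundancy_pct(1.75, 2.0))   # 12.5
```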
Duh??

3. Sequential constraints (context)
• Depending on the context, a stimulus may be more or less informative
• E.g. In June, it snowed in Calgary for the third day in a row
• Use conditional probabilities, pi|X (the probability of event i given context X)
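A hedged illustration of the idea (the probability values below are made up for the example, not from the slides): conditioning on context raises the probability of the event and so reduces the information it carries.

```python
import math

def surprisal(p):
    """Information carried by an event with probability p, in bits."""
    return math.log2(1.0 / p)

# Hypothetical numbers, purely for illustration:
p_snow_june = 0.02               # P(snow in Calgary on a given June day)
p_snow_given_context = 0.40      # P(snow | it already snowed the previous two days)

print(surprisal(p_snow_june))          # ~5.6 bits: surprising, highly informative
print(surprisal(p_snow_given_context)) # ~1.3 bits: much less informative in context
```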
Information redundancy for the English language
Psaele raed tihs out luod
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?
Example: Have for the English language
Based on actual frequencies of letter occurrences:
Using letter N-grams

Redundancy of English-language letters
• The English language is highly redundant
• Hmax = log2(26) ≈ 4.7 bits
• Taking into account distributional and sequential constraints, for letters of the alphabet, Have ≈ 1.5 bits
• % redundancy = (1 – 1.5/4.7) ≈ 0.68 = 68%

• You could do a similar analysis for the redundancy of English words
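A rough sketch of how one might estimate these numbers empirically (the sample text and the resulting estimate are illustrative only; a short string will not reproduce the ~1.5-bit figure, which also accounts for sequential constraints that this single-letter count ignores):

```python
import math
from collections import Counter

def letter_entropy(text):
    """Estimate Have (bits per letter) from single-letter frequencies in `text`."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

h_max = math.log2(26)   # ~4.7 bits: 26 equally likely letters
sample = "please read this out loud and estimate the letter frequencies"
h_est = letter_entropy(sample)   # distributional constraints only

print(f"Hmax = {h_max:.2f} bits")
print(f"Have (estimate) = {h_est:.2f} bits")
print(f"Redundancy = {(1 - h_est / h_max) * 100:.0f}%")
```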
Summary
• Modelled the human information processor as a noisy, limited-capacity communication system
• Defined “information”
• Defined Have, how to compute it, and information redundancy
