L2 Review of Basics
© E. Merényi, Lecture 2, COMP / ELEC / STAT 502
Outline
Probability basics
  Random variables, CDF, pdf, moments
  Covariance, joint CDF and pdf
Information theory basics
  Information, entropy, differential entropy
  Maximum Entropy principle
  Further entropy definitions
  Mutual information and relation to entropy
Additional properties of random variables
Probability basics

Random variables, CDF, pdf, moments

The cumulative distribution function (CDF) of a random variable \xi is

F_\xi(y) = \Pr(\xi \le y) = \int_{-\infty}^{y} f_\xi(v)\,dv    (2.1)

where f_\xi is the probability density function (pdf) of \xi, obtained from the CDF by differentiation:

F'_\xi(y) = f_\xi(y)    (2.2)
The pdf is nonnegative and normalizes to one:

f_\xi(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f_\xi(x)\,dx = 1    (2.3)

Equivalently to (2.2), the pdf is the limit

f_\xi(x) = \lim_{h \to 0} \frac{F_\xi(x + h/2) - F_\xi(x - h/2)}{h} = \lim_{h \to 0} \frac{\Pr[x - h/2 < \xi \le x + h/2]}{h}    (2.4)

The larger f_\xi(x), the larger the probability that \xi falls close to x.
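To make (2.4) concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; not part of the lecture) comparing the finite difference of the standard Gaussian CDF with its pdf:

import numpy as np
from scipy.stats import norm

# Finite-difference version of (2.4) for a standard Gaussian:
# [F(x + h/2) - F(x - h/2)] / h should approach f(x) as h -> 0.
x = np.linspace(-3, 3, 13)
h = 1e-5
fd = (norm.cdf(x + h / 2) - norm.cdf(x - h / 2)) / h
print(np.max(np.abs(fd - norm.pdf(x))))  # tiny: the difference quotient matches the pdf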
Moments

The k-th central moment of \xi is

m_\xi^{(k)} = \int_{-\infty}^{\infty} (x - E[\xi])^k f_\xi(x)\,dx    (2.6)

where the mean E[\xi] is the first (raw) moment m_\xi^{(1)}.
Sample estimates from observations \xi_1, \ldots, \xi_n:

Mean = M_\xi^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \xi_i    (2.7)

Variance = m_\xi^{(2)} = \frac{1}{n-1} \sum_{i=1}^{n} (\xi_i - M_\xi^{(1)})^2    (2.8)

Standard Deviation (STD) = \sqrt{\text{Variance}}    (2.9)
Skewness = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\xi_i - M_\xi^{(1)}}{\text{STD}} \right)^{3} = \frac{m_\xi^{(3)}}{(m_\xi^{(2)})^{3/2}}    (2.10)

(Normalized) Kurtosis = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\xi_i - M_\xi^{(1)}}{\text{STD}} \right)^{4} - 3 = \frac{m_\xi^{(4)}}{(m_\xi^{(2)})^2} - 3    (2.11)
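As an illustration, a short NumPy sketch of the sample estimators (2.7)-(2.11); the helper name sample_stats is ours, not lecture notation:

import numpy as np

def sample_stats(xi):
    """Sample estimates (2.7)-(2.11) for a 1-D array of observations."""
    n = len(xi)
    mean = xi.sum() / n                       # (2.7)
    var = ((xi - mean) ** 2).sum() / (n - 1)  # (2.8), unbiased
    std = np.sqrt(var)                        # (2.9)
    z = (xi - mean) / std
    skew = (z ** 3).mean()                    # (2.10)
    kurt = (z ** 4).mean() - 3                # (2.11), normalized
    return mean, var, std, skew, kurt

rng = np.random.default_rng(0)
print(sample_stats(rng.standard_normal(100_000)))  # skewness and kurtosis near 0 for Gaussian data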
Covariance, joint CDF and pdf

The covariance of \xi and \eta is

c_{\xi\eta} = E[(\xi - M_\xi^{(1)})(\eta - M_\eta^{(1)})]    (2.13)

or equivalently

c_{\xi\eta} = E[\xi\eta] - M_\xi^{(1)} M_\eta^{(1)}    (2.14)

For a random vector x, the corresponding quantities are the covariance matrix and the correlation matrix:

C_x = E[(x - M_x^{(1)})(x - M_x^{(1)})^T]    (2.15)

R_x = E[x x^T]    (2.16)
The joint CDF of \xi and \eta is

F_{\xi,\eta}(x, y) = \Pr(\xi \le x, \eta \le y)    (2.17)

= \int_{-\infty}^{x} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,du\,dv    (2.18)

where the joint pdf is

f_{\xi,\eta}(x, y) = \frac{\partial^2 F_{\xi,\eta}(x, y)}{\partial x\,\partial y}    (2.19)
Marginal distributions

F_\xi(x) = F_{\xi,\eta}(x, \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{\xi,\eta}(u, v)\,du\,dv    (2.20)

F_\eta(y) = F_{\xi,\eta}(\infty, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,du\,dv    (2.21)
Statistical independence

A pair of joint random variables \xi and \eta are statistically independent iff
their joint CDF factorizes into the product of the marginal CDFs:

F_{\xi,\eta}(x, y) = F_\xi(x) F_\eta(y)    (2.22)

Equivalently, the joint pdf factorizes as f_{\xi,\eta}(u, v) = f_\xi(u) f_\eta(v), so that

F_{\xi,\eta}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,du\,dv    (2.23)

= \int_{-\infty}^{x} f_\xi(u)\,du \int_{-\infty}^{y} f_\eta(v)\,dv = F_\xi(x) F_\eta(y)    (2.24)
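A hedged numerical sketch of this factorization (sample sizes and bin counts are arbitrary choices): for independently generated samples, the joint histogram should approximately equal the product of its marginals.

import numpy as np

# Empirical check of (2.22)/(2.24): for independent samples, the joint
# cell probabilities should be close to the product of the marginal ones.
rng = np.random.default_rng(1)
xi = rng.standard_normal(200_000)
eta = rng.uniform(-1, 1, 200_000)          # independent of xi by construction

joint, _, _ = np.histogram2d(xi, eta, bins=20)
joint /= joint.sum()                       # joint probability per cell
px = joint.sum(axis=1, keepdims=True)      # marginal of xi
py = joint.sum(axis=0, keepdims=True)      # marginal of eta
print(np.max(np.abs(joint - px * py)))     # small: joint ~ product of marginals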
Conditional probability

F_{\xi|\eta}(x|y) = F_{\xi,\eta}(x, y) / F_\eta(y)    (2.25)

f_{\xi|\eta}(x|y) = f_{\xi,\eta}(x, y) / f_\eta(y)    (2.26)

From (2.26), the marginal pdf and the conditional expectation follow:

f_\xi(x) = \int_{-\infty}^{\infty} f_{\xi|\eta}(x|y) f_\eta(y)\,dy

E[\xi|\eta = y] = \int_{-\infty}^{\infty} x\, f_{\xi|\eta}(x|y)\,dx
Information theory basics
Information

Let X be a discrete random variable, taking its values from
H = \{x_1, x_2, \ldots, x_n\}. Let p_i be the probability of x_i, with p_i \ge 0 and
\sum_{i=1}^{n} p_i = 1. The amount of information in observing the event X = x_i
was defined by Shannon as

I(x_i) = \log_a(1/p_i)    (2.27)

= -\log_a(p_i)    (2.28)
Shannon entropy

The expected value of I(x_i) over the entire set of events H is the entropy of X:

H(X) = -\sum_{i=1}^{n} p_i \log(p_i)    (2.29)
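A minimal sketch of (2.29) in Python; the helper name shannon_entropy is an assumption, not lecture notation:

import numpy as np

def shannon_entropy(p, base=2):
    """Entropy (2.29) of a discrete distribution p (in bits by default)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -(p * np.log(p)).sum() / np.log(base)

print(shannon_entropy([0.5, 0.5]))    # 1 bit: a fair coin
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits: a nearly certain event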
The entropy is bounded:

0 \le H(X) \le \log_a(n)    (2.30)

with H(X) = 0 when one outcome is certain, and the maximum attained by the
uniform distribution p_i = 1/n.
Differential entropy

A quantity analogous to the entropy of a discrete random variable can be
defined for continuous random variables.

Reminder: The random variable \xi is continuous if its distribution
function F_\xi(x) is continuous.

Definition: Let f_\xi(x) = F'_\xi(x) be the pdf of \xi. The set S = \{x \mid f_\xi(x) > 0\}
is called the support set of \xi. The differential entropy of \xi is defined as

h(\xi) = -\int_S f_\xi(x) \log f_\xi(x)\,dx = -E[\log f_\xi(x)]    (2.32)
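As a sketch, (2.32) can be evaluated numerically over the support set; for a Gaussian with standard deviation sigma, the result should match the closed form \frac{1}{2}\log(2\pi e \sigma^2) that appears later in (2.44). Grid size and range are arbitrary choices:

import numpy as np
from scipy.stats import norm

# Riemann-sum evaluation of the differential entropy (2.32) for a Gaussian.
sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
dx = x[1] - x[0]
f = norm.pdf(x, scale=sigma)
mask = f > 0                                   # restrict to the support set S
h_num = -np.sum(f[mask] * np.log(f[mask])) * dx
print(h_num, 0.5 * np.log(2 * np.pi * np.e * sigma**2))  # the two values agree closely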
Maximum Entropy principle

Maximize the differential entropy

h(\xi) = -\int f(x) \log f(x)\,dx    (2.34)

over all pdfs f(x) of the r.v. \xi, subject to the following constraints:

f(x) \ge 0    (2.35)

\int f(x)\,dx = 1    (2.36)

\int f(x) g_i(x)\,dx = \alpha_i \quad \text{for } i = 2, \ldots, m    (2.37)

where f(x) = 0 holds outside of the support set of \xi, and the \alpha_i represent the
prior knowledge about \xi.
Use Lagrange multipliers \lambda_i (i = 1, 2, \ldots, m) to formulate the objective function

J(f) = \int \left[ -f(x) \log f(x) + \lambda_1 f(x) + \sum_{i=2}^{m} \lambda_i g_i(x) f(x) \right] dx    (2.38)

Setting the derivative of the integrand with respect to f(x) to zero:

-1 - \log f(x) + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) = 0    (2.39)
Solving (2.39) for f(x) gives the exponential form

f(x) = \exp\left( -1 + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) \right)    (2.40)

Example: If the prior knowledge consists of the mean \mu and the variance
\sigma^2 of \xi, then

\int f(x)(x - \mu)^2\,dx = \sigma^2    (2.41)
Solving with these constraints yields the Gaussian density

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}    (2.42)

i.e., among all densities with given mean \mu and variance \sigma^2, the Gaussian
has the largest differential entropy.
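A small numerical illustration of this result, using standard closed-form entropies under a matched-variance assumption: among unit-variance uniform, Laplacian, and Gaussian densities, the Gaussian's differential entropy is largest.

import numpy as np

# Closed-form differential entropies (natural log) for variance sigma^2 = 1:
sigma2 = 1.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)  # ~1.419, cf. (2.44)
h_uniform = 0.5 * np.log(12 * sigma2)              # log(b - a) with (b - a)^2 / 12 = sigma^2; ~1.242
h_laplace = 1 + 0.5 * np.log(2 * sigma2)           # 1 + log(2 / lambda) with 2 / lambda^2 = sigma^2; ~1.347
print(h_gauss, h_uniform, h_laplace)               # the Gaussian value is the largest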
The differential entropy of the Gaussian (2.42) evaluates to

h(\xi) = \frac{1}{2} \log(2\pi e \sigma^2)    (2.44)

which depends only on the variance, not on the mean.
Further entropy definitions

For a pair of continuous random variables, the joint and conditional
differential entropies are defined analogously:

h(\xi, \eta) = -\iint f_{\xi,\eta}(x, y) \log f_{\xi,\eta}(x, y)\,dx\,dy

h(\xi|\eta) = -\iint f_{\xi,\eta}(x, y) \log f_{\xi|\eta}(x|y)\,dx\,dy    (2.45)
Mutual information

Mutual information is the measure of the amount of information one random
variable contains about another:

I(\xi; \eta) = \iint f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x) f_\eta(y)}\,dx\,dy    (2.46)

The concept of mutual information is of profound importance in the design of
self-organizing systems, where the objective is to develop an algorithm that
can learn an input-output relationship of interest on the basis of the input
patterns alone. The Maximum mutual information principle of Linsker (1988)
states that the synaptic connections of a multilayered ANN develop so as to
maximize the amount of information that is preserved when signals are
transformed at each processing stage of the network, subject to certain
constraints. This is based on earlier ideas that came from the recognition
that encoding of data from a scene for the purpose of redundancy reduction is
related to the identification of specific features in the scene (by the
biological perceptual machinery).
Rewriting (2.46) with the conditional pdf f_{\xi|\eta}(x|y) = f_{\xi,\eta}(x, y)/f_\eta(y):

I(\xi; \eta) = \iint f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x) f_\eta(y)}\,dx\,dy

= \iint f_{\xi,\eta}(x, y) \log \frac{f_{\xi|\eta}(x|y)}{f_\xi(x)}\,dx\,dy

= h(\xi) - h(\xi|\eta)

= h(\eta) - h(\eta|\xi)

= h(\xi) + h(\eta) - h(\xi, \eta)    (2.47)
Mutual information is symmetric and nonnegative:

I(\xi; \eta) = I(\eta; \xi)    (2.48)

I(\xi; \eta) \ge 0    (2.49)

It is the Kullback-Leibler distance between the joint density f_{\xi,\eta}(x, y)
(density #1) and the product of the marginals f_\xi(x) f_\eta(y) (density #2):

I(\xi; \eta) = D_{KL}\left( f_{\xi,\eta}(x, y) \,\|\, f_\xi(x) f_\eta(y) \right)    (2.50)

The larger the K-L distance, the less independent \xi and \eta are, because
f_{\xi,\eta}(x, y) and f_\xi(x) f_\eta(y) are farther apart => I is large. See also
the diagram in CF 2/2 or in Haykin p. 494 (Fig. 10.1) showing insightful
relations among mutual information, entropy, and conditional and joint entropy.
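A sketch estimating (2.46) by a 2-D histogram for a correlated Gaussian pair, compared against the known bivariate-Gaussian value -\frac{1}{2}\ln(1-\rho^2); bin count and sample size are arbitrary choices:

import numpy as np

# Histogram estimate of the mutual information (2.46).
rng = np.random.default_rng(2)
rho = 0.8
n = 500_000
xi = rng.standard_normal(n)
eta = rho * xi + np.sqrt(1 - rho**2) * rng.standard_normal(n)

pxy, _, _ = np.histogram2d(xi, eta, bins=60)
pxy /= pxy.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0                                            # skip empty cells (0 log 0 = 0)
mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz]))  # discrete version of (2.46)
print(mi, -0.5 * np.log(1 - rho**2))                    # both ~0.51 nats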
Additional properties of random variables
If \xi and \eta are statistically independent, their joint pdf factorizes,

f_{\xi,\eta}(x, y) = f_\xi(x) f_\eta(y)    (2.51)

and hence for any (integrable) functions g and h

E[g(\xi) h(\eta)] = \iint g(x) h(y) f_{\xi,\eta}(x, y)\,dx\,dy

= \int g(x) f_\xi(x)\,dx \int h(y) f_\eta(y)\,dy

= E[g(\xi)]\, E[h(\eta)]    (2.52)
Uncorrelatedness vs independence

Statistical independence is a stronger property than uncorrelatedness.
\xi and \eta are uncorrelated iff

E[\xi\eta] = E[\xi] E[\eta]    (2.53)

i.e., iff their covariance (2.14) vanishes, c_{\xi\eta} = 0    (2.54)

or equivalently, for random vectors,

R_{xy} = E[x y^T] = E[x] E[y^T] = M_x^{(1)} M_y^{(1)T}    (2.55)
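A compact counterexample sketch: with \eta = \xi^2 for a symmetric \xi, (2.53) holds while (2.52) fails, so the pair is uncorrelated but not independent.

import numpy as np

# Uncorrelated does not imply independent: eta is a deterministic
# function of xi, yet their covariance vanishes by symmetry.
rng = np.random.default_rng(3)
xi = rng.standard_normal(1_000_000)
eta = xi**2
print(np.mean(xi * eta) - np.mean(xi) * np.mean(eta))        # ~0: (2.53) holds
print(np.mean(xi**2 * eta) - np.mean(xi**2) * np.mean(eta))  # ~2, not 0: (2.52) fails with g = h = square/identity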
Whiteness

Random vectors x satisfying

M_x^{(1)} = 0    (2.56)

C_x = R_x = I    (2.57)

i.e., having zero mean and unit covariance (and hence unit correlation)
matrix, are called white. Whiteness is a stronger property than
uncorrelatedness but weaker than statistical independence.
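A minimal whitening sketch (the mixing matrix A and sample size are arbitrary examples): eigendecompose the sample covariance and map the data through C_x^{-1/2} so that (2.56)-(2.57) hold.

import numpy as np

# Whiten correlated 2-D data: after the transform, the sample
# covariance should be approximately the identity matrix (2.57).
rng = np.random.default_rng(4)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
x = rng.standard_normal((100_000, 2)) @ A.T   # correlated data, rows = samples
x -= x.mean(axis=0)                           # enforce zero mean (2.56)

C = np.cov(x, rowvar=False)                   # sample covariance C_x
evals, E = np.linalg.eigh(C)
V = E @ np.diag(evals**-0.5) @ E.T            # whitening matrix C_x^{-1/2}
z = x @ V.T
print(np.cov(z, rowvar=False).round(3))       # ~ identity matrix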
Examples of densities

Uniform:

f(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{elsewhere} \end{cases}    (2.58)

Gaussian:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}    (2.59)

Laplacian:

f(x) = \frac{\lambda}{2}\, e^{-\lambda|x|}, \quad \lambda > 0    (2.60)
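As a quick tie-back to (2.11), a sketch of the sample kurtosis of these three densities: the uniform is sub-Gaussian (negative kurtosis), the Gaussian is zero, and the Laplacian is super-Gaussian (positive kurtosis).

import numpy as np

# Normalized kurtosis (2.11) of samples from the densities (2.58)-(2.60).
rng = np.random.default_rng(5)
n = 1_000_000

def kurtosis(s):
    z = (s - s.mean()) / s.std()
    return (z**4).mean() - 3

print(kurtosis(rng.uniform(-1, 1, n)))   # ~ -1.2 (uniform)
print(kurtosis(rng.standard_normal(n)))  # ~  0   (Gaussian)
print(kurtosis(rng.laplace(size=n)))     # ~  3   (Laplacian)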