Review of Statistics and Information Theory Basics
Lecture 2, COMP / ELEC / STAT 502
© E. Merényi
January 12, 2012

Outline

Probability basics
  Random variables, CDF, pdf, moments
  Covariance, joint CDF and pdf
Information theory basics
  Information, entropy, differential entropy
  Maximum Entropy principle
  Further entropy definitions
  Mutual information and relation to entropy
Additional properties of random variables

Probability basics

CDF of random variable


Let $\xi, \eta$ be continuous scalar random variables (r.v.-s).

(Cumulative) probability distribution function (CDF):

$$F_\xi(x) = \Pr(\xi \le x) = \int_{-\infty}^{x} f_\xi(u)\,du \qquad \text{and} \qquad F_\eta(y) = \Pr(\eta \le y) = \int_{-\infty}^{y} f_\eta(v)\,dv \qquad (2.1)$$

where

$$F'_\xi(x) = f_\xi(x) \qquad \text{and} \qquad F'_\eta(y) = f_\eta(y) \qquad (2.2)$$

exist almost everywhere for continuous r.v.-s by definition.

$F_\xi(x)$ is monotonically non-decreasing, right-continuous, and $F_\xi(-\infty) = 0$, $F_\xi(\infty) = 1$.
pdf of random variable


Probability density function (pdf)

$f_\xi(x)$ and $f_\eta(y)$ are the pdfs of $\xi$ and $\eta$, respectively. Then

$$0 \le f_\xi(x), \quad \int_{-\infty}^{\infty} f_\xi(x)\,dx = 1 \qquad \text{and} \qquad 0 \le f_\eta(y), \quad \int_{-\infty}^{\infty} f_\eta(y)\,dy = 1 \qquad (2.3)$$

and $f_\xi(x)$ and $f_\eta(y)$ are continuous almost everywhere.

Interpretation of $f_\xi(x)$ as probability:

$$f_\xi(x) = \lim_{h \to 0} \frac{F_\xi(x + h/2) - F_\xi(x - h/2)}{h} = \lim_{h \to 0} \frac{\Pr[\xi \le x + h/2] - \Pr[\xi \le x - h/2]}{h} = \lim_{h \to 0} \frac{\Pr[x - h/2 < \xi \le x + h/2]}{h} \qquad (2.4)$$

The larger $f_\xi(x)$, the larger the probability that $\xi$ falls close to $x$.
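As a quick numerical sanity check of eqs. (2.2) and (2.4), here is a minimal sketch (not part of the lecture) comparing a centered finite difference of a Gaussian CDF with the Gaussian pdf; the parameters, probe points, and step size h are arbitrary illustrative choices.

```python
# Sketch: verify numerically that the pdf is the derivative of the CDF, eqs. (2.2)/(2.4).
# Assumes SciPy is available; the Gaussian parameters below are arbitrary examples.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0          # example mean and standard deviation
x = np.linspace(-5, 7, 9)     # a few probe points
h = 1e-5                      # small bin width, as in eq. (2.4)

# Centered finite difference of the CDF: (F(x + h/2) - F(x - h/2)) / h
fd = (norm.cdf(x + h/2, mu, sigma) - norm.cdf(x - h/2, mu, sigma)) / h

print(np.max(np.abs(fd - norm.pdf(x, mu, sigma))))   # nearly zero: matches f(x)
```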

Moments of random variable


The $k$th moment $M_\xi^{(k)}$ of $\xi$ is defined as the expected value of the $k$th power of the variable:

$$M_\xi^{(k)} = E[\xi^k] = \int_{-\infty}^{\infty} x^k \, dF_\xi(x) = \int_{-\infty}^{\infty} x^k f_\xi(x)\,dx \qquad (2.5)$$

The $k$th central moment $m_\xi^{(k)}$ is defined similarly, except for a shift to obtain zero mean (subtracting $M_\xi^{(1)}$):

$$m_\xi^{(k)} = \int_{-\infty}^{\infty} (x - E[\xi])^k \, dF_\xi(x) = \int_{-\infty}^{\infty} (x - E[\xi])^k f_\xi(x)\,dx \qquad (2.6)$$


Quantities of interest related to the first four moments, as estimated
from a sample, are

$$\text{Mean} = M_\xi^{(1)} = \frac{1}{n}\sum_{i=1}^{n} \xi_i \qquad (2.7)$$

$$\text{Variance} = m_\xi^{(2)} = \frac{1}{n-1}\sum_{i=1}^{n} \left(\xi_i - M_\xi^{(1)}\right)^2 \qquad (2.8)$$

$$\text{Standard Deviation (STD)} = \sqrt{\text{Variance}} \qquad (2.9)$$

$$\text{Skewness} = \frac{1}{n}\sum_{i=1}^{n} \left(\frac{\xi_i - M_\xi^{(1)}}{\text{STD}}\right)^3 = \frac{m_\xi^{(3)}}{\left(m_\xi^{(2)}\right)^{3/2}} \qquad (2.10)$$

$$\text{(Normalized) Kurtosis} = \frac{1}{n}\sum_{i=1}^{n} \left(\frac{\xi_i - M_\xi^{(1)}}{\text{STD}}\right)^4 - 3 = \frac{m_\xi^{(4)}}{\left(m_\xi^{(2)}\right)^2} - 3 \qquad (2.11)$$
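A small sketch of how these sample estimates (2.7)-(2.11) might be computed with NumPy/SciPy; the sample itself and all variable names are invented for illustration, and SciPy's estimators are used only as a cross-check.

```python
# Sketch: sample estimates of the quantities in eqs. (2.7)-(2.11).
# Assumes NumPy and SciPy; the Laplacian sample below is an arbitrary example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
xi = rng.laplace(loc=2.0, scale=1.0, size=100_000)   # example sample

mean = xi.mean()                                     # (2.7)
var  = xi.var(ddof=1)                                # (2.8), 1/(n-1) normalization
std  = np.sqrt(var)                                  # (2.9)
skew = np.mean(((xi - mean) / std) ** 3)             # (2.10)
kurt = np.mean(((xi - mean) / std) ** 4) - 3         # (2.11), normalized (excess) kurtosis

# Cross-check against SciPy's built-in estimators
print(skew, stats.skew(xi))          # both near 0 for a symmetric density
print(kurt, stats.kurtosis(xi))      # both near 3 for a Laplacian sample
```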

Kurtosis is often used as a quantitative measure of non-Gaussianity. If $\mathrm{kurt}(\xi)$ denotes the kurtosis of $\xi$,

$$\mathrm{kurt}(\xi)\; \begin{cases} < 0 & \text{for a subgaussian (platykurtic) distribution} \\ = 0 & \text{for a Gaussian (mesokurtic) distribution} \\ > 0 & \text{for a supergaussian (leptokurtic) distribution} \end{cases} \qquad (2.12)$$
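To illustrate (2.12), a hypothetical sketch comparing the excess kurtosis of samples from a uniform, a Gaussian, and a Laplacian distribution; the sample sizes and parameters are arbitrary.

```python
# Sketch: sign of the (normalized) kurtosis for sub-, meso- and supergaussian samples, eq. (2.12).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
n = 200_000
samples = {
    "uniform (subgaussian)":     rng.uniform(-1, 1, n),   # expected excess kurtosis -1.2
    "gaussian (mesokurtic)":     rng.normal(0, 1, n),     # expected 0
    "laplacian (supergaussian)": rng.laplace(0, 1, n),    # expected +3
}
for name, x in samples.items():
    print(f"{name:28s} kurt = {kurtosis(x):+.2f}")        # default fisher=True: normalized kurtosis
```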

Covariance of random variables


The covariance of random variables $\xi$ and $\eta$ is defined as

$$c_{\xi\eta} = E\!\left[(\xi - M_\xi^{(1)})(\eta - M_\eta^{(1)})\right] \qquad (2.13)$$

or, in general, for the elements $x_i$ of a random vector $\mathbf{x}$,

$$c_{ij} = E\!\left[(x_i - M_{x_i}^{(1)})(x_j - M_{x_j}^{(1)})\right] \qquad (2.14)$$

or

$$\mathbf{C}_x = E\!\left[(\mathbf{x} - \mathbf{M}_x^{(1)})(\mathbf{x} - \mathbf{M}_x^{(1)})^T\right] \qquad (2.15)$$

Similarly, the cross-covariance of random vectors $\mathbf{x}, \mathbf{y}$ is

$$\mathbf{C}_{xy} = E\!\left[(\mathbf{x} - \mathbf{M}_x^{(1)})(\mathbf{y} - \mathbf{M}_y^{(1)})^T\right] \qquad (2.16)$$
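A short NumPy sketch of (2.15)-(2.16), computing a sample covariance and cross-covariance matrix; the data, mixing matrix, and names are made up for illustration.

```python
# Sketch: sample covariance C_x (2.15) and cross-covariance C_xy (2.16).
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=n)     # shape (n, 2)
y = x @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(0, 0.1, (n, 2))   # correlated with x

xc = x - x.mean(axis=0)           # subtract M_x^(1)
yc = y - y.mean(axis=0)           # subtract M_y^(1)

C_x  = xc.T @ xc / (n - 1)        # covariance matrix, eq. (2.15)
C_xy = xc.T @ yc / (n - 1)        # cross-covariance matrix, eq. (2.16)

print(np.allclose(C_x, np.cov(x, rowvar=False)))   # True: matches NumPy's estimator
print(C_xy)
```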

Joint CDF and pdf of random variables


Joint CDF and pdf of (continuous) random variables $\xi$ and $\eta$:

$$F_{\xi,\eta}(x, y) = \Pr(\xi \le x,\ \eta \le y) \qquad (2.17)$$

$$F_{\xi,\eta}(x, y) = \int_{-\infty}^{x}\!\int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,du\,dv \qquad (2.18)$$

where

$$f_{\xi,\eta}(x, y) = \frac{\partial^2 F_{\xi,\eta}(x, y)}{\partial x\,\partial y} \qquad (2.19)$$

exists almost everywhere and is the joint pdf of $\xi$ and $\eta$. Properties of $F_{\xi,\eta}(x, y)$ and $f_{\xi,\eta}(x, y)$ are analogous to eqs. (2.1)-(2.3).

Marginal distributions

$$F_\xi(x) = F_{\xi,\eta}(x, \infty) = \int_{-\infty}^{x}\!\int_{-\infty}^{\infty} f_{\xi,\eta}(u, v)\,dv\,du \qquad (2.20)$$

$$F_\eta(y) = F_{\xi,\eta}(\infty, y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,dv\,du \qquad (2.21)$$

Statistical independence
A pair of jointly distributed random variables $\xi$ and $\eta$ are statistically independent iff their joint CDF factorizes into the product of the marginal CDFs:

$$F_{\xi,\eta}(x, y) = F_\xi(x)\, F_\eta(y) \qquad (2.22)$$

For two jointly continuous random variables $\xi$ and $\eta$, equivalently,

$$f_{\xi,\eta}(x, y) = f_\xi(x)\, f_\eta(y) \qquad (2.23)$$

since then

$$F_{\xi,\eta}(x, y) = \int_{-\infty}^{x}\!\int_{-\infty}^{y} f_{\xi,\eta}(u, v)\,du\,dv = \int_{-\infty}^{x} f_\xi(u)\,du \int_{-\infty}^{y} f_\eta(v)\,dv = F_\xi(x)\, F_\eta(y) \qquad (2.24)$$

(Prove the other direction at home.)


Conditional probability

$$F_{\xi|\eta}(x|y) = F_{\xi,\eta}(x, y)\,/\,F_\eta(y) \qquad (2.25)$$

$$f_{\xi|\eta}(x|y) = f_{\xi,\eta}(x, y)\,/\,f_\eta(y) \qquad (2.26)$$

Obviously, $F_{\xi|\eta}(x|y) = F_\xi(x)$ iff $\xi$ and $\eta$ are independent. Equivalently, $f_{\xi|\eta}(x|y) = f_\xi(x)$ iff $\xi$ and $\eta$ are independent.

$$f_\xi(x) = \int_{-\infty}^{\infty} f_{\xi|\eta}(x|y)\, f_\eta(y)\,dy$$

$$E[\xi\,|\,\eta] = \int_{-\infty}^{\infty} x\, f_{\xi|\eta}(x|y)\,dx$$

Information theory basics

Information
Let $X$ be a discrete random variable taking its values from $H = \{x_1, x_2, \ldots, x_n\}$. Let $p_i$ be the probability of $x_i$, with $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. The amount of information in observing the event $X = x_i$ was defined by Shannon as

$$I(x_i) = -\log_a(p_i) \qquad (2.27)$$

$$a = 2:\ I \text{ is given in bits}, \qquad a = e:\ I \text{ is given in nats}, \qquad a = 10:\ I \text{ is given in digits}. \qquad (2.28)$$

Shannon entropy
The expected value of $I(x_i)$ over the entire set of events $H$ is the entropy of $X$:

$$H(X) = -\sum_{i=1}^{n} p_i \log(p_i) \qquad (2.29)$$

(Note that $X$ in $H(X)$ is not an argument of a function but the label of the random variable.) The convention $0 \log 0 = 0$ is used. The entropy is the average amount of information gained by observing one event from all possible events for the random variable $X$ (the average information per message), the average length of the shortest description of a random variable, the amount of uncertainty, and a measure of the randomness of $X$.

Some basic properties of $H(X)$:

$$H(X) = 0 \quad \text{iff } p_i = 1 \text{ for some } i \text{ and } p_k = 0 \text{ for all } k \ne i. \qquad (2.30)$$

(If an event occurs with probability 1, then there is no uncertainty.)

$$0 \le H(X) \le \log|H|, \quad \text{with equality on the right iff } p_i = 1/|H| = 1/n \text{ for all } i. \qquad (2.31)$$

(The uniform distribution carries maximum uncertainty.)
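A brief sketch of (2.29)-(2.31) for a discrete distribution; the helper function and the example distributions are illustrative, not from the lecture.

```python
# Sketch: Shannon entropy (2.29) and its extreme cases (2.30)-(2.31).
import numpy as np

def shannon_entropy(p, base=2):
    """H(X) = -sum_i p_i log(p_i), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # drop zero-probability events
    return -np.sum(p * np.log(p)) / np.log(base)

n = 8
print(shannon_entropy([1.0] + [0.0] * (n - 1)))    # 0.0 bits: a sure event, eq. (2.30)
print(shannon_entropy(np.full(n, 1 / n)))          # 3.0 bits = log2(8): uniform, eq. (2.31)
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits, somewhere in between
```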

Differential entropy
The differential entropy of a continuous random variable. A quantity analogous to the entropy of a discrete random variable can be defined for continuous random variables.

Reminder: The random variable $\xi$ is continuous if its distribution function $F_\xi(x)$ is continuous.

Definition: Let $f_\xi(x) = F'_\xi(x)$ be the pdf of $\xi$. The set $S = \{x \mid f_\xi(x) > 0\}$ is called the support set of $\xi$. The differential entropy of $\xi$ is defined as

$$h(\xi) = -\int_{S} f_\xi(x) \log f_\xi(x)\,dx = -E[\log f_\xi(x)] \qquad (2.32)$$

where $S$ is the support set of $\xi$. The lower-case letter $h$ is used to distinguish differential entropy from the discrete (or absolute) entropy $H$.

Discrete vs differential entropy

The differential entropy of a continuous variable $\xi$ can be derived as a limiting case of the discrete entropy of the quantized version of $\xi$. (See, for example, Cover and Thomas, p. 228.) If $\Delta = 2^{-n}$ is the bin size of the quantization, and the quantized version of $\xi$ is denoted by $\xi^\Delta$, then

$$H(\xi^\Delta) + \log \Delta \to h(\xi) \quad \text{as } n \to \infty \ (\Delta \to 0) \qquad (2.33)$$

That is, the absolute entropy of an $n$-bit quantization of a continuous random variable is approximately $h(\xi) - \log 2^{-n} = h(\xi) + n$. It follows that the absolute entropy of a continuous (non-quantized) variable is $\infty$.
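A numerical sketch of (2.33) for a standard Gaussian, whose differential entropy has the closed form $\tfrac{1}{2}\log_2(2\pi e\sigma^2)$ bits; the bin sizes, integration range, and names are chosen only for illustration.

```python
# Sketch: H(quantized xi) + log2(Delta) approaches h(xi) as Delta -> 0, eq. (2.33).
import numpy as np
from scipy.stats import norm

h_closed_form = 0.5 * np.log2(2 * np.pi * np.e)      # h of N(0,1) in bits, ~2.047

for n_bits in (2, 4, 6, 8):
    delta = 2.0 ** (-n_bits)                          # bin size Delta = 2^-n
    edges = np.arange(-10, 10 + delta, delta)         # covers essentially all the mass
    p = np.diff(norm.cdf(edges))                      # bin probabilities of the quantized r.v.
    p = p[p > 0]
    H_quantized = -np.sum(p * np.log2(p))             # discrete entropy in bits, ~ h + n
    print(n_bits, H_quantized + np.log2(delta), h_closed_form)
```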

The Maximum Entropy principle

Consider the following: suppose we have a stochastic system with a set of known states but unknown probabilities, and we also know some constraints on the distribution of the states. How can we choose the probability model that is optimum in some sense among the possibly infinite number of models that may satisfy the constraints?

The Maximum Entropy principle (due to Jaynes, 1957) states that when an inference is made on the basis of incomplete information, it should be drawn from the probability distribution that maximizes the entropy, subject to the constraints on the distribution.

Finding the Maximum Entropy distribution


Solving this problem, in general, involves the maximization of the differential entropy

$$h(\xi) = -\int_{-\infty}^{\infty} f_\xi(x) \log f_\xi(x)\,dx \qquad (2.34)$$

over all pdfs $f_\xi(x)$ of the r.v. $\xi$, subject to the following constraints:

$$f_\xi(x) \ge 0 \qquad (2.35)$$

$$\int_{-\infty}^{\infty} f_\xi(x)\,dx = 1 \qquad (2.36)$$

$$\int_{-\infty}^{\infty} f_\xi(x)\, g_i(x)\,dx = \alpha_i \quad \text{for } i = 2, \ldots, m \qquad (2.37)$$

where $f_\xi(x) = 0$ holds outside of the support set of $\xi$, and the $\alpha_i$ encode the prior knowledge about $\xi$.

Using the Lagrange multipliers $\lambda_i$ $(i = 1, 2, \ldots, m)$, formulate the objective function

$$J(f) = \int_{-\infty}^{\infty} \left[ -f_\xi(x) \log f_\xi(x) + \lambda_1 f_\xi(x) + \sum_{i=2}^{m} \lambda_i g_i(x) f_\xi(x) \right] dx \qquad (2.38)$$

Differentiating the integrand with respect to $f$ and setting the result to 0 produces

$$-1 - \log f_\xi(x) + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) = 0 \qquad (2.39)$$


Solving this for $f_\xi(x)$ yields the maximum entropy distribution for the given constraints:

$$f_\xi(x) = \exp\!\left( -1 + \lambda_1 + \sum_{i=2}^{m} \lambda_i g_i(x) \right) \qquad (2.40)$$

Example: If the prior knowledge consists of the mean $\mu$ and the variance $\sigma^2$ of $\xi$, then

$$\int_{-\infty}^{\infty} f_\xi(x)\,(x - \mu)^2\,dx = \sigma^2 \qquad (2.41)$$

and from the constraints it follows that $g_2(x) = (x - \mu)^2$ and $\alpha_2 = \sigma^2$. The Maximum Entropy distribution for this case is the Gaussian distribution, as shown on page 28 of Colin Fyfe 2 (CF2), or in Haykin, Chapter 10, page 491.
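As an illustrative numerical check (not part of the lecture), the sketch below compares the differential entropy of a Gaussian, a Laplacian, and a uniform density, all scaled to zero mean and unit variance; consistent with the Maximum Entropy result above, the Gaussian comes out largest. The helper function and integration limits are arbitrary choices.

```python
# Sketch: among zero-mean, unit-variance densities, the Gaussian has the largest h(xi).
import numpy as np
from scipy import stats
from scipy.integrate import quad

def diff_entropy(pdf, lo, hi):
    """h = -integral of pdf(x) * log(pdf(x)) over the support, in nats."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    return quad(integrand, lo, hi)[0]

candidates = {
    "gaussian":  (stats.norm(scale=1.0).pdf, -12, 12),
    "laplacian": (stats.laplace(scale=1 / np.sqrt(2)).pdf, -12, 12),          # var = 2 b^2 = 1
    "uniform":   (stats.uniform(-np.sqrt(3), 2 * np.sqrt(3)).pdf,
                  -np.sqrt(3), np.sqrt(3)),                                   # var = 1
}
for name, (pdf, lo, hi) in candidates.items():
    print(f"{name:10s} h = {diff_entropy(pdf, lo, hi):.4f} nats")
# Expected: gaussian ~1.419 > laplacian ~1.347 > uniform ~1.242
```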
Joint differential entropy

$$h(\xi, \eta) = -\int\!\!\int f_{\xi,\eta}(u, v) \log f_{\xi,\eta}(u, v)\,du\,dv \qquad (2.42)$$

For more details on joint differential entropy, see Cover and Thomas (C-T), p. 230.

Conditional differential entropy


In CF 2/2, you have seen the formula of the conditional entropy of
discrete random variable X given Y . It represents the amount of
uncertainty remaining about the system input X after the system output
Y has been observed. Analogous to that is the conditional differential
entropy for continuous random variables:
Z Z
f, (x, y )logf, (x|y )dxdy
(2.43)
h(|) =

Since f, (x|y ) = f, (x, y )/f (y )


h(|) = h(, ) h().

c E. Mer
Lecture 2, COMP / ELEC / STAT 502
enyi

(2.44)

Relative entropy (Kullback-Leibler distance)


The relative entropy is an entropy-based measure of the difference between two pdfs $f(x)$ and $g(x)$, defined as

$$D(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)}\,dx \qquad (2.45)$$

Note that $D(f \,\|\, g)$ is finite only if the support set of $f$ is contained in the support set of $g$. The K-L distance is a measure of the penalty (error) for assuming $g(x)$ when in reality the density function of $\xi$ is $f(x)$. The convention $0 \log \frac{0}{0} = 0$ is used.

It follows (and can be proven) that $D(f \,\|\, g) \ge 0$, with equality iff $f(x) = g(x)$ almost everywhere. The relative entropy or K-L distance is also called the K-L divergence.
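A minimal sketch of (2.45) for discrete distributions, where the integral becomes a sum; the two example distributions are made up, and SciPy's `entropy` is used only as a cross-check.

```python
# Sketch: discrete K-L distance D(f || g) = sum_i f_i * log(f_i / g_i), cf. eq. (2.45).
import numpy as np
from scipy.stats import entropy

f = np.array([0.5, 0.3, 0.2])        # "true" distribution
g = np.array([0.4, 0.4, 0.2])        # assumed model

D_fg = np.sum(f * np.log(f / g))     # penalty for assuming g when f is true
D_gf = np.sum(g * np.log(g / f))     # not symmetric, so not a true metric

print(D_fg, entropy(f, g))           # matches scipy.stats.entropy(pk, qk), in nats
print(D_gf)                          # different value: D(f||g) != D(g||f)
```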

Mutual information
Mutual information is the measure of the amount of information one random variable contains about another:

$$I(\xi; \eta) = \int\!\!\int f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x)\, f_\eta(y)}\,dx\,dy \qquad (2.46)$$

The concept of mutual information is of profound importance in the design of self-organizing systems, where the objective is to develop an algorithm that can learn an input-output relationship of interest on the basis of the input patterns alone. The Maximum Mutual Information principle of Linsker (1988) states that the synaptic connections of a multilayered ANN develop so as to maximize the amount of information that is preserved when signals are transformed at each processing stage of the network, subject to certain constraints. This is based on earlier ideas that came from the recognition that encoding of data from a scene for the purpose of redundancy reduction is related to the identification of specific features in the scene (by the biological perceptual machinery).
Properties of Mutual Information


$$\begin{aligned}
I(\xi; \eta) &= \int\!\!\int f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\xi(x)\, f_\eta(y)}\,dx\,dy \\
&= \int\!\!\int f_{\xi,\eta}(x, y) \log \frac{f_{\xi,\eta}(x, y)}{f_\eta(y)}\,dx\,dy - \int\!\!\int f_{\xi,\eta}(x, y) \log f_\xi(x)\,dx\,dy \\
&= -h(\xi|\eta) - \int \left( \int f_{\xi,\eta}(x, y)\,dy \right) \log f_\xi(x)\,dx \\
&= -h(\xi|\eta) - \int f_\xi(x) \log f_\xi(x)\,dx \\
&= h(\xi) - h(\xi|\eta) \\
&= h(\eta) - h(\eta|\xi) \\
&= h(\xi) + h(\eta) - h(\xi, \eta)
\end{aligned} \qquad (2.47)$$

$$I(\xi; \eta) = I(\eta; \xi) \qquad (2.48)$$

$$I(\xi; \eta) \ge 0 \qquad (2.49)$$

$$I(\xi; \eta) = D\big(f_{\xi,\eta}(x, y) \,\|\, f_\xi(x)\, f_\eta(y)\big) \qquad (2.50)$$

The last equation expresses mutual information as a special case of the K-L distance, with $f_{\xi,\eta}(x, y)$ as density #1 and $f_\xi(x)\, f_\eta(y)$ as density #2.

The larger the K-L distance, the less independent $\xi$ and $\eta$ are, because $f_{\xi,\eta}(x, y)$ and $f_\xi(x)\, f_\eta(y)$ are farther apart, i.e., $I$ is large. See also the diagram in CF 2/2 or in Haykin, p. 494 (Fig. 10.1), showing insightful relations among mutual information, entropy, and conditional and joint entropy.
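To make (2.46)-(2.50) concrete for a discrete joint distribution, here is a small sketch (all numbers invented) that computes the mutual information as a K-L distance and checks the identity $I(\xi;\eta) = H(\xi) + H(\eta) - H(\xi,\eta)$ from (2.47).

```python
# Sketch: mutual information of a discrete joint distribution, checking eqs. (2.47) and (2.50).
import numpy as np

def H(p):
    """Discrete Shannon entropy in nats, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Example joint probability table P[xi = i, eta = j] (rows: xi, columns: eta)
P = np.array([[0.30, 0.10],
              [0.05, 0.25],
              [0.05, 0.25]])

p_xi, p_eta = P.sum(axis=1), P.sum(axis=0)   # marginals

# I(xi; eta) computed as D(P || p_xi p_eta), which is exactly eq. (2.50) / (2.46)
Q = np.outer(p_xi, p_eta)                    # product of marginals
mask = P > 0
I_kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))

# I(xi; eta) computed from entropies, eq. (2.47)
I_entropy = H(p_xi) + H(p_eta) - H(P)

print(I_kl, I_entropy)                       # the two values agree
```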

Additional properties of random variables

Sufficient condition of independence

Statistically independent random variables satisfy the following:

$$E[g(\xi)\, h(\eta)] = E[g(\xi)]\, E[h(\eta)] \qquad (2.51)$$

where $g(\xi)$ and $h(\eta)$ are any absolutely integrable functions of $\xi$ and $\eta$, respectively. This follows from

$$E[g(\xi)\, h(\eta)] = \int\!\!\int g(u)\, h(v)\, f_{\xi,\eta}(u, v)\,du\,dv = \int g(u)\, f_\xi(u)\,du \int h(v)\, f_\eta(v)\,dv = E[g(\xi)]\, E[h(\eta)] \qquad (2.52)$$

Uncorrelatedness vs independence

Statistical independence is a stronger property than uncorrelatedness. $\xi$ and $\eta$ are uncorrelated iff

$$E[\xi\eta] = E[\xi]\, E[\eta] \qquad (2.53)$$

In general, two random vectors $\mathbf{x}, \mathbf{y}$ are uncorrelated iff

$$\mathbf{C}_{xy} = E\!\left[(\mathbf{x} - \mathbf{M}_x^{(1)})(\mathbf{y} - \mathbf{M}_y^{(1)})^T\right] = \mathbf{0} \qquad (2.54)$$

or, equivalently,

$$\mathbf{R}_{xy} = E[\mathbf{x}\mathbf{y}^T] = E[\mathbf{x}]\, E[\mathbf{y}^T] = \mathbf{M}_x^{(1)} \left(\mathbf{M}_y^{(1)}\right)^T \qquad (2.55)$$

where $\mathbf{R}_{xy}$ is the correlation matrix of $\mathbf{x}, \mathbf{y}$. For zero mean, the correlation matrix is the same as the covariance matrix.

Uncorrelatedness (cont'd), and whiteness

As a special case, the components of a random vector $\mathbf{x}$ are mutually uncorrelated iff

$$\mathbf{C}_x = E\!\left[(\mathbf{x} - \mathbf{M}_x^{(1)})(\mathbf{x} - \mathbf{M}_x^{(1)})^T\right] = \mathbf{D} = \mathrm{Diag}(c_{11}, c_{22}, \ldots, c_{nn}) = \mathrm{Diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2) \qquad (2.56)$$

Random vectors with

$$\mathbf{M}_x^{(1)} = \mathbf{0} \quad \text{and} \quad \mathbf{C}_x = \mathbf{R}_x = \mathbf{I} \qquad (2.57)$$

i.e., having zero mean and unit covariance (and hence unit correlation) matrix, are called white. Whiteness is a stronger property than uncorrelatedness but weaker than statistical independence.
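Whitening itself is not spelled out in the slide, but as an illustrative sketch (all data and names invented), correlated samples can be transformed to satisfy (2.57) by centering and applying an inverse square root of their covariance matrix:

```python
# Sketch: whitening a random vector so that it has zero mean and unit covariance, eq. (2.57).
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.multivariate_normal([1.0, -2.0], [[4.0, 1.5], [1.5, 1.0]], size=n)  # correlated data

xc = x - x.mean(axis=0)                     # remove the mean M_x^(1)
C = np.cov(xc, rowvar=False)                # sample covariance C_x

# Eigendecomposition-based whitening matrix W = C^{-1/2}
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

z = xc @ W.T                                # whitened samples
print(np.round(z.mean(axis=0), 3))          # ~[0, 0]
print(np.round(np.cov(z, rowvar=False), 3)) # ~identity matrix
```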

Density functions of frequently used distributions


Uniform:

$$f_\xi(x) = \begin{cases} \dfrac{1}{b - a} & x \in [a, b] \\ 0 & \text{elsewhere} \end{cases} \qquad (2.58)$$

Gaussian:

$$f_\xi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2} \qquad (2.59)$$

where $\mu$ and $\sigma^2$ denote the mean and variance, respectively.

Laplacian:

$$f_\xi(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}, \qquad \lambda > 0 \qquad (2.60)$$

Exponential power family of density functions

These density functions are special cases of the generalized Gaussian, or exponential power, family, which has the general formula (for zero mean)

$$f_\xi(x) = C \exp\!\left( -\frac{|x|^\alpha}{E[|x|^\alpha]} \right) \qquad (2.61)$$

As is easy to verify, $\alpha = 2$ yields the Gaussian and $\alpha = 1$ the Laplacian density function (cf. eqs. (2.59) and (2.60)), while $\alpha \to \infty$ gives the uniform. Values of $\alpha < 2$ give rise to supergaussian densities, $\alpha > 2$ to subgaussian ones.
Uncorrelated but not independent example

Example of uncorrelated but not independent random variables:

Assume that $\xi$ and $\eta$ are discrete valued and their joint probabilities are 0.25 for each of $(0,1)$, $(0,-1)$, $(1,0)$ and $(-1,0)$. Then it is easy to calculate that $\xi$ and $\eta$ are uncorrelated. However, they are not independent, since

$$E[\xi^2 \eta^2] = 0 \ne 0.25 = E[\xi^2]\, E[\eta^2] \qquad (2.62)$$
Review of Statistics and Information Theory Basics

You might also like