
Stochastic Processes for Physicists

Understanding Noisy Systems

Chapter 1: A review of probability theory


Paul Kirk
Division of Molecular Biosciences, Imperial College London

19/03/2013

1.1 Random variables and mutually exclusive events


Random variables
Suppose we do not know the precise value of a variable, but may have an
idea of the relative likelihood that it will have one of a number of
possible values.
Let us call the unknown quantity X.
This quantity is referred to as a random variable.

Probability
Consider a 6-sided die. Let X be the value we get when we roll the die.
Describe the likelihood that X will have each of the values 1, . . . , 6
by a number between 0 and 1: the probability.

If Prob(X = 3) = 1, then we will always get a 3.
If Prob(X = 3) = 2/3, then we expect to get a 3 about two-thirds of the time.

1.1 Random variables and mutually exclusive events


Mutually exclusive events
The various values of X are an example of mutually exclusive events.
X can take precisely one of the values between 1 and 6.
Mutually exclusive probabilities sum:

Prob(X = 3 or X = 4) = Prob(X = 3) + Prob(X = 4).
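
A tiny simulation (my addition, not from the slides) illustrating both points: each face has a probability between 0 and 1, and the probabilities of the mutually exclusive outcomes 3 and 4 add.

```python
# Illustrative sketch (not from the slides): estimate Prob(X = 3 or X = 4) for a
# fair 6-sided die by simulation and compare with Prob(X = 3) + Prob(X = 4) = 1/3.
import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=1_000_000)   # die values 1..6, each with prob 1/6

p3 = np.mean(rolls == 3)
p4 = np.mean(rolls == 4)
p3_or_4 = np.mean((rolls == 3) | (rolls == 4))
print(p3 + p4, p3_or_4)   # both ≈ 1/3
```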



1.1 Random variables and mutually exclusive events


Note: in mathematics texts it is customary to denote the unknown
quantity using a capital letter, say X, and a variable that specifies one
of the possible values that X may have as the equivalent lower-case
letter, x. We will use this convention in this chapter, but in the
following chapters we will use a lower-case letter for both the unknown
quantity and the values it can take, since it causes no confusion.

So, rather than writing Prob(X = 3) or Prob(X = x), we will (in later
chapters) tend to write Prob(3) or Prob(x).

Warning: may cause confusion.


1.1 Random variables and mutually exclusive events


Continuous random variables
For continuous random variables, the probability for X to be within a
range is found by integrating the probability density function:

Prob(a < X < b) = ∫_a^b P(x) dx

Figure 1.1. An illustration of summing the probabilities of mutually exclusive
events, both for discrete and continuous random variables.
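
A quick numerical sketch (my addition, not from the slides): the probability of landing in an interval is the area under the density over that interval. The standard Gaussian density and the interval (−1, 1) are arbitrary choices for the example.

```python
# Approximate Prob(a < X < b) = ∫_a^b P(x) dx with the trapezoid rule.
import numpy as np

def gaussian_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

a, b = -1.0, 1.0
x = np.linspace(a, b, 10_001)
dx = x[1] - x[0]
vals = gaussian_pdf(x)
prob = dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # composite trapezoid rule
print(prob)   # ≈ 0.6827, one standard deviation either side of the mean
```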

1.1 Random variables and mutually exclusive events

Expectation
The expectation of an arbitrary function, f(X), with respect to the
probability density function P(X) is

⟨f(X)⟩_P(X) = ∫ P(x) f(x) dx.

The mean or expected value of X is ⟨X⟩.
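
An illustrative sketch (my addition): the integral defining ⟨f(X)⟩ can be approximated on a grid, or by averaging f over samples drawn from P(x). Here f(x) = x² and P(x) is a standard Gaussian, both arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x ** 2

# Grid approximation of ∫ P(x) f(x) dx
x = np.linspace(-8, 8, 20_001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
grid_estimate = np.sum(pdf * f(x)) * dx

# Monte Carlo approximation: average f over samples drawn from P(x)
samples = rng.normal(size=1_000_000)
mc_estimate = f(samples).mean()

print(grid_estimate, mc_estimate)   # both ≈ ⟨X²⟩ = 1 for a standard Gaussian
```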


1.1 Random variables and mutually exclusive events


Variance
The variance of X is the expectation of the squared difference from the mean:

V[X] = ∫ P(x) (x − ⟨X⟩)² dx
     = ∫ P(x) (x² + ⟨X⟩² − 2x⟨X⟩) dx
     = ∫ P(x) x² dx + ⟨X⟩² ∫ P(x) dx − 2⟨X⟩ ∫ P(x) x dx
     = ⟨X²⟩ + ⟨X⟩² − 2⟨X⟩²
     = ⟨X²⟩ − ⟨X⟩².
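
A small sanity check (my addition, not from the slides): the identity V[X] = ⟨X²⟩ − ⟨X⟩² holds for sample averages too; the exponential distribution below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)   # any distribution will do

lhs = np.mean((x - x.mean()) ** 2)       # ⟨(X − ⟨X⟩)²⟩
rhs = np.mean(x ** 2) - x.mean() ** 2    # ⟨X²⟩ − ⟨X⟩²
print(lhs, rhs)   # agree to rounding error; both ≈ 4 for a scale-2 exponential
```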

1.2 Independence

Independence
Independent probabilities multiply
For independent variables, P_X,Y(x, y) = P_X(x) P_Y(y)
For independent variables, ⟨XY⟩ = ⟨X⟩⟨Y⟩
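
An illustrative sketch (my addition): drawing X and Y independently and comparing the sample average of XY with the product of the sample averages.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
y = rng.uniform(0.0, 2.0, size=1_000_000)   # drawn independently of x

print(np.mean(x * y))        # ⟨XY⟩ ≈ 1.0
print(x.mean() * y.mean())   # ⟨X⟩⟨Y⟩ ≈ 1.0 × 1.0
```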


1.3 Dependent random variables

Dependence
If X and Y are dependent, then P_X,Y(x, y) does not factor as the
product of P_X(x) and P_Y(y).

If we know P_X,Y(x, y) and want to know P_X(x), then it is obtained by
integrating out (or marginalising) the other variable:

P_X(x) = ∫ P_X,Y(x, y) dy
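
A numerical sketch of marginalisation (my addition, not from the slides); the correlated bivariate Gaussian used as the joint density is just a convenient example whose marginal is known exactly.

```python
import numpy as np

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dy = y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

rho = 0.6   # correlation of the example joint density
norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
P_XY = norm * np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * (1 - rho ** 2)))

P_X = P_XY.sum(axis=1) * dy   # integrate out y for each fixed x

# The marginal of this joint density is a standard Gaussian:
print(np.max(np.abs(P_X - np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi))))   # small
```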


1.3 Dependent random variables


Conditional probability densities
The probability density for X given that we know that Y = y is written
P(X = x|Y = y) or P(x|y) and is referred to as the conditional
probability density for X given Y.

P_X|Y(X = x | Y = y) = P_X,Y(X = x, Y = y) / P_Y(Y = y).

Explanation: To see how to calculate this conditional probability, we
note first that P(x, y) with y = a gives the relative probability for
different values of x given that Y = a. To obtain the conditional
probability density for X given that Y = a, all we have to do is divide
P(x, a) by its integral over all values of x.
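
Continuing the same illustrative example (my addition): the conditional density at Y = a is the slice P(x, a), renormalised by its integral over x.

```python
import numpy as np

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")

rho = 0.6
norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
P_XY = norm * np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * (1 - rho ** 2)))

a = 1.0
a_index = np.argmin(np.abs(y - a))     # grid point closest to Y = a
slice_at_a = P_XY[:, a_index]          # relative probabilities P(x, a)
P_X_given_a = slice_at_a / (slice_at_a.sum() * dx)

# For this joint density the conditional mean of X given Y = a is rho * a = 0.6
print(np.sum(x * P_X_given_a) * dx)
```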


1.4 Correlations and correlation coefficients


The covariance of X and Y is:

cov(X, Y) = ⟨(X − ⟨X⟩)(Y − ⟨Y⟩)⟩ = ⟨XY⟩ − ⟨X⟩⟨Y⟩.

Idea:
1. How can we define what it means for a value x to be bigger than
usual? Well, we can see if x > ⟨X⟩, i.e. if x − ⟨X⟩ > 0.
2. Similarly, we can say that a value x is smaller than usual if x < ⟨X⟩,
i.e. if x − ⟨X⟩ < 0.
3. If x tends to be bigger than usual when y is bigger than usual,
then ⟨(X − ⟨X⟩)(Y − ⟨Y⟩)⟩ will be > 0.

The correlation is just a normalised version of the covariance, which
takes values in the range −1 to 1:

C_XY = (⟨XY⟩ − ⟨X⟩⟨Y⟩) / √(V[X] V[Y])
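
A quick check (my addition, not from the slides): generate a pair with a known correlation and recompute C_XY from the samples.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 1_000_000, 0.6
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)   # corr(X, Y) = rho

cov = np.mean(x * y) - x.mean() * y.mean()
c_xy = cov / np.sqrt(x.var() * y.var())
print(c_xy)   # ≈ 0.6
```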


1.5 Adding random variables together


When we have two continuous random variables, X and Y, with
probability densities P_X and P_Y, it is often useful to be able to
calculate the probability density of the random variable whose value is
the sum of them: Z = X + Y. It turns out that the probability density
for Z is given by

P_Z(z) = ∫ P_X,Y(z − s, s) ds = ∫ P_X,Y(s, z − s) ds

If X and Y are independent, this becomes:

P_Z(z) = ∫ P_X(z − s) P_Y(s) ds = P_X ∗ P_Y
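
An illustrative sketch (my addition): for two independent unit exponentials the convolution formula can be checked against a histogram of sampled sums; the grid and binning choices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=1.0, size=n)
z = x + y

# Convolve the two densities on a grid: P_Z(z) = ∫ P_X(z − s) P_Y(s) ds
s = np.linspace(0, 20, 2001)
ds = s[1] - s[0]
p = np.exp(-s)                   # unit exponential density on s ≥ 0
p_z = np.convolve(p, p) * ds     # discrete approximation of the integral
z_grid = np.arange(p_z.size) * ds

hist, edges = np.histogram(z, bins=100, range=(0, 10), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
# Both should follow the Gamma(2, 1) density z * exp(−z)
print(np.max(np.abs(np.interp(centres, z_grid, p_z) - hist)))   # small
```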


1.5 Adding random variables together


If X1 and X2 are random variables and X = X1 + X2, then

⟨X⟩ = ⟨X1⟩ + ⟨X2⟩,

and if X1 and X2 are independent, then

V[X] = V[X1] + V[X2].

Mysterious (?) assertion

Averaging the results of a number of independent measurements produces
a more accurate result. This is because the variances of the different
measurements add together. Does this make sense?


1.5 Adding random variables together


Explanation
Assume all measurements have expectation, μ, and variance, σ².
By the independence assumption, the variance of the average is:

V[ Σ_{n=1}^N X_n/N ] = Σ_{n=1}^N V[X_n/N].

Moreover,

V[X_n/N] = E[(X_n/N)²] − (E[X_n/N])²
         = (E[X_n²] − (E[X_n])²) / N²
         = V[X_n] / N².

So,

V[ Σ_{n=1}^N X_n/N ] = Σ_{n=1}^N V[X_n]/N² = (1/N²) Σ_{n=1}^N σ² = σ²/N.
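
A numerical check (my addition, not from the slides) that the variance of the average falls as σ²/N; the values of σ and N are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, N, repeats = 2.0, 25, 200_000

# Each row is one experiment: N independent noisy measurements of the same quantity
measurements = rng.normal(loc=10.0, scale=sigma, size=(repeats, N))
averages = measurements.mean(axis=1)

print(averages.var())      # ≈ sigma² / N = 4 / 25 = 0.16
print(sigma ** 2 / N)
```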

1.6 Transformations of a random variable


Key assertion: If Y = g(X), then:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx = ∫_{y=g(a)}^{y=g(b)} P_Y(y) f(y) dy.

Given this assumption, everything else falls out automatically:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx
       = ∫_{y=g(a)}^{y=g(b)} P_X(g⁻¹(y)) f(y) (dx/dy) dy
       = ∫_{y=g(a)}^{y=g(b)} [P_X(g⁻¹(y)) / g′(g⁻¹(y))] f(y) dy.

General result (for invertible g):

P_Y(y) = P_X(g⁻¹(y)) / |g′(g⁻¹(y))|.
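
An illustrative check (my addition) of the change-of-variables formula for g(x) = exp(x) applied to a Gaussian X, so that Y is log-normal.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)
y_samples = np.exp(x)            # Y = g(X) with g(x) = exp(x)

def p_x(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)

y = np.linspace(0.05, 20.0, 400)
g_inv = np.log(y)                # g⁻¹(y)
p_y = p_x(g_inv) / np.abs(y)     # |g′(g⁻¹(y))| = exp(log y) = y

hist, edges = np.histogram(y_samples, bins=200, range=(0.0, 20.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(centres, y, p_y) - hist)))   # small (sampling and binning error)
```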

1.7 The distribution function

The probability distribution function, which we will call D(x), of a
random variable X is defined as the probability that X is less than or
equal to x. Thus

D(x) = Prob(X ≤ x) = ∫_{−∞}^x P(z) dz

In addition, the fundamental theorem of calculus tells us that

P(x) = (d/dx) D(x).
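
A small numerical sketch (my addition, not from the slides): D(x) built as a cumulative integral of P(x), and P(x) recovered by differentiating D(x).

```python
import numpy as np

x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

D = np.cumsum(p) * dx              # D(x) ≈ ∫_{−∞}^{x} P(z) dz
p_recovered = np.gradient(D, dx)   # P(x) ≈ dD/dx

print(D[-1])                              # ≈ 1 (total probability)
print(np.max(np.abs(p_recovered - p)))    # small discretisation error
```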


1.8 The characteristic function


The characteristic function is defined as the Fourier transform of the
probability density.

φ(s) = ∫ P(x) exp(isx) dx.

The inverse transform gives:

P(x) = (1/2π) ∫ φ(s) exp(−isx) ds.

The Fourier transform of the convolution of two functions, P(x) and
Q(x), is the product of their Fourier transforms, φ_P(s) and φ_Q(s).
For discrete random variables, the characteristic function is a sum.
In general (for both discrete and continuous r.v.s), we have:

φ_P(s) = ⟨exp(isX)⟩_P(X).
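
An illustrative sketch (my addition): estimating φ(s) = ⟨exp(isX)⟩ from Gaussian samples and comparing with the exact characteristic function exp(−s²/2) of a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=500_000)
s = np.linspace(-3, 3, 13)

phi_mc = np.array([np.mean(np.exp(1j * si * x)) for si in s])   # ⟨exp(isX)⟩
phi_exact = np.exp(-0.5 * s ** 2)

print(np.max(np.abs(phi_mc - phi_exact)))   # small Monte Carlo error
```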

1.9 Moments and cumulants


Moment generating function (departure from the book)
The moment generating function is defined as:

M(t) = ⟨exp(tX)⟩,

where X is a random variable, and the expectation is with respect to
some density P(X), so that

M(t) = ∫ exp(tx) P(x) dx
     = ∫ (1 + tx + (1/2!) t²x² + ...) P(x) dx
     = 1 + t m_1 + (1/2!) t² m_2 + ... + (1/r!) t^r m_r + ...

where m_r = ⟨X^r⟩ is the r-th (raw) moment.

1.9 Moments and cumulants


Moment generating function (continued)
M(t) = 1 + t m_1 + (1/2!) t² m_2 + ... + (1/r!) t^r m_r + ...

It follows from the above expansion that:

M(0) = 1
M′(0) = m_1
M″(0) = m_2
...
M^(r)(0) = m_r
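
A quick numerical check (my addition, not from the slides): reading the first two raw moments off a sample estimate of M(t) by finite differences at t = 0. The exponential distribution and step size h are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=2.0, size=1_000_000)   # m_1 = 2, m_2 = 2 · 2² = 8

def M(t):
    return np.mean(np.exp(t * x))   # sample estimate of ⟨exp(tX)⟩

h = 1e-3
m1 = (M(h) - M(-h)) / (2 * h)                 # ≈ M′(0)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2     # ≈ M″(0)
print(m1, m2)                                 # ≈ 2 and ≈ 8
```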


1.9 Moments and cumulants


Cumulant generating function (continued departure from book)
The log of the moment generating function is called the cumulant
generating function,
R(t) = ln(M(t)).
By the chain rule of differentiation, we can write down the derivatives
of R(t) in terms of the derivatives of M(t), e.g.

R′(t) = M′(t) / M(t)

R″(t) = [M(t) M″(t) − (M′(t))²] / (M(t))²

Note that R(0) = ln M(0) = 0, R′(0) = M′(0) = m_1 = μ,
R″(0) = M″(0) − (M′(0))² = m_2 − m_1² = σ², . . .
These are the cumulants.

1.9 Moments and cumulants


The moments can be calculated from the derivatives of the
characteristic function, evaluated at s = 0. We can see this by
expanding the characteristic function as a Taylor series:

φ(s) = Σ_{n=0}^∞ φ^(n)(0) s^n / n!

where φ^(n)(s) is the n-th derivative of φ(s). But we also have:

φ(s) = ⟨e^{isX}⟩ = ⟨ Σ_{n=0}^∞ (isX)^n / n! ⟩ = Σ_{n=0}^∞ i^n ⟨X^n⟩ s^n / n!

Equating the 2 expressions, we get: ⟨X^n⟩ = φ^(n)(0) / i^n

1.9 Moments and cumulants

Cumulants
The n-th order cumulant of X is obtained from the n-th derivative of the
log of the characteristic function, evaluated at s = 0.
For independent random variables, X and Y , if Z = X + Y then the
n-th cumulant of Z is the sum of the n-th cumulants of X and Y .
The Gaussian distribution is also the only absolutely continuous
distribution all of whose cumulants beyond the first two (i.e. other
than the mean and variance) are zero.
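
An illustrative check (my addition) of the additivity of cumulants for independent variables, using the third cumulant, which equals the third central moment.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2_000_000
x = rng.exponential(scale=1.0, size=n)   # third cumulant of a scale-1 exponential is 2
y = rng.exponential(scale=2.0, size=n)   # third cumulant of a scale-2 exponential is 2 · 2³ = 16
z = x + y

def third_cumulant(samples):
    return np.mean((samples - samples.mean()) ** 3)

print(third_cumulant(x) + third_cumulant(y))   # ≈ 18
print(third_cumulant(z))                       # ≈ 18
```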


1.10 The multivariate Gaussian


Let x = [x_1, . . . , x_N]^T, then the general form of the Gaussian pdf is:

P(x) = 1/√((2π)^N det(Σ)) exp( −(1/2) (x − μ)^T Σ⁻¹ (x − μ) ),

where μ is the mean vector and Σ is the covariance matrix.
All higher moments of a Gaussian can be written in terms of the means
and covariances. Defining ΔX ≡ X − ⟨X⟩, for a 1-dimensional Gaussian
we have:

⟨ΔX^(2n)⟩ = (2n − 1)! (V[X])^n / (2^(n−1) (n − 1)!)

⟨ΔX^(2n−1)⟩ = 0
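
A final numerical sketch (my addition, not from the slides): sampling a 2-dimensional Gaussian with numpy and checking the even-moment formula above for the first component with n = 2, where the prediction is 3 V[X]².

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(10)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=2_000_000)
dX = samples[:, 0] - samples[:, 0].mean()     # ΔX for the first component

n = 2
empirical = np.mean(dX ** (2 * n))
predicted = factorial(2 * n - 1) * Sigma[0, 0] ** n / (2 ** (n - 1) * factorial(n - 1))
print(empirical, predicted)   # both ≈ 3 · 2² = 12
```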

