
Stochastic Processes for Physicists

Understanding Noisy Systems

Chapter 1: A review of probability theory


Paul Kirk
Division of Molecular Biosciences, Imperial College London

19/03/2013

1.1 Random variables and mutually exclusive events


Random variables
Suppose we do not know the precise value of a variable, but may have an
idea of the relative likelihood that it will have one of a number of
possible values.
Let us call the unknown quantity X.
This quantity is referred to as a random variable.

Probability
Consider a 6-sided die. Let X be the value we get when we roll the die.
Describe the likelihood that X will have each of the values 1, . . . , 6
by a number between 0 and 1: the probability.

If Prob(X = 3) = 1, then we will always get a 3.
If Prob(X = 3) = 2/3, then we expect to get a 3 about two-thirds of the time.

1.1 Random variables and mutually exclusive events


Mutually exclusive events
The various values of X are an example of mutually exclusive events.
X can take precisely one of the values between 1 and 6.
Mutually exclusive probabilities sum:

Prob(X = 3 or X = 4) = Prob(X = 3) + Prob(X = 4).
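
A tiny simulation (my addition, not from the slides) illustrating both points: each face has a probability between 0 and 1, and the probabilities of the mutually exclusive outcomes 3 and 4 add.

```python
# Illustrative sketch (not from the slides): estimate Prob(X = 3 or X = 4) for a
# fair 6-sided die by simulation and compare with Prob(X = 3) + Prob(X = 4) = 1/3.
import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=1_000_000)   # die values 1..6, each with prob 1/6

p3 = np.mean(rolls == 3)
p4 = np.mean(rolls == 4)
p3_or_4 = np.mean((rolls == 3) | (rolls == 4))
print(p3 + p4, p3_or_4)   # both ≈ 1/3
```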



1.1 Random variables and mutually exclusive events


Note: in mathematics texts it is customary to denote the unknown
quantity using a capital letter, say X, and a variable that specifies one
of the possible values that X may have as the equivalent lower-case
letter, x. We will use this convention in this chapter, but in the
following chapters we will use a lower-case letter for both the unknown
quantity and the values it can take, since it causes no confusion.

So, rather than writing Prob(X = 3) or Prob(X = x), we will (in later
chapters) tend to write Prob(3) or Prob(x).

Warning: may cause confusion.


1.1 Random variables and mutually exclusive events


Continuous random variables
For continuous random variables, the probability for X to be within a
range is found by integrating the probability density function:

Prob(a < X < b) = ∫_a^b P(x) dx

Figure 1.1. An illustration of summing the probabilities of mutually exclusive
events, both for discrete and continuous random variables.
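
A quick numerical sketch (my addition, not from the slides): the probability of landing in an interval is the area under the density over that interval. The standard Gaussian density and the interval (−1, 1) are arbitrary choices for the example.

```python
# Approximate Prob(a < X < b) = ∫_a^b P(x) dx with the trapezoid rule.
import numpy as np

def gaussian_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

a, b = -1.0, 1.0
x = np.linspace(a, b, 10_001)
dx = x[1] - x[0]
vals = gaussian_pdf(x)
prob = dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # composite trapezoid rule
print(prob)   # ≈ 0.6827, one standard deviation either side of the mean
```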

1.1 Random variables and mutually exclusive events

Expectation
The expectation of an arbitrary function, f(X), with respect to the
probability density function P(X) is

⟨f(X)⟩_P(X) = ∫ P(x) f(x) dx.

The mean or expected value of X is ⟨X⟩.
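
An illustrative sketch (my addition): the integral defining ⟨f(X)⟩ can be approximated on a grid, or by averaging f over samples drawn from P(x). Here f(x) = x² and P(x) is a standard Gaussian, both arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x ** 2

# Grid approximation of ∫ P(x) f(x) dx
x = np.linspace(-8, 8, 20_001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
grid_estimate = np.sum(pdf * f(x)) * dx

# Monte Carlo approximation: average f over samples drawn from P(x)
samples = rng.normal(size=1_000_000)
mc_estimate = f(samples).mean()

print(grid_estimate, mc_estimate)   # both ≈ ⟨X²⟩ = 1 for a standard Gaussian
```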


1.1 Random variables and mutually exclusive events


Variance
The variance of X is the expectation of the squared difference from the mean:

V[X] = ∫ P(x) (x − ⟨X⟩)² dx
     = ∫ P(x) (x² + ⟨X⟩² − 2x⟨X⟩) dx
     = ∫ P(x) x² dx + ⟨X⟩² ∫ P(x) dx − 2⟨X⟩ ∫ P(x) x dx
     = ⟨X²⟩ + ⟨X⟩² − 2⟨X⟩²
     = ⟨X²⟩ − ⟨X⟩².
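
A small sanity check (my addition, not from the slides): the identity V[X] = ⟨X²⟩ − ⟨X⟩² holds for sample averages too; the exponential distribution below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)   # any distribution will do

lhs = np.mean((x - x.mean()) ** 2)       # ⟨(X − ⟨X⟩)²⟩
rhs = np.mean(x ** 2) - x.mean() ** 2    # ⟨X²⟩ − ⟨X⟩²
print(lhs, rhs)   # agree to rounding error; both ≈ 4 for a scale-2 exponential
```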

1.2 Independence

Independence
Independent probabilities multiply
For independent variables, P_X,Y(x, y) = P_X(x) P_Y(y)
For independent variables, ⟨XY⟩ = ⟨X⟩⟨Y⟩
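
An illustrative sketch (my addition): drawing X and Y independently and comparing the sample average of XY with the product of the sample averages.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
y = rng.uniform(0.0, 2.0, size=1_000_000)   # drawn independently of x

print(np.mean(x * y))        # ⟨XY⟩ ≈ 1.0
print(x.mean() * y.mean())   # ⟨X⟩⟨Y⟩ ≈ 1.0 × 1.0
```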


1.3 Dependent random variables

Dependence
If X and Y are dependent, then P_X,Y(x, y) does not factor as the
product of P_X(x) and P_Y(y).

If we know P_X,Y(x, y) and want to know P_X(x), then it is obtained by
integrating out (or marginalising) the other variable:

P_X(x) = ∫ P_X,Y(x, y) dy
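
A numerical sketch of marginalisation (my addition, not from the slides); the correlated bivariate Gaussian used as the joint density is just a convenient example whose marginal is known exactly.

```python
import numpy as np

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dy = y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

rho = 0.6   # correlation of the example joint density
norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
P_XY = norm * np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * (1 - rho ** 2)))

P_X = P_XY.sum(axis=1) * dy   # integrate out y for each fixed x

# The marginal of this joint density is a standard Gaussian:
print(np.max(np.abs(P_X - np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi))))   # small
```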


1.3 Dependent random variables


Conditional probability densities
The probability density for X given that we know that Y = y is written
P(X = x|Y = y) or P(x|y) and is referred to as the conditional
probability density for X given Y.

P_X|Y(X = x | Y = y) = P_X,Y(X = x, Y = y) / P_Y(Y = y).

Explanation: To see how to calculate this conditional probability, we
note first that P(x, y) with y = a gives the relative probability for
different values of x given that Y = a. To obtain the conditional
probability density for X given that Y = a, all we have to do is divide
P(x, a) by its integral over all values of x.
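
Continuing the same illustrative example (my addition): the conditional density at Y = a is the slice P(x, a), renormalised by its integral over x.

```python
import numpy as np

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")

rho = 0.6
norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
P_XY = norm * np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * (1 - rho ** 2)))

a = 1.0
a_index = np.argmin(np.abs(y - a))     # grid point closest to Y = a
slice_at_a = P_XY[:, a_index]          # relative probabilities P(x, a)
P_X_given_a = slice_at_a / (slice_at_a.sum() * dx)

# For this joint density the conditional mean of X given Y = a is rho * a = 0.6
print(np.sum(x * P_X_given_a) * dx)
```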


1.4 Correlations and correlation coefficients


The covariance of X and Y is:

cov(X, Y) = ⟨(X − ⟨X⟩)(Y − ⟨Y⟩)⟩ = ⟨XY⟩ − ⟨X⟩⟨Y⟩.

Idea:
1. How can we define what it means for a value x to be bigger than
usual? Well, we can see if x > ⟨X⟩, i.e. if x − ⟨X⟩ > 0.
2. Similarly, we can say that a value x is smaller than usual if x < ⟨X⟩,
i.e. if x − ⟨X⟩ < 0.
3. If x tends to be bigger than usual when y is bigger than usual,
then ⟨(X − ⟨X⟩)(Y − ⟨Y⟩)⟩ will be > 0.

The correlation is just a normalised version of the covariance, which
takes values in the range −1 to 1:

C_XY = (⟨XY⟩ − ⟨X⟩⟨Y⟩) / √(V[X] V[Y])
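
A quick check (my addition, not from the slides): generate a pair with a known correlation and recompute C_XY from the samples.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 1_000_000, 0.6
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)   # corr(X, Y) = rho

cov = np.mean(x * y) - x.mean() * y.mean()
c_xy = cov / np.sqrt(x.var() * y.var())
print(c_xy)   # ≈ 0.6
```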


1.5 Adding random variables together


When we have two continuous random variables, X and Y, with
probability densities P_X and P_Y, it is often useful to be able to
calculate the probability density of the random variable whose value is
the sum of them: Z = X + Y. It turns out that the probability density
for Z is given by

P_Z(z) = ∫ P_X,Y(z − s, s) ds = ∫ P_X,Y(s, z − s) ds

If X and Y are independent, this becomes:

P_Z(z) = ∫ P_X(z − s) P_Y(s) ds = P_X ∗ P_Y
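
An illustrative sketch (my addition): for two independent unit exponentials the convolution formula can be checked against a histogram of sampled sums; the grid and binning choices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=1.0, size=n)
z = x + y

# Convolve the two densities on a grid: P_Z(z) = ∫ P_X(z − s) P_Y(s) ds
s = np.linspace(0, 20, 2001)
ds = s[1] - s[0]
p = np.exp(-s)                   # unit exponential density on s ≥ 0
p_z = np.convolve(p, p) * ds     # discrete approximation of the integral
z_grid = np.arange(p_z.size) * ds

hist, edges = np.histogram(z, bins=100, range=(0, 10), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
# Both should follow the Gamma(2, 1) density z * exp(−z)
print(np.max(np.abs(np.interp(centres, z_grid, p_z) - hist)))   # small
```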


1.5 Adding random variables together


If X1 and X2 are random variables and X = X1 + X2, then

⟨X⟩ = ⟨X1⟩ + ⟨X2⟩,

and if X1 and X2 are independent, then

V[X] = V[X1] + V[X2].

Mysterious (?) assertion

Averaging the results of a number of independent measurements produces
a more accurate result. This is because the variances of the different
measurements add together. Does this make sense?


1.5 Adding random variables together


Explanation
Assume all measurements have expectation, μ, and variance, σ².
By the independence assumption, the variance of the average is:

V[ Σ_{n=1}^N X_n/N ] = Σ_{n=1}^N V[X_n/N].

Moreover,

V[X_n/N] = E[(X_n/N)²] − (E[X_n/N])²
         = (E[X_n²] − (E[X_n])²) / N²
         = V[X_n] / N².

So,

V[ Σ_{n=1}^N X_n/N ] = Σ_{n=1}^N V[X_n]/N² = (1/N²) Σ_{n=1}^N σ² = σ²/N.
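
A numerical check (my addition, not from the slides) that the variance of the average falls as σ²/N; the values of σ and N are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, N, repeats = 2.0, 25, 200_000

# Each row is one experiment: N independent noisy measurements of the same quantity
measurements = rng.normal(loc=10.0, scale=sigma, size=(repeats, N))
averages = measurements.mean(axis=1)

print(averages.var())      # ≈ sigma² / N = 4 / 25 = 0.16
print(sigma ** 2 / N)
```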

1.6 Transformations of a random variable


Key assertion: If Y = g(X), then:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx = ∫_{y=g(a)}^{y=g(b)} P_Y(y) f(y) dy.

Given this assumption, everything else falls out automatically:

⟨f(Y)⟩ = ∫_{x=a}^{x=b} P_X(x) f(g(x)) dx
       = ∫_{y=g(a)}^{y=g(b)} P_X(g⁻¹(y)) f(y) (dx/dy) dy
       = ∫_{y=g(a)}^{y=g(b)} [P_X(g⁻¹(y)) / g′(g⁻¹(y))] f(y) dy.

General result (for invertible g):

P_Y(y) = P_X(g⁻¹(y)) / |g′(g⁻¹(y))|.
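
An illustrative check (my addition) of the change-of-variables formula for g(x) = exp(x) applied to a Gaussian X, so that Y is log-normal.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)
y_samples = np.exp(x)            # Y = g(X) with g(x) = exp(x)

def p_x(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)

y = np.linspace(0.05, 20.0, 400)
g_inv = np.log(y)                # g⁻¹(y)
p_y = p_x(g_inv) / np.abs(y)     # |g′(g⁻¹(y))| = exp(log y) = y

hist, edges = np.histogram(y_samples, bins=200, range=(0.0, 20.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(centres, y, p_y) - hist)))   # small (sampling and binning error)
```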

1.7 The distribution function

The probability distribution function, which we will call D(x), of a
random variable X is defined as the probability that X is less than or
equal to x. Thus

D(x) = Prob(X ≤ x) = ∫_{−∞}^x P(z) dz

In addition, the fundamental theorem of calculus tells us that

P(x) = (d/dx) D(x).
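
A small numerical sketch (my addition, not from the slides): D(x) built as a cumulative integral of P(x), and P(x) recovered by differentiating D(x).

```python
import numpy as np

x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

D = np.cumsum(p) * dx              # D(x) ≈ ∫_{−∞}^{x} P(z) dz
p_recovered = np.gradient(D, dx)   # P(x) ≈ dD/dx

print(D[-1])                              # ≈ 1 (total probability)
print(np.max(np.abs(p_recovered - p)))    # small discretisation error
```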


1.8 The characteristic function


The characteristic function is defined as the Fourier transform of the
probability density.

φ(s) = ∫ P(x) exp(isx) dx.

The inverse transform gives:

P(x) = (1/2π) ∫ φ(s) exp(−isx) ds.

The Fourier transform of the convolution of two functions, P(x) and
Q(x), is the product of their Fourier transforms, φ_P(s) and φ_Q(s).
For discrete random variables, the characteristic function is a sum.
In general (for both discrete and continuous r.v.s), we have:

φ_P(s) = ⟨exp(isX)⟩_P(X).
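
An illustrative sketch (my addition): estimating φ(s) = ⟨exp(isX)⟩ from Gaussian samples and comparing with the exact characteristic function exp(−s²/2) of a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=500_000)
s = np.linspace(-3, 3, 13)

phi_mc = np.array([np.mean(np.exp(1j * si * x)) for si in s])   # ⟨exp(isX)⟩
phi_exact = np.exp(-0.5 * s ** 2)

print(np.max(np.abs(phi_mc - phi_exact)))   # small Monte Carlo error
```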

1.9 Moments and cumulants


Moment generating function (departure from the book)
The moment generating function is defined as:

M(t) = ⟨exp(tX)⟩,

where X is a random variable, and the expectation is with respect to
some density P(X), so that

M(t) = ∫ exp(tx) P(x) dx
     = ∫ (1 + tx + (1/2!) t²x² + ...) P(x) dx
     = 1 + t m_1 + (1/2!) t² m_2 + ... + (1/r!) t^r m_r + ...

where m_r = ⟨X^r⟩ is the r-th (raw) moment.

1.9 Moments and cumulants


Moment generating function (continued)
M(t) = 1 + t m_1 + (1/2!) t² m_2 + ... + (1/r!) t^r m_r + ...

It follows from the above expansion that:

M(0) = 1
M′(0) = m_1
M″(0) = m_2
...
M^(r)(0) = m_r
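
A quick numerical check (my addition, not from the slides): reading the first two raw moments off a sample estimate of M(t) by finite differences at t = 0. The exponential distribution and step size h are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=2.0, size=1_000_000)   # m_1 = 2, m_2 = 2 · 2² = 8

def M(t):
    return np.mean(np.exp(t * x))   # sample estimate of ⟨exp(tX)⟩

h = 1e-3
m1 = (M(h) - M(-h)) / (2 * h)                 # ≈ M′(0)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2     # ≈ M″(0)
print(m1, m2)                                 # ≈ 2 and ≈ 8
```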


1.9 Moments and cumulants


Cumulant generating function (continued departure from book)
The log of the moment generating function is called the cumulant
generating function,
R(t) = ln(M(t)).
By the chain rule of differentiation, we can write down the derivatives
of R(t) in terms of the derivatives of M(t), e.g.

R′(t) = M′(t) / M(t)

R″(t) = [M(t) M″(t) − (M′(t))²] / (M(t))²

Note that R(0) = ln M(0) = 0, R′(0) = M′(0) = m_1 = μ,
R″(0) = M″(0) − (M′(0))² = m_2 − m_1² = σ², . . .
These are the cumulants.

1.9 Moments and cumulants


The moments can be calculated from the derivatives of the
characteristic function, evaluated at s = 0. We can see this by
expanding the characteristic function as a Taylor series:

φ(s) = Σ_{n=0}^∞ φ^(n)(0) s^n / n!

where φ^(n)(s) is the n-th derivative of φ(s). But we also have:

φ(s) = ⟨e^{isX}⟩ = ⟨ Σ_{n=0}^∞ (isX)^n / n! ⟩ = Σ_{n=0}^∞ i^n ⟨X^n⟩ s^n / n!

Equating the 2 expressions, we get: ⟨X^n⟩ = φ^(n)(0) / i^n

1.9 Moments and cumulants

Cumulants
The n-th order cumulant of X is obtained from the n-th derivative of the
log of the characteristic function, evaluated at s = 0.
For independent random variables, X and Y , if Z = X + Y then the
n-th cumulant of Z is the sum of the n-th cumulants of X and Y .
The Gaussian distribution is also the only absolutely continuous
distribution all of whose cumulants beyond the first two (i.e. other
than the mean and variance) are zero.
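
An illustrative check (my addition) of the additivity of cumulants for independent variables, using the third cumulant, which equals the third central moment.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2_000_000
x = rng.exponential(scale=1.0, size=n)   # third cumulant of a scale-1 exponential is 2
y = rng.exponential(scale=2.0, size=n)   # third cumulant of a scale-2 exponential is 2 · 2³ = 16
z = x + y

def third_cumulant(samples):
    return np.mean((samples - samples.mean()) ** 3)

print(third_cumulant(x) + third_cumulant(y))   # ≈ 18
print(third_cumulant(z))                       # ≈ 18
```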


1.10 The multivariate Gaussian


Let x = [x_1, . . . , x_N]^T, then the general form of the Gaussian pdf is:

P(x) = 1/√((2π)^N det(Σ)) exp( −(1/2) (x − μ)^T Σ⁻¹ (x − μ) ),

where μ is the mean vector and Σ is the covariance matrix.
All higher moments of a Gaussian can be written in terms of the means
and covariances. Defining ΔX ≡ X − ⟨X⟩, for a 1-dimensional Gaussian
we have:

⟨ΔX^(2n)⟩ = (2n − 1)! (V[X])^n / (2^(n−1) (n − 1)!)

⟨ΔX^(2n−1)⟩ = 0
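
A final numerical sketch (my addition, not from the slides): sampling a 2-dimensional Gaussian with numpy and checking the even-moment formula above for the first component with n = 2, where the prediction is 3 V[X]².

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(10)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=2_000_000)
dX = samples[:, 0] - samples[:, 0].mean()     # ΔX for the first component

n = 2
empirical = np.mean(dX ** (2 * n))
predicted = factorial(2 * n - 1) * Sigma[0, 0] ** n / (2 ** (n - 1) * factorial(n - 1))
print(empirical, predicted)   # both ≈ 3 · 2² = 12
```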

