
IV: Expectation

A modern crash course in intermediate Statistics and Probability

Paul Rognon

Barcelona School of Economics


Universitat Pompeu Fabra
Universitat Politècnica de Catalunya

1 / 19
Expectation of a random variable

If X is a random variable, then the expectation of X, if it exists, is

E(X) := Σ_x x f_X(x)        for discrete variables
E(X) := ∫_ℝ x f_X(x) dx     for continuous variables

For example:
• if X ∼ Bern(p) then E(X) = p,
• if X ∼ N(0, 1) then E(X) = 0.

Properties of the expectation


• Linearity: E(Σ_i a_i X_i) = Σ_i a_i E(X_i).
• If X, Y are independent, then E(XY) = E(X)E(Y).
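
A quick numerical sanity check of these properties, as a Python sketch
(the seed, the sample size, and p = 0.3 are arbitrary choices, not from
the slides):

import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# E(X) = p for X ~ Bern(0.3)
x = rng.binomial(1, 0.3, size=n)
print(x.mean())                              # close to 0.3

# E(XY) = E(X)E(Y) when X and Y are independent
y = rng.normal(0.0, 1.0, size=n)             # drawn independently of x
print(np.mean(x * y), x.mean() * y.mean())   # both close to 0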

2 / 19
Higher order moments
k-th moment of X (k ≥ 1)
E(X^k) is the k-th moment, if it exists.
E[(X − E(X))^k] is the k-th central moment, if it exists.
Variance
The variance var(X) is the second central moment:
• var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
• var(aX + b) = a² var(X)
• if (X_i)_{i=1,...,k} are independent, then var(Σ_i a_i X_i) = Σ_i a_i² var(X_i)
The standard deviation is defined as sd(X) = σ_X = √var(X).
Expectation of g(X)
For any measurable function g, if the expectation E(g(X)) exists,

E(g(X)) = Σ_x g(x) f_X(x)   or   E(g(X)) = ∫_ℝ g(x) f_X(x) dx
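
The formula E(g(X)) = ∫ g(x) f_X(x) dx can be checked numerically. A
scipy sketch for X ∼ U(0, 1), recovering var(X) = E(X²) − (E(X))² = 1/12:

from scipy.integrate import quad

f = lambda x: 1.0                             # density of U(0, 1) on [0, 1]
EX,  _ = quad(lambda x: x * f(x),    0, 1)    # E(X)   = 1/2
EX2, _ = quad(lambda x: x**2 * f(x), 0, 1)    # E(X^2) = 1/3
print(EX2 - EX**2)                            # 1/3 - 1/4 = 1/12 ≈ 0.0833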
3 / 19
Exercises

Compute the expectation of:

a. an exponential variable Exp(λ)
b. U(0, 1)

Compute the variance of:


a. U(0, 1)
b. Exp(λ)
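
Closed-form answers can be sanity-checked by Monte Carlo; a sketch
(λ = 2 and the sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
expo = rng.exponential(scale=1/lam, size=10**6)   # Exp(lambda)
unif = rng.uniform(0, 1, size=10**6)              # U(0, 1)

print(expo.mean(), expo.var())   # ≈ 1/lam and 1/lam**2
print(unif.mean(), unif.var())   # ≈ 1/2 and 1/12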

4 / 19
Covariance
The covariance cov(X , Y ) between two random variables X and Y is
defined by:

cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

It is a measure of dependence:
• X ⊥⊥ Y =⇒ cov(X, Y) = 0
• cov(X, Y) ≠ 0 =⇒ X, Y are not independent
The correlation ρ(X, Y) is a standardized covariance and a measure of
dependence:

ρ(X, Y) = cov(X, Y) / √(var(X) var(Y)) ∈ [−1, 1].

Nota Bene: If X, Y are independent then cov(X, Y) = ρ(X, Y) = 0, but the
converse is typically not true. Exceptions: X, Y both binary, or (X, Y)
jointly Gaussian.
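
A classic illustration of why the converse fails, as a numpy sketch:
with X standard normal and Y = X², the covariance is E(X³) = 0 even
though Y is a function of X:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10**6)
y = x**2                              # fully determined by x: dependent

cov = np.mean(x * y) - x.mean() * y.mean()
rho = cov / np.sqrt(x.var() * y.var())
print(cov, rho)                       # both ≈ 0 despite dependence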
5 / 19
Properties of the covariance

• var(X) = cov(X, X) ≥ 0
• Symmetry: cov(X, Y) = cov(Y, X)
• cov(X + a, Y + b) = cov(X, Y)
• Bilinearity: cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z)
• var(X + Y) = var(X) + var(Y) + 2 cov(X, Y), and
  var(X − Y) = var(X) + var(Y) − 2 cov(X, Y)
• more generally,
  var(Σ_i a_i X_i) = Σ_i a_i² var(X_i) + 2 Σ_{i<j} a_i a_j cov(X_i, X_j)
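
A numerical check of the var(X + Y) identity on correlated samples
(the construction of y and the seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=(10**6, 2))
x = z[:, 0]
y = 0.5 * z[:, 0] + z[:, 1]          # correlated with x by construction

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)                      # agree up to sampling noise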

6 / 19
Exercise

There are two coins: one fair coin and one biased coin where heads has
probability 2/3. We pick one coin at random and toss it twice. Let X be
the Bernoulli variable such that X = 1 if we picked the fair coin, and let
T1 and T2 be the results of the two tosses, defined as Bernoulli random
variables that take the value 1 if the result is heads and 0 if tails.

1. Compute var(T1) and var(T2).
2. Compute cov(T1, T2). Are T1 and T2 independent?
3. We add gains to our experiment. If we get heads in the first toss, we
earn 1/2, and if we get heads in the second toss, we earn 1/4. Let G be
the total gain in the two tosses. Compute var(G).
4. We repeat the complete experiment with gain 3 times. Compute the
variance of the total gain over the three runs.
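
A simulation sketch for checking your answers to this exercise (it
estimates the quantities in parts 1–3 rather than deriving them):

import numpy as np

rng = np.random.default_rng(4)
n = 10**6
fair = rng.integers(0, 2, size=n)               # X = 1: fair coin picked
p = np.where(fair == 1, 0.5, 2/3)               # heads probability per run
t1 = (rng.random(n) < p).astype(float)          # first toss
t2 = (rng.random(n) < p).astype(float)          # second toss, same coin
g = 0.5 * t1 + 0.25 * t2                        # gain

print(t1.var(), np.cov(t1, t2, ddof=0)[0, 1], g.var())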

7 / 19
Sample moments
Let X_1, . . . , X_n be independent copies of X (a random sample).
Sample mean: X̄_n = (1/n) Σ_{i=1}^n X_i
Sample variance: S_n = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)²

The sample mean is an estimator of the population mean, and the
sample variance is an estimator of the population variance.
Expectation and variance of sample moments
If ∀i E(X_i) = µ and var(X_i) = σ², then
• E(X̄_n) = µ, var(X̄_n) = σ²/n,
• E(S_n) = σ².

Distribution under normality


If ∀i X_i ∼ N(µ, σ²) then X̄_n ∼ N(µ, σ²/n) and (n − 1) S_n/σ² ∼ χ²_{n−1}.
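
These facts can be illustrated by simulation; a sketch with normal
samples (µ, σ, n, and the number of replications are arbitrary choices):

import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 1.0, 2.0, 10, 10**5

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)      # divide by n - 1, as on the slide

print(xbar.mean(), xbar.var())        # ≈ mu and sigma**2 / n
print(s2.mean())                      # ≈ sigma**2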

8 / 19
Moments of a random vector
Mean vector
Let X = (X_1, . . . , X_p) be a vector of random variables with joint
distribution f_X(x).
The expectation µ = E(X) of X is a vector whose i-th entry is
µ_i = E(X_i).
The expectation is linear: for A ∈ ℝ^{m×p}, E(AX) = A E(X).
The (variance) covariance matrix
The (variance) covariance matrix var(X) of X is defined as:

var(X) = E[(X − µ)(X − µ)ᵀ]

var(X) is a positive semidefinite p × p matrix Σ whose (i, j)-th entry is

Σ_ij = cov(X_i, X_j)

For A ∈ ℝ^{m×p}, var(AX) = A · var(X) · Aᵀ.
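
A numpy sketch checking var(AX) = A var(X) Aᵀ for one arbitrary choice
of Σ and A:

import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                   # a chosen psd matrix
X = rng.multivariate_normal([0, 0], Sigma, size=10**6)
A = np.array([[1.0, 2.0]])                       # A in R^{1x2}

y = (X @ A.T).ravel()                            # samples of AX
print(y.var(), (A @ Sigma @ A.T).item())         # agree up to noise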

9 / 19
Exercise

In our coin tosses example, we define the random vector T as


T = (T1 , T2 )

1. Find the mean vector of T.
2. Find the covariance matrix of T.
3. Use properties of the covariance matrix to recover the variance of G.

10 / 19
Multivariate sample moments
Suppose that X ∈ ℝ^{n×p} is a matrix of n replications of a random vector
X = (X_1, . . . , X_p). Denote the rows of X by x^(1), . . . , x^(n).
• The sample mean vector is

  x̄ = (1/n) Σ_{i=1}^n x^(i) = (1/n) Xᵀ 1_n ∈ ℝ^p.

• The sample covariance (variance) matrix is

  S = (1/(n−1)) Σ_{i=1}^n (x^(i) − x̄)(x^(i) − x̄)ᵀ.

The diagonal entries are sample variances of the X_i and the off-diagonal
entries are sample covariances (estimators of cov(X_i, X_j)).

If X is centred so that x̄ = 0_p then this simplifies to

  S = (1/(n−1)) Σ_{i=1}^n x^(i) (x^(i))ᵀ = (1/(n−1)) XᵀX
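
As a sketch, the formula above coincides with numpy's built-in estimator
(np.cov with rowvar=False also divides by n − 1):

import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))                     # n = 100 rows, p = 3

xbar = X.mean(axis=0)                             # sample mean vector
S = (X - xbar).T @ (X - xbar) / (X.shape[0] - 1)  # formula above
print(np.allclose(S, np.cov(X, rowvar=False)))    # True: same estimator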
11 / 19
Basic inequalities
Markov’s inequality
If X ≥ 0 and E(X) < ∞, then for every t > 0

P(X ≥ t) ≤ E(X)/t.

Chebyshev’s inequality (follows from Markov’s inequality)


Let E(X) = µ, var(X) = σ², then for every t > 0

P(|X − µ| ≥ t) ≤ σ²/t²

and, in particular, P(|X − µ|/σ ≥ t) ≤ 1/t².

Jensen’s inequality
If g : ℝ → ℝ is convex, then E(g(X)) ≥ g(E(X)).
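
A sketch comparing the Chebyshev bound with exact N(0, 1) tail
probabilities (the grid of t values is an arbitrary choice); it shows
how loose the bound can be:

from scipy.stats import norm

# P(|Z| >= t) for Z ~ N(0, 1) versus the Chebyshev bound 1/t**2
for t in (1.0, 1.64, 2.0, 3.0):
    exact = 2 * norm.sf(t)                # sf(t) = P(Z >= t)
    print(t, exact, min(1.0, 1 / t**2))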
12 / 19
Exercise

1. We say X has an exponential distribution if f_X(x) = λe^{−λx} 𝟙_{ℝ₊}(x).
Give a lower bound for P(X ≥ 1/3).
2. Using Markov’s inequality, prove Chebyshev’s inequality.
3. Let Z ∼ N(0, 1). Give an upper bound on P(|Z| ≥ 1.64).
4. Let Z ∼ N(2, 1). Give a lower bound for E(Z²).

13 / 19
Conditional expectation

Let X, Y have joint distribution f_{X,Y}(x, y) and conditional f_{X|Y}(x|y).
Then the conditional expectation E(X|Y = y) is the expectation of X
with respect to the conditional distribution X|Y = y:

E(X|Y = y) = Σ_x x f_{X|Y}(x|y)       for discrete variables
E(X|Y = y) = ∫_ℝ x f_{X|Y}(x|y) dx    for continuous variables

Note that when Y is not set at a fixed value y, E(X|Y) is a function of
Y and so a random variable!

Example: two binary variables


Suppose f_{X,Y}(0, 0) = 0.4, f_{X,Y}(0, 1) = 0.2, f_{X,Y}(1, 0) = 0.1,
f_{X,Y}(1, 1) = 0.3. Find E(X|Y = 0).
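
A sketch of the discrete computation for this example, conditioning the
joint table on Y = 0 (it prints the answer, so try the exercise first):

joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

pY0 = sum(p for (x, y), p in joint.items() if y == 0)      # P(Y = 0)
EX_given_Y0 = sum(x * p for (x, y), p in joint.items() if y == 0) / pY0
print(EX_given_Y0)             # Σ_x x f_{X|Y}(x|0) = 0.1 / 0.5 = 0.2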

14 / 19
Properties of conditional expectation

Let r be a measurable real function.


• E(r(X)|X) = r(X)
• If X and Y are independent, E(r(X)|Y) = E(r(X))
• E(r(Y)X|Y) = r(Y) E(X|Y)
• E(aX + bY|Z) = a E(X|Z) + b E(Y|Z)
• E[E(X|Y)] = E(X) (the tower property)
• more generally, E[E(r(X, Y)|Y)] = E(r(X, Y))
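
A Monte Carlo sketch of the tower property E[E(X|Y)] = E(X), using an
arbitrary two-stage construction where E(X|Y) = Y:

import numpy as np

rng = np.random.default_rng(8)
y = rng.uniform(0, 1, size=10**6)
x = rng.normal(loc=y, scale=1.0)      # X | Y = y  ~  N(y, 1)

# Here E(X|Y) = Y, so the tower property predicts E(Y) = E(X)
print(y.mean(), x.mean())             # both ≈ 1/2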

15 / 19
Exercise

Recall our coin tosses example. There are two coins: one fair coin and
one biased coin where heads has probability 2/3. We pick one coin at
random and toss it twice.
Let X be the Bernoulli variable such that X = 1 if we picked the fair
coin, and let T1 and T2 be the results of the two tosses, defined as
Bernoulli random variables that take the value 1 if the result is heads
and 0 if tails.
If we get heads in the first toss, we earn 1/2, and if we get heads in
the second toss, we earn 1/4. Let G be the total gain in the two tosses.

1. Find the distribution of E(T1|X)
2. Find the distribution of E(X|T1)
3. Verify that E(E(X|T1)) = E(X)
4. Find E(G|X = 0)

16 / 19
Moment generating function
The moment generating function, when it exists, is defined as:

M_X(t) = E(e^{tX}) = Σ_i e^{t x_i} P(X = x_i)    for a discrete variable
M_X(t) = E(e^{tX}) = ∫_ℝ e^{tx} f_X(x) dx        for a continuous variable

We have:
dM_X/dt (0) = E(X), and
more generally, M_X^{(k)}(0) = E[X^k].

If X = (X_1, . . . , X_p) is a random vector we define

M_X(t) = E(exp(⟨X, t⟩)) = ∫_{ℝ^p} exp(⟨x, t⟩) f_X(x) dx.

Derivatives evaluated at t = 0_p give the corresponding moments.


NB: For a continuous variable, MX is the Laplace transform of the density.
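
A sympy sketch of moment extraction, taking as given the exponential
MGF M_X(t) = λ/(λ − t) for t < λ (this is the answer to exercise 2 on
the final slide, so skip it if you want to derive it first):

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                    # MGF of Exp(lambda), t < lambda

print(sp.diff(M, t).subs(t, 0))        # E(X)   = 1/lambda
print(sp.diff(M, t, 2).subs(t, 0))     # E(X^2) = 2/lambda**2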
17 / 19
Characteristic function
The characteristic function is defined for all ω as:

ϕ_X(ω) = E(e^{iωX}) = Σ_k e^{iω x_k} P(X = x_k)       if X is discrete
ϕ_X(ω) = E(e^{iωX}) = ∫_{−∞}^{∞} e^{iωx} f_X(x) dx    if X is continuous

Properties
• if X_1, . . . , X_n are independent random variables and S = Σ_i X_i,
  then:

  ϕ_S(ω) = Π_{k=1}^n ϕ_{X_k}(ω)

• ϕ_X uniquely determines the probability law of X. Two random
  variables have the same characteristic function if and only if they
  have the same distribution function.

NB: For a continuous variable, ϕX is the Fourier transform of the density.
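
A numpy sketch of the product property: the empirical characteristic
function of a sum of independent normals matches the product of the
individual ones (ω = 0.7 and the seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(9)
x1 = rng.normal(size=10**6)
x2 = rng.normal(size=10**6)            # independent of x1

def cf(sample, w):                     # empirical E(e^{i w X})
    return np.mean(np.exp(1j * w * sample))

w = 0.7
print(cf(x1 + x2, w), cf(x1, w) * cf(x2, w))   # agree up to noise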


18 / 19
Exercise

1. Find the moment generating function and the characteristic function
of a Bernoulli random variable with parameter p.
2. Find the moment generating function and the characteristic function
of a random variable with exponential distribution with parameter λ.
3. Use the moment generating function to recover the mean of a
Bernoulli random variable with parameter p and the mean of a
random variable with exponential distribution with parameter λ (1/λ).

19 / 19
