Elements of Probability Theory


Short Guides to Microeconometrics
Kurt Schmidheiny
Universität Basel, Fall 2022
Version: 28-9-2022, 08:53

Contents

1 Random Variables and Distributions
  1.1 Univariate Random Variables and Distributions
  1.2 Bivariate Random Variables and Distributions
2 Moments
  2.1 Expected Value or Mean
  2.2 Variance and Standard Deviation
  2.3 Higher Order Moments
  2.4 Covariance and Correlation
  2.5 Conditional Expectation and Variance
3 Random Vectors and Random Matrices
4 Important Distributions
  4.1 Univariate Normal Distribution
  4.2 Bivariate Normal Distribution
  4.3 Multivariate Normal Distribution

1 Random Variables and Distributions

A random variable is a variable whose values are determined by a probability distribution. This is a casual way of defining random variables which is sufficient for our level of analysis. In more advanced probability theory, a random variable is defined as a real-valued function over some probability space.

In sections 1 to 3, a random variable is denoted by capital letters, e.g. X, whereas its realizations are denoted by small letters, e.g. x.

1.1 Univariate Random Variables and Distributions

A univariate discrete random variable is a variable that takes a countable number K of real values with certain probabilities. The probability that the random variable X takes the value x_k among the K possible realizations is given by the probability distribution

P(X = x_k) = P(x_k) = p_k

with k = 1, 2, ..., K. K may be ∞ in some cases. This can also be written as

           | p_1  if X = x_1
           | p_2  if X = x_2
P(x_k) =   |  ...
           | p_K  if X = x_K

Note that

Σ_{k=1}^{K} p_k = 1.
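As a concrete numerical check of the discrete case, consider a fair die; a minimal sketch in Python (the die example is ours, not from the text):

```python
# A fair six-sided die as a discrete random variable:
# K = 6 realizations x_k, each with probability p_k = 1/6.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

# P(X = x_k) for a single realization, here x_k = 3
p_three = ps[xs.index(3)]

# The probabilities must sum to one.
total = sum(ps)
```

The same two objects, the list of realizations and the list of probabilities, fully describe any finite discrete distribution.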

A univariate continuous random variable is a variable that takes a continuum of values on the real line. The distribution of a continuous random variable X can be characterized by a density function or probability


density function (pdf) f(x). The nonnegative function f(x) is such that

P(x_1 ≤ X ≤ x_2) = ∫_{x_1}^{x_2} f(x) dx

defines the probability that X takes a value in the interval [x_1, x_2]. Note that there is no chance that X takes exactly the value x: P(X = x) = 0. The probability that X takes any value on the real line is

∫_{-∞}^{∞} f(x) dx = 1.

The distribution of a univariate random variable X is alternatively described by the cumulative distribution function (cdf)

F(x) = P(X < x).

The cdf of a discrete random variable X is

F(x) = Σ_{x_k ≤ x} P(X = x_k) = Σ_{x_k ≤ x} p_k

and of a continuous random variable X

F(x) = ∫_{-∞}^{x} f(t) dt.

F(x) has the following properties:

• F(x) is monotonically nondecreasing
• F(-∞) = 0 and F(∞) = 1
• F(x) is continuous to the left

1.2 Bivariate Random Variables and Distributions

A bivariate continuous random variable is a variable that takes a continuum of values in the plane. The distribution of a bivariate continuous random variable (X, Y) can be characterized by a joint density function or joint probability density function f(x, y). The nonnegative function f(x, y) is such that

P(x_1 ≤ X ≤ x_2, y_1 ≤ Y ≤ y_2) = ∫_{x_1}^{x_2} ∫_{y_1}^{y_2} f(x, y) dy dx

defines the probability that X and Y take values in the intervals [x_1, x_2] and [y_1, y_2], respectively. Note that

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = 1.

The marginal density function or marginal probability density function is given by

f(x) = ∫_{-∞}^{∞} f(x, y) dy

such that

P(x_1 ≤ X ≤ x_2) = P(x_1 ≤ X ≤ x_2, -∞ ≤ Y ≤ ∞) = ∫_{x_1}^{x_2} f(x) dx.

The conditional density function or conditional probability density function of Y with respect to the event {X = x} is given by

f(y|x) = f(x, y) / f(x)

provided that f(x) > 0. Note that

∫_{-∞}^{∞} f(y|x) dy = 1.

Two random variables X and Y are called independent if and only if

f(x, y) = f(x) · f(y).

If X and Y are independent, then:

• f(y|x) = f(y)
• P(x_1 ≤ X ≤ x_2, y_1 ≤ Y ≤ y_2) = P(x_1 ≤ X ≤ x_2) · P(y_1 ≤ Y ≤ y_2)
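The relations between joint, marginal, and conditional densities can be verified by numerical integration. A minimal sketch, using an invented joint density f(x, y) = 6xy² on the unit square (the example and all numbers are ours, not from the text):

```python
from scipy.integrate import dblquad, quad

# Invented example joint density on the unit square, zero elsewhere.
def f(x, y):
    return 6.0 * x * y**2

# Total probability: dblquad integrates func(y, x), with y the inner variable.
total, _ = dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 1)

# Marginal density f(x) = integral of f(x, y) over y; here it equals 2x.
def fx(x):
    return quad(lambda y: f(x, y), 0, 1)[0]

# Conditional density f(y|x) = f(x, y) / f(x) integrates to one
# for any x with f(x) > 0, e.g. x = 0.5.
cond_total, _ = quad(lambda y: f(0.5, y) / fx(0.5), 0, 1)
```

Since f(x, y) = 6xy² factors as (2x)·(3y²), this particular X and Y are also independent.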
More generally, if a finite set of n continuous random variables X_1, X_2, X_3, ..., X_n are mutually independent, then

f(x_1, x_2, x_3, ..., x_n) = f(x_1) · f(x_2) · f(x_3) · ... · f(x_n).

2 Moments

2.1 Expected Value or Mean

The expected value or mean of a discrete random variable with probability distribution P(x_k), k = 1, 2, ..., K, is defined as

E[X] = Σ_{k=1}^{K} x_k P(x_k)

if the series converges absolutely.

The expected value or mean of a continuous univariate random variable with density function f(x) is defined as

E[X] = ∫_{-∞}^{∞} x f(x) dx

if the integral exists.

For a random variable Z which is a continuous function φ of a discrete random variable X, we have

E[Z] = E[φ(X)] = Σ_{k=1}^{K} φ(x_k) P(x_k).

For a random variable Z which is a continuous function φ of a continuous random variable X, or of the continuous random variables X and Y, we have, respectively,

E[Z] = E[φ(X)] = ∫_{-∞}^{∞} φ(x) f(x) dx

E[Z] = E[φ(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} φ(x, y) f(x, y) dx dy.

The following rules hold in general, i.e. for discrete, continuous and mixed types of random variables:

• E[α] = α
• E[αX + βY] = α E[X] + β E[Y]
• E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i]
• E[XY] = E[X] E[Y] if X and Y are independent

where α ∈ R and β ∈ R are constants.

2.2 Variance and Standard Deviation

The variance of a univariate random variable X is defined as

V[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2.

The variance has the following properties:

• V[X] ≥ 0
• V[X] = 0 if and only if X = E[X]

The following rules hold in general, i.e. for discrete, continuous and mixed types of random variables:

• V[αX + β] = α^2 V[X]
• V[X + Y] = V[X] + V[Y] + 2 Cov[X, Y]
• V[X - Y] = V[X] + V[Y] - 2 Cov[X, Y]
• V[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} V[X_i] if X_i and X_j are independent for all i ≠ j

where α ∈ R and β ∈ R are constants.

Instead of the variance, one often considers the standard deviation σ_X = √(V[X]).
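The variance rules above also hold exactly for sample moments, as long as the same degrees-of-freedom convention is used throughout, so they can be checked on arbitrary draws. A minimal sketch (the draws and constants are ours, not from the text):

```python
import numpy as np

# Arbitrary draws of X and Y; the identities below hold for any data.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)
a, b = 3.0, 2.0

# V[aX + b] = a^2 V[X]
lhs1, rhs1 = np.var(a * x + b), a**2 * np.var(x)

# V[X + Y] = V[X] + V[Y] + 2 Cov[X, Y]  (ddof=0 so np.cov matches np.var)
cov = np.cov(x, y, ddof=0)[0, 1]
lhs2, rhs2 = np.var(x + y), np.var(x) + np.var(y) + 2 * cov
```

Both pairs agree up to floating-point error, which is the numerical face of the algebraic rules.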
2.3 Higher Order Moments

The j-th moment around the mean is defined as

E[(X - E[X])^j].

2.4 Covariance and Correlation

The covariance between two random variables X and Y is defined as

Cov[X, Y] = E[(X - E[X])(Y - E[Y])]
          = E[XY] - E[X] E[Y]
          = E[(X - E[X]) Y] = E[X (Y - E[Y])].

The following rules hold in general, i.e. for discrete, continuous and mixed types of random variables:

• Cov[αX + γ, βY + µ] = αβ Cov[X, Y]
• Cov[X_1 + X_2, Y_1 + Y_2] = Cov[X_1, Y_1] + Cov[X_1, Y_2] + Cov[X_2, Y_1] + Cov[X_2, Y_2]
• Cov[X, Y] = 0 if X and Y are independent

where α ∈ R, β ∈ R, γ ∈ R and µ ∈ R are constants.

The correlation coefficient between two random variables X and Y is defined as

ρ_{X,Y} = Cov[X, Y] / (σ_X σ_Y)

where σ_X and σ_Y denote the corresponding standard deviations. The correlation coefficient has the following property:

• -1 ≤ ρ_{X,Y} ≤ 1

The following rules hold:

• ρ_{αX+γ, βY+µ} = ρ_{X,Y} for αβ > 0
• ρ_{X,Y} = 0 if X and Y are independent

where α ∈ R and β ∈ R are constants.

We say that

• X and Y are uncorrelated if ρ = 0
• X and Y are positively correlated if ρ > 0
• X and Y are negatively correlated if ρ < 0

2.5 Conditional Expectation and Variance

Let (X, Y) be a bivariate discrete random variable and P(y_k|X) the conditional probability of Y = y_k given X. Then the conditional expected value or conditional mean of Y given X is

E[Y|X] = E_{Y|X}[Y] = Σ_{k=1}^{K} y_k P(y_k|X).

Let (X, Y) be a bivariate continuous random variable and f(y|x) the conditional density of Y given X. Then the conditional expected value or conditional mean of Y given X is

E[Y|X] = E_{Y|X}[Y] = ∫_{-∞}^{∞} y f(y|X) dy.

The law of iterated means or law of iterated expectations holds in general, i.e. for discrete, continuous or mixed random variables:

E_X[E[Y|X]] = E[Y].

The conditional variance of Y given X is given by

V[Y|X] = E[(Y - E[Y|X])^2 | X] = E[Y^2 | X] - (E[Y|X])^2.

The law of total variance is

V[Y] = E_X[V[Y|X]] + V_X[E[Y|X]].
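The law of iterated expectations and the law of total variance can be worked through exactly in a small discrete example. A minimal sketch, using an invented two-state setup (all numbers are ours, not from the text):

```python
# Invented example: X is 0 or 1 with probability 1/2 each, and given
# X = k, Y has conditional mean m[k] and conditional variance v[k].
p = [0.5, 0.5]   # P(X = k)
m = [1.0, 3.0]   # E[Y | X = k]
v = [4.0, 9.0]   # V[Y | X = k]

# Law of iterated expectations: E[Y] = E_X[E[Y|X]]
ey = sum(pk * mk for pk, mk in zip(p, m))

# Law of total variance: V[Y] = E_X[V[Y|X]] + V_X[E[Y|X]]
e_condvar = sum(pk * vk for pk, vk in zip(p, v))
var_condmean = sum(pk * mk**2 for pk, mk in zip(p, m)) - ey**2
vy = e_condvar + var_condmean

# Cross-check via E[Y^2 | X] = V[Y|X] + (E[Y|X])^2:
ey2 = sum(pk * (vk + mk**2) for pk, vk, mk in zip(p, v, m))
vy_direct = ey2 - ey**2
```

Here E[Y] = 2 and V[Y] = 6.5 + 1 = 7.5, and the direct computation of V[Y] from E[Y²] gives the same value.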
3 Random Vectors and Random Matrices

In this section we denote matrices (random or non-random) by bold capital letters, e.g. X, and vectors by small letters, e.g. x.

Let x = (x_1, ..., x_n)' be an (n × 1)-dimensional vector such that each element x_i is a random variable. Let X be an (n × k)-dimensional matrix such that each element x_ij is a random variable. Let a = (a_1, ..., a_n)' be an (n × 1)-dimensional vector of constants and A an (m × n) matrix of constants.

The expectation of a random vector, E[x], and of a random matrix, E[X], summarize the expected values of their elements, respectively:

       | E[x_1] |              | E[x_11]  E[x_12]  ...  E[x_1k] |
E[x] = | E[x_2] |  and  E[X] = | E[x_21]  E[x_22]  ...  E[x_2k] |
       |  ...   |              |   ...      ...    ...    ...   |
       | E[x_n] |              | E[x_n1]  E[x_n2]  ...  E[x_nk] |

The following rules hold:

• E[a'x] = a' E[x]
• E[Ax] = A E[x]
• E[AX] = A E[X]
• E[tr(X)] = tr(E[X]) for X a square matrix

The variance-covariance matrix of a random vector, V[x], summarizes all variances and covariances of its elements:

V[x] = E[(x - E[x])(x - E[x])'] = E[xx'] - (E[x])(E[x])'

       |    V[x_1]      Cov[x_1, x_2]  ...  Cov[x_1, x_n] |
     = | Cov[x_2, x_1]     V[x_2]      ...  Cov[x_2, x_n] |
       |      ...            ...       ...       ...      |
       | Cov[x_n, x_1]  Cov[x_n, x_2]  ...     V[x_n]     |

The following rules hold:

• V[a'x] = a' V[x] a
• V[Ax] = A V[x] A'

where the (m × n)-dimensional matrix A with m ≤ n has full row rank.

If the variance-covariance matrix V[x] is positive definite (p.d.), then all random elements and all linear combinations of its random elements have strictly positive variance:

V[a'x] = a' V[x] a > 0 for all a ≠ 0.

4 Important Distributions

4.1 Univariate Normal Distribution

The density of the univariate normal distribution is given by

f(x) = 1/(σ √(2π)) · exp(-(1/2)((x - µ)/σ)^2).

The normal distribution is characterized by the two parameters µ and σ. The mean of the normal distribution is E[X] = µ and the variance V[X] = σ^2. We write X ~ N(µ, σ^2).

The univariate normal distribution with mean µ = 0 and variance σ^2 = 1 is called the standard normal distribution N(0, 1).

4.2 Bivariate Normal Distribution

The density of the bivariate normal distribution is

f(x, y) = 1/(2π σ_X σ_Y √(1 - ρ^2)) · exp{ -1/(2(1 - ρ^2)) [((x - µ_X)/σ_X)^2 - 2ρ ((x - µ_X)/σ_X)((y - µ_Y)/σ_Y) + ((y - µ_Y)/σ_Y)^2] }.
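Writing the bivariate normal density out in code and comparing it against a library implementation is a useful sanity check. A minimal sketch, assuming scipy is available; the parameter values are arbitrary illustrations:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters.
mx, my, sx, sy, rho = 0.0, 1.0, 1.0, 2.0, 0.5

def f(x, y):
    """The bivariate normal density, written out term by term."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (2 * (1 - rho**2))
    return np.exp(-q) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

# scipy parameterizes by the covariance matrix, whose off-diagonal
# element is Cov[X, Y] = rho * sx * sy.
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
ref = multivariate_normal(mean=[mx, my], cov=cov).pdf([0.3, 0.7])
```

The two formulations agree at any evaluation point, since (σ_X, σ_Y, ρ) and the covariance matrix carry the same information.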
If (X, Y) follows a bivariate normal distribution, then:

• The marginal densities f(x) and f(y) are univariate normal.
• The conditional densities f(x|y) and f(y|x) are univariate normal.
• E[X] = µ_X, V[X] = σ_X^2, E[Y] = µ_Y, V[Y] = σ_Y^2.
• The correlation coefficient between X and Y is ρ_{X,Y} = ρ.
• E[Y|X] = µ_Y + ρ (σ_Y/σ_X)(X - µ_X) and V[Y|X] = σ_Y^2 (1 - ρ^2).

The above properties characterize the normal distribution. It is the only distribution with all these properties.

Further important properties:

• If (X, Y) follows a bivariate normal distribution, then aX + bY is also normally distributed:

  aX + bY ~ N(a µ_X + b µ_Y, a^2 σ_X^2 + b^2 σ_Y^2 + 2ab ρ σ_X σ_Y).

  The reverse implication is not true.

• If X and Y are bivariate normally distributed with Cov[X, Y] = 0, then X and Y are independent.

4.3 Multivariate Normal Distribution

Let x = (x_1, ..., x_n)' be an n-dimensional vector such that each element x_i is a random variable. In addition let E[x] = µ = (µ_1, ..., µ_n)' and V[x] = Σ with

    | σ_11  σ_12  ...  σ_1n |
Σ = | σ_21  σ_22  ...  σ_2n |
    |  ...   ...  ...   ... |
    | σ_n1  σ_n2  ...  σ_nn |

where σ_ij = Cov[x_i, x_j].

An n-dimensional random variable x is multivariate normally distributed with mean µ and variance-covariance matrix Σ, x ~ N(µ, Σ), if its density is

f(x) = (2π)^(-n/2) (det Σ)^(-1/2) exp(-(1/2)(x - µ)' Σ^(-1) (x - µ)).

Let x ~ N(µ, Σ) and A a (m × n) matrix with m ≤ n and m linearly independent rows; then

Ax ~ N(Aµ, AΣA').
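The linear-transformation result Ax ~ N(Aµ, AΣA') can be checked by Monte Carlo. A minimal sketch, assuming numpy; all parameter values are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters: a 3-dimensional x ~ N(mu, Sigma).
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 1.0, 0.0],    # (2 x 3) matrix with full row rank
              [0.0, 1.0, -1.0]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # one draw of x per row
y = x @ A.T                                           # corresponding draws of Ax

mean_hat = y.mean(axis=0)   # should approach A mu
cov_hat = np.cov(y.T)       # should approach A Sigma A'
```

With 200,000 draws the sample mean and sample covariance of Ax match Aµ and AΣA' to roughly two decimal places, consistent with the Monte Carlo error of that sample size.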
