Introduction To Probability (Lectures 1-2)
1 Probability Spaces and Random Variables
The goal of Lecture 1 and Lecture 2 is twofold. First, we will present the formal definitions of a
probability space and a random variable. Then, we will introduce some more concrete concepts that
will be used throughout the course (such as cumulative distribution function and probability density
function).
Overview of Lectures 1-2: The basic object is the triplet
(Ω, F, P),
where:
1. Ω: The sample space. A nonempty set collecting all possible outcomes.
2. F: A σ-algebra of subsets of Ω. Its elements are the events to which we will assign likelihoods.
3. P: The probability measure. We will think about it as a [0, 1]-valued function summarizing
the likelihood of an event.
We will then define a real-valued random variable as a map
X : Ω → R,
which will satisfy a restriction that we will call “measurability”. We will focus on how the probability
measure on (Ω, F) induces a probability over subsets of R.
Once we are done with abstract definitions, we will introduce objects that we will use quite
often during this course. In the context of a real-valued random variable we will define
a) The cumulative distribution function (c.d.f.): which we will connect to the “induced” probability
measure of a random variable,
b) “Discrete” and “Absolutely Continuous” random variables: this classification will emerge from
differences in the c.d.f. of random variables.
c) The probability density function (p.d.f.): a convenient way of summarizing the information in
the c.d.f. of an absolutely continuous random variable,
d) The expectation operator EP [·] (and then the variance of a real-valued random variable).
1.1 Measurable Spaces, Probability Measures, and Probability Spaces
1.1.1 σ-algebras and Measurable Spaces
We will start with the definition of a “measurable space”. In order to model randomness, we need
to have some structure that allows us to specify what the randomness is all about.
Let Ω be any given nonempty set and let F be a nonempty collection of subsets of Ω satisfying the
following properties:
i) Ω ∈ F,
ii) if F ∈ F, then F^c ≡ Ω \ F ∈ F (closure under complementation),
iii) if F_n ∈ F for all n ∈ N, then ∪_{n=1}^∞ F_n ∈ F (closure under countable unions).
A collection F satisfying i)-iii) is called a σ-algebra.
Definition (Measurable Space). (Billingsley (1995), p. 161) The pair (Ω, F), where F is a
σ-algebra of subsets of Ω, is called a measurable space. The elements of F are called the events of Ω.
If you are interested in playing with these definitions, here is a simple exercise.
Practice Problem 1 (Optional). Let A be any collection of subsets of Ω and let F ∗ (A) denote the
collection of all σ-algebras of subsets of Ω that contain A.
i) Show that F ∗ (A) is nonempty. (Hint: the power set of Ω is a σ-algebra containing A.)
ii) Let σ(A) denote the intersection over all the σ-algebras in F ∗ (A). Show that σ(A)
is a σ-algebra. The collection σ(A) will be called the “σ-algebra generated by A”. Such a
collection is unique.
Practice Problem 2 (Optional). Show that a σ-algebra is “closed” under countable intersections.
That is, if F_n ∈ F for all n ∈ N, then ∩_{n=1}^∞ F_n ∈ F.
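To make these definitions concrete, here is a small brute-force sketch (in Python; the function name and example are our own, not from the lecture) that computes the σ-algebra generated by a collection of subsets of a finite Ω by closing it under complements and unions:

```python
def sigma_algebra_generated_by(omega, collection):
    """Brute-force sigma(A) on a finite sample space: start from the
    generators plus {emptyset, Omega} and close under complements and
    pairwise unions (which, on a finite space, gives countable closure)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            comp = omega - a                     # closure under complements
            if comp not in sigma:
                sigma.add(comp); changed = True
            for b in current:
                u = a | b                        # closure under unions
                if u not in sigma:
                    sigma.add(u); changed = True
    return sigma

omega = {1, 2, 3, 4}
sigma = sigma_algebra_generated_by(omega, [{1}])
# sigma({1}) on this Omega is {emptyset, {1}, {2,3,4}, Omega}
print(sorted(tuple(sorted(s)) for s in sigma))  # [(), (1,), (1, 2, 3, 4), (2, 3, 4)]
```

Note that the result is automatically closed under intersections as well, as Practice Problem 2 asserts.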
Once we know which objects our uncertainty refers to, we need a formal way of expressing the
“likelihood” of the different events we have specified. In order to do this we will define a probability
measure.
Definition. (Probability Measure) (Billingsley (1995), p. 22) Let (Ω, F) be a measurable space.
Let P : F → [0, 1] be a “set” function mapping the σ-algebra of subsets of Ω into the unit interval.
We say that P is a probability measure if it satisfies the following properties:
1. P(Ω) = 1,
2. Countable additivity: for any sequence (A_n) of pairwise disjoint events in F,
P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n).
Definition (Probability space). (Billingsley (1995), p. 23) The triplet (Ω, F, P) is called a
probability space.
2. Boole’s Inequality: (Billingsley (1995), p. 24) Boole’s inequality is a pretty useful result
that will show up quite often. Let (A_n) be any sequence of events in F. Then
P(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P(A_n).
We provide a proof of Boole’s inequality for only two events A_1, A_2. That is, we show that:
P(A_1 ∪ A_2) ≤ P(A_1) + P(A_2).
Write A_1 ∪ A_2 = A_1 ∪ (A_2 ∩ A_1^c), a union of two disjoint events in F. By additivity,
P(A_1 ∪ A_2) = P(A_1) + P(A_2 ∩ A_1^c) ≤ P(A_1) + P(A_2),
where the last inequality holds because A_2 = (A_2 ∩ A_1^c) ∪ (A_2 ∩ A_1) is a disjoint union,
so P(A_2) ≥ P(A_2 ∩ A_1^c).
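Boole’s inequality is easy to check numerically. The sketch below (a toy setup of our own, not from the lecture) builds a finite probability space with the uniform measure, where the power set serves as the σ-algebra, and verifies subadditivity for a handful of random events:

```python
import random

# Finite probability space: Omega = {0,...,5}, F = power set,
# P(A) = |A| / |Omega| (the uniform probability measure).
omega = range(6)
P = lambda A: len(set(A)) / 6

random.seed(0)
# Draw a few random events A_n and check P(union A_n) <= sum P(A_n).
events = [set(random.sample(list(omega), random.randint(1, 4))) for _ in range(5)]
union = set().union(*events)
lhs = P(union)
rhs = sum(P(A) for A in events)
print(lhs, rhs)   # subadditivity: lhs never exceeds rhs
```

The inequality is strict here whenever the drawn events overlap, which is exactly the slack discarded in the proof above.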
In order for you to practice manipulating the definition of a probability measure, you will be asked
to write down proofs of a few basic statements in this week’s problem set.
Definition (S-valued random variable). Let (Ω, F), (S, S) be two measurable spaces. The map:
X :Ω→S
is called an S-valued random variable [relative to (Ω, F)-(S, S)] if for all A ∈ S :
X −1 (A) ≡ {ω ∈ Ω | X(ω) ∈ A} ∈ F,
Note that, by definition, a random variable pulls events in the space S back to well-defined events
in the space Ω. If there is already a well-defined probability measure P on (Ω, F), then there is a
natural “induced” probability measure P_X on (S, S):
P_X(A) ≡ P(X^{−1}(A)) for all A ∈ S.
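The induced measure is easy to illustrate on a finite example. In the sketch below (a toy setup of our own), Ω consists of two fair coin tosses with the uniform measure, and X counts heads; the induced measure is computed by pulling each set back through X:

```python
from fractions import Fraction

# Omega = two fair coin tosses, P = uniform measure; X = number of heads.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = lambda E: Fraction(len(E), len(omega))
X = lambda w: w.count("H")

def P_X(A):
    """Induced measure: pull the set A back through X, then apply P."""
    preimage = [w for w in omega if X(w) in A]
    return P(preimage)

print(P_X({0}), P_X({1}), P_X({2}))   # 1/4 1/2 1/4
```

Here P_X is exactly the Binomial(2, 1/2) distribution on {0, 1, 2}: the randomness of Ω has been pushed forward onto subsets of R.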
In this section, we will analyze the statement: “X is a real-valued random variable with a cumulative
distribution function F : R → [0, 1]”. We will explain the relation between the c.d.f. of a random
variable and the induced probability measure we discussed in the last subsection. We will focus on
real-valued random variables (relative to the Borel σ-algebra on the real line, which is the smallest
σ-algebra containing all open sets).
Definition. (Billingsley (1995), p. 188 and 256) The cumulative distribution function (c.d.f.) of a
real-valued random variable X : Ω → R is defined as the real-valued function F_X : R → [0, 1] given
by
F_X(x) ≡ P({ω ∈ Ω | X(ω) ≤ x}) = P(X^{−1}((−∞, x])).
The c.d.f. of a random variable X is nothing else than its induced probability measure evaluated
at sets of the form (−∞, x]. The c.d.f. satisfies the following properties:
1. F_X is non-decreasing.
2. lim_{x↑∞} F_X(x) = 1.
3. lim_{x↓−∞} F_X(x) = 0.
4. F_X is right-continuous: lim_{h↓0} F_X(x + h) = F_X(x) for every x ∈ R.
Proposition 1. If F is any function satisfying properties 1-4, then there is a probability space
(Ω, F, P) and a random variable X : Ω → R such that F coincides with F_X.
Practice Problem 4. This week’s problem set will walk you through the proof of Proposition 1.
Here are some commonly used examples of c.d.f.s of real-valued random variables:
• Uniform Distribution: The random variable X is said to have a uniform distribution in [a, b]
if the c.d.f. is given by:
F(x) =
    0                 if x < a
    (x − a)/(b − a)   if x ∈ [a, b)
    1                 if x ≥ b
• Bernoulli Distribution: The random variable X is said to have a Bernoulli distribution with
parameter p ∈ [0, 1] if its c.d.f. is given by:
F(x) =
    0        if x < 0
    1 − p    if x ∈ [0, 1)
    1        if x ≥ 1
• Geometric Distribution: The random variable X is said to have a geometric distribution with
parameter p ∈ (0, 1] if its c.d.f. is given by:
F(x) =
    0                if x < 1
    1 − (1 − p)^k    if x ∈ [k, k + 1) for k ∈ N
• Exponential Distribution: The random variable X is said to have an exponential distribution
with parameter λ > 0 if its c.d.f. is given by:
F(x) =
    0             if x < 0
    1 − e^{−λx}   if x ≥ 0
• Gaussian Distribution: The random variable X is said to have a Gaussian distribution with
parameters (µ, σ 2 ) if
F(x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp(−(u − µ)²/(2σ²)) du
• Pareto Distribution: The random variable X is said to have a Pareto distribution with
parameters (x_m, α) if:
F(x) =
    0                if x < x_m
    1 − (x_m/x)^α    if x ≥ x_m
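As a quick numerical sanity check (function names and parameter defaults below are our own choices), the uniform, exponential, and Pareto c.d.f.s above can be coded directly and tested against properties 1-3 on a grid:

```python
import math

def F_uniform(x, a=0.0, b=1.0):
    # Uniform[a, b] c.d.f.: 0 below a, linear on [a, b), 1 above b.
    if x < a: return 0.0
    if x < b: return (x - a) / (b - a)
    return 1.0

def F_exponential(x, lam=2.0):
    # Exponential(lambda) c.d.f.
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

def F_pareto(x, xm=1.0, alpha=3.0):
    # Pareto(x_m, alpha) c.d.f.
    return 0.0 if x < xm else 1.0 - (xm / x) ** alpha

# Check on a grid: non-decreasing, left tail 0, right tail near 1.
grid = [-10 + 0.01 * k for k in range(4001)]   # covers [-10, 30]
for F in (F_uniform, F_exponential, F_pareto):
    vals = [F(x) for x in grid]
    assert all(v2 >= v1 for v1, v2 in zip(vals, vals[1:]))   # property 1
    assert vals[0] == 0.0 and vals[-1] > 0.99                # properties 2-3
print("c.d.f. properties 1-3 hold on the grid")
```

Right-continuity (property 4) is built into the `<` versus `>=` case splits in each function.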
One common way to classify random variables and their c.d.f.s is the following:
1. Discrete Random Variables: A random variable X is discrete if there is a countable set
{x_1, x_2, . . .} ⊆ R such that
P_X({x_1, x_2, . . .}^c) = 0.
The smallest set {x_1, x_2, . . .} for which P_X({x_1, x_2, . . .}^c) = 0 is called the support of X.
The map p : Supp(X) → [0, 1] defined by p(s) ≡ P_X({s}) is called the probability mass function.
2. Absolutely Continuous Random Variables (w.r.t. Lebesgue Measure): A random
variable X is absolutely continuous if:
F(x) = ∫_{−∞}^{x} f(y) dy
for some integrable, nonnegative function f that we will call the probability density function
(p.d.f.) of X. If F is continuously differentiable, the p.d.f. will coincide with the
derivative of F (see p. 257 in Billingsley (1995)).
A Bernoulli r.v. is an example of a discrete random variable with finite support. A Geometric r.v.
is an example of a discrete random variable with countable support. The Gaussian, Pareto, and
exponential distributions are both continuous and absolutely continuous. Note that not every
random variable falls into one of the categories above (see, for example, the entry for Cantor’s
function on Wikipedia).
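The two classes can be illustrated numerically (a rough sketch with parameters of our own): for a discrete r.v. the p.m.f. appears as the jump sizes of the c.d.f., while for an absolutely continuous r.v. integrating the p.d.f. recovers the c.d.f.:

```python
import math

# Discrete: recover the Bernoulli(p) p.m.f. from the jumps of its c.d.f.
p = 0.3
F_bern = lambda x: 0.0 if x < 0 else (1 - p if x < 1 else 1.0)
eps = 1e-9
pmf = {s: F_bern(s) - F_bern(s - eps) for s in (0, 1)}   # jump sizes
print(pmf)   # jump of size 1-p at 0, size p at 1

# Absolutely continuous: integrate the exponential p.d.f. f = F'
# and compare with the closed-form c.d.f. at x = 1.5.
lam = 2.0
f = lambda y: lam * math.exp(-lam * y)
x, n = 1.5, 100_000
riemann = sum(f(k * x / n) * (x / n) for k in range(n))   # Riemann sum of f on [0, x]
F_exp = 1 - math.exp(-lam * x)
print(riemann, F_exp)   # the two values agree to ~1e-5
```

A Cantor-type random variable would fail both tests: its c.d.f. has no jumps, yet it admits no density.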
In this section we introduce the definition of expectation only for discrete and absolutely continuous
random variables. For a more detailed exposition you can see Chapters 1.4-1.6 in Durrett (2010).
If X is discrete with p.m.f. p, we define E_F[g(X)] ≡ Σ_j g(x_j) p(x_j); if X is absolutely continuous
with p.d.f. f, we define E_F[g(X)] ≡ ∫_{−∞}^{∞} g(x) f(x) dx (whenever the sum or integral is well
defined). Some commonly used objects built from the expectation operator are the following:
1. Mean: µ ≡ E_F[X] is called the mean of X.
2. Variance: σ² ≡ E_F[(X − µ)²] is called the variance of X; it will play a central role
in Weak Laws of Large Numbers and Central Limit Theorems. For certain random variables
the variance need not be finite (the Pareto distribution with parameter α = 2 is an example
of this fact).
3. Jensen’s Inequality: For any convex function g (provided the expectations exist),
E_F[g(X)] ≥ g(E_F[X]). This is an important result. We will not prove it, but we will be
using it quite often.
4. Markov’s Inequality: For any random variable X and any c > 0, c · P(|X| > c) ≤ E_F[|X|].
5. k-th Moment: EF [X k ] is called the k-th uncentered moment of the random variable X,
k = 1, 2, . . .. Note that Jensen’s inequality implies that if EF [|X|k ] < ∞ for some k ∈ N then
EF [|X|j ] < ∞ for all j < k. Likewise, if EF [|X|k ] = ∞, then EF [|X|j ] = ∞ for all j > k.
7. Moment Generating Function: The random variable X is said to have a moment generating
function m_X(t) if
m_X(t) ≡ E_F[exp(tX)] < ∞ for all |t| ≤ ε,
for some ε > 0. Important: Two random variables with the same moment generating
function have the same distribution.
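The connection between the m.g.f. and moments can be sketched numerically. The block below uses the standard closed form m_X(t) = λ/(λ − t) (for t < λ) for an exponential r.v.; the step size and tolerances are our own choices. Differentiating m_X at t = 0 recovers the uncentered moments:

```python
# m.g.f. of an exponential(lambda) r.v.: m(t) = lambda / (lambda - t), t < lambda.
lam = 2.0
m = lambda t: lam / (lam - t)

h = 1e-5
# k-th uncentered moments as numeric derivatives of m at t = 0:
mean = (m(h) - m(-h)) / (2 * h)               # m'(0)  = E[X]   = 1/lambda
second = (m(h) - 2 * m(0) + m(-h)) / h ** 2   # m''(0) = E[X^2] = 2/lambda^2
var = second - mean ** 2                      # sigma^2 = E[X^2] - mu^2
print(mean, second, var)   # close to 0.5, 0.5, 0.25
```

This is why m_X is called the moment generating function: all uncentered moments are read off from its derivatives at the origin.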
The last question of this week’s problem set will ask you to go over some of these concepts.
Quick Summary of Lectures 1 and 2: We have provided formal definitions of a probability
space and a real-valued random variable.
We defined discrete and absolutely continuous random variables. We used our classification to
provide a definition of the expectation of a function g(X). We defined the mean of a real-valued
random variable (µ), the variance of a real-valued random variable (σ 2 ), the k-th moment of a
real-valued random variable, and at the very end of Lecture 2, the moment generating function.
The moment generating function contains—in a sense that we did not make precise—the same
information as the c.d.f. of a real-valued random variable. Here are some results to keep in mind.
• Two discrete random variables with the same moment generating function have the same
support and the same probability mass function. See Billingsley (1995), p. 147, second
paragraph.
• Two positive random variables with the same moment generating function have the same
distribution. See Billingsley (1995), Theorem 22.2. Can you fill in the details of the proof?
Optional: Suppose that F_X has a density f . Suppose that m_X(t) = E[exp(tX)] is well defined
for every t ∈ R. The question is: how do we recover f from the moment generating function? A
sketch of the answer is the following:
1. Extend the image of the m.g.f. to the complex plane: Let i = √−1. For t ∈ R, define:
φ(t) ≡ E[exp(itX)].
The function φ : R → C is called the characteristic function (in fact, φ is always well
defined, even if the m.g.f. does not exist).
2. Recover the p.d.f. from the characteristic function: It turns out that one can prove the
following equality (the inversion formula, valid for instance when φ is integrable):
f(x) = (1/(2π)) ∫_{−∞}^{∞} exp(−itx) φ(t) dt
You can find the details in Section 26 (pp. 342-349) of Billingsley (1995) or Section 3.3 (pp.
106-111) of Durrett (2010). The main takeaway (and we will use this result later) is that, under
certain conditions, two random variables with the same moment generating function have the same
distribution over the Borel σ-algebra on the real line. In this sense, the moment generating function
provides a characterization of the c.d.f.
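The inversion step can be checked numerically. The sketch below (discretization choices are our own) uses the standard Gaussian characteristic function φ(t) = exp(−t²/2) and approximates the inversion integral by a trapezoid rule, recovering the familiar bell-curve density:

```python
import cmath, math

# Standard Gaussian: phi(t) = exp(-t^2 / 2).
phi = lambda t: cmath.exp(-t * t / 2)

def density_from_cf(x, T=10.0, n=2000):
    """Trapezoid approximation of (1/2pi) * integral_{-T}^{T} e^{-itx} phi(t) dt.
    The tails beyond |t| = T are negligible since phi decays like exp(-t^2/2)."""
    dt = 2 * T / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -T + k * dt
        w = 0.5 if k in (0, n) else 1.0          # trapezoid endpoint weights
        total += w * cmath.exp(-1j * t * x) * phi(t)
    return (total * dt / (2 * math.pi)).real

# At x = 0 the Gaussian density is 1/sqrt(2*pi) ~ 0.3989.
print(density_from_cf(0.0), 1 / math.sqrt(2 * math.pi))
```

The imaginary part of the integral vanishes (φ is even and real here), which is why taking the real part is harmless.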
References
Billingsley, P. (1995): Probability and Measure, John Wiley & Sons, New York, 3rd ed.
Durrett, R. (2010): Probability: Theory and Examples, Cambridge Series in Statistical and
Probabilistic Mathematics, Cambridge University Press, 4th ed.