Lecture 19 - Law of Large Numbers, Central Limit Theorem, Joint Distribution

Mathematics for Machine Learning
Vazgen Mikayelyan
September 15, 2020
V. Mikayelyan Math for ML September 15, 2020 1 / 15

The Weak Law of Large Numbers

Markov’s theorem
Let Xn be a sequence of random variables, each having a finite mean
(E [Xk ] = ak ) and variance. If
n n
!
1 X 1X P
lim V ar Xk = 0, then (Xk − ak ) −
→ 0.
n→∞ n2 n
k=1 k=1

Markov’s theorem
Let Xn be a sequence of random variables, each having a finite mean
(E [Xk ] = ak ) and variance. If
n n
!
1 X 1X P
lim V ar Xk = 0, then (Xk − ak ) −
→ 0.
n→∞ n2 n
k=1 k=1
Corollary
If Xn is a sequence of independent and identically distributed random
variables, each having a finite mean (E [X1 ] = a) and variance, then
n
1X P
Xk −
→ a.
n
k=1

The Strong Law of Large Numbers
Borel’s theorem
variables, each having a finite moment of order 4 and E [X1 ] = a, then
n
1X a.s.
Xk −−→ a.
n
k=1

The Strong Law of Large Numbers
Borel’s theorem
variables, each having a finite moment of order 4 and E [X1 ] = a, then
n
1X a.s.
Xk −−→ a.
n
k=1
Kolmogorov’s theorem
variables, each having a finite mean (E [X1 ] = a), then
n
1X a.s.
Xk −−→ a.
n
k=1

The Central Limit Theorem
Theorem
variables, each having a finite mean µ and variance σ 2 , then the CDFs of
n
X
Xk − nµ
k=1
√
σ n
tends to the standard normal distribution as n → ∞.

Double Integrals

Double Integrals
Let f : A → R, where A = [a, b] × [c, d]. Also let
P1 = {x0 , x1 , . . . , xn } , P2 = {y0 , y1 , . . . , yn }
are partitions of the segments [a, b] and [c, d] respectively.

Double Integrals
P1 = {x0 , x1 , . . . , xn } , P2 = {y0 , y1 , . . . , yn }
are partitions of the segments [a, b] and [c, d] respectively. Denote by λ

the maximal length of the diagonals of the rectangles obtained by doing
partitions.

Double Integrals
P1 = {x0 , x1 , . . . , xn } , P2 = {y0 , y1 , . . . , yn }
are partitions of the segments [a, b] and [c, d] respectively. Denote by λ

the maximal length of the diagonals of the rectangles obtained by doing
partitions.
Definition
The Riemann sum of a function f (x, y) over this partition of A is
n X
X n
σ= f (ui , vj ) ∆xi ∆yj ,
i=1 j=1
where (ui , vj ) is some point in the rectangle [xi−1 , xi ] × [yj−1 , yj ] and

∆xi = xi − xi−1 , ∆yi = yi − yi−1 .

Double Integrals
Definition
The double integral of a function f (x, y) in the rectangular region A is
the following limit
ZZ n X
X n
f (x, y) dxdy = lim f (ui , vj ) ∆xi ∆yj
λ→0
A i=1 j=1

Double Integrals
Definition
the following limit
ZZ n X
X n
λ→0
A i=1 j=1
To define the double integral over a bounded region A other than a

rectangle, we choose a rectangle B such that A ⊂ B and define the
function g (x, y) so that
(
f (x, y) , if (x, y) ∈ A,
g (x, y) =
0, if (x, y) ∈ B \ A.

Double Integrals
Definition
the following limit
ZZ n X
X n
λ→0
A i=1 j=1
To define the double integral over a bounded region A other than a

rectangle, we choose a rectangle B such that A ⊂ B and define the
function g (x, y) so that
(
f (x, y) , if (x, y) ∈ A,
g (x, y) =
0, if (x, y) ∈ B \ A.
ZZ ZZ
We define f (x, y) dxdy = g (x, y) dxdy.
A B
Double Integrals
Fubini’s Theorem
If A = [a, b] × [c, d] and f ∈ C (A) then
Zb
 d
Zd Zb
  
ZZ Z
f (x, y) dxdy =  f (x, y) dy  dx =  f (x, y) dx dy.
A a c c a

Double Integrals
Fubini’s Theorem
If A = [a, b] × [c, d] and f ∈ C (A) then
Zb
 d
Zd Zb
  
ZZ Z
f (x, y) dxdy =  f (x, y) dy  dx =  f (x, y) dx dy.
A a c c a
Change of Variable
Assume that the mapping x = X (u, v) , y = Y (u, v) from the domain T
in the uv-plane to the domain S in the xy-plane is bijective, the functions
X and Y are continuous and have continuous first order partial derivatives
and the Jacobian J (u, v) is never zero. Then
ZZ ZZ
f (x, y) dxdy = f (X (u, v) , Y (u, v)) |J (u, v)| dudv.
S T

Jointly Distributed Random Variables.

Definition of joint CDF of 2 RVs

Let X : Ω → R and Y : Ω → R be two random variables. The function
F(X,Y ) (x, y) = P(X ≤ x, Y ≤ y), (x, y) ∈ R2
is called joint cumulative probability distribution function (CDF) of

X and Y

Properties of Joint CDF
1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
2. F(X,Y ) (x, •) and F(X,Y ) (•, y) are increasing functions.

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
3. F(X,Y ) (x, •) and F(X,Y ) (•, y) are right continuous functions.

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
4. F(X,Y ) (x, +∞) = FX (x), x ∈ R, where FX is the CDF of X. Similarly,
F(X,Y ) (+∞, y) = FY (y) for any y ∈ R.

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
5. F(X,Y ) (+∞, +∞) = 1,

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
5. F(X,Y ) (+∞, +∞) = 1,
6. F(X,Y ) (x, −∞) = F(X,Y ) (−∞, y) = 0 for any x, y ∈ R.

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
5. F(X,Y ) (+∞, +∞) = 1,
Remark: The distribution functions FX and FY are sometimes referred to

as the marginal distribution functions of X and Y respectively.

1. 0 ≤ F(X,Y ) (x, y) ≤ 1 for any (x, y) ∈ R2 .
5. F(X,Y ) (+∞, +∞) = 1,
Remark: The distribution functions FX and FY are sometimes referred to

as the marginal distribution functions of X and Y respectively.
Theorem
P (a < X ≤ b, c < Y ≤ d) =
F(X,Y ) (a, c) + F(X,Y ) (b, d) − F(X,Y ) (a, d) − F(X,Y ) (b, c).

Joint PMF of 2 Discrete RVs
In the case when X and Y are both discrete random variables defined on
the same sample space, it is convenient to define the joint probability
mass function (joint PMF) of X and Y by the formula
p(X,Y ) (x, y) = P(X = x, Y = y).

p(X,Y ) (x, y) = P(X = x, Y = y).
The probability mass function of X can be obtained from the joint PMF
p(X,Y ) (x, y) by
X
pX (x) = p(X,Y ) (x, y)
y:p(X,Y ) (x,y)>0

p(X,Y ) (x, y) = P(X = x, Y = y).
p(X,Y ) (x, y) by
X
pX (x) = p(X,Y ) (x, y)
y:p(X,Y ) (x,y)>0
Similarly, X
pY (y) = p(X,Y ) (x, y)
x:p(X,Y ) (x,y)>0

p(X,Y ) (x, y) = P(X = x, Y = y).
p(X,Y ) (x, y) by
X
pX (x) = p(X,Y ) (x, y)
y:p(X,Y ) (x,y)>0
Similarly, X
pY (y) = p(X,Y ) (x, y)
x:p(X,Y ) (x,y)>0
Remark: The functions pX and pY are sometimes called the marginal

PMFs of X and Y respectively.
Example
We are rolling a fair six-sided die. Define two random variables on
Ω = {1, 2, 3, 4, 5, 6} as

0, if w = 1,

X(ω) = 1, if w is composite

2, if w is prime.

(
−5, if w ≤ 3,
Y (ω) =
10, if w ≥ 4
√
Construct the joint PMF of X and Y and compute F(X,Y ) ( 2, 7) and
P(Y > 5X).

Definition
Two random variables X and Y , defined on the same sample space, are
called jointly (absolutely) continuous if there exists a non-negative
function f , defined on the plane, such that the joint CDF of X and Y is
representable as
Zx Zy
F(X,Y ) (x, y) = f (u, v)dudv, (x, y) ∈ R2 .
−∞ −∞
The function f is called the joint probability density function (PDF) of

X and Y .

Computing Marginal PDFs from the Joint PDF
Theorem
If X and Y are jointly continuous random variables then they are also
individually continuous. Moreover, their (marginal) PDFs, fX and fY ,
can be computed from the joint PDF, f(X,Y ) , as follows:
+∞
Z
fX (u) = f (u, v)dv
−∞
Z+∞
fY (v) = f (u, v)du

−∞

Example
Assume the piecewise defined function
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise
is a joint PDF of some random vector (X, Y ).

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

a. Find the constant α;

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

b. Find the marginal density functions (PDFs);

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

c. Find the joint and the marginal distribution functions (CDFs);

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

d. Compute the probability P(0 < X < 1; 1 < Y < 2);

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

e. Compute P(0 < Y /X < 2);

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

f. Find the PDF of the new random variable Z = Y /X ;

Example
x2 +y 2
(
α · xy · e− 2 , if x ≥ 0, y ≥ 0
f (x, y) =
0, otherwise

f. Find the PDF of the new random variable Z = Y /X ;
g. Calculate P(X 2 + Y 2 ≤ 4).

Signal Processing

Lecture 19 - Law of Large Numbers, Central Limit Theorem, Joint Distribution

Uploaded by

Copyright:

Available Formats

You might also like

Lecture 19 - Law of Large Numbers, Central Limit Theorem, Joint Distribution

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture 19 - Law of Large Numbers, Central Limit Theorem, Joint Distribution

Uploaded by

Copyright:

Available Formats

Mathematics for Machine Learning

September 15, 2020

V. Mikayelyan Math for ML September 15, 2020 1 / 15

V. Mikayelyan Math for ML September 15, 2020 2 / 15

V. Mikayelyan Math for ML September 15, 2020 2 / 15

V. Mikayelyan Math for ML September 15, 2020 2 / 15

V. Mikayelyan Math for ML September 15, 2020 3 / 15

V. Mikayelyan Math for ML September 15, 2020 3 / 15

tends to the standard normal distribution as n → ∞.

V. Mikayelyan Math for ML September 15, 2020 4 / 15

V. Mikayelyan Math for ML September 15, 2020 5 / 15

are partitions of the segments [a, b] and [c, d] respectively.

V. Mikayelyan Math for ML September 15, 2020 5 / 15

are partitions of the segments [a, b] and [c, d] respectively. Denote by λ

V. Mikayelyan Math for ML September 15, 2020 5 / 15

are partitions of the segments [a, b] and [c, d] respectively. Denote by λ

where (ui , vj ) is some point in the rectangle [xi−1 , xi ] × [yj−1 , yj ] and

V. Mikayelyan Math for ML September 15, 2020 5 / 15

V. Mikayelyan Math for ML September 15, 2020 6 / 15

To define the double integral over a bounded region A other than a

V. Mikayelyan Math for ML September 15, 2020 6 / 15

To define the double integral over a bounded region A other than a

V. Mikayelyan Math for ML September 15, 2020 7 / 15

V. Mikayelyan Math for ML September 15, 2020 7 / 15

V. Mikayelyan Math for ML September 15, 2020 8 / 15

Definition of joint CDF of 2 RVs

F(X,Y ) (x, y) = P(X ≤ x, Y ≤ y), (x, y) ∈ R2

is called joint cumulative probability distribution function (CDF) of

V. Mikayelyan Math for ML September 15, 2020 8 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

V. Mikayelyan Math for ML September 15, 2020 9 / 15

Remark: The distribution functions FX and FY are sometimes referred to

V. Mikayelyan Math for ML September 15, 2020 9 / 15

Remark: The distribution functions FX and FY are sometimes referred to

V. Mikayelyan Math for ML September 15, 2020 9 / 15

p(X,Y ) (x, y) = P(X = x, Y = y).

V. Mikayelyan Math for ML September 15, 2020 10 / 15

p(X,Y ) (x, y) = P(X = x, Y = y).

V. Mikayelyan Math for ML September 15, 2020 10 / 15

p(X,Y ) (x, y) = P(X = x, Y = y).

V. Mikayelyan Math for ML September 15, 2020 10 / 15

p(X,Y ) (x, y) = P(X = x, Y = y).

Remark: The functions pX and pY are sometimes called the marginal

V. Mikayelyan Math for ML September 15, 2020 11 / 15

The function f is called the joint probability density function (PDF) of

V. Mikayelyan Math for ML September 15, 2020 12 / 15

fY (v) = f (u, v)du

V. Mikayelyan Math for ML September 15, 2020 13 / 15

is a joint PDF of some random vector (X, Y ).

V. Mikayelyan Math for ML September 15, 2020 14 / 15

is a joint PDF of some random vector (X, Y ).

V. Mikayelyan Math for ML September 15, 2020 14 / 15

is a joint PDF of some random vector (X, Y ).

V. Mikayelyan Math for ML September 15, 2020 14 / 15