Lecture Notes: Stochastic Processes
Course taught by Dr. Rakesh Nigam
Madras School of Economics, PGDM Research and Analytics
November 8, 2020
Preface
This book is a compilation of lecture notes from the Stochastic Processes course module, part of the Madras School of Economics PGDM curriculum, taught by Dr Rakesh Nigam. The Data Science and Financial Engineering courses at MSE are deeply focused on the mathematical foundations behind core concepts such as regression analysis, financial time series, CAPM and natural language processing.
In this edition, our focus rests primarily on the fundamentals of stochastic processes. We start by revisiting key concepts in probability theory, set theory and distribution functions, before moving to more involved topics such as measure theory and limit theorems. The field of stochastics is primarily concerned with sequences of random variables, and its related theorems are cornerstones of important real-world applications, especially those involving stock returns and financial analysis. We therefore develop these concepts from the ground up. The aim of this book is to strengthen one's foundations in probability, statistics and convergence theorems, which also form the crucial underpinnings of many statistical modeling and machine learning applications.
Note that the book starts with some prerequisites, which must be firmly grasped before moving on to more involved concepts. The actual chapters begin after the prerequisites sections.
Contents
0.1 Combinatorics
    0.1.1 Permutations
    0.1.2 Combinations
    0.3.1 Bayes
    0.3.2 Odds
0.4 Distributions
    0.4.1 Poisson
    0.4.2 Geometric
0.7 Sequences
0.8 Series
    1.1.1 Example 1
    1.1.2 Example 2
1.4 Consequences
    1.4.1 Example 1
1.6 Theorem
2.1 Introduction
2.4 Summary
3.6 WLLN
4.1 Recap
    4.3.1 Example 1
    4.3.2 Example 2
    4.4.1 Example 3
6.1 Recap
6.2 Introduction
6.10 Summary
7.1.1 Introduction
0.1 Combinatorics
Starting with the fundamental principle of counting, we can assume that experi-
ment 1 results in any of m possible outcomes and experiment 2 results in any of n
possible outcomes. Then, if these two experiments are performed in succession,
we would observe that there are a total of mn outcomes possible. Note that the
below matrix lists out all the possible pairs of outcomes from experiment 1 and
2. The item (i, j) corresponds to the pair in which i was obtained in experiment 1
and j was obtained in experiment 2.
(1, 1)   (1, 2)   \cdots   (1, n)
  \vdots    \vdots    \ddots    \vdots
(m, 1)   (m, 2)   \cdots   (m, n)        (1)

More generally, if r experiments are performed in succession and experiment k has n_k possible outcomes, the total number of combined outcomes is:

n_1 \times n_2 \times \cdots \times n_r        (2)
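The counting principle can be checked mechanically. Here is a minimal Python sketch; the outcome counts for the three experiments are hypothetical values:

```python
from itertools import product
from math import prod

# Hypothetical outcome counts for r = 3 successive experiments.
outcome_counts = [4, 3, 2]

# Enumerate every combined outcome (i, j, k) explicitly ...
combined = list(product(*(range(n) for n in outcome_counts)))

# ... and compare with the counting principle n1 * n2 * ... * nr.
assert len(combined) == prod(outcome_counts) == 24
```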
0.1.1 Permutations
Each such arrangement is called a permutation. Note that as a general rule, for n
objects there are n! permutations:
Things become a bit more involved when we are permuting elements among which some objects are alike. For example, if we want to find the different arrangements of the word PEPPER, then at first glance we have a total of 6! permutations, since there are six letters in the word. But what if we simply interchange the alike elements in the word? For example, if we interchange the two middle P's, it wouldn't really change our permutation. For this reason we calculate the total number of permutations of PEPPER by adjusting for the permutations among the alike elements. So the final number of permutations becomes:
\frac{6!}{3! \, 2!}        (5)
Note that 3! refers to the number of permutations among the P’s (which are three
in number) and 2! refers to the number of permutations among the E’s. As a
general rule we can say:
\frac{n!}{n_1! \, n_2! \cdots n_r!}        (6)
Where there are n1 alike elements of type 1, n2 alike elements of type 2 and so on.
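The PEPPER count above can be verified by brute force; a small Python sketch:

```python
from collections import Counter
from itertools import permutations
from math import factorial

word = "PEPPER"
counts = Counter(word)  # {'P': 3, 'E': 2, 'R': 1}

# Formula (6): n! / (n1! n2! ... nr!)
denom = 1
for c in counts.values():
    denom *= factorial(c)
formula = factorial(len(word)) // denom

# Brute-force check: count the distinct arrangements of the letters directly.
distinct = len(set(permutations(word)))
assert formula == distinct == 60
```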
0.1.2 Combinations
As a simple example, consider finding the expansion of (x + y)^3. We can use the binomial theorem to expand this expression:

(x + y)^3 = \binom{3}{0} x^0 y^3 + \binom{3}{1} x^1 y^2 + \binom{3}{2} x^2 y + \binom{3}{3} x^3        (9)
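The binomial coefficients in the expansion can be computed and sanity-checked numerically; the evaluation point below is a hypothetical choice:

```python
from math import comb

# Binomial coefficients (3 choose k) are the coefficients of (x + y)^3.
coeffs = [comb(3, k) for k in range(4)]  # [1, 3, 3, 1]

# Sanity check at hypothetical values x = 2, y = 5.
x, y = 2, 5
lhs = (x + y) ** 3
rhs = sum(comb(3, k) * x**k * y ** (3 - k) for k in range(4))
assert lhs == rhs == 343
```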
The union \bigcup_{n=1}^{\infty} E_n consists of all outcomes that are in at least one of the E_i events. In a similar manner, the event consisting of outcomes in all of the E_i events is given by the intersection of these sets:

\bigcap_{n=1}^{\infty} E_n        (12)
Note the all-important De Morgan's laws, given by the following expressions. Also note that the superscript c denotes the complement of a set (the set of elements not in the set).
(A \cup B)^c = A^c \cap B^c \;\rightarrow\; \left( \bigcup_{i=1}^{n} E_i \right)^c = \bigcap_{i=1}^{n} E_i^c        (13)

(A \cap B)^c = A^c \cup B^c \;\rightarrow\; \left( \bigcap_{i=1}^{n} E_i \right)^c = \bigcup_{i=1}^{n} E_i^c        (14)
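De Morgan's laws can be verified directly on finite sets; a sketch with a small hypothetical sample space and events:

```python
# Small sample space and three hypothetical events.
S = set(range(10))
E = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]

union = set().union(*E)
intersection = set.intersection(*E)

# De Morgan (13): complement of the union = intersection of complements.
assert S - union == set.intersection(*[S - e for e in E])
# De Morgan (14): complement of the intersection = union of complements.
assert S - intersection == set().union(*[S - e for e in E])
```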
Note an important point that events are nothing but sets of outcomes and hence
we can denote events as sets and perform set manipulation on them. For exam-
ple, we can denote the concept of mutually exclusive events using set notation as
follows:
E_1 \cap E_2 = E_1 E_2 = \emptyset        (15)

The above equation means that if the intersection of two sets is the empty set, then they are mutually exclusive sets or events. We can also compute the probability of the union of many mutually exclusive events as follows:

P(E_1 \cup E_2) = P(E_1) + P(E_2) \;\rightarrow\; P\left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} P(E_i)        (16)
Moving on, we can state the basic expansion of a union of sets that are not mu-
tually exclusive as:
E ∪ F = E + F − EF (17)
Expanding the union of three sets:
E ∪ F ∪ G = E + F + G − EF − EG − F G + EF G (18)
P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)        (19)
Look hard enough and you'll see that a pattern emerges in the signs of the above expansion. The combined union is the sum of all sets taken one at a time (positive sign), minus all sets taken two at a time (negative sign), plus all sets taken three at a time (positive sign). We can generalize this to the union of n sets as follows, in terms of probability:
P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)        (20)
We now turn to conditional probability. This means finding the probability of an event E occurring given the fact that
event F has occurred. Since F has already occurred we can say that this is now
our new sample space, instead of the entire sample space. So essentially, we want
to find the probability that E and F both occur simultaneously given that F has
already occurred. It is given by:
P(E|F) = \frac{P(EF)}{P(F)}        (21)
Now note an important point. Suppose that there are two sets or events called
E and F . Now we know that when only these two sets exist in our world, then
the set E can be defined as - the union of the intersection of E with F and the
intersection of E with the complement of F .
E = EF ∪ EF c (23)
Therefore, from the above expression, we can say that the total probability of event E is the weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred:

P(E) = P(E|F)P(F) + P(E|F^c)P(F^c)        (24)
0.3.1 Bayes
We will introduce the concept of Bayes' theorem with the help of a common example.
Suppose that D is the event that a person has a disease and E is the event that upon testing for the disease, the test comes out positive. (Note that there can also be a false positive: even if a person does not have the disease, the test may come out positive.) Now suppose we want to find the probability that the person has the disease given that the result is positive:
P(D|E) = \frac{P(DE)}{P(E)}        (25)

= \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)}        (26)
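Equation (26) can be evaluated for concrete numbers; the prevalence, sensitivity and false-positive rate below are hypothetical values chosen for illustration:

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 2% false-positive rate.
p_d = 0.01           # P(D)
p_e_given_d = 0.95   # P(E|D)
p_e_given_dc = 0.02  # P(E|D^c)

# Denominator of (26): total probability of a positive test.
p_e = p_e_given_d * p_d + p_e_given_dc * (1 - p_d)

# Bayes' theorem (26): probability of disease given a positive test.
p_d_given_e = p_e_given_d * p_d / p_e
assert abs(p_d_given_e - 0.3242) < 1e-3
```

Note how low the posterior is despite the test's high sensitivity: with a rare disease, false positives dominate.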
0.3.2 Odds
The odds of an event A are defined as the ratio of the probability that A occurs to the probability that it does not:

\frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}        (27)
0.4 Distributions
Starting with the Bernoulli random variable, we define this random variable as
the outcome of a single trial when the outcomes are of only two types - success
and failure, encoded as 1 and 0 respectively.
p(0) = P (X = 0) = 1 − p (28)
p(1) = P (X = 1) = p (29)
Extending the same concept a little further: suppose we have n independent trials, each associated with a probability of success p and a probability of failure (1 − p). If we define the random variable X as the number of successes in the n trials, then what we have is a binomial random variable.
p(i) = \binom{n}{i} p^i (1 - p)^{n-i}        (30)

E[X] = np        (31)

VAR[X] = npq = np(1 - p)        (32)

P(X \le i) = \sum_{k=0}^{i} \binom{n}{k} p^k (1 - p)^{n-k}        (33)
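The binomial moments (31)–(32) can be checked directly against the pmf (30); the parameters below are hypothetical:

```python
from math import comb

def binom_pmf(i, n, p):
    # Eq. (30): C(n, i) p^i (1 - p)^(n - i)
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(i, n, p) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

assert abs(mean - n * p) < 1e-9           # eq. (31): E[X] = np
assert abs(var - n * p * (1 - p)) < 1e-9  # eq. (32): Var[X] = np(1 - p)
```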
0.4.1 Poisson
p(i) = P(X = i) = \frac{e^{-\lambda} \lambda^i}{i!}        (34)
Note that this is the approximation of a binomial variable when n is very large
and p is small. Some general properties:
E[X] = λ (35)
V AR[X] = λ (36)
The derivation, for optional reading, is presented below in a stepwise manner:

• Writing the binomial probability:

P(X = i) = \frac{n!}{(n-i)! \, i!} p^i (1 - p)^{n-i}        (37)

• Now let \lambda = np, i.e. p = \lambda/n. With this we can rewrite the previous formula in terms of \lambda as follows:

P(X = i) = \frac{n!}{(n-i)! \, i!} \left( \frac{\lambda}{n} \right)^i \left( 1 - \frac{\lambda}{n} \right)^{n-i}        (38)

= \frac{n(n-1) \cdots (n-i+1)}{n^i} \, \frac{\lambda^i}{i!} \, \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^i}        (39)

• For n large:

\left( 1 - \frac{\lambda}{n} \right)^n \approx e^{-\lambda}        (40)

\frac{n(n-1) \cdots (n-i+1)}{n^i} \approx 1        (41)

\left( 1 - \frac{\lambda}{n} \right)^i \approx 1        (42)

• And finally we end up with:

P(X = i) = \frac{e^{-\lambda} \lambda^i}{i!}        (43)
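The quality of the approximation for large n and small p can be seen numerically; the parameters below are hypothetical:

```python
from math import comb, exp, factorial

n, p = 1000, 0.003  # large n, small p (hypothetical values)
lam = n * p         # lambda = np = 3

# The Poisson pmf (43) should track the binomial pmf (37) closely.
for i in range(8):
    binom = comb(n, i) * p**i * (1 - p) ** (n - i)
    poisson = exp(-lam) * lam**i / factorial(i)
    assert abs(binom - poisson) < 5e-3
```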
0.4.2 Geometric
Suppose now that there are many independent trials, each having a probability
of success as p, such that these trials are performed until a success occurs. Our
random variable X primarily defines the number of trials required until the first
success is encountered.
P (X = n) = (1 − p)n−1 p (44)
Some key points:

E[X] = \frac{1}{p}        (45)

VAR[X] = \frac{1-p}{p^2} = \frac{q}{p^2}        (46)
Now suppose that we perform many independent trials, with each trial having
the same probability of success as p and we perform trials until we accumulate r
successes. Here let the primary random variable X denote the number of trials
required to accumulate r successes.
P(X = n) = \binom{n-1}{r-1} p^r (1 - p)^{n-r}        (47)
The main logic is that for us to stop conducting the trials, the rth success has to happen at the nth trial, and therefore we count the ways in which the first r − 1 successes can be arranged among the first n − 1 trials. Some key points:
E[X] = \frac{r}{p}        (48)

VAR[X] = \frac{r(1-p)}{p^2} = \frac{rq}{p^2}        (49)
The cumulative distribution function F of a random variable X is given by:

F(x) = P(X \le x)        (50)

Note that for a distribution function F, F(b) denotes the probability that the random variable takes on a value less than or equal to b. Some properties of CDFs are:

• F is non-decreasing, which means that for a < b we have F(a) \le F(b).
Here is a quick list of some general pointers regarding expectations and variances of discrete random variables:

E[X] = \sum_x x \, p(x)        (53)

E[g(X)] = \sum_x g(x) \, p(x)        (54)
0.7 Sequences

a_1 – first term
a_2 – second term
a_n – nth term

\{a_n\}        (59)

\{a_n\}_{n=1}^{\infty}        (60)

To illustrate with an example, here is how we would write the first few terms of a sequence:

\left\{ \frac{n+1}{n^2} \right\}_{n=1}^{\infty} = \left\{ 2, \frac{3}{4}, \frac{4}{9}, \frac{5}{16}, \ldots \right\}        (61)
An interesting way to think about sequences is as functions that map index values
to the value that the particular sequence might take. For example consider the
same sequence as above written as a function and its values written in a tuple of
the format (n, f (n)).
f(n) = \frac{n+1}{n^2}        (62)

values \rightarrow (1, 2), (2, 3/4), (3, 4/9), (4, 5/16)        (63)
We do this because in this situation we can essentially plot out the values and
obtain a graphical representation of a sequence.
[Figure: plot of f(n) = (n+1)/n^2 against n, with the sequence values decreasing toward zero.]
We can observe from this graph that as n increases, the value of the sequence terms gets closer and closer to zero. Hence we can say that the limiting value of this sequence is zero:

\lim_{n \to \infty} a_n = \lim_{n \to \infty} \frac{n+1}{n^2} = 0        (64)
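The decay toward the limit can be seen by evaluating the terms at a few (hypothetically chosen) indices:

```python
def a(n):
    # The sequence from eq. (61).
    return (n + 1) / n**2

# The terms shrink toward the limit 0 from eq. (64).
terms = [a(n) for n in (1, 10, 100, 10_000)]
assert terms[0] == 2.0
assert terms[-1] < 1e-3
assert all(x > y for x, y in zip(terms, terms[1:]))  # strictly decreasing here
```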
• We say that \lim_{n \to \infty} a_n = L if the terms a_n get arbitrarily close to L for all sufficiently large n.
• We say that \lim_{n \to \infty} a_n = \infty if for every number M > 0 there is an integer N such that a_n > M whenever n > N.
• We say that \lim_{n \to \infty} a_n = -\infty if for every number M < 0 there exists an integer N such that a_n < M whenever n > N.
• The key insight for us is that for a limit to exist and have a finite value, all the sequence terms must get closer and closer to that finite value as n approaches infinity.
• If \lim_{n \to \infty} a_n exists and is finite, we say that the sequence is convergent, whereas if \lim_{n \to \infty} a_n does not exist or is infinite, we say that the sequence is divergent.
• Given a sequence \{a_n\}, if we have a function f(x) such that f(n) = a_n and \lim_{x \to \infty} f(x) = L, then we can say that:

\lim_{n \to \infty} a_n = L
This theorem is particularly useful when we are trying to compute the limits of sequences that alternate in sign. Another important theorem, which can be proved using the squeeze theorem, is: if \lim_{n \to \infty} |a_n| = 0, then \lim_{n \to \infty} a_n = 0. Note that for this theorem to work, the limit has to be zero. The proof follows from the squeeze theorem, since -|a_n| \le a_n \le |a_n| for every n.
Given a sequence {an } we have the following important definitions that explain
key concepts about the nature of the sequence.
• If there exists a number m such that m \le a_n for every n, then we say that the sequence is bounded below, and m is called a lower bound of the sequence.
• If there exists a number M such that a_n \le M for every n, then we say that the sequence is bounded above, and M is called an upper bound of the sequence.
• Finally we can say that if {an } is bounded and monotonic then {an } is con-
vergent.
0.8 Series
To begin defining an infinite series, we first start with a sequence \{a_n\}. Note that a sequence is just a list of numbers, whereas a series represents a sum over that list of numbers. We can define the partial sums of a basic series as:

s_1 = a_1
s_2 = a_1 + a_2
s_3 = a_1 + a_2 + a_3
s_n = \sum_{i=1}^{n} a_i
We can further note that the successive values of the series form a sequence of numbers, which can be represented as \{s_n\}_{n=1}^{\infty}. This is the sequence of partial sums. Now we can compute the limiting value of this sequence of partial sums as:

\lim_{n \to \infty} s_n = \lim_{n \to \infty} \sum_{i=1}^{n} a_i = \sum_{i=1}^{\infty} a_i        (66)
Note that as in the case of sequences before, if the sequence of series values has a
finite limit, then the series is said to be convergent and if the limit does not exist
then it is divergent. Now we will prove the following theorem :
If \sum a_n converges, then \lim_{n \to \infty} a_n = 0.

• Step 1: We can write the following two partial sums for the given series:

s_{n-1} = \sum_{i=1}^{n-1} a_i = a_1 + a_2 + \cdots + a_{n-1}
s_n = \sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n

• Step 2: Subtracting the two partial sums isolates the nth term:

a_n = s_n - s_{n-1}

• Step 3: Since the series converges, the sequence of partial sums \{s_n\}_{n=1}^{\infty} converges: \lim_{n \to \infty} s_n = s and \lim_{n \to \infty} s_{n-1} = s. Therefore \lim_{n \to \infty} a_n = s - s = 0.
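Both facts can be seen numerically on a convergent series; the sketch below uses a geometric series with a hypothetical ratio r = 1/2:

```python
def partial_sum(n, r=0.5):
    # s_n for the geometric series sum of r^i, i >= 1 (hypothetical r = 1/2).
    return sum(r**i for i in range(1, n + 1))

# The partial sums converge to 1, so the series converges ...
assert abs(partial_sum(60) - 1.0) < 1e-12
# ... and, as the theorem requires, a_n = s_n - s_{n-1} tends to 0.
assert abs(partial_sum(60) - partial_sum(59)) < 1e-15
```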
The ratio test can be applied to check for convergence of a series. Suppose we have a series given by:

\sum a_n        (67)

Then we can define:

L = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|        (68)

Now the following conditions hold: if L < 1 the series converges absolutely; if L > 1 the series diverges; if L = 1 the test is inconclusive.
Now to present the root test, suppose we have the series defined by:
X
an (69)
1.1.1 Example 1
Consider a fair coin tossed twice. Let the sample space be denoted S, and let A_1, A_2, A_3 be three events in S:

S = \{HH, HT, TH, TT\}
Chapter 1. Lecture 1: Aakaash N (2019DMB01) 21
P (A1 ) = 2/4
P (A2 ) = 2/4
P (A1 )P (A2 ) = 1/4
P (A1 A2 ) = P (A1 )P (A2 )
Let us check for other conditions
1.1.2 Example 2
S = {1, 2, 3, 4, 5, 6}
A1 = {1, 2, 3, 4}
A2 = {4, 5, 6}
A3 = {4, 5, 6}
Let us check for the following condition
1.4 Consequences
Let us consider x and y to be discrete random variables with joint probability mass function p(x, y).

1. E(x + y) = E(x) + E(y):

E(x) = \sum_x x \, p(x)

E(x + y) = \sum_x \sum_y (x + y) \, p(x, y)

E(x + y) = \sum_x \sum_y x \, p(x, y) + \sum_x \sum_y y \, p(x, y)

E(x) = \sum_x \sum_y x \, p(x, y)

E(y) = \sum_x \sum_y y \, p(x, y)
2. If x and y are independent, then E(xy) = E(x)E(y). So

E(XY) = \sum_{x,y} x y \, p_X(x) p_Y(y)

E(XY) = \sum_x \sum_y \left[ x y \, p_X(x) p_Y(y) \right]

E(XY) = \left[ \sum_x x \, p_X(x) \right] \left[ \sum_y y \, p_Y(y) \right]

We know that

E(X) = \sum_x x \, p_X(x)

E(Y) = \sum_y y \, p_Y(y)

Therefore

E(XY) = E(X)E(Y)

The same result applies in the continuous case. For independent random variables it follows that

cov(x, y) = 0
Let us see an example: n independent identical trials, each with probability p of success. Let X be the number of successes, and let X_j indicate success on trial j, for j = 1, 2, \ldots, n. Then

E(X_j) = p
var(X_j) = pq

Since X is the sum of the X_j over the n independent identical trials,

E(X) = np
var(X) = npq
Note:

cov\left( \sum_i X_i, \sum_j Y_j \right) = \sum_i \sum_j cov(X_i, Y_j)
The moment generating function of X is defined as

M_X(t) = E(e^{tX})

Recall the power series

e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}
1.4.1 Example 1
For X \sim Bin(n, p):

M_X(t) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k q^{n-k}

M_X(t) = \sum_{k=0}^{n} \binom{n}{k} (e^t p)^k q^{n-k} = (p e^t + q)^n
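Both forms of the binomial MGF agree, and its derivative at t = 0 recovers E[X] = np; a sketch with hypothetical parameters n = 10, p = 0.3:

```python
from math import comb, exp

def mgf_binomial(t, n=10, p=0.3):
    # M_X(t) as the sum over the pmf, and in closed form (p e^t + q)^n.
    q = 1 - p
    series = sum(exp(t * k) * comb(n, k) * p**k * q ** (n - k) for k in range(n + 1))
    closed = (p * exp(t) + q) ** n
    assert abs(series - closed) < 1e-9
    return closed

# M'(0) = E[X] = np, approximated here by a central difference.
h = 1e-6
deriv = (mgf_binomial(h) - mgf_binomial(-h)) / (2 * h)
assert abs(deriv - 10 * 0.3) < 1e-4
```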
Consider X and Y to be random variables. If M_X(t) = M_Y(t) for all t, then X and Y have the same distribution.
Note: X and Y may have the same probability mass function or cumulative distribution function and yet be two different functions; they merely share the same distribution.
M_X(t) = M_Y(t) implies E(X) = E(Y), so the means are the same, and E(X^2) = E(Y^2), so the variances are the same.
For the sum of independent random variables:

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t)

Here e^{tX} and e^{tY} are independent random variables by the independence of X and Y, since each is a function of only one of them.
M_Z(t) = e^{t^2/2}

where Z has the standard normal pdf with parameters \mu = 0, \sigma^2 = 1. Then

X = \sigma Z + \mu

has a normal pdf with parameters \mu, \sigma^2:

F(x) = P(\sigma Z + \mu \le x) = P\left( Z \le \frac{x - \mu}{\sigma} \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-t^2/2} \, dt

F'(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2} \, \frac{1}{\sigma}

which is the normal pdf. We want M_X(t) where X has a normal pdf with parameters \mu, \sigma:

X = \sigma Z + \mu
1.6 Theorem

X \sim N(\mu_x, \sigma_x^2)
Y \sim N(\mu_y, \sigma_y^2)

If X and Y are independent random variables, then X + Y is normal, N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2). Since the moment generating function completely determines the distribution, it suffices to compute

M_{X+Y}(t) = M_X(t) M_Y(t)

which is the moment generating function of a normal random variable with mean (\mu_x + \mu_y) and variance (\sigma_x^2 + \sigma_y^2).
Chapter 2
2.1 Introduction:
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, irrespective of the shape of the population distribution. This usually holds for sample sizes above 30, i.e. n \ge 30.
Let's consider the version of the CLT for independent and identically distributed (iid) random variables (RVs): X_1, X_2, \ldots, X_i, \ldots, X_n are iid RVs with E[X_i] = \mu and Var[X_i] = \sigma^2, and let \bar{S}_n = \frac{1}{n} \sum_{i=1}^{n} X_i denote the sample mean. Then:
Var(\bar{S}_n) = \frac{1}{n^2} (\sigma^2 + \cdots + \sigma^2) = \frac{n\sigma^2}{n^2} \qquad [\text{we know that } Var[X_i] = \sigma^2]

Var(\bar{S}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
Chapter 2. Lecture 3: Kishan R (2019DMF06) 30
\Rightarrow SD(\bar{S}_n) = \frac{\sigma}{\sqrt{n}}

Now we normalize the random variable \bar{S}_n to Z_n, i.e. by subtracting its mean and dividing by its standard deviation:

Z_n = \frac{\bar{S}_n - E(\bar{S}_n)}{SD(\bar{S}_n)} = \frac{\bar{S}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{S}_n - \mu)}{\sigma}

Z_n = \frac{\sqrt{n}\,[\sqrt{n}(\bar{S}_n - \mu)]}{\sqrt{n}\,\sigma} = \frac{n(\bar{S}_n - \mu)}{\sqrt{n}\,\sigma} = \frac{n\bar{S}_n - n\mu}{\sqrt{n}\,\sigma}        (2.5)

\Rightarrow Z_n = \frac{\left( \sum_{i=1}^{n} X_i \right) - n\mu}{\sqrt{n}\,\sigma}

Z_n \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty        (2.6)
\Rightarrow \bar{S}_n = \frac{Y_n}{n}        (2.9)
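The convergence in (2.6) can be illustrated by simulation. A sketch in Python, using a hypothetical (decidedly non-normal) choice of X_i \sim Uniform(0, 1), for which \mu = 1/2 and \sigma^2 = 1/12:

```python
import random
import statistics

random.seed(1)

def sample_z_n(n, trials=20_000):
    # Z_n = (Y_n - n*mu) / (sqrt(n)*sigma) for X_i ~ Uniform(0, 1),
    # a hypothetical choice with mu = 1/2, sigma^2 = 1/12.
    mu, sigma = 0.5, (1 / 12) ** 0.5
    return [
        (sum(random.random() for _ in range(n)) - n * mu) / (n**0.5 * sigma)
        for _ in range(trials)
    ]

zs = sample_z_n(30)
# By (2.6), Z_n is approximately N(0, 1): mean near 0, standard deviation near 1.
assert abs(statistics.mean(zs)) < 0.05
assert abs(statistics.stdev(zs) - 1) < 0.05
```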
Z_1(w) = \frac{X_1(w) - p}{\sqrt{p(1-p)}}, \qquad \Omega = \{w_1, w_2\}

where w_1 is T (tails) and w_2 is H (heads). For example, with p = 1/3:

Z_1(w_1) = \frac{0 - \frac{1}{3}}{\sqrt{\frac{1}{3} \cdot \frac{2}{3}}} = \frac{-\frac{1}{3}}{\frac{\sqrt{2}}{3}} = -\frac{1}{\sqrt{2}}, \qquad Z_1(w_2) = \frac{1 - \frac{1}{3}}{\sqrt{\frac{1}{3} \cdot \frac{2}{3}}} = \frac{\frac{2}{3}}{\frac{\sqrt{2}}{3}} = \frac{2}{\sqrt{2}} = \sqrt{2}
P(Y_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad \text{where } k = 0, 1, 2, \ldots, n        (2.12)
[Figure: CDF F_{X_i}(x) and pdf f_{X_i}(x) of a uniform random variable on (a, b); the area under the pdf, (b - a) \times \frac{1}{b-a}, equals 1.]
Why do we normalize Y_n?

Let X_i be iid RVs with E(X_i) = \mu < \infty and var(X_i) = \sigma^2 < \infty, and let

Y_n = \sum_{i=1}^{n} X_i, \qquad Z_n = \frac{Y_n - n\mu}{\sqrt{n}\,\sigma}

E(Y_n) = \sum_{i=1}^{n} E(X_i) = \mu + \mu + \cdots + \mu = n\mu

Var(Y_n) = \sum_{i=1}^{n} Var(X_i) = \sigma^2 + \sigma^2 + \cdots + \sigma^2 = n\sigma^2 \quad \text{and} \quad SD(Y_n) = \sqrt{n}\,\sigma        (2.14)

Z_n is the normalized version of Y_n:

Z_n = \frac{Y_n - E(Y_n)}{SD(Y_n)} \;\Rightarrow\; E(Z_n) = 0 \text{ and } Var(Z_n) = 1

We normalize Y_n to get a finite mean and variance for Z_n, because

E(Y_n) = n\mu \xrightarrow{n \to \infty} \infty \quad \text{and} \quad Var(Y_n) = n\sigma^2 \xrightarrow{n \to \infty} \infty
Note: For any fixed n, the CDF of Z_n, F_{Z_n}(z), is obtained by scaling and shifting the CDF of Y_n. Hence the CDF of Z_n and the CDF of Y_n have a similar shape.

[Figure 3.5: as n \to \infty, the distribution Y_n \sim (n\mu, n\sigma^2) flattens out, with Y_\infty \sim (\infty, \infty) centered at n\mu, a large value.]
Also consider

\bar{S}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad E[\bar{S}_n] = \mu, \qquad var(\bar{S}_n) = \frac{\sigma^2}{n} \xrightarrow{n \to \infty} 0

but E[\bar{S}_n] = \mu \xrightarrow{n \to \infty} \mu.
[Figure: as n \to \infty, \bar{S}_n \sim (\mu, \sigma^2/n) concentrates at \mu (fixed), so S_\infty \sim (\mu, 0) has all its distribution concentrated at \mu, i.e. no spread; by contrast Z_n \sim (0, 1) stays stable, and Z_\infty \sim N(0, 1).]
3) Widely used in finance. The percentage changes in prices of assets are modeled as normal RVs. Returns of an index, which is a weighted average of many assets, are approximately normal by the CLT, even if the returns of the individual assets are not normal.
Before such computations could be done by machine, continuity corrections were often used to find probabilities involving discrete distributions. It is a topic discussed in statistics classes to illustrate the relationship between a binomial distribution and a normal distribution, and to show that a normal distribution can approximate a binomial distribution once a continuity correction is applied.
Example 3: Y \sim Bin(n, p) with n = 20 and p = \frac{1}{2}; we know that Z_n = \frac{Y_n - n\mu}{\sqrt{n}\,\sigma}, where Y, the sum of the X_i, is a discrete RV. Find P(8 \le Y \le 10).

Sol: Y = X_1 + X_2 + \cdots + X_n, \quad n\mu = 10, \quad \sqrt{n}\,\sigma = \sqrt{20 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = \sqrt{5}

P(8 \le Y \le 10) = P\left[ \frac{8 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10 - n\mu}{\sqrt{n}\,\sigma} \right] = P\left[ \frac{8 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10 - 10}{\sqrt{5}} \right]

P(8 \le Y \le 10) \simeq \phi(0) - \phi\left( \frac{-2}{\sqrt{5}} \right) = 0.3145 \quad \text{(CLT, less accurate)}
Here we make the continuity correction for the above range of probability:

P(8 \le Y \le 10) = P(7.5 < Y < 10.5), \quad \text{where } Y \text{ is discrete}

= P\left[ \frac{7.5 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - 10}{\sqrt{5}} \right] \simeq \phi\left( \frac{0.5}{\sqrt{5}} \right) - \phi\left( \frac{-2.5}{\sqrt{5}} \right) = 0.4567
Clearly, the probability before the correction is different from after the correction.
The continuity corrected range probability is more accurate.
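Both approximations in Example 3 can be reproduced and compared against the exact binomial probability; a sketch using the standard normal CDF via the error function:

```python
from math import comb, erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 20, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))  # mu = 10, sd = sqrt(5)

plain = phi((10 - mu) / sd) - phi((8 - mu) / sd)          # no correction
corrected = phi((10.5 - mu) / sd) - phi((7.5 - mu) / sd)  # continuity correction

assert abs(plain - 0.3145) < 1e-3
assert abs(corrected - 0.4567) < 1e-3

# Exact binomial value: sum of C(20, k) / 2^20 for k = 8, 9, 10.
exact = sum(comb(20, k) for k in (8, 9, 10)) / 2**20
assert abs(corrected - exact) < abs(plain - exact)  # correction is more accurate
```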
The continuity correction is used for P(y_1 \le Y \le y_2) when Y \sim Bin and y_1 and y_2 are close together.
The concept of convergence in probability is based on the intuition that two random variables are "close to each other" if there is a high probability that their difference is very small. Convergence in probability of X_n tells us that the tail probability is small; that is, the probability of non-occurrence of the event is very small, or near zero.

X_n \xrightarrow{P} X = a \quad \text{as } n \to \infty
[Figure: as n \to \infty, X_n falls inside the band (a - \epsilon, a + \epsilon) around a; region A marks the tail of X_n, region B the band.]

In the figure, A shows the tail of the RV X_n, and B shows that convergence in probability tells us the tail probability is small, but not how far out the tail extends. The tail can therefore still contribute to expectations and variances.
P[|X_n - X| \ge \epsilon] \xrightarrow{n \to \infty} 0 \quad \text{for any } \epsilon > 0        (2.17)
Example 4: X_n \sim Exp(n). Show X_n \xrightarrow{P} 0 = X as n \to \infty.

Note: If X_n \xrightarrow{P} X, then X_n \xrightarrow{d} X as n \to \infty.
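Example 4 has a closed-form tail: for X_n \sim Exp(n), P(X_n \ge \epsilon) = e^{-n\epsilon}, which vanishes as n grows. A numeric sketch with a hypothetical \epsilon:

```python
from math import exp

# For X_n ~ Exp(n), the tail P(X_n >= eps) = exp(-n * eps) for eps > 0,
# which vanishes as n grows: convergence in probability to 0.
eps = 0.1
tails = [exp(-n * eps) for n in (1, 10, 100, 1000)]

assert all(a > b for a, b in zip(tails, tails[1:]))  # monotonically shrinking
assert tails[-1] < 1e-40
```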
Example 5: Let X_n = X + Y_n, where E(Y_n) = \frac{1}{n}, var(Y_n) = \frac{\sigma^2}{n}, \sigma > 0 (constant). Show that X_n converges in probability, i.e. X_n \xrightarrow{P} X as n \to \infty.
Sol: Recall the triangle inequality: for any a \in \mathbb{R}, b \in \mathbb{R}, we have |a + b| \le |a| + |b|. Applying it with a = Y_n - E(Y_n) and b = E(Y_n):

|Y_n| = |Y_n - E(Y_n) + E(Y_n)| \le |Y_n - E(Y_n)| + |E(Y_n)| \;\Rightarrow\; |Y_n| \le |Y_n - E(Y_n)| + \frac{1}{n}
Recall the definition: X_n \xrightarrow{P} X as n \to \infty.
[Figure 3.8: graphical representation of P[|Y_n| \ge \epsilon], showing the two tails P(Y_n > \epsilon) and P(Y_n < -\epsilon).]
Define W_n = |Y_n - E(Y_n)| + \frac{1}{n}, so that |Y_n| \le W_n. Then the event E_n = [|Y_n| \ge \epsilon] implies W_n \ge \epsilon, i.e.

E_n = [|Y_n| \ge \epsilon] \subseteq F_n = [W_n \ge \epsilon]

Splitting each event into its positive and negative tails, E_n = E_n^+ \cup E_n^- and F_n = F_n^+ \cup F_n^-, and since the F_n tails are at least as large (in particular E_n^- \subset F_n^-), we get P(E_n) \le P(F_n).
Recall Chebyshev's inequality: P[|X - \mu_x| \ge k] \le \frac{\sigma_x^2}{k^2}. Applying it to Y_n - E(Y_n), with var(Y_n) = \sigma^2/n:

P[|X_n - X| \ge \epsilon] \le P\left[ |Y_n - E(Y_n)| \ge \epsilon - \frac{1}{n} \right] \le \frac{\sigma^2}{n\left( \epsilon - \frac{1}{n} \right)^2} \xrightarrow{n \to \infty} 0

\Rightarrow X_n \xrightarrow{P} X \quad \text{as } n \to \infty
n→∞
Note: If X_n \xrightarrow[n \to \infty]{P} X, then X_n \xrightarrow[n \to \infty]{d} X (convergence in probability implies convergence in distribution), but the converse is not true: X_n \xrightarrow[n \to \infty]{d} X does not imply X_n \xrightarrow[n \to \infty]{P} X (convergence in distribution does not imply convergence in probability).
2.4 Summary
\tilde{S}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i        (3.1)

Then the sample mean converges in probability to the population mean \mu, i.e.

\tilde{S}_n \xrightarrow[n \to \infty]{P(probability)} \mu        (3.2)
But in this lecture we will learn that the converse of the above theorem is not true. The theorem says that convergence in distribution does not imply convergence in probability.
Chapter 3. Lecture 5 - Shashi Ranjan Mandal (2019DMB08) 43
If X_n \xrightarrow[n \to \infty]{D(Distribution)} X, it does not follow that X_n \xrightarrow[n \to \infty]{P(Probability)} X        (3.4)

The above statement says that if convergence occurs in distribution, it does not imply that convergence will occur in probability as well. Let us try to verify the statement using a counterexample.
Let X be the standard normal variable with mean zero and standard deviation one, represented as X \sim N(0, 1). Let us also consider X_n = -X for n = 1, 2, 3, 4, \ldots
Using the above definition, we can tell that X_n is also standard normal, with mean zero and standard deviation one, represented as X_n \sim N(0, 1).
Here we can see that both X_n and X have the same cumulative distribution function (cdf) for all values of n, i.e. \forall n. This is because we have defined X_n = -X, and the standard normal is symmetric about zero, so both of them have the same cdf.
Now, since they have the same distribution, we want to check whether convergence in probability holds or not.
Since X_n - X = -2X,

P[|X_n - X| > \epsilon] = P[|2X| > \epsilon] = P\left[ |X| > \frac{\epsilon}{2} \right]        (3.6)

Now, removing the modulus from equation (3.6) and expanding the term, we get

P\left[ |X| > \frac{\epsilon}{2} \right] = P\left[ X > \frac{\epsilon}{2} \right] + P\left[ X < -\frac{\epsilon}{2} \right] \ne 0        (3.7)

So now we are looking at the two tails, where we have positive probability on both sides. This is because the event can be written as the union of the two cases obtained after removing the modulus from X:

\left[ |X| > \frac{\epsilon}{2} \right] = \left( X > \frac{\epsilon}{2} \right) \cup \left( X < -\frac{\epsilon}{2} \right)        (3.9)
Proof. To prove the above statement, consider the distribution function

F_{X_n}(x) = P[X_n \le x]        (3.12)

Using the law of total probability, the above equation can be written as the sum of two terms:

F_{X_n}(x) = P[X_n \le x, X \le x + \epsilon] + P[X_n \le x, X > x + \epsilon]

Since a joint probability is the probability of an intersection of two events, it is no larger than the probability of either event alone. Hence

F_{X_n}(x) \le F_X(x + \epsilon) + P[|X_n - X| > \epsilon]

where the event splits as

Event (E), \; [|X_n - X| > \epsilon] = \begin{cases} (X_n - X) > \epsilon, & \text{if } X_n \ge X \\ -(X_n - X) > \epsilon, & \text{if } X_n < X \end{cases}        (3.19)

Event E is the union of events E_1 and E_2. Also,

F_X(x - \epsilon) = P[X \le (x - \epsilon)]        (3.20)

P[X \le (x - \epsilon)] = P[X \le (x - \epsilon), X_n \le x] + P[X \le (x - \epsilon), X_n > x]        (3.21)

The first joint probability is at most P[X_n \le x], and the second event implies X < X_n - \epsilon. Therefore, combining these bounds, we can write equation (3.21) as follows:

F_X(x - \epsilon) \le P[X_n \le x] + P[X < X_n - \epsilon]        (3.28)

F_X(x - \epsilon) \le F_{X_n}(x) + P[X < X_n - \epsilon]        (3.29)

For \epsilon > 0,

P[|X_n - X| > \epsilon] \xrightarrow{n \to \infty} 0        (3.40)

The above implies that

F_X(x - \epsilon) \le \lim_{n \to \infty} F_{X_n}(x) \le F_X(x + \epsilon), \quad \forall \epsilon > 0        (3.41)
Note 1. The assumption of finite variance Var(X_i) = \sigma^2 < \infty is not required.

Let us consider

X_i(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A \end{cases}        (3.45)

Using the above condition, we can write the random sample mean \bar{S}_n as shown below:

\bar{S}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i = \text{fraction of times } \omega \in A        (3.47)
3.6 WLLN

X_n(\omega) \xrightarrow[n \to \infty]{} X(\omega)        (3.48)

Thus the above equation means that the distance between X_n and X tends to zero, i.e. d(X_n, X) \to 0.
Definition 3.7.1. Let r \ge 1 be a constant. The sequence of random variables (RVs) X_1, X_2, \ldots, X_n converges in rth mean (L^r norm) to the random variable X if

\lim_{n \to \infty} d(X_n, X) = \lim_{n \to \infty} E[|X_n - X|^r] = 0        (3.51)

When r = 2, the above equation gives us mean square convergence (m.s.), as shown below:

X_n \xrightarrow[n \to \infty]{m.s.} X        (3.52)
Now let us explain the above definition with the help of an example. Consider a random sequence X_n which has a uniform distribution with mean equal to 0 and variance equal to \frac{1}{n}. We need to prove that

X_n \xrightarrow{L^r} X = 0 \quad \forall r \ge 1        (3.53)
X_n \xrightarrow[n \to \infty]{L^r} X \;\Rightarrow\; X_n \xrightarrow[n \to \infty]{P(Probability)} X        (3.61)

Proof. Consider, for any \epsilon > 0, the tail probability. By Markov's inequality,

P[|X_n - X| \ge \epsilon] = P[|X_n - X|^r \ge \epsilon^r] \le \frac{E[|X_n - X|^r]}{\epsilon^r}

We can write this because |X_n - X|^r \ge 0, and thus |X_n - X| \ge 0, i.e. it is a non-negative random variable; also r > 0. If

X_n \xrightarrow[n \to \infty]{L^r} X \;\Rightarrow\; E[|X_n - X|^r] \xrightarrow{n \to \infty} 0        (3.66)

then the tail probability also goes to zero.
Note 2. The converse of the above theorem is not true: there are sequences of random variables X_n that converge in probability but do not converge in mean.

Mathematically, \; X_n \xrightarrow[n \to \infty]{P(Probability)} X \;\not\Rightarrow\; X_n \xrightarrow[n \to \infty]{L^r} X        (3.67)
Chapter 4
4.1 Recap
Random variables are variables whose values depend on the outcomes of a random experiment. Suppose we have a sequence of random variables X_1, X_2, \ldots, X_n that converges to a random variable X; that is, X_n gets closer and closer to X as n increases. Suppose we want to observe the value of the random variable X, but we cannot observe it directly. What we do is come up with some estimation technique to measure X, and obtain an estimate X_1. We then estimate again and update the estimate to X_2, and so on. We continue this process to get X_1, X_2, \ldots, and as we increase n our estimate gets better and better, so we hope that the value of X_n converges to X.
There are different senses in which a sequence can converge. Some of these convergence notions are stronger than others: if one notion is stronger and another weaker, then convergence in the stronger sense implies convergence in the weaker sense. A sequence can converge in the following senses:
• Convergence in distribution
Chapter 4. Lecture 6 - Venkat Suman Panigrahi (2019DMF12) 51
• Convergence in probability
• Convergence in mean
• Almost sure convergence
For example, using the figure, we conclude that if a sequence of random variables
converges in probability to a random variable X, then the sequence converges in
distribution to X as well.
With this type of convergence, the next outcome in a sequence of random experiments becomes better and better described by a given probability distribution. Convergence in distribution is the weakest form of convergence. However, it is widely used; most often it arises from applications of the central limit theorem.
• Tossing coins
Let X_n be the fraction of tails after tossing an unbiased coin n times. Then X_1 has the Bernoulli distribution with expected value \mu = 0.5 and variance \sigma^2 = 0.25, and the subsequent random variables X_2, X_3, \ldots will also be binomially distributed.
As we increase the number of tosses n, the distribution starts converging to a normal distribution. This is explained by the Central Limit Theorem: as n increases, the sample mean is approximately normally distributed.
• Dice Problem
Suppose that in a dice-making factory the first batch of dice produced comes out biased or defective, so the outcome of throwing such a die follows some distribution different from the uniform one over the marked faces.
As the production process improves, the dice become less and less defective, and the outcome of throwing a die follows the uniform distribution more and more closely.
As the sequence progresses, the probability that the outcome of the experiment deviates from the limiting distribution becomes smaller and smaller.
This example should not be taken literally. Consider the following experiment.
First, pick a random person in the street. Let X be his/her height, which is ex ante
a random variable. Then ask other people to estimate this height by eye. Let Xn be
the average of the first n responses. Then (provided there is no systematic error)
by the law of large numbers, the sequence Xn will converge in probability to the
random variable X.
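The height-estimation example can be mimicked in a few lines. This is a hedged sketch: the true height of 170, the noise level of 5, and the helper name `prob_far` are invented for illustration. It estimates P(|Xn − X| > ε) and shows it shrinking as n grows, which is exactly convergence in probability:

```python
import random

def prob_far(n, eps=1.0, trials=1000, true_height=170.0, seed=1):
    """Estimate P(|X_n - X| > eps), where X_n is the average of n noisy
    eye-estimates of a fixed height (all parameter values illustrative)."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        responses = [true_height + rng.gauss(0, 5) for _ in range(n)]
        xn = sum(responses) / n
        if abs(xn - true_height) > eps:
            far += 1
    return far / trials

# The probability of being more than eps away shrinks as n grows:
for n in (5, 50, 500):
    print(n, prob_far(n))
```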
• Note
For the interpretation: when we say that the sequence Xn converges to X, we mean that the distance between Xn and X gets smaller and smaller. If we measure this distance by $P(|X_n - X| > \epsilon)$, we obtain convergence in probability. Another way to measure the distance between Xn and X is $E(|X_n - X|^r)$; when this tends to zero we have convergence in mean of order r, written
$$X_n \xrightarrow{L^r} X \qquad (4.5)$$
Almost sure convergence is one of the important discoveries in probability and statistics; it leads to the establishment of the strong law of large numbers. It is also called 'convergence with probability one' (w.p.1). When the probability of an event is zero (0), the event does not occur at all; when it is one (1), the event occurs all the time.
For example, if both sides of a coin are 'Heads' (a biased coin), then the probability of 'tails' is zero and the probability of 'heads' is one. An event of probability 1 is a sure event, i.e. a degenerate case.
'Almost sure' means occurring almost everywhere: there may be some places where it does not occur. This is pointwise convergence on the same sample space S. We have a sequence of random variables X1, X2, ..., Xn defined on an underlying sample space, and we assume S is a finite set, |S| < ∞.
Each Xn and X is a function mapping S to the real numbers: $X_n : S \to \mathbb{R}$ and $X : S \to \mathbb{R}$. The limiting random variable X is defined on the same sample space. Let $s_i$ be the ith outcome; a random variable Xn on this sample space takes the value $X_n(s_i) = x_{ni}$, the ith real-number outcome, for i = 1, 2, ..., k and n = 1, 2, ....
After the random experiment is performed (for example, a coin is tossed), one of the $s_i$ occurs (say, 'H' occurs). Once the outcome of the experiment is known, the values of the Xn are known, i.e. the $x_{ni}$ are known.
4.3.1 Example 1
Consider each of the outcomes H or T and determine whether the corresponding sequence of real numbers converges or not.
If $s = H$, then $X_n(H) = \frac{n}{n+1}$.
This is a sequence of real numbers, and as we increase n the sequence converges to 1. Hence the sequence converges to 1 as $n \to \infty$ if s = H (the outcome is fixed at H).
If s = T, the sequence (a natural choice consistent with the oscillation described here is $X_n(T) = (-1)^n$) does not converge, since it oscillates between −1 and +1 as n becomes larger and larger. Let us define the event
$$E_\infty = \{s_i \in S : \lim_{n\to\infty} X_n(s_i) \text{ exists}\}.$$
The sequence converges when the outcome is a Head (s = H), so the probability of the event $E_\infty$, i.e. the probability of Heads, is $\frac{1}{2}$, since it is a single toss of a fair coin. We say that
$$X_n \xrightarrow[n\to\infty]{a.s.} X \quad \text{if } P(E_\infty) = 1. \qquad (4.10)$$
4.3.2 Example 2
$$X(s) = \begin{cases} 1, & 0 \le s < \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases} \qquad (4.13)$$
Note that $\frac{n+1}{2n} > \frac{1}{2}$ for every n, so $[0, \frac{n+1}{2n})$ is a bigger interval and $[0, \frac{1}{2})$ is a smaller one.
Now we have to show that
$$X_n \xrightarrow[n\to\infty]{a.s.} X \qquad (4.14)$$
where
$$X_n(s) = \begin{cases} 1, & 0 \le s < \frac{n+1}{2n} \\ 0, & \text{otherwise} \end{cases} \qquad (4.15)$$
Putting in different values of n, the right endpoints $\frac{n+1}{2n}$ are 1 for n = 1, $\frac{3}{4}$ for n = 2, and so on: the intervals are shrinking.
Consider the event $E_\infty = \{s_i \in S : \lim_{n\to\infty} X_n(s) = X(s)\}$. This is the set of outcomes where $\lim_{n\to\infty} X_n(s) = X(s)$.
For any $s > \frac{1}{2}$ we can choose n so large that $\frac{n+1}{2n} < s$, namely $n > \frac{1}{2s-1}$. Then $X_n(s) = 0$ for all such n, so $\lim_{n\to\infty} X_n(s) = X(s) = 0$. That implies $s \in E_\infty$, and hence $(\frac{1}{2}, 1) \subset E_\infty$.
Convergence fails only at $s = \frac{1}{2}$, where $X_n(\frac{1}{2}) = 1$ for all n but $X(\frac{1}{2}) = 0$; this single point has probability zero. So we can write the event as $E_\infty = [0, \frac{1}{2}) \cup (\frac{1}{2}, 1)$. Applying probability, we get
$$P[E_\infty] = P\{[0, \tfrac{1}{2})\} + P\{(\tfrac{1}{2}, 1)\} \qquad (4.16)$$
From the axioms of probability this is a disjoint union of events, and both pieces carry the uniform measure, so each has probability one half:
$$P[E_\infty] = \tfrac{1}{2} + \tfrac{1}{2} = 1,$$
so $P[E_\infty] = 1$.
So we have shown that the sequence of random variables $X_1, X_2, \dots, X_n$ converges almost surely to X(s) as the sample size increases, i.e.
$$X_n \xrightarrow[n\to\infty]{a.s.} X$$
Define the sample mean
$$\tilde{S}_n = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (4.17)$$
Then the sample mean converges almost surely to the population mean µ, i.e.
$$\tilde{S}_n \xrightarrow[n\to\infty]{a.s.} \mu \qquad (4.18)$$
This is a stronger form of convergence, and it implies the weaker forms: the strong law implies the weak law. The expected value of the sample mean is $E(\tilde{S}_n) = \mu$ (the limit is the degenerate random variable µ) and $\mathrm{var}(\tilde{S}_n)$ goes to zero as $n \to \infty$.
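A quick way to see the strong law at work is to follow a single realization of the running sample mean. The sketch below uses exponential draws with mean µ = 2 as an arbitrary illustrative choice; the helper name `running_means` is invented:

```python
import random

def running_means(n_max, mu=2.0, seed=42):
    """One realization of the running sample means (X1+...+Xn)/n
    for i.i.d. exponential draws with mean mu (illustrative choice)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for n in range(1, n_max + 1):
        total += rng.expovariate(1.0 / mu)  # E[X_i] = mu
        means.append(total / n)
    return means

means = running_means(100000)
# The running mean drifts toward mu = 2.0 along this single path:
print(means[99], means[9999], means[99999])
```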
4.4.1 Example 3
Take $X_n(s) = s + s^n$ on $S = [0, 1]$ with $X(s) = s$ (the notes omit the definition; this standard choice is consistent with the limits stated below). For $0 \le s < 1$, $s^n \xrightarrow{n\to\infty} 0$, so $\lim_{n\to\infty} X_n(s) = s = X(s)$.
Now consider s = 1, which gives $X_n(1) = 2$ for all n while $X(1) = 1$. Hence $\lim_{n\to\infty} X_n(1) = 2 \ne X(1) = 1$.
The probability measure is uniform, so the single point s = 1 has probability zero, which implies $X_n \xrightarrow[n\to\infty]{a.s.} X$.
We have proved the weak law of large numbers for the finite-variance case: the sample mean of the random variables converges in probability, as n tends to infinity, to the degenerate random variable µ, where µ is a deterministic constant.
$$E(\bar{S}_n) = \mu, \qquad \mathrm{var}(\bar{S}_n) = \frac{\sigma^2}{n} \xrightarrow{n\to\infty} 0. \qquad (4.26)$$
NOTE: Let us check what happens if we remove the finite-variance assumption.
Let the $X_i$ be independent, identically distributed random variables with a well-defined moment generating function, so that $M_{X_i}(0) = 1$ and $M'_{X_i}(0) = \mu = E(X_i)$. By definition,
$$M_{X_i}(t) = E[e^{tX_i}] \qquad (4.27)$$
and $\bar{S}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Now,
$$M_{\bar{S}_n}(t) = E\big[e^{\frac{t}{n}(X_1 + X_2 + \dots + X_n)}\big] \qquad (4.31)$$
$$= E\big[e^{\frac{t}{n}X_1} e^{\frac{t}{n}X_2} \cdots e^{\frac{t}{n}X_n}\big] \qquad (4.32)$$
Since the $X_i$ are independent, the expectation in equation (4.32) factors:
$$M_{\bar{S}_n}(t) = E\big[e^{\frac{t}{n}X_1}\big]\, E\big[e^{\frac{t}{n}X_2}\big] \cdots E\big[e^{\frac{t}{n}X_n}\big] \qquad (4.33)$$
$$M_{\bar{S}_n}(t) = M_{X_1}\!\left(\tfrac{t}{n}\right) M_{X_2}\!\left(\tfrac{t}{n}\right) M_{X_3}\!\left(\tfrac{t}{n}\right) \cdots M_{X_n}\!\left(\tfrac{t}{n}\right) \qquad (4.34)$$
Since all the moment generating functions are identical (each $X_i$ is distributed like a single random variable X), we have
$$M_{\bar{S}_n}(t) = \left[M_X\!\left(\tfrac{t}{n}\right)\right]^n \qquad (4.35)$$
Let us see this through the Taylor series expansion. The Taylor series of $M_X$ about t = 0, to first order, is
$$M_X\!\left(\tfrac{t}{n}\right) = M_X(0) + \tfrac{t}{n} M'_X(0) + o\!\left(\tfrac{t}{n}\right) \qquad (4.37)$$
Using $M_X(0) = 1$ and $M'_X(0) = \mu$,
$$M_{\bar{S}_n}(t) = \left[1 + \tfrac{t}{n}\mu + o\!\left(\tfrac{t}{n}\right)\right]^n \qquad (4.38)$$
The moment generating function of $\bar{S}_n$ therefore goes to $e^{\mu t}$ as $n \to \infty$, which is the moment generating function of the degenerate random variable µ. Hence the distribution of $\bar{S}_n$ converges weakly to the distribution of the degenerate random variable Y = µ.
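The limiting step above, $[1 + \frac{t}{n}\mu]^n \to e^{\mu t}$, can be checked numerically. A small sketch (the values of µ and t are arbitrary test choices):

```python
import math

mu, t = 1.5, 0.7
target = math.exp(mu * t)  # MGF of the degenerate variable Y = mu at t

# The first-order Taylor factor raised to the n-th power approaches e^{mu t}:
for n in (10, 100, 10000):
    approx = (1 + (t / n) * mu) ** n
    print(n, round(approx, 6), round(target, 6))
```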
Recall: convergence in distribution to a constant µ implies convergence in probability, hence we have the weak law of large numbers. So,
$$\big[X_n \xrightarrow[n\to\infty]{D} \mu\big] \implies \big[X_n \xrightarrow[n\to\infty]{P} \mu\big] \qquad (4.42)$$
where µ is a constant.
NOTE: Using the moment generating function we can thus prove the weak law of large numbers without the finite-variance assumption.
One important reason for using moment generating functions is to handle sums of random variables. Moment generating functions are a simple way to find moments like the mean (µ) and variance (σ²). Through the MGF we can represent a probability distribution with a single one-variable function, and each probability distribution has a unique MGF, which makes MGFs especially useful for problems like finding the distribution of a sum of random variables. But before defining the moment generating function, let us define moments.
• Moments
The nth moment of a random variable is the expected value of its nth power.
Definition: Let X be a random variable and $n \in \mathbb{N}$. If the expected value
$$\mu_X(n) = E[X^n] \qquad (4.43)$$
exists and is finite, then X is said to possess a finite nth moment and $\mu_X(n)$ is called the nth moment.
For example, the first moment gives the expected value E[X]. And the second cen-
tral moment is the variance of X. Similar to mean and variance, other moments
give useful information about random variables.
One question that can be raised is: why are moment generating functions useful? There are two reasons. First, the moment generating function of any random variable X gives us all the moments of X; that is why it is called the moment generating function. Second, the MGF uniquely determines the distribution: if two random variables have the same MGF, then they must have the same distribution as well. Thus, if you find the MGF of a random variable, you have indeed determined its distribution.
We conclude that the kth moment of X is the coefficient of $\frac{s^k}{k!}$ in the Taylor series of $M_X(s)$. Thus, if we have the Taylor series of $M_X(s)$, we can obtain all moments of X.
Some properties of moment generating functions:
• Property 1
If two random variables have the same moment generating function, then they have the same distribution. That is, if X and Y have the same moment generating function $M_X(s)$, then X and Y are distributed in the same way (same CDF, etc.).
So the moment generating function determines the distribution of a random variable, which comes in handy when dealing with an unknown random variable.
• Property 2
Moment generating functions make sums of random variables easier to handle. For two independent random variables X and Y, the moment generating function of X + Y is obtained by multiplying the separate, individual moment generating functions of X and Y.
That is, if X and Y are independent with moment generating functions $M_X(s)$ and $M_Y(s)$, then the moment generating function of X + Y is just $M_X(s) M_Y(s)$, the product of the two moment generating functions.
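Property 2 can be sanity-checked by Monte Carlo: estimate $E[e^{s(X+Y)}]$ directly and compare it with the product of the separately estimated MGFs. The distributions chosen below (Uniform(0,1) and standard normal), the sample size, and the helper name `mc_mgf` are illustrative assumptions:

```python
import random
import math

def mc_mgf(samples, s):
    """Monte Carlo estimate of the MGF E[e^{sX}] from a list of samples."""
    return sum(math.exp(s * x) for x in samples) / len(samples)

rng = random.Random(7)
n, s = 200000, 0.3
xs = [rng.uniform(0, 1) for _ in range(n)]   # X ~ Uniform(0, 1)
ys = [rng.gauss(0, 1) for _ in range(n)]     # Y ~ N(0, 1), independent of X

lhs = mc_mgf([x + y for x, y in zip(xs, ys)], s)   # MGF of X + Y
rhs = mc_mgf(xs, s) * mc_mgf(ys, s)                # product of the two MGFs
print(round(lhs, 4), round(rhs, 4))
```

The two estimates agree up to Monte Carlo noise, matching the product rule for independent summands.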
Example 4
For Y uniform on (0, 1), the coefficient of $\frac{s^k}{k!}$ in the Taylor series for $M_Y(s)$ is $\frac{1}{k+1}$, so
$$E[Y^k] = \frac{1}{k+1} \qquad (4.52)$$
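The moments $E[Y^k] = \frac{1}{k+1}$ for Y uniform on (0, 1) are easy to confirm empirically. A minimal sketch (sample size and seed are arbitrary):

```python
import random

rng = random.Random(3)
ys = [rng.random() for _ in range(200000)]  # Y ~ Uniform(0, 1)

# Empirical k-th moments against the MGF-derived values 1/(k+1):
for k in (1, 2, 3, 4):
    emp = sum(y ** k for y in ys) / len(ys)
    print(k, round(emp, 4), round(1 / (k + 1), 4))
```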
Limit theorems are very important and extremely useful in applied statistics; they help us deal with random variables as we take limits. The first limit theorem is the Law of Large Numbers, which essentially states that the sample mean eventually approaches the population mean of a random variable as we let the number of draws tend to infinity.
• Definition:
Consider i.i.d. random variables $X_1, X_2, \dots, X_n$, each with mean µ. We define the sample mean as
$$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} \qquad (4.53)$$
Note that the sample mean $\bar{X}_n$ is itself random. It makes sense that this sample mean will fluctuate, because the components that make it up (the X terms) are themselves random.
Based on this concept we have two different types of law of large numbers.
The strong law of large numbers states that as n tends to ∞, the sample mean $\bar{X}_n$ goes to the population mean µ with probability 1. This is a formal way of saying that the sample mean will almost surely approach the true mean. The strong law is based on almost sure convergence; both were discussed thoroughly in the previous section of this lecture.
$$\bar{X}_n \xrightarrow[n\to\infty]{a.s.} \mu \qquad (4.54)$$
As the sample size n grows to infinity (∞), the probability that the sample mean differs from the population mean by more than some small amount ε goes to zero.
In simple words, the Weak Law of Large Numbers, also known as Bernoulli's theorem, states that if you have a sample of independent and identically distributed random variables, then as the sample size grows larger, the sample mean tends toward the population mean.
• Definition
Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with a finite expected value $E[X_i] = \mu < \infty$. Then for any $\epsilon > 0$,
$$\lim_{n\to\infty} P\big(|\bar{X}_n - \mu| \ge \epsilon\big) = 0.$$
The central limit theorem is the second limit theorem; it is equally important when dealing with random variables, and it also concerns the long-run behaviour of the sample mean as n grows.
The central limit theorem states that if we choose a sufficiently large random sample from a population with mean µ and variance σ², then the sample mean is approximately normally distributed with mean µ.
This is an extremely powerful result, because it holds no matter what the distribution of the underlying random variables (i.e., the X's) is.
$$\bar{X}_n \xrightarrow{D} N\!\left(\mu, \frac{\sigma^2}{n}\right) \qquad (4.56)$$
Here $\xrightarrow{D}$ means 'converges in distribution'; it is implied that this convergence takes place as n, the number of underlying random variables, grows.
• NOTE:
The LLN states that the mean of a large number of i.i.d. random variables converges to the expected value. The CLT states that, under similar conditions, the sum of a large sample of random variables has an approximately normal distribution. Define
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \qquad (4.57)$$
The central limit theorem states that the CDF of $Z_n$ converges to the standard normal CDF.
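The standardization in (4.57) can be simulated: draw many values of $Z_n$ and compare their empirical CDF against the standard normal at a few points. The Uniform(0,1) summands, trial counts, and helper name below are illustrative choices:

```python
import random
import math

def standardized_means(n, trials=5000, seed=5):
    """Draw `trials` values of Z_n = (X_bar - mu) / (sigma / sqrt(n))
    for i.i.d. Uniform(0,1) summands (mu = 0.5, sigma^2 = 1/12)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    zs = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_means(50)
# Fraction of Z_n below 0 should be near Phi(0) = 0.5,
# and below 1.96 near Phi(1.96), roughly 0.975:
print(sum(z < 0 for z in zs) / len(zs), sum(z < 1.96 for z in zs) / len(zs))
```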
Chapter 5
Before delving into the rather involved concepts of Stochastic Processes, it is ab-
solutely essential for us to have a firm grasp on the fundamentals of Probability
theory, Set theory, Sequences and Limit theorems. In this prelude of sorts, we will
revisit some fundamental principles regarding some of those concepts and then
move on to discussing stochastic processes.
Suppose a random experiment is conducted. Then the set of all possible outcomes of the experiment is called the sample space, denoted by Ω or S. Suppose our experiment consists of tossing two coins. Then the sample space is given by:
S = {(H, H), (H, T ), (T, H), (T, T )} (5.1)
Now note that any subset E of the sample space is called an event. This is typi-
cally a set that contains various outcomes of the experiment and we say that if a
particular outcome is contained within E, then event E has occurred. For exam-
ple if we define our event to be - E is the event that heads appears on the first coin toss
- then our associated set for this event would be:
E = {(H, H), (H, T )} (5.2)
For two events E and F belonging to some sample space, we say that the union of those events, written E ∪ F, is the event consisting of the outcomes that are contained in either E or F.
Chapter 5. Lecture 9 - Akash Gupta (2019DMB02) 70
Now, an event containing all the outcomes contained in both E and F is the intersection of the two events, written EF (or E ∩ F). For example, if E = {(H, H), (H, T), (T, H)} and F = {(H, T), (T, H), (T, T)}, then EF = {(H, T), (T, H)}.
Now let us consider another example of two events obtained from rolling two dice, where each outcome tuple records the two die rolls. Suppose E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} is the event that the sum of the two rolls is 7, and let F = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} be the event that the sum is 6. Look carefully and you will notice that the two events have nothing in common: no outcome is contained in both sets, so the event that both occur simply cannot happen. Such an event is known as a null event and is denoted EF = φ. In this case we say that the events E and F are mutually exclusive.
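Events-as-sets can be made concrete with Python's built-in `set` type, whose operators `|`, `&`, and `-` correspond exactly to union, intersection, and complement within S. A small sketch using the examples from the text:

```python
# Sample space for two coin tosses, with events as Python sets.
S = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}
E = {("H", "H"), ("H", "T")}                # heads on the first toss
F = {("H", "T"), ("T", "H"), ("T", "T")}    # at least one tail

union = E | F            # E or F occurs
intersection = E & F     # both occur (EF)
complement = S - E       # E does not occur (E^c)
print(union, intersection, complement)

# Mutually exclusive events have an empty intersection (the null event):
sum_is_7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
sum_is_6 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
print(sum_is_7 & sum_is_6)  # the null event
```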
Just as we defined unions and intersections of two sets above, we can define unions and intersections over n events. For events $E_1, E_2, \dots$, their union is typically written
$$\bigcup_{n=1}^{\infty} E_n \qquad (5.5)$$
In a similar manner, the intersection of many events can be written
$$\bigcap_{n=1}^{\infty} E_n \qquad (5.6)$$
Now, the event containing all those outcomes of the sample space S that are not in event E is known as the complement of E, denoted $E^c$. Note that the complement of the sample space is the null set ($S^c = \varphi$). Further, for any two events E and F, if all the outcomes in E are also present in F, then we say that E is a subset of F and, consequently, F is a superset of E. This is denoted:
$$E \subset F \qquad (5.7)$$
Note that two sets are equal exactly when each is a subset of the other. That is:
$$E = F \iff E \subset F \text{ and } F \subset E \qquad (5.8)$$
Some of these concepts seem quite intuitive when viewed in the form of Venn
diagrams. Basic examples are presented below.
Some of the basic algebraic rules governing set operations are commutativity ($E \cup F = F \cup E$), associativity ($(E \cup F) \cup G = E \cup (F \cup G)$), distributivity ($(E \cup F)G = EG \cup FG$), and DeMorgan's laws ($(\bigcup_i E_i)^c = \bigcap_i E_i^c$).
Consider an experiment with sample space S. For each event E of the sample space we assume that a function P(E) is defined, satisfying the following three axioms:
• $0 \le P(E) \le 1$
• $P(S) = 1$
• For a sequence of mutually exclusive events $E_1, E_2, \dots$ (i.e. $E_i E_j = \varphi$ for $i \ne j$) we have:
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) \qquad (5.11)$$
A decreasing sequence of events satisfies
$$E_1 \supset E_2 \supset \dots \supset E_n \supset E_{n+1} \qquad (5.13)$$
For an increasing sequence of events we can essentially define a limiting event in the form
$$\lim_{n\to\infty} E_n = \bigcup_{i=1}^{\infty} E_i \qquad (5.14)$$
Similarly, we can define the limiting event for a decreasing sequence of events as:
$$\lim_{n\to\infty} E_n = \bigcap_{i=1}^{\infty} E_i \qquad (5.15)$$
Additionally we note an important proposition (continuity of probability) for an increasing or decreasing sequence of events: $\lim_{n\to\infty} P(E_n) = P(\lim_{n\to\infty} E_n)$.
We will first illustrate the inherent confusion in computing probabilities for a uniform random variable X ∼ U(0, 1). We know that P(X = x) = 0 for every x; at the same time, it is also true that P(0 ≤ X ≤ 1) = 1. The confusion comes from writing
$$P(0 \le X \le 1) = \sum_{x \in [0,1]} P(X = x) \qquad (5.17)$$
which would give 0 rather than 1; the sum on the right is not actually meaningful.
With this in mind, we define infinite sums as limiting cases of finite sums:
$$\sum_{i=1}^{\infty} x_i = \lim_{n\to\infty} \sum_{i=1}^{n} x_i \qquad (5.18)$$
Therefore, in order even to have an infinite sum, it must be possible to arrange the terms in a sequence. An infinite set whose terms can be arranged in a sequence is called countable; otherwise it is uncountable. The positive rationals are countable, since we can list them in an order as ratios of integers:
$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \dots \qquad (5.19)$$
However, the real numbers between 0 and 1 are not countable. Suppose we try to arrange these real numbers in a sequence $x_1, x_2, \dots$, and choose to express each as a decimal expansion:
$$x_j = \sum_{i=1}^{\infty} d_{ij}\, 10^{-i} \qquad (5.20)$$
where $d_{ij} \in \{0, 1, 2, \dots, 9\}$ is the ith digit after the decimal place of the jth number in the sequence. What is happening here is that we assume any given x can be written as a long decimal expansion, the ith digit after the decimal point taking any value between 0 and 9, as embodied by the digits $d_{ij}$. We can see this illustrated as follows: we write $x_1$ as $\sum_{i=1}^{\infty} d_{i1} 10^{-i} = 0.d_{11}d_{21}d_{31}\cdots$, and similarly $x_2$ as $\sum_{i=1}^{\infty} d_{i2} 10^{-i} = 0.d_{12}d_{22}d_{32}\cdots$.
We assume that the above sequence lists out the entire set of reals between 0 and 1. Now consider an indicator variable such that I(A) = 1 if condition A is true and I(A) = 0 if A is false. Then we can define a new number by:
$$y = \sum_{i=1}^{\infty} \big(1 + I\{d_{ii} = 1\}\big)\, 10^{-i} \qquad (5.21)$$
Look closely at this number: it says that if the diagonal element $d_{ii}$ of the array of the $x_j$ expansions equals 1, then the ith digit of y is 2; otherwise it is 1. So in the decimal expansion of y, the first digit differs from the $d_{11}$ digit of $x_1$, the second digit differs from the $d_{22}$ digit of $x_2$, and so on. We have thus proven that every $x_j$ in the sequence differs from the newly defined number y in at least one digit. Therefore, while y does in fact belong to (0, 1), it is not equal to any of the $x_j$. Hence the elements between 0 and 1 cannot be arranged or explicitly listed out.
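The diagonal construction can be sketched on a finite prefix of digit rows. The rows below are a hypothetical start of an attempted listing, and `diagonal_number` is an invented helper; digit i of y is 2 when $d_{ii} = 1$ and 1 otherwise, so y differs from $x_i$ in the ith digit:

```python
def diagonal_number(digit_rows, k):
    """Given the first k decimal digits d_ij of the first k listed numbers
    x_j, build the first k digits of y: digit i is 2 if d_ii == 1, else 1."""
    return [2 if digit_rows[i][i] == 1 else 1 for i in range(k)]

# Hypothetical start of an attempted list of reals in (0, 1), as digit rows:
rows = [
    [1, 4, 1, 5],   # x_1 = 0.1415...
    [2, 1, 7, 2],   # x_2 = 0.2172...
    [5, 5, 5, 5],   # x_3 = 0.5555...
    [9, 0, 0, 1],   # x_4 = 0.9001...
]
y = diagonal_number(rows, 4)
print(y)

# y differs from x_i in the i-th digit for every i:
for i in range(4):
    assert y[i] != rows[i][i]
```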
In the earlier lecture we saw that if we have an infinite sequence of 0s and 1s, and our event of interest is that a particular $e_k$ in the sequence equals 1, then there can be infinitely many such events: no matter how far out we choose the cut-off point n in the sequence, we may still face infinitely many occurrences of the event of interest. We then laid down the condition that for every cut-off point n there exists a $k \ge n$ such that $e_k = 1$, written
$$\forall n \in \mathbb{N}\ \ \exists k \ge n : e_k = 1 \qquad (5.22)$$
Then we translated this broad condition into events: basically we are interested in the event that $E_k$ happens infinitely many times. Before moving further, note the basic correspondences
$$\forall n \in \mathbb{N} \;\longrightarrow\; \bigcap_{n=1}^{\infty}, \quad \text{intersections} \qquad (5.23)$$
$$\exists k \ge n \;\longrightarrow\; \bigcup_{k=n}^{\infty}, \quad \text{unions} \qquad (5.24)$$
With this we can write our condition of $E_k$ happening infinitely many times as:
$$\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k \qquad (5.25)$$
After this we saw that if the series of event probabilities is divergent then, under the assumption that the $E_k$ are disjoint, the tail sum (starting at cut-off point n) remains infinite:
$$\lim_{n\to\infty} \sum_{k=n}^{\infty} P(E_k) = \infty \qquad (5.26)$$
And conversely, if the series is convergent, then the tail of the series goes to 0 as n tends to infinity:
$$\lim_{n\to\infty} \sum_{k=n}^{\infty} P(E_k) = 0 \qquad (5.27)$$
Finally, with these fundamental properties laid out, we formulated the Borel–Cantelli Lemma, which states:
$$\text{if } \sum_{n=1}^{\infty} P(E_n) < \infty \ \text{ then } \ P\left(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k\right) = 0 \qquad (5.28)$$
For the event that $A_n$ occurs infinitely often, we define the Lim Sup as follows:
$$\limsup_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \qquad (5.29)$$
For the event that $A_n$ occurs all but finitely often, we define the Lim Inf as follows:
$$\liminf_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \qquad (5.30)$$
Now let Ω be our sample space, with sample points (outcomes) ω ∈ Ω, and consider the following sequence of sets:
$$\{X_n\} = \{(X_1 = \{0\}), (X_2 = \{1\}), (X_3 = \{0\}), (X_4 = \{1\}), \dots\} \qquad (5.31)$$
From this sequence of subsets we can clearly see that the odd-indexed elements are all {0}, whereas the even-indexed elements are all {1}. We define two new sequences that separately contain the odd- and even-indexed elements:
$$\{Y_n\} = \{\{0\}, \{0\}, \dots\} \qquad (5.32)$$
$$\{Z_n\} = \{\{1\}, \{1\}, \dots\} \qquad (5.33)$$
Now consider the original sequence $\{X_n\}$. To evaluate its Lim Sup we first compute the unions of the tails of the sequence, each of the form {0} ∪ {1}; every such union gives the same set {0, 1}. Hence
$$\limsup X_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} X_k = \bigcap_{n=1}^{\infty} \{0, 1\} = \{0, 1\} \qquad (5.34)$$
Now similarly, to compute the Lim Inf of this sequence, we first take the intersections of the tails, found iteratively as {0} ∩ {1} = φ, φ ∩ {0} = φ, and so on: in the end we get only the null set φ. With this the Lim Inf is
$$\liminf X_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} X_k = \bigcup_{n=1}^{\infty} \varphi = \varphi \qquad (5.35)$$
We notice from equations (5.34) and (5.35) that the Lim Sup and Lim Inf of this particular sequence are not equal; when this is the case, we say that the limit of the sequence does not exist. Now recall the sequence $Y_n$ from equation (5.32). We will compute its Lim Sup and Lim Inf and check whether its limit exists, which is precisely the condition that the Lim Sup and Lim Inf be equal.
$$\limsup Y_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} Y_k = \bigcap_{n=1}^{\infty} \{0\} = \{0\} \qquad (5.36)$$
$$\liminf Y_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} Y_k = \bigcup_{n=1}^{\infty} \{0\} = \{0\} \qquad (5.37)$$
Hence from the above two equations we see clearly that the limit of the sequence exists and is given by
$$\lim_{n\to\infty} Y_n = \{0\}.$$
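The Lim Sup and Lim Inf computations above can be reproduced for eventually periodic sequences by truncating the infinite unions and intersections at a finite horizon; for the alternating and constant sequences here the truncation is exact. A sketch (the function names are illustrative):

```python
def lim_sup(seq):
    """Intersection over n of the unions of the tails {A_k : k >= n},
    computed over a finite horizon (exact for eventually periodic seq)."""
    tails = [set().union(*seq[n:]) for n in range(len(seq) - 1)]
    out = tails[0]
    for t in tails[1:]:
        out &= t
    return out

def lim_inf(seq):
    """Union over n of the intersections of the tails {A_k : k >= n}."""
    result = set()
    for n in range(len(seq) - 1):
        tail = seq[n:]
        inter = set(tail[0])
        for a in tail[1:]:
            inter &= a
        result |= inter
    return result

xn = [{0}, {1}] * 10   # alternating sequence X_n
yn = [{0}] * 10        # constant sequence Y_n
print(lim_sup(xn), lim_inf(xn))
print(lim_sup(yn), lim_inf(yn))
```

For the alternating sequence the two limits differ ({0, 1} versus the empty set), so the limit does not exist; for the constant sequence both equal {0}.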
We will now show, through an example, that the limiting behaviour of a sequence does not depend on transients but rather on the long-term pattern shown by the tail of the sequence. Transients are events that occur only finitely often, whereas we are interested in events that tend to happen infinitely often; these finitely occurring transients have no effect on the long-term limiting behaviour of the sequence. Suppose a sequence given by
$$\{B_n\} = \{\underbrace{\{50\}, \{20\}, \{35\}, \{-15\}}_{\text{transients}},\ \underbrace{\{0\}, \{1\}, \dots}_{\text{tail pattern}}\} \qquad (5.39)$$
where the tail alternates between {0} and {1}.
Now, for each value of the cut-off n specifying the start of a tail, call the union of all events after the cut-off point $D_n$. With this the Lim Sup can be written:
$$\limsup B_n = \bigcap_{n=1}^{\infty} \underbrace{\bigcup_{k=n}^{\infty} B_k}_{D_n} = D_1 \cap D_2 \cap D_3 \cap \dots = \{0, 1\} \qquad (5.40)$$
For all values of the cut-off point, the event {0, 1} happens infinitely often, since the tail unions for all n resolve to that set. To break this process down explicitly, look at the contents of the $D_n$ sets and how they evolve:
$$D_1 = \bigcup_{k=1}^{\infty} B_k = \{50, 20, 35, -15, 0, 1\}$$
Now we compute the Lim Inf of this sequence. Note that the intersections of the tails are simply the null set:
$$\liminf B_n = \bigcup_{n=1}^{\infty} \underbrace{\bigcap_{k=n}^{\infty} B_k}_{E_n} = E_1 \cup E_2 \cup E_3 \cup \dots = \varphi \cup \varphi \cup \dots = \varphi \qquad (5.41)$$
$$E_1 = B_1 \cap B_2 \cap B_3 \cap \dots = \varphi \qquad (5.42)$$
We now see that the limiting supremum and infimum are not at all affected by the transient, finitely occurring events in the sequence. Hence, by example, it is shown that transients do not affect the Lim Sup and Lim Inf, and that the limiting behaviour of the sequence is determined by its tail.
$x \in \limsup X_n$ iff there exists a subsequence $\{X_{n_k}\}$ of $\{X_n\}$ such that $x \in X_{n_k}$ for all k.
A similar definition can be stated for the Lim Inf: the Lim Inf of $X_n$ is the set consisting of the elements of X that belong to $X_n$ for all but finitely many n. We can state this more precisely as follows:
$x \in \liminf X_n$ iff there exists some $m > 0$ such that $x \in X_n$ for all $n > m$.
Define $I_n = \bigcap_{m=n}^{\infty} X_m$. The sequence $\{I_n\}$ is in fact an increasing sequence, with $I_n \subset I_{n+1}$. Why is this increasing? Because at each step of forming $I_n$ we take fewer successive intersections among the $X_m$, and fewer intersections naturally correspond to bigger sets; hence the (n+1)th set is at least as big as the nth. The least upper bound of this sequence of infima $I_n$ is the Lim Inf, given by:
$$\liminf_{n\to\infty} X_n = \sup_n \{\inf\{X_m \mid m \ge n\}\} = \bigcup_{n=1}^{\infty} \left[\bigcap_{m=n}^{\infty} X_m\right] \qquad (5.44)$$
6.1 Recap
We have discussed what random/stochastic processes are: a stochastic process describes some quantity changing randomly over time. In its simplest form, it involves a variable changing at a random rate through time. There are various types of stochastic processes, mainly classified into discrete-time stochastic processes and continuous-time stochastic processes.
6.2 Introduction
Chapter 6. Lecture 12: 2019DMB09 - Sri Rajitha 80
$$R_{xx}(\tau) = FT^{-1}[S_{xx}(\omega)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{xx}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (6.4)$$
where $R_{xx}(\tau)$ is the autocorrelation function, $S_{xx}(\omega)$ is the spectral density, $FT[R_{xx}(\tau)]$ is the Fourier transform, and $FT^{-1}[S_{xx}(\omega)]$ is the inverse Fourier transform.
$$E(x^2(t)) = R_{xx}(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{xx}(\omega)\, d\omega \qquad (6.5)$$
which says that the mean square value $E(x^2(t))$ of the stochastic process X(t) is the average power of X(t).
Since $R_{xx}(\tau)$ is an even function of τ (replacing τ by −τ leaves it unchanged), the power spectral density is real:
$$S_{xx}(\omega) = S^*_{xx}(\omega) \qquad (6.9)$$
where $S^*_{xx}(\omega)$ is the complex conjugate of $S_{xx}(\omega)$.
Recall: if $a = x + jy$ then $a^* = x - jy$, and $a = a^*$ implies $x + jy = x - jy$, i.e. $2y = 0$, so $y = 0$ and $a = x$ is real valued.
6) If $\int_{-\infty}^{+\infty} R_{xx}(\tau)\, d\tau < \infty$, then $S_{xx}(\omega)$ is a continuous function of ω, where
$$S_{xx}(\omega) = \int_{-\infty}^{+\infty} R_{xx}(\tau)\, e^{-j\omega\tau}\, d\tau = FT[R_{xx}(\tau)] \qquad (6.10)$$
Note: since the power spectral density $S_{xx}(\omega)$ must be an even, non-negative, real function, not every candidate $R_{xx}(\tau)$ can be the autocorrelation function of a WSSP X(t). For example, $e^{-\alpha\tau}$, $\tau e^{-\alpha\tau}$ or $\sin(\omega_0\tau)$ cannot be autocorrelation functions of a WSSP, since the Fourier transform of each of these functions is complex.
Definition: for two stochastic processes X(t) and Y(t) that are jointly WSSP, the cross power spectral density is $S_{xy}(\omega) = FT[R_{xy}(\tau)]$:
$$S_{xy}(\omega) = FT[R_{xy}(\tau)] = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j\omega\tau}\, d\tau \qquad (6.11)$$
$S_{xy}(\omega)$ is in general a complex function even when X(t) and Y(t) are real stochastic processes.
Figure 12.1: the time instants $(t - \tau)$, $t$, $(t + \tau)$.
$$S_{yx}(\omega) = S_{xy}(-\omega) = S^*_{xy}(\omega) \qquad (6.13)$$
To verify this, start from
$$S_{xy}(\omega) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{-j\omega\tau}\, d\tau = FT[R_{xy}(\tau)]$$
Taking conjugates (the correlations are real) and using $R_{xy}(\tau) = R_{yx}(-\tau)$,
$$S^*_{xy}(\omega) = \int_{-\infty}^{\infty} R_{xy}(\tau)\, e^{j\omega\tau}\, d\tau = \int_{-\infty}^{\infty} R_{yx}(-\tau)\, e^{j\omega\tau}\, d\tau$$
Substituting $\tau_1 = -\tau$ (so $d\tau_1 = -d\tau$),
$$S^*_{xy}(\omega) = \int_{-\infty}^{\infty} R_{yx}(\tau_1)\, e^{-j\omega\tau_1}\, d\tau_1 = FT[R_{yx}(\tau)] = S_{yx}(\omega)$$
$$S^*_{xy}(\omega) = S_{yx}(\omega) \qquad (6.14)$$
Figure 12.2: band-limited power spectral density $S_{XX}(\omega) = S_0$ for $-\omega_0 \le \omega \le +\omega_0$ and zero elsewhere.
Adding Euler's identities $e^{jx} = \cos x + j\sin x$ (6.15) and $e^{-jx} = \cos x - j\sin x$ (6.16) gives
$$\cos x = \frac{e^{jx} + e^{-jx}}{2} \qquad (6.17)$$
Subtracting equation (6.16) from (6.15) gives
$$\sin x = \frac{e^{jx} - e^{-jx}}{2j} \qquad (6.18)$$
For the band-limited spectrum of Figure 12.2,
$$R_{xx}(\tau) = FT^{-1}[S_{xx}(\omega)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{xx}(\omega)\, e^{j\omega\tau}\, d\omega = \frac{S_0}{2\pi}\int_{-\omega_0}^{\omega_0} e^{j\omega\tau}\, d\omega \qquad (6.19)$$
$$\Rightarrow R_{xx}(\tau) = \frac{S_0}{2\pi j\tau}\Big[e^{j\omega\tau}\Big]_{-\omega_0}^{\omega_0} = \frac{S_0}{2\pi j\tau}\Big[e^{j\omega_0\tau} - e^{-j\omega_0\tau}\Big] = \frac{S_0}{\pi\tau}\cdot\frac{e^{j\omega_0\tau} - e^{-j\omega_0\tau}}{2j} \qquad (6.20)$$
$$\Rightarrow R_{xx}(\tau) = \frac{S_0}{\pi\tau}\sin(\omega_0\tau)$$
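The closed form $R_{xx}(\tau) = S_0 \sin(\omega_0\tau)/(\pi\tau)$ can be verified against a direct numerical evaluation of the inverse-transform integral. A sketch using midpoint-rule integration ($S_0 = 1$ and $\omega_0 = 2$ are arbitrary test values; the imaginary part of the integrand cancels by symmetry, leaving the cosine integral):

```python
import math

def rxx_numeric(tau, s0=1.0, w0=2.0, steps=20000):
    """Midpoint-rule evaluation of (1/2pi) * integral of S0 e^{j w tau}
    over [-w0, w0]; only the cosine part survives by symmetry."""
    dw = 2 * w0 / steps
    total = 0.0
    for i in range(steps):
        w = -w0 + (i + 0.5) * dw
        total += math.cos(w * tau) * dw
    return s0 * total / (2 * math.pi)

def rxx_closed(tau, s0=1.0, w0=2.0):
    """Closed form R_xx(tau) = S0 sin(w0 tau) / (pi tau)."""
    if tau == 0:
        return s0 * w0 / math.pi  # limiting value at tau = 0
    return s0 * math.sin(w0 * tau) / (math.pi * tau)

for tau in (0.0, 0.5, 1.0, 3.0):
    print(tau, round(rxx_numeric(tau), 6), round(rxx_closed(tau), 6))
```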
White noise is a random signal having equal intensity at different frequencies, giv-
ing it a constant power spectral density. Even a binary signal which can only take
on the values 1 or 0 will be white if the sequence is statistically uncorrelated. Noise
having a continuous distribution, such as a normal distribution, can of course be
white.
In statistics and econometrics one often assumes that an observed series of data
values is the sum of a series of values generated by a deterministic linear pro-
cess, depending on certain independent /explanatory variables, and on a series
of random noise values. If there is non-zero correlation between the noise values underlying different observations, then the estimated model parameters are still unbiased, but estimates of their uncertainties, such as confidence intervals, will be biased.
In Time series analysis there are often no explanatory variables other than the past
values of the variable being modeled i.e. the dependent variable. In this case the
noise process is often modeled as a moving average process, in which the current
value of the dependent variable depends on current and past values of a sequen-
tial white noise process.
Here N(t) denotes white noise.
Definition: white noise is a random function N(t) whose power spectral density $S_{nn}(\omega)$ is constant for all frequencies ω:
$$S_{nn}(\omega) = \frac{N_0}{2} \ \text{ is constant } \forall\, \omega \qquad (6.21)$$
where $N_0$ is a real positive constant.
The autocorrelation of white noise is
$$R_{nn}(\tau) = FT^{-1}[S_{nn}(\omega)] = FT^{-1}\!\left(\frac{N_0}{2}\right) = \frac{N_0}{2}\, FT^{-1}[1] = \frac{N_0}{2}\,\delta(\tau)$$
since
$$FT[\delta(\tau)] = \int_{-\infty}^{\infty} \delta(\tau)\, e^{-j\omega\tau}\, d\tau = e^{-j\omega(0)} = e^0 = 1 \quad\Longleftrightarrow\quad FT^{-1}[1] = \delta(\tau)$$
where
$$\delta(\tau) = \begin{cases} \infty, & \tau = 0 \\ 0, & \tau \ne 0 \end{cases} \qquad (6.22)$$
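The delta-shaped autocorrelation of white noise has a simple discrete analogue: for a long i.i.d. Gaussian sequence, the sample autocorrelation is approximately σ² at lag 0 and approximately 0 at every other lag. A minimal sketch (the sequence length, σ, and helper name are illustrative):

```python
import random

def sample_autocorr(x, lag):
    """Biased sample autocorrelation of a zero-mean sequence at a lag."""
    n = len(x)
    return sum(x[i] * x[i + lag] for i in range(n - lag)) / n

rng = random.Random(9)
sigma = 1.0
noise = [rng.gauss(0, sigma) for _ in range(100000)]

# R[0] ~ sigma^2 and R[m] ~ 0 for m != 0, the discrete analogue
# of the (N0/2) * delta(tau) impulse:
print(round(sample_autocorr(noise, 0), 3),
      round(sample_autocorr(noise, 1), 3),
      round(sample_autocorr(noise, 5), 3))
```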
Figure 12.3: the power spectral density $S_{NN}(\omega) = \frac{N_0}{2}$ (flat in ω) and the autocorrelation function $R_{NN}(\tau) = \frac{N_0}{2}\,\delta(\tau)$ (an impulse at τ = 0), related by the Fourier transform pair $FT$ / $FT^{-1}$.
Example: Let Y(t) = X(t) + N(t) be a weakly stationary process, where X(t) is the actual signal and N(t) is a zero-mean noise process with variance σ_N² (µ_N = 0), uncorrelated with X(t). Find the power spectral density of Y(t), i.e. S_{yy}(ω).

Sol:

$$S_{yy}(\omega) = FT\big[R_{yy}(\tau)\big] = \int_{-\infty}^{\infty} R_{yy}(\tau)\, e^{-j\omega\tau}\, d\tau$$

where R_{yy}(τ) = E[Y(t)Y(t+τ)] and Y(t) = X(t) + N(t), with

$$\mu_N = 0 \quad\text{and}\quad Var\big(N(t)\big) = \sigma_N^2 \tag{6.23}$$

$$R_{yy}(\tau) = E\big[(X(t)+N(t))\,(X(t+\tau)+N(t+\tau))\big]$$

$$= E\big[X(t)X(t+\tau)\big] + E\big[X(t)N(t+\tau)\big] + E\big[N(t)X(t+\tau)\big] + E\big[N(t)N(t+\tau)\big]$$

Since X(t) and N(t) are uncorrelated and N(t) has zero mean, the cross terms vanish:

$$R_{yy}(\tau) = R_{xx}(\tau) + R_{nn}(\tau) = R_{xx}(\tau) + \sigma_N^2\,\delta(\tau)$$

Recall that R_{nn}(0) = E[N²(t)], and

$$Var\big(N(t)\big) = E\big[N^2(t)\big] - \big(E[N(t)]\big)^2 = E\big[N^2(t)\big] - \mu_N^2 \;\Rightarrow\; \sigma_N^2 = E\big[N^2(t)\big] = R_{nn}(0)$$

$$\Rightarrow R_{nn}(\tau) = \sigma_N^2\,\delta(\tau), \qquad \delta(\tau) = \begin{cases} \infty & \text{if } \tau = 0 \\ 0 & \text{if } \tau \neq 0 \end{cases}$$

Taking Fourier transforms and using FT[δ(τ)] = 1:

$$S_{yy}(\omega) = FT\big[R_{xx}(\tau)\big] + \sigma_N^2\, FT\big[\delta(\tau)\big] = S_{xx}(\omega) + \sigma_N^2$$
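The result S_yy(ω) = S_xx(ω) + σ_N² is equivalent to saying the noise adds σ_N² only at lag zero of the autocorrelation. A minimal numerical sketch, assuming an AR(1) signal with φ = 0.8 (unit variance) and noise standard deviation 0.5:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi, sigma_n = 200_000, 0.8, 0.5

# WSS signal X with unit variance: an AR(1) driven so that Rxx(m) = phi**|m|
w = rng.normal(0.0, np.sqrt(1 - phi**2), n)
x = np.empty(n)
x[0] = w[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

y = x + rng.normal(0.0, sigma_n, n)     # observed: signal plus zero-mean noise

def acov(z, lag):
    z = z - z.mean()
    m = len(z)
    return float(np.dot(z[:m - lag], z[lag:]) / m)

print(acov(y, 0))   # ≈ Rxx(0) + sigma_n**2 = 1 + 0.25 = 1.25
print(acov(y, 1))   # ≈ Rxx(1) = 0.8 (the delta only affects lag 0)
```

Only the zero-lag term is inflated, which in the frequency domain is exactly the flat offset σ_N² added to the whole spectrum.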
So far we have discussed continuous time stochastic processes (CTSP). Let us look at discrete time stochastic processes in the next section.
When interpreted as time, if the index set of a stochastic process has a finite or
countable number of elements, such as a finite set of numbers, the set of integers,
or the natural numbers, then the stochastic process is said to be in discrete time.
A DTSP X(n) = X[n] is obtained by sampling a continuous-time stochastic process. If the sampling interval is T_s, then X(n) = X(nT_s) for n = 0, 1, 2, ....

Figure 12.4: The sampling instants t = 0, T_s, 2T_s, ... correspond to the discrete indices n = 0, 1, 2, ....
$$C_{xx}(n_1, n_2) = E\big\{[X(n_1) - \mu_x(n_1)]\,[X(n_2) - \mu_x(n_2)]\big\}$$
Definition: A Discrete Time Stochastic Process (DTSP) is called white noise if the random variables X(n_k) are uncorrelated.

Note: If the white noise is a Gaussian WSSP, then X(n) consists of a sequence of IID RVs with variance σ², so that R_{xx}(m) = σ²δ(m), where

$$\delta(m) = \begin{cases} 1 & \text{if } m = 0 \\ 0 & \text{if } m \neq 0 \end{cases} \tag{6.24}$$
$$S_{xx}(\Omega) = DFT\big[R_{xx}(m)\big]$$

where R_{xx}(m) is the discrete autocorrelation function of X(n). Since e^{-j(Ω+2π)n} = e^{-jΩn}, the spectrum S_{xx}(Ω) is periodic with period 2π, so it suffices to consider Ω ∈ [−π, π].

Figure 12.5: S_{xx}(Ω) plotted over one period, Ω ∈ [−π, π].
The autocorrelation function of X(n) is recovered from the inverse transform:

$$R_{xx}(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(\Omega)\, e^{j\Omega m}\, d\Omega$$
$$S_{xx}(\Omega) = \sum_{m=-\infty}^{+\infty} R_{xx}(m)\, e^{-j\Omega m} \tag{6.27}$$

$$S_{xx}(\Omega) = \sum_{m=-\infty}^{+\infty} R_{xx}(m)\cos(\Omega m) \;-\; j\sum_{m=-\infty}^{+\infty} R_{xx}(m)\sin(\Omega m) \tag{6.28}$$
If R_{xx}(m) is even in m, then since cos(Ωm) is even and sin(Ωm) is odd in m, the imaginary sum in (6.28) vanishes.
⇒ S_{xx}(Ω) is real and an even function of Ω.
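This realness and evenness can be checked directly on an assumed even autocorrelation sequence R_xx(m) = 0.6^|m| (truncated for the sketch):

```python
import numpy as np

m = np.arange(-60, 61)
rxx = 0.6 ** np.abs(m)                 # even sequence: rxx(-m) = rxx(m)

omega = np.linspace(-np.pi, np.pi, 241)
# S(Ω) = Σ_m R(m) e^{-jΩm}, truncated to |m| <= 60
S = (rxx[None, :] * np.exp(-1j * np.outer(omega, m))).sum(axis=1)

print(np.max(np.abs(S.imag)) < 1e-10)  # True: the sin terms cancel in pairs
print(np.allclose(S, S[::-1]))         # True: even in Ω (symmetric grid)
print(S.real.min() > 0)                # True: non-negative for this example
```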
Example: Assume X(n) is a real SP, so that R_{xx}(−m) = R_{xx}(m). Find the power spectral density of X(n), i.e. S_{xx}(Ω).

Sol:
$$S_{xx}(\Omega) = \sum_{m=-\infty}^{+\infty} R_{xx}(m)\, e^{-j\Omega m} = \sum_{m=-\infty}^{-1} R_{xx}(m)\, e^{-j\Omega m} + \sum_{m=0}^{+\infty} R_{xx}(m)\, e^{-j\Omega m}$$

Introducing a dummy index k = −m in the first sum:

$$S_{xx}(\Omega) = \sum_{k=1}^{\infty} R_{xx}(-k)\, e^{j\Omega k} + R_{xx}(0) + \sum_{k=1}^{\infty} R_{xx}(k)\, e^{-j\Omega k}$$

Using R_{xx}(−k) = R_{xx}(k):

$$S_{xx}(\Omega) = R_{xx}(0) + \sum_{k=1}^{\infty} R_{xx}(k)\big[e^{j\Omega k} + e^{-j\Omega k}\big] = R_{xx}(0) + 2\sum_{k=1}^{\infty} R_{xx}(k)\cos(\Omega k)$$
$$R_{xx}(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(\Omega)\, e^{j\Omega m}\, d\Omega$$

$$R_{xx}(-m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(\Omega)\, e^{-j\Omega m}\, d\Omega$$

Let α = −Ω; the substitution flips the integration limits, and the sign of dα flips them back:

$$R_{xx}(-m) = \frac{1}{2\pi}\int_{\pi}^{-\pi} S_{xx}(-\alpha)\, e^{j\alpha m}\, (-d\alpha) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(-\alpha)\, e^{j\alpha m}\, d\alpha$$
A stochastic process with property that almost all sample paths are continuous is
called a continuous process. If the index set is some interval of the real line, then
time is said to be continuous. An example of a continuous-time stochastic process
for which sample paths are not continuous is a Poisson process.
A Discrete Time Stochastic Process (DTSP) is obtained by sampling a CTSP X(t). If X(t) is sampled at constant intervals of T_s time units (T_s is the sampling period), then the samples define the DTSP X(n) = X(nT_s).
Chapter 6. Lecture 12: 2019DMB09 - Sri Rajitha 90
− -2 -1 0 +1 2 − −
− -2 TS -1 TS 0 TS 1 TS 2 TS − −
TS TS TS TS
Figure 12.6
If the CTSP X(t) has mean µ_x(t) and autocorrelation R_{xx}(τ), then for the DTSP X(n) the mean and autocorrelation are obtained by evaluating these at the sampling instants.

Note: If X(t) is a WSSP in continuous time, then X(n) is also a WSSP in discrete time, with µ_x(n) = µ_x = constant and R_{xx}(m) = R_{xx}(mT_s).
Figure 12.7: A sample path X(ω₁, t) of the CTSP X(t) in continuous time is sampled to give the DTSP X(n) = Xₙ, n = 0, 1, 2, ..., in discrete time.
CTSP {X(t), t ∈ T}: the joint CDF of X(t₁) and X(t₂) is the same as the joint CDF of X(t₁+Δ) and X(t₂+Δ), i.e. a time shift of Δ does not change its stationarity properties.

For a DTSP: the joint CDF of X(n₁), X(n₂), ..., X(n_k) is the same as the joint CDF of X(n₁+Δ), X(n₂+Δ), ..., X(n_k+Δ), for all real numbers x₁, x₂, ..., x_k at which the CDFs are evaluated.
1) The mean function does not change under shifts in time and is independent of time: E[X(t₁)] = E[X(t₂)], i.e. µ_x(t₁) = µ_x(t₂) = constant.

2) The autocorrelation function does not change under shifts in time and is independent of time: E[X(t₁)X(t₂)] = E[X(t₁+τ)X(t₂+τ)].

Equivalently, for a WSSP: µ_x(t) = µ_x ∀ t ∈ R in continuous time, and µ_x(n) = µ_x ∀ n ∈ Z in discrete time.
The average power is the autocorrelation at zero lag:

$$R_{xx}(0) = E\big[X^2(t)\big]$$

The autocorrelation is an even function:

$$R_{xx}(-\tau) = E\big[X(t)\,X(t-\tau)\big] = E\big[X(t-\tau)\,X(t)\big] = R_{xx}(\tau)$$

Applying the Cauchy–Schwarz inequality to X = X(t) and Y = X(t − τ):

$$\big|E[X(t)X(t-\tau)]\big| \le \sqrt{E[X^2(t)]\,E[X^2(t-\tau)]} \tag{6.31}$$

$$|R_{xx}(\tau)| \le \sqrt{R_{xx}(0)\,R_{xx}(0)} = R_{xx}(0) \tag{6.32}$$
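The bound |R_xx(τ)| ≤ R_xx(0) also holds for the biased sample autocovariance (division by n at every lag, which keeps the estimator positive semidefinite); the smoothed-noise series below is an arbitrary assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
# An arbitrary WSS-like series: white noise passed through a moving average
x = np.convolve(rng.normal(size=100_000), np.ones(20) / 20, mode="valid")
x = x - x.mean()
n = len(x)

def acov(lag):
    # biased estimator (divide by n at every lag): |acov(k)| <= acov(0) holds
    return float(np.dot(x[:n - lag], x[lag:]) / n)

r0 = acov(0)
print(all(abs(acov(k)) <= r0 for k in range(1, 300)))   # True
```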
Figure 12.8: R_{XX}(τ) attains its maximum value at τ = 0.
A signal that is just a function of time, and not a sample path of a stochastic process, can exhibit cyclostationary properties in the framework of the fraction-of-time point of view. If the signal is further ergodic, all sample paths exhibit the same time-average.

A cyclostationary process has a periodic structure: its statistical properties are repeated every T₀ units of time. That is, if the random variables X(t₁), X(t₂), ..., X(tₙ) have the same joint CDF as the RVs X(t₁ + T₀), X(t₂ + T₀), ..., X(tₙ + T₀), then the process is cyclostationary.
1) µ_x(n + M) = µ_x(n) ∀ n ∈ Z (the mean is periodic with period M).
Note: Mean-square continuity does not mean that every possible realization of X(t) is a continuous function.
$$S_{nn}(\omega) = \frac{N_0}{2} \quad \forall\, \omega$$

$$R_{nn}(\tau) = FT^{-1}\big[S_{nn}(\omega)\big] = FT^{-1}\Big[\frac{N_0}{2}\Big] = \frac{N_0}{2}\,\delta(\tau), \qquad \delta(\tau) = \begin{cases} \infty & \text{if } \tau = 0 \\ 0 & \text{if } \tau \neq 0 \end{cases}$$
⇒ White Gaussian noise values GN(t₁) and GN(t₂) are independent for any t₁ ≠ t₂.
The Gaussian random variable is clearly the most commonly used and the most important. For continuous variables, possible values are distributed on a continuous scale, and the probability density function assigns to every possible value a probability density, which we can think of as the probability of finding the variable near that value. The Gaussian is a theoretical frequency distribution for a random variable, characterized by a bell-shaped curve symmetric about its mean.
$$\vec{a} = [a_1, a_2, \ldots, a_n]^T \in \mathbb{R}^n$$
Jointly Gaussian random variables can be characterized by the property that every
scalar linear combination of such variables is Gaussian. An important property of
jointly normal random variables is that their joint PDF is completely determined
by their mean and covariance matrices.
$$\vec{m} = E[\vec{X}\,], \qquad \vec{X} = [X_1, X_2, \ldots, X_n]^T$$

The covariance matrix is

$$C = E\big[(\vec{X} - \vec{m})(\vec{X} - \vec{m})^T\big], \qquad |C| = \det(C)$$
Note: If two jointly normal random processes X(t) and Y(t) are uncorrelated, that is C_{xy}(t₁, t₂) = 0 ∀ t₁, t₂, then X(t) and Y(t) are two independent SPs.

Note: For a Gaussian SP, weak stationarity (WSS) and strong stationarity (SSS) are equivalent.

Theorem: For a Gaussian SP {X(t), t ∈ T}, if X(t) is WSSP then X(t) is SSSP.
Recall that X(t) is a Gaussian SP if, for all t₁, t₂, ..., tₙ ∈ R, the RVs X(t₁), X(t₂), ..., X(tₙ) are jointly normal.
Proof:
We need to show that for all t₁, t₂, ..., t_k ∈ R, the variables X(t₁), X(t₂), ..., X(t_k) have the same joint CDF as the RVs X(t₁ + τ), X(t₂ + τ), ..., X(t_k + τ).

Since these RVs are jointly Gaussian, it suffices to show that the mean vectors and covariance matrices are the same.

⇒ By weak stationarity, the mean vector and covariance matrix of X(t₁), X(t₂), ..., X(t_k) are the same as the mean vector and covariance matrix of X(t₁ + τ), X(t₂ + τ), ..., X(t_k + τ), which completes the proof.
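The proof reduces to checking shift-invariance of the mean vector and covariance matrix. For an assumed zero-mean WSS autocorrelation R_xx(τ) = e^(−|τ|), this is immediate to verify numerically:

```python
import numpy as np

rxx = lambda tau: np.exp(-np.abs(tau))   # assumed WSS autocorrelation, zero mean

def cov_matrix(times):
    t = np.asarray(times, dtype=float)
    # entry (i, j) depends only on the time difference t_i - t_j
    return rxx(t[:, None] - t[None, :])

t = [0.0, 0.7, 2.1, 3.4]
tau = 5.0
C = cov_matrix(t)
C_shifted = cov_matrix([s + tau for s in t])

# Same mean (zero) and same covariance matrix => same Gaussian joint CDF
print(np.allclose(C, C_shifted))   # True
```

Because a Gaussian joint distribution is fully determined by its mean vector and covariance matrix, this equality is the whole content of the theorem.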
6.10 Summary:

The power spectral density S_{xx}(ω) of a WSSP is a non-negative, even, real and continuous function of ω; a function R_{xx}(τ) whose Fourier transform fails these properties cannot be the autocorrelation function of a WSSP X(t). White noise is a random process N(t) whose power spectral density is constant for all frequencies: S_{nn}(ω) = N₀/2 ∀ ω. A Discrete Time Stochastic Process (DTSP) is called white noise if the random variables X(n_k) are uncorrelated. The discrete spectrum S_{xx}(Ω) is periodic with period 2π and is a real and even function of Ω. A DTSP is obtained by sampling a CTSP X(t): if X(t) is sampled at constant intervals of T_s time units (T_s is the sampling period), the samples define X(n) = X(nT_s).

If X(t) is a WSSP in continuous time, then X(n) is also a WSSP in discrete time, with µ_x(n) = µ_x = constant and R_{xx}(m) = R_{xx}(mT_s).

For weak stationarity, the mean function does not change under shifts in time and is independent of time: E[X(t₁)] = E[X(t₂)], i.e. µ_x(t₁) = µ_x(t₂) = constant; and the autocorrelation function does not change under shifts in time: E[X(t₁)X(t₂)] = E[X(t₁+τ)X(t₂+τ)].

R_{xx}(τ) takes its maximum value at τ = 0; that is, X(t+τ) and X(t) have the highest correlation at τ = 0. A cyclostationary process has a periodic structure: its statistical properties are repeated every T₀ units of time, i.e. the random variables X(t₁), X(t₂), ..., X(tₙ) have the same joint CDF as X(t₁+T₀), X(t₂+T₀), ..., X(tₙ+T₀). For a Gaussian SP, weak stationarity and strong stationarity are equivalent.
Chapter 7
The ARMA approach, developed by Box and Jenkins (1970) within time series analysis, does not consider the role of explanatory variables suggested by economic or financial theory; instead it describes the time series by extrapolation, based on the dynamics of the series itself. A precondition for developing such a time series model is that the time series is stationary.
This model is among the high-resolution parametric spectral analysis methods; it is used to study the rational spectra of stationary stochastic processes and is suited to a large class of practical problems. ARMA gives better and more accurate spectral estimation and resolution than the AR or MA model alone, but its parameter estimation is more cumbersome.
Chapter 7. Lecture 14: 2019DMB04 - Karnam Yogesh 99
7.1.1 Introduction
$$x_t = \phi x_{t-1} + s_t \tag{7.2}$$

where s_t ∼ WN(0, σ²), and we keep this assumption throughout this lecture. Similarly, AR(p) (autoregressive of order p) can be written as:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + s_t$$
The term white noise was originally an engineering term, and there are subtle but important differences in the way it is defined in various econometric texts. Here we define white noise as a series of uncorrelated random variables with zero mean and uniform variance. If it is necessary to make the stronger assumptions of independence or normality, this will be made clear in the context, and we will refer to independent white noise or normal (Gaussian) white noise. Be careful of the varying definitions of terms like weak, strong and strict white noise.
Chapter 7. Lecture 14: 2019DMB04 - Karnam Yogesh 100
Lag operators enable us to present an ARMA model in a much more concise way. Applying the lag operator (denoted L) once, we move the index back one time unit; applying it k times, we move the index back k units.
$$L x_t = x_{t-1}, \qquad L^2 x_t = x_{t-2}, \qquad \ldots, \qquad L^k x_t = x_{t-k}$$
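The lag operator can be sketched as an array shift; padding undefined initial entries with NaN is a convention chosen here, not part of the definition:

```python
import numpy as np

def lag(x, k=1):
    """Apply L^k: (L^k x)_t = x_{t-k}; entries with no predecessor become NaN."""
    out = np.full(len(x), np.nan)
    out[k:] = x[:-k]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(lag(x, 1))   # [nan 1. 2. 3. 4.]
print(lag(x, 2))   # [nan nan 1. 2. 3.]

# AR(1) in lag notation: (1 - φL) x_t recovers the innovation s_t
phi = 0.5
resid = x - phi * lag(x, 1)   # first entry undefined (NaN)
print(resid)                  # [nan 1.5 2. 2.5 3.]
```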
AR(1): (1 − ϕL) x_t = s_t
AR(p): (1 − ϕ₁L − ϕ₂L² − ... − ϕ_p L^p) x_t = s_t
MA(1): x_t = (1 + θL) s_t
MA(q): x_t = (1 + θ₁L + θ₂L² + ... + θ_q L^q) s_t

ϕ(L) = 1 − ϕ₁L − ϕ₂L² − ... − ϕ_p L^p (7.8)
θ(L) = 1 + θ₁L + θ₂L² + ... + θ_q L^q (7.9)
With lag polynomials, we can rewrite an ARMA process in a more compact way:
AR : ϕ (L)xt = st
MA : xt = θ (L)st
7.4 Invertibility
Given a time series probability model, usually we can find multiple ways to represent it. Which representation to choose depends on our problem. For example, to study impulse-response functions, MA representations may be more convenient; while to estimate an ARMA model, AR representations may be more convenient, since usually x_t is observable while s_t is not. However, not all ARMA processes can be inverted. In this section, we consider under what conditions we can invert an AR model to an MA model and an MA model to an AR model.
It turns out that invertibility, meaning that the process can be inverted, is an important property of the model. If we let 1 denote the identity operator, i.e. 1·y_t = y_t, then the inverse operator (1 − ϕL)⁻¹ is defined to be the operator such that

$$(1 - \phi L)^{-1}(1 - \phi L) = 1$$
7.5.1 MA(1)
$$x_t = s_t + \theta s_{t-1}$$

with E(x_t) = 0 and E(x_t²) = (1 + θ²)σ².

So for an MA(1) process we have a fixed mean and a covariance function that does not depend on time t, hence MA(1) is stationary. The autocorrelation can be computed as ρ_x(h) = γ_x(h)/γ_x(0), so ρ_x(1) = θ/(1 + θ²) and ρ_x(h) = 0 for |h| > 1.
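These MA(1) moments can be checked by simulation; θ = 0.6 and σ = 1 are assumed values for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = 0.6, 1.0
s = rng.normal(0.0, sigma, 400_000)
x = s[1:] + theta * s[:-1]                     # MA(1): x_t = s_t + θ s_{t-1}

var_x = x.var()                                # theory: (1 + θ²)σ² = 1.36
gamma1 = np.dot(x[:-1] - x.mean(), x[1:] - x.mean()) / len(x)
rho1 = gamma1 / var_x                          # theory: θ/(1 + θ²) ≈ 0.441
print(var_x, rho1)
```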
7.5.2 MA(q)
Time series models known as ARIMA models may include autoregressive terms and/or moving average terms. An autoregressive term in a time series model for a variable x_t is a lagged value of x_t; for instance, a lag-1 autoregressive term is x_{t−1} (multiplied by a coefficient). This section defines moving average terms. For an MA(q) process,

$$x_t = \theta(L)s_t = \sum_{k=0}^{q} \theta_k L^k s_t \quad (\theta_0 = 1)$$

$$E(x_t) = 0, \qquad E(x_t^2) = \sigma^2 \sum_{k=0}^{q} \theta_k^2$$
7.6 AR(1)
7.6.2 AR(P)
First, for a series xt , we can model that the level of its current observations depends
on the level of its lagged observations. For example, if we observe a high GDP
realization this quarter, we would expect that the GDP in the next few quarters
are good as well. This way of thinking can be represented by an AR model. The
AR(1) (autoregressive of order one) can be written as:
xt = ϕ xt− 1 + st . . . (7.12)
where s_t ∼ WN(0, σ²), and we keep this assumption throughout this lecture. Similarly, AR(p) (autoregressive of order p) can be written as:

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + s_t$$

and MA(q) as:

$$x_t = s_t + \theta_1 s_{t-1} + \cdots + \theta_q s_{t-q} \tag{7.15}$$
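For |φ| < 1 the AR(1) is stationary with variance σ²/(1 − φ²), a standard fact that a short simulation confirms (φ = 0.7 and σ = 1 are assumed values):

```python
import numpy as np

rng = np.random.default_rng(4)
phi, sigma, n = 0.7, 1.0, 300_000
s = rng.normal(0.0, sigma, n)

x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + s[t]       # x_t = φ x_{t-1} + s_t

burn = x[1_000:]                        # discard the transient from x[0] = 0
print(burn.var())                       # theory: σ²/(1-φ²) ≈ 1.9608
print(burn.mean())                      # theory: 0
```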
A.1 Introduction
To provide some context for our discussion about Fourier transforms and series,
let us imagine a scenario. Since Fourier analysis is concerned with signals and
waves, we imagine a musician playing a steady note on a trumpet. Further there
is a microphone in front of the trumpet that captures the sound produced. The mic typically has a diaphragm which experiences pressure from the sound waves of the trumpet, and this pressure translates into a voltage proportional to the instantaneous air pressure. Now if we measure this with an oscilloscope we will get a graph of pressure against time, F(t), which turns out to be periodic. Note that it is the reciprocal of the period which is termed the frequency of the note being played on the trumpet. The typical relationship between frequency and time period is:

$$\nu = \frac{1}{T} \tag{A.1}$$
Let us say that the fundamental frequency of this one note sound is 256Hz. Now
in reality one sine wave of the said frequency is not produced, rather multiple
overtones are produced which are multiples of the fundamental frequency with
various amplitudes and phases. Phase basically determines where in the cycle
the signal would start repeating. Technically, we can analyse the wave by finding
a list of the amplitudes and phases of the various sine waves that comprise the
complex signal. We can plot a graph of amplitudes against frequency denoted by
A(ν). Now since we are effectively bringing the function from the time domain
to the frequency domain we say that A(ν) is the Fourier transform of F (t).
Appendix A. Appendix A: Fourier Transforms 106
Continuing with our previous example, we can say that a steady note sound signal
can be described completely by the fundamental frequency, its amplitude and the
amplitudes of its overtones or harmonics. For this we can use a discrete sum:
Here ν₀ represents the fundamental frequency, and the various sine and cosine functions in the series capture the components of the signal that are not in step with the fundamental. The series is:

$$F(t) = \sum_{n=-\infty}^{\infty} \big[a_n \cos(2\pi n\nu_0 t) + b_n \sin(2\pi n\nu_0 t)\big] \tag{A.3}$$
Note that this process of constructing a waveform by adding together the funda-
mental frequency and its overtones of various amplitudes is called Fourier syn-
thesis. Given that cos(−x) = cos(x) and sin(x) = − sin(−x) we can rewrite the
above expression as:
$$F(t) = \frac{A_0}{2} + \sum_{n=1}^{\infty} \big[A_n \cos(2\pi n\nu_0 t) + B_n \sin(2\pi n\nu_0 t)\big] \tag{A.4}$$
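Fourier synthesis as in equation (A.4) can be sketched by summing the fundamental and a few overtones; the amplitudes below are arbitrary assumptions:

```python
import numpy as np

nu0 = 256.0                                             # fundamental frequency (Hz)
t = np.linspace(0.0, 2.0 / nu0, 2000, endpoint=False)   # two periods, 1000 samples each

# Assumed amplitudes A_n for the fundamental and two overtones (B_n = 0 here)
A = {1: 1.0, 2: 0.5, 3: 0.25}
F = sum(a * np.cos(2 * np.pi * n * nu0 * t) for n, a in A.items())

# The synthesized wave repeats with the fundamental period 1/nu0
print(np.allclose(F[:1000], F[1000:], atol=1e-6))   # True
```

Adding overtones changes the wave's shape but never its fundamental period, which is why the pitch of the trumpet note is set by ν₀ alone.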
A.3 Amplitudes
Now note that the opposite process of extracting the frequencies and amplitudes
from the original signal is called Fourier analysis. We are interested in trying
to find the amplitudes Am and Bm for various instances of m. Now before mov-
ing ahead, we note the utilisation of the orthogonality property of trigonometric
functions - the central idea is that if we take a sine and a cosine, or two sines or two
cosines (as multiples of the fundamental frequency), then take their product and
integrate this product over the period of fundamental frequency, then the result
is zero. Noting that 1 period is denoted as the inverse of frequency: P = 1/ν0 , we
have:

$$\int_{0}^{P} \cos(2\pi n\nu_0 t)\,\cos(2\pi m\nu_0 t)\, dt = 0 \quad (n \neq m) \tag{A.5}$$

$$\int_{0}^{P} \sin(2\pi n\nu_0 t)\,\sin(2\pi m\nu_0 t)\, dt = 0 \quad (n \neq m) \tag{A.6}$$

$$\int_{0}^{P} \sin(2\pi n\nu_0 t)\,\cos(2\pi m\nu_0 t)\, dt = 0 \tag{A.7}$$
Note that in the case m = n, the first two integrals instead evaluate to P/2 = 1/(2ν₀). Using this, we obtain general expressions for the coefficient values:
$$B_m = \frac{2}{P}\int_{0}^{P} F(t)\sin(2\pi m\nu_0 t)\, dt \tag{A.8}$$

$$A_m = \frac{2}{P}\int_{0}^{P} F(t)\cos(2\pi m\nu_0 t)\, dt \tag{A.9}$$
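Formulas (A.8)–(A.9) can be verified numerically: build a signal with known coefficients, then recover them with a Riemann sum over one period (ν₀ = 1 and the chosen coefficients are assumptions of the sketch):

```python
import numpy as np

nu0 = 1.0
P = 1.0 / nu0
t = np.linspace(0.0, P, 20_000, endpoint=False)
dt = P / len(t)

# A signal with known coefficients: A_1 = 2, B_3 = 0.5, all others zero
F = 2.0 * np.cos(2 * np.pi * 1 * nu0 * t) + 0.5 * np.sin(2 * np.pi * 3 * nu0 * t)

def A(m):
    return (2.0 / P) * np.sum(F * np.cos(2 * np.pi * m * nu0 * t)) * dt

def B(m):
    return (2.0 / P) * np.sum(F * np.sin(2 * np.pi * m * nu0 * t)) * dt

print(A(1), B(3))     # ≈ 2.0, 0.5  (coefficients recovered)
print(A(2), B(1))     # ≈ 0.0, 0.0  (orthogonality kills the rest)
```

The sums work because orthogonality holds exactly for equally spaced samples over whole periods, which is the discrete analogue of (A.5)–(A.7).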
An alternate way of writing the Fourier series is shown below. This expression comes about by setting A_m = R_m cos φ_m and B_m = R_m sin φ_m, so that A_m cos(2πmν₀t) + B_m sin(2πmν₀t) = R_m cos(2πmν₀t − φ_m):

$$F(t) = \frac{A_0}{2} + \sum_{m=1}^{\infty} R_m \cos(2\pi m\nu_0 t - \phi_m) \tag{A.10}$$
We can write the Fourier series in the form of complex exponentials instead of trigonometric functions. First, as a reference, we note De Moivre's theorem:
(cos x + i sin x)n = cos(nx) + i sin(nx) = eixn (A.11)
Now we denote the Fourier series using this notation:
$$F(t) = \sum_{m=-\infty}^{\infty} C_m\, e^{2\pi i m\nu_0 t} \tag{A.12}$$
The coefficients C_m are in general complex numbers. Without going into the derivations, we use inversion formulae to obtain the coefficient values:

$$A_m = 2\nu_0 \int_{0}^{1/\nu_0} F(t)\cos(2\pi m\nu_0 t)\, dt \tag{A.13}$$

$$B_m = 2\nu_0 \int_{0}^{1/\nu_0} F(t)\sin(2\pi m\nu_0 t)\, dt \tag{A.14}$$

$$C_m = \nu_0 \int_{0}^{1/\nu_0} F(t)\, e^{-2\pi i m\nu_0 t}\, dt \tag{A.15}$$
We note that whether F (t) is periodic or not, we can give a complete description
of the function using combinations of sines and cosines. Note that a non periodic
function can be thought of as the limiting case of a periodic function wherein
the period tends to infinity and the fundamental frequency tends to zero. Also
in this case the harmonics would be closely spaced and there would be a contin-
uum of them, with each such harmonic having an infinitesimal amplitude given
as: a(ν)dν. Now integrating throughout all these amplitudes to synthesize the
function we get:
$$F(t) = \int_{-\infty}^{\infty} a(\nu)\cos(2\pi\nu t)\, d\nu + \int_{-\infty}^{\infty} b(\nu)\sin(2\pi\nu t)\, d\nu \tag{A.21}$$
Note that if F(t) is real then a(ν) and b(ν) are real as well; however, if the function F(t) is asymmetrical, that is if F(t) ≠ F(−t), then we have complex values Φ(ν). In certain cases F(t) is symmetrical, which in turn implies that Φ(ν) is real and F(t) consists only of cosines. Our Fourier series would then become:

$$F(t) = \int_{-\infty}^{\infty} \Phi(\nu)\cos(2\pi\nu t)\, d\nu \tag{A.24}$$
Now comes the interesting bit. We can actually recover the function that contains
information about the frequencies Φ(ν) from F (t) by way of inversion.
$$\Phi(\nu) = \int_{-\infty}^{\infty} F(t)\cos(2\pi\nu t)\, dt \tag{A.25}$$
Finally we say that Φ(ν) which is a function in the frequency domain, is the Fourier
transform of F (t) which is in the time domain. Another general formulation of
this is given by:

$$\Phi(\nu) = \int_{-\infty}^{\infty} F(t)\, e^{-2\pi i\nu t}\, dt \tag{A.26}$$
A.6 Spectrum
Note first that the square of the amplitude of oscillation of a wave gives a measure of the power contained in each harmonic of the wave. If the Fourier transform Φ(ν) of F(t) is complex, then the product of Φ(ν) with its complex conjugate Φ*(ν) gives the power spectrum, or spectral power density, of F(t):
S(ν) = Φ(ν)Φ∗ (ν) (A.27)
A popular property of the Laplace transform is linearity, which can be stated as:

$$\mathcal{L}\{aF_1(t) + bF_2(t)\} = a\,\mathcal{L}\{F_1(t)\} + b\,\mathcal{L}\{F_2(t)\} \tag{B.2}$$
Yet another important theorem associated with this transform is called the first
shift theorem and can be defined as follows:
Appendix B. Appendix B: Laplace, Dirac Delta and Fourier Series 112
• $\mathcal{L}(t^n) = \dfrac{n!}{s^{n+1}}$

• $\mathcal{L}\{t\,e^{at}\} = \dfrac{1}{(s-a)^2}$
• Before the next formula we must recall Euler's formula, which gives the polar form of complex numbers:

$$e^{it} = \cos(t) + i\sin(t) \tag{B.10}$$
Now we note that due to the linearity property, the Laplace transform of e^{it} is given by:

$$\mathcal{L}(e^{it}) = \mathcal{L}(\cos t) + i\,\mathcal{L}(\sin t) \tag{B.11}$$

where the Laplace transforms of the individual trigonometric functions are:

$$\mathcal{L}(\cos t) = \frac{s}{s^2 + 1} \tag{B.12}$$

$$\mathcal{L}(\sin t) = \frac{1}{s^2 + 1} \tag{B.13}$$
• $\mathcal{L}\{t\,F(t)\} = -\dfrac{d}{ds}f(s)$
• A popular function whose Laplace transform is immensely useful is the
Heaviside’s unit step function which is given by:
$$H(t) = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t \ge 0 \end{cases} \tag{B.14}$$

Consequently its Laplace transform is given by:

$$\mathcal{L}\{H(t)\} = \mathcal{L}(1) = \frac{1}{s} \tag{B.15}$$
In a similar manner, we can generalize the above two points to write the
Laplace transform of an n times differentiable function as:
L{F (n) (t)} = sn f (s) − sn−1 F (0) − sn−2 F 0 (0) − · · · − F (n−1) (0) (B.18)
• $\mathcal{L}\{\sin(wt)\} = \dfrac{w}{s^2 + w^2}$

• $\mathcal{L}\{\cos(wt)\} = \dfrac{s}{s^2 + w^2}$
We earlier mentioned that the Laplace transform of the Heaviside function is given by L{H(t)} = 1/s. However, we are usually more interested in finding the transform of H(t − t₀) where t₀ > 0. Applying the Laplace transform definition we get:

$$\mathcal{L}\{H(t-t_0)\} = \int_{0}^{\infty} H(t-t_0)\, e^{-st}\, dt \tag{B.19}$$

Note that, by the way this function is defined, H(t − t₀) = 0 for t < t₀, so the integral only runs over t > t₀, where H(t − t₀) = 1:

$$\mathcal{L}\{H(t-t_0)\} = \int_{t_0}^{\infty} e^{-st}\, dt = \left[-\frac{e^{-st}}{s}\right]_{t_0}^{\infty} = \frac{e^{-st_0}}{s} \tag{B.20}$$
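Equation (B.20) can be checked by direct numerical integration; s = 2 and t₀ = 1.5 are assumed values, and the infinite upper limit is truncated where e^(−st) is negligible:

```python
import numpy as np

s, t0 = 2.0, 1.5
t = np.linspace(t0, t0 + 40.0, 400_001)    # integrand vanishes for t < t0
y = np.exp(-s * t)

# trapezoid rule for the truncated integral of e^{-st} over [t0, t0+40]
numeric = float(np.sum((y[1:] + y[:-1]) / 2.0) * (t[1] - t[0]))
exact = np.exp(-s * t0) / s
print(numeric, exact)                      # both ≈ 0.02489
```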
This function assumes special relevance when it is multiplied with another function: the action of multiplying by the Heaviside function is analogous to 'switching on' the other function. With this intuition we can state the second shift theorem:
L{H(t − t0 )F (t − t0 )} = e−st0 f (s) (B.21)
Note that with this we can find the Laplace transform of a function that is switched
on at t = t0 .
Finding the inverse of a Laplace transform usually involves some solving via the partial fractions decomposition method. The standard definition for the inverse transform is the Bromwich contour integral:

$$F(t) = \mathcal{L}^{-1}\{f(s)\} = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} f(s)\, e^{st}\, ds$$
It is observed that there exist certain functions which might not classify as func-
tions in the true sense. In order to classify as a function, an expression needs to
be defined for all values of the variable in the specified range. Note that if this
is not the case, then the expression would not be a function since it would cease
to be well defined. We are usually not interested in such expressions, however
we note that even if some of these expressions might not be well defined, if they
have some desirable global properties, then such expressions indeed turn out to
be rather useful. One such function is Dirac’s δ function. The definition is as
follows:
$$\delta(t) = 0 \quad \forall\, t \neq 0 \tag{B.27}$$

$$\int_{-\infty}^{\infty} h(t)\,\delta(t)\, dt = h(0) \tag{B.28}$$
The above is defined for any function h(t) that is continuous in the interval (−∞, ∞).
The Dirac-δ function can be thought of as the limiting case of a top hat function
with unit area as it becomes infinitesimally thin and tall. First we define a function
as follows:
$$T_p(t) = \begin{cases} 0 & \text{if } t \le -1/T \\ T/2 & \text{if } -1/T < t < 1/T \\ 0 & \text{if } t \ge 1/T \end{cases} \tag{B.29}$$
The Dirac Delta function then models the limiting behaviour of this function and
can be written as:
δ(t) = lim Tp (t) (B.30)
T →∞
The value of the integral indicates the area under the curve h(t)T_p(t), and this area approaches the value h(0) as T → ∞. For very large T, the interval [−1/T, 1/T] is small enough that h(t) barely differs from its value at the origin. We can therefore write h in the form h(t) = h(0) + ε(t), where the term ε(t) tends to 0 as T goes to infinity, so h(t) tends to h(0) for extremely large values of T. Note that δ(t) is not a true function since it is not defined at t = 0; therefore δ(0) has no value. Writing out the one-sided limits we get:
Z ∞
h(t)δ(t)dt = h(0) (B.32)
0−
Z 0+
h(t)δ(t)dt = h(0) (B.33)
−∞
As a limiting case of the top hat function, the Dirac Delta function is conventionally drawn as an arrow at the origin.
We note an important property that as the interval gets smaller and smaller due
to T becoming large, the area under the top hat function would always be unity.
Hence in the limiting case, the length of the arrow (which happens to represent
the Dirac-δ function) is 1. Therefore we have with h = 1:
Z ∞
δ(t)dt = 1 (B.34)
−∞
This essentially means that we are reducing the width of the top hat function such
that it lies between 0 and 1/T (because in the exponential order laplace transfor-
mation we usually have limits starting from 0), and that we are increasing the
height from T /2 to T so as to preserve the unit area.
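The filtering limit ∫ h(t)T_p(t) dt → h(0) can be watched numerically as T grows; the test function h(t) = cos t (so h(0) = 1) is an assumption of the sketch:

```python
import numpy as np

def top_hat_filter(h, T, samples=100_001):
    """Trapezoid approximation of ∫ h(t) T_p(t) dt, with T_p = T/2 on (-1/T, 1/T)."""
    t = np.linspace(-1.0 / T, 1.0 / T, samples)
    y = h(t) * (T / 2.0)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * (t[1] - t[0]))

for T in (10.0, 100.0, 1000.0):
    print(T, top_hat_filter(np.cos, T))   # → 1.0 = cos(0) as T grows
```

Analytically the integral equals T·sin(1/T), which tends to 1 as T → ∞, matching the printed values.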
Going further with the Dirac-δ function we say that the function δ(t−t0 ) represents
an impulse that is centered at time t0 . This can be thought of as a transient signal
and the limiting case of a function K(t) which is the displaced version of the top
hat function:
$$K(t) = \begin{cases} 0 & \text{if } t \le t_0 - 1/T \\ T/2 & \text{if } t_0 - 1/T < t < t_0 + 1/T \\ 0 & \text{if } t \ge t_0 + 1/T \end{cases} \tag{B.36}$$
We can get the Laplace transform of this Dirac Delta function, provided that t₀ > 0, as:

$$\mathcal{L}\{\delta(t - t_0)\} = e^{-st_0} \tag{B.38}$$
This has been called the filtering property since we can see clearly from the defi-
nition that the Dirac-δ function helps us pick out a particular value of a function.
Z ∞
h(t)δ(t − t0 )dt = h(t0 ) (B.39)
−∞
The central idea behind a Fourier series is that any given function can be ex-
pressed as a series of Sine and Cosine functions. Here we will be dealing mostly
with periodic and piecewise continuous functions. Let us first start with functions
defined on the closed interval [−π, π] which possess one sided limits at −π and π.
We have a function that maps values such that f : [−π, π] → C. We can now state
the Dirichlet theorem as follows: If f is a member of the space of piecewise con-
tinuous functions which are 2π periodic on the closed interval [−π, π], having both
left and right derivatives at the end points, then we say that for each x ∈ [−π, π]
the Fourier series of f converges to:
$$\frac{f(x^-) + f(x^+)}{2} \tag{B.40}$$

And at both the end points (x = ±π) the series converges to:

$$\frac{f(\pi^-) + f((-\pi)^+)}{2} \tag{B.41}$$
The Fourier series thus assigns, at a point of discontinuity, the mean of the one-sided limits of f as the value of the series at that point.
Remember that the whole point of a Fourier series is to express a periodic function as a series of sine and cosine functions. The components of such a series are periodic functions of period 2π:

$$1, \; \cos x, \; \sin x, \; \cos 2x, \; \sin 2x, \; \ldots$$

These terms together form a trigonometric system, and the resulting series is called the trigonometric series:

$$a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + b_2\sin 2x + \cdots$$

Here the a and b terms are the coefficients of the series, and if the coefficients are such that the series converges, then its sum has the same period as the individual components, that is 2π. Now if we have a function f(x) of period 2π that can be represented by a convergent trigonometric series of this form, then we say that the Fourier series of f(x) is:
$$f(x) = a_0 + \sum_{n=1}^{\infty} \big(a_n \cos(nx) + b_n \sin(nx)\big) \tag{B.45}$$
Consequently, the Fourier coefficients can be found using the following equations:

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\, dx \tag{B.46}$$

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\, dx \tag{B.47}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\, dx \tag{B.48}$$
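Applying (B.46)–(B.48) to the 2π-periodic square wave f(x) = sign(x) recovers the classic coefficients b_n = 4/(πn) for odd n and zero otherwise; the Riemann-sum grid is an assumption of this sketch:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
dx = 2 * np.pi / len(x)
f = np.sign(x)                     # odd square wave on [-π, π)

a = lambda n: (1 / np.pi) * np.sum(f * np.cos(n * x)) * dx   # (B.47)
b = lambda n: (1 / np.pi) * np.sum(f * np.sin(n * x)) * dx   # (B.48)

print(b(1), 4 / np.pi)     # ≈ 1.2732 each
print(b(2), a(1))          # ≈ 0, 0 (even-n sines and all cosines vanish)
```

Because f is odd, every cosine coefficient vanishes, an instance of the orthogonality condition discussed next.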
A crucial point to note is that the underlying concept behind this Fourier series is
the orthogonality of the trigonometric system - which means that every term in
the trigonometric series is orthogonal to each other, or that their inner product is
zero. In terms of integrals we can write this condition as:
$$\int_{-\pi}^{\pi} \cos(nx)\,\sin(mx)\, dx = 0 \tag{B.49}$$