Probability, Statistics & Mathematics

(Cheat Sheet)

Author: Anik Chakraborty


Contents

1 Probability
  1.1 Theory of Probability
      1.1.1 Sigma Field
      1.1.2 Properties
      1.1.3 Conditional Probability
      1.1.4 Stochastic Independence
  1.2 Random Variable
      1.2.1 Univariate
      1.2.2 Bivariate
      1.2.3 Results
  1.3 Generating Functions
      1.3.1 Moments
      1.3.2 Cumulants
      1.3.3 Characteristic Function
      1.3.4 Probability Generating Function
  1.4 Inequalities
      1.4.1 Markov & Chebyshev
      1.4.2 Cauchy-Schwarz
      1.4.3 Jensen
      1.4.4 Lyapunov
  1.5 Theoretical Distributions
      1.5.1 Discrete
      1.5.2 Continuous
      1.5.3 Multivariate
      1.5.4 Truncated Distribution
  1.6 Sampling Distributions
      1.6.1 Chi-square, t, F
      1.6.2 Order Statistics
  1.7 Distribution Relationships
      1.7.1 Binomial
      1.7.2 Negative Binomial
      1.7.3 Poisson
      1.7.4 Normal
      1.7.5 Gamma
      1.7.6 Beta
      1.7.7 Cauchy
      1.7.8 Others
  1.8 Transformations
      1.8.1 Orthogonal
      1.8.2 Polar
      1.8.3 Special Transformations

2 Statistics
  2.1 Point Estimation
      2.1.1 Minimum MSE
      2.1.2 Consistency
      2.1.3 Sufficiency
      2.1.4 Completeness
      2.1.5 Exponential Family
      2.1.6 Methods of finding UMVUE
      2.1.7 Cramer-Rao Inequality
      2.1.8 Methods of Estimation
  2.2 Testing of Hypothesis
      2.2.1 Tests of Significance
  2.3 Interval Estimation
      2.3.1 Methods of finding C.I.
      2.3.2 Wilk’s Optimum Criteria
      2.3.3 Test Inversion Method
  2.4 Large Sample Theory
      2.4.1 Modes of Convergence

3 Mathematics
  3.1 Basics
      3.1.1 Combinatorial Analysis
      3.1.2 Difference Equation
  3.2 Linear Algebra
      3.2.1 Vectors & Vector Spaces
      3.2.2 Matrices
      3.2.3 Determinants
      3.2.4 System of Linear Equation
Chapter 1

Probability

1.1 Theory of Probability


1.1.1 Sigma Field
Ω: Universal Set. A non-empty class A of subsets of Ω is said to form a sigma field (σ-field) on Ω if it
satisfies the following properties -

(i) A ∈ A =⇒ Ac ∈ A (Closed under complementation)

(ii) A1, A2, . . . , An, . . . ∈ A =⇒ ∪_{n=1}^∞ An ∈ A (Closed under countable union)

Theorems
(1) A σ-field is closed under finite unions.

(2) A σ-field must include the empty set ∅ and the whole set Ω.

(a) A = {∅, Ω} is the smallest/minimal σ-field on Ω.


(b) If A ⊆ Ω, then A = {∅, A, Ac, Ω} is the minimal σ-field on Ω containing A.
(c) The power set of Ω (the set of all subsets of Ω) is the largest σ-field on Ω.

(3) A σ-field is closed under countable intersections.

1.1.2 Properties
(1) For two sets A, B ∈ A -

(a) Monotonic Property: If A ⊆ B, P (A) ≤ P (B)


(b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) =⇒ P (A ∪ B) ≤ P (A) + P (B)
(c) P (A ∪ B) = P (A − B) + P (B − A) + P (A ∩ B)
(d) P(A ∩ B) ≤ min{P(A), P(B)} =⇒ P(A ∩ B) ≤ √(P(A) · P(B))
(e) P (A ∩ B) ≥ P (A) + P (B) − 1
(f) P (A) = P (A ∩ B) + P (A ∩ B c ) =⇒ P (A − B) = P (A) − P (A ∩ B)


(2) For any n events A1, A2, . . . , An ∈ A -

(a) Boole’s inequality: P(∪_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai)

(b) Bonferroni’s inequality: P(∩_{i=1}^n Ai) ≥ ∑_{i=1}^n P(Ai) − (n − 1)

(c) ∑_{i=1}^n P(Ai) − ∑_{1⩽i1<i2⩽n} P(Ai1 ∩ Ai2) ≤ P(∪_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai)

(d) Poincare’s theorem (inclusion-exclusion):

    P(∪_{i=1}^n Ai) = ∑_{i=1}^n P(Ai) − ∑_{1⩽i1<i2⩽n} P(Ai1 ∩ Ai2) + ∑_{1⩽i1<i2<i3⩽n} P(Ai1 ∩ Ai2 ∩ Ai3) − · · · + (−1)^{n−1} P(∩_{i=1}^n Ai)

(e) Jordan’s theorem:

    i. The probability that exactly m of the n events will occur is -

       P(m) = Sm − C(m+1, m) Sm+1 + C(m+2, m) Sm+2 − · · · + (−1)^{n−m} C(n, m) Sn

    ii. The probability that at least m of the n events will occur is -

       Pm = P(m) + P(m+1) + · · · + P(n)
          = Sm − C(m, m−1) Sm+1 + C(m+1, m−1) Sm+2 − · · · + (−1)^{n−m} C(n−1, m−1) Sn

    where Sr = ∑_{1⩽i1<i2<···<ir⩽n} P(Ai1 ∩ Ai2 ∩ · · · ∩ Air), r = 1(1)n, and C(n, k) denotes the binomial coefficient.

1.1.3 Conditional Probability


Consider a probability space (Ω, A, P).

(1) Compound probability: n events A1, A2, . . . , An ∈ A are such that P(∩_{i=1}^{n−1} Ai) > 0. Then,

    P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) · P(A2/A1) · P(A3/A1 ∩ A2) · · · P(An / ∩_{i=1}^{n−1} Ai)

(2) Total Probability Theorem: If (B1, B2, . . . , Bn) is a partition of Ω with P(Bi) > 0 ∀i, then
    for any event A ∈ A, P(A) = ∑_{i=1}^n P(Bi) · P(A/Bi)

(3) Bayes’ Theorem: P(Bi/A) = P(Bi)P(A/Bi) / ∑_{k=1}^n P(Bk)P(A/Bk), i = 1(1)n, if P(A) > 0

(4) Bayes’ theorem with future events: Let C ∈ A be an event under the previous conditions
    with P(A/Bi) > 0, i = 1(1)n. Then,

    P(C/A) = ∑_{i=1}^n P(Bi)P(A/Bi)P(C/A ∩ Bi) / ∑_{i=1}^n P(Bi)P(A/Bi)
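
A small numerical sketch of (2) and (3): the prior probabilities P(Bi) and the likelihoods P(A/Bi) below are made-up values chosen only for illustration.

# Numerical check of the total probability theorem and Bayes' theorem.
# The prior P(B_i) and likelihoods P(A | B_i) below are illustrative, made-up values.

prior = [0.5, 0.3, 0.2]          # P(B_1), P(B_2), P(B_3): a partition of the sample space
likelihood = [0.10, 0.40, 0.70]  # P(A | B_i)

# Total probability: P(A) = sum_i P(B_i) * P(A | B_i)
p_A = sum(p * l for p, l in zip(prior, likelihood))

# Bayes: P(B_i | A) = P(B_i) P(A | B_i) / P(A)
posterior = [p * l / p_A for p, l in zip(prior, likelihood)]

print(f"P(A) = {p_A:.3f}")               # 0.5*0.1 + 0.3*0.4 + 0.2*0.7 = 0.31
print("posterior:", [round(q, 3) for q in posterior])
assert abs(sum(posterior) - 1) < 1e-12   # posteriors over a partition sum to 1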

1.1.4 Stochastic Independence


(1) For two independent events A, B -

P (A/B) = P (A/B c ) = P (A) ⇐⇒ P (A ∩ B) = P (A) · P (B)

(2) Pairwise independence: n events A1, A2, . . . , An ∈ A are said to be ‘pairwise’ independent if -

    P(Ai1 ∩ Ai2) = P(Ai1) · P(Ai2), ∀ i1 < i2

(3) Mutual independence: The above events are ‘mutually’ independent if -

    P(Ai1 ∩ Ai2) = P(Ai1) · P(Ai2), ∀ i1 < i2
    P(Ai1 ∩ Ai2 ∩ Ai3) = P(Ai1) · P(Ai2) · P(Ai3), ∀ i1 < i2 < i3
    ...
    P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) · P(A2) · · · P(An)

1.2 Random Variable


1.2.1 Univariate
X : Ω → R, such that {ω : X(ω) ≤ x} ∈ A, ∀x ∈ R is a random variable on {Ω, A}.

(1) The same function X(·) may be a R.V. for one choice of σ-field but not be a R.V. for another choice of σ-field.

(2) X is a R.V. on (Ω, A) =⇒ f(X) is also a R.V. on (Ω, A), for any Borel measurable f.

(3) Continuity theorem of Probability: (A1 ⊂ A2 ⊂ . . .) or (A1 ⊃ A2 ⊃ . . .)

    =⇒ lim_{n→∞} P(An) = P(lim_{n→∞} An)

(4) CDF: FX(x) = P[{ω : X(ω) ≤ x}], ∀x ∈ R

    (a) Non-decreasing: −∞ < x1 < x2 < ∞ =⇒ F(x1) ≤ F(x2)
    (b) Normalized: lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1
    (c) Right Continuous: lim_{x→a+} F(x) = F(a), ∀a ∈ R

For any R.V. X with CDF F (·) -

P (a < X < b) = F (b − 0) − F (a) P (a ≤ X ≤ b) = F (b) − F (a − 0)

P (a < X ≤ b) = F (b) − F (a) P (a ≤ X < b) = F (b − 0) − F (a − 0)

(5) Decomposition theorem: F (x) = αFc (x) + (1 − α)Fd (x) where, 0 ≤ α ≤ 1 and Fc (x), Fd (x)
are continuous and discrete D.F., respectively.

(a) α = 0 =⇒ X is purely discrete.


(b) α = 1 =⇒ X is purely continuous.
(c) 0 < α < 1 =⇒ X is mixed.

(6) X is non-negative with E(X) = 0 =⇒ P (X = 0) = 1


(7) P(a ≤ X ≤ b) = 1 =⇒ Var(X) ≤ (b − a)²/4

(8) P(|X| ≤ M) = 1 for some 0 ≤ M < ∞ =⇒ µ′r exists ∀ r

(9) P(X ∈ {0, 1, 2, . . .}) = 1 =⇒ E(X) = ∑_{x=0}^∞ {1 − F(x)}

(10) P(X ∈ [0, ∞)) = 1 =⇒ lim_{x→∞} x{1 − F(x)} = 0, if E(X) exists.

(11) E(X) = ∫_0^∞ {1 − F(x)} dx for any non-negative R.V. X.

     E(X^r) = r ∫_0^∞ x^{r−1} {1 − F(x)} dx

(12) ln(GMX ) = E(ln X)

(13) pth quantile: ξp such that F (ξp − 0) ≤ p ≤ F (ξp ). For continuous case, F (ξp ) = p

Symmetry
X has a symmetric distribution about ‘a’ if any of the following holds -

(a) P(X ≤ a − x) = P(X ≥ a + x), ∀x ∈ R

(b) F(a − x) + F(a + x) = 1 + P(X = a + x)

Again, if X is continuous then F(a − x) + F(a + x) = 1 or f(a − x) = f(a + x), ∀x ∈ R

• E(X) = a, if it exists

• Med(X) = a

1.2.2 Bivariate
(X, Y)′ : Ω → R², such that {ω : X(ω) ≤ x, Y(ω) ≤ y} ∈ A, ∀(x, y) ∈ R², is a bivariate random variable
on {Ω, A}.

(1) CDF: F (x, y) = P [{ω : X(ω) ≤ x, Y (ω) ≤ y}], ∀(x, y) ∈ R2

(a) F (x, y) is non-decreasing and right continuous w.r.t. each of the arguments x and y.
(b) F (−∞, y) = F (x, −∞) = 0, F (+∞, +∞) = 1
(c) For x1 < x2, y1 < y2 -

    P(x1 < X ≤ x2, y1 < Y ≤ y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1) ≥ 0

Marginal CDF

    FX(x) = lim_{y→∞} F_{X,Y}(x, y),  FY(y) = lim_{x→∞} F_{X,Y}(x, y)

• FX(x) + FY(y) − 1 ≤ F_{X,Y}(x, y) ≤ √(FX(x) · FY(y)), ∀(x, y) ∈ R²

(2) Joint distribution cannot be determined uniquely from the marginals.

(3) fX,Y (x, y) = fX (x) · fY |X (y|x) = fY (y) · fX|Y (x|y)


(4) f_{X,Y}(x, y; α) = fX(x) fY(y) [1 + α{2FX(x) − 1}{2FY(y) − 1}], α ∈ [−1, 1]  (Farlie-Gumbel-Morgenstern family)

Stochastic Independence
(5) FX,Y (x, y) = FX (x) · FY (y), ∀(x, y) ∈ R2 =⇒ fX,Y (x, y) = fX (x) · fY (y), ∀(x, y)

(6) X ⊥⊥ Y =⇒ f (X) ⊥⊥ g(Y ) (converse is true when f , g is 1-1)

(7) X ⊥⊥ Y iff fX,Y (x, y) = k · f1 (x) · f2 (y), ∀ x, y ∈ R for some k > 0.

1.2.3 Results
(1) Sum Law: E(X + Y ) = E(X) + E(Y ), if all exists

(2) Product Law: X ⊥⊥ Y =⇒ E(XY ) = E(X) · E(Y )


Cov (X, Y ) = 0 ⇏ X ⊥⊥ Y

(3) X, Y identically distributed ⇏ P (X = Y ) = 1

(4) X1 , X2 , . . . , Xn are iid and continuous R.V.s =⇒ n! arrangements are equally likely
(5) X, Y iid =⇒ (X − Y) is symmetric about ‘0’

(6) PDF of max{X, Y}: fU(u) = ∫_{−∞}^u {f(u, t) + f(t, u)} dt, where f is the joint PDF of (X, Y)

Conditional Distribution
(7) X ⊥⊥ Y =⇒ E(Y |X = x) = k, some constant ∀ x

(8) X ⊥⊥ {Y − ρ(σY/σX)X} =⇒ E(Y |X = x) is linear in x

(9) E(Y) = E[E(Y |X)] or E(X) = E[E(X|Y)]

(10) Var(Y) = Var{E(Y |X)} + E{Var(Y |X)}

(11) Correlation ratio: η²_{Y X} = Var{E(Y |X)} / Var(Y)

(12) Wald’s equation: {Xn}: sequence of iid R.V.s, N a positive integer-valued R.V. independent of {Xn}
     with P(N ∈ N) = 1. Define SN = ∑_{i=1}^N Xi

     =⇒ E(SN) = E(X1) E(N)

     =⇒ Var(SN) = Var(X1) · E(N) + E²(X1) · Var(N)
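
A minimal Monte Carlo sketch of Wald’s equation (12); the Exp summands, the Poisson-plus-one count and the seed are arbitrary illustrative choices.

import numpy as np

# Monte Carlo check of Wald's equation: E(S_N) = E(X_1) E(N) and
# Var(S_N) = Var(X_1) E(N) + E(X_1)^2 Var(N), with N independent of the X_i.
rng = np.random.default_rng(0)

theta, lam = 2.0, 5.0
reps = 100_000
N = rng.poisson(lam, size=reps) + 1                  # ensures P(N >= 1) = 1
S = np.array([rng.exponential(theta, n).sum() for n in N])

print("E(S_N):   sim", S.mean(), " theory", theta * (lam + 1))
print("Var(S_N): sim", S.var(),  " theory", theta**2 * (lam + 1) + theta**2 * lam)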

1.3 Generating Functions


1.3.1 Moments
 
(1) MGF: MX(t) = E(e^{tX}), |t| < t0, for some t0 > 0 [if E(e^{tX}) < ∞]
    It determines a distribution uniquely.

(2) µ′r : coefficient of t^r/r! in the expansion of MX(t), r = 0, 1, 2, . . .

(3) If the power series ∑_r µ′r t^r/r! converges absolutely, then the sequence of moments {µ′r} determines
    the distribution uniquely. For a bounded R.V. this always holds.

(4) Xi are independent with MGF Mi(t) =⇒ MS(t) = ∏_{i=1}^n Mi(t), where S = ∑_{i=1}^n Xi

(5) Bivariate MGF: M_{X,Y}(t1, t2) = E(e^{t1 X + t2 Y}) for |ti| < hi for some hi > 0, i = 1, 2

(6) µ′_{r,s} : coefficient of t1^r t2^s/(r! s!) in the expansion of M_{X,Y}(t1, t2)

(7) Also, ∂^{r+s}/(∂t1^r ∂t2^s) M_{X,Y}(t1, t2) |_{(t1=0, t2=0)} = µ′_{r,s}

(8) Marginal MGF: MX(t) = M_{X,Y}(t, 0) & MY(t) = M_{X,Y}(0, t)

(9) X & Y are independent ‘iff’ M_{X,Y}(t1, t2) = M_{X,Y}(t1, 0) · M_{X,Y}(0, t2), ∀(t1, t2)

1.3.2 Cumulants
(1) CGF: KX (t) = ln{MX (t)}, provided the expansion is a convergent power series.

(2) k1 = µ′1 (mean), k2 = µ2 (variance), k3 = µ3 & k4 = µ4 − 3k2²

(3) For two independent R.V. X & Y , kr (X + Y ) = kr (X) + kr (Y )



1.3.3 Characteristic Function



(1) CF: ϕX(t) = E(e^{itX})

(2) ϕX (0) = 1, |ϕX (t)| ≤ 1

(3) ϕX (t) is continuous on R and always exists for t ∈ R

(4) ϕX(−t) = conj{ϕX(t)}, the complex conjugate of ϕX(t)

(5) If X has a symmetric distribution about ‘0’ then ϕX (t) is real valued and an even function of t.

(6) Uniqueness property and independence as of MGF.


(7) Inversion theorem: If ∫_{−∞}^∞ |ϕX(t)| dt < ∞, then the pdf of the distribution is -

    f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} ϕX(t) dt

1.3.4 Probability Generating Function


(1) PGF: PX (t) = E(tX ), if |t| < 1

(2) It generates probability and factorial moments. It also determines a distribution uniquely.
(3) rth order factorial moment: µ[r] = d^r/dt^r PX(t) |_{t=1}, r = 0, 1, . . .

(4) X1, X2, . . . , Xn are independent with PGF Pi(t) =⇒ PS(t) = ∏_{i=1}^n Pi(t), where S = ∑_{i=1}^n Xi

1.4 Inequalities
1.4.1 Markov & Chebyshev
(1) Markov: For a non-negative R.V. X, P(X ⩾ a) ⩽ E(X)/a, for a > 0.
    ‘=’ holds if X has a two-point distribution supported on {0, a}.

(2) Chebyshev: P(|X − µ| ⩾ tσ) ⩽ 1/t², t > 0, where µ = E(X) & σ² = Var(X) < ∞.
    ‘=’ holds if X is such that -

    f(x) = 1/(2t²), if x = µ ± tσ;  f(x) = 1 − 1/t², if x = µ   (t > 1)

(3) One-sided Chebyshev: E(X) = 0, Var(X) = σ² < ∞

    P(X ⩾ t) ⩽ σ²/(σ² + t²), if t > 0
    P(X ⩾ t) ⩾ t²/(σ² + t²), if t < 0

(4) If also µ4 < ∞ then,

    P(|X − µ| ⩾ tσ) ⩽ (µ4 − σ⁴)/{µ4 − σ⁴ + (t² − 1)² σ⁴}

    It is an improvement over Chebyshev’s inequality if t² ≥ µ4/σ⁴

(5) Bivariate Chebyshev: (X1, X2) is a bivariate R.V. with means (µ1, µ2), variances (σ1², σ2²) &
    correlation ρ. Then for t > 0,

    P(|X1 − µ1| ⩾ tσ1 or |X2 − µ2| ⩾ tσ2) ⩽ {1 + √(1 − ρ²)}/t²
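
A quick empirical sanity check of the Markov and Chebyshev bounds (1) and (2); the Gamma(2, 3) sample and the cut-offs are arbitrary illustrative choices.

import numpy as np

# Empirical check that the Markov and Chebyshev bounds hold for a Gamma sample.
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

a, t = 10.0, 2.0
print("Markov:    P(X >= a)               =", (x >= a).mean(), "<=", mu / a)
print("Chebyshev: P(|X - mu| >= t*sigma)  =", (np.abs(x - mu) >= t * sigma).mean(), "<=", 1 / t**2)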

1.4.2 Cauchy-Schwarz
If a bivariate R.V. (X, Y ) has finite variances and E(XY ) exists, then -

E 2 (XY ) ≤ E(X 2 )E(Y 2 )

‘=’ holds iff X & Y are linearly related through the origin, i.e. P(X + λY = 0) = 1 for some λ.

1.4.3 Jensen
If f(·) is a convex function and E(X) exists, then E[f(X)] ≥ f[E(X)]

Note: A function f(·) is said to be convex on an interval I if, for all x1, x2 ∈ I and all λ ∈ [0, 1],

    f[λx1 + (1 − λ)x2] ≤ λf(x1) + (1 − λ)f(x2)

If f (·) is twice differentiable then f ′′ (x) ≥ 0 is the condition for convexity.

1.4.4 Lyapunov
For a R.V. X, define βr = E (|X|r ) (assuming it exists) then -
β_r^{1/r} is non-decreasing in r, i.e. β_r^{1/r} ≤ β_{r+1}^{1/(r+1)}

1.5 Theoretical Distributions


1.5.1 Discrete

U{x1, . . . , xN}:  PMF 1/N;  CDF #{i : xi ⩽ x}/N;  for U{1, . . . , N}: E(X) = (N + 1)/2, Var(X) = (N² − 1)/12, MGF = e^t(e^{Nt} − 1)/{N(e^t − 1)}

Bernoulli(p):  PMF p^x(1 − p)^{1−x}, x = 0, 1;  E(X) = p;  Var(X) = p(1 − p);  MGF = 1 − p + pe^t

Bin(n, p):  PMF C(n, x) p^x(1 − p)^{n−x};  CDF I_{1−p}(n − x, x + 1) [1];  E(X) = np;  Var(X) = np(1 − p);  MGF = (1 − p + pe^t)^n

Hyp(N, n, p):  PMF C(Np, x) C(N − Np, n − x)/C(N, n);  E(X) = np;  Var(X) = np(1 − p)(N − n)/(N − 1)

Geo(p):  PMF p(1 − p)^x, x = 0, 1, . . .;  CDF 1 − (1 − p)^{x+1};  E(X) = (1 − p)/p;  Var(X) = (1 − p)/p²;  MGF = p/{1 − (1 − p)e^t}

NB(n, p):  PMF C(n + x − 1, n − 1) p^n(1 − p)^x;  CDF Ip(n, x + 1) [1];  E(X) = n(1 − p)/p;  Var(X) = n(1 − p)/p²;  MGF = [p/{1 − (1 − p)e^t}]^n

Poisson(λ):  PMF e^{−λ} λ^x/x!;  CDF ∫_λ^∞ e^{−t} t^x/Γ(x + 1) dt;  E(X) = λ;  Var(X) = λ;  MGF = e^{λ(e^t − 1)}

[1] Ip(k, n − k + 1) = ∫_0^p t^{k−1}(1 − t)^{n−k}/B(k, n − k + 1) dt  (Incomplete Beta Function)
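
A quick cross-check of a few mean/variance entries above against scipy.stats; the parameter values are arbitrary illustrative choices, and scipy’s geometric counts trials, so a shift is used to match the failure-count convention of this table.

from scipy import stats

n, p, lam = 10, 0.3, 4.0

print(stats.binom.stats(n, p))        # (np, np(1-p)) = (3.0, 2.1)
print(stats.geom.stats(p, loc=-1))    # loc=-1 shifts trials to failures: ((1-p)/p, (1-p)/p^2)
print(stats.nbinom.stats(n, p))       # (n(1-p)/p, n(1-p)/p^2)
print(stats.poisson.stats(lam))       # (lambda, lambda)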

Properties
Binomial
(1) Mode: [(n + 1) p] if (n + 1) p is not an integer, else {(n + 1) p − 1} and (n + 1) p.

(2) Factorial Moment: µ(r) = (n)r pr


(3) Bin(n, p) is symmetric iff p = 1/2

(4) Variance of Bin(n, p) is maximum iff p = 1/2, and the maximum variance is n/4.

(5) X, Y iid ∼ Bin(n, 1/2) =⇒ P(X = Y) = C(2n, n) (1/2)^{2n}

Geometric
(1) X: number of trials required to get the 1st success; then E(X) = 1/p and Var(X) = (1 − p)/p²

(2) Lack of Memory: X ∼ Geo(p) ⇐⇒ P(X > m + n | X > m) = P(X ⩾ n), ∀m, n ∈ N

Negative Binomial
(1) Mode: [(n − 1)(1 − p)/p] if (n − 1)(1 − p)/p is not an integer, else {(n − 1)(1 − p)/p − 1} and (n − 1)(1 − p)/p.

(2) NB(n, p) ≡ Bin(−n, P) where P = −(1 − p)/p

(3) Y: number of trials required to get the rth success. Then -

    P(Y = y) = C(y − 1, r − 1) p^r (1 − p)^{y−r}, y = r, r + 1, . . .

    Here, Y is a discrete waiting-time R.V. (Pascal Distribution)

(4) X ∼ Bin(n, p) and Y the number of trials to the rth success, as in (3) =⇒ P(X ≥ r) = P(Y ≤ n)

Poisson
(1) Mode: [λ] if λ is not an integer, else (λ − 1) and λ.

1.5.2 Continuous

U(a, b):  PDF 1/(b − a), a < x < b;  CDF (x − a)/(b − a);  E(X) = (a + b)/2;  Var(X) = (b − a)²/12;  MGF = (e^{tb} − e^{ta})/{t(b − a)}

Gamma(n, θ):  PDF e^{−x/θ} x^{n−1}/{θ^n Γ(n)}, x > 0;  CDF Γx(n, θ) [2];  E(X) = nθ;  Var(X) = nθ²;  MGF = (1 − tθ)^{−n}

Exp(θ):  PDF (1/θ) e^{−x/θ}, x > 0;  CDF 1 − e^{−x/θ};  E(X) = θ;  Var(X) = θ²;  MGF = (1 − tθ)^{−1}

Beta(m, n):  PDF x^{m−1}(1 − x)^{n−1}/B(m, n), 0 < x < 1;  CDF Ix(m, n);  E(X) = m/(m + n);  Var(X) = mn/{(m + n)²(m + n + 1)}

Beta2(m, n):  PDF x^{m−1}/{B(m, n)(1 + x)^{m+n}}, x > 0;  E(X) = m/(n − 1) (n > 1);  Var(X) = m(m + n − 1)/{(n − 2)(n − 1)²} (n > 2)

N(µ, σ²):  PDF {1/(σ√(2π))} e^{−(x−µ)²/(2σ²)};  CDF Φ((x − µ)/σ);  E(X) = µ;  Var(X) = σ²;  MGF = e^{tµ + t²σ²/2}

Λ(µ, σ²) (lognormal):  PDF e^{−(ln x − µ)²/(2σ²)}/(xσ√(2π)), x > 0;  CDF Φ((ln x − µ)/σ);  E(X) = e^{µ + σ²/2};  Var(X) = e^{2µ + σ²}(e^{σ²} − 1)

C(µ, σ):  PDF σ/[π{σ² + (x − µ)²}];  CDF 1/2 + (1/π) tan^{−1}((x − µ)/σ);  mean, variance and MGF do not exist

SE(µ, σ) (shifted exponential):  PDF (1/σ) e^{−(x−µ)/σ}, x > µ;  CDF 1 − e^{−(x−µ)/σ};  E(X) = µ + σ;  Var(X) = σ²;  MGF = e^{tµ}/(1 − tσ)

DE(µ, σ) (double exponential):  PDF {1/(2σ)} e^{−|x−µ|/σ};  CDF (1/2) e^{(x−µ)/σ} for x ≤ µ, 1 − (1/2) e^{−(x−µ)/σ} for x > µ;  E(X) = µ;  Var(X) = 2σ²;  MGF = e^{tµ}/(1 − t²σ²)

Pareto(x0, θ):  PDF θ x0^θ/x^{θ+1}, x > x0;  CDF 1 − (x0/x)^θ;  E(X) = θx0/(θ − 1) (θ > 1);  Var(X) = θx0²/{(θ − 2)(θ − 1)²} (θ > 2)

Logistic(α, β):  PDF (1/β) e^{−(x−α)/β}/{1 + e^{−(x−α)/β}}²;  CDF 1/{1 + e^{−(x−α)/β}};  E(X) = α;  Var(X) = β²π²/3;  MGF = e^{tα} πβt/sin(πβt)

[2] Γx(n, θ) = ∫_0^x e^{−t/θ} t^{n−1}/{θ^n Γ(n)} dt  (Incomplete Gamma Function)

Properties
Uniform
(1) µ′r = (b^{r+1} − a^{r+1})/{(r + 1)(b − a)}

(2) X ∼ U(0, n), n ∈ N =⇒ X − [X] ∼ U(0, 1)

(3) The classical & geometric definitions of probability are based on the uniform distribution over a discrete
    & continuous space, respectively.

Gamma
(1) Moments: µ′r = θ^r Γ(n + r)/Γ(n), if r > −n

(2) HM: (n − 1)θ, if n > 1

(3) Mode: at (n − 1)θ if n > 1; at 0 if n = 1; no mode for 0 < n < 1.

Exponential
(1) µ′r = θ^r r!

(2) ξp = −θ ln(1 − p) =⇒ Median = θ ln 2

(3) Mode is at x = 0; mean deviation about the mean, MD_θ = 2θ/e

(4) Lack of Memory: X ∼ Exp(θ) ⇐⇒ P(X > m + n | X > m) = P(X > n), ∀m, n > 0

(5) F′(x)/{1 − F(x)} = constant ∀x > 0 ⇐⇒ X ∼ Exponential

(6) X ∼ Exp(λ) =⇒ [X] ∼ Geo(p = 1 − e^{−1/λ}) and (X − [X]) ⊥⊥ [X]

(7) X ∼ DE(θ, 1) =⇒ P[X(1) ≤ θ ≤ X(n)] = 1 − (1/2)^{n−1}

Beta
(1) µ′r = B(r + m, n)/B(m, n), if r + m > 0

(2) HM: (m − 1)/(m + n − 1), if m > 1

(3) Mode: (m − 1)/(m + n − 2), if m > 1, n > 1

(4) If m = n, median = 1/2 ∀n > 0 and mode = 1/2 if n > 1, else no mode.

(5) Beta(1, 1) ≡ U(0, 1) in distribution

Beta2
(1) µ′r = B(r + m, n − r)/B(m, n), if −m < r < n

(2) HM: (m − 1)/n, if m > 1

(3) Mode: (m − 1)/(n + 1), if m > 1; for 0 < m < 1, no mode.

Normal
(1) median = mode = µ and bell-shaped (unimodal)
(2) µ_{2r−1} = 0, µ_{2r} = (2σ²)^r Γ(r + 1/2)/√π = {(2r − 1)(2r − 3) · · · 5 · 3 · 1} σ^{2r}

(3) MD_µ = σ √(2/π)

(4) ∫ t ϕ(t) dt = −ϕ(t) + c

(5) For x > 0, 1/x − 1/x³ < {1 − Φ(x)}/ϕ(x) < 1/x =⇒ 1 − Φ(x) ≃ ϕ(x)/x, for large x (x > 3)

(6) X ∼ N(0, 1) =⇒ E([X]) = −1/2, where [X] is the integer part of X

Lognormal
(1) µ′r = e^{rµ + r²σ²/2}

(2) HM: e^{µ − σ²/2},  GM: e^µ,  Median: e^µ,  Mode: e^{µ − σ²}

    =⇒ Mean > Median > Mode =⇒ Positively skewed

(3) Xi iid ∼ Λ(µ, σ²) =⇒ GM(X) ∼ Λ(µ, σ²/n)
Cauchy
(1) µ′r exists for −1 < r < 1

(2) Median = Mode = µ

1.5.3 Multivariate
A ‘p’-component (dimensional) random vector X = (X1, X2, . . . , Xp)′ defined on (Ω, A) is a vector of p
real-valued functions X1(·), X2(·), . . . , Xp(·) defined on Ω such that
{ω : X1(ω) ≤ x1, X2(ω) ≤ x2, . . . , Xp(ω) ≤ xp} ∈ A, ∀x = (x1, x2, . . . , xp)′ ∈ R^p.

(1) CDF: FX(x) = P[{ω : X1(ω) ≤ x1, X2(ω) ≤ x2, . . . , Xp(ω) ≤ xp}], ∀x ∈ R^p

    (a) FX(x) is non-decreasing and right continuous w.r.t. each of x1, x2, . . . , xp.
    (b) FX(+∞, +∞, . . . , +∞) = 1, lim_{xi→−∞} FX(x) = 0, ∀i = 1(1)p
    (c) For h1, h2, . . . , hp > 0 -
        P(x1 < X1 < x1 + h1, x2 < X2 < x2 + h2, . . . , xp < Xp < xp + hp) ≥ 0

(2) ∑_{i=1}^p FXi(xi) − (p − 1) ≤ FX(x) ≤ {FX1(x1) FX2(x2) · · · FXp(xp)}^{1/p}

(3) The distribution of any sub-vector is a marginal distribution. There are (2^p − 1) marginals.

(4) Independence: X = (X(1)′, X(2)′)′;  X(1) ⊥⊥ X(2) ⇐⇒ FX(x) = FX(1)(x(1)) · FX(2)(x(2)), ∀x ∈ R^p

(5) E(a′X) = a′µ, Var(a′X) = a′Σa, Cov(a′X, b′X) = a′Σb for non-stochastic vectors a, b ∈ R^p

(6) E(AX) = Aµ, D(AX) = AΣA′, Cov(AX, BX) = AΣB′ for non-stochastic matrices A (q × p), B (r × p)

(7) E[(X − α)′A(X − α)] = trace(AΣ) + (µ − α)′A(µ − α)

(8) A matrix Σ = (σij) is a dispersion matrix if and only if it is n.n.d.

(9) Generalized variance: det(Σ), where Σ = E[(X − µ)(X − µ)′] = E(XX′) − µµ′ = D(X)

(10) Σ is p.d. iff there is no a ≠ 0 for which P(a′X = c) = 1;
     Σ is p.s.d. iff there is a vector a ≠ 0 for which P[a′(X − µ) = 0] = 1

(11) det(Σ) > 0 =⇒ non-singular distribution; det(Σ) = 0 =⇒ singular distribution

(12) Σ = BB′ for any dispersion matrix Σ, where B is n.n.d.

(13) Σ is p.d. =⇒ Σ = BB′ with B non-singular; let Y = B^{−1}(X − µ) =⇒ E(Y) = 0, D(Y) = Ip

(14) ρ12·3 = (ρ12 − ρ13 ρ23)/√{(1 − ρ13²)(1 − ρ23²)}

Multinomial
PMF: (X1 , X2 , . . . , Xk ) ∼ Multinomial (n; p1 , p2 , . . . , pk )
(a) Singular -
k k

n! X X
px1 pxk · · · pxkk

 , if xi = n, pi = 1
fX (x1 , x2 , . . . , xk ) = x1 ! x2 ! · · · xk ! 1 k i=1 i=1
˜ 

0 , otherwise
k−1 k−1
 k−1 k−1

P P P P
(b) Non-singular - xi ≤ n, pi < 1 xk = n − xi , p k = 1 − pi
i=1 i=1 i=1 i=1
Properties
(
npi (1 − pi ) , if i = j
(1) E(Xi ) = npi , Cov(Xi , Xj ) = , i, j = 1, 2, . . . , k − 1
−npi pj ̸ j
, if i =
r
pi pj  k
P 
(2) ρij = ρ(Xi , Xj ) = − , i ̸= j As Xi = n, Xi ↑ =⇒ Xj ↓ on an average
(1 − pi )(1 − pj ) i=1

D = diag (p1 , p2 , . . . , pk−1 )


(3) det(Σ) = nk−1 det(D − P P ′ ) = nk−1 det(D)(1 − P ′ D−1 P )
˜˜ ˜ ˜ P = (p1 , p2 , . . . , pk−1 )′
˜
k−1
(4) X k−1×1 ∼ Multinomial (n; p1 , . . . , pk−1 ),
P
pi < 1
˜ i=1
 
k−1
 X p1 
=⇒ X1 | (X2 = x2 , . . . , Xk−1 = xk−1 ) ∼ Bin n − xi ,
 
k−1

 P 
i=2 1− pi
i=2

=⇒ the regression of X1 on X2 , X3 , . . . , Xk−1 is linear and the distribution is heteroscedastic.


 k−1
n
t′ X

pi (eti − 1)
P
(5) MGF: E e˜ ˜ = 1+
i=1
Multiple Correlation
(6) For singular case, ρ1· 23···k = 1
k−1
P
p1 · pi
i=2
(7) For non-singular case, ρ21· 23···k−1 =  k−1

P
(1 − p1 ) 1 − pi
i=2

p1 p2
ρ12· 34···k−1 = −p p
(1 − p2 − p3 − · · · − pk−1 ) (1 − p1 − p3 − · · · − pk−1 )

Bivariate Normal
(X, Y) ∼ BN(µ1, µ2, σ1, σ2, ρ)

(1) X ∼ N(µ1, σ1²),  Y | X = x ∼ N(µ2 + ρ(σ2/σ1)(x − µ1), σ2²(1 − ρ²))

(2) Q(X, Y) = {1/(1 − ρ²)} [{(X − µ1)/σ1}² − 2ρ{(X − µ1)/σ1}{(Y − µ2)/σ2} + {(Y − µ2)/σ2}²] = U² + V² ∼ χ²_2

    where U = {Y − µ2 − ρ(σ2/σ1)(X − µ1)}/{σ2 √(1 − ρ²)}, V = (X − µ1)/σ1 are iid N(0, 1)

(3) X and Y are independent ⇐⇒ ρ = 0

(4) (X, Y) ∼ BN(0, 0, 1, 1, ρ)

    (a) (X + Y) ⊥⊥ (X − Y), and (X + Y)/(X − Y) ∼ C(0, √{(1 + ρ)/(1 − ρ)})
    (b) E{max(X, Y)} = √{(1 − ρ)/π};  PDF of U = max(X, Y): fU(u) = 2ϕ(u) Φ(u √{(1 − ρ)/(1 + ρ)})
    (c) ρ(X², Y²) = ρ²

(5) (X1, X2) ∼ BN(0, 0, 1, 1, 0), Y1 = X1 sgn(X2), Y2 = X2 sgn(X1), where sgn(x) = −1 if x < 0 and 1 if x > 0

    =⇒ (Y1, Y2) ≁ BN, ρ(Y1, Y2) = 2/π

1.5.4 Truncated Distribution


Univariate
F (x) be the CDF of X over the sample space X. Let, A = (a, b] ⊂ X, then the CDF of X over
truncated space A is -


 0 ,x ≤ a

 F (x) − F (a)
G(x) = P (X ≤ x | X ∈ A) = ,a < x ≤ b

 F (b) − F (a)

1 ,x > b

f (x)
PMF/PDF: P (X∈A)
, x∈A

Results
ˆ E(X) = E(X|A) · P (X ∈ A) + E(X|Ac ) · P (X ∈ Ac )

ˆ X ∼ Geo (p) =⇒ (X − k) | X ≥ k ∼ Geo (p)

ˆ Truncated Normal distribution is platykurtic.

ˆ Truncated Cauchy distribution has finite moments.

Bivariate
(X, Y ): bivariate R.V. with PDF, f (x, y) over the sample space, X ⊆ R2 . Let, A ⊂ X, then the PDF
over the truncated space is -

f (x, y)
g(x, y) =   , if (x, y) ∈ A
P (X, Y ) ∈ A

• µ′r,s (A) = E(X r Y s |A) = xr y s  f (x,y)  dx dy


RR
A P (X,Y )∈A

1.6 Sampling Distributions


1.6.1 Chi-square, t, F
χ2n
(1) E(X) = n, Var (X) = 2n
Γ( n2 + r)
(2) µ′r =2 ·r
n , if r > − n2
Γ( 2 )
D n

(3) χ2n ≡ Gamma 2
,2 , n∈N

tn
n
(1) E(X) = 0 (n > 1), Var (X) = n−2
(n > 2)

Γ( 21 + r) Γ( n2 − r)
(2) µ′2r = nr · √ · , if −1 < 2r < n
π Γ( n2 )
D
(3) t1 ≡ C (0, 1)
D
(4) t2n ≡ F1,n

Fn1 ,n2
n2 2n22 (n1 + n2 − 2)
(1) E(X) = (if n2 > 2), Var (X) = (if n2 > 4)
n2 − 2 n1 (n2 − 2)2 (n2 − 4)
n2 (n1 − 2)
(2) Mode: (if n1 > 2) =⇒ Mean > 1 > Mode (if n1 , n2 > 2)
n1 (n2 + 2)
r
Γ( n21 + r) Γ( n22 − r)

n2
(3) µ′r = · · , if −n1 < 2r < n2
n1 Γ( n21 ) Γ( n22 )

(4) ξp and ξp′ are pth quantile of Fn1 ,n2 and Fn2 ,n1 respectively =⇒ ξp ξ1−p =1
1 D
(5) F ∼ Fn,n =⇒ F ≡ and median (F ) = 1
F
n1 n n 
1 2
(6) F ∼ Fn1 ,n2 =⇒ F ∼ Beta2 ,
n2 2 2
(7) Points of inflexion are equidistant from mode (if n > 4)

1.6.2 Order Statistics


Order Statistics: X(1) ≤ X(2) ≤ · · · ≤ X(n)
n
n
{F (x)}k {1 − F (x)}n−k = IF (x) (r, n − r + 1)
P 
(1) FX(r) (x) = k
k=r

(2) FX(1) ,X(n) (x1 , x2 ) = {F (x2 )}n − {F (x2 ) − F (x1 )}n , x1 < x2

Only for Absolutely Continuous Random Variable - CDF: F (x), PDF: f (x)
(3) fX(r) (x) = n!
(r−1)!(n−r)!
{F (x)}r−1 f (x) {1 − F (x)}n−r , x ∈ R

x<y
(4) Joint PDF: n!
(r−1)!(s−r−1)!(n−r)!
{F (x)}r−1 f (x) {F (y) − F (x)}s−r−1 f (y) {1 − F (y)}n−s ,
(r < s)
R∞
(5) Sample Range: fR (r) = n(n − 1) {F (r + s) − F (s)}n−2 f (r + s)f (s) ds, 0 < r < ∞
−∞

Results
iid
(6) Xi ∼ U(0, 1) =⇒ X(r) ∼ Beta (r, n − r + 1), r = 1, 2, . . . , n
iid   σ   σ
(7) X1 , X2 ∼ N (µ, σ 2 ) =⇒ E X(1) = µ − √ , E X(2) = µ + √
π π
iid 1
(8) X1 , X2 , X3 ∼ N (µ, σ 2 ) =⇒ Sample Range: 2
(|X1 − X2 | + |X2 − X3 | + |X3 − X1 |)
(9) Xi iid ∼ Exp(θ) =⇒ E[X(n)] = θ (1 + 1/2 + 1/3 + · · · + 1/n)

(10) Xi iid ∼ Exp(θ) =⇒ the spacings Ui = X(i) − X(i−1), i = 1(1)n, are independent with
     Ui ∼ Exp(θ/(n − i + 1)), where X(0) = 0

(11) X1 , X2 , . . . , X2k+1 : random sample from a continuous distribution, symmetric about µ


=⇒ Distribution of X̃ is also symmetric about µ
 
X(1) +X(n)
=⇒ E(X̃) = µ, E 2
= µ (if exists)

(12) Xi iid ∼ Shifted Exp(θ, 1) =⇒ (n − i + 1){X(i) − X(i−1)} iid ∼ Exp(1)

     =⇒ 2n{X(1) − θ} ∼ χ²_2 ⊥⊥ 2 ∑_{i=2}^n {X(i) − X(1)} ∼ χ²_{2n−2}
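
A small simulation sketch of results (9) and (10) above; the sample size, scale and seed are arbitrary illustrative choices.

import numpy as np

# For an iid Exp(theta) sample, the spacings X_(i) - X_(i-1) are Exp(theta/(n-i+1)),
# and E[X_(n)] = theta * (1 + 1/2 + ... + 1/n).
rng = np.random.default_rng(2)
theta, n, reps = 2.0, 5, 200_000

x = np.sort(rng.exponential(theta, size=(reps, n)), axis=1)
spacings = np.diff(np.column_stack([np.zeros(reps), x]), axis=1)   # X_(i) - X_(i-1), with X_(0) = 0

for i in range(1, n + 1):
    print(f"mean of U_{i}: sim {spacings[:, i-1].mean():.3f}  theory {theta/(n-i+1):.3f}")
print("E[X_(n)]: sim", x[:, -1].mean(), " theory", theta * sum(1/k for k in range(1, n+1)))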

1.7 Distribution Relationships


1.7.1 Binomial
n
iid P
(1) Xi ∼ Bernoulli (p) =⇒ Xi ∼ Bin (n, p)
i=1

k
 k


⊥ P P
(2) Xi ∼ Bin (ni , p) =⇒ Xi ∼ Bin ni , p
i=1 i=1

iid
(3) X1 , X2 ∼ Bin (n, 12 ) =⇒ X1 − X2 is symmetric about ‘0’.
m
 k

⊥⊥ P P nk
(4) Xi ∼ Bin (ni , p) =⇒ Xk Xi = t ∼ Hyp N = ni , t, , k = 1(1)m
i=1 i=1 N
(5) Bin (n, p) → Poisson (λ = np), for n → ∞ and p → 0 such that np is finite.

(6) Bin (n, p) → N np, np(1 − p) , for large n and moderate p.
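
A quick scipy.stats illustration of the Poisson limit in (5); the values n = 1000 and p = 0.004 (so np = 4) are arbitrary illustrative choices.

from scipy import stats

# For large n and small p with np fixed, Bin(n, p) probabilities approach Poisson(np) probabilities.
n, p = 1000, 0.004
for k in (0, 2, 4, 8):
    print(k, stats.binom.pmf(k, n, p), stats.poisson.pmf(k, n * p))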

1.7.2 Negative Binomial


iid
(1) Xi ∼ Geo (p) =⇒ X(1) ∼ Geo (1 − q n ), where q = 1 − p
iid
(2) X, Y ∼ Geo (p) ⇐⇒ X X + Y = t ∼ U {0, 1, 2, . . . , t}
iid
(3) X, Y ∼ Geo (p) =⇒ min{X, Y } ⊥⊥ (X − Y )
n
iid P
(4) Xi ∼ Geo (p) =⇒ Xi ∼ NB (n, p)
i=1

k
 k


⊥ P P
(5) Xi ∼ NB (ni , p) =⇒ Xi ∼ NB ni , p
i=1 i=1

(6) NB (n, p) → Poisson (λ = n(1 − p)), for n → ∞ and p → 1 such that n(1 − p) is finite.

1.7.3 Poisson
k
 k


⊥ P P
(1) Xi ∼ Poisson (λi ) =⇒ Xi ∼ Poisson λi
i=1 i=1

m
  m

⊥ P λk P
(2) Xi ∼ Poisson (λi ) =⇒ Xk Xi = t ∼ Bin t, p = , k = 1(1)m, where λ = λi
i=1 λ i=1

k

⊥ P
(3) Xi ∼ Poisson (λi ) =⇒ (X1 , X2 , . . . , Xk ) Xi = t ∼ Multinomial (t, p1 , p2 , . . . , pk ), where
i=1
λi
pi = k
, i = 1, 2, . . . , k
P
λi
i=1

1.7.4 Normal
D
(1) X ∼ N (0, σ 2 ) =⇒ X ≡ −X
n
 n n



N (µi , σi2 ) a2i σi2
P P P
(2) Xi ∼ =⇒ ai X i ∼ N ai µ i ,
i=1 i=1 i=1
n n
iid
(3) Xi ∼ N (µ, σ 2 ),
P P
ai Xi ⊥⊥
bi Xi ⇐⇒ a.b = 0
i=1 ˜˜
i=1
X̄ and (X1 − X̄, X2 − X̄, . . . , Xn − X̄) are independently distributed

1.7.5 Gamma
iid θ

(1) Xi ∼ Exp (θ) =⇒ X(1) ∼ Exp n
iid
(2) X, Y ∼ Exp (θ) =⇒ X X + Y = t ∼ U(0, t)
(3) X ∼ Shifted Exp (µ, θ) =⇒ (X − µ) ∼ Exp (θ)
X−µ
(4) X ∼ DE (µ, σ) =⇒ σ
∼ Exp (θ = 1) and |X| ∼ Shifted Exp (µ, σ)
n
iid P
(5) Xi ∼ Exp (θ) ≡ Gamma (n = 1, θ) =⇒ Xi ∼ Gamma (n, θ)
i=1
n
 k


⊥ P P
(6) Xi ∼ Gamma (ni , θ) =⇒ Xi ∼ Gamma ni , θ
i=1 i=1

1.7.6 Beta
X
(1) X ∼ Beta (m, n) =⇒ 1−X
∼ Beta2 (m, n)
X
(2) X ∼ Beta2 (m, n) =⇒ 1+X
∼ Beta (m, n)

(3) X1 ∼ Beta (n1 , n2 ) & X2 ∼ Beta (n1 + 21 , n2 ), independently =⇒ X1 X2 ∼ Beta (2n1 , 2n2 )

1.7.7 Cauchy
iid
(1) Xi ∼ C (µ, σ) =⇒ X̄n ∼ C(µ, σ)
iid iid
(2) Xi ∼ C (0, σ) =⇒ X1i ∼ C 0, σ1 =⇒ HMX ∼ C(0, σ)

˜
n
n n

⊥⊥ P P P
(3) Xi ∼ C (µi , σi ) =⇒ Xi ∼ C µi , σi
i=1 i=1 i=1

1.7.8 Others
n
iid
Xi2 ∼ χ2n
P
(1) Xi ∼ N (0, 1) =⇒
i=1

(2) U ∼ N (0, 1), V ∼ χ2n , independently =⇒ √U ∼ tn


V /n


⊥ U1 /n1
(3) Ui ∼ χ2ni =⇒ U2 /n2
∼ Fn1 ,n2
D
(4) X is symmetric about ‘0’ =⇒ X ≡ −X

1.8 Transformations
1.8.1 Orthogonal
y = T (x) = An×n xn×1 → Linear Transformation. [If det(A) ̸= 0, Jacobian: J = det(A−1 )].
˜ ˜ ˜
(1) If T (x) is orthogonal transformation then AT A = In =⇒ det(A) = ±1 & |J| = 1
˜
(2) y y = xT x =⇒ |y|2 = |x|2 (length is preserved)
T
˜ ˜ ˜ ˜ ˜ ˜
n
iid
Xi2 = X T A1 X + X T A2 X, where A1 , A2 are n.n.d.
P
(3) Cochran’s theorem: Xi ∼ N (0, 1) &
i=1 ˜ ˜ ˜ ˜
matrices with ranks r1 , r2 , r1 + r2 = n

=⇒ X T A1 X ∼ χ2r1 and X T A2 X ∼ χ2r2 , independently.


˜ ˜ ˜ ˜

1.8.2 Polar
(1) For a point with Cartesian coordinates (x1 , x2 , . . . , xn ) in Rn -

x1 = r cos θ1

x2 = r sin θ1 cos θ2
x3 = r sin θ1 sin θ2 cos θ3
..
.
xn−1 = r sin θ1 · · · sin θn−2 cos θn−1
xn = r sin θ1 · · · sin θn−2 sin θn−1
n
where, r2 = x2i ,
P
0 < r < ∞ and 0 < θ1 , θ2 , . . . , θn−2 < π, 0 < θn−1 < 2π
i=1

Jacobian: |J| = rn−1 (sin θ1 )n−2 (sin θ2 )n−3 · · · sin θn−2

(2) X = R cos θ, Y = R sin θ, 0 < R < ∞, 0 < θ < 2π


iid θ ∼ U (0, 2π)
X, Y ∼ N (0, 1) ⇐⇒ D , independently.
R2 ∼ Exp (2) ≡ χ22

(3) θ ∼ U(0, 2π) ⊥⊥ R2 ∼ χ22 =⇒ R sin(θ + θ0 ) ∼ N (0, 1), θ0 is a fixed quantity

1.8.3 Special Transformations


X−a

(1) X ∼ U(a, b) =⇒ − ln b−a
∼ Exp (1)
iid
(2) X1 , X2 ∼ U(0, 1) =⇒ X1 + X2 ∼ Triangular (0, 2), |X1 − X2 | ∼ Beta (1, 2)

(3) Box-Muller Transformation: X1, X2 iid ∼ U(0, 1) =⇒

    Y1 = √(−2 ln X1) cos(2πX2),  Y2 = √(−2 ln X1) sin(2πX2)  are iid N(0, 1)

    (a simulation sketch is given at the end of this subsection)

(4) X ∼ Gamma (n1 , θ), Y ∼ Gamma (n2 , θ)


X
=⇒ X + Y ∼ Gamma (n1 + n2 , θ), X+Y
∼ Beta (n1 , n2 ), independently
X
=⇒ X + Y ∼ Gamma (n1 + n2 , θ), Y
∼ Beta2 (n1 , n2 ), independently
iid X X
(5) X, Y ∼ N (0, 1) =⇒ ,
Y |Y |
∼ C(0, 1)
iid 1
=⇒ (X1 − X2 ), RX1 − (1 − R)X2 ∼ DE 0, 1θ
 
(6) X1 , X2 ∼ Exp (θ), R ∼ Bernoulli 2

(7) X ∼ Beta (a, b) ⊥⊥ Y ∼ Beta (a + b, c) =⇒ XY ∼ Beta (a, b + c)

(8) X ∼ U − π2 , π2 ⇐⇒ tan X ∼ C(0, 1)




iid
(9) Dirichlet Transformation: Xi ∼ Exp (θ)
n−k+1
P
n
Xi
P i=1
=⇒ Y1 = Xi ∼ Gamma (n, θ), Yk = n−k+2
∼ Beta (n − k + 1, 1), k = 2, 3, . . . , n
i=1 P
Xi
i=1

Y1 , Y2 , . . . , Yn are independently distributed.


iid
(10) X1 , X2 , X3 , X4 ∼ N (0, 1) =⇒ X1 X2 ± X3 X4 ∼ DE (0, 1) (valid for any combination)
n
iid 2
Xi ∼ χ22n
P
(11) Xi ∼ Exp (θ) =⇒ θ
i=1

(12) X ∼ Beta (θ, 1) =⇒ − ln X ∼ Exp 1θ




 
(13) X ∼ Pareto (θ, x0 ) =⇒ ln xX0 ∼ Exp 1

θ

(14) X1, X2 independent ∼ χ²_2 =⇒ (aX1 + bX2)/(X1 + X2) ∼ U(a, b)  (a < b)
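
A minimal sketch of the Box-Muller transformation from item (3) of this subsection; the sample size and seed are arbitrary illustrative choices.

import numpy as np

# Two iid U(0,1) samples mapped to two iid N(0,1) samples via Box-Muller.
rng = np.random.default_rng(3)
x1, x2 = rng.uniform(size=100_000), rng.uniform(size=100_000)

y1 = np.sqrt(-2 * np.log(x1)) * np.cos(2 * np.pi * x2)
y2 = np.sqrt(-2 * np.log(x1)) * np.sin(2 * np.pi * x2)

print(y1.mean(), y1.std(), y2.mean(), y2.std())   # approximately 0, 1, 0, 1
print(np.corrcoef(y1, y2)[0, 1])                  # approximately 0 (Y1 and Y2 are independent)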
Chapter 2

Statistics

2.1 Point Estimation


2.1.1 Minimum MSE
(1) Measures of Closeness: T : Statistic/Estimator, ψ(θ): Parametric function
Destroying the randomness, general measures of closeness are -

(a) E|T − θ|r , for some r > 0 (smaller value is better)


 
(b) P |T − θ| < ϵ , for ϵ > 0 (higher value is better)

(2) Mean Square Error: MSE_{ψ(θ)}(T) = E[T − ψ(θ)]² = Var(T) + b²(ψ(θ), T), where b(ψ(θ), T) is the bias.
    T can be said to be a ‘good estimator’ of ψ(θ) if it has a small variance.

(3) E(m′r ) = µ′r , if µ′r exists =⇒ E(X̄) = µ, provided µ = E(X1 ) exists


 n

2 1 2
= σ 2 , population variance (if exists) but E(m2 ) ̸= σ 2 = µ2
P
(4) E s = n−1
(Xi − X̄)
i=1

 n
  rn 
iid 2
pπ 1
P 2
P 2
(5) Xi ∼ N (0, σ ) =⇒ E T1 = 2
· n
|Xi | = σ = E T2 = Cn Xi .
i=1 i=1
2
n n Γ( n2 )
Xi2 ,
P P
Here, T1 , T2 are two UEs of σ based on |Xi | and respectively. (Cn = √ )
i=1 i=1 2 Γ( n+1
2
)

iid n−1 1

(6) Xi ∼ Exp (θ) =⇒ E(X̄) = θ, E nX̄
= θ

iid
(7) Xi ∼ N (µ, σ 2 ) =⇒ T ′ = n−1
n+1
· s2 has the smallest MSE in the class {bs2 : b > 0} i.e. a biased

estimator T is better than an UE s2 , in terms of MSE.

(8) X ∼ Poisson (λ) =⇒ T (X) = (−1)X is the UMVUE of e−2λ which is an absurd UE.
Note: Absurd unbiased estimator is that unbiased estimator which can take values outside the
parameter space.


iid
(9) Xi ∼ Poisson (λ) =⇒ Tα = αX̄ + (1 − α)s2 is an UE of λ for any α ∈ [0, 1]
=⇒ There may be infinitely many UEs

(10) Estimable Parametric Functions

(a) X ∼ Bin (n, p) =⇒ E[(X)r ] = (n)r pr , r = 1, 2, . . . , n


=⇒ Only polynomials of degree ≤ n are estimable.
(b) X ∼ Bernoulli (p) =⇒ Only ψ(p) = a + bp is estimable.

(c) X ∼ Poisson (λ) =⇒ E[(X)r ] = λr , r = 1, 2, . . . =⇒ e−λ is estimable but not λ1 , λ.

1
(11) X ∼ Bernoulli (θ), T1 (X) = X and T2 (X) = 2
=⇒ Between T1 and T2 none are uniformly better than the other, in terms of MSE.

iid    
(12) Xi ∼ f (x; θ), E T (X1 ) = θ, V ar T (X1 ) < ∞
=⇒ lim V ar(Sn ) = 0, where Sn is the UMVUE of θ
n→∞

(13) Best Linear Unbiased Estimator (BLUE)


T1 , T2 , . . . , Tk be UEs of ψ(θ) with known variances v1 , v2 , . . . , vk and are independent
k
1 X Ti
=⇒ BLUE of ψ(θ) : T = k
P 1 v
i=1 i
vi
i=1

2.1.2 Consistency
 
P |Tn − θ| < ϵ → 1
Tn is consistent for θ ⇐⇒  or  as n → ∞, ∀θ ∈ Ω for every ϵ > 0
P |Tn − θ| > ϵ → 0

(1) Sufficient Condition


P
E(Tn − θ)2 → 0 ⇐⇒ E(Tn ) → θ, V ar(Tn ) → 0 as n → ∞ =⇒ Tn −→ θ

n
P
(2) m′r = 1
xri −→ µ′r = E(X1r ), r = 1, 2, . . . , k (if k th order moment exists)
P
n
i=1

P
(3) If Tn −→ θ then -
P
(a) bn Tn −→ θ, if bn → 1 as n → ∞
P
(b) an + Tn −→ θ, if an → 0 as n → ∞

This also shows that, ‘unbiasedness’ and ‘consistency’ are not interrelated.

P P
(4) Invariance Property: Tn −→ θ =⇒ ψ(Tn ) −→ ψ(θ), provided ψ(·) is continuous

2.1.3 Sufficiency
S is sufficient for θ ⇐⇒ (X1 , X2 , . . . , Xn ) | S = s is independent of θ, ∀ s
S is sufficient for θ ⇐⇒ T | S = s is independent of θ, ∀ s, for all statistic T .

(1) Any one-to-one function of a sufficient statistic is also sufficient for a parameter.

(2) Factorization Theorem


n
Y 
f (xi ; θ) = g T (x); θ · h(x) ⇐⇒ T (X) is sufficient for θ
i=1
˜ ˜ ˜

where, g T (x); θ depends on θ and on x only through T (x) and h(x) is independent of θ.
˜ ˜ ˜ ˜
(3) Trivial Sufficient Statistic: (X1 , X2 , . . . , Xn ) and (X(1) , X(2) , . . . , X(n) ).
Sufficiency means “space reduction without losing any information”. In this aspect, the
order statistics, (X(1) , X(2) , . . . , X(n) ) is better as a sufficient statistic than the whole sample
i.e. (X1 , X2 , . . . , Xn ), with respect to data summarization.

(4) T1 , T2 are two sufficient statistic for θ =⇒ they are related


n
iid P
(5) Xi ∼ DE (µ, σ), ∃ non-trivial sufficient statistic if µ is known (say, µ0 ) and that is |Xi − µ0 |.
i=1

Minimal Sufficient Statistic


(6) T0 is a minimal sufficient of θ if,

(a) T0 is sufficient
(b) T0 is a function of every sufficient statistic
f (x;θ)
(7) Theorem: For two sample points x and y, the ratio f (˜y;θ)
is independent of θ if and only if
˜ ˜ ˜
T (x) = T (y), then T (X) is minimal sufficient for θ.
˜ ˜ ˜

2.1.4 Completeness
T is complete for θ ⇐⇒ “E[h(T )] = 0, ∀θ ∈ Ω =⇒ P [h(T ) = 0] = 1, ∀θ ∈ Ω”

Remark
If a two component statistic (T1 , T2 ) is minimal sufficient for a single component parameter θ, then
in general (T1 , T2 ) is not complete.
It is possible to find h1 (T1 ) and h2 (T2 ) such that,

E[h1 (T1 )] = ψ(θ) = E[h2 (T2 )], ∀θ

=⇒ E[h(T1 , T2 )] = 0, ∀θ where, h(T1 , T2 ) = h1 (T1 ) − h2 (T2 ) ̸= 0


=⇒ (T1 , T2 ) is not complete.

2.1.5 Exponential Family


One Parameter
An one parameter family of PDFs or PMFs, {f (x; θ) : θ ∈ Ω} that can be expressed in the form -
 
f (x; θ) = exp T (x)u(θ) + v(θ) + w(x) , x ∈ S

with the following regularity conditions -


C1 : The support, S = {x : f (x; θ) > 0} is independent of θ
C2 : The parameter space, Ω is an open interval in R i.e. Ω = {θ : a < θ < b}
C3 : {1, T (x)} and {1, u(θ)} are linearly independent i.e. T (x) and u(θ) are non-constant functions
is called One Parameter Exponential Family (OPEF)

K Parameter
A K-parameter family of PDFs or PMFs, {f (x; θ) : θ ∈ Ω ⊆ Rk } satisfying the form -
" k ˜ ˜ #
X
f (x; θ) = exp Ti (x)ui (θ) + v(θ) + w(x) , x ∈ S
˜ i=1
˜ ˜

with the following regularity conditions -


C1 : The support, S = {x : f (x; θ) > 0} is independent of θ
˜ ˜
C2 : The parameter space, Ω ⊆ Rk is an open rectangle in Rk i.e. ai < θi < bi , i = 1(1)k
C3 : {1, T1 (x), . . . , Tk (x)} and {1, u1 (θ), . . . , uk (θ)} are linearly independent
˜ ˜
is called K-parameter Exponential Family

Theorem
n
iid P
(a) X ∼ f (x; θ) ∈ OPEF =⇒ T (Xi ) is complete and sufficient for the family.
˜ i=1
 n n

iid P P
(b) X ∼ f (x; θ) ∈ K-parameter Exponential Family =⇒ T1 (Xi ), . . . , Tk (Xi ) is complete
˜ ˜ i=1 i=1
and sufficient for the family.

Distributions in Exponential Family



a(x) θx
a(x) θx
P
(a) f (x; θ) = g(θ)
, x = 0, 1, 2, . . . ; 0 < θ < ρ, a(x) ≥ 0, g(θ) = (Power Series)
x=0
=⇒ Binomial (n known), Poisson, Negative Binomial (n known) are in OPEF.
(b) Normal, Exponential, Gamma, Beta, Pareto (x0 known) are in the Exponential family.
(c) Uniform, Cauchy, Laplace, Shifted Exponential, {N (θ, θ2 ) : θ ̸= 0}, {N (θ, θ) : θ > 0} are not
in the Exponential Family.
The last two families are identified by Lehmann as ‘Curved Exponential Family’.

2.1.6 Methods of finding UMVUE


Theorem 2.1.6.1 (Necessary & Sufficient n Condition for UMVUE) Let X has a distribution
o
   
from {f (x; θ) : θ ∈ Ω}. Define, Uψ = T (X) : Eθ T (X) = ψ(θ), V arθ T (X) < ∞, ∀θ ∈ Ω and
n     o
U0 = u(X) : Eθ T (X) = 0, V arθ u(X) < ∞, ∀θ ∈ Ω . Then, T0 ∈ Uψ is UMVUE of θ if and
only if Covθ (T0 , u) = 0, ∀θ ∈ Ω, ∀u ∈ U0

Results
ˆ UMVUE if exists, is unique
k k
ˆ Ti is UMVUE of ψ(θ) =⇒
P P
ai Ti is UMVUE of ai ψi (θ)
i=1 i=1

ˆ T is UMVUE =⇒ T k is UMVUE =⇒ any polynomial function, f (T ) is UMVUE of their


expectations

Theorem 2.1.6.2 (Rao-Blackwell) Let X has a distribution from {f (x; θ) : θ ∈ Ω} and h be a


statistic from Uψ = {h : E(h) = ψ(θ), V ar(h) < ∞, ∀θ ∈ Ω}. Let, T be a sufficient statistic for θ.
Then-

(a) E(h | T ) is an UE of ψ(θ)


 
(b) V ar E(h | T ) ≤ V ar(h), ∀θ ∈ Ω

Implication: UMVUE is necessarily a function of minimal sufficient statistic

Theorem 2.1.6.3 (Lehmann-Scheffe) Let X has a distribution from {f (x; θ) : θ ∈ Ω} and T be


a complete sufficient statistic for θ. Then-
 
(a) If E h(T ) = ψ(θ), then UMVUE of ψ(θ) is the unique UE, h(T )

(b) If h∗ is an UE of ψ(θ), then E(h∗ | T ) is the UMVUE of ψ(θ)

UMVUE of Different Families


Binomial
n
iid P
Xi ∼ Bernoulli (p) =⇒ Complete Sufficient: T = Xi
i=1

T
(1) p = E(X1 ) : n

T (n−T )
(2) p(1 − p) = V ar(X1 ) : n(n−1)

(T )r T (T −1)···(T −r+1)
(3) pr : (n)r
= n(n−1)···(n−r+1)
, r = 1, 2, . . . , n

n
iid P
Xi ∼ Bin (n, p) =⇒ Complete Sufficient: T = Xi
i=1
X̄n T
→ p: = 2
n n

Poisson
n
iid P
Xi ∼ Poisson (λ) =⇒ Complete Sufficient: T = Xi
i=1

(T )r
(1) λr : nr
, r = 1, 2, . . .
k T (n−1)T −k
(2) e−k λk! = P (X1 = k) :

k nT

k T
(3) e−kλ = P (X1 = 0, X2 = 0, . . . , Xk = 0) : 1 −

n
, 1≤k<n

Geometric
n
iid P
Xi ∼ Geo (p) =⇒ Complete Sufficient: T = Xi
i=1
n−1
→ p = P (X1 = 0) : n−1+T

Uniform
iid
(1) Discrete: Xi ∼ U{1, 2, . . . , N } =⇒ Complete Sufficient: T = X(n)
T n+1 −(T −1)n+1
→N : T n −(T −1)n

iid
(2) Continuous: Xi ∼ U(0, θ) =⇒ Complete Sufficient: T = X(n)
n ′ o
→ ψ(θ) : T ψn(T ) + ψ(T ) ψ(θ) = θr : n+r
  r
n
T
iid 
Also if, Xi ∼ U(θ1 , θ2 ) =⇒ Complete Sufficient: T = X(1) , X(n)
nX(1) −X(n) nX(n) −X(1)
→ θ1 : n−1
θ2 : n−1

Gamma
n
iid P
Xi ∼ Exp (θ) =⇒ Complete Sufficient: T = Xi
i=1

T
(1) θ = E(X1 ) : n
1 n−1
(2) θ
: T
k k n−1
(3) P (X1 > k) = e− θ : 1 −

T
, if k < T
k (n−1) k n−2
(4) f (k; θ) = 1θ e− θ :

T
1− T
, if k < T
 n n

iid P P
Xi ∼ Gamma (p, θ) =⇒ Complete Sufficient: ln Xi , Xi (mean = p θ)
i=1 i=1

Γ(np)
For known p, θr : T r , r > −np
Γ(np + r)
iid σ0
Xi ∼ Shifted Exp(θ, σ0 ) =⇒ Complete Sufficient: X(1) → θ : X(1) −
n
n Γ(n)
iid
|Xi − µ0 | → σ r : T r , r > −n
P
Xi ∼ DE(µ0 , σ) =⇒ Complete Sufficient: T =
i=1 Γ(n + r)

Beta
 n n

iid P P
Xi ∼ Beta (θ1 , θ2 ) =⇒ Complete Sufficient: ln Xi , ln(1 − Xi )
i=1 i=1
n−1
For θ2 = 1, UMVUE of θ1 is n
P
− ln Xi
i=1

Normal
iid
n σ02
Xi ∼ N (µ, σ02 ) =⇒ Complete Sufficient: µ2 : X̄ 2 −
P
Xi or X̄ → µ : X̄
i=1 n
n
iid
Xi ∼ N (µ0 , σ 2 ) =⇒ Complete Sufficient: (Xi − µ0 )2 or S02 → σ r : S0r Kn,r , r > −n
P
i=1

 n n

iid 
Xi ∼ N (µ, σ 2 ) =⇒ Complete Sufficient: Xi2 or X̄, S 2
P P
Xi ,
i=1 i=1

(1) µ = E(X1 ) : X̄
" r
#
(n − 1) 2 Γ( n−1
2
)
(2) σ 2 : S 2 , σ r : S r Kn−1,r , r > −(n − 1) Kn−1,r = r n−1+r
2 2 Γ( 2 )
µ
(3) : X̄ · Kn−1,−r S −r , r < (n − 1)
σr
(4) pth quantile of X1 = ξp = µ + σ Φ−1 (p) : X̄ + Kn−1,1 S Φ−1 (p)
h i
iid
Xi ∼ N (θ, 1) ϕ(x; µ, σ 2 ) : PDF of N (µ, σ 2 )
 r 
n
(1) Φ(k − θ) = P (X1 ≤ k) : Φ (k − X̄)
n−1
(2) ϕ k; θ, 1) : ϕ(k; X̄, n−1

n

h2
(3) ehθ : ehX̄− 2n

Others
n n
iid Q P
Xi ∼ Pareto(θ, x0 ) =⇒ Complete Sufficient: Xi or ln Xi (x0 known)
i=1 i=1
P  Xi  r
n 
1 Γ(n)
→ r : ln x0 , r > −n
θ Γ(n + r) i=1
n−1
Special case: r = −1 =⇒ θ : P
n  
ln Xi
x0
i=1

iid
Xi ∼ Pareto(θ0 , α) =⇒ Complete Sufficient: X(1) (θ0 known)
 
r r r
 
→α : 1− X(1) if r < nθ0
nθ0

2.1.7 Cramer-Rao Inequality


Let X has a distribution from {f (x; θ) : θ ∈ Ω} satisfying the following regularity conditions -
(i) The parameter space, Ω is an open interval in R i.e. Ω = {θ : a < θ < b}

(ii) The support, S = {x : f (x; θ) > 0} is independent of θ



 
(iii) For each x ∈ S, ∂θ ln f (x; θ) exists and finite
P R
(iv) The identity “ f (x; θ) = 1” or “ f (x; θ)dx = 1” can be differentiated under the summation
x∈S S
or integral sign.
n     o
(v) T ∈ Uψ = T (X) : Eθ T (X) = ψ(θ), V arθ T (X) < ∞, ∀θ ∈ Ω is an UE of ψ(θ) such that
 
the derivative of ψ(θ) = E T (X) with respect to θ can be evaluated by differentiating under
the summation or integral sign.
′ (θ)}2 2
Then, V ar T (X) ≥ {ψI(θ)
  ∂
where I(θ) = E ∂θ ln f (x; θ) >0

Equality Case
‘=’ holds in CR Inequality iff -
∂  I(θ)
ln f (x; θ) = ± ′ {T − ψ(θ)} a.e. . . . . . . (∗)
∂θ ψ (θ)
⇐⇒ the family {f (x; θ) : θ ∈ Ω} belongs to OPEF
→ (∗) is the necessary and sufficient condition for attaining CRLB by an UE, T (X) of ψ(θ).

Remarks
 
(1) Even in OPEF, the only parametric function for which T (X) attains CRLB, is that E T (X)
′ (θ)
(2) If MVBUE T (X) of ψ(θ) exists, then it is given by, T (X) = ψ(θ) ± ψI(θ) ∂

· ∂θ ln f (X; θ)
MVBUE is also the UMVUE but UMVUE may not be MVBUE always -
ˆ Non-regular case: one of the regularity conditions does not hold, eg. {U(0, θ) : θ > 0}
ˆ If all the regularity conditions hold but CRLB is not attainable, then there may exist
UMVUE but that is not the MVBUE
(3) Fisher’s Information
∂ 2 h 2 i

(a) I(θ) = E ∂θ ln f (X; θ) = E − ∂θ2 ln f (X; θ)

(b) IX (θ) = n · IX1 (θ), if the regularity conditions hold


˜
iid
(c) X ∼ {f (x; θ) : θ ∈ Ω} =⇒ for any statistic T (X), IT (X) (θ) ≤ IX (θ)
˜ ˜ ˜ ˜
‘=’ holds if and only if T (X) is sufficient
˜ ∂ 2
E(T ) {ψ ′ (θ) + b′ (θ)}2
(4) Lower bound for the MSE of any estimator: MSEψ(θ) (T ) ≥ ∂θ =
I(θ) I(θ)
(5) {C(θ, 1) : θ ∈ R} is a regular family as the CR inequality holds, but CRLB is not attainable

2.1.8 Methods of Estimation


Method of Moments
If the sample drawn is a good representation of the population, then this method is quite reasonable.
Equate ‘k’ sample moments m′r with corresponding population moments µ′r and solve for k unknowns
for a k-parameter family.
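
A short method-of-moments sketch for the Gamma(n, θ) family (mean nθ, variance nθ²); the true parameter values, sample size and seed below are arbitrary illustrative choices.

import numpy as np

# Equate the first two sample moments to the population moments and solve.
rng = np.random.default_rng(4)
x = rng.gamma(shape=3.0, scale=2.0, size=50_000)     # true n = 3, theta = 2

m1, m2 = x.mean(), x.var()                           # sample mean and variance
theta_hat = m2 / m1                                  # theta = Var/Mean
n_hat = m1 / theta_hat                               # n = Mean/theta

print(n_hat, theta_hat)                              # close to 3 and 2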

Method of Least Squares


Here we minimize the sum of squares of errors with respect to the parameter (θ1 , θ2 , . . . , θk )

Model: yi = E(Y | X = xi ) + zi

Assumptions: Conditional distribution of Y | X = xi is homoscedastic.

Method of Maximum Likelihood


(1) Bernoulli (p)

(a) p ∈ (0, 1) =⇒ No MLE of p when x = 0 or x = 1, else X̄


˜ ˜ ˜ ˜
(b) Ω = p : p ∈ {Q′ ∩ [0, 1]} =⇒ No MLE of p ∈ Ω


iid
(2) Xi ∼ U(0, θ), θ > 0 =⇒ θ̂ = X(n)
 
iid
(3) Xi ∼ U(α, β), α < β =⇒ θˆ = α̂, β̂ = X(1) , X(n)

˜
(4) MLE is not unique
iid
Xi ∼ U(θ − k1 , θ + k2 ) =⇒ θ̂ = α(X(n) − k2 ) + (1 − α)(X(1) + k1 ), α ∈ [0, 1]

iid 
(5) Xi ∼ U(−θ, θ), θ > 0 =⇒ θ̂ = max {|Xi |} = max −X(1) , X(n)
i=1(1)n

   n

iid 2 2 1
P 2
(6) Xi ∼ N (µ, σ ) =⇒ µ̂, σ = X̄, n (Xi − X̄)
b
i=1

iid X̄
(7) Xi ∼ Gamma (p0 , θ) =⇒ θ̂ = p0
(p0 known)
iid n
(8) Xi ∼ Beta (θ, 1) =⇒ θ̂ = n
P
− ln Xi
i=1

 
 
iid
(9) Xi ∼ Pareto (x0 , θ) =⇒ xˆ0 , θ̂ = X(1) , n
n 
P Xi
ln X(1)
i=1

 n

iid 1
P
(10) Xi ∼ DE (µ, σ) =⇒ (µ̂, σ̂) = X̃, n |Xi − X̃|
i=1

iid 
(11) Xi ∼ Shifted Exp (µ, σ) =⇒ (µ̂, σ̂) = X(1) , X̄ − X(1)
In particular if µ = σ > 0, then µ̂ = X(1)

iid 1 3

(12) Truncated parameter: Xi ∼ Bernoulli (p), p ∈ ,
4 4
. Here, the MLE of p is -
1 1


 , if X̄ <


 4 4
1 3

p̂(X) = X̄, if ≤ X̄ ≤
˜ 
 4 4
 3 , if 3


X̄ >

4 4
1 3
It is better than the UMVUE, X̄ of p ∈ 4 , 4 , in terms of variability

Properties
(13) MLE, if exists is a function of (minimal) sufficient statistic
(14) Under the regularity conditions of CR inequality MVBUE exists, then that is the MLE
(15) Invariance property: θ̂ is the MLE of θ =⇒ h(θ̂) is the MLE of h(θ) for any function h(·)
(16) For large n, the bias of MLE become insignificant
(17) Under normality, LSE ≡ MLE.
(18) Asymptotic property
(a) Under certain regularity conditions, the MLE θ̂ of θ is consistent and also

    θ̂ ∼a N(θ, 1/{n I1(θ)}) = N(θ, 1/In(θ))  or  √n (θ̂ − θ) ∼a N(0, 1/I1(θ))

(b) In OPEF, let θ̂ be the MLE of θ; then -

    √n (θ̂ − θ) ∼a N(0, 1/I1(θ)) =⇒ √n {ψ(θ̂) − ψ(θ)} ∼a N(0, {ψ′(θ)}²/I1(θ))
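
A small numerical sketch of maximum likelihood for N(µ, σ²), checking the closed-form MLEs of item (6) against a numerical maximiser; the sample, the reparameterisation in log σ and the use of scipy.optimize.minimize are illustrative choices, not part of the original text.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(loc=1.5, scale=2.0, size=500)         # illustrative sample

def neg_loglik(par):
    mu, log_sigma = par                              # optimise over log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2))

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, sigma_hat**2)                          # numerical MLE of (mu, sigma^2)
print(x.mean(), x.var())                             # closed form: X_bar, (1/n) sum (X_i - X_bar)^2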

2.2 Testing of Hypothesis


2.2.1 Tests of Significance
Univariate Normal
(1) H0 : µ = µ0 against H1 : µ ̸= µ0
n √ o √ 
(a) σ = σ0 (known) → ω = x : n(x̄−µ σ0
0)
> τ α
2
n(x̄−µ0 )
σ0
> τα if H1 : µ > µ0
˜
n √ o n
(b) σ unknown → ω = x : n(x̄−µ 0) 2 1
(Xi − X̄)2
P
s
> t α
2
; n−1 , s = n−1
˜ i=1

(2) H0 : σ = σ0 against H1 : σ ̸= σ0
n o
ns20 2 2
(a) µ = µ0 (known), Z = σ02
→ ω = Zobs > χ α ; n or Zobs < χ1− α ; n
2 2
n o
(n−1)s2
(b) µ unknown, Z = σ02
→ ω = Zobs > χ2α ; n−1 or Zobs < χ21− α ; n−1
2 2

Two Independent Univariate Normal


(1) H0 : µ1 − µ2 = ξ0 (known) against H0 : µ1 − µ2 ̸= ξ0

(a) σ1 , σ2 are known Z = X̄r1 −σ2X̄2 −ξ



σ 2
0
→ ω = |Zobs | > τ α2
1+ 2
n1 n2

1 −X̄2 −ξ0
X̄q
 (n1 −1)s21 +(n2 −1)s22
(b) σ1 = σ2 = σ (unknown), Z = → ω = |Zobs | > t α2 ; n1 +n2 −2 , s2 = n1 +n2 −2
s n1 + n1
1 2

σ1 σ1
(2) H0 : σ2
= ξ0 (known) against H1 : σ2
̸= ξ0
n o
s210 1 1
(a) µ1 , µ2 are known, F = · → ω = Fobs > F 2 ; n1 ,n2 or Fobs > F 2 ; n2 ,n1
s220 ξ02
α α

n o
s21 1 1
(b) µ1 , µ2 are unknown F = s2 · ξ2 → ω = Fobs > F 2 ; n1 −1,n2 −1 or Fobs > F 2 ; n2 −1,n1 −1
α α
2 0

Bivariate Normal (Correlated Case)


n √ o
n(x̄−ȳ−ξ0 )
(1) H0 : µ1 − µ2 = ξ0 (known) → ω = (x, y) : sxy
> t 2 ; n−1 , s2xy = s2x + s2y + 2rsx sy
α
˜ ˜
n √ o
(2) H0 : ρ = 0 → ω = r√1−r n−2
2 > t α2 ; n−2 [r: sample correlation coefficient of (x, y)]
˜ ˜
 √ 
U = X + ξ0 Y
(3) H0 : σσ21 = ξ0 → ω = r√ uv n−2
> t α2 ; n−2
2
1−ruv V = X − ξ0 Y

Binomial Proportion
(I) Single Proportion - H0 : p = p0 , observed value: x0

(a) H1 : p > p0 , p-value = P1 = PH0 (X ≥ x0 )


(b) H1 : p < p0 , p-value = P2 = PH0 (X ≤ x0 )
(c) H1 : p ̸= p0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

(II) Two Proportions - H0 : p1 = p2 = p, observed value of X1 : x10 and X1 + X2 : x0

(a) H1 : p1 > p2 , p-value = P1 = PH0 (X1 ≥ x10 | X1 + X2 = x0 )


(b) H1 : p1 < p2 , p-value = P2 = PH0 (X1 ≤ x10 | X1 + X2 = x0 )
(c) H1 : p1 ̸= p2 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

Poisson Mean
n
P
(I) Single Population - H0 : λ = λ0 , observed value of S = X i : s0
i=1

(a) H1 : λ > λ0 , p-value = P1 = PH0 (S ≥ s0 )


(b) H1 : λ < λ0 , p-value = P2 = PH0 (S ≤ s0 )
(c) H1 : λ ̸= λ0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

n1
P
(II) Two Populations - H0 : λ1 = λ2 = λ, observed value of S1 = X1i : s10 and S1 + S2 : s0
i=1

(a) H1 : λ > λ0 , p-value = P1 = PH0 (S1 ≥ s10 | S1 + S2 = s0 )


(b) H1 : λ < λ0 , p-value = P2 = PH0 (S1 ≤ s10 | S1 + S2 = s0 )
(c) H1 : λ ̸= λ0 , p-value = P3 = 2 · min{P1 , P2 } (Reject H0 if p-value ≤ α)

2.3 Interval Estimation



T : sufficient statistic and θ1 (T ), θ2 (T ) is a confidence interval with confidence coefficient (1 − α)
  
=⇒ P θ1 (T ), θ2 (T ) ∋ ψ(θ) = 1 − α ∀θ ∈ Ω

2.3.1 Methods of finding C.I.


Find a function ϕ(T, θ), whose sampling distribution is completely specified. This is the pivot.
Then find c1 , c2 based on the restriction: Pθ [c1 < ϕ(T, θ) < c2 ] = 1 − α

Note
For a parameter θ, the method of guessing θ is known as estimation and an interval estimate of θ
is known as confidence interval for θ.
For a R.V. Y , a method of guessing Y is known as prediction and an interval prediction of Y is
known as prediction limits.
 q 
iid
Xi ∼ N (µ, σ 2 ), i = 1(1)n =⇒ Prediction limits for Xn+1 : X̄ ∓ t α2 ; n−1 n+1
n
s

2.3.2 Wilk’s Optimum Criteria



Definition: A (1 − α) level confidence interval θ∗ (T ), θ (T ) of θ ∈ Ω, is said to be shortest length


confidence interval, in the class of all level (1 − α) confidence intervals based on a pivot ψ(T, θ), if

Eθ θ∗ (T ) − θ (T ) ≤ Eθ θ(T ) − θ(T ) , ∀θ ∈ Ω
   


whatever the other (1 − α) level confidence interval θ(T ), θ(T ) based on ψ(T, θ).

2.3.3 Test Inversion Method


Let A(θ0 ) be the “Acceptance Region” of a size ‘α’ test of H0 : θ = θ0 . Define,

I(x) = {θ ∈ Ω : A(θ) ∋ x} , x ∈ X
˜ ˜
then I(x) is a confidence interval for θ at confidence coefficient (1 − α).
˜

List of Confidence Intervals

(1) Xi iid ∼ U(0, θ): Pivot = X(n)/θ =⇒ (X(n), X(n)/α^{1/n})

(2) Xi iid ∼ Shifted Exp(µ, σ0): Pivot = (n/σ0)[X(1) − µ] =⇒ (X(1) + (σ0/n) ln α, X(1)) (finite length)
    Infinite length: (−∞, X(1) + (σ0/n) ln(1 − α))   (σ0 known)

(3) Xi iid ∼ N(µ, σ²): Pivot = √n (X̄ − µ)/S =⇒ (X̄ ∓ t_{α/2; n−1} S/√n)

(4) Xi iid ∼ Exp(θ): Pivot = (2/θ) ∑_{i=1}^n Xi = 2T/θ ∼ χ²_{2n} =⇒ (2T/χ²_{α/2; 2n}, 2T/χ²_{1−α/2; 2n})

    Based on X(1): Pivot = 2nX(1)/θ ∼ χ²_2 =⇒ (2nX(1)/χ²_{α/2; 2}, 2nX(1)/χ²_{1−α/2; 2})
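
A quick sketch of interval (4) above for the exponential mean, using scipy for the chi-square quantiles; the true θ, sample size, α and seed are arbitrary illustrative choices.

import numpy as np
from scipy import stats

# 2T/theta ~ chi^2_{2n} with T = sum X_i gives the equal-tailed interval
# (2T / chi^2_{alpha/2; 2n}, 2T / chi^2_{1-alpha/2; 2n}), upper points as in the text.
rng = np.random.default_rng(6)
theta, n, alpha = 3.0, 20, 0.05
x = rng.exponential(theta, size=n)
T = x.sum()

lower = 2 * T / stats.chi2.ppf(1 - alpha / 2, 2 * n)   # chi^2_{alpha/2; 2n} is the upper alpha/2 point
upper = 2 * T / stats.chi2.ppf(alpha / 2, 2 * n)
print(lower, upper)                                    # interval should cover theta = 3 about 95% of the time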

2.4 Large Sample Theory


2.4.1 Modes of Convergence
(I) Convergence in Distribution
Definition: A sequence {Xn } of random variables with the corresponding sequence Fn (x) of D.F.s
is said to converge to a random variable X with D.F. F (x), if

lim Fn (x) = F (x), at every continuity point of F(x)


n→∞

Results
iid  D D
(1) Xn ∼ U(0, θ) =⇒ n θ − X(n) −→ Exp (θ) ←− nX(1)
iid  D
(2) Xn ∼ Shifted Exp (0, θ) =⇒ n X(n) − θ −→ Exp (1)
iid D
(3) Xn ∼ N (µ, σ 2 ) =⇒ X̄ −→ µ
D
(4) Xn ∼ Geo (pn = nθ ) =⇒ Xn
n
−→ Exp ( 1θ )
D
(5) X ∼ NB (n, p) =⇒ 2pX −→ χ22n as p → 0
Xn D
(6) Xn ∼ Gamma (n, β) =⇒ n
−→ β

Limiting MGF
D
(7) MGF → Xn : Mn (t), X : M (t), E(Xn ) exists ∀ n and Xn −→ X
If lim Mn (t), lim E(Xn ) is finite then MN (t) → M (t), E(Xn ) → E(X) as n → ∞
n→∞ n→∞

(8) Theorem: Let, {Fn } be a sequence of D.F.s with corresponding M.G.F.s {Mn } and suppose
that Mn (t) exists for |t| ≤ t0 , ∀n. If there exists a D.F. F with corresponding M.G.F. M ,
which exists for |t| ≤ t1 < t0 , such that Mn (t) → M (t) as n → ∞ for every t ∈ [−t1 , t1 ], then
W
Fn −→ F

(II) Convergence in Probability


Definition: Let, {Xn } be a sequence of R.V.s defined on the probability space (Ω, A, P). Then we
say that {Xn } converges in probability to a R.V. X, defined on (Ω, A, P), if for every ϵ > 0,

lim P [|Xn − X| < ϵ] = 1 or lim P [|Xn − X| > ϵ] = 0


n→∞ n→∞

Sufficient Condition: If {Xn } is a sequence of R.V.s such that E(Xn ) → C and V ar(Xn ) → 0 as
P
n → ∞ or E(Xn − C)2 → 0 as n → ∞, then Xn −→ C.
Counter example:

1

1 −
 ,x = k
P (Xn = x) = n =⇒ E(Xn − k)2 ̸→ 0 as n → ∞ but Xn −→ k
P
1
,x = k + n


n

Results
(1) Xi iid ∼ U(0, 1) =⇒ (∏_{i=1}^n Xi)^{1/n} →P 1/e  (see the simulation sketch at the end of this list)

P P
(2) Xn −→ X, lim an = a ∈ R =⇒ an Xn −→ aX
n→∞

D P
(3) Xn −→ X, lim an = ∞, an > 0 ∀n =⇒ a−1
n Xn −→ 0
n→∞

P P
(4) Limit Theorems: If Xn −→ X, Yn −→ Y , then,
P
(a) Xn ± Yn −→ X ± Y
P
(b) Xn Yn −→ XY
P
(c) g(Xn ) −→ g(X), if g(·) is continuous (Invariance Property)
(d) Xn/Yn →P X/Y, provided P(Yn = 0) = 0 = P(Y = 0)
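
A simulation sketch of result (1) of this list: the geometric mean of iid U(0, 1) variables settles near 1/e as n grows; the sample sizes and seed are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(7)
for n in (10, 100, 10_000):
    x = rng.uniform(size=n)
    print(n, np.exp(np.mean(np.log(x))))   # -> exp(E ln U) = exp(-1) ~ 0.3679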
Chapter 3

Mathematics

3.1 Basics
3.1.1 Combinatorial Analysis
(1) For a population with n elements, the number of samples of size r is -
(
nr , WR
ordered sample = n
Pr or (n)r , WOR
 
n n
unordered sample = Cr or , WOR
r

(2) Partition of population - The number of ways in which a population of n elements can be
divided into k ordered parts of which ith part consists of ri members, i = 1, 2, . . . , k is -
    
n n − r1 n − r1 − r2 − · · · − rk−1
··· , r1 + r2 + · · · + rk = n
r1 r2 rk
 
n! n
= =
r1 !r2 ! · · · rk ! r1 r2 · · · rk

(a) The number of different distributions of r identical balls into n cells i.e. the number of
different solutions (r1 , r2 , . . . , rn ) of the equation:
   
n+r−1 n+r−1
r1 + r2 + · · · + rn = r where, ri ≥ 0, are integers, is =
r n−1

(b) The number of different distributions of r indistinguishable balls into n cells in which no
cell remains empty i.e. the number of different solutions (r1 , r2 , ..., rn ) of the equation:
 
r−1
r1 + r2 + · · · + rn = r where, ri ≥ 1, are integers, is
n−1

n 2 n 2 n 2 2n
   
(3) (a) 0
+ 1
+ · · · + n
= n
n
 n n
 n  n
 n 2n

(b) 0 m
+ 1 m+1 + · · · + n−m n
= n−m


3.1.2 Difference Equation


{xn } is a sequence, xn = f (xn−1 , . . . , x2 , x1 ) is difference equation
A.P.: xn = xn−1 + d
G.P.: xn = r xn−1
xn = a1 xn−1 + · · · + ap xn−p is a linear difference equation of order p.

First Order Linear Difference Equation


xn = axn−1 + b, n ≥ 1
=⇒ xn − c = (x1 − c)an−1 , b = c(1 − a)

Second Order Linear Difference Equation


xn = axn−1 + bxn−2 , n ≥ 2
Characteristic Equation: u2 − au − b = 0 with roots u1 , u2
Case I: u1 ̸= u2 =⇒ xn = Aun1 + Bun2
Case II: u1 = u2 = u =⇒ xn = (A + Bn)un
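
A short sketch of the second-order case with distinct roots, using the Fibonacci-type recurrence x_n = x_{n-1} + x_{n-2} as an illustrative choice; the constants A, B are fixed from x_0 and x_1.

import numpy as np

a, b = 1.0, 1.0                      # x_n = a x_{n-1} + b x_{n-2}
x0, x1 = 0.0, 1.0

u1, u2 = np.roots([1.0, -a, -b])     # roots of the characteristic equation u^2 - a u - b = 0
A, B = np.linalg.solve([[1.0, 1.0], [u1, u2]], [x0, x1])   # match x_0 and x_1

for n in range(2, 8):
    closed = A * u1 ** n + B * u2 ** n
    print(n, round(closed.real, 6))  # 1, 2, 3, 5, 8, 13 for this Fibonacci choice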

3.2 Linear Algebra


3.2.1 Vectors & Vector Spaces
r n n
a2i and
P P
(1) Length of a = ai = 1.a where, 1 = {1, 1, . . . , 1}
˜ i=1 ˜˜ i=1 ˜
rn
p P
(2) Distance between a and b = |b − a| = (b − a).(b − a) = (bi − ai )2
˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ i=1

a.b
(3) Angle (θ) between two non-null vectors a and b is given by, cos θ = ˜ ˜
˜ ˜ |a| |b|
˜ ˜
2 2 2
(4) Cauchy-Schwarz: (a.b) ≤ |a| |b| ‘=’ holds iff b = λa for some λ
˜˜ ˜ ˜ ˜ ˜
(5) Triangular Inequality: |a − b| + |b − c| ≥ |a − c|
˜ ˜ ˜ ˜ ˜ ˜

Vector Spaces
A vector space V over a field F is a non-empty collection of elements, satisfying the following
axioms:

(a) a, b ∈ V =⇒ a + b ∈ V [closed under vector addition]

(b) a ∈ V =⇒ αa ∈ V, ∀ α ∈ F [closed under scalar multiplication]

Note: Every vector space must include the null vector 0.
Useful Results

(1) “\sum_{i=1}^{r} λ_i a_i = 0 \implies λ_i = 0 \; ∀i” ⇐⇒ {a_1, a_2, . . . , a_r} are Linearly Independent.
    The set is L.D. iff ∃ λ_i, not all 0, for which \sum_{i=1}^{r} λ_i a_i = 0

(2) A set of vectors is LIN =⇒ any subset is LIN
    A set of vectors is LD =⇒ any superset is LD

(3) Basis Vector: (i) Spans V (ii) LIN
    (a) {1, t, t^2, . . . , t^n} : polynomials of degree ≤ n
    (b) {e_1, e_2, . . . , e_n} : n dimensional vector space

(4) The representation of every vector in terms of a basis is unique.

(5) Replacement theorem: {a_1, a_2, . . . , a_r} is a basis and b = \sum_{i=1}^{r} λ_i a_i.
    If λ_i ≠ 0, then replacing a_i by b gives {a_1, . . . , a_{i−1}, b, a_{i+1}, . . . , a_r}, which is also a basis.

(6) (a) Any basis of the space V_n contains exactly n vectors
    (b) Any n LIN vectors from V_n form a basis for V_n
    (c) Any set of (n + 1) vectors from V_n is LD

(7) Extension theorem: A set of m(< n) LIN vectors from V_n can be extended to a basis of V_n

(8) Dimension: Number of vectors in a basis, or maximum number of LIN vectors in the space
    Dimension of a subspace: {Total no. of vectors} − {No. of LIN restrictions}

(9) V = {0} has no basis =⇒ dim(V) is undefined [We assume dim(V) = 0]

(10) Consider two subspaces S, T of a vector space V over the field F -
    (a) S ∩ T is also a subspace and dim(S ∩ T) ≤ min{dim(S), dim(T)} ≤ \sqrt{dim(S) \, dim(T)}
    (b) S + T = {a + b : a ∈ S, b ∈ T} =⇒ dim(S + T) = dim(S) + dim(T) − dim(S ∩ T)
    (c) S ⊆ T =⇒ dim(S) ≤ dim(T) and dim(S) = dim(T) ⇐⇒ S = T
        In general, dim(S) = dim(T) ⇏ S = T

Orthogonal Vectors
(11) a, b ∈ E^n are orthogonal (⊥) if a·b = 0 [0 is orthogonal to every vector]

(12) The set of vectors {a_1, a_2, . . . , a_n} is mutually orthogonal if a_i·a_j = 0, ∀ i ≠ j

(13) If a mutually orthogonal set includes the null vector then it is LD; otherwise it is LIN

(14) For a ∈ E^n and a subspace S_n ⊆ E^n: a ⊥ S_n ⇐⇒ a is orthogonal to a basis of S_n

(15) Ortho-complement of S_n, O(S_n): the collection of all vectors in E^n which are orthogonal to S_n

(16) (a) S_n ∩ O(S_n) = {0}  (b) S_n + O(S_n) = E^n  (c) O{O(S_n)} = S_n
     where S_n is a subspace of E^n

(17) S ⊕ T denotes the direct sum of S and T, where S + T = {x + y : x ∈ S, y ∈ T}

(18) S + T is a direct sum (S ⊕ T) ⇐⇒ “S ∩ T = {0}” ⇐⇒ “If x ∈ S, y ∈ T, x, y ≠ 0 then {x, y} is LIN”
     ⇐⇒ “x + y = 0 =⇒ x = y = 0 for x ∈ S, y ∈ T” ⇐⇒ “dim(S + T) = dim(S) + dim(T)”

(19) Subspaces S, T of V are said to be complements if S ⊕ T = V

(20) M_{n×n}(R) : vector space of all (n × n) real matrices
     S : (n × n) symmetric matrices, T : (n × n) skew-symmetric matrices
     =⇒ S, T are subspaces of M_{n×n} and S ⊕ T = M_{n×n}, dim(S) = \frac{n(n + 1)}{2}, dim(T) = \frac{n(n − 1)}{2}
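A short numerical illustration (assumed example) of (20): every real square matrix splits uniquely into a symmetric plus a skew-symmetric part, and the two subspaces have dimensions n(n+1)/2 and n(n−1)/2. The matrix size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))
S = (M + M.T) / 2          # symmetric part
T = (M - M.T) / 2          # skew-symmetric part

assert np.allclose(S, S.T) and np.allclose(T, -T.T) and np.allclose(S + T, M)
print("dim(S) =", n * (n + 1) // 2, " dim(T) =", n * (n - 1) // 2)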

3.2.2 Matrices
(1) Consider a matrix A_{m×n}
    (a) Row Space: R(A) = {x'A : x ∈ R^m}
    (b) Column Space: C(A) = {Ax : x ∈ R^n}
    (c) Null Space: N(A) = {x ∈ R^n : Ax = 0}
    (d) Left Null Space: N'(A) = {x ∈ R^m : x'A = 0'}

(2) N(A) = O{R(A)} =⇒ dim N(A) = n − dim R(A) = nullity of A

(3) Am×n , B n×p ∋ AB = O =⇒ C(B) ⊆ N (A) =⇒ r(A) + r(B) ≤ n

(4) An×n ∋ A2 = A =⇒ C(In − A) = N (A)

Rank
(5) r (Am×n ) ≤ min{m, n}

(6) r(AB) ≤ min{r(A), r(B)} if AB is defined

(7) r(AB) = r(A), if det B ̸= 0

(8) A2 = A ⇐⇒ r(A) + r(In − A) = n

(9) r(A + B) ≤ r(A) + r(B)

(10) r(A) = r(A′ ) = r(A′ A) = r(AA′ )


(11) r(A) = r =⇒ A = \sum_{k=1}^{r} M_k with r(M_k) = 1, k = 1, 2, . . . , r

(12) r(AB − I) ≤ r(A − I) + r(B − I)

(13) Am×n , B s×n ∋ AB ′ = O =⇒ r(A′ A + B ′ B) = r(A) + r(B)

(14) A2 = A, B 2 = B, det(I − A − B) ̸= 0 =⇒ r(A) = r(B)


 
(15) r\begin{pmatrix} A & B \\ O & C \end{pmatrix} ≥ r(A) + r(C)

(16) Sylvester Inequality: For A_{m×n}, B_{n×p}: r(AB) ≥ r(A) + r(B) − n
     =⇒ r(A) + r(B) − n ≤ r(AB) ≤ min{r(A), r(B)}

(17) r(A + B) ≤ r(A) + r(B) ≤ r(AB) + n, provided AB and (A + B) are defined

(18) r(AB) = r(B) − dim {N (A) ∩ C(B)}

(19) r(x'x) = r(x'y) = r(xy') = 1, where x, y (≠ 0) ∈ R^n
Other Results
(20) Sum of all entries in a matrix A : 1'A1

(21) A^2 = A =⇒ (I_n − A) is also idempotent

(22) C_{m×r} = A_{m×n} B_{n×r} =⇒ C = (c_{ij}), where c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

(23) tr(A + B) = tr(A) + tr(B), tr(AB) = tr(BA)
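A quick numerical check (illustrative sizes and seed) of several rank facts above: the rank-nullity relation in (2), the Sylvester bounds in (16), and r(A'A) = r(A) from (10).

import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(-2, 3, size=(4, 6)).astype(float)
B = rng.integers(-2, 3, size=(6, 5)).astype(float)

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
rAB = np.linalg.matrix_rank(A @ B)
n = A.shape[1]

assert rA + rB - n <= rAB <= min(rA, rB)        # Sylvester bounds (16)
assert np.linalg.matrix_rank(A.T @ A) == rA     # r(A'A) = r(A), item (10)
print("nullity of A =", n - rA)                 # dim N(A) = n - r(A), item (2)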

3.2.3 Determinants
(1) \begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix} = (a − b)(b − c)(c − a)

(2) \begin{vmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{vmatrix}_{n×n} = \{a + (n − 1)b\}(a − b)^{n−1}

(3) \begin{vmatrix} a+b & b+c & c+a \\ b+c & c+a & a+b \\ c+a & a+b & b+c \end{vmatrix} = \begin{vmatrix} a & b & c \\ b & c & a \\ c & a & b \end{vmatrix} × \begin{vmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{vmatrix} = 2 \begin{vmatrix} a & b & c \\ b & c & a \\ c & a & b \end{vmatrix}

(4) Tridiagonal matrix

    A_n = \begin{vmatrix} a & b & 0 & 0 & \cdots & 0 & 0 \\ c & a & b & 0 & \cdots & 0 & 0 \\ 0 & c & a & b & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & c & a \end{vmatrix}_{n×n} =⇒ A_n = a A_{n−1} − bc A_{n−2};
    in particular, if a = 1 + bc, then A_n = 1 + bc + (bc)^2 + \cdots + (bc)^n
 
(5) \begin{vmatrix} x_1^2 + y_1^2 & x_1 x_2 + y_1 y_2 & \cdots & x_1 x_n + y_1 y_n \\ x_2 x_1 + y_2 y_1 & x_2^2 + y_2^2 & \cdots & x_2 x_n + y_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n x_1 + y_n y_1 & x_n x_2 + y_n y_2 & \cdots & x_n^2 + y_n^2 \end{vmatrix}_{n×n} = |A| \, |A'| = 0 \; (n ≥ 3), \quad A = \begin{pmatrix} x_1 & y_1 & 0 & \cdots & 0 \\ x_2 & y_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & y_n & 0 & \cdots & 0 \end{pmatrix}_{n×n}
 
(6) A_{n×n} and B_{n×n} differ only by a single row (or column) =⇒ |A + B| = 2^{n−1} (|A| + |B|)

(7) A = (a_{ij}), B = (b_{ij}) with b_{ij} = r^{i−j} a_{ij} =⇒ |B| = |A|

(8) \begin{vmatrix} I & O \\ O & A \end{vmatrix} = |A|, \quad \begin{vmatrix} I & B \\ O & A \end{vmatrix} = |A| \quad =⇒ \quad \begin{vmatrix} A & B \\ O & C \end{vmatrix} = \begin{vmatrix} I & O \\ O & C \end{vmatrix} × \begin{vmatrix} A & B \\ O & I \end{vmatrix} = |C||A| = |A||C|
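A numerical sanity check (with arbitrary choices of a, b, n, and seed; not part of the sheet) of identity (2) and of identity (6) for two matrices differing in a single row.

import numpy as np

a, b, n = 5.0, 2.0, 4
M = (a - b) * np.eye(n) + b * np.ones((n, n))
assert np.isclose(np.linalg.det(M), (a + (n - 1) * b) * (a - b) ** (n - 1))   # identity (2)

rng = np.random.default_rng(3)
A = rng.normal(size=(n, n))
B = A.copy()
B[2] = rng.normal(size=n)            # A and B differ only in row 2
lhs = np.linalg.det(A + B)
rhs = 2 ** (n - 1) * (np.linalg.det(A) + np.linalg.det(B))
assert np.isclose(lhs, rhs)          # identity (6)
print("determinant identities (2) and (6) verified numerically")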

(9) A_{n×n}: A(adj A) = (adj A)A = |A| I_n
    \sum_{i=1}^{n} a_{ri} A_{si} = \begin{cases} |A|, & r = s \\ 0, & r ≠ s \end{cases} \quad r, s = 1, 2, . . . , n

(10) (a) |adj A| = |A|^{n−1}
     (b) (adj A)^{−1} = adj(A^{−1}) = \frac{A}{|A|}
     (c) adj(adj A) = |A|^{n−2} A
     (d) |adj(adj A)| = |A|^{(n−1)^2}

(11) |kA| = k^n |A|, adj(kA) = k^{n−1} adj A

(12) adj(AB) = adj(B) adj(A) [if |A|, |B| ≠ 0]

(13) Adjoint of a symmetric matrix is symmetric
     Adjoint of a skew-symmetric matrix is symmetric for odd order, skew-symmetric for even order
     Adjoint of a diagonal matrix is diagonal

(14) r(adj A) = \begin{cases} 0, & \text{if } r(A) ≤ n − 2 \\ 1, & \text{if } r(A) = n − 1 \\ n, & \text{if } r(A) = n \end{cases}
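An illustrative check of the adjugate results (10)-(11). Computing adj(A) as det(A)·inv(A) for a nonsingular A is a convenience of this sketch, not a statement from the sheet; the matrix size, seed, and scalar k are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
dA = np.linalg.det(A)
adjA = dA * np.linalg.inv(A)                                       # adj A for nonsingular A

assert np.isclose(np.linalg.det(adjA), dA ** (n - 1))              # |adj A| = |A|^(n-1)
adj_adjA = np.linalg.det(adjA) * np.linalg.inv(adjA)
assert np.allclose(adj_adjA, dA ** (n - 2) * A)                    # adj(adj A) = |A|^(n-2) A
k = 3.0
assert np.allclose(k ** (n - 1) * adjA,
                   np.linalg.det(k * A) * np.linalg.inv(k * A))    # adj(kA) = k^(n-1) adj A
print("adjugate identities verified for a random 4x4 matrix")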

Inverse of a Matrix
(15) (a) (AB)^{−1} = B^{−1} A^{−1}
     (b) (A^{−1})' = (A')^{−1}
     (c) |A + B| ≠ 0 =⇒ |A^{−1} + B^{−1}| ≠ 0 =⇒ (B^{−1} + A^{−1})^{−1} = A(A + B)^{−1} B
(16) A_{n×n} = \begin{pmatrix} A_{11}^{k×k} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
     |A| = \begin{cases} |A_{11}| \, |A_{22} − A_{21} A_{11}^{−1} A_{12}|, & \text{if } |A_{11}| ≠ 0 \\ |A_{22}| \, |A_{11} − A_{12} A_{22}^{−1} A_{21}|, & \text{if } |A_{22}| ≠ 0 \end{cases}
 
(17) M = \begin{pmatrix} A & −u \\ v' & 1 \end{pmatrix}, |A| ≠ 0 =⇒ |M| = |A|(1 + v'A^{−1}u) = |A + uv'|

(18) |A| ≠ 0: |A + uv'| ≠ 0 ⇐⇒ (1 + v'A^{−1}u) ≠ 0, and (Sherman-Morrison)
     (A + uv')^{−1} = A^{−1} − \frac{A^{−1} u v' A^{−1}}{1 + v'A^{−1}u}
     (a numerical check of this formula is sketched after this list)
(19) A_{a,b}^{n×n} = (a − b)I_n + b \, 11' =⇒ A_{a,b}^{−1} = A_{c,d} iff ∆ = (a − b)\{a + (n − 1)b\} ≠ 0,
     where c = \frac{a + (n − 2)b}{∆}, d = −\frac{b}{∆}

(20) A_{n×n} = (a_{ij}), |A| ≠ 0, \sum_{j=1}^{n} a_{ij} = k \; ∀i =⇒ \sum_{j=1}^{n} b_{ij} = \frac{1}{k} \; ∀i, where A^{−1} = (b_{ij})
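A numerical sketch (illustrative size and seed; the diagonal shift is only there to keep A well-conditioned) of item (18), the Sherman-Morrison formula, together with the determinant identity of item (17).

import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)     # invertible, well-conditioned
u = rng.normal(size=(n, 1))
v = rng.normal(size=(n, 1))

Ainv = np.linalg.inv(A)
denom = 1.0 + (v.T @ Ainv @ u).item()
sm_inverse = Ainv - (Ainv @ u @ v.T @ Ainv) / denom            # item (18)

assert np.allclose(sm_inverse, np.linalg.inv(A + u @ v.T))
print("|A + uv'| =", np.linalg.det(A) * denom)                 # item (17): |A|(1 + v'A^{-1}u)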

Orthogonal Matrix

(21) A_{n×n} = \begin{pmatrix} a_1' \\ a_2' \\ \vdots \\ a_n' \end{pmatrix} where a_i' = (a_{i1} \; a_{i2} \; \cdots \; a_{in}), i = 1, 2, . . . , n
     AA' = I_n =⇒ a_i'a_j = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i ≠ j \end{cases} \quad and \quad |A| = ±1

(22) AA' = A'A = I_n, |I_n + A| ≠ 0 =⇒ (I_n + A)^{−1}(I_n − A) is skew-symmetric

(23) A, B real orthogonal with |A| + |B| = 0 =⇒ |A + B| = 0

(24) AA' = kI_n =⇒ A'A = kI_n


Rank & Determinant


(25) Theorem: For a matrix Am×n the rank of A is the order of the “highest order non-vanishing
minor” of A.

(26) A_{n×n}, r(A) = n ⇐⇒ |A| ≠ 0

(27) Elementary Matrices: An elementary matrix is a matrix which differs from the Identity
matrix by a single row (or column) operation.

(28) Elementary Row Operation ≡ Pre-multiplying by corresponding elementary row matrix


Elementary Column Operation ≡ Post-multiplying by corresponding elementary column matrix
 
(29) r(A_{m×n}) = k, ∃ P_{m×m}, Q_{n×n}, |P|, |Q| ≠ 0 ∋ P AQ = \begin{pmatrix} I_k & O \\ O & O \end{pmatrix} (Normal Form)
(30) Any non-singular matrix can be written as a product of elementary matrices

(31) Rank Factorization: Let, r(Am×n ) = k, then a pair (P m×k , Qk×n ) of matrices is said to be a
rank factorization of A if, Am×n = P m×k Qk×n (non-unique way)

(32) Let A_{m×n} = P_{m×k} Q_{k×n} (so that r(A) ≤ k). The following statements are equivalent -

     (a) r(A) = k, i.e. (P, Q) is a rank factorization of A
     (b) r(P_{m×k}) = r(Q_{k×n}) = k
     (c) The columns of P form a basis of C(A) and the rows of Q form a basis of R(A)

(33) A^2 = A, A_{m×m} = P_{m×k} Q_{k×m} where k = r(A) =⇒ (i) QP = I_k (ii) r(A) = tr(A)

(34) r(A | B) = r(A) ⇐⇒ B = AC, for some C
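A sketch of a rank factorization A = PQ as in (31)-(32), built here from the SVD; this is just one of the many non-unique choices mentioned in (31), and the matrix sizes and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 6))   # a 5x6 matrix of rank 3
k = np.linalg.matrix_rank(A)

U, s, Vt = np.linalg.svd(A)
P = U[:, :k] * s[:k]          # m x k
Q = Vt[:k, :]                 # k x n

assert np.allclose(P @ Q, A)
assert np.linalg.matrix_rank(P) == np.linalg.matrix_rank(Q) == k   # item (32b)
print("rank factorization with k =", k)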

3.2.4 System of Linear Equations

System of Linear Equations: A_{m×n} x_{n×1} = b_{m×1}

(1) Homogeneous System: Ax = 0
    (a) Ax = 0 is always consistent, since x = 0 is a trivial solution
    (b) A_{m×n} x = 0 has a non-trivial solution iff r(A) < n
    (c) The no. of LIN solutions of Ax = 0 is dim N(A) = n − r(A)
    (d) Elementary row operations on a matrix A don’t alter N(A)

(2) General System: Ax = b, b ≠ 0
    (a) r(A | b) is either r(A) or r(A) + 1; C(A | b) ⊇ C(A)
    (b) Ax = b is \begin{cases} \text{Consistent} & ⇐⇒ r(A | b) = r(A) \\ \text{Inconsistent} & ⇐⇒ r(A | b) > r(A) \end{cases}
    (c) A_{m×n} x = b consistent: \begin{cases} \text{Unique solution} & ⇐⇒ r(A) = n \\ \text{At least two solutions} & ⇐⇒ r(A) < n \end{cases}

    (d) Ax_1 = Ax_2 = b =⇒ αx_1 + (1 − α)x_2 is also a solution, for any α
        =⇒ If a system has two distinct solutions then it has infinitely many solutions

(3) Theorem: Let Ax = b be a consistent system with x_0 as a particular solution. Then the set of all
    possible solutions of Ax = b is given by x_0 + N(A) = {x_0 + u : u ∈ N(A)}

• Points, lines, planes not necessarily passing through the origin are called ‘flats’. If W is a non-empty
flat and x_0 is a fixed vector, then the translation of W by x_0 is x_0 + W = {x_0 + w : w ∈ W}, and it is
a flat parallel to W.
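An illustrative sketch of the theorem in (3): for a consistent system Ax = b, every x_0 + u with u ∈ N(A) is again a solution. The particular matrix, right-hand side, and the use of lstsq/SVD for a particular solution and a null-space basis are implementation choices of this sketch.

import numpy as np

A = np.array([[1., 2., 1., 0.],
              [0., 1., 1., 1.],
              [1., 3., 2., 1.]])          # rank 2, so dim N(A) = 4 - 2 = 2
b = np.array([3., 2., 5.])                 # chosen so the system is consistent

x0, *_ = np.linalg.lstsq(A, b, rcond=None)         # one particular solution
assert np.allclose(A @ x0, b)                       # consistency check

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
N = Vt[r:].T                                        # columns span N(A)
u = N @ np.array([1.5, -2.0])                       # an arbitrary element of N(A)
assert np.allclose(A @ (x0 + u), b)                 # x0 + u is again a solution
print("dim N(A) =", A.shape[1] - r)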
