
SYNTHESIS ON FORMULAS OF PROBABILITY AND STATISTICS

1 PROBABILITY PART
1.1 Probability formulas
Addition Rule and Multiplication Rule

• P(A + B) = P(A) + P(B) − P(A.B) ; P(A + B + C) = P(A) + P(B) + P(C) − P(A.B) − P(B.C) − P(A.C) + P(A.B.C)
• P(A1 + A2 + ... + An) = Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai.Aj) + Σ_{i<j<k} P(Ai.Aj.Ak) − ... + (−1)^{n−1}.P(A1.A2.A3...An)
• P(A.B) = P(A/B).P(B) = P(B/A).P(A)
• P(A1.A2...An) = P(A1).P(A2/A1).P(A3/A1.A2)...P(An/A1.A2...An−1).

Total probability - Extended form, Bayes’s Formula - Extended form


• P(A) = P(B1).P(A/B1) + P(B2).P(A/B2) + ... + P(Bn).P(A/Bn) = Σ_{i=1}^{n} P(Bi).P(A/Bi)
• P(Bj/A) = P(Bj.A)/P(A) = P(Bj).P(A/Bj) / Σ_{i=1}^{n} P(Bi).P(A/Bi), j = 1, 2, ..., n and P(A) ≠ 0.
Other formulas

• P(Ā.B̄) = 1 − P(A + B) ; P(Ā + B̄) = 1 − P(A.B) ; P(A.B̄) = P(A) − P(A.B) ; P(Ā.B) = P(B) − P(A.B)
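The total probability and Bayes formulas above can be checked numerically. Below is a minimal Python sketch; the prior and conditional probabilities are hypothetical example values chosen only to illustrate the computation.

```python
# Minimal sketch of the total probability and Bayes formulas.
# The numbers below are hypothetical, chosen only to illustrate the computation.

priors = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3): a partition of the sample space
likelihoods = [0.10, 0.25, 0.40]  # P(A/B1), P(A/B2), P(A/B3)

# Total probability: P(A) = sum_i P(Bi) * P(A/Bi)
p_a = sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods))

# Bayes: P(Bj/A) = P(Bj) * P(A/Bj) / P(A)
posteriors = [p_b * p_a_given_b / p_a for p_b, p_a_given_b in zip(priors, likelihoods)]

print(f"P(A) = {p_a:.4f}")        # 0.205 for these inputs
for j, post in enumerate(posteriors, start=1):
    print(f"P(B{j}/A) = {post:.4f}")
```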

1.2 Random Variables


• Discrete Random Variables (DRV): E(X) = Σ xi.pi ; E(X²) = Σ xi².pi ; V(X) = E(X²) − [E(X)]²
• Continuous Random Variables (CRV): E(X) = ∫_{−∞}^{+∞} x.f(x)dx ; E(X²) = ∫_{−∞}^{+∞} x².f(x)dx ; V(X) = E(X²) − [E(X)]²
Properties:
E(a) = a if a = const; E(a.X + b.Y ) = a.E(X) + b.E(Y ); E(X.Y ) = E(X).E(Y ) if X and Y are independent.
If Y = p(X) and X is DRV, E(Y) = E[p(X)] = Σ p(xi).pi ; if X is CRV, E(Y) = E[p(X)] = ∫_{−∞}^{+∞} p(x).f(x)dx
V (a) = 0, if a = const; V (a.X + b) = a2 .V (X); V (X + Y ) = V (X − Y ) = V (X) + V (Y ) if X and Y are independent.
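A minimal Python sketch of the DRV expectation and variance formulas above, using a hypothetical distribution table.

```python
# Minimal sketch of E(X), E(X^2) and V(X) for a discrete random variable.
# The distribution table below is hypothetical.

xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]
assert abs(sum(ps) - 1.0) < 1e-12  # probabilities must sum to 1

ex  = sum(x * p for x, p in zip(xs, ps))      # E(X)   = sum x_i * p_i
ex2 = sum(x**2 * p for x, p in zip(xs, ps))   # E(X^2) = sum x_i^2 * p_i
var = ex2 - ex**2                             # V(X)   = E(X^2) - E(X)^2

print(ex, ex2, var)  # E(X) = 1.7, E(X^2) = 3.7, V(X) = 0.81
```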

1.3 Some Special Distributions


The Binomial Distributions, X ∼ B(n, p): E(X) = np, V(X) = npq, np − q ≤ Mod(X) ≤ np − q + 1 (Mod(X) ∈ N)
P(X = k) = C_n^k.p^k.q^(n−k) ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} C_n^k.p^k.q^(n−k).
When n is large and p ≥ 5%, then X ≈ N(µ = np, σ² = npq):
P(X = k) = (1/(√(npq).√(2π))).e^(−(k−np)²/(2npq)) ; P(k1 ≤ X ≤ k2) = Φ((k2 + 0.5 − np)/√(npq)) − Φ((k1 − 0.5 − np)/√(npq))
When n is large and p < 5%, then X ≈ P(λ = np):
P(X = k) = e^(−np).(np)^k / k! ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} e^(−np).(np)^k / k!
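A short Python sketch comparing the exact binomial probability with the normal (continuity-corrected) and Poisson approximations above; the values of n, p, k1, k2 are arbitrary examples and SciPy is assumed to be available.

```python
# Sketch comparing the exact binomial probability with the normal and Poisson
# approximations above; n, p, k1, k2 are arbitrary example values.
from scipy import stats  # assumed available

n, p = 400, 0.5
k1, k2 = 190, 210
q = 1 - p

exact = stats.binom.cdf(k2, n, p) - stats.binom.cdf(k1 - 1, n, p)

# Normal approximation with continuity correction (n large, p not too small)
mu, sigma = n * p, (n * p * q) ** 0.5
normal_approx = stats.norm.cdf((k2 + 0.5 - mu) / sigma) - stats.norm.cdf((k1 - 0.5 - mu) / sigma)
print(f"exact = {exact:.4f}, normal approximation = {normal_approx:.4f}")

# Poisson approximation is used instead when n is large and p is small (p < 5%)
n2, p2 = 1000, 0.002
print(stats.binom.pmf(3, n2, p2), stats.poisson.pmf(3, n2 * p2))
```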
The Hypergeometric Distributions, X ∼ H(N, M, n): E(X) = np, V(X) = npq.(N − n)/(N − 1), with p = M/N and q = 1 − p
The Poisson Distributions, X ∼ P(λ): E(X) = V(X) = λ, λ − 1 ≤ Mod(X) ≤ λ (Mod(X) ∈ N)
P(X = k) = e^(−λ).λ^k / k! ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} e^(−λ).λ^k / k!
If Xi ∼ P(λi), i = 1, 2, ..., n are independent, then Y = X1 + X2 + ... + Xn ∼ P(λ1 + λ2 + ... + λn)
The Uniform Distributions, X ∼ U(a, b): F(x) = 0 if x < a ; F(x) = (x − a)/(b − a) if a ≤ x ≤ b ; F(x) = 1 if x > b
E(X) = med(X) = (a + b)/2 , V(X) = (b − a)²/12


The Exponential Distributions, X ∼ E(λ): F(x) = 1 − e^(−λ.x) if x ≥ 0 ; F(x) = 0 if x < 0
E(X) = 1/λ , V(X) = 1/λ² , mod(X) = 0 , med(X) = ln(2)/λ
The Normal Distributions, X ∼ N(µ, σ²): E(X) = Mod(X) = Med(X) = µ, V(X) = σ²
P(k1 ≤ X ≤ k2) = Φ((k2 − µ)/σ) − Φ((k1 − µ)/σ) ; P(|X − µ| < ε) = 2.Φ(ε/σ) − 1, ε > 0
If X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²) are independent, then Y = a.X1 + b.X2 ∼ N(µ = a.µ1 + b.µ2, σ² = a².σ1² + b².σ2²)
If Xi ∼ N(µi, σi²), i = 1, 2, ..., n are independent, then Y = X1 + ... + Xn ∼ N(µ = µ1 + ... + µn, σ² = σ1² + ... + σn²)
The Central Limit Theorem: if X1, ..., Xn are independent with E(Xi) = µ and V(Xi) = σ², then:
If X = X1 + X2 + ... + Xn then X ≈ N(n.µ, n.σ²) ; if X̄ = (X1 + X2 + ... + Xn)/n then X̄ ≈ N(µ, σ²/n).
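A minimal Python sketch of the normal-probability formulas above; µ, σ, k1, k2, ε are example values and SciPy is assumed to be available.

```python
# Sketch of P(k1 <= X <= k2) and P(|X - mu| < eps) for X ~ N(mu, sigma^2).
from scipy.stats import norm  # assumed available

mu, sigma = 50.0, 4.0
k1, k2, eps = 46.0, 58.0, 6.0

p_interval = norm.cdf((k2 - mu) / sigma) - norm.cdf((k1 - mu) / sigma)  # Phi((k2-mu)/sigma) - Phi((k1-mu)/sigma)
p_band     = 2 * norm.cdf(eps / sigma) - 1                              # 2*Phi(eps/sigma) - 1

print(f"P({k1} <= X <= {k2}) = {p_interval:.4f}")
print(f"P(|X - {mu}| < {eps}) = {p_band:.4f}")
```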

2 STATISTICS PART
2.1 Confidence Intervals
Summary table of problems for finding confidence intervals. (Note: the problems of finding confidence intervals for M and N reduce to the proportion estimator; round the results to integers.)

Population Proportion (condition: X ∼ B(n, p), np ≥ 10, n(1 − p) ≥ 10):
• Two-side: radius ε = z_{α/2}.√(p̂(1 − p̂)/n) ; CI: p = p̂ ± ε
• Right one-side: −∞ ≤ p ≤ p̂ + z_α.√(p̂(1 − p̂)/n)
• Left one-side: p̂ − z_α.√(p̂(1 − p̂)/n) ≤ p ≤ +∞
Population Mean:
(1) Normal Population + Known σ:
• Two-side: radius ε = z_{α/2}.σ/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + z_α.σ/√n
• Left one-side: x̄ − z_α.σ/√n ≤ µ ≤ +∞
(2) Normal Population + Unknown σ:
• Two-side: radius ε = t_{α/2; n−1}.s/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + t_{α; n−1}.s/√n
• Left one-side: x̄ − t_{α; n−1}.s/√n ≤ µ ≤ +∞
(3) Large Sample Size:
• Two-side: radius ε = z_{α/2}.s/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + z_α.s/√n
• Left one-side: x̄ − z_α.s/√n ≤ µ ≤ +∞
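A minimal Python sketch of the two-sided interval from case (2) above (normal population, unknown σ); the sample data are hypothetical and SciPy is assumed to be available.

```python
# Sketch of a two-sided confidence interval for the mean with unknown sigma.
from math import sqrt
from scipy.stats import t  # assumed available

sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3]   # hypothetical data
n = len(sample)
x_bar = sum(sample) / n
s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample standard deviation

alpha = 0.05
eps = t.ppf(1 - alpha / 2, df=n - 1) * s / sqrt(n)         # radius: t_{alpha/2; n-1} * s / sqrt(n)
print(f"{100 * (1 - alpha):.0f}% CI for mu: ({x_bar - eps:.3f}, {x_bar + eps:.3f})")
```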

Sample size problem: (Note: round the result up to the next integer.)

Population Proportion:
• Known p̂: n0 = (z_{α/2}.√(p̂(1 − p̂)) / ε)²
• Unknown p̂: n0 = (z_{α/2} / ε)².0.25
Population Mean:
• Known σ: n0 = (z_{α/2}.σ / ε)²
• Unknown σ: n0 = (z_{α/2}.s / ε)²
Notes: ε is the radius of the CI and 2ε is its length; pay attention to whether the question asks for the sample size to be surveyed or for the additional sample size to be investigated; if the result is already an integer, no rounding is required.
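A minimal Python sketch of the sample-size formulas with rounding up; the confidence level, radius ε and prior estimate p̂ are example values, and SciPy is assumed to be available.

```python
# Sketch of the sample-size formulas for a proportion, with rounding up.
from math import ceil, sqrt
from scipy.stats import norm  # assumed available

alpha, eps = 0.05, 0.03            # 95% confidence, desired CI radius (example values)
z = norm.ppf(1 - alpha / 2)        # z_{alpha/2}

p_hat = 0.4                        # hypothetical prior estimate of the proportion
n_known   = ceil((z * sqrt(p_hat * (1 - p_hat)) / eps) ** 2)
n_unknown = ceil((z / eps) ** 2 * 0.25)     # worst case, p_hat unknown (0.25 = 0.5 * 0.5)

print(n_known, n_unknown)          # 1025 and 1068 for these inputs
```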

2.2 ANOVA: The Analysis of Variance


1. The hypothesis H0: µ1 = µ2 = ... = µI ; H1: ∃ µi ≠ µj, i ≠ j
2. Calculate the averages: x̄1, x̄2, ..., x̄I ; x̄ (the grand mean).
3. Calculate the sums of squares:
SSTr = J.Σ_{i=1}^{I} (x̄i − x̄)² = J.[(x̄1 − x̄)² + (x̄2 − x̄)² + ... + (x̄I − x̄)²]
SSE = SS1 + SS2 + ... + SSI = Σ_{j=1}^{J} (x1j − x̄1)² + Σ_{j=1}^{J} (x2j − x̄2)² + ... + Σ_{j=1}^{J} (xIj − x̄I)² = (J − 1).s1² + (J − 1).s2² + ... + (J − 1).sI²
    = Σ_j x1j² − J.x̄1² + Σ_j x2j² − J.x̄2² + ... + Σ_j xIj² − J.x̄I²
SST = SSTr + SSE, or we can calculate: SST = Σ_{i=1}^{I} Σ_{j=1}^{J} (xij − x̄)² = (IJ − 1).s² = Σ_{i,j} xij² − IJ.x̄² ; SSE = SST − SSTr


4. Calculate the mean squares: MSTr = SSTr/(I − 1) ; MSE = SSE/(I(J − 1)) ; MST = SST/(IJ − 1)
5. Test statistic: F = MSTr/MSE
6. Rejection region: F > Fα;I−1;I(J−1)
Note: I: number of comparison groups, J: number of observations in each group, IJ: total number of observations in all groups,
s²: sample variance of the observations from all groups combined.
Multiple Comparisons:
The hypothesis H0: µi = µj ; H1: µi ≠ µj, i ≠ j
Calculate LSD = t_{α/2; I(J−1)}.√(2.MSE/J). Rejection region: |x̄i − x̄j| > LSD.
Confidence interval for the difference in means of the two groups (µi − µj): (x̄i − x̄j) ± LSD
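A minimal Python sketch of steps 1–6 above for I = 3 hypothetical groups with J = 4 observations each; SciPy is assumed to be available for the critical value.

```python
# Sketch of one-way ANOVA following the steps above; the data are hypothetical.
from scipy.stats import f  # assumed available

groups = [
    [20.0, 22.0, 19.0, 21.0],
    [25.0, 27.0, 26.0, 24.0],
    [21.0, 23.0, 22.0, 20.0],
]
I, J = len(groups), len(groups[0])

group_means = [sum(g) / J for g in groups]                 # x_bar_1, ..., x_bar_I
grand_mean = sum(sum(g) for g in groups) / (I * J)         # x_bar

ss_tr = J * sum((m - grand_mean) ** 2 for m in group_means)
ss_e  = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

ms_tr = ss_tr / (I - 1)
ms_e  = ss_e / (I * (J - 1))
f_stat = ms_tr / ms_e

alpha = 0.05
f_crit = f.ppf(1 - alpha, I - 1, I * (J - 1))              # F_{alpha; I-1; I(J-1)}
print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, reject H0: {f_stat > f_crit}")
```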

2.3 Linear Regression Model

1. Sum of squares:
• Sxx = Σ_{i=1}^{n} (xi − x̄)² = Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)²/n = n.(ŝx)²
• Syy = Σ_{i=1}^{n} (yi − ȳ)² = Σ_{i=1}^{n} yi² − (Σ_{i=1}^{n} yi)²/n = n.(ŝy)²
• Sxy = Σ_{i=1}^{n} (xi − x̄).(yi − ȳ) = Σ_{i=1}^{n} xi.yi − (Σ_{i=1}^{n} xi).(Σ_{i=1}^{n} yi)/n = Σ_{i=1}^{n} xi.yi − n.x̄.ȳ
2. The simple linear regression with a single predictor x and dependent variable Y: ŷ = a + b.x
• The slope: b = Sxy/Sxx
• The intercept: a = ȳ − b.x̄
The simple linear regression with a single predictor y and dependent variable X: x̂ = c + d.y
• The slope: d = Sxy/Syy
• The intercept: c = x̄ − d.ȳ
3. Correlation coefficient: rXY = Sxy/√(Sxx.Syy)
4. Sum of squares: SST = Syy ; SSR = b.Sxy ; SSE = SST − SSR = Syy − b.Sxy
5. Multiple R square: R² = SSR/SST = 1 − SSE/SST = rXY²
6. An unbiased estimator of σ: s = √(SSE/(n − 2))
7. An unbiased estimator of σ²: s² = SSE/(n − 2)
8. Confidence intervals:
• The slope: (b − t_{α/2; n−2}.s/√Sxx ; b + t_{α/2; n−2}.s/√Sxx)
• The intercept: (a − t_{α/2; n−2}.s.√(Σxi²/n)/√Sxx ; a + t_{α/2; n−2}.s.√(Σxi²/n)/√Sxx)

Hypothesis testing:
• The slope: H0: β = 0 ; H1: β ≠ 0 ; test statistic tqs = b/sb with sb = s/√Sxx ; rejection region RR = (−∞; −t_{α/2; n−2}) ∪ (t_{α/2; n−2}; +∞), i.e. |tqs| > t_{α/2; n−2}
• The intercept: H0: α = 0 ; H1: α ≠ 0 ; test statistic tqs = a/sa with sa = s.√(Σxi²/n)/√Sxx ; rejection region RR = (−∞; −t_{α/2; n−2}) ∪ (t_{α/2; n−2}; +∞), i.e. |tqs| > t_{α/2; n−2}
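A minimal Python sketch of the regression formulas above (Sxx, Syy, Sxy, the fitted line, r, R² and s), computed on hypothetical (x, y) pairs.

```python
# Sketch of simple linear regression y_hat = a + b*x via the formulas above.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx                 # slope
a = y_bar - b * x_bar           # intercept
r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient

sst = s_yy
ssr = b * s_xy
sse = sst - ssr
r2 = ssr / sst                  # equals r**2
s = sqrt(sse / (n - 2))         # unbiased estimate of sigma

print(f"y_hat = {a:.3f} + {b:.3f}x, r = {r:.4f}, R^2 = {r2:.4f}, s = {s:.4f}")
```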


2.4 Testing Simple Hypotheses


Hypothesis Tests on Proportion (H0: p = p0 ; Note: check np0 ≥ 10, n(1 − p0) ≥ 10)
Test statistic: zqs = ((p̂ − p0)/√(p0(1 − p0))).√n
• H1: p ≠ p0 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: p < p0 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: p > p0 : RR = (z_α; +∞), i.e. zqs > z_α
Hypothesis Tests on Mean (H0: µ = µ0)
(Type 1) Normal Population + Known σ: zqs = (x̄ − µ0)/(σ/√n)
• H1: µ ≠ µ0 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: µ < µ0 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: µ > µ0 : RR = (z_α; +∞), i.e. zqs > z_α
(Type 2) Normal Population + Unknown σ: tqs = (x̄ − µ0)/(s/√n)
• H1: µ ≠ µ0 : RR = (−∞; −t_{α/2; n−1}) ∪ (t_{α/2; n−1}; +∞), i.e. |tqs| > t_{α/2; n−1}
• H1: µ < µ0 : RR = (−∞; −t_{α; n−1}), i.e. tqs < −t_{α; n−1}
• H1: µ > µ0 : RR = (t_{α; n−1}; +∞), i.e. tqs > t_{α; n−1}
(Type 3) Any Distribution + Large Sample Size: zqs = (x̄ − µ0)/(s/√n), with the same rejection regions as Type 1.
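A minimal Python sketch of a two-sided Type 2 test on the mean (unknown σ); the sample and µ0 are hypothetical and SciPy is assumed to be available.

```python
# Sketch of a two-sided one-sample t-test (Type 2 above).
from math import sqrt
from scipy.stats import t  # assumed available

sample = [50.2, 49.8, 51.1, 50.6, 49.5, 50.9, 50.3, 49.9]  # hypothetical data
mu0, alpha = 50.0, 0.05

n = len(sample)
x_bar = sum(sample) / n
s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

t_qs = (x_bar - mu0) / (s / sqrt(n))
t_crit = t.ppf(1 - alpha / 2, df=n - 1)      # t_{alpha/2; n-1}
print(f"t_qs = {t_qs:.3f}, reject H0 (mu != mu0): {abs(t_qs) > t_crit}")
```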

2.5 Inferences Based on Two Samples

Hypothesis Tests on the Difference in Proportions (H0: p1 = p2 ; Note: check n1.p̂1 ≥ 10, n1(1 − p̂1) ≥ 10, n2.p̂2 ≥ 10, n2(1 − p̂2) ≥ 10)
Test statistic: zqs = (p̂1 − p̂2)/√(p̄(1 − p̄)(1/n1 + 1/n2)), with the pooled proportion p̄ = (m1 + m2)/(n1 + n2)
• H1: p1 ≠ p2 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: p1 < p2 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: p1 > p2 : RR = (z_α; +∞), i.e. zqs > z_α
Hypothesis Tests on the Difference in Means (H0: µ1 = µ2)
(Type 1) Normal Population + Known σ: zqs = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2)
• H1: µ1 ≠ µ2 : |zqs| > z_{α/2} ; H1: µ1 < µ2 : zqs < −z_α ; H1: µ1 > µ2 : zqs > z_α
(Type 2) Normal Population + Unknown σ + Equal Variances: tqs = (x̄1 − x̄2)/√(S²/n1 + S²/n2), with S² = [(n1 − 1).s1² + (n2 − 1).s2²]/(n1 + n2 − 2)
• H1: µ1 ≠ µ2 : |tqs| > t_{α/2; n1+n2−2} ; H1: µ1 < µ2 : tqs < −t_{α; n1+n2−2} ; H1: µ1 > µ2 : tqs > t_{α; n1+n2−2}
(Type 3) Normal Population + Unknown σ + Unequal Variances: tqs = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), with v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)] degrees of freedom
• H1: µ1 ≠ µ2 : |tqs| > t_{α/2; v} ; H1: µ1 < µ2 : tqs < −t_{α; v} ; H1: µ1 > µ2 : tqs > t_{α; v}
(Type 4) Any Distribution + Large Sample Size: zqs = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), with the same rejection regions as Type 1.
(Type 5) Two samples in pairs + Unknown σ (H0: µD = 0): tqs = (d̄/sD).√n
• H1: µD ≠ 0 : |tqs| > t_{α/2; n−1} ; H1: µD < 0 : tqs < −t_{α; n−1} ; H1: µD > 0 : tqs > t_{α; n−1}
+ For hypothesis tests on the difference in means with unknown variances, where X1 and X2 follow normal distributions (Type 2 or Type 3), first calculate the ratio s1/s2: if s1/s2 ∈ [0.5; 2], treat σ1² = σ2² (Type 2); otherwise, treat σ1² ≠ σ2² (Type 3).
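A minimal Python sketch of the two-sample comparison above: the s1/s2 ratio selects between Type 2 (pooled variances) and Type 3 (unequal variances); the samples are hypothetical and SciPy is assumed to be available.

```python
# Sketch of the two-sample mean comparison, choosing Type 2 or Type 3 via s1/s2.
from math import sqrt
from scipy.stats import t  # assumed available

x1 = [12.1, 13.4, 12.8, 13.0, 12.5, 13.1]   # hypothetical samples
x2 = [11.2, 11.9, 12.0, 11.5, 11.7, 12.3]
n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2
s1 = sqrt(sum((x - m1) ** 2 for x in x1) / (n1 - 1))
s2 = sqrt(sum((x - m2) ** 2 for x in x2) / (n2 - 1))

alpha = 0.05
if 0.5 <= s1 / s2 <= 2:                      # treat variances as equal (Type 2)
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t_qs = (m1 - m2) / sqrt(s2_pooled / n1 + s2_pooled / n2)
    df = n1 + n2 - 2
else:                                        # unequal variances (Type 3)
    t_qs = (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)
    df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
        (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1))

t_crit = t.ppf(1 - alpha / 2, df)
print(f"t_qs = {t_qs:.3f}, df = {df:.1f}, reject H0 (mu1 != mu2): {abs(t_qs) > t_crit}")
```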

Editor: Truong Duc An