
SYNTHESIS ON FORMULAS OF PROBABILITY AND STATISTICS

1 PROBABILITY PART
1.1 Probability formulas
Addition Rule and Multiplication Rule

• P(A + B) = P(A) + P(B) − P(A.B) ; P(A + B + C) = P(A) + P(B) + P(C) − P(A.B) − P(B.C) − P(A.C) + P(A.B.C)
• P(A1 + A2 + ... + An) = Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai.Aj) + Σ_{i<j<k} P(Ai.Aj.Ak) − ... + (−1)^{n−1}.P(A1.A2.A3...An)
• P(A.B) = P(A/B).P(B) = P(B/A).P(A)
• P(A1.A2...An) = P(A1).P(A2/A1).P(A3/A1.A2)...P(An/A1.A2...An−1).

Total probability - Extended form, Bayes’s Formula - Extended form


• P(A) = P(B1).P(A/B1) + P(B2).P(A/B2) + ... + P(Bn).P(A/Bn) = Σ_{i=1}^{n} P(Bi).P(A/Bi)
• P(Bj/A) = P(Bj.A)/P(A) = P(Bj).P(A/Bj) / Σ_{i=1}^{n} P(Bi).P(A/Bi), j = 1, 2, ..., n and P(A) ≠ 0.
Other formulas

• P(Ā.B̄) = 1 − P(A + B) ; P(Ā + B̄) = 1 − P(A.B) ; P(A.B̄) = P(A) − P(A.B) ; P(Ā.B) = P(B) − P(A.B)
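The total probability and Bayes formulas above can be checked numerically. Below is a minimal Python sketch; the prior and conditional probabilities are hypothetical example values chosen only to illustrate the computation.

```python
# Minimal sketch of the total probability and Bayes formulas.
# The numbers below are hypothetical, chosen only to illustrate the computation.

priors = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3): a partition of the sample space
likelihoods = [0.10, 0.25, 0.40]  # P(A/B1), P(A/B2), P(A/B3)

# Total probability: P(A) = sum_i P(Bi) * P(A/Bi)
p_a = sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods))

# Bayes: P(Bj/A) = P(Bj) * P(A/Bj) / P(A)
posteriors = [p_b * p_a_given_b / p_a for p_b, p_a_given_b in zip(priors, likelihoods)]

print(f"P(A) = {p_a:.4f}")        # 0.205 for these inputs
for j, post in enumerate(posteriors, start=1):
    print(f"P(B{j}/A) = {post:.4f}")
```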

1.2 Random Variables


• Discrete Random Variables (DRV): E(X) = Σ xi.pi ; E(X²) = Σ xi².pi ; V(X) = E(X²) − [E(X)]²
• Continuous Random Variables (CRV): E(X) = ∫_{−∞}^{+∞} x.f(x)dx ; E(X²) = ∫_{−∞}^{+∞} x².f(x)dx ; V(X) = E(X²) − [E(X)]²
Properties:
E(a) = a if a = const; E(a.X + b.Y ) = a.E(X) + b.E(Y ); E(X.Y ) = E(X).E(Y ) if X and Y are independent.
If Y = p(X) and X is DRV, E(Y) = E[p(X)] = Σ p(xi).pi ; if X is CRV, E(Y) = E[p(X)] = ∫_{−∞}^{+∞} p(x).f(x)dx
V (a) = 0, if a = const; V (a.X + b) = a2 .V (X); V (X + Y ) = V (X − Y ) = V (X) + V (Y ) if X and Y are independent.
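A minimal Python sketch of the DRV expectation and variance formulas above, using a hypothetical distribution table.

```python
# Minimal sketch of E(X), E(X^2) and V(X) for a discrete random variable.
# The distribution table below is hypothetical.

xs = [0, 1, 2, 3]
ps = [0.1, 0.3, 0.4, 0.2]
assert abs(sum(ps) - 1.0) < 1e-12  # probabilities must sum to 1

ex  = sum(x * p for x, p in zip(xs, ps))      # E(X)   = sum x_i * p_i
ex2 = sum(x**2 * p for x, p in zip(xs, ps))   # E(X^2) = sum x_i^2 * p_i
var = ex2 - ex**2                             # V(X)   = E(X^2) - E(X)^2

print(ex, ex2, var)  # E(X) = 1.7, E(X^2) = 3.7, V(X) = 0.81
```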

1.3 Some Special Distributions


The Binomial Distributions, X ∼ B(n, p): E(X) = np, V(X) = npq, np − q ≤ Mod(X) ≤ np − q + 1 (Mod(X) ∈ N)
P(X = k) = C_n^k.p^k.q^(n−k) ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} C_n^k.p^k.q^(n−k).
When n is large and p ≥ 5%, then X ≈ N(µ = np, σ² = npq):
P(X = k) = (1/(√(npq).√(2π))).e^(−(k−np)²/(2npq)) ; P(k1 ≤ X ≤ k2) = Φ((k2 + 0.5 − np)/√(npq)) − Φ((k1 − 0.5 − np)/√(npq))
When n is large and p < 5%, then X ≈ P(λ = np):
P(X = k) = e^(−np).(np)^k / k! ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} e^(−np).(np)^k / k!
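A short Python sketch comparing the exact binomial probability with the normal (continuity-corrected) and Poisson approximations above; the values of n, p, k1, k2 are arbitrary examples and SciPy is assumed to be available.

```python
# Sketch comparing the exact binomial probability with the normal and Poisson
# approximations above; n, p, k1, k2 are arbitrary example values.
from scipy import stats  # assumed available

n, p = 400, 0.5
k1, k2 = 190, 210
q = 1 - p

exact = stats.binom.cdf(k2, n, p) - stats.binom.cdf(k1 - 1, n, p)

# Normal approximation with continuity correction (n large, p not too small)
mu, sigma = n * p, (n * p * q) ** 0.5
normal_approx = stats.norm.cdf((k2 + 0.5 - mu) / sigma) - stats.norm.cdf((k1 - 0.5 - mu) / sigma)
print(f"exact = {exact:.4f}, normal approximation = {normal_approx:.4f}")

# Poisson approximation is used instead when n is large and p is small (p < 5%)
n2, p2 = 1000, 0.002
print(stats.binom.pmf(3, n2, p2), stats.poisson.pmf(3, n2 * p2))
```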
The Hypergeometric Distributions, X ∼ H(N, M, n): E(X) = np, V(X) = npq.(N − n)/(N − 1), with p = M/N and q = 1 − p
The Poisson Distributions, X ∼ P(λ): E(X) = V(X) = λ, λ − 1 ≤ Mod(X) ≤ λ (Mod(X) ∈ N)
P(X = k) = e^(−λ).λ^k / k! ; P(k1 ≤ X ≤ k2) = Σ_{k=k1}^{k2} e^(−λ).λ^k / k!
If Xi ∼ P(λi), i = 1, 2, ..., n are independent, then Y = X1 + X2 + ... + Xn ∼ P(λ1 + λ2 + ... + λn)
The Uniform Distributions, X ∼ U(a, b): F(x) = 0 if x < a ; F(x) = (x − a)/(b − a) if a ≤ x ≤ b ; F(x) = 1 if x > b
E(X) = med(X) = (a + b)/2 , V(X) = (b − a)²/12


The Exponential Distributions, X ∼ E(λ): F(x) = 1 − e^(−λ.x) if x ≥ 0 ; F(x) = 0 if x < 0
E(X) = 1/λ , V(X) = 1/λ² , mod(X) = 0 , med(X) = ln(2)/λ
The Normal Distributions, X ∼ N(µ, σ²): E(X) = Mod(X) = Med(X) = µ, V(X) = σ²
P(k1 ≤ X ≤ k2) = Φ((k2 − µ)/σ) − Φ((k1 − µ)/σ) ; P(|X − µ| < ε) = 2.Φ(ε/σ) − 1, ε > 0
If X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²) are independent, then Y = a.X1 + b.X2 ∼ N(µ = a.µ1 + b.µ2, σ² = a².σ1² + b².σ2²)
If Xi ∼ N(µi, σi²), i = 1, 2, ..., n are independent, then Y = X1 + ... + Xn ∼ N(µ = µ1 + ... + µn, σ² = σ1² + ... + σn²)
The Central Limit Theorem: if X1, ..., Xn are independent with E(Xi) = µ and V(Xi) = σ², then:
If X = X1 + X2 + ... + Xn then X ≈ N(n.µ, n.σ²) ; if X̄ = (X1 + X2 + ... + Xn)/n then X̄ ≈ N(µ, σ²/n).
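A minimal Python sketch of the normal-probability formulas above; µ, σ, k1, k2, ε are example values and SciPy is assumed to be available.

```python
# Sketch of P(k1 <= X <= k2) and P(|X - mu| < eps) for X ~ N(mu, sigma^2).
from scipy.stats import norm  # assumed available

mu, sigma = 50.0, 4.0
k1, k2, eps = 46.0, 58.0, 6.0

p_interval = norm.cdf((k2 - mu) / sigma) - norm.cdf((k1 - mu) / sigma)  # Phi((k2-mu)/sigma) - Phi((k1-mu)/sigma)
p_band     = 2 * norm.cdf(eps / sigma) - 1                              # 2*Phi(eps/sigma) - 1

print(f"P({k1} <= X <= {k2}) = {p_interval:.4f}")
print(f"P(|X - {mu}| < {eps}) = {p_band:.4f}")
```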

2 STATISTICS PART
2.1 Confidence Intervals
Summary table of problems for finding confidence intervals. (Note: the problems of finding confidence intervals for M and N reduce to the proportion estimator; round the results to integers.)

Population Proportion (condition: X ∼ B(n, p), np ≥ 10, n(1 − p) ≥ 10):
• Two-side: radius ε = z_{α/2}.√(p̂(1 − p̂)/n) ; CI: p = p̂ ± ε
• Right one-side: −∞ ≤ p ≤ p̂ + z_α.√(p̂(1 − p̂)/n)
• Left one-side: p̂ − z_α.√(p̂(1 − p̂)/n) ≤ p ≤ +∞
Population Mean:
(1) Normal Population + Known σ:
• Two-side: radius ε = z_{α/2}.σ/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + z_α.σ/√n
• Left one-side: x̄ − z_α.σ/√n ≤ µ ≤ +∞
(2) Normal Population + Unknown σ:
• Two-side: radius ε = t_{α/2; n−1}.s/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + t_{α; n−1}.s/√n
• Left one-side: x̄ − t_{α; n−1}.s/√n ≤ µ ≤ +∞
(3) Large Sample Size:
• Two-side: radius ε = z_{α/2}.s/√n ; CI: µ = x̄ ± ε
• Right one-side: −∞ ≤ µ ≤ x̄ + z_α.s/√n
• Left one-side: x̄ − z_α.s/√n ≤ µ ≤ +∞
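A minimal Python sketch of the two-sided interval from case (2) above (normal population, unknown σ); the sample data are hypothetical and SciPy is assumed to be available.

```python
# Sketch of a two-sided confidence interval for the mean with unknown sigma.
from math import sqrt
from scipy.stats import t  # assumed available

sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3]   # hypothetical data
n = len(sample)
x_bar = sum(sample) / n
s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample standard deviation

alpha = 0.05
eps = t.ppf(1 - alpha / 2, df=n - 1) * s / sqrt(n)         # radius: t_{alpha/2; n-1} * s / sqrt(n)
print(f"{100 * (1 - alpha):.0f}% CI for mu: ({x_bar - eps:.3f}, {x_bar + eps:.3f})")
```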

Sample size problem: (Note: round the result up to the next integer.)

Population Proportion:
• Known p̂: n0 = (z_{α/2}.√(p̂(1 − p̂)) / ε)²
• Unknown p̂: n0 = (z_{α/2} / ε)².0.25
Population Mean:
• Known σ: n0 = (z_{α/2}.σ / ε)²
• Unknown σ: n0 = (z_{α/2}.s / ε)²
Notes: ε is the radius of the CI and 2ε is its length; pay attention to whether the question asks for the sample size to be surveyed or for the additional sample size to be investigated; if the result is already an integer, no rounding is required.
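A minimal Python sketch of the sample-size formulas with rounding up; the confidence level, radius ε and prior estimate p̂ are example values, and SciPy is assumed to be available.

```python
# Sketch of the sample-size formulas for a proportion, with rounding up.
from math import ceil, sqrt
from scipy.stats import norm  # assumed available

alpha, eps = 0.05, 0.03            # 95% confidence, desired CI radius (example values)
z = norm.ppf(1 - alpha / 2)        # z_{alpha/2}

p_hat = 0.4                        # hypothetical prior estimate of the proportion
n_known   = ceil((z * sqrt(p_hat * (1 - p_hat)) / eps) ** 2)
n_unknown = ceil((z / eps) ** 2 * 0.25)     # worst case, p_hat unknown (0.25 = 0.5 * 0.5)

print(n_known, n_unknown)          # 1025 and 1068 for these inputs
```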

2.2 ANOVA: The Analysis of Variance


1. The hypothesis H0: µ1 = µ2 = ... = µI ; H1: ∃ µi ≠ µj, i ≠ j
2. Calculate the averages: x̄1, x̄2, ..., x̄I ; x̄ (the grand mean).
3. Calculate the sums of squares:
SSTr = J.Σ_{i=1}^{I} (x̄i − x̄)² = J.[(x̄1 − x̄)² + (x̄2 − x̄)² + ... + (x̄I − x̄)²]
SSE = SS1 + SS2 + ... + SSI = Σ_{j=1}^{J} (x1j − x̄1)² + Σ_{j=1}^{J} (x2j − x̄2)² + ... + Σ_{j=1}^{J} (xIj − x̄I)² = (J − 1).s1² + (J − 1).s2² + ... + (J − 1).sI²
    = Σ_j x1j² − J.x̄1² + Σ_j x2j² − J.x̄2² + ... + Σ_j xIj² − J.x̄I²
SST = SSTr + SSE, or we can calculate: SST = Σ_{i=1}^{I} Σ_{j=1}^{J} (xij − x̄)² = (IJ − 1).s² = Σ_{i,j} xij² − IJ.x̄² ; SSE = SST − SSTr


4. Calculate the mean squares: MSTr = SSTr/(I − 1) ; MSE = SSE/(I(J − 1)) ; MST = SST/(IJ − 1)
5. Test statistic: F = MSTr/MSE
6. Rejection region: F > Fα;I−1;I(J−1)
Note: I: number of comparison groups, J: number of observations in each group, IJ: total number of observations in all groups,
s²: sample variance of the observations from all groups combined.
Multiple Comparisons:
The hypothesis H0: µi = µj ; H1: µi ≠ µj, i ≠ j
Calculate LSD = t_{α/2; I(J−1)}.√(2.MSE/J). Rejection region: |x̄i − x̄j| > LSD.
Confidence interval for the difference in means of the two groups (µi − µj): (x̄i − x̄j) ± LSD
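A minimal Python sketch of steps 1–6 above for I = 3 hypothetical groups with J = 4 observations each; SciPy is assumed to be available for the critical value.

```python
# Sketch of one-way ANOVA following the steps above; the data are hypothetical.
from scipy.stats import f  # assumed available

groups = [
    [20.0, 22.0, 19.0, 21.0],
    [25.0, 27.0, 26.0, 24.0],
    [21.0, 23.0, 22.0, 20.0],
]
I, J = len(groups), len(groups[0])

group_means = [sum(g) / J for g in groups]                 # x_bar_1, ..., x_bar_I
grand_mean = sum(sum(g) for g in groups) / (I * J)         # x_bar

ss_tr = J * sum((m - grand_mean) ** 2 for m in group_means)
ss_e  = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

ms_tr = ss_tr / (I - 1)
ms_e  = ss_e / (I * (J - 1))
f_stat = ms_tr / ms_e

alpha = 0.05
f_crit = f.ppf(1 - alpha, I - 1, I * (J - 1))              # F_{alpha; I-1; I(J-1)}
print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, reject H0: {f_stat > f_crit}")
```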

2.3 Linear Regression Model

1. Sum of squares:
• Sxx = Σ_{i=1}^{n} (xi − x̄)² = Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)²/n = n.(ŝx)²
• Syy = Σ_{i=1}^{n} (yi − ȳ)² = Σ_{i=1}^{n} yi² − (Σ_{i=1}^{n} yi)²/n = n.(ŝy)²
• Sxy = Σ_{i=1}^{n} (xi − x̄).(yi − ȳ) = Σ_{i=1}^{n} xi.yi − (Σ_{i=1}^{n} xi).(Σ_{i=1}^{n} yi)/n = Σ_{i=1}^{n} xi.yi − n.x̄.ȳ
2. The simple linear regression with a single predictor x and dependent variable Y: ŷ = a + b.x
• The slope: b = Sxy/Sxx
• The intercept: a = ȳ − b.x̄
The simple linear regression with a single predictor y and dependent variable X: x̂ = c + d.y
• The slope: d = Sxy/Syy
• The intercept: c = x̄ − d.ȳ
3. Correlation coefficient: rXY = Sxy/√(Sxx.Syy)
4. Sum of squares: SST = Syy ; SSR = b.Sxy ; SSE = SST − SSR = Syy − b.Sxy
5. Multiple R square: R² = SSR/SST = 1 − SSE/SST = rXY²
6. An unbiased estimator of σ: s = √(SSE/(n − 2))
7. An unbiased estimator of σ²: s² = SSE/(n − 2)
8. Confidence intervals:
• The slope: (b − t_{α/2; n−2}.s/√Sxx ; b + t_{α/2; n−2}.s/√Sxx)
• The intercept: (a − t_{α/2; n−2}.s.√(Σxi²/n)/√Sxx ; a + t_{α/2; n−2}.s.√(Σxi²/n)/√Sxx)

Hypothesis testing:
• The slope: H0: β = 0 ; H1: β ≠ 0 ; test statistic tqs = b/sb with sb = s/√Sxx ; rejection region RR = (−∞; −t_{α/2; n−2}) ∪ (t_{α/2; n−2}; +∞), i.e. |tqs| > t_{α/2; n−2}
• The intercept: H0: α = 0 ; H1: α ≠ 0 ; test statistic tqs = a/sa with sa = s.√(Σxi²/n)/√Sxx ; rejection region RR = (−∞; −t_{α/2; n−2}) ∪ (t_{α/2; n−2}; +∞), i.e. |tqs| > t_{α/2; n−2}
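A minimal Python sketch of the regression formulas above (Sxx, Syy, Sxy, the fitted line, r, R² and s), computed on hypothetical (x, y) pairs.

```python
# Sketch of simple linear regression y_hat = a + b*x via the formulas above.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx                 # slope
a = y_bar - b * x_bar           # intercept
r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient

sst = s_yy
ssr = b * s_xy
sse = sst - ssr
r2 = ssr / sst                  # equals r**2
s = sqrt(sse / (n - 2))         # unbiased estimate of sigma

print(f"y_hat = {a:.3f} + {b:.3f}x, r = {r:.4f}, R^2 = {r2:.4f}, s = {s:.4f}")
```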


2.4 Testing Simple Hypotheses


Hypothesis Tests on Proportion (H0: p = p0 ; Note: check np0 ≥ 10, n(1 − p0) ≥ 10)
Test statistic: zqs = ((p̂ − p0)/√(p0(1 − p0))).√n
• H1: p ≠ p0 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: p < p0 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: p > p0 : RR = (z_α; +∞), i.e. zqs > z_α
Hypothesis Tests on Mean (H0: µ = µ0)
(Type 1) Normal Population + Known σ: zqs = (x̄ − µ0)/(σ/√n)
• H1: µ ≠ µ0 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: µ < µ0 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: µ > µ0 : RR = (z_α; +∞), i.e. zqs > z_α
(Type 2) Normal Population + Unknown σ: tqs = (x̄ − µ0)/(s/√n)
• H1: µ ≠ µ0 : RR = (−∞; −t_{α/2; n−1}) ∪ (t_{α/2; n−1}; +∞), i.e. |tqs| > t_{α/2; n−1}
• H1: µ < µ0 : RR = (−∞; −t_{α; n−1}), i.e. tqs < −t_{α; n−1}
• H1: µ > µ0 : RR = (t_{α; n−1}; +∞), i.e. tqs > t_{α; n−1}
(Type 3) Any Distribution + Large Sample Size: zqs = (x̄ − µ0)/(s/√n), with the same rejection regions as Type 1.
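A minimal Python sketch of a two-sided Type 2 test on the mean (unknown σ); the sample and µ0 are hypothetical and SciPy is assumed to be available.

```python
# Sketch of a two-sided one-sample t-test (Type 2 above).
from math import sqrt
from scipy.stats import t  # assumed available

sample = [50.2, 49.8, 51.1, 50.6, 49.5, 50.9, 50.3, 49.9]  # hypothetical data
mu0, alpha = 50.0, 0.05

n = len(sample)
x_bar = sum(sample) / n
s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

t_qs = (x_bar - mu0) / (s / sqrt(n))
t_crit = t.ppf(1 - alpha / 2, df=n - 1)      # t_{alpha/2; n-1}
print(f"t_qs = {t_qs:.3f}, reject H0 (mu != mu0): {abs(t_qs) > t_crit}")
```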

2.5 Inferences Based on Two Samples

Hypothesis Tests on the Difference in Proportions (H0: p1 = p2 ; Note: check n1.p̂1 ≥ 10, n1(1 − p̂1) ≥ 10, n2.p̂2 ≥ 10, n2(1 − p̂2) ≥ 10)
Test statistic: zqs = (p̂1 − p̂2)/√(p̄(1 − p̄)(1/n1 + 1/n2)), with the pooled proportion p̄ = (m1 + m2)/(n1 + n2)
• H1: p1 ≠ p2 : RR = (−∞; −z_{α/2}) ∪ (z_{α/2}; +∞), i.e. |zqs| > z_{α/2}
• H1: p1 < p2 : RR = (−∞; −z_α), i.e. zqs < −z_α
• H1: p1 > p2 : RR = (z_α; +∞), i.e. zqs > z_α
Hypothesis Tests on the Difference in Means (H0: µ1 = µ2)
(Type 1) Normal Population + Known σ: zqs = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2)
• H1: µ1 ≠ µ2 : |zqs| > z_{α/2} ; H1: µ1 < µ2 : zqs < −z_α ; H1: µ1 > µ2 : zqs > z_α
(Type 2) Normal Population + Unknown σ + Equal Variances: tqs = (x̄1 − x̄2)/√(S²/n1 + S²/n2), with S² = [(n1 − 1).s1² + (n2 − 1).s2²]/(n1 + n2 − 2)
• H1: µ1 ≠ µ2 : |tqs| > t_{α/2; n1+n2−2} ; H1: µ1 < µ2 : tqs < −t_{α; n1+n2−2} ; H1: µ1 > µ2 : tqs > t_{α; n1+n2−2}
(Type 3) Normal Population + Unknown σ + Unequal Variances: tqs = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), with v = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)] degrees of freedom
• H1: µ1 ≠ µ2 : |tqs| > t_{α/2; v} ; H1: µ1 < µ2 : tqs < −t_{α; v} ; H1: µ1 > µ2 : tqs > t_{α; v}
(Type 4) Any Distribution + Large Sample Size: zqs = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), with the same rejection regions as Type 1.
(Type 5) Two samples in pairs + Unknown σ (H0: µD = 0): tqs = (d̄/sD).√n
• H1: µD ≠ 0 : |tqs| > t_{α/2; n−1} ; H1: µD < 0 : tqs < −t_{α; n−1} ; H1: µD > 0 : tqs > t_{α; n−1}
+ For hypothesis tests on the difference in means with unknown variances, where X1 and X2 follow normal distributions (Type 2 or Type 3), first calculate the ratio s1/s2: if s1/s2 ∈ [0.5; 2], treat σ1² = σ2² (Type 2); otherwise, treat σ1² ≠ σ2² (Type 3).
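A minimal Python sketch of the two-sample comparison above: the s1/s2 ratio selects between Type 2 (pooled variances) and Type 3 (unequal variances); the samples are hypothetical and SciPy is assumed to be available.

```python
# Sketch of the two-sample mean comparison, choosing Type 2 or Type 3 via s1/s2.
from math import sqrt
from scipy.stats import t  # assumed available

x1 = [12.1, 13.4, 12.8, 13.0, 12.5, 13.1]   # hypothetical samples
x2 = [11.2, 11.9, 12.0, 11.5, 11.7, 12.3]
n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2
s1 = sqrt(sum((x - m1) ** 2 for x in x1) / (n1 - 1))
s2 = sqrt(sum((x - m2) ** 2 for x in x2) / (n2 - 1))

alpha = 0.05
if 0.5 <= s1 / s2 <= 2:                      # treat variances as equal (Type 2)
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t_qs = (m1 - m2) / sqrt(s2_pooled / n1 + s2_pooled / n2)
    df = n1 + n2 - 2
else:                                        # unequal variances (Type 3)
    t_qs = (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)
    df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
        (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1))

t_crit = t.ppf(1 - alpha / 2, df)
print(f"t_qs = {t_qs:.3f}, df = {df:.1f}, reject H0 (mu1 != mu2): {abs(t_qs) > t_crit}")
```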

Editor: Truong Duc An