A Probability and Statistics Cheatsheet
This cheat sheet integrates a variety of topics in probability theory and statistics. It is based on literature [1, 6, 3] and in-class material from courses of the statistics department at the University of California, Berkeley, but is also influenced by other sources [4, 5]. If you find errors or have suggestions for further topics, I would appreciate it if you sent me an email. The most recent version of this document is available at http://bit.ly/probstat.

Contents

1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical Distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter Delta Method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Bayesian Inference
  14.1 Credible Intervals
  14.2 Function of Parameters
  14.3 Priors
    14.3.1 Conjugate Priors
  14.4 Bayesian Testing
15 Exponential Family
16 Sampling Methods
  16.1 The Bootstrap
    16.1.1 Bootstrap Confidence Intervals
  16.2 Rejection Sampling
  16.3 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA Models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics
1 Distribution Overview

1.1 Discrete Distributions¹

Uniform Unif{a, …, b}
  F(x) = 0 (x < a); (⌊x⌋ − a + 1)/(b − a + 1) (a ≤ x ≤ b); 1 (x > b)
  f(x) = 1/(b − a + 1) for x ∈ {a, …, b}
  E[X] = (a + b)/2    V[X] = ((b − a + 1)² − 1)/12
  M_X(s) = (e^{as} − e^{(b+1)s})/((b − a + 1)(1 − e^s))

Bernoulli Bern(p)
  F(x) = (1 − p)^{1−x} for x ∈ {0, 1}
  f(x) = p^x (1 − p)^{1−x} for x ∈ {0, 1}
  E[X] = p    V[X] = p(1 − p)    M_X(s) = 1 − p + pe^s

Binomial Bin(n, p)
  F(x) = I_{1−p}(n − x, x + 1)
  f(x) = C(n, x) p^x (1 − p)^{n−x}
  E[X] = np    V[X] = np(1 − p)    M_X(s) = (1 − p + pe^s)^n

Multinomial Mult(n, p)
  f(x) = (n!/(x₁! ⋯ x_k!)) p₁^{x₁} ⋯ p_k^{x_k},  Σᵢ₌₁ᵏ xᵢ = n
  E[Xᵢ] = npᵢ    V[Xᵢ] = npᵢ(1 − pᵢ)    M_X(s) = (Σᵢ₌₁ᵏ pᵢ e^{sᵢ})^n

Hypergeometric Hyp(N, m, n)
  f(x) = C(m, x) C(N − m, n − x)/C(N, n)
  E[X] = nm/N    V[X] = nm(N − n)(N − m)/(N²(N − 1))    M_X(s): N/A

Negative Binomial NBin(r, p)
  F(x) = I_p(r, x + 1)
  f(x) = C(x + r − 1, r − 1) p^r (1 − p)^x
  E[X] = r(1 − p)/p    V[X] = r(1 − p)/p²    M_X(s) = (p/(1 − (1 − p)e^s))^r

Geometric Geo(p)
  F(x) = 1 − (1 − p)^x, x ∈ ℕ⁺
  f(x) = p(1 − p)^{x−1}, x ∈ ℕ⁺
  E[X] = 1/p    V[X] = (1 − p)/p²    M_X(s) = pe^s/(1 − (1 − p)e^s)

Poisson Po(λ)
  F(x) = e^{−λ} Σ_{i=0}^{⌊x⌋} λ^i/i!
  f(x) = λ^x e^{−λ}/x!
  E[X] = λ    V[X] = λ    M_X(s) = e^{λ(e^s − 1)}

[Figure: PMFs of the Uniform (discrete), Binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9), Geometric (p = 0.2, 0.5, 0.8), and Poisson (λ = 1, 4, 10) distributions.]

¹ We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see 22.1), and use B(x, y) and I_x to refer to the Beta functions (see 22.2).
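A quick numerical cross-check of the E[X] and V[X] columns above (not part of the original sheet; assumes SciPy is available, and the parameter values are arbitrary):

```python
# Cross-check discrete-distribution moment formulas against scipy.stats.
from scipy import stats

n, p, lam = 40, 0.3, 4.0

binom = stats.binom(n, p)
assert abs(binom.mean() - n * p) < 1e-12
assert abs(binom.var() - n * p * (1 - p)) < 1e-12

geom = stats.geom(p)  # scipy's geometric has support {1, 2, ...}, matching the sheet
assert abs(geom.mean() - 1 / p) < 1e-9
assert abs(geom.var() - (1 - p) / p**2) < 1e-9

pois = stats.poisson(lam)
assert abs(pois.mean() - lam) < 1e-12 and abs(pois.var() - lam) < 1e-12
print("discrete moment formulas check out")
```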
1.2 Continuous Distributions

Uniform Unif(a, b)
  F(x) = 0 (x < a); (x − a)/(b − a) (a < x < b); 1 (x > b)
  f(x) = I(a < x < b)/(b − a)
  E[X] = (a + b)/2    V[X] = (b − a)²/12    M_X(s) = (e^{sb} − e^{sa})/(s(b − a))

Normal N(μ, σ²)
  F(x) = Φ((x − μ)/σ), where Φ(x) = ∫_{−∞}^x φ(t) dt
  f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
  E[X] = μ    V[X] = σ²    M_X(s) = exp(μs + σ²s²/2)

Log-Normal ln N(μ, σ²)
  F(x) = 1/2 + (1/2) erf((ln x − μ)/√(2σ²))
  f(x) = (1/(x√(2πσ²))) exp(−(ln x − μ)²/(2σ²))
  E[X] = e^{μ+σ²/2}    V[X] = (e^{σ²} − 1) e^{2μ+σ²}

Multivariate Normal MVN(μ, Σ)
  f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−½ (x − μ)ᵀ Σ^{−1} (x − μ))
  E[X] = μ    V[X] = Σ    M_X(s) = exp(μᵀs + ½ sᵀΣs)

Student's t Student(ν)
  f(x) = (Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}
  E[X] = 0 (ν > 1)    V[X] = ν/(ν − 2) (ν > 2)

Chi-square χ²_k
  F(x) = γ(k/2, x/2)/Γ(k/2)
  f(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2−1} e^{−x/2}
  E[X] = k    V[X] = 2k    M_X(s) = (1 − 2s)^{−k/2} (s < 1/2)

F F(d₁, d₂)
  f(x) = √((d₁x)^{d₁} d₂^{d₂}/(d₁x + d₂)^{d₁+d₂}) / (x B(d₁/2, d₂/2))
  E[X] = d₂/(d₂ − 2) (d₂ > 2)    V[X] = 2d₂²(d₁ + d₂ − 2)/(d₁(d₂ − 2)²(d₂ − 4)) (d₂ > 4)

Exponential Exp(β)
  F(x) = 1 − e^{−x/β}
  f(x) = (1/β) e^{−x/β}
  E[X] = β    V[X] = β²    M_X(s) = 1/(1 − βs) (s < 1/β)

Gamma Gamma(α, β)
  F(x) = γ(α, x/β)/Γ(α)
  f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β}
  E[X] = αβ    V[X] = αβ²    M_X(s) = (1/(1 − βs))^α (s < 1/β)

Inverse Gamma InvGamma(α, β)
  F(x) = Γ(α, β/x)/Γ(α)
  f(x) = (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X] = β/(α − 1) (α > 1)    V[X] = β²/((α − 1)²(α − 2)) (α > 2)
  M_X(s) = (2(−βs)^{α/2}/Γ(α)) K_α(√(−4βs))

Dirichlet Dir(α)
  f(x) = (Γ(Σᵢ₌₁ᵏ αᵢ)/Πᵢ₌₁ᵏ Γ(αᵢ)) Πᵢ₌₁ᵏ xᵢ^{αᵢ−1}
  E[Xᵢ] = αᵢ/Σⱼαⱼ    V[Xᵢ] = E[Xᵢ](1 − E[Xᵢ])/(Σⱼαⱼ + 1)

Beta Beta(α, β)
  F(x) = I_x(α, β)
  f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} = x^{α−1}(1 − x)^{β−1}/B(α, β)
  E[X] = α/(α + β)    V[X] = αβ/((α + β)²(α + β + 1))
  M_X(s) = 1 + Σ_{k=1}^∞ (Π_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull Weibull(λ, k)
  F(x) = 1 − e^{−(x/λ)^k}
  f(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X] = λΓ(1 + 1/k)    V[X] = λ²Γ(1 + 2/k) − E[X]²    M_X(s) = Σ_{n=0}^∞ (s^n λ^n/n!) Γ(1 + n/k)

Pareto Pareto(x_m, α)
  F(x) = 1 − (x_m/x)^α (x ≥ x_m)
  f(x) = α x_m^α/x^{α+1} (x ≥ x_m)
  E[X] = αx_m/(α − 1) (α > 1)    V[X] = x_m²α/((α − 1)²(α − 2)) (α > 2)
  M_X(s) = α(−x_m s)^α Γ(−α, −x_m s) (s < 0)

[Figure: PDFs of the Uniform (continuous), Normal (μ = 0, σ² ∈ {0.2, 1, 5}; μ = −2, σ² = 0.5), Log-Normal, Student's t (ν ∈ {1, 2, 5, ∞}), χ² (k = 1, …, 5), F, Exponential, Gamma, Inverse Gamma, Beta, Weibull, and Pareto distributions for several parameter settings.]
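As above, a small sanity check of a few continuous entries (not from the sheet; assumes SciPy, whose scale parameters match β and λ here):

```python
# Cross-check a few continuous-distribution moment formulas via scipy.stats.
from math import gamma as G
from scipy import stats

a, b = 3.0, 2.0                       # Gamma(α = 3, β = 2)
g = stats.gamma(a, scale=b)
assert abs(g.mean() - a * b) < 1e-12 and abs(g.var() - a * b**2) < 1e-12

al, be = 2.0, 5.0                     # Beta(α, β)
B = stats.beta(al, be)
assert abs(B.mean() - al / (al + be)) < 1e-12
assert abs(B.var() - al * be / ((al + be) ** 2 * (al + be + 1))) < 1e-12

lam, k = 1.0, 1.5                     # Weibull(λ, k) is scipy's weibull_min(c=k, scale=λ)
W = stats.weibull_min(k, scale=lam)
assert abs(W.mean() - lam * G(1 + 1 / k)) < 1e-9
print("continuous moment formulas check out")
```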
2 Probability Theory

Definitions
• Sample space Ω
• Outcome (point or element) ω ∈ Ω
• Event A ⊆ Ω
• σ-algebra 𝒜:
  1. ∅ ∈ 𝒜
  2. A₁, A₂, … ∈ 𝒜 ⟹ ⋃ᵢ₌₁^∞ Aᵢ ∈ 𝒜
  3. A ∈ 𝒜 ⟹ Aᶜ ∈ 𝒜
• Probability distribution ℙ:
  1. ℙ[A] ≥ 0 for every A
  2. ℙ[Ω] = 1
  3. ℙ[⊔ᵢ₌₁^∞ Aᵢ] = Σᵢ₌₁^∞ ℙ[Aᵢ] for disjoint Aᵢ
• Probability space (Ω, 𝒜, ℙ)

Properties
• ℙ[∅] = 0
• B = Ω ∩ B = (A ⊔ Aᶜ) ∩ B = (A ∩ B) ⊔ (Aᶜ ∩ B)
• ℙ[Aᶜ] = 1 − ℙ[A]
• ℙ[B] = ℙ[A ∩ B] + ℙ[Aᶜ ∩ B]
• DeMorgan: (⋃ₙ Aₙ)ᶜ = ⋂ₙ Aₙᶜ and (⋂ₙ Aₙ)ᶜ = ⋃ₙ Aₙᶜ
• ℙ[⋃ₙ Aₙ] = 1 − ℙ[⋂ₙ Aₙᶜ]
• ℙ[A ∪ B] = ℙ[A] + ℙ[B] − ℙ[A ∩ B] ⟹ ℙ[A ∪ B] ≤ ℙ[A] + ℙ[B]
• ℙ[A ∪ B] = ℙ[A ∩ Bᶜ] + ℙ[Aᶜ ∩ B] + ℙ[A ∩ B]
• ℙ[A ∩ Bᶜ] = ℙ[A] − ℙ[A ∩ B]

Continuity of Probabilities
• A₁ ⊂ A₂ ⊂ … ⟹ limₙ ℙ[Aₙ] = ℙ[A], where A = ⋃ᵢ₌₁^∞ Aᵢ
• A₁ ⊃ A₂ ⊃ … ⟹ limₙ ℙ[Aₙ] = ℙ[A], where A = ⋂ᵢ₌₁^∞ Aᵢ

Independence
A ⫫ B ⟺ ℙ[A ∩ B] = ℙ[A] ℙ[B]

Conditional Probability
ℙ[A | B] = ℙ[A ∩ B]/ℙ[B]  if ℙ[B] > 0

Law of Total Probability
ℙ[B] = Σᵢ₌₁ⁿ ℙ[B | Aᵢ] ℙ[Aᵢ],  where Ω = ⊔ᵢ₌₁ⁿ Aᵢ

Bayes' Theorem
ℙ[Aᵢ | B] = ℙ[B | Aᵢ] ℙ[Aᵢ] / Σⱼ₌₁ⁿ ℙ[B | Aⱼ] ℙ[Aⱼ],  where Ω = ⊔ᵢ₌₁ⁿ Aᵢ

Inclusion-Exclusion Principle
|⋃ᵢ₌₁ⁿ Aᵢ| = Σ_{r=1}^n (−1)^{r−1} Σ_{i₁<⋯<i_r} |⋂ⱼ₌₁ʳ A_{iⱼ}|

3 Random Variables

Random variable: X : Ω → ℝ

Probability mass function (PMF): f_X(x) = ℙ[X = x]
Probability density function (PDF): ℙ[a ≤ X ≤ b] = ∫_a^b f(x) dx
Cumulative distribution function (CDF): F_X(x) = ℙ[X ≤ x]

Conditional density
ℙ[a ≤ Y ≤ b | X = x] = ∫_a^b f_{Y|X}(y | x) dy,  where f_{Y|X}(y | x) = f(x, y)/f_X(x)

Independence: X ⫫ Y ⟺
1. ℙ[X ≤ x, Y ≤ y] = ℙ[X ≤ x] ℙ[Y ≤ y]
2. f_{X,Y}(x, y) = f_X(x) f_Y(y)
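A minimal numeric illustration of the law of total probability and Bayes' theorem (the partition probabilities and conditionals below are made-up example numbers):

```python
# Total probability P[B] = Σ P[B|A_i]P[A_i], then Bayes' theorem for P[A_i|B].
p_A = [0.5, 0.3, 0.2]          # P[A_i], a partition of the sample space
p_B_given_A = [0.1, 0.4, 0.8]  # P[B | A_i]

p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))           # = 0.33
posterior = [pb * pa / p_B for pb, pa in zip(p_B_given_A, p_A)]  # sums to 1

print(f"P[B] = {p_B:.3f}")
print(["%.3f" % p for p in posterior])
```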
3.1 Transformations

Transformation function: Z = φ(X)

Discrete
f_Z(z) = ℙ[φ(X) = z] = ℙ[{x : φ(x) = z}] = ℙ[X ∈ φ^{−1}(z)] = Σ_{x∈φ^{−1}(z)} f(x)

Continuous
F_Z(z) = ℙ[φ(X) ≤ z] = ∫_{A_z} f(x) dx,  with A_z = {x : φ(x) ≤ z}

Convolution
• Z := X + Y:  f_Z(z) = ∫_{−∞}^∞ f_{X,Y}(x, z − x) dx  (X, Y ≥ 0: ∫_0^z)
• Z := |X − Y|:  f_Z(z) = 2 ∫_0^∞ f_{X,Y}(x, z + x) dx
• Z := X/Y:  f_Z(z) = ∫_{−∞}^∞ |y| f_{X,Y}(yz, y) dy  (X ⫫ Y: ∫ |y| f_X(yz) f_Y(y) dy)

4 Expectation

Definition and properties
• E[X] = μ_X = ∫ x dF_X(x) = Σ_x x f_X(x) (X discrete);  ∫ x f_X(x) dx (X continuous)
• ℙ[X = c] = 1 ⟹ E[c] = c
• E[cX] = c E[X]
• E[X + Y] = E[X] + E[Y]
• E[XY] = ∫∫ xy f_{X,Y}(x, y) dx dy
• In general, E[φ(X)] ≠ φ(E[X]) (cf. Jensen's inequality)
• ℙ[X ≥ Y] = 1 ⟹ E[X] ≥ E[Y];  ℙ[X = Y] = 1 ⟹ E[X] = E[Y]
• For X with support ℕ: E[X] = Σ_{x=1}^∞ ℙ[X ≥ x]

Sample mean
X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ

Expectation of a function
E[φ(X)] = ∫ φ(x) dF_X(x)
E[I_A(X)] = ∫ I_A(x) dF_X(x) = ∫_A dF_X(x) = ℙ[X ∈ A]

Conditional expectation
• E[Y | X = x] = ∫ y f(y | x) dy
• E[X] = E[E[X | Y]]
• E[φ(X, Y) | X = x] = ∫ φ(x, y) f_{Y|X}(y | x) dy
• E[Y + Z | X] = E[Y | X] + E[Z | X]
• E[φ(X) Y | X] = φ(X) E[Y | X]
• E[Y | X] = c ⟹ Cov[X, Y] = 0

5 Variance

• V[X] = σ²_X = E[(X − E[X])²] = E[X²] − E[X]²
• V[Σᵢ₌₁ⁿ Xᵢ] = Σᵢ V[Xᵢ] + 2 Σ_{i≠j} Cov[Xᵢ, Xⱼ]
• V[Σᵢ₌₁ⁿ Xᵢ] = Σᵢ V[Xᵢ]  iff Xᵢ ⫫ Xⱼ

Standard deviation
sd[X] = √V[X] = σ_X

Covariance
• Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]
• Cov[Σᵢ₌₁ⁿ Xᵢ, Σⱼ₌₁ᵐ Yⱼ] = Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ Cov[Xᵢ, Yⱼ]

Correlation
ρ[X, Y] = Cov[X, Y]/√(V[X] V[Y])

Independence
X ⫫ Y ⟹ ρ[X, Y] = 0 ⟺ Cov[X, Y] = 0 ⟺ E[XY] = E[X] E[Y]

Sample variance
S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)²

Conditional variance
• V[Y | X] = E[(Y − E[Y | X])² | X] = E[Y² | X] − E[Y | X]²
• V[Y] = E[V[Y | X]] + V[E[Y | X]]

6 Inequalities

• Cauchy-Schwarz: E[XY]² ≤ E[X²] E[Y²]
• Markov: ℙ[φ(X) ≥ t] ≤ E[φ(X)]/t  (φ ≥ 0, t > 0)
• Chebyshev: ℙ[|X − E[X]| ≥ t] ≤ V[X]/t²
• Chernoff: ℙ[X ≥ (1 + δ)μ] ≤ (e^δ/(1 + δ)^{1+δ})^μ  (δ > −1)
• Jensen: E[φ(X)] ≥ φ(E[X]) for convex φ
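A quick Monte Carlo sanity check of two results above (illustrative, not part of the sheet; uses only the standard library, sample sizes are arbitrary):

```python
# Check Chebyshev's bound and V[Y] = E[V[Y|X]] + V[E[Y|X]] by simulation.
import random

random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

t = 2.0
empirical = sum(abs(x) >= t for x in xs) / n
print(f"P[|X| >= {t}] = {empirical:.4f} <= V[X]/t^2 = {1 / t**2:.4f}")  # bound holds

# Y | X = x ~ N(x, 1): E[V[Y|X]] = 1 and V[E[Y|X]] = V[X] = 1, so V[Y] = 2.
ys = [x + random.gauss(0, 1) for x in xs]
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
print(f"V[Y] = {var_y:.3f} (theory: 2)")
```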
7 Distribution Relationships

Binomial
• Xᵢ ~ Bern(p) independent ⟹ Σᵢ₌₁ⁿ Xᵢ ~ Bin(n, p)

Negative Binomial
• NBin(1, p) = Geo(p)
• Xᵢ ~ NBin(rᵢ, p) independent ⟹ Σᵢ Xᵢ ~ NBin(Σᵢ rᵢ, p)

Poisson
• Xᵢ ~ Po(λᵢ), Xᵢ ⫫ Xⱼ ⟹ Σᵢ₌₁ⁿ Xᵢ ~ Po(Σᵢ₌₁ⁿ λᵢ)
• Xᵢ ~ Po(λᵢ), Xᵢ ⫫ Xⱼ ⟹ Xᵢ | Σⱼ₌₁ⁿ Xⱼ = n ~ Bin(n, λᵢ/Σⱼ₌₁ⁿ λⱼ)

Exponential
• Xᵢ ~ Exp(β) iid ⟹ Σᵢ₌₁ⁿ Xᵢ ~ Gamma(n, β)

Gamma
• X ~ Gamma(α, β) ⟺ X/β ~ Gamma(α, 1)
• Gamma(α, β) ≈ Σᵢ₌₁^α Exp(β) for integer α
• Xᵢ ~ Gamma(αᵢ, β), Xᵢ ⫫ Xⱼ ⟹ Σᵢ Xᵢ ~ Gamma(Σᵢ αᵢ, β)
• Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

Beta
• (1/B(α, β)) x^{α−1}(1 − x)^{β−1} = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
• E[X^k] = B(α + k, β)/B(α, β) = ((α + k − 1)/(α + β + k − 1)) E[X^{k−1}]
• Beta(1, 1) ~ Unif(0, 1)

Normal
• X ~ N(μ, σ²) ⟹ (X − μ)/σ ~ N(0, 1)
• X ~ N(μ, σ²), Z = aX + b ⟹ Z ~ N(aμ + b, a²σ²)
• X ~ N(μ₁, σ₁²), Y ~ N(μ₂, σ₂²), X ⫫ Y ⟹ X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)
• Xᵢ ~ N(μᵢ, σᵢ²) independent ⟹ Σᵢ Xᵢ ~ N(Σᵢ μᵢ, Σᵢ σᵢ²)
• ℙ[a < X ≤ b] = Φ((b − μ)/σ) − Φ((a − μ)/σ)
• Φ(−x) = 1 − Φ(x),  φ′(x) = −xφ(x),  φ″(x) = (x² − 1)φ(x)
• Upper quantile of N(0, 1): z_α = Φ^{−1}(1 − α)
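An illustrative check of one relationship above (not from the sheet; assumes NumPy/SciPy, with arbitrary n, β, and replication count): the sum of n iid Exp(β) draws should be statistically indistinguishable from Gamma(n, β).

```python
# Kolmogorov-Smirnov check: sum of iid exponentials vs. the Gamma CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta, reps = 5, 2.0, 10_000

sums = rng.exponential(scale=beta, size=(reps, n)).sum(axis=1)
ks = stats.kstest(sums, stats.gamma(a=n, scale=beta).cdf)
print(f"KS statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}")  # large p: consistent
```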
8 Probability and Moment Generating Functions

• G_X(t) = E[t^X],  |t| < 1
• M_X(t) = G_X(e^t) = E[e^{Xt}] = E[Σ_{i=0}^∞ (Xt)^i/i!] = Σ_{i=0}^∞ E[X^i] t^i/i!
• ℙ[X = 0] = G_X(0)
• ℙ[X = 1] = G′_X(0)
• ℙ[X = i] = G_X^{(i)}(0)/i!
• E[X] = G′_X(1⁻)
• E[X!/(X − k)!] = G_X^{(k)}(1⁻)
• E[X^k] = M_X^{(k)}(0)
• G_X(t) = G_Y(t) ⟹ X ≐ Y

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let X, Z ~ N(0, 1) with X ⫫ Z, and let Y = ρX + √(1 − ρ²) Z.

Joint density
f(x, y) = (1/(2π√(1 − ρ²))) exp(−(x² + y² − 2ρxy)/(2(1 − ρ²)))

Conditionals
(Y | X = x) ~ N(ρx, 1 − ρ²)  and  (X | Y = y) ~ N(ρy, 1 − ρ²)

Independence
X ⫫ Y ⟺ ρ = 0

9.2 Bivariate Normal

Let X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²) with correlation ρ.

f(x, y) = (1/(2πσ_xσ_y√(1 − ρ²))) exp(−z/(2(1 − ρ²)))

z = ((x − μ_x)/σ_x)² + ((y − μ_y)/σ_y)² − 2ρ((x − μ_x)/σ_x)((y − μ_y)/σ_y)

Conditional moments
E[X | Y] = E[X] + ρ(σ_x/σ_y)(Y − E[Y]),  V[X | Y] = σ_x²(1 − ρ²)

9.3 Multivariate Normal

Covariance matrix Σ (precision matrix Σ^{−1}):

Σ = ⎡ V[X₁]        ⋯  Cov[X₁, X_k] ⎤
    ⎢   ⋮          ⋱       ⋮       ⎥
    ⎣ Cov[X_k, X₁] ⋯    V[X_k]     ⎦

If X ~ N(μ, Σ),
f_X(x) = (2π)^{−n/2} |Σ|^{−1/2} exp(−½ (x − μ)ᵀ Σ^{−1} (x − μ))

Properties
• Z ~ N(0, 1), X = μ + Σ^{1/2} Z ⟹ X ~ N(μ, Σ)
• X ~ N(μ, Σ) ⟹ Σ^{−1/2}(X − μ) ~ N(0, 1)
• X ~ N(μ, Σ) ⟹ AX ~ N(Aμ, AΣAᵀ)
• X ~ N(μ, Σ), a a vector of length k ⟹ aᵀX ~ N(aᵀμ, aᵀΣa)
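A minimal sketch of the property X = μ + Σ^{1/2}Z ~ N(μ, Σ), using a Cholesky factor as the matrix square root (one common choice; any L with LLᵀ = Σ works — μ, Σ, and the sample size are arbitrary):

```python
# Sample from N(mu, Sigma) via a Cholesky factor and verify moments empirically.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

L = np.linalg.cholesky(Sigma)          # L @ L.T == Sigma
Z = rng.standard_normal((100_000, 2))  # iid N(0, 1) coordinates
X = mu + Z @ L.T                       # rows are draws from N(mu, Sigma)

print("sample mean:", X.mean(axis=0))            # ~ mu
print("sample cov:\n", np.cov(X, rowvar=False))  # ~ Sigma
```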
10 Convergence

Let {X₁, X₂, …} be a sequence of rvs and let X be another rv. Let Fₙ denote the CDF of Xₙ and let F denote the CDF of X.

Types of convergence
1. In distribution: Xₙ →ᴰ X if limₙ Fₙ(t) = F(t) at all t where F is continuous
2. In probability: Xₙ →ᴾ X if (∀ε > 0) limₙ ℙ[|Xₙ − X| > ε] = 0
3. Almost surely: Xₙ →ᵃˢ X if ℙ[limₙ Xₙ = X] = 1
4. In quadratic mean: Xₙ →ᑫᵐ X if limₙ E[(Xₙ − X)²] = 0

Relationships
• Xₙ →ᑫᵐ X ⟹ Xₙ →ᴾ X ⟹ Xₙ →ᴰ X
• Xₙ →ᵃˢ X ⟹ Xₙ →ᴾ X
• Xₙ →ᴰ X and ℙ[X = c] = 1 (c ∈ ℝ) ⟹ Xₙ →ᴾ X
• Xₙ →ᑫᵐ X, Yₙ →ᑫᵐ Y ⟹ Xₙ + Yₙ →ᑫᵐ X + Y
• Xₙ →ᴾ X, Yₙ →ᴾ Y ⟹ Xₙ + Yₙ →ᴾ X + Y and XₙYₙ →ᴾ XY
• Xₙ →ᴾ X ⟹ φ(Xₙ) →ᴾ φ(X) and Xₙ →ᴰ X ⟹ φ(Xₙ) →ᴰ φ(X), for continuous φ
• Xₙ →ᑫᵐ b ⟺ limₙ E[Xₙ] = b and limₙ V[Xₙ] = 0
• X₁, …, Xₙ iid, E[X] = μ, V[X] < ∞ ⟹ X̄ₙ →ᑫᵐ μ

Slutsky's Theorem
• Xₙ →ᴰ X and Yₙ →ᴾ c ⟹ Xₙ + Yₙ →ᴰ X + c
• Xₙ →ᴰ X and Yₙ →ᴾ c ⟹ XₙYₙ →ᴰ cX
• In general: Xₙ →ᴰ X and Yₙ →ᴰ Y ⇏ Xₙ + Yₙ →ᴰ X + Y

Delta method
Yₙ ≈ N(μ, σ²/n) ⟹ φ(Yₙ) ≈ N(φ(μ), (φ′(μ))² σ²/n)

10.1 Law of Large Numbers (LLN)

Let {X₁, …, Xₙ} be iid with E[X] = μ.
Weak (WLLN): X̄ₙ →ᴾ μ as n → ∞
Strong (SLLN): X̄ₙ →ᵃˢ μ as n → ∞

10.2 Central Limit Theorem (CLT)

Let {X₁, …, Xₙ} be iid with E[X] = μ and V[X] = σ². Then

Zₙ := (X̄ₙ − μ)/√(V[X̄ₙ]) = √n(X̄ₙ − μ)/σ →ᴰ Z, where Z ~ N(0, 1),

i.e., limₙ ℙ[Zₙ ≤ z] = Φ(z) for all z ∈ ℝ.

CLT notations
Zₙ ≈ N(0, 1)
X̄ₙ ≈ N(μ, σ²/n)
X̄ₙ − μ ≈ N(0, σ²/n)
√n(X̄ₙ − μ) ≈ N(0, σ²)
√n(X̄ₙ − μ)/σ ≈ N(0, 1)

Continuity correction
ℙ[X̄ₙ ≤ x] ≈ Φ((x + ½ − μ)/(σ/√n))
ℙ[X̄ₙ ≥ x] ≈ 1 − Φ((x − ½ − μ)/(σ/√n))

11 Statistical Inference

Let X₁, …, Xₙ iid ~ F if not otherwise noted.

11.1 Point Estimation

• Consistency: θ̂ₙ →ᴾ θ
• Sampling distribution: F(θ̂ₙ)
• Standard error: se(θ̂ₙ) = √V[θ̂ₙ]
• Mean squared error: mse = E[(θ̂ₙ − θ)²] = bias(θ̂ₙ)² + V[θ̂ₙ]

11.2 Normal-Based Confidence Interval

Suppose θ̂ₙ ≈ N(θ, ŝe²). Let z_{α/2} = Φ^{−1}(1 − α/2), i.e., ℙ[Z > z_{α/2}] = α/2 and ℙ[−z_{α/2} < Z < z_{α/2}] = 1 − α, where Z ~ N(0, 1). Then

Cₙ = θ̂ₙ ± z_{α/2} ŝe
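A quick CLT illustration (not part of the sheet; assumes NumPy/SciPy, and n, the replication count, and the Exp(1) choice are arbitrary): standardized means of skewed iid draws are approximately N(0, 1).

```python
# Compare empirical quantiles of standardized sample means with N(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 50, 20_000
X = rng.exponential(size=(reps, n))          # E[X] = V[X] = 1
Z = (X.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

for q in (0.05, 0.5, 0.95):
    print(f"q={q}: empirical {np.quantile(Z, q):+.3f}  normal {stats.norm.ppf(q):+.3f}")
```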
11.3 Empirical Distribution

Empirical distribution function (ECDF)
F̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ I(Xᵢ ≤ x),  where I(Xᵢ ≤ x) = 1 if Xᵢ ≤ x and 0 if Xᵢ > x

11.4 Statistical Functionals

• Statistical functional: T(F)
• Plug-in estimator of θ = T(F): θ̂ₙ = T(F̂ₙ)
• Linear functional: T(F) = ∫ φ(x) dF_X(x)
• Plug-in estimator for a linear functional:
  T(F̂ₙ) = ∫ φ(x) dF̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ φ(Xᵢ)
• Often: T(F̂ₙ) ≈ N(T(F), ŝe²), giving the interval T(F̂ₙ) ± z_{α/2} ŝe
• Plug-in estimator of the correlation:
  ρ̂ = Σᵢ(Xᵢ − X̄ₙ)(Yᵢ − Ȳₙ) / (√(Σᵢ(Xᵢ − X̄ₙ)²) √(Σᵢ(Yᵢ − Ȳₙ)²))

12 Parametric Inference

Let 𝔉 = {f(x; θ) : θ ∈ Θ} be a parametric model with parameter space Θ ⊆ ℝᵏ and parameter θ = (θ₁, …, θ_k).

12.1 Method of Moments

jᵗʰ moment: αⱼ(θ) = E[Xʲ] = ∫ xʲ dF_X(x)
jᵗʰ sample moment: α̂ⱼ = (1/n) Σᵢ₌₁ⁿ Xᵢʲ

The method of moments estimator θ̂ₙ solves αⱼ(θ̂ₙ) = α̂ⱼ for j = 1, …, k. It satisfies

√n(θ̂ − θ) →ᴰ N(0, Σ),  where Σ = g E[YYᵀ] gᵀ, Y = (X, X², …, Xᵏ)ᵀ, g = (g₁, …, g_k), gⱼ = ∂α⁻¹ⱼ(θ)/∂θ

12.2 Maximum Likelihood

Likelihood: Lₙ : Θ → [0, ∞),  Lₙ(θ) = Πᵢ₌₁ⁿ f(Xᵢ; θ)
Log-likelihood: ℓₙ(θ) = Σᵢ₌₁ⁿ log f(Xᵢ; θ)
Maximum likelihood estimator (MLE): θ̂ₙ such that Lₙ(θ̂ₙ) = sup_θ Lₙ(θ)

Score function: s(X; θ) = ∂ log f(X; θ)/∂θ
Fisher information: I(θ) = V_θ[s(X; θ)],  Iₙ(θ) = n I(θ)
Fisher information (exponential family): I(θ) = −E_θ[∂s(X; θ)/∂θ]

Asymptotic normality: (θ̂ₙ − θ)/ŝe →ᴰ N(0, 1), where ŝe ≈ √(1/Iₙ(θ̂ₙ))

12.2.1 Delta Method

If τ = φ(θ) with differentiable φ and φ′(θ) ≠ 0:
(τ̂ₙ − τ)/ŝe(τ̂ₙ) →ᴰ N(0, 1),  where τ̂ₙ = φ(θ̂ₙ) and ŝe(τ̂ₙ) = |φ′(θ̂ₙ)| ŝe(θ̂ₙ)

12.3 Multiparameter Models

Let θ = (θ₁, …, θ_k) and θ̂ = (θ̂₁, …, θ̂_k) be the MLE, with

H_jk = ∂²ℓₙ/(∂θⱼ ∂θ_k),  ℓₙ = Σᵢ₌₁ⁿ log f(Xᵢ; θ)

Fisher information matrix
Iₙ(θ) = −⎡ E[H₁₁] ⋯ E[H₁k] ⎤
         ⎢   ⋮    ⋱    ⋮    ⎥
         ⎣ E[H_k1] ⋯ E[H_kk] ⎦

Under appropriate regularity conditions, with Jₙ = Iₙ^{−1}:
(θ̂ⱼ − θⱼ)/ŝeⱼ →ᴰ N(0, 1),  where ŝe²ⱼ = Jₙ(j, j) and Cov[θ̂ⱼ, θ̂_k] = Jₙ(j, k)
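A hedged worked example of the MLE machinery above for the Poisson model (data and seed are arbitrary): λ̂ = X̄, I(λ) = 1/λ, so ŝe ≈ √(λ̂/n).

```python
# Poisson MLE, Fisher-information standard error, and a normal-based CI.
import numpy as np

rng = np.random.default_rng(3)
lam_true, n = 4.0, 500
x = rng.poisson(lam_true, size=n)

lam_hat = x.mean()                      # MLE
se_hat = np.sqrt(lam_hat / n)           # 1 / sqrt(n I(lam_hat))
ci = (lam_hat - 1.96 * se_hat, lam_hat + 1.96 * se_hat)
print(f"lam_hat = {lam_hat:.3f}, se = {se_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```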
12.3.1 Multiparameter Delta Method

Let τ = φ(θ₁, …, θ_k) with gradient
∇φ = (∂φ/∂θ₁, …, ∂φ/∂θ_k)ᵀ

Suppose ∇φ|_{θ=θ̂} ≠ 0 and let τ̂ = φ(θ̂). Then
(τ̂ − τ)/ŝe(τ̂) →ᴰ N(0, 1),  where ŝe(τ̂) = √((∇̂φ)ᵀ Jₙ (∇̂φ)),  Jₙ = Jₙ(θ̂) = Iₙ^{−1}(θ̂) and ∇̂φ = ∇φ|_{θ=θ̂}

12.4 Parametric Bootstrap

Sample from f(x; θ̂ₙ) instead of from F̂ₙ, where θ̂ₙ could be the MLE or the method of moments estimator.

13 Hypothesis Testing

H₀ : θ ∈ Θ₀  versus  H₁ : θ ∈ Θ₁

Definitions
• Null hypothesis H₀
• Alternative hypothesis H₁
• Simple hypothesis: θ = θ₀
• Composite hypothesis: θ > θ₀ or θ < θ₀
• Two-sided test: H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀
• One-sided test: H₀ : θ ≤ θ₀ versus H₁ : θ > θ₀
• Critical value c
• Test statistic T
• Rejection region R = {x : T(x) > c}
• Power function β(θ) = ℙ_θ[X ∈ R]
• Power of a test: 1 − ℙ[Type II error] = 1 − β = inf_{θ∈Θ₁} β(θ)

              Retain H₀             Reject H₀
H₀ true       √                     Type I error (α)
H₁ true       Type II error (β)     √ (power)

p-value
• p-value = sup_{θ∈Θ₀} ℙ_θ[T(X) ≥ T(x)] = inf{α : T(x) ∈ R_α}
• p-value = sup_{θ∈Θ₀} ℙ_θ[T(X⋆) ≥ T(X)] = 1 − F(T(X)),  since T(X⋆) ~ F under H₀

p-value       evidence
< 0.01        very strong evidence against H₀
0.01 − 0.05   strong evidence against H₀
0.05 − 0.1    weak evidence against H₀
> 0.1         little or no evidence against H₀

Wald Test
• Two-sided test
• Reject H₀ when |W| > z_{α/2}, where W = (θ̂ − θ₀)/ŝe
• ℙ[|W| > z_{α/2}] → α
• p-value = ℙ_{θ₀}[|W| > |w|] ≈ ℙ[|Z| > |w|] = 2Φ(−|w|)

Likelihood Ratio Test (LRT)
• T(X) = sup_θ Lₙ(θ)/sup_{θ∈Θ₀} Lₙ(θ) = Lₙ(θ̂ₙ)/Lₙ(θ̂ₙ,₀)
• λ(X) = 2 log T(X) →ᴰ χ²_{r−q}, where r − q = dim(Θ) − dim(Θ₀)
• p-value = ℙ_{θ₀}[λ(X) > λ(x)] ≈ ℙ[χ²_{r−q} > λ(x)]

Multinomial LRT
• Let p̂ₙ = (X₁/n, …, X_k/n) be the MLE.
• T(X) = Lₙ(p̂ₙ)/Lₙ(p₀) = Πⱼ₌₁ᵏ (p̂ⱼ/p₀ⱼ)^{Xⱼ}
• λ(X) = 2 Σⱼ₌₁ᵏ Xⱼ log(p̂ⱼ/p₀ⱼ) →ᴰ χ²_{k−1}
• The approximate size-α LRT rejects H₀ when λ(X) ≥ χ²_{k−1,α}.
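A hedged example of the Wald test above for a Bernoulli proportion (H₀: p = 0.5; data, seed, and sample size are arbitrary): here θ̂ = p̂ and ŝe = √(p̂(1 − p̂)/n).

```python
# Wald test: W = (p_hat − p0)/se, p-value = 2Φ(−|w|).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p_true, p0 = 400, 0.55, 0.50
x = rng.binomial(1, p_true, size=n)

p_hat = x.mean()
se = np.sqrt(p_hat * (1 - p_hat) / n)
W = (p_hat - p0) / se
p_value = 2 * stats.norm.cdf(-abs(W))
print(f"W = {W:.3f}, p-value = {p_value:.4f}")  # reject H0 at level alpha if p-value < alpha
```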
Pearson χ² Test
• T = Σⱼ₌₁ᵏ (Xⱼ − E[Xⱼ])²/E[Xⱼ], where E[Xⱼ] = np₀ⱼ under H₀
• T →ᴰ χ²_{k−1},  p-value = ℙ[χ²_{k−1} > T(x)]
• The Pearson statistic converges to χ²_{k−1} faster than the LRT, hence it is preferable for small n.

Independence Testing
• I rows, J columns, X a multinomial sample of size n = I · J
• MLEs unconstrained: p̂ᵢⱼ = Xᵢⱼ/n
• MLEs under H₀ (independence): p̂₀ᵢⱼ = p̂ᵢ· p̂·ⱼ = (Xᵢ·/n)(X·ⱼ/n)
• LRT: λ = 2 Σᵢ₌₁ᴵ Σⱼ₌₁ᴶ Xᵢⱼ log(n Xᵢⱼ/(Xᵢ· X·ⱼ))
• Pearson: T = Σᵢ Σⱼ (Xᵢⱼ − Eᵢⱼ)²/Eᵢⱼ, with Eᵢⱼ = Xᵢ· X·ⱼ/n
• Both λ and T →ᴰ χ²_ν with ν = (I − 1)(J − 1)

14 Bayesian Inference

Bayes' Theorem
f(θ | x) = f(x | θ) f(θ)/f(x) = f(x | θ) f(θ)/∫ f(x | θ) f(θ) dθ ∝ Lₙ(θ) f(θ)

Definitions
• Xⁿ = (X₁, …, Xₙ), xⁿ = (x₁, …, xₙ)
• Prior density f(θ)
• Likelihood f(xⁿ | θ): joint density of the data; in particular, Xⁿ iid ⟹ f(xⁿ | θ) = Πᵢ₌₁ⁿ f(xᵢ | θ) = Lₙ(θ)
• Posterior density f(θ | xⁿ)
• Normalizing constant cₙ = f(xⁿ) = ∫ f(x | θ) f(θ) dθ
• Kernel: the part of a density that depends on θ
• Posterior mean: θ̄ₙ = ∫ θ f(θ | xⁿ) dθ = ∫ θ Lₙ(θ) f(θ) dθ / ∫ Lₙ(θ) f(θ) dθ

14.1 Credible Intervals

1 − α posterior interval: ℙ[θ ∈ (a, b) | xⁿ] = ∫_a^b f(θ | xⁿ) dθ = 1 − α

Equal-tail credible interval: ∫_{−∞}^a f(θ | xⁿ) dθ = ∫_b^∞ f(θ | xⁿ) dθ = α/2

1 − α highest posterior density (HPD) region Rₙ:
1. ℙ[θ ∈ Rₙ] = 1 − α
2. Rₙ = {θ : f(θ | xⁿ) > k} for some k
Rₙ is unimodal ⟹ Rₙ is an interval.

14.2 Function of Parameters

Let τ = φ(θ) and A = {θ : φ(θ) ≤ τ}.

Posterior CDF for τ: H(τ | xⁿ) = ℙ[φ(θ) ≤ τ | xⁿ] = ∫_A f(θ | xⁿ) dθ
Posterior density: h(τ | xⁿ) = H′(τ | xⁿ)

Bayesian delta method: τ | Xⁿ ≈ N(φ(θ̂), (ŝe φ′(θ̂))²)
14.3 Priors

Choice
• Subjective Bayesianism: the prior should incorporate as much detail as possible of the researcher's a priori knowledge, via prior elicitation.
• Objective Bayesianism: the prior should incorporate as little detail as possible (non-informative prior).
• Robust Bayesianism: consider various priors and determine the sensitivity of our inferences to changes in the prior.

Types
• Flat: f(θ) ∝ constant
• Proper: ∫ f(θ) dθ = 1
• Improper: ∫ f(θ) dθ = ∞
• Jeffreys' prior (transformation-invariant): f(θ) ∝ √I(θ);  multiparameter: f(θ) ∝ √det(I(θ))
• Conjugate: f(θ) and f(θ | xⁿ) belong to the same parametric family

14.3.1 Conjugate Priors

Discrete likelihoods (Likelihood | Conjugate prior | Posterior hyperparameters):

• Bernoulli(p) | Beta(α, β) | α + Σᵢ₌₁ⁿ xᵢ,  β + n − Σᵢ₌₁ⁿ xᵢ
• Binomial(p) | Beta(α, β) | α + Σᵢ xᵢ,  β + Σᵢ (Nᵢ − xᵢ)
• Negative Binomial(p) | Beta(α, β) | α + rn,  β + Σᵢ xᵢ
• Poisson(λ) | Gamma(α, β) | α + Σᵢ xᵢ,  β + n
• Multinomial(p) | Dir(α) | α + Σᵢ x⁽ⁱ⁾
• Geometric(p) | Beta(α, β) | α + n,  β + Σᵢ xᵢ

Continuous likelihoods (Likelihood | Conjugate prior | Posterior hyperparameters):

• Uniform(0, θ) | Pareto(x_m, k) | max{x₍ₙ₎, x_m},  k + n
• Exponential(λ) | Gamma(α, β) | α + n,  β + Σᵢ xᵢ
• Normal(μ, σ_c²) | Normal(μ₀, σ₀²) | (μ₀/σ₀² + Σᵢxᵢ/σ_c²)/(1/σ₀² + n/σ_c²),  (1/σ₀² + n/σ_c²)^{−1}
• Normal(μ_c, σ²) | InvGamma(α, β) | α + n/2,  β + ½ Σᵢ (xᵢ − μ_c)²
• Normal(μ, σ²) | Normal-scaled InvGamma(μ₀, ν, α, β) |
  (νμ₀ + n x̄)/(ν + n),  ν + n,  α + n/2,  β + ½ Σᵢ(xᵢ − x̄)² + nν(x̄ − μ₀)²/(2(n + ν))
• MVN(μ, Σ_c) | MVN(μ₀, Σ₀) | (Σ₀^{−1} + nΣ_c^{−1})^{−1}(Σ₀^{−1}μ₀ + nΣ_c^{−1}x̄),  (Σ₀^{−1} + nΣ_c^{−1})^{−1}
• MVN(μ_c, Σ) | InverseWishart(ν, Ψ) | n + ν,  Ψ + Σᵢ (xᵢ − μ_c)(xᵢ − μ_c)ᵀ
• Pareto(x_mc, k) | Gamma(α, β) | α + n,  β + Σᵢ log(xᵢ/x_mc)
• Pareto(x_m, k_c) | Pareto(x₀, k₀) | x₀,  k₀ − kn, where k₀ > kn
• Gamma(α_c, β) | Gamma(α₀, β₀) | α₀ + nα_c,  β₀ + Σᵢ xᵢ
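A minimal sketch of the first conjugate pair above (the prior hyperparameters and data are made-up): a Beta(α, β) prior with Bernoulli data updates to Beta(α + Σxᵢ, β + n − Σxᵢ).

```python
# Beta-Bernoulli conjugate update.
def beta_bernoulli_update(alpha, beta, data):
    """Return posterior hyperparameters after observing 0/1 draws."""
    s = sum(data)
    return alpha + s, beta + len(data) - s

a, b = beta_bernoulli_update(2.0, 2.0, [1, 0, 1, 1, 1, 0, 1])
print(f"posterior: Beta({a}, {b}); posterior mean = {a / (a + b):.3f}")
```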
14.4 Bayesian Testing

If H₀ : θ ∈ Θ₀:
Prior probability ℙ[H₀] = ∫_{Θ₀} f(θ) dθ
Posterior probability ℙ[H₀ | xⁿ] = ∫_{Θ₀} f(θ | xⁿ) dθ

For K hypotheses H₁, …, H_K:
ℙ[H_k | xⁿ] = f(xⁿ | H_k) ℙ[H_k] / Σ_{k=1}^K f(xⁿ | H_k) ℙ[H_k]

Marginal likelihood
f(xⁿ | Hᵢ) = ∫_Θ f(xⁿ | θ, Hᵢ) f(θ | Hᵢ) dθ

Posterior odds (of Hᵢ relative to Hⱼ)
ℙ[Hᵢ | xⁿ]/ℙ[Hⱼ | xⁿ] = (f(xⁿ | Hᵢ)/f(xⁿ | Hⱼ)) × (ℙ[Hᵢ]/ℙ[Hⱼ])
                         = Bayes factor BFᵢⱼ × prior odds

Bayes factor scale:
log₁₀ BF₁₀ ∈ (0, 0.5),  BF₁₀ ∈ (1, 3.2):    weak evidence
log₁₀ BF₁₀ ∈ (0.5, 1),  BF₁₀ ∈ (3.2, 10):   moderate evidence
log₁₀ BF₁₀ ∈ (1, 2),    BF₁₀ ∈ (10, 100):   strong evidence
log₁₀ BF₁₀ > 2,         BF₁₀ > 100:         decisive evidence

Posterior probability of H₁:
p̄ = (p/(1 − p)) BF₁₀ / (1 + (p/(1 − p)) BF₁₀),  where p = ℙ[H₁] and p̄ = ℙ[H₁ | xⁿ]
15 Exponential Family

A family {f(x; θ)} is a (one-parameter) exponential family if it can be written as
f(x; θ) = h(x) exp(η(θ) T(x) − A(θ))

16 Sampling Methods

16.1 The Bootstrap

Let Tₙ = g(X₁, …, Xₙ) be a statistic.
1. Estimate V_F[Tₙ] with V_{F̂ₙ}[Tₙ].
2. Approximate V_{F̂ₙ}[Tₙ] using simulation: draw X₁⋆, …, Xₙ⋆ ~ F̂ₙ (sample n points with replacement).
3. Compute Tₙ,b⋆ = g(X₁⋆, …, Xₙ⋆); repeat B times.
4. Then
   v_boot = V̂_{F̂ₙ} = (1/B) Σ_{b=1}^B (Tₙ,b⋆ − (1/B) Σ_{r=1}^B Tₙ,r⋆)²

16.1.1 Bootstrap Confidence Intervals

Normal-based interval
Tₙ ± z_{α/2} ŝe_boot

Pivotal interval
• Location parameter θ = T(F)
• Pivot Rₙ = θ̂ₙ − θ
• Let H(r) = ℙ[Rₙ ≤ r] denote the CDF of Rₙ
• Let Rₙ,b⋆ = θ̂ₙ,b⋆ − θ̂ₙ. Approximate H using the bootstrap:
  Ĥ(r) = (1/B) Σ_{b=1}^B I(Rₙ,b⋆ ≤ r)
• Let θ⋆_β denote the β sample quantile of (θ̂ₙ,₁⋆, …, θ̂ₙ,B⋆)
• a = θ̂ₙ − Ĥ^{−1}(1 − α/2) = 2θ̂ₙ − θ⋆_{1−α/2}
  b = θ̂ₙ − Ĥ^{−1}(α/2) = 2θ̂ₙ − θ⋆_{α/2}
• Approximate 1 − α confidence interval: Cₙ = (a, b)

Percentile interval
Cₙ = (θ⋆_{α/2}, θ⋆_{1−α/2})
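A sketch of the three bootstrap intervals above for the sample median (illustrative; B, the data-generating distribution, and the seed are arbitrary):

```python
# Normal, pivotal, and percentile bootstrap intervals for the median.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(size=100)
theta_hat = np.median(x)

B = 2000
boot = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])

alpha = 0.05
se_boot = boot.std(ddof=1)
lo_q, hi_q = np.quantile(boot, [alpha / 2, 1 - alpha / 2])

normal = (theta_hat - 1.96 * se_boot, theta_hat + 1.96 * se_boot)
pivotal = (2 * theta_hat - hi_q, 2 * theta_hat - lo_q)
percentile = (lo_q, hi_q)
print(f"normal={normal}\npivotal={pivotal}\npercentile={percentile}")
```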
16.2 Rejection Sampling

Setup
• We can easily sample from g(θ).
• We want to sample from h(θ), but it is difficult.
• We know h(θ) up to a proportionality constant: h(θ) = k(θ)/∫ k(θ) dθ
• Envelope condition: we can find M > 0 such that k(θ) ≤ M g(θ) for all θ.

Algorithm
1. Draw θ_cand ~ g(θ)
2. Generate u ~ Unif(0, 1)
3. Accept θ_cand if u ≤ k(θ_cand)/(M g(θ_cand))
4. Repeat until B values of θ_cand have been accepted.

Example
To sample from the posterior f(θ | xⁿ) ∝ k(θ) = Lₙ(θ)f(θ), draw candidates from the prior g(θ) = f(θ) and take M = Lₙ(θ̂ₙ), since k(θ)/g(θ) = Lₙ(θ) ≤ Lₙ(θ̂ₙ).

16.3 Importance Sampling

Sample from an importance function g rather than from the target density h. Algorithm to obtain an approximation to E[q(θ) | xⁿ]:
1. Sample from the prior: θ₁, …, θ_B iid ~ f(θ)
2. For each i = 1, …, B, calculate wᵢ = Lₙ(θᵢ)/Σ_{i=1}^B Lₙ(θᵢ)
3. E[q(θ) | xⁿ] ≈ Σ_{i=1}^B q(θᵢ) wᵢ

17 Decision Theory

Definitions
• Unknown quantity affecting our decision: θ ∈ Θ
• A decision rule θ̂(x) maps the data to an action a; the loss function L(θ, a) measures the consequence of taking action a when the truth is θ.

Loss functions
• Squared error loss: L(θ, a) = (θ − a)²
• Linear loss: L(θ, a) = K₁(θ − a) if a − θ < 0;  K₂(a − θ) if a − θ ≥ 0
• Absolute error loss: L(θ, a) = |θ − a| (linear loss with K₁ = K₂ = 1)
• Lᵖ loss: L(θ, a) = |θ − a|ᵖ
• Zero-one loss: L(θ, a) = 0 if a = θ;  1 if a ≠ θ

17.1 Risk

Posterior risk
r(θ̂ | x) = ∫ L(θ, θ̂(x)) f(θ | x) dθ = E_{θ|X}[L(θ, θ̂(x))]

(Frequentist) risk
R(θ, θ̂) = ∫ L(θ, θ̂(x)) f(x | θ) dx = E_{X|θ}[L(θ, θ̂(X))]

Bayes risk
r(f, θ̂) = ∫∫ L(θ, θ̂(x)) f(x, θ) dx dθ = E_{θ,X}[L(θ, θ̂(X))]
r(f, θ̂) = E_θ[E_{X|θ}[L(θ, θ̂(X))]] = E_θ[R(θ, θ̂)]
r(f, θ̂) = E_X[E_{θ|X}[L(θ, θ̂(X))]] = E_X[r(θ̂ | X)]
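A minimal rejection sampler implementing the algorithm above (illustrative choices throughout: the target kernel is an unnormalized Beta(3, 2) density, the proposal is Unif(0, 1), and M is k's maximum):

```python
# Rejection sampling: accept theta_cand ~ g when u <= k(theta)/(M g(theta)).
import random

def k(theta):                       # unnormalized target: Beta(3, 2) kernel
    return theta**2 * (1 - theta)

M = 4 / 27                          # max of k on [0, 1], attained at theta = 2/3; g = 1

def rejection_sample(B):
    out = []
    while len(out) < B:
        cand = random.random()      # theta_cand ~ Unif(0, 1)
        u = random.random()
        if u <= k(cand) / M:        # envelope condition k <= M g with g = 1
            out.append(cand)
    return out

random.seed(6)
draws = rejection_sample(10_000)
print(f"sample mean = {sum(draws)/len(draws):.3f} (Beta(3,2) mean = 0.6)")
```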
17.2 Admissibility

• θ̂′ dominates θ̂ if
  ∀θ : R(θ, θ̂′) ≤ R(θ, θ̂)  and  ∃θ : R(θ, θ̂′) < R(θ, θ̂)
• θ̂ is inadmissible if some other rule dominates it; otherwise it is admissible.

17.3 Bayes Rule

A Bayes rule (Bayes estimator) θ̂ minimizes the Bayes risk:
r(f, θ̂) = inf_{θ̃} r(f, θ̃)

If θ̂(x) = inf_{θ̃} r(θ̃ | x) for every x, then θ̂ is the Bayes rule and
r(f, θ̂) = ∫ r(θ̂ | x) f(x) dx

17.4 Minimax Rules

Maximum risk
R̄(θ̂) = sup_θ R(θ, θ̂),  R̄(a) = sup_θ R(θ, a)

Minimax rule
sup_θ R(θ, θ̂) = inf_{θ̃} R̄(θ̃) = inf_{θ̃} sup_θ R(θ, θ̃)

θ̂ = Bayes rule with constant risk, R(θ, θ̂) = c ⟹ θ̂ is minimax.

Least favorable prior
θ̂^f = Bayes rule and R(θ, θ̂^f) ≤ r(f, θ̂^f) ∀θ ⟹ θ̂^f is minimax and f is least favorable.
18 Linear Regression

Definitions
• Response variable Y
• Covariate X (aka predictor variable or feature)

18.1 Simple Linear Regression

Model
Yᵢ = β₀ + β₁Xᵢ + εᵢ,  where E[εᵢ | Xᵢ] = 0 and V[εᵢ | Xᵢ] = σ²

Fitted line
r̂(x) = β̂₀ + β̂₁x

Predicted (fitted) values
Ŷᵢ = r̂(Xᵢ)

Residuals
ε̂ᵢ = Yᵢ − Ŷᵢ = Yᵢ − (β̂₀ + β̂₁Xᵢ)

Residual sum of squares (rss)
rss(β̂₀, β̂₁) = Σᵢ₌₁ⁿ ε̂ᵢ²

Least squares estimates (minimizing rss)
β̂₀ = Ȳₙ − β̂₁X̄ₙ
β̂₁ = Σᵢ(Xᵢ − X̄ₙ)(Yᵢ − Ȳₙ)/Σᵢ(Xᵢ − X̄ₙ)² = (Σᵢ XᵢYᵢ − nX̄Ȳ)/(Σᵢ Xᵢ² − nX̄²)

E[β̂ | Xⁿ] = (β₀, β₁)ᵀ

V[β̂ | Xⁿ] = (σ²/(n s_X²)) ⎡ (1/n)Σᵢ Xᵢ²  −X̄ₙ ⎤
                            ⎣    −X̄ₙ        1   ⎦

ŝe(β̂₀) = (σ̂/(s_X √n)) √(Σᵢ Xᵢ²/n),  ŝe(β̂₁) = σ̂/(s_X √n)

where s_X² = (1/n) Σᵢ(Xᵢ − X̄ₙ)² and σ̂² = (1/(n − 2)) Σᵢ ε̂ᵢ² (an unbiased estimate of σ²).

Further properties:
• Consistency: β̂₀ →ᴾ β₀ and β̂₁ →ᴾ β₁
• Asymptotic normality:
  (β̂₀ − β₀)/ŝe(β̂₀) →ᴰ N(0, 1)  and  (β̂₁ − β₁)/ŝe(β̂₁) →ᴰ N(0, 1)
• Approximate 1 − α confidence intervals: β̂₀ ± z_{α/2} ŝe(β̂₀) and β̂₁ ± z_{α/2} ŝe(β̂₁)
• The Wald test for H₀ : β₁ = 0 vs. H₁ : β₁ ≠ 0: reject H₀ if |W| > z_{α/2}, where W = β̂₁/ŝe(β̂₁).

R²
R² = Σᵢ(Ŷᵢ − Ȳ)²/Σᵢ(Yᵢ − Ȳ)² = 1 − Σᵢ ε̂ᵢ²/Σᵢ(Yᵢ − Ȳ)² = 1 − rss/tss

Likelihood
L = Πᵢ f(Xᵢ, Yᵢ) = Πᵢ f_X(Xᵢ) × Πᵢ f_{Y|X}(Yᵢ | Xᵢ) = L₁ × L₂
L₁ = Πᵢ f_X(Xᵢ)
L₂ = Πᵢ f_{Y|X}(Yᵢ | Xᵢ) ∝ σ^{−n} exp(−(1/(2σ²)) Σᵢ (Yᵢ − (β₀ + β₁Xᵢ))²)

Under the assumption of normality, the least squares estimator is also the MLE, and
σ̂² = (1/n) Σᵢ ε̂ᵢ²  (mle)

18.2 Prediction

Predict Y⋆ at a new value X = x⋆: Ŷ⋆ = β̂₀ + β̂₁x⋆

V[Ŷ⋆] = V[β̂₀] + x⋆² V[β̂₁] + 2x⋆ Cov[β̂₀, β̂₁]

Prediction interval
ξ̂ₙ² = σ̂² (Σᵢ(Xᵢ − x⋆)²/(n Σᵢ(Xᵢ − X̄)²) + 1)
Ŷ⋆ ± z_{α/2} ξ̂ₙ

18.3 Multiple Regression

Y = Xβ + ε, where

X = ⎡ X₁₁ ⋯ X₁k ⎤     β = ⎡ β₁ ⎤     ε = ⎡ ε₁ ⎤
    ⎢  ⋮   ⋱  ⋮  ⎥         ⎢ ⋮  ⎥         ⎢ ⋮  ⎥
    ⎣ Xₙ₁ ⋯ Xₙk ⎦         ⎣ βk ⎦         ⎣ εₙ ⎦

Likelihood
L(μ, σ) = (2πσ²)^{−n/2} exp(−(1/(2σ²)) rss),  where rss = Σᵢ₌₁ᴺ (Yᵢ − xᵢᵀβ)² = ‖Y − Xβ‖²

If the (k × k) matrix XᵀX is invertible,
β̂ = (XᵀX)^{−1}XᵀY
V[β̂ | Xⁿ] = σ²(XᵀX)^{−1}
β̂ ≈ N(β, σ²(XᵀX)^{−1})

Estimated regression function: r̂(x) = Σⱼ₌₁ᵏ β̂ⱼxⱼ
Unbiased estimate of σ²: σ̂² = (1/(n − k)) Σᵢ ε̂ᵢ²,  where ε̂ = Xβ̂ − Y

1 − α confidence interval: β̂ⱼ ± z_{α/2} ŝe(β̂ⱼ)
Hypothesis testing: H₀ : βⱼ = 0 vs. H₁ : βⱼ ≠ 0, for j ∈ J

18.4 Model Selection

Consider predicting a new observation for covariates X⋆; the task is to choose a good subset S of covariates.

Procedure
1. Assign a score to each model.
2. Search through all models to find the one with the best score.

Mean squared prediction error (mspe)
mspe = E[(Ŷ(S) − Y⋆)²]

Prediction risk
R(S) = Σᵢ₌₁ⁿ mspeᵢ = Σᵢ₌₁ⁿ E[(Ŷᵢ(S) − Yᵢ⋆)²]
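A hedged implementation of the simple-linear-regression formulas above on simulated data (all parameter values and the seed are arbitrary):

```python
# Least squares estimates, standard error, and the Wald statistic for beta_1.
import numpy as np

rng = np.random.default_rng(7)
n, b0, b1, sigma = 200, 1.0, 2.5, 1.0
x = rng.uniform(0, 3, n)
y = b0 + b1 * x + rng.normal(0, sigma, n)

xbar, ybar = x.mean(), y.mean()
b1_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
b0_hat = ybar - b1_hat * xbar

resid = y - (b0_hat + b1_hat * x)
sigma2_hat = (resid ** 2).sum() / (n - 2)          # unbiased sigma^2 estimate
s_x = np.sqrt(((x - xbar) ** 2).mean())
se_b1 = np.sqrt(sigma2_hat) / (s_x * np.sqrt(n))   # se(beta1_hat)
W = b1_hat / se_b1                                 # Wald statistic for H0: beta_1 = 0
print(f"b0={b0_hat:.3f}, b1={b1_hat:.3f}, se(b1)={se_b1:.3f}, W={W:.1f}")
```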
Training error
R̂_tr(S) = Σᵢ₌₁ⁿ (Ŷᵢ(S) − Yᵢ)²

R²(S) = 1 − R̂_tr(S)/tss = 1 − rss(S)/tss = 1 − Σᵢ(Ŷᵢ(S) − Ȳ)²/Σᵢ(Yᵢ − Ȳ)²

The training error is a downward-biased estimate of the prediction risk:
bias(R̂_tr(S)) = E[R̂_tr(S)] − R(S) = −2 Σᵢ₌₁ⁿ Cov[Ŷᵢ, Yᵢ]

Adjusted R²
R̄²(S) = 1 − ((n − 1)/(n − k)) rss/tss

Mallows' Cp statistic
R̂(S) = R̂_tr(S) + 2kσ̂² = lack of fit + complexity penalty

Akaike information criterion (AIC)
AIC(S) = ℓₙ(β̂_S, σ̂_S²) − k

Bayesian information criterion (BIC)
BIC(S) = ℓₙ(β̂_S, σ̂_S²) − (k/2) log n

Validation and training
R̂_V(S) = Σᵢ₌₁ᵐ (Ŷᵢ⋆(S) − Yᵢ⋆)²,  with m the size of the validation set, often n/4 or n/2

Leave-one-out cross-validation
R̂_CV(S) = Σᵢ₌₁ⁿ (Yᵢ − Ŷ₍ᵢ₎)² = Σᵢ₌₁ⁿ ((Yᵢ − Ŷᵢ(S))/(1 − Uᵢᵢ(S)))²
where Uᵢᵢ(S) is the iᵗʰ diagonal element of the hat matrix U(S) = X_S(X_SᵀX_S)^{−1}X_Sᵀ.

19 Non-parametric Function Estimation

19.1 Density Estimation

Goal: estimate f(x), where f(x) satisfies ℙ[X ∈ A] = ∫_A f(x) dx.

Integrated square error (ise)
L(f, f̂ₙ) = ∫ (f(x) − f̂ₙ(x))² dx = J(h) + ∫ f²(x) dx

Frequentist risk
R(f, f̂ₙ) = E[L(f, f̂ₙ)] = ∫ b²(x) dx + ∫ v(x) dx
b(x) = E[f̂ₙ(x)] − f(x),  v(x) = V[f̂ₙ(x)]

19.1.1 Histograms

Definitions
• Number of bins m; binwidth h = 1/m (data on [0, 1])
• Bin Bⱼ has νⱼ observations
• Define p̂ⱼ = νⱼ/n and pⱼ = ∫_{Bⱼ} f(u) du

Histogram estimator
f̂ₙ(x) = Σⱼ₌₁ᵐ (p̂ⱼ/h) I(x ∈ Bⱼ)

E[f̂ₙ(x)] = pⱼ/h
V[f̂ₙ(x)] = pⱼ(1 − pⱼ)/(nh²)

R(f̂ₙ, f) ≈ (h²/12) ∫ (f′(u))² du + 1/(nh)

h⋆ = n^{−1/3} (6/∫ (f′(u))² du)^{1/3}

R⋆(f̂ₙ, f) ≈ C/n^{2/3},  where C = (3/4)^{2/3} (∫ (f′(u))² du)^{1/3}

Cross-validation estimate of E[J(h)]
Ĵ_CV(h) = ∫ f̂ₙ²(x) dx − (2/n) Σᵢ₌₁ⁿ f̂₍ᵢ₎(Xᵢ) = 2/((n − 1)h) − ((n + 1)/((n − 1)h)) Σⱼ₌₁ᵐ p̂ⱼ²
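A sketch of the histogram estimator f̂ₙ(x) = Σⱼ (p̂ⱼ/h) I(x ∈ Bⱼ) (illustrative; the Beta(2, 5) data and the bin count m = 20 are arbitrary):

```python
# Histogram density estimate on [0, 1] with m equal-width bins.
import numpy as np

rng = np.random.default_rng(8)
x = rng.beta(2, 5, size=1000)       # data supported on [0, 1]

m = 20
h = 1 / m
counts, edges = np.histogram(x, bins=m, range=(0.0, 1.0))
p_hat = counts / x.size             # p̂_j = ν_j / n

def f_hat(t):
    j = min(int(t / h), m - 1)      # bin index of t
    return p_hat[j] / h

print(f"f_hat(0.25) = {f_hat(0.25):.3f}  (Beta(2,5) density at 0.25 is about 2.37)")
```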
19.1.2 Kernel Density Estimator (KDE)

Kernel K
• K(x) ≥ 0
• ∫ K(x) dx = 1
• ∫ x K(x) dx = 0
• ∫ x² K(x) dx ≡ σ²_K > 0

KDE
f̂ₙ(x) = (1/n) Σᵢ₌₁ⁿ (1/h) K((x − Xᵢ)/h)

R(f, f̂ₙ) ≈ (1/4)(hσ_K)⁴ ∫ (f″(x))² dx + (1/(nh)) ∫ K²(x) dx

h⋆ = c₁^{−2/5} c₂^{1/5} c₃^{−1/5} n^{−1/5},
where c₁ = σ²_K, c₂ = ∫ K²(x) dx, c₃ = ∫ (f″(x))² dx

R⋆(f, f̂ₙ) = c₄/n^{4/5},  where c₄ = (5/4)(σ²_K)^{2/5} (∫ K²(x) dx)^{4/5} (∫ (f″)² dx)^{1/5}

Epanechnikov kernel
K(x) = (3/(4√5))(1 − x²/5) for |x| < √5;  0 otherwise

Cross-validation estimate of E[J(h)]
Ĵ_CV(h) = ∫ f̂ₙ²(x) dx − (2/n) Σᵢ f̂₍ᵢ₎(Xᵢ) ≈ (1/(hn²)) Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ K⋆((Xᵢ − Xⱼ)/h) + (2/(nh)) K(0)
where K⋆(x) = K⁽²⁾(x) − 2K(x) and K⁽²⁾(x) = ∫ K(x − y)K(y) dy

19.2 Non-parametric Regression

Estimate r(x) = E[Y | X = x].

k-nearest-neighbor estimator
r̂(x) = (1/k) Σ_{i : xᵢ ∈ N_k(x)} Yᵢ,  where N_k(x) contains the k values of {x₁, …, xₙ} closest to x

Nadaraya-Watson kernel estimator
r̂(x) = Σᵢ₌₁ⁿ wᵢ(x) Yᵢ
wᵢ(x) = K((x − xᵢ)/h)/Σⱼ₌₁ⁿ K((x − xⱼ)/h)

R(r̂ₙ, r) ≈ (h⁴/4)(∫ x²K(x) dx)² ∫ (r″(x) + 2r′(x) f′(x)/f(x))² dx
           + (σ² ∫ K²(x) dx/(nh)) ∫ dx/f(x)

The optimal bandwidth scales as h⋆ ≈ c₁/n^{1/5}, giving R⋆(r̂ₙ, r) ≈ c₂/n^{4/5}.

Cross-validation estimate of E[J(h)]
Ĵ_CV(h) = Σᵢ₌₁ⁿ (Yᵢ − r̂₍ᵢ₎(xᵢ))² = Σᵢ₌₁ⁿ ((Yᵢ − r̂(xᵢ))/(1 − K(0)/Σⱼ K((xᵢ − xⱼ)/h)))²

19.3 Smoothing Using Orthogonal Functions

Approximation
r(x) = Σⱼ₌₁^∞ βⱼφⱼ(x) ≈ Σⱼ₌₁ᴶ βⱼφⱼ(x)

In matrix form, Y = Φβ + η, where

Φ = ⎡ φ₀(x₁) ⋯ φ_J(x₁) ⎤
    ⎢   ⋮    ⋱    ⋮    ⎥
    ⎣ φ₀(xₙ) ⋯ φ_J(xₙ) ⎦

Least squares estimator
β̂ = (ΦᵀΦ)^{−1}ΦᵀY ≈ (1/n) ΦᵀY  (for equally spaced observations only)

Cross-validation
R̂_CV(J) = Σᵢ₌₁ⁿ (Yᵢ − Σⱼ₌₁ᴶ φⱼ(xᵢ) β̂ⱼ,₍ᵢ₎)²
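A minimal Gaussian-kernel KDE implementing f̂ₙ(x) = (1/n) Σᵢ K((x − Xᵢ)/h)/h (illustrative; the data, seed, and the simple n^{−1/5} bandwidth scaling are choices, not the sheet's):

```python
# Kernel density estimate with a Gaussian kernel.
import numpy as np

def kde(x, data, h):
    """Evaluate the KDE at points x."""
    u = (np.asarray(x)[..., None] - data) / h            # shape (..., n)
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)         # Gaussian kernel
    return K.mean(axis=-1) / h

rng = np.random.default_rng(9)
data = rng.normal(0, 1, size=500)
h = data.std(ddof=1) * data.size ** (-1 / 5)             # h ~ n^(-1/5)
print(f"f_hat(0) = {kde(0.0, data, h):.3f}  (N(0,1) density at 0 is about 0.399)")
```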
20 Stochastic Processes

Stochastic process: {X_t : t ∈ T}, written X_t or X(t), with state space 𝒳 and index set
T = {0, 1, …} (discrete)  or  T = [0, ∞) (continuous)

20.1 Markov Chains

Markov chain: ℙ[Xₙ = x | X₀, …, Xₙ₋₁] = ℙ[Xₙ = x | Xₙ₋₁] for all n ∈ T, x ∈ 𝒳

Transition probabilities
p_ij ≡ ℙ[Xₙ₊₁ = j | Xₙ = i]
n-step: p_ij(n) ≡ ℙ[X_{m+n} = j | X_m = i]

Transition matrix P with entries p_ij; each row sums to one: Σⱼ p_ij = 1.

Chapman-Kolmogorov
p_ij(m + n) = Σ_k p_ik(m) p_kj(n)
P_{m+n} = P_m P_n,  P_n = P × ⋯ × P = Pⁿ

Marginal probability
μₙ = (μₙ(1), …, μₙ(N)),  where μₙ(i) = ℙ[Xₙ = i]
μ₀ is the initial distribution, and μₙ = μ₀Pⁿ.

20.2 Poisson Processes

Poisson process: {X_t : t ∈ [0, ∞)} counts the number of events up to and including time t, with
• X₀ = 0
• Independent increments
• Intensity function λ(t), with X_t ~ Po(m(t)), where m(t) = ∫₀ᵗ λ(s) ds
• Homogeneous Poisson process: λ(t) ≡ λ > 0 ⟹ X_t ~ Po(λt)

Waiting times
W_t := time at which X_t occurs;  W_t ~ Gamma(t, 1/λ)

Interarrival times
S_t = W_{t+1} − W_t;  S_t ~ Exp(1/λ)
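Chapman-Kolmogorov in code (a toy two-state chain; the transition matrix and start state are made up): n-step transition probabilities are powers of P, and marginals evolve as μₙ = μ₀Pⁿ.

```python
# n-step transition probabilities and marginal evolution of a Markov chain.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # rows sum to 1
mu0 = np.array([1.0, 0.0])       # start in state 0

P10 = np.linalg.matrix_power(P, 10)
print("10-step transition matrix:\n", P10)
print("mu_10 =", mu0 @ P10)      # approaches the stationary distribution (0.8, 0.2)
```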
21 Time Series

Mean function
μ_{x,t} = E[x_t] = ∫ x f_t(x) dx

Autocovariance function
γ_x(s, t) = E[(x_s − μ_s)(x_t − μ_t)] = E[x_s x_t] − μ_s μ_t
γ_x(t, t) = E[(x_t − μ_t)²] = V[x_t]

Autocorrelation function (ACF)
ρ(s, t) = Cov[x_s, x_t]/√(V[x_s] V[x_t]) = γ(s, t)/√(γ(s, s) γ(t, t))

Cross-covariance function (CCV)
γ_{xy}(s, t) = E[(x_s − μ_{x,s})(y_t − μ_{y,t})]

Cross-correlation function (CCF)
ρ_{xy}(s, t) = γ_{xy}(s, t)/√(γ_x(s, s) γ_y(t, t))

Backshift operator: Bᵏ(x_t) = x_{t−k}
Difference operator: ∇ᵈ = (1 − B)ᵈ

White noise
• w_t ~ wn(0, σ_w²)
• Gaussian: w_t iid ~ N(0, σ_w²)
• E[w_t] = 0 and V[w_t] = σ_w² for all t;  γ_w(s, t) = 0 for s ≠ t

Random walk
• Drift δ:  x_t = δt + Σⱼ₌₁ᵗ wⱼ
• E[x_t] = δt

Symmetric moving average
m_t = Σⱼ₌₋ₖᵏ aⱼ x_{t−j},  where aⱼ = a₋ⱼ ≥ 0 and Σⱼ₌₋ₖᵏ aⱼ = 1

Linear process
x_t = μ + Σⱼ₌₋∞^∞ ψⱼ w_{t−j},  where Σⱼ₌₋∞^∞ |ψⱼ| < ∞
γ(h) = σ_w² Σⱼ₌₋∞^∞ ψ_{j+h} ψⱼ

21.1 Stationary Time Series

Strictly stationary: (x_{t₁}, …, x_{t_k}) ≐ (x_{t₁+h}, …, x_{t_k+h}) for all k ∈ ℕ and t_k, c_k, h ∈ ℤ

Weakly stationary
• E[x_t²] < ∞ for all t ∈ ℤ
• E[x_t] = m for all t ∈ ℤ
• γ_x(s, t) = γ_x(s + r, t + r) for all r, s, t ∈ ℤ

When x_t is stationary:
Autocovariance function: γ(h) = Cov[x_{t+h}, x_t]
ACF: ρ_x(h) = γ(t + h, t)/√(γ(t + h, t + h) γ(t, t)) = γ(h)/γ(0)
CCF: ρ_{xy}(h) = γ_{xy}(h)/√(γ_x(0) γ_y(0))
21.2 Estimation of Correlation

Sample mean
x̄ = (1/n) Σ_{t=1}^n x_t,  with V[x̄] = (1/n) Σ_{h=−n}^{n} (1 − |h|/n) γ_x(h)

Sample autocovariance function
γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)

Sample autocorrelation function
ρ̂(h) = γ̂(h)/γ̂(0)

Sample cross-covariance function
γ̂_{xy}(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(y_t − ȳ)

Sample cross-correlation function
ρ̂_{xy}(h) = γ̂_{xy}(h)/√(γ̂_x(0) γ̂_y(0))

Properties
• σ_{ρ̂_x(h)} = 1/√n if x_t is white noise
• σ_{ρ̂_{xy}(h)} = 1/√n if x_t or y_t is white noise

21.3 Non-Stationary Time Series

Classical decomposition model
x_t = μ_t + s_t + w_t
• μ_t = trend
• s_t = seasonal component
• w_t = random noise term

21.3.1 Detrending

Least squares
Fit a parametric trend (e.g., μ_t = β₀ + β₁t) by least squares and work with the residuals.

Moving average
• The moving average smoother v_t = (1/(2k + 1)) Σᵢ₌₋ₖᵏ x_{t−i} estimates the trend.
• If (1/(2k + 1)) Σᵢ₌₋ₖᵏ w_{t−j} ≈ 0, a linear trend function μ_t = β₀ + β₁t passes without distortion.

Differencing
• μ_t = β₀ + β₁t ⟹ ∇x_t is stationary (∇μ_t = β₁)

21.4 ARIMA Models

Autoregressive polynomial
φ(z) = 1 − φ₁z − ⋯ − φ_p z^p,  z ∈ ℂ and φ_p ≠ 0

Autoregressive operator
φ(B) = 1 − φ₁B − ⋯ − φ_p B^p

AR(p): x_t = φ₁x_{t−1} + ⋯ + φ_p x_{t−p} + w_t,  i.e., φ(B)x_t = w_t

AR(1)
• x_t = φx_{t−1} + w_t = Σⱼ₌₀^{k−1} φʲ w_{t−j} + φᵏ x_{t−k}  →(k→∞, |φ|<1)→  Σⱼ₌₀^∞ φʲ w_{t−j}  (linear process)
• E[x_t] = Σⱼ₌₀^∞ φʲ E[w_{t−j}] = 0
• γ(h) = σ_w² φʰ/(1 − φ²)
• ρ(h) = γ(h)/γ(0) = φʰ,  and ρ(h) = φ ρ(h − 1) for h = 1, 2, …

Moving average polynomial
θ(z) = 1 + θ₁z + ⋯ + θ_q z^q,  z ∈ ℂ and θ_q ≠ 0

Moving average operator
θ(B) = 1 + θ₁B + ⋯ + θ_q B^q

MA(q): x_t = w_t + θ₁w_{t−1} + ⋯ + θ_q w_{t−q} = θ(B)w_t
• E[x_t] = Σⱼ₌₀^q θⱼ E[w_{t−j}] = 0
• γ(h) = σ_w² Σⱼ₌₀^{q−h} θⱼθ_{j+h} for 0 ≤ h ≤ q;  0 for h > q

MA(1)
x_t = w_t + θw_{t−1}
γ(h) = (1 + θ²)σ_w² (h = 0);  θσ_w² (h = 1);  0 (h > 1)
ρ(h) = θ/(1 + θ²) (h = 1);  0 (h > 1)

ARMA(p, q)
x_t = φ₁x_{t−1} + ⋯ + φ_p x_{t−p} + w_t + θ₁w_{t−1} + ⋯ + θ_q w_{t−q}
φ(B)x_t = θ(B)w_t

ARIMA(p, d, q)
∇ᵈx_t = (1 − B)ᵈx_t is ARMA(p, q)

Seasonal ARIMA
Seasonal behavior is captured with additional autoregressive and moving-average polynomials in the seasonal lag operator Bˢ.

Exponentially weighted moving average (EWMA)
x_t = x_{t−1} + w_t − λw_{t−1}
When |λ| < 1:  x_t = Σⱼ₌₁^∞ (1 − λ)λ^{j−1} x_{t−j} + w_t,  and the one-step-ahead forecast is
x̃_{n+1} = (1 − λ)xₙ + λx̃ₙ

21.4.1 Causality and Invertibility

• ARMA(p, q) is causal (future-independent) ⟺ the roots of φ(z) lie outside the unit circle:
  x_t = Σⱼ₌₀^∞ ψⱼ w_{t−j} = ψ(B)w_t,  where ψ(z) = Σⱼ₌₀^∞ ψⱼzʲ = θ(z)/φ(z), |z| ≤ 1
• ARMA(p, q) is invertible ⟺ the roots of θ(z) lie outside the unit circle:
  π(B)x_t = Σⱼ₌₀^∞ πⱼ x_{t−j} = w_t,  where π(z) = Σⱼ₌₀^∞ πⱼzʲ = φ(z)/θ(z), |z| ≤ 1

Partial autocorrelation function (PACF)
• x_i^{h−1}: regression of x_i on {x_{h−1}, x_{h−2}, …, x₁}
• φ_{hh} = corr(x_h − x_h^{h−1}, x₀ − x₀^{h−1}) for h ≥ 2
• E.g., φ₁₁ = corr(x₁, x₀) = ρ(1)

Behavior of the ACF and PACF for causal and invertible ARMA models:

             ACF                    PACF
AR(p)        tails off              cuts off after lag p
MA(q)        cuts off after lag q   tails off
ARMA(p, q)   tails off              tails off
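A check of the sample ACF definitions in 21.2 against the AR(1) theory ρ(h) = φʰ (illustrative simulation; n, φ, and the seed are arbitrary):

```python
# Sample autocorrelation of a simulated AR(1) series.
import numpy as np

rng = np.random.default_rng(10)
n, phi = 5000, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

def acf(x, h):
    xbar = x.mean()
    c0 = ((x - xbar) ** 2).mean()                          # gamma_hat(0)
    ch = ((x[h:] - xbar) * (x[:-h] - xbar)).sum() / x.size  # gamma_hat(h)
    return ch / c0

for h in (1, 2, 3):
    print(f"rho_hat({h}) = {acf(x, h):.3f}  theory phi^{h} = {phi**h:.3f}")
```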
21.5 Spectral Analysis

Periodic process
x_t = A cos(2πωt + φ) = U₁ cos(2πωt) + U₂ sin(2πωt)
• Frequency index ω (cycles per unit time)
• Amplitude A, phase φ
• U₁ = A cos φ and U₂ = A sin φ, often normally distributed rvs

If V[U₁] = V[U₂] = σ²:
γ(h) = σ² cos(2πω₀h) = (σ²/2) e^{−2πiω₀h} + (σ²/2) e^{2πiω₀h} = ∫_{−1/2}^{1/2} e^{2πiωh} dF(ω)

Spectral distribution function
F(ω) = 0 (ω < −ω₀);  σ²/2 (−ω₀ ≤ ω < ω₀);  σ² (ω ≥ ω₀)
F(−∞) = F(−1/2) = 0,  F(∞) = F(1/2) = γ(0)

Periodic mixture
x_t = Σ_{k=1}^q (U_{k1} cos(2πω_k t) + U_{k2} sin(2πω_k t))

Spectral density
f(ω) = Σ_{h=−∞}^∞ γ(h) e^{−2πiωh},  −1/2 ≤ ω ≤ 1/2
• Needs Σ_h |γ(h)| < ∞ ⟹ γ(h) = ∫_{−1/2}^{1/2} e^{2πiωh} f(ω) dω,  h = 0, ±1, …
• f(ω) ≥ 0,  f(ω) = f(−ω),  f(ω) = f(1 − ω)
• γ(0) = V[x_t] = ∫_{−1/2}^{1/2} f(ω) dω
• White noise: f_w(ω) = σ_w²
• ARMA(p, q), φ(B)x_t = θ(B)w_t:
  f_x(ω) = σ_w² |θ(e^{−2πiω})|²/|φ(e^{−2πiω})|²
  where φ(z) = 1 − Σ_{k=1}^p φ_k z^k and θ(z) = 1 + Σ_{k=1}^q θ_k z^k

Discrete Fourier transform (DFT)
d(ωⱼ) = n^{−1/2} Σ_{t=1}^n x_t e^{−2πiωⱼt}
Fourier/fundamental frequencies: ωⱼ = j/n
Inverse DFT: x_t = n^{−1/2} Σ_{j=0}^{n−1} d(ωⱼ) e^{2πiωⱼt}

Periodogram
I(j/n) = |d(j/n)|²

Scaled periodogram
P(j/n) = (4/n) I(j/n) = ((2/n) Σ_{t=1}^n x_t cos(2πtj/n))² + ((2/n) Σ_{t=1}^n x_t sin(2πtj/n))²

22 Math

22.1 Gamma Function

• Ordinary: Γ(s) = ∫₀^∞ t^{s−1} e^{−t} dt
• Upper incomplete: Γ(s, x) = ∫ₓ^∞ t^{s−1} e^{−t} dt
• Lower incomplete: γ(s, x) = ∫₀ˣ t^{s−1} e^{−t} dt
• Γ(α + 1) = αΓ(α)  (α > 0)
• Γ(n) = (n − 1)!  (n ∈ ℕ⁺)
• Γ(1/2) = √π

22.2 Beta Function

• Ordinary: B(x, y) = B(y, x) = ∫₀¹ t^{x−1}(1 − t)^{y−1} dt = Γ(x)Γ(y)/Γ(x + y)
• Incomplete: B(x; a, b) = ∫₀ˣ t^{a−1}(1 − t)^{b−1} dt
• Regularized incomplete:
  I_x(a, b) = B(x; a, b)/B(a, b) = Σ_{j=a}^{a+b−1} ((a + b − 1)!/(j!(a + b − 1 − j)!)) xʲ(1 − x)^{a+b−1−j}  (a, b ∈ ℕ)
• I₀(a, b) = 0,  I₁(a, b) = 1
• I_x(a, b) = 1 − I_{1−x}(b, a)
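Spot-checking a few of the Gamma/Beta identities above numerically (not from the sheet; assumes SciPy, where `betainc` is the regularized incomplete beta function I_x):

```python
# Verify Gamma/Beta identities with scipy.special.
from math import factorial, pi, sqrt
from scipy import special

assert abs(special.gamma(5) - factorial(4)) < 1e-9          # Gamma(n) = (n-1)!
assert abs(special.gamma(0.5) - sqrt(pi)) < 1e-12           # Gamma(1/2) = sqrt(pi)
assert abs(special.beta(2.0, 3.0)
           - special.gamma(2.0) * special.gamma(3.0) / special.gamma(5.0)) < 1e-12
assert abs(special.betainc(2.0, 3.0, 0.4)
           - (1 - special.betainc(3.0, 2.0, 0.6))) < 1e-12  # I_x(a,b) = 1 - I_{1-x}(b,a)
print("identities verified")
```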
22.3 Series

Finite
• Σ_{k=1}^n k = n(n + 1)/2
• Σ_{k=1}^n (2k − 1) = n²
• Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6
• Σ_{k=1}^n k³ = (n(n + 1)/2)²
• Σ_{k=0}^n cᵏ = (c^{n+1} − 1)/(c − 1),  c ≠ 1

Binomial
• Σ_{k=0}^n C(n, k) = 2ⁿ
• Σ_{k=0}^n C(r + k, k) = C(r + n + 1, n)
• Σ_{k=0}^n C(k, m) = C(n + 1, m + 1)
• Vandermonde's identity: Σ_{k=0}^r C(m, k) C(n, r − k) = C(m + n, r)
• Binomial theorem: Σ_{k=0}^n C(n, k) a^{n−k} bᵏ = (a + b)ⁿ

Infinite
• Σ_{k=0}^∞ pᵏ = 1/(1 − p),  Σ_{k=1}^∞ pᵏ = p/(1 − p),  |p| < 1
• Σ_{k=0}^∞ k p^{k−1} = (d/dp) Σ_{k=0}^∞ pᵏ = (d/dp)(1/(1 − p)) = 1/(1 − p)²,  |p| < 1
• Σ_{k=0}^∞ C(r + k − 1, k) xᵏ = (1 − x)^{−r},  r ∈ ℕ⁺
• Σ_{k=0}^∞ C(α, k) pᵏ = (1 + p)^α,  |p| < 1, α ∈ ℂ

22.4 Combinatorics

Sampling: draw k out of n

                   ordered                             unordered
w/o replacement    n!/(n − k)! = Πᵢ₌₀^{k−1}(n − i)     C(n, k) = (n!/(n−k)!)/k! = n!/(k!(n − k)!)
w/ replacement     nᵏ                                  C(n − 1 + k, k) = C(n − 1 + k, n − 1)

Binomial coefficient conventions: C(n, 0) = 1 if n = 0? No — C(n, k) = 0 for k > n, and C(0, 0) = 1.

Partitions
P_{n,k}: number of partitions of n into k parts.
• P_{n+k,k} = Σᵢ₌₁ᵏ P_{n,i}
• k > n ⟹ P_{n,k} = 0;  P_{n,0} = 0 for n ≥ 1;  P_{0,0} = 1

Balls and Urns (f : B → U, |B| = n, |U| = m; D = distinguishable, ¬D = indistinguishable; S(n, m) denotes the Stirling numbers of the second kind):

                   f arbitrary           f injective          f surjective       f bijective
B: D,  U: D        mⁿ                    m!/(m − n)! (m ≥ n)  m! S(n, m)         n! (m = n)
B: ¬D, U: D        C(m + n − 1, n)       C(m, n)              C(n − 1, m − 1)    1 (m = n)
B: D,  U: ¬D       Σ_{k=1}^m S(n, k)     1 (m ≥ n)            S(n, m)            1 (m = n)
B: ¬D, U: ¬D       Σ_{k=1}^m P_{n,k}     1 (m ≥ n)            P_{n,m}            1 (m = n)

(entries with a parenthesized condition are 0 when the condition fails)

References

[3] R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications: With R Examples. Springer, 2006.