Lec 16

Statistics 104
Colin Rundel
March 19, 2012

Practice exams:
2005 - http://stat.duke.edu/courses/Fall10/sta104/exams/104f05t2b.pdf
2008 - http://stat.duke.edu/courses/Fall10/sta104/exams/104f08t2.pdf
Review - Functions of Random Variables

Properties of covariance:

Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)

Review - Joint Distributions of Discrete RVs

Two socks are drawn from a drawer of 12 socks: 6 black, 4 white, and 2 of another color. Let B be the number of black socks and W the number of white socks drawn:

    k          0        1        2
    P(B = k)   15/66    36/66    15/66
    P(W = k)   28/66    32/66    6/66

Note - B ~ HyperGeo(12, 6, 2) and W ~ HyperGeo(12, 4, 2):

P(B = k) = C(6,k) C(6,2-k) / C(12,2)
P(W = k) = C(4,k) C(8,2-k) / C(12,2)
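As a quick check (not part of the original slides), the marginal probabilities can be reproduced with Python's math.comb; the sock counts are those of the example above:

```python
from math import comb

# P(B = k) for B ~ HyperGeo(12, 6, 2): choose k of the 6 black socks
# and 2-k of the remaining 6, out of C(12, 2) = 66 equally likely draws
p_B = [comb(6, k) * comb(6, 2 - k) / comb(12, 2) for k in range(3)]

# P(W = k) for W ~ HyperGeo(12, 4, 2): choose k of the 4 white socks
# and 2-k of the remaining 8
p_W = [comb(4, k) * comb(8, 2 - k) / comb(12, 2) for k in range(3)]
```

Both lists recover the table values 15/66, 36/66, 15/66 and 28/66, 32/66, 6/66.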
Review - Joint Distributions of Discrete RVs

Let B be the number of black socks and W the number of white socks drawn. The joint distribution of B and W is given by:

                      W
              0        1        2
        0     1/66     8/66     6/66    | 15/66
    B   1     12/66    24/66    0       | 36/66
        2     15/66    0        0       | 15/66
              28/66    32/66    6/66    | 66/66

P(B = b, W = w) = C(6,b) C(4,w) C(2,2-b-w) / C(12,2)

Note that the row and column sums are the distributions of B and W respectively:

P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)
P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)

These are the marginal distributions of B and W. In general,

P(X = x) = Σ_y P(X = x, Y = y) = Σ_y P(X = x | Y = y) P(Y = y)
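A minimal sketch (illustrative, not from the slides) of computing the marginals as row and column sums of the joint pmf:

```python
from math import comb

def p_joint(b, w):
    # P(B = b, W = w): choose b of 6 black, w of 4 white,
    # and 2-b-w of the 2 other socks, out of C(12, 2) draws
    return comb(6, b) * comb(4, w) * comb(2, 2 - b - w) / comb(12, 2)

# marginal of B: sum each row over w; marginal of W: sum each column over b
p_B = [sum(p_joint(b, w) for w in range(3 - b)) for b in range(3)]
p_W = [sum(p_joint(b, w) for b in range(3 - w)) for w in range(3)]
```

Summing out the other variable recovers exactly the hypergeometric marginals from the previous table.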
Review - Joint Distributions of Discrete RVs

Remember that Cov(X, Y) = 0 when X and Y are independent.

Cov(B, W) = E[(B - E[B])(W - E[W])]
          = E(BW) - E(B)E(W)
          = 4/11 - (1)(2/3) = -10/33 = -0.30303

(Here E(B) = 1, E(W) = 2/3, and E(BW) = 1 · 1 · 24/66 = 4/11, since BW is nonzero only when B = W = 1.)

A conditional distribution works like any other distribution:

E(X | Y = y) = Σ_x x P(X = x | Y = y)

Therefore we can calculate things like the conditional mean and variance:

E(W | B = 0)   = 0 · 1/15 + 1 · 8/15 + 2 · 6/15 = 20/15 = 4/3 ≈ 1.333
E(W^2 | B = 0) = 0^2 · 1/15 + 1^2 · 8/15 + 2^2 · 6/15 = 32/15 ≈ 2.1333
Var(W | B = 0) = E(W^2 | B = 0) - E(W | B = 0)^2
               = 32/15 - (4/3)^2 = 16/45 ≈ 0.3556
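The covariance and the conditional moments can be verified from the joint pmf; a sketch in Python (illustrative, not part of the slides):

```python
from math import comb, isclose

def p_joint(b, w):
    # joint pmf of black and white socks drawn (2 of 12 socks)
    return comb(6, b) * comb(4, w) * comb(2, 2 - b - w) / comb(12, 2)

pairs = [(b, w) for b in range(3) for w in range(3) if b + w <= 2]

EB  = sum(p_joint(b, w) * b for b, w in pairs)       # 1
EW  = sum(p_joint(b, w) * w for b, w in pairs)       # 2/3
EBW = sum(p_joint(b, w) * b * w for b, w in pairs)   # 4/11
cov = EBW - EB * EW                                  # -10/33

# conditional distribution of W given B = 0
pB0  = sum(p_joint(0, w) for w in range(3))          # 15/66
cond = [p_joint(0, w) / pB0 for w in range(3)]       # 1/15, 8/15, 6/15
EW_B0   = sum(w * p for w, p in enumerate(cond))
VarW_B0 = sum(w * w * p for w, p in enumerate(cond)) - EW_B0 ** 2
```

The computed values match the hand calculations: Cov(B, W) = -10/33 and Var(W | B = 0) = 16/45.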
Review - Discrete Distributions

Let X1, X2, ..., Xk be the random variables counting the number of outcomes belonging to each of the k categories in n trials, where the probability of an outcome in category i is pi. Then (X1, ..., Xk) ~ Multinom(n, p1, ..., pk) with

P(X1 = x1, ..., Xk = xk) = f(x1, ..., xk | n, p1, ..., pk)
                         = n! / (x1! ··· xk!) · p1^x1 ··· pk^xk

where Σ_{i=1}^{k} xi = n and Σ_{i=1}^{k} pi = 1.

E(Xi) = n pi
Var(Xi) = n pi (1 - pi)
Cov(Xi, Xj) = -n pi pj

Review - LLN and CLT

Markov's Inequality: for any random variable X ≥ 0 and constant a > 0,

P(X ≥ a) ≤ E(X) / a

Chebyshev's Inequality: for any random variable X with finite variance and constant a > 0,

P(|X - E(X)| ≥ a) ≤ Var(X) / a^2
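The multinomial moments can be checked by exact enumeration of a small case; the parameters n = 3 and p = (0.5, 0.3, 0.2) below are arbitrary choices for illustration:

```python
from itertools import product
from math import factorial, isclose

def multinom_pmf(xs, n, ps):
    # n! / (x1! ... xk!) * p1^x1 ... pk^xk
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)
    p = float(coef)
    for x, pi in zip(xs, ps):
        p *= pi ** x
    return p

n, ps = 3, (0.5, 0.3, 0.2)
outcomes = [xs for xs in product(range(n + 1), repeat=3) if sum(xs) == n]

total = sum(multinom_pmf(xs, n, ps) for xs in outcomes)  # should be 1

# exact moments by enumeration over the whole support
E = [sum(multinom_pmf(xs, n, ps) * xs[i] for xs in outcomes) for i in range(3)]
cov01 = sum(multinom_pmf(xs, n, ps) * (xs[0] - E[0]) * (xs[1] - E[1])
            for xs in outcomes)
```

Enumeration recovers E(Xi) = n pi and Cov(X1, X2) = -n p1 p2 exactly, not just approximately.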
Review - Moments of Distributions

The moment generating function of a random variable X is defined for all real values of t by

MX(t) = E[e^{tX}] = Σ_x e^{tx} P(X = x)   if X is discrete
MX(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx      if X is continuous

This is called the moment generating function because we can obtain the raw moments of X by successively differentiating MX(t) and evaluating at t = 0:

MX(0) = E[e^0] = 1 = µ'_0

M'X(t) = (d/dt) E[e^{tX}] = E[(d/dt) e^{tX}] = E[X e^{tX}],  so  M'X(0) = E[X] = µ'_1

M''X(t) = (d/dt) M'X(t) = (d/dt) E[X e^{tX}] = E[(d/dt) X e^{tX}] = E[X^2 e^{tX}]

M''X(0) = E[X^2 e^0] = E[X^2] = µ'_2

If X and Y are independent random variables then the moment generating function for the distribution of X + Y is

MX+Y(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = MX(t) MY(t)

Similarly, the moment generating function for Sn, the sum of iid random variables X1, X2, ..., Xn, is

MSn(t) = MX1(t) ··· MXn(t) = [MX(t)]^n
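The moment-derivative relationship can be checked numerically; the sketch below uses a fair die as the distribution (an arbitrary choice) and central finite differences to approximate M'(0) and M''(0):

```python
from math import exp

def M(t):
    # MGF of a single fair die roll: E[e^{tX}], X uniform on {1, ..., 6}
    return sum(exp(t * k) for k in range(1, 7)) / 6

h = 1e-4
M1 = (M(h) - M(-h)) / (2 * h)              # central difference ~ M'(0)
M2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2    # central difference ~ M''(0)

EX  = sum(k for k in range(1, 7)) / 6      # E[X]   = 3.5
EX2 = sum(k * k for k in range(1, 7)) / 6  # E[X^2] = 91/6
```

The finite-difference estimates agree with the exact first and second raw moments to several digits.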
We have seen a variety of problems where we find P(X ≤ x) or P(X > x), etc. The former is given a special name - the cumulative distribution function.

If X is discrete with probability mass function f(x) then

P(X ≤ x) = F(x) = Σ_{z=-∞}^{x} f(z)

If X is continuous with probability density function f(x) then

P(X ≤ x) = F(x) = ∫_{-∞}^{x} f(z) dz

A CDF is defined for all -∞ < x < ∞ and follows these rules:

lim_{x→-∞} F(x) = 0
lim_{x→∞} F(x) = 1
x < y ⇒ F(x) ≤ F(y)

For a continuous probability distribution

P(X = x) = 0 for all x

As such we define the probability density function to be

fX(x) = lim_{ε→0} P(X ∈ [x, x + ε]) / ε

A pdf is defined for all -∞ < x < ∞ and follows these rules:

∫_{-∞}^{∞} fX(x) dx = 1
∫_{-∞}^{x} fX(t) dt = FX(x)  ⇔  fX(x) = (d/dx) FX(x)
fX(x) ≥ 0 for all x
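The pdf-CDF relationship can be illustrated by numerically integrating a density; the example below uses an Exp(1.5) density (an arbitrary choice) and the trapezoid rule:

```python
from math import exp

lam = 1.5

def f(z):
    return lam * exp(-lam * z)   # Exp(1.5) density

def F_numeric(x, steps=10_000):
    # F(x) = integral of f from 0 (lower end of the support) to x,
    # approximated with the trapezoid rule
    h = x / steps
    total = (f(0) + f(x)) / 2 + sum(f(i * h) for i in range(1, steps))
    return total * h

x = 2.0
F_exact = 1 - exp(-lam * x)      # known closed-form exponential CDF
```

Integrating the density recovers the CDF to roughly the accuracy of the quadrature rule.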
Review - Hazard

We define the hazard rate for a distribution function F with density f to be

λ(x) = f(x) / (1 - F(x))

which we can use to uniquely identify a distribution:

∫_0^t λ(x) dx = ∫_0^t f(x) / (1 - F(x)) dx
              = ∫_0^t ((d/dx) F(x)) / (1 - F(x)) dx
              = -log(1 - F(t)) + log(1 - F(0))
              = -log(1 - F(t))

so that

1 - F(t) = exp(-∫_0^t λ(x) dx)

F(t) = 1 - exp(-∫_0^t λ(x) dx)

Review - Change of Variables

Monotonically increasing (non-decreasing) function:
x ≤ y ⇒ f(x) ≤ f(y)

Monotonically decreasing (non-increasing) function:
x ≤ y ⇒ f(x) ≥ f(y)

Strictly increasing function:
x < y ⇒ f(x) < f(y)

Strictly decreasing function:
x < y ⇒ f(x) > f(y)
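As a concrete illustration (not from the slides), the exponential distribution has constant hazard rate λ, and the identity F(t) = 1 - exp(-∫_0^t λ(x) dx) recovers its CDF:

```python
from math import exp, isclose

lam = 2.0   # arbitrary rate for illustration

def f(x):
    return lam * exp(-lam * x)   # Exp(2) density

def F(x):
    return 1 - exp(-lam * x)     # Exp(2) CDF

def hazard(x):
    return f(x) / (1 - F(x))     # f(x) / survival function

t = 1.3
# the hazard is constant, so the integral from 0 to t is just hazard * t
F_from_hazard = 1 - exp(-hazard(0.0) * t)
```

Recovering F from the hazard reproduces the exponential CDF exactly, and the constant hazard is one way to see the memoryless property.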
Let X be a random variable with density fX(x) on the range (a, b) and let Y = g(X), which has the range (g(a), g(b)). If g(x) is either strictly increasing or strictly decreasing on (a, b) then

fY(y) = fX(x) |dx/dy|,  where x = g^{-1}(y)

Let X1, X2, X3, X4, X5 be iid random variables with a distribution F with a range of (a, b). We can relabel these X's such that their labels correspond to arranging them in increasing order so that

X(1) ≤ X(2) ≤ X(3) ≤ X(4) ≤ X(5)

[Figure: the five draws X5, X1, X4, X2, X3 placed in increasing order along the interval (a, b) and relabeled X(1) through X(5).]

For X1, X2, ..., Xn iid random variables, X(k) is the kth smallest X, usually called the kth order statistic. For X1, X2, ..., Xn iid continuous random variables with pdf f and cdf F,

f_{X(k)}(x) = n! / ((k-1)! (n-k)!) · F(x)^{k-1} (1 - F(x))^{n-k} f(x)
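Order statistics are easy to explore by simulation; the sketch below draws Unif(0, 1) samples (the sample size, seed, and k are arbitrary choices) and compares the mean of the 2nd order statistic of n = 5 draws against its known value k/(n + 1):

```python
import random

random.seed(1)
n, k, trials = 5, 2, 20_000

# for each trial, sort n uniform draws and keep the kth smallest
samples = [sorted(random.random() for _ in range(n))[k - 1]
           for _ in range(trials)]
mean_k = sum(samples) / trials

# for Unif(0, 1), E[X_(k)] = k / (n + 1), here 2/6 = 1/3
expected = k / (n + 1)
```

With 20,000 trials the simulated mean lands within about one percent of 1/3.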
If X is a random variable with constant density on (a, b), then X is said to be uniformly distributed on (a, b), X ~ Unif(a, b), with

f(x) = 1/(b - a)   if a < x < b
     = 0           otherwise

F(x) = 0               if x < a
     = (x - a)/(b - a) if a < x < b
     = 1               if x > b

E(X) = (a + b)/2
Var(X) = (b - a)^2 / 12

If X is a random variable with a normal distribution with mean µ and variance σ^2, X ~ N(µ, σ^2), then

f(x) = 1/(√(2π) σ) · e^{-(x-µ)^2 / (2σ^2)}

F(x) = Φ((x - µ)/σ)

E(X) = µ
Var(X) = σ^2
Mode(X) = µ
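The standardization identity F(x) = Φ((x - µ)/σ) can be checked with the standard library's NormalDist; the parameter values below are arbitrary:

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0
X = NormalDist(mu, sigma)     # N(5, 4)
Phi = NormalDist(0, 1).cdf    # standard normal CDF

x = 6.3
lhs = X.cdf(x)                # F(x) for N(mu, sigma^2)
rhs = Phi((x - mu) / sigma)   # Phi of the standardized value
```

Both sides agree, which is why normal probabilities can always be looked up on a single standard normal table.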
Review - Continuous Distributions

Let X be a random variable that reflects the time between events which occur continuously with a given rate λ, X ~ Exp(λ):

f(x|λ) = λ e^{-λx}

P(X ≤ x) = F(x|λ) = 1 - e^{-λx}

MX(t) = (1 - t/λ)^{-1} = λ/(λ - t)

E(X) = λ^{-1}
E(X^n) = n!/λ^n
Var(X) = λ^{-2}
Median(X) = log 2 / λ

Memoryless property - P(X > s + t | X > s) = P(X > t)
Minimum of Exponentials - min(X1, ..., Xn) ~ Exp(λ1 + ··· + λn)

Based on the relationship for the nth raw moment, µ'_n, of an exponential distribution,

E(X^n) = n!/λ^n

let us set λ = 1 and define a new value α = n + 1:

E(X^{α-1}) = (α - 1)!

∫_0^∞ x^{α-1} e^{-x} dx = (α - 1)!

Γ(α) ≡ ∫_0^∞ x^{α-1} e^{-x} dx = (α - 1)!

Using the traditional definition of the factorial this only makes sense when n ∈ N, but we can use this new definition, the Gamma function Γ(α), for any α ∈ R+. It is commonly used with the Gamma, Beta, and negative binomial distributions to generalize a parameter to R+.
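Python's math.gamma implements Γ, so the factorial relationship (and the extension to non-integer arguments) can be checked directly:

```python
from math import gamma, factorial, sqrt, pi

# Gamma(alpha) = (alpha - 1)! for positive integer alpha
pairs = [(gamma(a), factorial(a - 1)) for a in range(1, 7)]

# ...but Gamma is also defined off the integers, e.g. Gamma(1/2) = sqrt(pi)
half = gamma(0.5)
```

The integer values match the factorials exactly, while Γ(1/2) = √π shows the definition extending beyond n ∈ N.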
Let X reflect the time until the nth event occurs when the events occur according to a Poisson process with rate λ, X ~ Er(n, λ):

f(x|n, λ) = e^{-λx} λ^n x^{n-1} / (n - 1)!

F(x|n, λ) = Σ_{j=n}^{∞} e^{-λx} (λx)^j / j!

MX(t) = (λ/(λ - t))^n

E(X) = n/λ
Var(X) = n/λ^2

We can generalize the Erlang distribution by using the gamma function instead of the factorial function, thereby allowing for R+ values of n. Often the distribution is reparameterized such that θ = 1/λ, X ~ Gamma(n, θ):

f(x|n, θ) = e^{-x/θ} x^{n-1} / (θ^n Γ(n))

F(x|n, θ) = ∫_0^x e^{-t/θ} t^{n-1} dt / (θ^n Γ(n)) = γ(n, x/θ) / Γ(n)

MX(t) = (1/(1 - θt))^n

E(X) = nθ
Var(X) = nθ^2
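With θ = 1/λ and integer n, the Gamma density reduces to the Erlang density; a quick numerical check (the parameter values are arbitrary):

```python
from math import exp, gamma, factorial

n, lam = 3, 2.0
theta = 1 / lam
x = 1.7

# Erlang density: e^{-lam x} lam^n x^{n-1} / (n-1)!
erlang_pdf = exp(-lam * x) * lam ** n * x ** (n - 1) / factorial(n - 1)

# Gamma density with theta = 1/lam: e^{-x/theta} x^{n-1} / (theta^n Gamma(n))
gamma_pdf = exp(-x / theta) * x ** (n - 1) / (theta ** n * gamma(n))
```

The two densities agree at every x, since Γ(n) = (n - 1)! for integer n and e^{-x/θ} = e^{-λx}.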
Review - Continuous Distributions

Review - Sample Problem