
Lecture 16: Midterm 2 Review

Statistics 104
Colin Rundel
March 19, 2012

Review: Practice Exams

Robert Wolpert's Previous Exams

2005 - http://stat.duke.edu/courses/Fall10/sta104/exams/104f05t2b.pdf
2008 - http://stat.duke.edu/courses/Fall10/sta104/exams/104f08t2.pdf

Review: Functions of Random Variables

Properties of Expected Value

E(c) = c
E(I_A) = P(A)
E[g(X)] = \sum_{\text{all } x} g(x) P(X = x)
E(cX) = c E(X)
E(X + Y) = E(X) + E(Y)
E(XY) = E(X) E(Y) if X and Y are independent.

Properties of Variance

Var(aX) = a^2 Var(X)
Var(X + c) = Var(X)
Var(c) = 0
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
Var\left(\sum_{i=1}^n c_i X_i\right) = \sum_{i=1}^n \sum_{j=1}^n Cov(c_i X_i, c_j X_j)
                                     = \sum_{i=1}^n c_i^2 Var(X_i) + \sum_{i=1}^n \sum_{j \ne i} c_i c_j Cov(X_i, X_j)
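As a quick sanity check, the linear-combination variance identity can be verified by simulation; a minimal sketch, assuming numpy is available (the seed, sample size, and constants are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(0, 1, size=n)
y = 0.5 * x + rng.normal(0, 1, size=n)   # deliberately correlated with x

a, b, c = 2.0, -3.0, 5.0
lhs = np.var(a * x + b * y + c)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)   # agree up to Monte Carlo error
```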
Review: Functions of Random Variables

Properties of Covariance

Cov(X, Y) = Cov(Y, X)
Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y
Cov(X, Y) = 0 if X and Y are independent
Cov(X, c) = 0
Cov(X, X) = Var(X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)

Review: Joint Distributions of Discrete RVs

Joint Distributions - Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let B be the number of black socks and W the number of white socks drawn; then the distributions of B and W are given by:

           k = 0                     k = 1                     k = 2
P(B = k)   (6/12)(5/11) = 15/66     2(6/12)(6/11) = 36/66     (6/12)(5/11) = 15/66
P(W = k)   (8/12)(7/11) = 28/66     2(4/12)(8/11) = 32/66     (4/12)(3/11) = 6/66

Note - B ~ HyperGeo(12, 6, 2) with P(B = k) = \binom{6}{k}\binom{6}{2-k} / \binom{12}{2}, and W ~ HyperGeo(12, 4, 2) with P(W = k) = \binom{4}{k}\binom{8}{2-k} / \binom{12}{2}.
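These pmfs can be checked against scipy's hypergeometric distribution; a sketch assuming scipy is installed (in scipy's convention M is the population size, n the number of "success" socks, and N the number drawn):

```python
from scipy.stats import hypergeom

B = hypergeom(M=12, n=6, N=2)   # 12 socks, 6 black, draw 2
W = hypergeom(M=12, n=4, N=2)   # 12 socks, 4 white, draw 2
print([round(66 * B.pmf(k)) for k in range(3)])   # [15, 36, 15] (in 66ths)
print([round(66 * W.pmf(k)) for k in range(3)])   # [28, 32, 6]
```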

Review: Joint Distributions of Discrete RVs

Joint Distributions - Example, cont.

Let B be the number of black socks and W the number of white socks drawn; then the joint distribution of B and W is given by:

                   W
            0       1       2
       0   1/66    8/66    6/66  | 15/66
  B    1  12/66   24/66     0    | 36/66
       2  15/66     0       0    | 15/66
          ------------------------------
          28/66   32/66    6/66  | 66/66

P(B = b, W = w) = \binom{6}{b} \binom{4}{w} \binom{2}{2-b-w} \Big/ \binom{12}{2}

Marginal Distribution

Note that the column and row sums are the distributions of B and W respectively:

P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)
P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)

These are the marginal distributions of B and W. In general,

P(X = x) = \sum_y P(X = x, Y = y) = \sum_y P(X = x | Y = y) P(Y = y)
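A short sketch (assuming numpy) that builds this joint table from the counting formula and recovers the marginals as row and column sums:

```python
import numpy as np
from math import comb

# P(B = b, W = w): choose b of 6 black, w of 4 white, 2-b-w of 2 purple
joint = np.array([[comb(6, b) * comb(4, w) * comb(2, 2 - b - w) if b + w <= 2 else 0
                   for w in range(3)] for b in range(3)]) / comb(12, 2)

print(joint * 66)              # the table above, in 66ths
print(joint.sum(axis=1) * 66)  # marginal of B: 15, 36, 15
print(joint.sum(axis=0) * 66)  # marginal of W: 28, 32, 6
```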
Review: Joint Distributions of Discrete RVs

Conditional Distribution

Conditional distributions are defined as we have seen previously:

P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{\text{joint pmf}}{\text{marginal pmf}}

Therefore the pmf for white socks given no black socks were drawn is

P(W = w | B = 0) = \frac{P(W = w, B = 0)}{P(B = 0)}
                 = (1/66) / (15/66) = 1/15   if w = 0
                 = (8/66) / (15/66) = 8/15   if w = 1
                 = (6/66) / (15/66) = 6/15   if w = 2

Expectation of Joint Distributions

E[g(X, Y)] = \sum_x \sum_y g(x, y) P(X = x, Y = y)

For example, if we define g(x, y) = x · y then

E(BW) = (0 · 0 · 1/66) + (0 · 1 · 8/66) + (0 · 2 · 6/66)
      + (1 · 0 · 12/66) + (1 · 1 · 24/66) + (1 · 2 · 0/66)
      + (2 · 0 · 15/66) + (2 · 1 · 0/66) + (2 · 2 · 0/66)
      = 24/66 = 4/11

Note that E(BW) ≠ E(B)E(W) since

E(B)E(W) = (0 · 15/66 + 1 · 36/66 + 2 · 15/66) × (0 · 28/66 + 1 · 32/66 + 2 · 6/66)
         = 66/66 × 44/66 = 2/3

This implies that B and W are not independent.
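The same numbers drop out of the joint table numerically; a self-contained sketch (assuming numpy) reusing the counting construction above:

```python
import numpy as np
from math import comb

joint = np.array([[comb(6, b) * comb(4, w) * comb(2, 2 - b - w) if b + w <= 2 else 0
                   for w in range(3)] for b in range(3)]) / comb(12, 2)

k = np.arange(3)
E_BW = sum(b * w * joint[b, w] for b in range(3) for w in range(3))
E_B, E_W = (k * joint.sum(axis=1)).sum(), (k * joint.sum(axis=0)).sum()
print(E_BW, E_B * E_W)   # 4/11 ~ 0.364 vs 2/3 ~ 0.667 -> not independent
```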

Review: Joint Distributions of Discrete RVs

Independence

Remember that Cov(X, Y) = 0 when X and Y are independent.

Cov(B, W) = E[(B - E[B])(W - E[W])]
          = E(BW) - E(B)E(W)
          = 4/11 - 2/3 = -10/33 = -0.30303

Expectation of Conditional Probability

Works like any other distribution:

E(X | Y = y) = \sum_x x P(X = x | Y = y)

Therefore we can calculate things like the conditional mean and variance:

E(W | B = 0) = 0 · 1/15 + 1 · 8/15 + 2 · 6/15 = 20/15 = 4/3 ≈ 1.333
E(W^2 | B = 0) = 0^2 · 1/15 + 1^2 · 8/15 + 2^2 · 6/15 = 32/15 ≈ 2.1333

Var(W | B = 0) = E(W^2 | B = 0) - E(W | B = 0)^2
               = 32/15 - (4/3)^2 = 16/45 ≈ 0.3556
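A two-line check of the conditional mean and variance (assuming numpy):

```python
import numpy as np

pmf = np.array([1, 8, 6]) / 15        # P(W = w | B = 0) for w = 0, 1, 2
w = np.arange(3)
mean = (w * pmf).sum()
print(mean, ((w - mean) ** 2 * pmf).sum())   # 4/3 ~ 1.333, 16/45 ~ 0.356
```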
Review: Discrete Distributions

Multinomial Distribution

Let X_1, X_2, \ldots, X_k be the random variables that count the number of outcomes belonging to each of k categories in n trials, with the probability of an outcome falling in category i being p_i; then (X_1, \ldots, X_k) ~ Multinom(n, p_1, \ldots, p_k) with

P(X_1 = x_1, \ldots, X_k = x_k) = f(x_1, \ldots, x_k | n, p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}

where \sum_{i=1}^k x_i = n and \sum_{i=1}^k p_i = 1.

E(X_i) = n p_i
Var(X_i) = n p_i (1 - p_i)
Cov(X_i, X_j) = -n p_i p_j   for i ≠ j

Review: LLN and CLT

Markov's and Chebyshev's Inequalities

For a random variable X ≥ 0 and constant a > 0,

Markov's Inequality:
P(X ≥ a) ≤ \frac{E(X)}{a}

For any random variable X with finite variance and constant a > 0,

Chebyshev's Inequality:
P(|X - E(X)| ≥ a) ≤ \frac{Var(X)}{a^2}
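A simulation sketch (assuming numpy; the parameters are arbitrary) of the multinomial moment formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, np.array([0.2, 0.3, 0.5])
draws = rng.multinomial(n, p, size=200_000)

print(draws.mean(axis=0), n * p)              # E(X_i) = n p_i
print(draws.var(axis=0), n * p * (1 - p))     # Var(X_i) = n p_i (1 - p_i)
print(np.cov(draws[:, 0], draws[:, 1])[0, 1], -n * p[0] * p[1])  # Cov(X_i, X_j) = -n p_i p_j
```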

Review: LLN and CLT

LLN and CLT

For iid X_i with mean \mu and variance \sigma^2, let S_n = \sum_{i=1}^n X_i and \bar{X}_n = S_n / n.

Law of Large Numbers:

\lim_{n \to \infty} \frac{S_n - n\mu}{n} = \lim_{n \to \infty} (\bar{X}_n - \mu) \to 0

Central Limit Theorem:

\lim_{n \to \infty} \frac{S_n - n\mu}{\sqrt{n}} = \lim_{n \to \infty} \sqrt{n}(\bar{X}_n - \mu) \overset{d}{\to} N(0, \sigma^2)

P\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = P\left(a \le \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le b\right) \approx \Phi(b) - \Phi(a)

Review: Moments of Distributions

Moments

Raw moment:
\mu'_n = E(X^n)

Central moment:
\mu_n = E[(X - \mu)^n]

Normalized / standardized moment:
\frac{\mu_n}{\sigma^n}
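The normal approximation in the last display can be seen numerically; a sketch (assuming numpy and scipy; exponential summands, seed and sizes arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, reps, lam = 100, 50_000, 2.0                # Exp(lam): mu = sigma = 1/lam
x = rng.exponential(1 / lam, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1 / lam) / (1 / lam)

a, b = -1.0, 1.0
print(np.mean((z >= a) & (z <= b)), norm.cdf(b) - norm.cdf(a))   # both ~ 0.683
```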
Review: Moments of Distributions

Moment Generating Function

The moment generating function of a random variable X is defined for all real values of t by

M_X(t) = E[e^{tX}] = \begin{cases} \sum_x e^{tx} P(X = x) & \text{if } X \text{ is discrete} \\ \int_x e^{tx} f(x)\, dx & \text{if } X \text{ is continuous} \end{cases}

This is called the moment generating function because we can obtain the raw moments of X by successively differentiating M_X(t) and evaluating at t = 0:

M_X(0) = E[e^0] = 1 = \mu'_0

M'_X(t) = \frac{d}{dt} E[e^{tX}] = E\left[\frac{d}{dt} e^{tX}\right] = E[X e^{tX}]
M'_X(0) = E[X e^0] = E[X] = \mu'_1

M''_X(t) = \frac{d}{dt} M'_X(t) = \frac{d}{dt} E[X e^{tX}] = E\left[\frac{d}{dt}(X e^{tX})\right] = E[X^2 e^{tX}]
M''_X(0) = E[X^2 e^0] = E[X^2] = \mu'_2

Moment Generating Function - Properties

If X and Y are independent random variables then the moment generating function for the distribution of X + Y is

M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t)

Similarly, the moment generating function for S_n, the sum of iid random variables X_1, X_2, \ldots, X_n, is

M_{S_n}(t) = [M_{X_i}(t)]^n
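Differentiating an MGF symbolically; a sketch (assuming sympy) using the Exp(λ) MGF that appears later in this review:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                    # MGF of Exp(lambda), valid for t < lambda

mu1 = sp.diff(M, t).subs(t, 0)         # E[X]   = 1/lambda
mu2 = sp.diff(M, t, 2).subs(t, 0)      # E[X^2] = 2/lambda^2
print(mu1, mu2, sp.simplify(mu2 - mu1**2))   # variance = 1/lambda^2
```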

Review: Continuous Random Variables

Cumulative Distribution Function

We have seen a variety of problems where we find P(X ≤ x) or P(X > x), etc. The former is given a special name: the cumulative distribution function.

If X is discrete with probability mass function f(x) then

P(X ≤ x) = F(x) = \sum_{z \le x} f(z)

If X is continuous with probability density function f(x) then

P(X ≤ x) = F(x) = \int_{-\infty}^{x} f(z)\, dz

A CDF is defined for all -\infty < x < \infty and follows these rules:

\lim_{x \to -\infty} F(x) = 0
\lim_{x \to \infty} F(x) = 1
x < y \Rightarrow F(x) \le F(y)

Probability Density Function

For a continuous probability distribution,

P(X = x) = 0 for all x

As such we define the probability density function to be

f_X(x) = \lim_{\epsilon \to 0} P(X \in [x, x + \epsilon]) / \epsilon

A pdf is defined for all -\infty < x < \infty and follows these rules:

\int_{-\infty}^{\infty} f_X(x)\, dx = 1
\int_{-\infty}^{x} f_X(t)\, dt = F_X(x) \Leftrightarrow f_X(x) = \frac{d}{dx} F_X(x)
f_X(x) \ge 0 for all x
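The pdf/CDF relationship can be checked by numerical integration; a sketch assuming scipy, using the standard normal:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

for x in (-1.0, 0.0, 2.0):
    area, _ = quad(norm.pdf, -np.inf, x)   # integral of f up to x
    print(area, norm.cdf(x))                # matches F(x)
```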
Review: Hazard

Hazard Rate

We define the hazard rate for a distribution function F with density f to be

\lambda(x) = \frac{f(x)}{1 - F(x)}

which we can use to uniquely identify a distribution:

\int_0^x \lambda(t)\, dt = \int_0^x \frac{f(t)}{1 - F(t)}\, dt
                        = \int_0^x \frac{\frac{d}{dt} F(t)}{1 - F(t)}\, dt
                        = -\log(1 - F(x)) + \log(1 - F(0))
                        = -\log(1 - F(x))     [since F(0) = 0]

1 - F(x) = \exp\left(-\int_0^x \lambda(t)\, dt\right)
F(x) = 1 - \exp\left(-\int_0^x \lambda(t)\, dt\right)

Review: Change of Variables

Some Quick Definitions

Monotonically increasing (non-decreasing) function:
x ≤ y \Rightarrow f(x) ≤ f(y)

Monotonically decreasing (non-increasing) function:
x ≤ y \Rightarrow f(x) ≥ f(y)

Strictly increasing function:
x < y \Rightarrow f(x) < f(y)

Strictly decreasing function:
x < y \Rightarrow f(x) > f(y)
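For the exponential distribution the hazard rate is constant, which is one way to remember the memoryless property; a quick check (assuming scipy; sf is the survival function 1 - F):

```python
import numpy as np
from scipy.stats import expon

lam = 0.5
dist = expon(scale=1 / lam)          # Exp(lam) in scipy's scale parameterization
x = np.linspace(0.1, 5, 5)
print(dist.pdf(x) / dist.sf(x))      # f(x) / (1 - F(x)) = lam everywhere
```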

Review: Change of Variables

Change of Variables for Continuous RVs

Let X be a random variable with density f_X(x) on the range (a, b) and let Y = g(X), which has the range (g(a), g(b)). If g(x) is either strictly increasing or strictly decreasing on (a, b) then

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|

Review: Order Statistics

Order Statistics

Let X_1, X_2, X_3, X_4, X_5 be iid random variables with a distribution F with a range of (a, b). We can relabel these X's such that their labels correspond to arranging them in increasing order so that

X_{(1)} ≤ X_{(2)} ≤ X_{(3)} ≤ X_{(4)} ≤ X_{(5)}

[Figure: the five draws placed on the number line from a to b; X_5 is the smallest draw (so X_{(1)} = X_5), then X_1, X_4, X_2, and X_3 is the largest (X_{(5)} = X_3).]

In the case where the distribution F is continuous we can make the stronger statement that

X_{(1)} < X_{(2)} < X_{(3)} < X_{(4)} < X_{(5)}

since P(X_i = X_j) = 0 for all i ≠ j for continuous random variables.
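A simulation sketch of the change-of-variables formula (assuming numpy), with g(x) = x^2 strictly increasing on (0, 1):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=1_000_000)
y = x ** 2                                   # Y = g(X), g(x) = x^2

# f_Y(y) = f_X(sqrt(y)) |d/dy sqrt(y)| = 1 / (2 sqrt(y)) on (0, 1)
hist, edges = np.histogram(y, bins=50, range=(0.01, 1.0), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - 1 / (2 * np.sqrt(centers)))))  # small, largest near y ~ 0
```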
Review: Order Statistics

Order Statistics, cont.

For X_1, X_2, \ldots, X_n iid random variables, X_{(k)} is the kth smallest X, usually called the kth order statistic.

X_{(1)} is therefore the smallest X:

X_{(1)} = \min(X_1, \ldots, X_n)

Similarly, X_{(n)} is the largest X:

X_{(n)} = \max(X_1, \ldots, X_n)

Distributions of Order Statistics

For X_1, X_2, \ldots, X_n iid continuous random variables with pdf f and cdf F,

f_{(1)}(x) = n f(x) (1 - F(x))^{n-1}
f_{(k)}(x) = n f(x) \binom{n-1}{k-1} F(x)^{k-1} (1 - F(x))^{n-k}
f_{(n)}(x) = n f(x) F(x)^{n-1}

F_{(1)}(x) = 1 - (1 - F(x))^n
F_{(n)}(x) = F(x)^n
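A simulation check of the min/max CDF formulas for Unif(0, 1) draws, where F(t) = t (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 200_000
x = rng.uniform(0, 1, size=(reps, n))

t = 0.3
print(np.mean(x.min(axis=1) <= t), 1 - (1 - t) ** n)   # F_(1)(t) = 1 - (1 - t)^n
print(np.mean(x.max(axis=1) <= t), t ** n)              # F_(n)(t) = t^n
```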

Review: Continuous Distributions

Uniform Distribution

If X is a random variable with constant density on (a, b) then X is said to be uniformly distributed on (a, b), X ~ Unif(a, b), with

f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a < x < b \\ 0 & \text{otherwise} \end{cases}

F(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a < x < b \\ 1 & \text{if } x > b \end{cases}

E(X) = \frac{b+a}{2}
Var(X) = \frac{(b-a)^2}{12}

Normal Distribution

If X is a random variable with a normal distribution with mean \mu and variance \sigma^2, X ~ N(\mu, \sigma^2), then

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}

F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)

E(X) = \mu
Var(X) = \sigma^2
Mode(X) = \mu
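Both sets of formulas are easy to confirm with scipy's distribution objects; a sketch (note that scipy's uniform takes loc = a and scale = b - a):

```python
from scipy.stats import norm, uniform

a, b = 2.0, 5.0
U = uniform(loc=a, scale=b - a)                  # Unif(a, b)
print(U.mean(), (a + b) / 2)                      # (a + b)/2
print(U.var(), (b - a) ** 2 / 12)                 # (b - a)^2/12

mu, sigma = 1.0, 2.0
X = norm(loc=mu, scale=sigma)
print(X.cdf(3.0), norm.cdf((3.0 - mu) / sigma))   # F(x) = Phi((x - mu)/sigma)
```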
Review: Continuous Distributions

Exponential Distribution

Let X be a random variable that reflects the time between events which occur continuously with a given rate \lambda, X ~ Exp(\lambda):

f(x | \lambda) = \lambda e^{-\lambda x}
P(X ≤ x) = F(x | \lambda) = 1 - e^{-\lambda x}

M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-1} = \frac{\lambda}{\lambda - t}

E(X) = \lambda^{-1}
E(X^n) = \frac{n!}{\lambda^n}
Var(X) = \lambda^{-2}
Median(X) = \frac{\log 2}{\lambda}

Memoryless property: P(X > s + t | X > s) = P(X > t)
Minimum of exponentials: \min(X_1, \ldots, X_n) ~ Exp(\lambda_1 + \cdots + \lambda_n)

Gamma Function

Based on the relationship for the nth raw moment, \mu'_n, of an exponential distribution,

E(X^n) = \frac{n!}{\lambda^n}

set \lambda = 1 and define a new value \alpha = n + 1:

E(X^{\alpha-1}) = (\alpha - 1)!
\int_0^\infty x^{\alpha-1} e^{-x}\, dx = (\alpha - 1)!
\Gamma(\alpha) \equiv \int_0^\infty x^{\alpha-1} e^{-x}\, dx = (\alpha - 1)!

Using the traditional definition of the factorial this only makes sense when n \in \mathbb{N}, but we can use this new definition, the gamma function \Gamma(\alpha), for any \alpha \in \mathbb{R}^+. It is commonly used with the Gamma, Beta, and negative binomial distributions to generalize a parameter to \mathbb{R}^+.
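A quick check (assuming scipy) that Γ extends the factorial:

```python
import math
from scipy.special import gamma

for n in range(1, 6):
    print(gamma(n + 1), math.factorial(n))   # Gamma(n + 1) = n!
print(gamma(0.5), math.sqrt(math.pi))         # defined off the integers too
```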

Review: Continuous Distributions

Erlang Distribution

Let X reflect the time until the nth event occurs when the events occur according to a Poisson process with rate \lambda, X ~ Er(n, \lambda):

f(x | n, \lambda) = \frac{e^{-\lambda x} \lambda^n x^{n-1}}{(n-1)!}

F(x | n, \lambda) = \sum_{j=n}^{\infty} \frac{e^{-\lambda x} (\lambda x)^j}{j!}

M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^n

E(X) = n / \lambda
Var(X) = n / \lambda^2

Gamma Distribution

We can generalize the Erlang distribution by using the gamma function instead of the factorial function, thereby allowing for \mathbb{R}^+ values of n. Often the distribution is reparameterized such that \theta = 1/\lambda, X ~ Gamma(n, \theta):

f(x | n, \theta) = \frac{e^{-x/\theta} x^{n-1}}{\theta^n \Gamma(n)}

F(x | n, \theta) = \frac{\int_0^x e^{-t/\theta} t^{n-1}\, dt}{\theta^n \Gamma(n)} = \frac{\gamma(n, x/\theta)}{\Gamma(n)}

M_X(t) = \left(\frac{1}{1 - \theta t}\right)^n

E(X) = n\theta
Var(X) = n\theta^2
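A sketch (assuming scipy) confirming the Gamma moments, including a non-integer shape:

```python
from scipy.stats import gamma

n, theta = 3.5, 2.0                 # shape may be any positive real
G = gamma(a=n, scale=theta)         # scipy's scale is theta = 1/lambda
print(G.mean(), n * theta)           # E(X) = n * theta
print(G.var(), n * theta ** 2)       # Var(X) = n * theta^2
```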
Review: Continuous Distributions

Beta Distribution

If X ~ Beta(r, s) then

f(x) = \frac{1}{B(r, s)} x^{r-1} (1 - x)^{s-1}

F(x) = \frac{1}{B(r, s)} \int_0^x t^{r-1} (1 - t)^{s-1}\, dt = \frac{B_x(r, s)}{B(r, s)}

B(r, s) = \int_0^1 x^{r-1} (1 - x)^{s-1}\, dx = \frac{(r-1)!(s-1)!}{(r+s-1)!} = \frac{\Gamma(r)\Gamma(s)}{\Gamma(r+s)}

B_x(r, s) = \int_0^x t^{r-1} (1 - t)^{s-1}\, dt

E(X) = \frac{r}{r+s}
Var(X) = \frac{rs}{(r+s)^2 (r+s+1)}

Review: Sample Problem

Example 1

The random variable X has the pdf

f_X(x) = \begin{cases} 6/x^c & \text{if } x > 1 \\ 0 & \text{otherwise} \end{cases}

(a) Find c.
(b) Find F_X(x).
(c) Find the pdf of Y = X^2, for all y \in \mathbb{R}.
(d) Find the hazard function of X.
(e) Suppose X is the life-time (in years) of some replaceable item, like a light-bulb. What is the probability that the item fails during the first year?
(f) Which is more likely to fail in the next short interval (say, a day or two): a three-year-old item, or a two-year-old item, both still working?
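The slides leave Example 1 as an exercise; for self-checking, here is a symbolic sketch of parts (a) and (b) (assuming sympy, and assuming the pdf is meant as 6/x^c):

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)
c = sp.Symbol('c', positive=True)

# (a) normalization: the integral over (1, oo) must equal 1 (converges for c > 1)
total = sp.integrate(6 * x**(-c), (x, 1, sp.oo), conds='none')   # 6/(c - 1)
print(sp.solve(sp.Eq(total, 1), c))    # [7]

# (b) the CDF for x > 1, using c = 7
print(sp.integrate(6 * t**(-7), (t, 1, x)))   # 1 - x**(-6)
```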
