SMA 2231 Probability and Statistics III

COURSE OUTLINE

1. Bivariate probability mass and distribution function (Discrete case)

2. Bivariate probability density function (continuous case)

3. Joint, marginal and conditional distribution function.

4. Bivariate moment generating function (MGF) and change-of-variable techniques for bivariate distributions.

5. Stochastic independence.

6. Multiple regression and correlation.

7. Bivariate normal distribution.

8. Independence of the sample mean and sample variance for the normal distribution.

9. The t, chi-square and F distribution.

10. Distribution of order statistics.

References:

1. Hogg, R. V. and Tanis, E. A., Probability and Statistical Inference.

2. Hogg, R. V. and Craig, A. T., Introduction to Mathematical Statistics.

3. Mood, A. M., Graybill, F. A. and Boes, D. C., Introduction to the Theory of Statistics.

4. Wackerly, D. D., Mendenhall, W. and Scheaffer, R. L., Mathematical Statistics with Applications.

Discrete Bivariate Probability Distribution

Let X and Y be discrete random variables. Denote by x a realizable value of X and by y a realizable value of Y, and let the probability that X takes the value x and Y takes the value y be denoted by

P(X = x, Y = y)

Then the function f ( x, y ) = P( X = x, Y = y ) is said to be the joint probability function of X

and Y if it satisfies the following two conditions

i) f(x, y) ≥ 0

ii) ∑_x ∑_y f(x, y) = 1

The double summation extends over all possible pairs ( x, y ) .

Example

Consider an experiment of tossing a pair of dice. The sample space contains 36 sample points, corresponding to the 6 × 6 = 36 ways in which the numbers may appear on the faces of the two dice.

                        2nd die
             1      2      3      4      5      6
         1  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
         2  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
1st die  3  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
         4  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
         5  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
         6  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)
Any of the following random variables could be defined over the sample space and might be of

interest to the experimenter.

X1: the number of dots appearing on die 1.

X2: the number of dots appearing on die 2.

X3: the sum of the number of dots on both dice.

X4: the product of the number of dots on both dice.

The 36 sample points associated with the experiment are equiprobable and correspond to the 36 numerical events (x1, x2). Thus, if ones are obtained on both dice, the sample event is (1, 1); throwing a 2 on die 1 and a 3 on die 2 gives the sample event (2, 3). Because all pairs (x1, x2) occur with the same relative frequency, a probability of 1/36 is assigned to each sample point.

For this example, each event (x1, x2) contains only one sample point. Hence the bivariate probability function is

P(x1, x2) = 1/36 for x1 = 1, 2, ..., 6; x2 = 1, 2, ..., 6, and 0 otherwise.

Definition:

Let the random variables X1, X2 take a countable number of pairs of real values (x1, x2). If there is a function P(x1, x2) = P(X1 = x1, X2 = x2) with the following properties:

a) P(x1, x2) ≥ 0

b) ∑_{x1} ∑_{x2} P(x1, x2) = 1, and

c) for any constants a, b, c and d, P(a ≤ X1 ≤ b and c ≤ X2 ≤ d) = ∑_{x1=a}^{b} ∑_{x2=c}^{d} P(x1, x2),

then X1 and X2 are said to have a joint (or bivariate) discrete probability distribution with joint probability mass function P(x1, x2).

Example:

Using the results of tossing of two dice, calculate the P(2 ≤ X 1 ≤ 3,1 ≤ X 2 ≤ 2)

Solution:

P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2) = P(2, 1) + P(2, 2) + P(3, 1) + P(3, 2)

= 4/36 = 1/9
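As an illustrative aside (not part of the original notes), the joint PMF of the two dice and the probability just computed can be checked with a short Python sketch:

```python
# Minimal sketch: the joint PMF of two fair dice, verified numerically.
from fractions import Fraction

pmf = {(x1, x2): Fraction(1, 36) for x1 in range(1, 7) for x2 in range(1, 7)}

assert sum(pmf.values()) == 1  # condition (b): the probabilities sum to 1
p = sum(v for (x1, x2), v in pmf.items() if 2 <= x1 <= 3 and 1 <= x2 <= 2)
print(p)  # 1/9, agreeing with the worked example
```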

Definition:

For any random variables X1 and X2, the joint (bivariate) distribution function, F (a, b ) is

given by

F(a, b) = P(X1 ≤ a, X2 ≤ b)

For two discrete variables X1 and X2,

F(a, b) = P(X1 ≤ a, X2 ≤ b) = ∑_{x1 ≤ a} ∑_{x2 ≤ b} P(x1, x2)

Example

Tossing of two – dice experiment

F(2, 3) = P(X1 ≤ 2, X2 ≤ 3)

= P(1,1) + P(1,2) + P(1,3) + P(2,1) + P(2,2) + P(2,3)

= 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36

= 6/36 = 1/6

Example

Suppose the random variables X1 and X2 have the following probability distribution.

-4-
X1

0 1 2

0 1 2 1
X2 9 9 9

1 2 2 0
9 9

2 1 0 0
9

Find

(a) F (- 1, 2) (b). F (1.5, 2) , (c). F (5, 7 )

Solution

a) F(−1, 2) = P(X1 ≤ −1, X2 ≤ 2) = P(∅) = 0

b) F(1.5, 2) = P(X1 ≤ 1.5, X2 ≤ 2)

= P(0,0) + P(0,1) + P(0,2) + P(1,0) + P(1,1) + P(1,2)

= 8/9

c) In a similar way as in (b) above.

F (5,7) = P(X 1 ≤ 5 , X 2 ≤ 7) = 1
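A minimal Python sketch (not from the notes) of how F(a, b) can be computed from the table, summing P(x1, x2) over all x1 ≤ a and x2 ≤ b:

```python
# Sketch: F(a, b) = P(X1 <= a, X2 <= b) computed from the PMF table above.
from fractions import Fraction

P = {(0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
     (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9), (2, 1): Fraction(0),
     (0, 2): Fraction(1, 9), (1, 2): Fraction(0),    (2, 2): Fraction(0)}

def F(a, b):
    return sum(p for (x1, x2), p in P.items() if x1 <= a and x2 <= b)

print(F(-1, 2), F(1.5, 2), F(5, 7))  # 0, 8/9, 1
```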

Continuous Bivariate Distribution

Definition:

The random variables X1, X2 are said to have a bivariate (joint) continuous distribution with probability density function f(x1, x2) if the following are satisfied:

1. f(x1, x2) ≥ 0 on the given domain

2. f(x1, x2) is continuous except along a countable number of points and curves.

3. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2) dx1 dx2 = 1

4. For constants a, b, c and d, P(a ≤ X1 ≤ b and c ≤ X2 ≤ d) = ∫_a^b ∫_c^d f(x1, x2) dx2 dx1

Definition

Let X1 and X2 be continuous random variables with joint distribution function F(x1, x2). If there exists a non-negative function f(x1, x2) such that

F(x1, x2) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} f(t1, t2) dt2 dt1

for any real numbers x1 and x2, then X1 and X2 are said to be jointly continuous random variables. The function f(x1, x2) is called the joint probability density function.

Example:

Suppose that a radioactive particle is randomly located in a square with sides of unit length; that is, if two regions of equal area are considered, the particle is equally likely to be in either. Let X1 and X2 denote the coordinates locating the particle. A reasonable model for the relative frequency histogram of X1 and X2 is the bivariate analogue of the univariate uniform distribution:

f(x1, x2) = 1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.

a) Sketch the probability density Surface

b) Find F (0.2, 0.4 )

c) Find P(0.1 ≤ x1 ≤ 0.3 ; 0 ≤ x 2 ≤ 0.5)

-6-
Solution

a) Diagram: the density surface is the plane f(x1, x2) = 1 over the unit square 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1.

b) F(0.2, 0.4) = ∫_{−∞}^{0.4} ∫_{−∞}^{0.2} f(x1, x2) dx1 dx2

= ∫_0^{0.4} ∫_0^{0.2} 1 dx1 dx2

= ∫_0^{0.4} [x1]_0^{0.2} dx2

= ∫_0^{0.4} 0.2 dx2

= [0.2 x2]_0^{0.4} = 0.08

Note: The probability F (0.2, 0.4) corresponds to the volume under f (x1 , x 2 ) = 1 over the

region (0 ≤ x1 ≤ 0.2 , 0 ≤ x 2 ≤ 0.4 )

c) P(0.1 ≤ X1 ≤ 0.3, 0 ≤ X2 ≤ 0.5)

= ∫_0^{0.5} ∫_{0.1}^{0.3} f(x1, x2) dx1 dx2

= ∫_0^{0.5} ∫_{0.1}^{0.3} dx1 dx2

= 0.2 × 0.5 = 0.1

This probability corresponds to the volume under f(x1, x2) = 1 over the region 0.1 ≤ x1 ≤ 0.3, 0 ≤ x2 ≤ 0.5.
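Both volumes can be spot-checked numerically; the following sketch (an addition, not from the notes) uses scipy's double integrator:

```python
# Sketch: numeric check of the two volumes with scipy's double integral.
from scipy.integrate import dblquad

f = lambda x2, x1: 1.0  # the uniform density on the unit square

F, _ = dblquad(f, 0, 0.2, 0, 0.4)    # x1 over [0, 0.2], x2 over [0, 0.4]
p, _ = dblquad(f, 0.1, 0.3, 0, 0.5)  # x1 over [0.1, 0.3], x2 over [0, 0.5]
print(F, p)  # 0.08 and 0.1
```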


The properties of a bivariate cumulative distribution function are given in the following

theorem

-7-
Theorem

Let X1 and X2 be random variables, discrete or continuous, with joint distribution function F(x1, x2). Then

a) F(−∞, −∞) = F(−∞, x2) = F(x1, −∞) = 0

b) F(∞, ∞) = 1

c) If a2 ≥ a1 and b2 ≥ b1, then F(a2, b2) − F(a2, b1) − F(a1, b2) + F(a1, b1) ≥ 0

Example

Suppose X1 and X2 have the joint bivariate PDF given as

f(x1, x2) = 3x1 for 0 ≤ x2 ≤ x1 ≤ 1, and 0 otherwise.

Find P (0.25 ≤ X 1 ≤ 0.5, X 2 ≥ 0.25 )

Solution

P(0.25 ≤ X1 ≤ 0.5, X2 ≥ 0.25)

= ∫_{0.25}^{0.5} ∫_{0.25}^{x1} 3x1 dx2 dx1

= ∫_{0.25}^{0.5} 3x1 [x2]_{0.25}^{x1} dx1

= ∫_{0.25}^{0.5} 3x1 (x1 − 1/4) dx1

= [x1³ − (3/8)x1²]_{0.25}^{0.5}

= [1/8 − (3/8)(1/4)] − [1/64 − (3/8)(1/16)]

= 5/128

Question

Three fair coins are tossed independently. One variable of interest is X1 = the number of heads. Let X2 denote the amount of money won on a side bet, set in the following manner: if the first head occurs on the first toss, you win $1; if the first head occurs on toss 2 or on toss 3, you win $2 or $3 respectively; if no head appears, you lose $1 (that is, you win −$1).

a) Find the joint probability distribution function of X1 and X2.

b) What is the probability that fewer than three heads occur and you win $1 or less? (i.e. F(2, 1))

Solution

a)

                 X1
            0     1     2     3
      −1   1/8    0     0     0
       1    0    1/8   2/8   1/8
X2     2    0    1/8   1/8    0
       3    0    1/8    0     0

b) F(2, 1) = 1/2

Question

Let X1 and X2 have the joint PDF given by

-9-
⎧k (1 - x 2 ) ,0 ≤ x1 ≤ x 2 ≤ 1
f (x1 , x 2 ) = ⎨
⎩0 ,Otherwise

a) Find the values of K that makes this a PDF

1 x2

∫ ∫ k (1 − x
0 0
2 )dx 1 dx 2 = 1

b) Find P(X1 ≤ 0.75, X2 ≥ 0.5). [Answers: k = 6; (b) 31/64]

Question

Let X1 and X2 denote the proportions of time, out of one working day, that employees A and B, respectively, actually spend performing their assigned tasks. The joint relative frequency behaviour of X1 and X2 is modeled by the density function

f(x1, x2) = x1 + x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

a) Find P ( X 1 ≤ 0.5, X 2 ≥ 0.25 ) answer 21/64

b) Find P ( X 1 + X 2 ≤ 1) answer 1/3

Question

A joint probability distribution function for random variables X, Y is given by

f(x, y) = 6xy² for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 elsewhere.

i) Check that f ( x , y ) is a probability density function

ii) Find P[0 ≤ x ≤ 0.5 and 0 .5 ≤ y ≤ 0.75]

iii) Find P(x + y ≥ 1)

Solution

i) Check the conditions given above; in particular

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^1 6xy² dy dx = 1

ii) The required probability is

∫_0^{0.5} ∫_{0.5}^{0.75} 6xy² dy dx = ∫_0^{0.5} 2x [y³]_{0.5}^{0.75} dx

= ∫_0^{0.5} 2x (27/64 − 8/64) dx = ∫_0^{0.5} 2x (19/64) dx

= (19/64) [x²]_0^{0.5} = (19/64)(1/4) = 19/256

Question

Find k for f ( x, y ) to be a density function of x and y .

i) f(x, y) = k(x² + 2y) for 0 < x < 1, 1 < y < 3, and 0 elsewhere.

ii) f(x, y) = k(x + e^{2x}) for 0 < x < 2, 0 < y < 1, and 0 elsewhere.

iii) For the above, find P[0.5 < x < 1 and 0.5 ≤ y ≤ 1].

Question

Show that f ( x, y ) is a joint density function

f(x, y) = (3/5) x(y + x) for 0 < x < 1, 0 < y < 2, and 0 elsewhere.

Question

Let f(x, y) = y + x for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Find F(x, y).
Question

The joint distribution function of X and Y is

F(x, y) = (1 − e^{−x})(1 − e^{−y}) for x > 0, y > 0, and 0 elsewhere.

Find the density f(x, y) and P[1 < x < 2 and 3 < y < 5].

Marginal Probability Function

Definition

Let X1 and X2 be jointly discrete random variables with probability function P(x1, x2). Then the marginal probability functions of X1 and X2, respectively, are given by

P1(x1) = ∑_{x2} P(x1, x2)  and  P2(x2) = ∑_{x1} P(x1, x2)

Similarly, if X1 and X2 are jointly continuous random variables with joint density function f(x1, x2), then the marginal density functions of X1 and X2, respectively, are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2  and  f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1

Example

For the joint PMF of X and Y,

f(x, y) = a(y + 3x + 1) for x = 0, 1, 2; y = 1, 3, and 0 elsewhere,

i) evaluate the constant a;

ii) find the marginal PMFs, and Var(X).
Solution

         x = 0   x = 1   x = 2   f(y)
y = 1     2a      5a      8a     15a
y = 3     4a      7a     10a     21a
f(x)      6a     12a     18a     36a

Since the probabilities must sum to 1, 36a = 1, so

a = 1/36

The marginal PMF f(x) is given by

f(0) = 6a = 6/36 = 1/6, f(1) = 12a = 12/36 = 1/3, f(2) = 18a = 18/36 = 1/2

i.e. f(x) = 6(x + 1)a = (x + 1)/6 for x = 0, 1, 2, and 0 elsewhere.

The marginal PMF f(y) is given by

f(1) = 15a = 15/36 = 5/12, f(3) = 21a = 21/36 = 7/12

i.e. f(y) = (y + 4)/12 for y = 1, 3, and 0 elsewhere.

E(X) = 4/3 and E(X²) = 7/3, so

Var(X) = E(X²) − (E(X))² = 7/3 − 16/9 = 5/9. (Check similarly that Var(Y) = 35/36.)

Example

The probability distribution of X1 and X2 is given below

              X1
           0      1      2     P2(x2)
      0    0     3/15   3/15   6/15
X2    1   2/15   6/15    0     8/15
      2   1/15    0      0     1/15
P1(x1)    3/15   9/15   3/15     1

Find the marginal probability distribution function of

a) X1

b) X2

Solution

I.

X1 0 1 2

P1(x1) 3/15 9/15 3/15

II.

X2 0 1 2

P2(x2) 6/15 8/15 1/15

Example

The random variables X and Y have the joint distribution

             X
           1      2      3     f2(y)
      2   1/12   1/6    1/12   1/3
Y     3   1/6     0     1/6    1/3
      4    0     1/3     0     1/3
f1(x)     1/4    1/2    1/4      1

i) Find the marginal PMF values f2(4) and f1(2). [f2(4) = 1/3, f1(2) = 1/2]

ii) Are X and Y independent? [No: f(2, 4) = 1/3 ≠ f1(2) f2(4) = 1/6]

Example

The random variables X, Y have the joint PMF

f(x, y) = (x + y)/21 for x = 1, 2, 3; y = 1, 2, and 0 elsewhere.

Find P(X = 1), P(X = 2), P(X = 3), P(Y = 1) and P(Y = 2).

Can you find formulas for the marginal PMFs f1(x), f2(y)?

[Answer: f2(y) = (3y + 6)/21 = (y + 2)/7; f1(x) = (2x + 3)/21]

Example

The joint PMF

f(x, y) = λ^{x+y} e^{−2λ}/(x! y!) for x = 0, 1, 2, ...; y = 0, 1, 2, ..., and 0 elsewhere.
Example

Let X1 and X2 have probability density function

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

Sketch f(x1, x2) and find the marginal density functions of X1 and X2.

Solution

The marginal densities are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2 = ∫_0^1 2x1 dx2 = [2x1 x2]_0^1 = 2x1

⇒ f1(x1) = 2x1 for 0 ≤ x1 ≤ 1, and 0 elsewhere.

Similarly,

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1 = ∫_0^1 2x1 dx1 = [x1²]_0^1 = 1

⇒ f2(x2) = 1 for 0 ≤ x2 ≤ 1, and 0 elsewhere.
Conditional Distribution Function

Definition

Suppose X1 and X2 are jointly discrete random variables with probability function P(x1, x2) and marginal probability functions P1(x1) and P2(x2) respectively. Then the conditional discrete probability function of X1 given X2 is

P(x1 | x2) = P(X1 = x1 | X2 = x2) = P(X1 = x1, X2 = x2)/P(X2 = x2) = P(x1, x2)/P2(x2), provided P2(x2) > 0.

Similarly, the conditional probability function of X2 = x2 given X1 = x1 is

P(x2 | x1) = P(x1, x2)/P1(x1), provided P1(x1) > 0.

Special cases

If X and Y are independent, then

f(x | y) = f(x, y)/f(y) = f(x) f(y)/f(y) = f(x).

Similarly,

f(y | x) = f(x, y)/f(x) = f(x) f(y)/f(x) = f(y).
Example

Consider the distribution of X1 and X2 given below

              x1 = 0   x1 = 1   x1 = 2   P2(x2)
   x2 = 0       0       3/15     3/15     6/15
   x2 = 1      2/15     6/15      0       8/15
   x2 = 2      1/15      0        0       1/15
   P1(x1)      3/15     9/15     3/15       1

a) Find the conditional distribution of X1 given that (i). X2=1, (ii). X2=2

b) Find the conditional distribution of X2 given that X1=1

Solution

a) (i) P(x1 | x2) = P(x1, x2)/P2(x2), with x2 = 1:

P(x1 | x2 = 1) = P(x1, 1)/P2(1)

P(0 | 1) = (2/15)/(8/15) = 1/4

P(1 | 1) = (6/15)/(8/15) = 3/4

P(2 | 1) = 0/(8/15) = 0

X1            0     1     2
P(x1|x2=1)   1/4   3/4    0

b) Complete

Definition

Let X1 and X2 be jointly continuous random variables with joint density f(x1, x2) and marginal densities f1(x1) and f2(x2) respectively. Then the conditional density of X1 given X2 = x2 is given by

f1(x1 | x2) = f(x1, x2)/f2(x2) for f2(x2) > 0, and 0 elsewhere,

and the conditional density of X2 given X1 = x1 is given by

f2(x2 | x1) = f(x1, x2)/f1(x1) for f1(x1) > 0, and 0 elsewhere.

Example

Suppose X1 and X2 have the joint PDF

f(x1, x2) = 1/2 for 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2, and 0 elsewhere.

Find

a) the conditional density of X1 given X2 = x2, and evaluate P(X1 ≤ 0.5 | X2 = 1).
Solution

The marginal density of X2 is given by

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1 = ∫_0^{x2} (1/2) dx1 = x2/2

i.e. f2(x2) = x2/2 for 0 ≤ x2 ≤ 2, and 0 elsewhere.

By definition,

f1(x1 | x2) = f(x1, x2)/f2(x2) = (1/2)/(x2/2) = 1/x2

i.e. f1(x1 | x2) = 1/x2 for 0 ≤ x1 ≤ x2 ≤ 2, and 0 elsewhere.

Now

P(X1 ≤ 0.5 | X2 = 1) = ∫_{−∞}^{0.5} f(x1 | x2 = 1) dx1 = ∫_0^{0.5} 1 dx1 = 1/2

Example

In a group of nine executives of a certain business firm, four are married, three have never married and two are divorced. Three of the executives are to be selected for promotion. Let X1 denote the number of married executives and X2 the number of never-married executives among the three selected for promotion. Assuming that the three are randomly selected from the nine available,

a) Find the joint probability distribution of X1 and X2

b) Find the marginal probability distribution of X1, the number of married executives

among the three selected

c) Find (i). P (X 1 = 1 X 2 = 2 ) (ii) P ( X 2 = 2 X 1 = 1)

d) Let X3 denote the number of divorced executives among the three selected for

promotion, then X 3 = 3 − X 1 − X 2 . Find P ( X 3 = 1 X 2 = 1)

Solution

a) The joint probability distribution is hypergeometric with N = 9, n = 3, r1 = 4, r2 = 3, r3 = 2, r1 + r2 + r3 = N:

P(x1, x2) = C(4, x1) C(3, x2) C(2, 3 − x1 − x2)/C(9, 3)

for 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 3 and 0 ≤ x1 + x2 ≤ 3, and 0 elsewhere.

b) P1(x1) = ∑_{x2} P(x1, x2) = ∑_{x2=0}^{3} P(x1, x2). Then

X1          0      1      2      3
P1(x1)    5/42  20/42  15/42   2/42

c) (i) P(X1 = 1 | X2 = 2) = P(x1 = 1, x2 = 2)/P2(x2 = 2)

P(1, 2) = C(4,1) C(3,2) C(2,0)/C(9,3) = 12/84 and P2(x2 = 2) = ∑_{x1} P(x1, 2) = 18/84, so

P(X1 = 1 | X2 = 2) = 12/18 = 2/3

(ii) P(X2 = 2 | X1 = 1) = ?

d) P(X3 = 1 | X2 = 1) = P(X3 = 1, X2 = 1)/P2(X2 = 1)

Note that X3 = 1 and X2 = 1 force X1 = 1, so

P(X3 = 1, X2 = 1) = C(4,1) C(3,1) C(2,1)/C(9,3) = 24/84 and P2(x2 = 1) = ∑_{x1} P(x1, 1) = 45/84

⇒ P(X3 = 1 | X2 = 1) = 24/45 = 8/15

Question

For X, Y having a joint probability distribution functions

f(x, y) = e^{−y} for 0 < x < y < ∞, and 0 elsewhere.

Determine the conditional PDF for y given x

Solution

f(y | x) = f(x, y)/f(x)

f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^{∞} e^{−y} dy = [−e^{−y}]_x^{∞} = e^{−x}

The conditional PDF of y given x is

f(y | x) = f(x, y)/f(x) = e^{−y}/e^{−x} = e^{x−y} for y > x, 0 < x < y < ∞.

Question

For random variables X, Y having a joint probability distribution function

f(x, y) = x + y for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Determine the conditional PDF for y given x and P ( y ≤ 0.5 x = 0.6 )

Solution

f(y | x) = f(x, y)/f(x)

f(x) = ∫_0^1 (x + y) dy = [xy + y²/2]_0^1 = x + 1/2

f(y | x) = (x + y)/(x + 1/2) = 2(x + y)/(2x + 1)

F(y | x) = ∫_0^y 2(x + t)/(2x + 1) dt = (2/(2x + 1))[xt + t²/2]_0^y = (2xy + y²)/(2x + 1)

Note: F(x, y) = P[X ≤ x, Y ≤ y] = ∫_{−∞}^x ∫_{−∞}^y f(s, t) dt ds

F(0.5 | X = 0.6) = (2(0.6)(0.5) + 0.25)/(2(0.6) + 1) = 0.85/2.2 ≈ 0.386

Distribution of continuous variables

Definition

If X1 and X2 are jointly continuous random variables with joint density function f(x1, x2), then the conditional distribution function of X1 given X2 = x2 is given by

F(x1 | x2) = ∫_{−∞}^{x1} f(t, x2)/f2(x2) dt


Example

The random variables X1 and X2 have the following joint probability density function

f(x1, x2) = x1 + x2 for 0 < x1 < 1, 0 < x2 < 1, and 0 elsewhere.

a)

i) Find the conditional probability density of X2 given X1, f(x2 | x1).

ii) Find P(0 < X2 < 0.5 | X1 = 0.25).

Solution

The conditional probability density function is

f(x2 | x1) = f(x1, x2)/f1(x1), with f(x1, x2) = x1 + x2 for 0 < x1 < 1, 0 < x2 < 1.

Now

f1(x1) = ∫_0^1 (x1 + x2) dx2 = [x1 x2 + x2²/2]_0^1 = x1 + 1/2

so

f(x2 | x1) = (x1 + x2)/(x1 + 0.5) for 0 < x2 < 1, and 0 otherwise.

ii) P(0 < X2 < 0.5 | X1 = 0.25)

= ∫_0^{0.5} (0.25 + x2)/(0.25 + 0.5) dx2

= (1/0.75)[0.25 x2 + x2²/2]_0^{0.5}

= (1/0.75)(0.125 + 0.125) = 1/3

Question

If X1 is the total time between a customer's arrival in the store and leaving the service window and X2 is the time spent in line before reaching the window, then the joint density of these variables is given by

f(x1, x2) = e^{−x1} for 0 ≤ x2 ≤ x1 < ∞, and 0 elsewhere.

a) Find P(X1 < 2, X2 > 1). [Answer: e^{−1} − 2e^{−2}]

b) Find P(X1 > 2X2). [Answer: 1/2]

c) Find P(X1 − X2 ≥ 1). (Note that X1 − X2 is the time spent at the service window.) [Answer: e^{−1}]

d) If 2 minutes elapse between a customer's arrival at the store and his departure from the service window, find the probability that he waited in line less than one minute to reach the window. [Answer: 1/2]

e) Are X1 and X2 independent variables?

Independent (Stochastic) random variables

Definition

Suppose X1 has distribution function f(x1), X2 has distribution function f(x2), and X1 and X2 have joint distribution function f(x1, x2). Then X1 and X2 are said to be independent iff

f(x1, x2) = f(x1) f(x2) for every pair of real numbers (x1, x2).

Note

1. If X1 and X2 are discrete random variables with joint probability function P(x1, x2) and marginal probability functions P1(x1) and P2(x2) respectively, then X1 and X2 are independent iff P(x1, x2) = P1(x1) P2(x2) for all real numbers (x1, x2).

2. If X1 and X2 are continuous random variables with joint density function f(x1, x2) and marginal density functions f1(x1) and f2(x2) respectively, then X1 and X2 are independent iff

f(x1, x2) = f1(x1) f2(x2)

for all pairs of real numbers (x1, x2).

3. If X1 and X2 are not independent, they are said to be dependent.

Question

Random variables X1 and X2 have the joint probability density function.

f(x1, x2) = 4x1x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

1. Show that X1 and X2 are independent.

2. Show that f ( x1 , x 2 ) is a valid probability density function.

Solution

f1(x1) = ∫_0^1 f(x1, x2) dx2 = ∫_0^1 4x1 x2 dx2 = [2x1 x2²]_0^1 = 2x1 for 0 ≤ x1 ≤ 1.

Similarly,

f2(x2) = ∫_0^1 f(x1, x2) dx1 = 2x2 for 0 ≤ x2 ≤ 1.

Hence f(x1, x2) = f1(x1) f2(x2) for all real numbers (x1, x2), and therefore X1 and X2 are independent.
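The factorisation can also be confirmed symbolically; a minimal sketch (an addition, not from the notes):

```python
# Sketch: factorisation check of f(x1, x2) = 4*x1*x2 into its marginals.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', nonnegative=True)
f = 4 * x1 * x2
f1 = sp.integrate(f, (x2, 0, 1))   # 2*x1
f2 = sp.integrate(f, (x1, 0, 1))   # 2*x2
print(sp.simplify(f1 * f2 - f) == 0)  # True, so X1 and X2 are independent
```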

Question

Let f(x1, x2) = 2 for 0 ≤ x2 ≤ x1, 0 ≤ x1 ≤ 1, and 0 elsewhere.

Are X1 and X2 independent? (Show)

Question

Determine whether random variables X, Y are independent if

f(x, y) = 2e^{−x−y} for 0 < x < y < ∞, and 0 elsewhere.

Solution

Question

Determine whether random variables X, Y are independent if

f(x, y) = x + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 elsewhere.

Solution

Expected Value of a function of random variables

Trinomial distribution

Definition

Discrete random variables X1, X2 are said to have the trinomial distribution with parameters n, p1, p2, where n is a positive integer and 0 < p1 + p2 < 1, if the joint probability distribution of X1 and X2 satisfies

P(x1, x2) = n!/(x1! x2! (n − x1 − x2)!) · p1^{x1} p2^{x2} (1 − p1 − p2)^{n − x1 − x2}

for non-negative integers x1, x2 with x1 + x2 ≤ n, and 0 elsewhere.
Note:

The trinomial distribution is appropriate if in n independent trials

a) On each trial there is a probability P1 of outcome of type O1

b) On each trial there is a probability p2 of an outcome of type O2

c) O1 and O2 are mutually exclusive

Question

Derive the conditional distribution of X1 given X2 = x2 if X1 and X2 jointly have a trinomial distribution.

Solution

The marginal distribution of X2 is

P2(x2) = ∑_{x1=0}^{n−x2} n!/(x1! x2! (n − x1 − x2)!) p1^{x1} p2^{x2} (1 − p1 − p2)^{n − x1 − x2}

= n!/(x2! (n − x2)!) p2^{x2} ∑_{x1=0}^{n−x2} (n − x2)!/(x1! (n − x1 − x2)!) p1^{x1} (1 − p1 − p2)^{n − x1 − x2}

= n!/(x2! (n − x2)!) p2^{x2} (1 − p1 − p2 + p1)^{n − x2}

= n!/(x2! (n − x2)!) p2^{x2} (1 − p2)^{n − x2}

Now

P(x1 | x2) = P(x1, x2)/P2(x2)

= [(n − x2)!/(x1! (n − x1 − x2)!)] p1^{x1} (1 − p1 − p2)^{n − x1 − x2} / (1 − p2)^{n − x2}

= C(n − x2, x1) (p1/(1 − p2))^{x1} ((1 − p2 − p1)/(1 − p2))^{n − x1 − x2}

This is a binomial distribution with parameters n − x2 and p1/(1 − p2).
Note

Using a similar procedure, the marginal probability distribution function of X1 is

P1(x1) = n!/(x1! (n − x1)!) p1^{x1} (1 − p1)^{n − x1}

Since P(x1, x2) ≠ P1(x1) P2(x2), X1 and X2 are not independent.

Example

A bag contains three white, two black and four red marbles. Four marbles are drawn at random

with replacement; calculate the probability that the sample contains just one white marble

given that it contains just one red marble.

Example

Discrete random variables X1 and X2 have joint probability distribution function

P(x1, x2) = λ^{x2} e^{−2λ}/(x1! (x2 − x1)!) for x1 = 0, 1, 2, ..., x2; x2 = 0, 1, 2, ..., and 0 otherwise.

Find the marginal distribution of X1 and X2 and the conditional distribution of X1 given X2

Solution


λx e−2λ
2

P1 ( x1 ) = ∑ x !(x
x2 = x1 − x1 )!
1 2

λ x e −2 λ
1 ∞
λx 2 − x1
=
x1 !
∑ (x
x 2 = x1 − x1 )!
2

λ x e −2 λ
1

= eλ
x1!

- 30 -
λx e −λ
1

= x 1 = 0 ,1 , 2 ....
x1 !

X1 has a poison distribution with parameter λ

x2
λ x e −2 λ
2

P2 ( x2 ) = ∑
x1 = 0 x1!( x2 − x1 )!

λ x e −2 λ
2 x2
x2!
=
x2!

x1 = 0 x1 ! ( x 2 − x1 )!
1 x11 x 2 − x1

But

x2
⎛ x2 ⎞ x1 x2 − x1
∑ ⎜⎜ x ⎟⎟1 1 = (1 + 1) 2 = 2 x2
x

x1 = 0 ⎝ 1 ⎠

Thus

P2 ( x 2 ) =
(2λ )
x 2
e −2 λ
, x 2 = 0,1,2,3....
x2 !

Which is a poison distribution with parameters 2λ .Are X1 and X2 independent?

P( x1 , x2 )
P(x1 x2 ) =
P2 ( x2 )

λ x e −2 λ
2

x 1 ! ( x 2 − x 1 )!
=
(2 λ )x 2 e − 2 λ
x2!

x2
⎛1⎞ x2 !
=⎜ ⎟
⎝2⎠ x1!( x 2 − x1 )!

⎛ x2 ⎞
x2
⎛1⎞
=⎜ ⎟ ⎜⎜ ⎟⎟
⎝2⎠ ⎝ x1 ⎠

This is a binomial with parameters X2 and 1/2

- 31 -
Bivariate Expectations

Let the random variables X, Y have joint probability mass/density function (PMF/PDF) f(x, y) and marginal PMFs/PDFs f(x) and f(y) respectively. Further, let g(x, y) be any function of X and Y. The expected value of g(x, y) is

E[g(x, y)] = ∑_x ∑_y g(x, y) f(x, y) (discrete case)

E[g(x, y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dy dx (continuous case)

CASE 1

g(x, y) = (x − μx)(y − μy), where μx = E[X] and μy = E[Y].

E[g(x, y)] = E[(x − μx)(y − μy)] = ∑_y ∑_x (x − μx)(y − μy) f(x, y) (the covariance of X, Y)

= ∑_y ∑_x (xy − yμx − xμy + μxμy) f(x, y)

= ∑_y ∑_x xy f(x, y) − μx ∑_y ∑_x y f(x, y) − μy ∑_y ∑_x x f(x, y) + μxμy ∑_y ∑_x f(x, y)

= ∑_y ∑_x xy f(x, y) − μx ∑_y y f(y) − μy ∑_x x f(x) + μxμy

= ∑_y ∑_x xy f(x, y) − μxμy − μyμx + μxμy

= ∑_y ∑_x xy f(x, y) − μxμy

= ∑_y ∑_x xy f(x, y) − (∑_x x f(x))(∑_y y f(y))

This is often written as Cov(X, Y) = E(XY) − E(X)E(Y).
In the special case where X = Y and μx = μy, Cov(X, Y) = Var(X) = E(X − μx)².

If X, Y are independent then Cov(X, Y) = 0, but the converse is NOT true: Cov(X, Y) = 0 does not imply that X, Y are independent.

If Var(X) = σx² and Var(Y) = σy², the correlation coefficient between X and Y is denoted by

ρ_{x,y} = Cov(X, Y)/(σx σy)

Expectation and Bivariate Moment Generating Function

Definition

Let g(X1, X2, ..., Xk) be a function of the random variables X1, X2, ..., Xk with probability distribution function P(x1, x2, ..., xk). Then the expected value of g(X1, X2, ..., Xk) is

E[g(X1, ..., Xk)] = ∑_{xk} ... ∑_{x1} g(x1, ..., xk) P(x1, ..., xk)

If X1, X2, ..., Xk are continuous random variables with PDF f(x1, ..., xk), then

E[g(X1, ..., Xk)] = ∫ ... ∫ g(x1, ..., xk) f(x1, ..., xk) dx1 ... dxk

Note

In this unit we deal with k = 2. The case k > 2 will be dealt with in Probability and Statistics IV.

Example

Let X1 and X2 have a joint probability distribution function given by

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.
Find

1. f(x1 | x2)

2. E(X1X2)

3. E(X1)

4. Var(X1)

5. Var(X1X2)

Solution

1. Can be done using the previous method

1 1

2. E ( X 1 , X 2 ) = ∫ ∫ x1 , x2 f ( x1 , x2 )d x1 d x2
0 0

1
1 1
⎡ 3⎤
= ∫ ∫ x1 , x 2 (2 x1 )d x , d x = x 2 x1 d
1

1
∫⎢ 22
⎥ x2
0 0
⎣ 3 ⎦ 0
0

1
⎛2⎞
1

2 ⎡ x2 ⎤
2
= ∫ ⎜ ⎟ x2 d x2 = ⎢
3
0⎝ ⎠ 3 ⎥
⎣ ⎦2 0

1
=
3

1 1

3. E ( X 1 ) = g (x1 , x 2 ) = x1 = ∫ ∫ x1 (2 x1 )d x1 , d x2
0 0

1
1⎡ 2 x13 ⎤ 1
2
= ∫⎢ ⎥d x2 = ∫ 3d x2
0
⎢⎣ 3 ⎥⎦ 0
0

1
2
= ∫
0
3
d x 2 =
2
3
x2 ]
1

- 34 -
2
=
3

( )
1 1

4. E X 1 = ∫∫ x1 f ( x1 , x2 )d x1 , d x2
2 2

0 0

1
1 1 ⎡ 2 x14 ⎤
1
1
⎛1⎞ ⎡ ⎤
1

= ∫ ∫ 2 x1 d x1 , d x2
3
= ∫⎢ ⎥d = ∫ ⎜ ⎟d x2 x2
2 =⎢ ⎥
0⎝ ⎠
x2
0 0
⎣ 4 ⎦
0
0
2
⎣ ⎦0
1
=
2

( )
5. Var ( X 1 ) = E x1 − (E ( x1 ))
2 2

2
1 ⎛2⎞
= −⎜ ⎟ =
1
2 ⎝3⎠ 18

6. Var ( X 1 X 2 ) = ?

Question

The random variables X1 and X2 have the joint probability distribution function given by

f(x1, x2) = 2(1 − x1) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.

Find

1. f(x1 | x2)

2. E ( X 1 X 2 )

3. E ( X 1 )

4. E ( X 2 )

5. Var( X 2 )

Properties of Expected Value of Random Variables

1. Let C be a constant. Then E(C) = C, where g(X1, X2, ..., Xk) = C.

2. Let g ( X 1 , X 2 ) be a function of the random variables X 1 , X 2 and C be a constant. Then

E [Cg ( X 1 , X 2 )] = CE [g ( X 1 , X 2 )]

3. Let X 1 and X 2 be random variables with joint probability distribution function

of f(x1, x2). Let g1(X1, X2), ..., gk(X1, X2) be functions of X1 and X2. Then

E[g1(X1, X2) + g2(X1, X2) + ... + gk(X1, X2)] = E g1(X1, X2) + E g2(X1, X2) + ... + E gk(X1, X2)

Covariance of Two Random Variables

Definition

The covariance of X1 and X2 is defined as the expected value of (X1 − μ1)(X2 − μ2). In notation,

Cov(X1, X2) = E[(X1 − μ1)(X2 − μ2)]

or, equivalently,

Cov(X1, X2) = E(X1X2) − E(X1)E(X2) (show this), where μ1 = E(X1) and μ2 = E(X2).

Note

The larger the absolute value of the covariance of X1 and X2, the greater the linear dependence between X1 and X2. Positive values of the covariance indicate that X1 increases as X2 increases; negative values indicate that X1 decreases as X2 increases. A zero value indicates no linear dependence between X1 and X2.

Unfortunately, it is difficult to use the covariance as a measure of dependence because its value depends upon the scale of measurement, and it is therefore hard to determine at first glance whether a particular covariance is large. This problem can be eliminated by standardizing its value, using the coefficient of linear correlation.

Definition

Let X1 and X2 be two random variables. The correlation coefficient between X1 and X2 is

defined as

ρ = Cov(X1, X2)/√(Var(X1) Var(X2)) = σ_{x1x2}/(σ_{x1} σ_{x2})

where σ_{x1} and σ_{x2} are the standard deviations of X1 and X2, respectively.

Note

1. −1 ≤ ρ ≤ 1

2. When ρ = ±1, all points fall on a straight line.

3. When ρ = 0, the covariance is zero and there is no correlation between the two variables.

4. When ρ > 0, X2 increases as X1 increases.

5. When ρ < 0, X2 decreases as X1 increases.

Example

The joint PDF of X1 and X2 is given by

f(x1, x2) = 3x1 for 0 ≤ x2 ≤ x1 ≤ 1, and 0 elsewhere.

Find

Cov(X1, X2) and the correlation coefficient ρ.

Solution

Cov(X1, X2) = E(X1X2) − E(X1)E(X2)

Now E(X1X2) = ∫_0^1 ∫_0^{x1} x1 x2 (3x1) dx2 dx1

= ∫_0^1 3x1² [x2²/2]_0^{x1} dx1

= (3/2) ∫_0^1 x1⁴ dx1 = (3/2)[x1⁵/5]_0^1 = 3/10 (check)

E(X1) = 3/4, E(X2) = 3/8

Then Cov(X1, X2) = 3/10 − (3/4)(3/8) = 3/160 ≈ 0.02 (check)

Theorem

Let Y1, Y2, ..., Yn and X1, X2, ..., Xm be random variables with E(Yi) = μi and E(Xj) = εj. Define

U1 = ∑_{i=1}^{n} ai Yi and U2 = ∑_{j=1}^{m} bj Xj

for constants a1, ..., an, b1, ..., bm. Then the following hold:

a) E(U1) = ∑_{i=1}^{n} ai μi

b) Var(U1) = ∑_{i=1}^{n} ai² Var(Yi) + 2 ∑∑_{i<j} ai aj Cov(Yi, Yj), where the double sum is over all pairs (i, j) with i < j

c) Cov(U1, U2) = ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj Cov(Yi, Xj)
i j =1

Proof

a) Follows from Probability and Statistics II:

E(U1) = E(∑_{i=1}^{n} ai Yi) = ∑_i ai E(Yi) = ∑_i ai μi

b) The variance is defined as

Var(U1) = E(U1²) − (E(U1))²

= E(∑_{i=1}^{n} ai Yi − ∑_{i=1}^{n} ai μi)²

= E(∑_{i=1}^{n} ai (Yi − μi))²

= E[∑_{i=1}^{n} ai² (Yi − μi)² + ∑∑_{i≠j} ai aj (Yi − μi)(Yj − μj)]

= ∑_{i=1}^{n} ai² E(Yi − μi)² + ∑∑_{i≠j} ai aj E[(Yi − μi)(Yj − μj)]

By the definitions of variance and covariance, we have

Var(U1) = ∑_i ai² Var(Yi) + ∑∑_{i≠j} ai aj Cov(Yi, Yj)

Note that Cov(Yi, Yj) = Cov(Yj, Yi); hence we can write

Var(U1) = ∑_i ai² Var(Yi) + 2 ∑∑_{i<j} ai aj Cov(Yi, Yj)

c) Using similar steps as in (b), we have

Cov(U1, U2) = E[(U1 − E(U1))(U2 − E(U2))]

= E[(∑_i ai Yi − ∑_i ai μi)(∑_{j=1}^{m} bj Xj − ∑_{j=1}^{m} bj εj)]

= E[(∑_i ai (Yi − μi))(∑_j bj (Xj − εj))]

= E[∑_{i=1}^{n} ∑_{j=1}^{m} ai bj (Yi − μi)(Xj − εj)]

= ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj E[(Yi − μi)(Xj − εj)]

= ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj Cov(Yi, Xj)

Note:

Cov(Yi, Yi) = Var(Yi)
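The variance formula in (b) is easy to verify by simulation. The sketch below (an addition, not from the notes, with illustrative parameters) compares a Monte Carlo estimate of Var(U1) with the matrix form a′Σa of the theorem:

```python
# Sketch: Monte Carlo check of Var(U1) = sum a_i^2 Var(Y_i) + 2 sum_{i<j} a_i a_j Cov(Y_i, Y_j),
# written compactly as a' @ cov @ a, for correlated normal Y's.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.0, 0.4],
                [0.1, 0.4, 1.5]])
Y = rng.multivariate_normal(np.zeros(3), cov, size=200_000)
U1 = Y @ a
print(U1.var(), a @ cov @ a)  # the two values agree closely
```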

Bivariate Normal Distribution

In general, the multivariate normal density function is defined for k continuous random variables X1, X2, ..., Xk. For this unit we require k = 2 (the bivariate case), which is defined as

f(x1, x2) = [1/(2π σ1 σ2 √(1 − ρ²))] exp{−Q/2} for −∞ < x1 < ∞, −∞ < x2 < ∞, and 0 elsewhere,

i.e., written out in full,

f(x1, x2) = [1/(2π σ1 σ2 √(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [(x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²]}

for −∞ < x1 < ∞, −∞ < x2 < ∞, −∞ < μ1 < ∞, −∞ < μ2 < ∞, σ1, σ2 > 0 ……(*)
In matrix form, it can be written as

f(x) = [1/(2π |Σ|^{1/2})] exp{−(1/2)(x − μ)′ Σ^{−1} (x − μ)} ……(**)

where x = (x1, x2)′, −∞ ≤ xi ≤ ∞, i = 1, 2,

Σ is the 2×2 variance–covariance matrix of X, i.e. Σ = [[σ11, σ12], [σ12, σ22]], and μ = (μ1, μ2)′.

Question

Show that (**) is the same as (*)

Solution

Σ = [[σ11, σ12], [σ12, σ22]]

Σ^{−1} = 1/(σ11σ22 − σ12²) [[σ22, −σ12], [−σ12, σ11]]

But

ρ12 = ρ = σ12/√(σ11σ22) ⇒ σ12 = ρ12 √(σ11σ22)

Then

Σ^{−1} = 1/(σ11σ22(1 − ρ12²)) [[σ22, −σ12], [−σ12, σ11]]

The standardized squared distance becomes

(x − μ)′ Σ^{−1} (x − μ) = (x1 − μ1, x2 − μ2) · [1/(σ11σ22 − σ12²)] [[σ22, −ρ12√(σ11σ22)], [−ρ12√(σ11σ22), σ11]] · (x1 − μ1, x2 − μ2)′

= [1/(1 − ρ12²)] [(x1 − μ1)²/σ11 − 2ρ12 (x1 − μ1)(x2 − μ2)/√(σ11σ22) + (x2 − μ2)²/σ22] ……(***)

|Σ| = σ11σ22 − σ12² = σ11σ22(1 − ρ12²) ……(****)

Putting (***) and (****) in (**), we get

f(x1, x2) = [1/(2π √(σ11σ22(1 − ρ12²)))] exp{−[1/(2(1 − ρ12²))] [(x1 − μ1)²/σ11 − 2ρ12 (x1 − μ1)(x2 − μ2)/√(σ11σ22) + (x2 − μ2)²/σ22]}
Note:

If X1 and X2 are uncorrelated (ρ = 0), then the joint PDF of X1 and X2 can be written as the product of univariate normal densities: f(x1, x2) = f(x1) f(x2).

Here

Q = [1/(1 − ρ²)] [(x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²]

The bivariate normal density is a function of five parameters: μ1, μ2, σ1², σ2² and ρ. This is usually denoted as (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ).
Assignment:

If (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ),

show that X1 ≈ N(μ1, σ1²), X2 ≈ N(μ2, σ2²), and that ρ is the correlation coefficient of X1 and X2.

Question

(a) Derive the marginal densities of X1and X2

(b) Find the conditional density function of X1 given X2 = x2

Joint Moment Generating Function

The moment generating function, discussed in probability and statistics II can be generalized to

k- dimensional random variables.

Definition

Let X = (X1, ..., Xk) be a vector of k random variables. The MGF of X, if it exists, is defined as

M_X(t) = E[exp(∑_{i=1}^{k} ti Xi)]

where t = (t1, ..., tk).

NOTE:

- The bivariate MGF has properties analogous to those of the univariate MGF.

- Mixed moments such as E(Xi^r Xj^s) are obtained by differentiating the joint MGF r times with respect to ti and s times with respect to tj and then setting all ti = tj = 0.

- The joint MGF also uniquely determines the joint distribution of the variables X1, ..., Xk.

- The MGFs of the marginal distributions can be obtained from the joint MGF, e.g.

M_x(t1) = M_{x,y}(t1, 0) and M_y(t2) = M_{x,y}(0, t2)

If M_{x,y}(t1, t2) exists, then the random variables X and Y are independent iff

M_{x,y}(t1, t2) = M_x(t1) M_y(t2)


Example

Suppose X and Y have joint density function f(x, y) = λ²e^{−λy} for 0 < x < y < ∞. Find the joint MGF.

Solution

M_{x,y}(t1, t2) = E(e^{t1X + t2Y}) = ∫_0^∞ ∫_0^y e^{t1x + t2y} λ² e^{−λy} dx dy

= λ² ∫_0^∞ e^{t2y} e^{−λy} [e^{t1x}/t1]_0^y dy

= (λ²/t1) ∫_0^∞ (e^{t1y} − 1) e^{−y(λ − t2)} dy

= (λ²/t1) ∫_0^∞ [e^{−y(λ − t1 − t2)} − e^{−y(λ − t2)}] dy

= (λ²/t1) [e^{−y(λ − t1 − t2)}/(−(λ − t1 − t2)) − e^{−y(λ − t2)}/(−(λ − t2))]_0^∞

= (λ²/t1) [(0 − 0) − (1/(−(λ − t1 − t2)) − 1/(−(λ − t2)))]

= (λ²/t1) [1/(λ − t1 − t2) − 1/(λ − t2)]

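The closed form can be spot-checked against direct numerical integration; a sketch (an addition, not from the notes, at illustrative parameter values):

```python
# Sketch: numeric spot-check of the boxed MGF at lam = 2, t1 = 0.3, t2 = 0.4.
import numpy as np
from scipy.integrate import dblquad

lam, t1, t2 = 2.0, 0.3, 0.4
num, _ = dblquad(lambda x, y: np.exp(t1*x + t2*y) * lam**2 * np.exp(-lam*y),
                 0, np.inf, 0, lambda y: y)  # outer y in (0, inf), inner x in (0, y)
closed = lam**2 / t1 * (1/(lam - t1 - t2) - 1/(lam - t2))
print(num, closed)  # both ~1.92
```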
Question

Let f(x, y) = k e^{−2x−5y} for x > 0, y > 0.

Find

1. k

2. Find the marginal densities of x and y

3. Find the conditional density of X given Y = 6.

4. Find E(X | Y = 1) and E(Y² | X = 2).

Question

Let f(x, y) = k(2x + y) for 0 < x < 1, 0 < y < 2.

Find

1. k

2. E ( x )

3. E ( y )

4. Var ( x )

5. Var ( y )

6. E(XY), and hence Cov(X, Y)

7. Correlation coefficient.

Question

A continuous random variable X has the probability density function:

f(x) = λx e^{−λx} for x > 0, and 0 elsewhere.

Find the moment generating function of X. (7 marks)

Solution


( )
M x (t ) = E e tx = ∫ e tx λxe −λx dx
0



− x ( λ −t )
⎡ xe − x (λ −t ) ∞ e − x (λ −t ) ⎤
= λ ∫ xe dx = λ ⎢ −∫ dx ⎥
0 ⎣ − (λ − t ) 0 − (λ − t ) ⎦ 0


⎡ xe − x (λ −t ) e − x (λ −t ) ⎤ ⎡ ⎛ 1 ⎞⎤
= λ⎢ − 2⎥
= λ ⎢ (0 − 0 ) − ⎜ 0 − ⎟⎥
⎣ − (λ − t ) (λ − t ) ⎦ 0 ⎢⎣

⎝ (λ − t )2 ⎟⎠⎦⎥

- 45 -
⎡ ⎛ 1 ⎞⎤ λ
= λ ⎢− ⎜⎜ 0 − ⎟ =
2 ⎟⎥
⎣⎢ ⎝ (λ − t ) ⎠⎦⎥ (λ − t )2

Conditional Expectations

Conditional Mean and Variance

Let X, Y be random variables with joint PMF or PDF f(x, y).

E[X | Y] = ∑_x x f(x | y) for X, Y discrete

E[X | Y] = ∫_{−∞}^{∞} x f(x | y) dx for X, Y continuous

E[X | Y] is called the conditional mean of X given Y. Also,

Var[X | Y] = E[X² | Y] − (E[X | Y])²

= ∑_x x² f(x | y) − (∑_x x f(x | y))² for X, Y discrete

= ∫_{−∞}^{∞} x² f(x | y) dx − (∫_{−∞}^{∞} x f(x | y) dx)² for X, Y continuous

This expression is called the conditional variance of X given Y.

Theorem 1

E [E ( X Y )] = E [X ]

Proof (for discrete case)

E[E(X | Y)] = ∑_y [∑_x x f(x | y)] f(y)

= ∑_y ∑_x x f(x | y) f(y)

= ∑_y ∑_x x [f(x, y)/f(y)] f(y) = ∑_y ∑_x x f(x, y)

= ∑_x x ∑_y f(x, y) = ∑_x x f(x) = E[X]

Theorem 2

E [X E (Y X )] = E [ XY ]

Proof (for discrete case)

Definition

If X1 and X2 are any two random variables, the conditional expectation of X1 given X2 = x2 is defined as

E[X1 | X2 = x2] = ∑_{x1} x1 P(x1 | x2) if X1 and X2 are discrete,

and

E[X1 | X2 = x2] = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1 if X1 and X2 are continuous.

Example:

Suppose X1and X2 are random variables with joint PDF given by

f(x1, x2) = 1/2 for 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2, and 0 elsewhere.

Find

1. The conditional expectation of X1 given that X2=1.

2. The MGF of X1 and X2

3. The MGF of X1 given X2=1

Solution

First find f(x2):

f(x2) = ∫_0^{x2} (1/2) dx1 = [x1/2]_0^{x2} = x2/2

f(x1 | x2) = 1/x2 for 0 ≤ x1 ≤ x2 ≤ 2, and 0 elsewhere.

Then

E(X1 | X2 = 1) = ∫_{−∞}^{∞} x1 f(x1 | x2 = 1) dx1 = ∫_0^1 x1 (1) dx1

= [x1²/2]_0^1 = 1/2, because x2 = 1 < 2.

Theorem

If X and Y are jointly distributed random variables and h(x, y) is a function, then

E[h(X, Y)] = E_X(E[h(X, Y) | X])

The theorem says that a joint expectation, such as the one on the left side of the equation, can be found by first finding the conditional expectation E[h(X, Y) | X] and then taking its expectation relative to the marginal distribution of X.

Theorem

If X and Y are jointly distributed random variables and g(x) is a function, then

E[g(X) Y | X] = g(X) E(Y | X)

Example

If (X, Y) ≈ MULT(n, p1, p2), find Cov(X, Y).

Solution

By straightforward derivation we have (show this):

X ≈ BIN(n, p1), Y ≈ BIN(n, p2), and conditional on X = x,

Y | x ≈ BIN(n − x, p), where p = p2/(1 − p1)

Note

E(Y | X) = (n − X) p2/(1 − p1)

Using the latter theorem,

E(XY) = E(E(XY | X)) = E[X E(Y | X)]

= E[X(n − X) p2/(1 − p1)]

= [p2/(1 − p1)] [n E(X) − E(X²)] ……(*)

Now E(X) = np1 and E(X²) = Var(X) + (np1)² = np1(1 + (n − 1)p1)

Therefore (*) becomes

E(XY) = n(n − 1) p1 p2

Thus Cov(X, Y) = E(XY) − E(X)E(Y)

= n(n − 1) p1 p2 − (np1)(np2)

= −n p1 p2

Example

If μ1 = E(X), μ2 = E(Y) and E(Y | x) is a linear function of x, show that

E(Y | x) = μ2 + ρ(σ2/σ1)(x − μ1) and E_X(Var(Y | x)) = σ2²(1 − ρ²)
δ1

Solution

Suppose E(Y | x) = ax + b. Then

μ2 = E(Y) = E_X(E(Y | x)) = E_X(aX + b) = aμ1 + b

Now

a = Cov(X, Y)/Var(X) = σ_{XY}/σ1² = ρσ2/σ1, where σ_{XY} = ρσ1σ2, and

b = E(Y) − aE(X) = μ2 − ρ(σ2/σ1)μ1

Then

E(Y | x) = ax + b = ρ(σ2/σ1)x + μ2 − ρ(σ2/σ1)μ1 = μ2 + ρ(σ2/σ1)(x − μ1)

(To see that a = σ_{XY}/σ1², note that

σ_{XY} = E[(X − μ1)(Y − μ2)] = E[(X − μ1)Y] − 0

= E_X{E[(X − μ1)Y | X]} = E_X[(X − μ1)E(Y | X)] = E_X[(X − μ1)(aX + b)] = aσ1².)

For the conditional variance,

E_X[Var(Y | X)] = E_X{E(Y² | X) − (E(Y | X))²}

= E(Y²) − E_X{(E(Y | X))²}

= E(Y²) − (E(Y))² − {E_X[(E(Y | X))²] − (E(Y))²}

= Var(Y) − Var_X(E(Y | X))

= Var(Y) − Var_X[μ2 + ρ(σ2/σ1)(X − μ1)]

= Var(Y) − ρ²(σ2²/σ1²)σ1²

= σ2²(1 − ρ²)
Theorem

Let X, Y be random variables with E(X) = μ1 and E(Y) = μ2. If E(Y | x) is a linear function of x, show that

E(Y | x) = μ2 + ρ(σ2/σ1)(x − μ1) and E_X(Var(Y | x)) = σ2²(1 − ρ²)

Proof

If E(Y | x) = ax + b, then

μ2 = E(Y) = E_X(E(Y | x)) = E_X(aX + b) = aμ1 + b, and

σ_{XY} = E[(X − μ1)(Y − μ2)] = E[(X − μ1)Y] − 0

= E_X{E[(X − μ1)Y | X]} = E_X[(X − μ1)E(Y | X)] = E_X[(X − μ1)(aX + b)] = aσ1²

Thus

a = Cov(X, Y)/Var(X) = σ_{XY}/σ1² = ρσ2/σ1 and b = E(Y) − aE(X) = μ2 − ρ(σ2/σ1)μ1

i.e. E(Y | x) = ax + b = ρ(σ2/σ1)x + μ2 − ρ(σ2/σ1)μ1 = μ2 + ρ(σ2/σ1)(x − μ1)

NOTE:

E(Y | X) is sometimes referred to as the regression function, i.e. E(Y | x) = ax + b; multiple regression has an analogous expression.

Conditional Distribution for Bivariate Normal Random Variables

Theorem:

If (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ), then

i) conditional on X1 = x1,

X2 | X1 = x1 ≈ N(μ2 + ρ(σ2/σ1)(x1 − μ1), σ2²(1 − ρ²))

ii) conditional on X2 = x2,

X1 | X2 = x2 ≈ N(μ1 + ρ(σ1/σ2)(x2 − μ2), σ1²(1 − ρ²))

Show this!

Note

E(X1 | X2) = μ1 + ρ(σ1/σ2)(X2 − μ2) is sometimes referred to as the regression function of X1 on X2.

Example

Suppose X1, X2 have the bivariate normal distribution with parameters

μ1 = μ2 = 2, σ1 = σ2 = 2 and ρ = 3/5.

Calculate

i. P(X1 > 4)

ii. P(X1 > 4 | X2 = 3)
Solution

i. X1 is distributed as N(2, 2²), so

P(X1 > 4) = P((X1 − 2)/2 > (4 − 2)/2) = P(Z > 1) = ?

ii. The conditional mean is

μ = μ1 + ρ(σ1/σ2)(x2 − μ2) = 2 + (3/5)(2/2)(3 − 2) = 2.6

and the conditional standard deviation is

σ = σ1 √(1 − ρ²) = 2 √(1 − 9/25) = 1.6

Hence the distribution of X1 given X2 = 3 is N(2.6, 1.6²), and

P(X1 > 4 | X2 = 3) = P((X1 − 2.6)/1.6 > (4 − 2.6)/1.6 | X2 = 3) = P(Z > 0.875) = ?

Change of Variable Technique (For Continuous Case)

Suppose we are given the joint PDF of X1, X2, ..., Xp and we wish to determine the joint distribution of Y1 = g1(X1, ..., Xp), ..., Yr = gr(X1, ..., Xp), where r is some integer with 1 ≤ r ≤ p. If r < p, we introduce additional new random variables

Y_{r+1} = g_{r+1}(X1, ..., Xp), ..., Yp = gp(X1, ..., Xp)

Then we find the joint distribution of Y1, ..., Yp, and finally find the marginal distribution of Y1, ..., Yr. The transformation

Y1 = g1(X1, ..., Xp), Y2 = g2(X1, ..., Xp), ..., Yp = gp(X1, ..., Xp)

has a solution which can be written as

X1 = w1(Y1, ..., Yp), ..., Xp = wp(Y1, ..., Yp)
The Jacobian of the transformation is the determinant

J = | ∂X1/∂Y1  ∂X1/∂Y2  ...  ∂X1/∂Yp |
    |   ...      ...    ...    ...   |
    | ∂Xp/∂Y1  ∂Xp/∂Y2  ...  ∂Xp/∂Yp |

The joint PDF of Y1, ..., Yp is given by

g(y1, ..., yp) = f(w1(y), w2(y), ..., wp(y)) |J| for y ∈ R^p, and 0 elsewhere,

where y = (y1, y2, ..., yp)′.

Note

In this unit, p = 2 (bivariate), and therefore we have

g(y1, y2) = f(w1(y1, y2), w2(y1, y2)) |J| for (y1, y2) ∈ R², and 0 elsewhere.

Example

Suppose the joint distribution of X1 and X2 is given by

f(x1, x2) = 2(1 − x1) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

i) Find the density function of the variable U = X1X2.

ii) Hence or otherwise find E(U) and Var(U).

Solution

U = X1X2. Let V = X1; then X1 = V and X2 = U/V.

The Jacobian of the transformation is

J = | ∂X1/∂V  ∂X1/∂U |  =  |    1     0   |  =  1/V
    | ∂X2/∂V  ∂X2/∂U |     | −U/V²   1/V  |

The joint PDF of V and U is

f(v, u) = f(w1(v, u), w2(v, u)) |J| = 2(1 − v)(1/v) for 0 ≤ u ≤ v ≤ 1, and 0 elsewhere.

The PDF of U is

f_U(u) = ∫_{−∞}^{∞} f(v, u) dv = ∫_u^1 2(1 − v)(1/v) dv = 2 ∫_u^1 (1/v − 1) dv

= 2[ln v − v]_u^1 = 2{(ln 1 − 1) − (ln u − u)}

= 2(u − ln u − 1) for 0 ≤ u ≤ 1, and 0 elsewhere.

E(U) = ∫_{−∞}^{∞} u f_U(u) du = ∫_0^1 2u(u − ln u − 1) du

= 2{∫_0^1 u² du − ∫_0^1 u ln u du − ∫_0^1 u du}

= 2{[u³/3]_0^1 − ∫_0^1 u ln u du − [u²/2]_0^1}

By integration by parts, the middle integral becomes

∫_0^1 u ln u du = [u²(ln u)/2]_0^1 − ∫_0^1 (u²/2)(1/u) du = [0 − u²/4]_0^1 = −1/4

Thus

E(U) = 2[1/3 − (−1/4) − 1/2] = 2(1/12) = 1/6
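A quick Monte Carlo sanity check of E(U) (an addition, not from the notes): because f(x1, x2) = 2(1 − x1) · 1 factorises, X1 ~ Beta(1, 2) and X2 ~ Uniform(0, 1) are independent here, so U can be simulated directly:

```python
# Sketch: Monte Carlo check of E(U) = 1/6 for U = X1*X2.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.beta(1, 2, size=1_000_000)   # density 2(1 - x) on [0, 1]
x2 = rng.uniform(size=1_000_000)
print((x1 * x2).mean())  # ~0.1667
```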

Question

Let X1and X2 have a joint PDF given by

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

Find the PDF of Y1 = X 1 X 2

Hence or otherwise, find E (Y1 )

Question

Suppose X1and X2 have a joint PDF given by

f(x1, x2) = e^{−(x1 + x2)} for 0 ≤ x1, 0 ≤ x2, and 0 elsewhere.

Find the PDF of Y1 = X 1 + X 2

T –Distribution

Recall from SMA 2230:

If X ≈ N(μ, σ²), then Z = (X − μ)/σ.

If X̄ is the mean of a sample X1, X2, ..., Xn drawn randomly from a normal population with mean μ and variance σ², then

Z = (X̄ − μ)/(σ/√n)

If the variance of the population is unknown and n is large, then we replace σ by S, where

S² = ∑(Xi − X̄)²/(n − 1), i.e. Z ≈ (X̄ − μ)/(S/√n)

If n is small and σ is unknown, then we have a t-distributed random variable with n − 1 degrees of freedom, i.e.

t = (X̄ − μ)/(S/√n)

Note

If X1, X2, ..., Xn is a random sample from a normal population with E(Xi) = μ, then to test the hypothesis

1. H0: μ = μ0 (specific value) against H1: μ > μ0 (upper tail), or

2. H0: μ = μ0 against H1: μ < μ0 (lower tail), or

3. H0: μ = μ0 against H1: μ ≠ μ0 (two-tailed),

we calculate

t = (X̄ − μ0)/(S/√n) and

1. reject H0 in (1) if t > t_{α, n−1};

2. reject H0 in (2) if t < −t_{α, n−1}; and

3. reject H0 in (3) if |t| > t_{α/2, n−1} (two-tailed rejection region).

Comparing Means of Two Normal Population

Suppose that independent random samples are selected from each of two normal populations: X11, X12, ..., X1n1 from the first and X21, X22, ..., X2n2 from the second, where the mean and variance of the i-th population are μi and σi², i = 1, 2. Further assume that X̄i and Si², i = 1, 2, are the corresponding sample means and variances.

X̄1 = (1/n1) ∑_{i=1}^{n1} X_{1i} and X̄2 = (1/n2) ∑_{i=1}^{n2} X_{2i}

The unbiased estimate of the (common) variance is obtained by pooling the sample data:

S² = [∑_{i=1}^{n1} (X_{1i} − X̄1)² + ∑_{i=1}^{n2} (X_{2i} − X̄2)²]/(n1 + n2 − 2)

= [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2)

The test statistic in this case is given by

t = [(X̄1 − X̄2) − (μ1 − μ2)]/(S √(1/n1 + 1/n2))

This has a Student's t distribution with n1 + n2 − 2 d.f.

To test the null hypothesis H0: μ1 − μ2 = D0 for some fixed value D0, it follows that if H0 is true, then the test statistic

t = [(X̄1 − X̄2) − D0]/(S √(1/n1 + 1/n2))

has a t distribution with n1 + n2 − 2 degrees of freedom.

Example

In an experiment to test two procedures, the following information was obtained

                        Standard procedure    New procedure
n                           n1 = 9               n2 = 9
Mean (seconds)           X̄1 = 35.22           X̄2 = 31.56
∑(Xi − X̄)²                 195.56               160.22
Test the hypothesis that the two populations have the same mean.

Take α = 0.05 Level of significance.

Solution

H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0.

The test statistic is

t = [(X̄1 − X̄2) − D0]/(S √(1/n1 + 1/n2)), with D0 = 0.

Now

S² = [∑(X_{1i} − X̄1)² + ∑(X_{2i} − X̄2)²]/(n1 + n2 − 2)

= (195.56 + 160.22)/(9 + 9 − 2) = 22.24

⇒ t = (35.22 − 31.56)/(4.71 √(1/9 + 1/9)) = 1.65

The tabulated t value is t_{0.025, 16} = 2.120. Since t calculated = 1.65 < 2.120, we do not reject H0; there is insufficient evidence to indicate a difference between the two procedures.
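The same test can be run from the summary statistics with scipy; a sketch (an addition, not from the notes):

```python
# Sketch: pooled two-sample t-test from the summary statistics above.
from scipy.stats import ttest_ind_from_stats

s1 = (195.56 / 8) ** 0.5   # sample standard deviations from the sums of squares
s2 = (160.22 / 8) ** 0.5
t, p = ttest_ind_from_stats(mean1=35.22, std1=s1, nobs1=9,
                            mean2=31.56, std2=s2, nobs2=9)
print(t, p)  # t ~ 1.65, p > 0.05: do not reject H0
```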

Question

The strength of concrete depends, to some extent, on the method used for drying. Two different

drying methods showed the following results for independently tested specimens

              Method 1     Method 2
n              n1 = 7      n2 = 10
Mean         X̄1 = 3250    X̄2 = 3240
Std. dev.     S1 = 210     S2 = 190

1. Do the methods appear to produce concrete with different mean strengths? Use α = 0.05.

2. Does method 1 produce stronger concrete than method 2?

Paired T-Test

The paired t-test is based on the differences; the procedure is the same as for the univariate t-test.

Example

An industry, in deciding whether to purchase a machine of design A or B, checks the time for completing a certain task on each machine. Nine technicians were used in the experiment, with each technician using both machine A and machine B in a randomized order. The times (in seconds) to completion of the task are given in the table below.

Technicians 1 2 3 4 5 6 7 8 9

A 327.6 327.7 327.7 327.9 327.4 327.7 327.8 327.8 327.4

B 327.6 327.7 327.6 327.8 327.4 327.6 327.8 327.7 327.3

Test if there is a significant difference between the completion times at the 5% significance

level.

Solution

In a paired t-test we use the differences:

Sample        1    2    3    4    5    6    7    8    9
di = xA − xB  0    0   0.1  0.1   0   0.1   0   0.1  0.1

Now the hypothesis is

H0: μd = 0 versus H1: μd ≠ 0.

If the differences are normally distributed, the test statistic is

t_d = (d̄ − μd)/(s_d/√n)

where d̄ = ∑di/n = 0.5/9 = 0.056

s_d² = ∑_{i=1}^{n} (di − d̄)²/(n − 1) = 0.002778, so s_d = 0.053.

Then

t_d = 0.056/(0.053/√9) = 3.17

The tabulated value (two-tailed) is t_{0.025, 8} = 2.306.

Because t calculated = 3.17 > 2.306, we reject the null hypothesis that μA − μB = 0.

Conclusion:

The two machines have different mean responses.
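The paired test can be reproduced directly from the raw data; a sketch (an addition, not from the notes):

```python
# Sketch: the paired t-test computed directly from the two samples.
from scipy.stats import ttest_rel

a = [327.6, 327.7, 327.7, 327.9, 327.4, 327.7, 327.8, 327.8, 327.4]
b = [327.6, 327.7, 327.6, 327.8, 327.4, 327.6, 327.8, 327.7, 327.3]
t, p = ttest_rel(a, b)
print(t, p)  # t ~ 3.16, p < 0.05: reject H0
```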

Question:

Test if machine A takes longer than machine B

Question

Consider an experiment to test the effects of a particular drug on the human pulse rate. Six subjects are chosen and their pulse rates measured both before and after the treatment, with the following results.

Subjects 1 2 3 4 5 6

Before 73 69 70 64 69 66

After 78 73 70 69 68 72

Do the pulse rates taken after the stimulus differ significantly from those taken before it?

Take α = 0.05 .

Solution

Hypothesis to be tasted is H 0 : μ d = 0 vs. H1 : μ d ≠ 0 , where μ d is the difference between the

two pulses.

xd 5 4 0 5 -1 6

1 6 (5 + 4 + ... + 6) = 3.17
xd = ∑
6 1
xi =
6

2
sd =
1 6

5 1 5
[ ]
(xi − x )2 = 1 (5 − 3.17 )2 + ⋅ ⋅ ⋅ + (6 − 3.17 )2 = 2.932

s d = 2.93

The test statistic is t =


(x d − 0) 3.17
= = 2.64
sd / n 2.93 / 6

The tabulated value, t5 ,0.025 = 2.571 since t=2.64> t5 ,0.025 = 2.571 reject H 0 .

Conclusion:

The effect of the treatment on pulse rates is significant. It is reasonable to conclude that there

has been an increase in pulse rate after taking the drug.

Chi- Square Distribution

Assume that we have a random sample X1, X2, ..., Xn from a normal distribution with unknown mean μ and unknown variance σ². Suppose we wish to test

H0: σ² = σ0² for some fixed value σ0², versus H1: σ² ≠ σ0².

Then, under H0,

X² = (n − 1)S²/σ0²

has a χ² distribution with n − 1 degrees of freedom.

Suppose X1, X2, ..., Xn is a random sample from a normal distribution with E(Xi) = μ and Var(Xi) = σ². To test the hypothesis

i. H0: σ² = σ0² against H1: σ² > σ0² (upper tail), or

ii. H0: σ² = σ0² against H1: σ² < σ0² (lower tail), or

iii. H0: σ² = σ0² against H1: σ² ≠ σ0²,

calculate the test statistic

X² = (n − 1)S²/σ0²

For (i), reject H0 if X² calculated > χ²_{α, n−1}.

For (ii), reject H0 if X² calculated < χ²_{1−α, n−1} (lower tail).

For (iii), reject H0 if X² calculated > χ²_{α/2, n−1} or X² < χ²_{1−α/2, n−1}.

Example

A machine engine part produced by a company is claimed to have diameter variance no larger

than 0.0002(diameter measured in inches). A random sample of 10 parts gave a sample

variance of 0.0003. Test, at the 5% level, H 0 : σ 2 = 0.0002 against H1 : σ 2 > 0.0002 .

Solution

Assume the measured diameters are normally distributed. The test statistic is

X² = (n − 1)s²/σ0² = 9(0.0003)/0.0002 = 13.5

The tabulated value is χ²_{0.05, 9} = 16.919. Since X² calculated = 13.5 < 16.919, we do not reject H0.

Conclusion:

There is not enough evidence to indicate that σ² exceeds 0.0002.
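The statistic and critical value can be computed directly; a sketch (an addition, not from the notes):

```python
# Sketch: the variance test statistic and its 5% critical value via scipy.
from scipy.stats import chi2

n, s2, sigma0_sq = 10, 0.0003, 0.0002
X2 = (n - 1) * s2 / sigma0_sq
print(X2, chi2.ppf(0.95, df=n - 1))  # 13.5 < 16.919: do not reject H0
```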

Question

An experimenter was convinced that his measuring equipment possessed variability which resulted in a standard deviation of 2. Sixteen measurements resulted in a value of S² = 6.1. Do the data disagree with his claim? Take α = 0.01. What would you conclude if you chose α = 0.05?

F-Test

We may be interested in comparing the variances of two normal distributions, often to determine whether or not they are equal. This problem is encountered when comparing the precision of two measuring instruments, the variation in quality characteristics of a manufactured product, or the variation in scores for two testing procedures. For example, suppose x11, x12, ..., x1n1 and x21, x22, ..., x2n2 are independent random samples from normal distributions with unknown means and Var(X1i) = σ1², Var(X2i) = σ2², where σ1² and σ2² are unknown. Suppose we are interested in testing the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² > σ2². Now, σ1² and σ2² can be estimated by S1² and S2² respectively.

We would reject H0 in favour of H1 if S1² is much larger than S2², i.e. reject H0 if

F = [(n1 − 1)s1²/σ1²]/(n1 − 1) ÷ [(n2 − 1)s2²/σ2²]/(n2 − 1) ……(***)

= (s1²/s2²)(σ2²/σ1²) > k

where k depends upon the probability distribution of the statistic s1²/s2².

Note that (n1 − 1)s1²/σ1² and (n2 − 1)s2²/σ2² are independent chi-square random variables. Therefore *** has an F-distribution with n1 − 1 numerator degrees of freedom and n2 − 1 denominator degrees of freedom. Under the null hypothesis σ1² = σ2², F = s1²/s2² has an F-distribution with n1 − 1 numerator d.f. and n2 − 1 denominator d.f.

The rejection region becomes F > F_{α, n1−1, n2−1}, i.e. k = F_{n1−1, n2−1, α}.

Example

Consider two random samples, X1 and X2 of sizes 10 and 20 with sample variances given as

0.0003 and 0.0001 respectively. Assuming that the populations, from which the samples have

been drawn, are normal, determine whether the variance of the first population is significantly

greater than the second one. Take a = 0.05 .

Solution

Let σ1² and σ2² denote the variances of the first and second populations from which the samples were taken. Then the hypothesis to be tested is H0: σ1² = σ2² against H1: σ1² > σ2².

The test statistic is

F = s1²/s2², based on v1 = 9 and v2 = 19 d.f.

Now

F = s1²/s2² = 0.0003/0.0001 = 3

The tabulated value is F_{9, 19, 0.05} = 2.42.

Since F calculated = 3 > 2.42, we reject the null hypothesis.

Conclusion

The variation of the first population is greater than the second one.

Note:

1. If X₁, X₂, X₃, ..., Xₙ is a random sample of size n from a normal distribution with mean μ and variance σ², then

X̄ = (1/n) Σᵢ Xᵢ is normally distributed with mean μ and variance σ²/n.

(Show that.)

2. Suppose X₁, X₂, X₃, ..., Xₙ is as defined in 1. Then Zᵢ = (Xᵢ − μ)/σ, i = 1, 2, ..., n, are independent standard normal random variables, and Σᵢ Zᵢ² = Σᵢ ((Xᵢ − μ)/σ)² has a chi-square distribution with n d.f.

3. Let Z be a standard normal random variable and let χ²ᵥ be a chi-square random variable with v d.f. Then, if Z and χ²ᵥ are independent,

T = Z / √(χ²ᵥ / v) has a t-distribution with v d.f.

4. Let χ₁² and χ₂² be chi-square random variables with v₁ and v₂ d.f respectively. Then, if χ₁² and χ₂² are independent,

F = (χ₁² / v₁) / (χ₂² / v₂) is said to have an F-distribution with v₁ numerator d.f and v₂ denominator d.f.
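Notes 3 and 4 can be checked by simulation: building T and F from their defining ratios should reproduce the tabulated quantiles. A minimal sketch (an addition to the notes, assuming NumPy; the degrees of freedom and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim, v, v1, v2 = 200_000, 7, 9, 19

# Note 3: T = Z / sqrt(chi^2_v / v) should follow a t-distribution with v d.f.
z = rng.standard_normal(n_sim)
t_sample = z / np.sqrt(rng.chisquare(v, n_sim) / v)

# Note 4: F = (chi^2_{v1}/v1) / (chi^2_{v2}/v2) should follow F(v1, v2)
f_sample = (rng.chisquare(v1, n_sim) / v1) / (rng.chisquare(v2, n_sim) / v2)

print(np.quantile(t_sample, 0.95))   # close to t_{0.05, 7} = 1.895
print(np.quantile(f_sample, 0.95))   # close to F_{0.05; 9, 19} = 2.42
```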

Question

Consider two random samples X1 and X2 of sizes 9 and 5 with sample variances 115 and 24

respectively.

Assuming that the populations, from which the samples have been drawn, are normal,

determine whether the samples could have come from populations with a common variance.

Question

Eight students took two complete science practicals in successive weeks and obtained the following marks out of 20.

Students First Practical Second Practical

1 12 11

2 12 11

3 13 15

4 10 11

5 12 12

6 14 10

7 13 14

8 10 12

Assuming that the marks are normally distributed, carry out a paired sample t-test to determine

whether there is a significant difference between performance in the first and second practical.
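For reference, a sketch of this paired t-test with SciPy (an addition to the notes; ttest_rel works on the paired differences):

```python
import numpy as np
from scipy.stats import ttest_rel

first  = np.array([12, 12, 13, 10, 12, 14, 13, 10])
second = np.array([11, 11, 15, 11, 12, 10, 14, 12])

# Paired t-test on the differences first - second; for these marks the
# differences sum to zero, so t = 0 and the p-value is 1: no evidence
# of a difference between the two practicals.
t_stat, p_value = ttest_rel(first, second)
print(t_stat, p_value)
```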

Question

For the random variables X and Y, the covariance matrix is

Σ = [  25   −12 ]
    [ −12    16 ]

Determine the standard deviation of X and of Y, and also the correlation coefficient between X and Y.
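For this question, the standard deviations are the square roots of the diagonal entries and ρ = Cov(X, Y)/(σ_X σ_Y); a short NumPy check, added here for illustration:

```python
import numpy as np

cov = np.array([[25.0, -12.0],
                [-12.0, 16.0]])

sd_x, sd_y = np.sqrt(np.diag(cov))        # 5.0 and 4.0
rho = cov[0, 1] / (sd_x * sd_y)           # -12 / 20 = -0.6
print(sd_x, sd_y, rho)
```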

Ordered Statistics

Let X₁, X₂, X₃, ..., Xₙ denote independent continuous random variables with distribution function F(x) and density function f(x).

Denote the ordered random variables Xᵢ by X(1), X(2), X(3), ..., X(n), where X(1) ≤ X(2) ≤ X(3) ≤ .... ≤ X(n) (for continuous random variables, equality signs can be ignored), i.e.

X(1) = min(X₁, X₂, X₃, ..., Xₙ) is the minimum of the Xᵢ's, and X(n) = max(X₁, X₂, X₃, ..., Xₙ) is the maximum of the Xᵢ's.

Question

Find the probability density function for X(n)

Solution

Because X(n) is the maximum of X₁, X₂, X₃, ..., Xₙ, the event (X(n) ≤ x) will occur iff the event (Xᵢ ≤ x) occurs for every i = 1, 2, 3, ..., n, i.e.

P(X (n) ≤ x) = P ( X 1 ≤ x, X 2 ≤ x, X 3 ≤ x ,...... X n ≤ x)

Since the Xᵢ's are independent and P(Xᵢ ≤ x) = F(x) for i = 1, 2, 3, ..., n, it follows that

P(X(n) ≤ x) = P(X₁ ≤ x) P(X₂ ≤ x) P(X₃ ≤ x) ...... P(Xₙ ≤ x)

            = [F(x)]ⁿ

Let f_n(x) denote the density of X(n). Since f(x) = dF(x)/dx, taking derivatives on both sides we get

f_n(x) = n[F(x)]^(n−1) f(x)

Question

Find the density function for X(1)

Solution

Because X(1) is the minimum of X₁, X₂, X₃, ..., Xₙ, the event (X(1) > x) occurs iff the events (Xᵢ > x) occur for i = 1, 2, 3, ..., n. Because the Xᵢ are independent and

P(Xᵢ > x) = 1 − P(Xᵢ ≤ x) = 1 − F(x), for i = 1, 2, 3, ..., n, then

P(X (1 ) ≤ x) = 1 − P ( X 1 > x, X 2 > x, X 3 > x ,......, X n > x)

= 1 − P ( X 1 > x)P ( X 2 > x)P ( X 3 > x )...... P ( X n > x)

= 1 − [1 − F ( x )][1 − F ( x )][1 − F ( x )]......... [1 − F ( x )]

= 1 − [1 − F(x)]ⁿ

Let f₁(x) denote the density of X(1); then, differentiating both sides, we get

f₁(x) = n[1 − F(x)]^(n−1) f(x)
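Both formulas are easy to verify by simulation. For a Uniform(0, 1) sample, F(x) = x, so P(X(n) ≤ x) = xⁿ and P(X(1) ≤ x) = 1 − (1 − x)ⁿ; the sketch below (an illustrative addition, not part of the notes) compares these with empirical frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sim = 5, 200_000

# n independent Uniform(0,1) observations per replication
u = rng.uniform(size=(n_sim, n))
maxima, minima = u.max(axis=1), u.min(axis=1)

# Theoretical CDFs at x = 0.5: P(X_(n) <= x) = x^n, P(X_(1) <= x) = 1 - (1-x)^n
print(np.mean(maxima <= 0.5), 0.5 ** n)        # both near 0.03125
print(np.mean(minima <= 0.5), 1 - 0.5 ** n)    # both near 0.96875
```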

Example:

A computer component has length of life X, measured in hours, with probability density function

f(x) = (1/100) e^(−x/100),  x > 0
     = 0,  elsewhere

Suppose that two such components operate in parallel, i.e. the computer does not fail until both

components fail.

Find

1. The density function of Y, the length of life of the computer

2. The median of Y

Solution

Now Y = max(X₁, X₂) and f_Y(x) = n[F(x)]^(n−1) f(x).

But f(x) = (1/100) e^(−x/100)

⇒ F(x) = ∫₀ˣ (1/100) e^(−t/100) dt

       = [−e^(−t/100)]₀ˣ

       = 1 − e^(−x/100)

The density function of Y is

f_Y(x) = n[F(x)]^(n−1) f(x)

       = 2[1 − e^(−x/100)] (1/100) e^(−x/100),  x > 0

       = (1/50)(e^(−x/100) − e^(−x/50)),  x > 0
       = 0,  elsewhere

2. To find the median, we first find the distribution function of Y.

F_Y(x) = ∫₀ˣ (1/50)(e^(−t/100) − e^(−t/50)) dt

       = (1/50){∫₀ˣ e^(−t/100) dt − ∫₀ˣ e^(−t/50) dt}

       = (1/50){[−100 e^(−t/100)]₀ˣ − [−50 e^(−t/50)]₀ˣ}

       = (1/50){(−100 e^(−x/100) + 100) − (−50 e^(−x/50) + 50)}

       = (1/50){50 + 50 e^(−x/50) − 100 e^(−x/100)}

       = 1 + e^(−x/50) − 2 e^(−x/100),  x > 0

Let x₀.₅ denote the median; then

0.5 = 1 + e^(−x₀.₅/50) − 2 e^(−x₀.₅/100)

Let m = e^(−x₀.₅/100), so that e^(−x₀.₅/50) = m². Then

0.5 = 1 + m² − 2m

⇒ m² − 2m + 0.5 = 0

m = [−b ± √(b² − 4ac)] / 2a

  = [2 ± √(4 − 4(1)(0.5))] / 2

  = [2 ± √2] / 2 = 1.707 or 0.293

Since m = e^(−x₀.₅/100) must lie between 0 and 1, take m = 0.293, so

x₀.₅ = −100 ln(0.293) ≈ 122.76 hours
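The median can also be obtained numerically by solving F_Y(x) = 0.5; a sketch using SciPy's root finder, added here to check the hand computation:

```python
import math
from scipy.optimize import brentq

# CDF of the parallel-system lifetime derived above
def F(x):
    return 1 + math.exp(-x / 50) - 2 * math.exp(-x / 100)

median = brentq(lambda x: F(x) - 0.5, 1, 1000)
print(median)  # about 122.8; the hand value 122.76 reflects rounding m to 0.293
```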

Question

A computer component has length of life X, measured in hours, with probability density function

f(x) = (1/100) e^(−x/100),  x > 0
     = 0,  elsewhere

Suppose that two such components operate independently and in series in a certain system, i.e. the system fails when either component fails.

Find

1. The density function for Y, the length of life of the system

2. The median of Y

Solution

Because the system fails at the first component failure, Y = min(X₁, X₂), where X₁ and X₂ are independent random variables with the given density. Then

f_Y(x) = n[1 − F(x)]^(n−1) f(x)

       = 2[1 − (1 − e^(−x/100))] (1/100) e^(−x/100)

       = (1/50) e^(−x/50),  x > 0
       = 0,  elsewhere

2. Since F_Y(x) = 1 − e^(−x/50), the median of Y satisfies 1 − e^(−x₀.₅/50) = 0.5, giving x₀.₅ = 50 ln 2 ≈ 34.66 hours.

Revision questions for Probability and Statistics III

1. Let Y₁ and Y₂ be random variables with means and variances (μ₁, σ₁²) and (μ₂, σ₂²) respectively, and correlation coefficient ρ. Show that

E_{Y₂}(Var(Y₁ | Y₂)) = σ₁²(1 − ρ²).

Solution

E_{Y₂}(Var(Y₁ | Y₂)) = E_{Y₂}{E(Y₁² | Y₂) − [E(Y₁ | Y₂)]²}

= E(Y₁²) − E_{Y₂}{[E(Y₁ | Y₂)]²}

= E(Y₁²) − [E(Y₁)]² − {E_{Y₂}[E(Y₁ | Y₂)]² − [E_{Y₂}(E(Y₁ | Y₂))]²}

= Var(Y₁) − Var_{Y₂}[E(Y₁ | Y₂)]

= Var(Y₁) − Var_{Y₂}{μ₁ + ρ (σ₁/σ₂)(Y₂ − μ₂)},  using the conditional mean of the bivariate normal distribution,

= σ₁² − ρ² (σ₁²/σ₂²) σ₂²

= σ₁²(1 − ρ²)

2. Suppose Y₁ and Y₂ have a bivariate normal distribution with parameters μ₁ = μ₂ = 2, σ₁ = σ₂ = 4 and ρ = 3/4. Calculate P(Y₁ > 4 | Y₂ = 3).

Solution

The conditional mean is given by

μ = μ₁ + ρ (σ₁/σ₂)(Y₂ − μ₂)

  = 2 + (3/4)(4/4)(3 − 2)

  = 2.75

The conditional variance is

σ² = σ₁²(1 − ρ²)

   = 16(1 − 9/16)

   = 7

Hence (Y₁ | Y₂ = 3) ~ N(2.75, 7), and

P(Y₁ > 4 | Y₂ = 3) = P((Y₁ − μ)/σ > (4 − μ)/σ)

                   = P(Z > (4 − 2.75)/√7)

                   = P(Z > 0.472)

                   = 0.3192
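A quick check of this conditional probability with SciPy (an illustrative addition; norm.sf is the upper-tail probability):

```python
import math
from scipy.stats import norm

mu1 = mu2 = 2.0
sigma1 = sigma2 = 4.0
rho = 0.75

cond_mean = mu1 + rho * (sigma1 / sigma2) * (3 - mu2)   # 2.75
cond_var = sigma1 ** 2 * (1 - rho ** 2)                 # 7.0

p = norm.sf(4, loc=cond_mean, scale=math.sqrt(cond_var))
print(p)  # about 0.318; the table value 0.3192 rounds z to 0.47
```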

3. Discrete random variables Y₁ and Y₂ have the joint pmf

p(y₁, y₂) = λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!),  y₁ = 0, 1, 2, ..., y₂;  y₂ = 0, 1, 2, ....
          = 0,  otherwise

Find the conditional distribution of Y1 given Y2

Solution

P(Y₁ = y₁ | Y₂ = y₂) = P(Y₁ = y₁, Y₂ = y₂) / P(Y₂ = y₂)

P(Y₂ = y₂) = Σ_{all y₁} p(y₁, y₂)

           = Σ_{y₁=0}^{y₂} λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!)

           = (λ^(y₂) e^(−2λ) / y₂!) Σ_{y₁=0}^{y₂} y₂! / (y₁!(y₂ − y₁)!)

but Σ_{y₁=0}^{y₂} C(y₂, y₁) = (1 + 1)^(y₂) = 2^(y₂), thus

P(y₂) = (2λ)^(y₂) e^(−2λ) / y₂!,  y₂ = 0, 1, 2, ...

P(Y₁ | Y₂) = P(Y₁, Y₂) / P(Y₂) = [λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!)] ÷ [(2λ)^(y₂) e^(−2λ) / y₂!]

           = (1/2)^(y₂) y₂! / (y₁!(y₂ − y₁)!)

           = C(y₂, y₁) (1/2)^(y₁) (1/2)^(y₂ − y₁)

This is a binomial probability distribution with parameters y₂ and 1/2.
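Notice that the joint pmf factors as the product of two independent Poisson(λ) pmfs for Y₁ and Y₂ − Y₁, which suggests a direct simulation check of the Binomial(y₂, 1/2) conditional. A sketch, added here for illustration (λ and the conditioning value are arbitrary choices):

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
lam, n_sim = 3.0, 500_000

# Y1 ~ Poisson(lam) and Y2 - Y1 ~ Poisson(lam) independently
y1 = rng.poisson(lam, n_sim)
y2 = y1 + rng.poisson(lam, n_sim)

# Conditional on Y2 = 6, Y1 should be Binomial(6, 1/2)
sel = y1[y2 == 6]
for k in range(7):
    print(k, round(np.mean(sel == k), 4), round(binom.pmf(k, 6, 0.5), 4))
```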

4. Suppose Y₁ and Y₂ are independent exponentially distributed random variables, each with mean 2, so that their joint density is

f(y₁, y₂) = (1/4) e^(−(y₁+y₂)/2),  y₁ ≥ 0, y₂ ≥ 0
          = 0,  elsewhere

Find the joint PDF of U = Y₁ − Y₂ and W = Y₁ + Y₂.

Solution

The solutions for Y₁ and Y₂ in terms of U and W are

Y₁ = (1/2)(U + W)

Y₂ = (1/2)(W − U)

The Jacobian of the transformation is

J = | ∂Y₁/∂U  ∂Y₁/∂W | = |  1/2  1/2 | = 1/2
    | ∂Y₂/∂U  ∂Y₂/∂W |   | −1/2  1/2 |

The joint PDF of U and W is

f(u, w) = f(y₁(u, w), y₂(u, w)) · |J|

        = (1/4) e^(−w/2) · (1/2)

        = (1/8) e^(−w/2),  w ≥ 0, −w ≤ u ≤ w  (since y₁ ≥ 0 and y₂ ≥ 0 require |u| ≤ w)
        = 0,  otherwise

5. Suppose X 1 and X 2 have a joint PDF given by

f(x₁, x₂) = e^(−(x₁+x₂)),  0 ≤ x₁, 0 ≤ x₂
          = 0,  elsewhere

Find the PDF of Y1 = X 1 + X 2

Solution

Let Y₂ = X₂. Then X₁ = Y₁ − Y₂.

The Jacobian of the transformation is

J = | ∂X₁/∂Y₁  ∂X₁/∂Y₂ | = | 1  −1 | = 1
    | ∂X₂/∂Y₁  ∂X₂/∂Y₂ |   | 0   1 |

f(y₁, y₂) = f(x₁(y₁, y₂), x₂(y₁, y₂)) · |J|

          = e^(−(y₁−y₂+y₂)) · 1 = e^(−y₁)

Limits: 0 ≤ x₁ and 0 ≤ x₂ give 0 ≤ y₁ − y₂ and 0 ≤ y₂, hence 0 ≤ y₂ ≤ y₁.

The PDF of Y₁ is g(y₁) = ∫₀^{y₁} f(y₁, y₂) dy₂

                       = ∫₀^{y₁} e^(−y₁) dy₂ = e^(−y₁) [y₂]₀^{y₁}

g(y₁) = y₁ e^(−y₁),  0 ≤ y₁
      = 0,  otherwise
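Here g(y₁) = y₁e^(−y₁) is the Gamma(2, 1) density, so a quick simulation check is possible (an illustrative addition, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.exponential(scale=1.0, size=500_000)
x2 = rng.exponential(scale=1.0, size=500_000)
y = x1 + x2

# Gamma(shape 2, scale 1): mean 2, variance 2, P(Y <= 1) = 1 - 2e^{-1}
print(np.mean(y), np.var(y))                  # both close to 2
print(np.mean(y <= 1), 1 - 2 * np.exp(-1))    # both close to 0.2642
```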

6. Let X₁ and X₂ have a joint PDF given by

f(x₁, x₂) = 2x₁,  0 ≤ x₁ ≤ 1, 0 ≤ x₂ ≤ 1
          = 0,  elsewhere

Find the PDF of Y1 = X 1 X 2

Solution

Let Y₂ = X₂. Then X₁ = Y₁/Y₂ and X₂ = Y₂.

The Jacobian of the transformation is

J = | ∂X₁/∂Y₁  ∂X₁/∂Y₂ | = | 1/Y₂  −Y₁/Y₂² | = 1/Y₂
    | ∂X₂/∂Y₁  ∂X₂/∂Y₂ |   | 0      1      |

f(y₁, y₂) = f(x₁(y₁, y₂), x₂(y₁, y₂)) · |J|

          = 2(y₁/y₂) · (1/y₂) = 2y₁/y₂²

Limits: 0 ≤ x₁ ≤ 1 and 0 ≤ x₂ ≤ 1 give 0 ≤ y₁/y₂ ≤ 1 and 0 ≤ y₂ ≤ 1, hence 0 ≤ y₁ ≤ y₂ ≤ 1.

The PDF of Y₁ is g(y₁) = ∫_{y₁}^{1} f(y₁, y₂) dy₂

= ∫_{y₁}^{1} (2y₁/y₂²) dy₂ = 2y₁ ∫_{y₁}^{1} y₂^(−2) dy₂

= 2y₁ [−1/y₂]_{y₁}^{1}

= 2y₁ (1/y₁ − 1)

g(y₁) = 2(1 − y₁),  0 ≤ y₁ ≤ 1
      = 0,  otherwise

Hence or otherwise, find E(Y₁).

E(Y₁) = ∫₀¹ y₁ g(y₁) dy₁

      = ∫₀¹ 2y₁(1 − y₁) dy₁

      = 2[y₁²/2 − y₁³/3]₀¹ = 1/3
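The value E(Y₁) = 1/3 also follows from independence, E(X₁)E(X₂) = (2/3)(1/2), and can be checked by simulation (an addition to the notes; X₁ with density 2x is simulated by the inverse-CDF transform x = √u):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 500_000

x1 = np.sqrt(rng.uniform(size=n_sim))   # density 2x on (0,1) via inverse CDF
x2 = rng.uniform(size=n_sim)            # the marginal of X2 is Uniform(0,1)
y = x1 * x2

print(np.mean(y))                       # close to 1/3
```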

Question

The joint probability density of X and Y is

f(x, y) = k(5x + y),  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

1) Find the value of k

2) Find P(0 < X < 1/3, 1/2 < Y < 1)

3) Determine the marginal densities of X and Y

4) Find E(X), E(Y), E(X²) and E(Y²)

5) Calculate the correlation coefficient between X and Y.

Question

The joint probability density function of X and Y is

f(x, y) = k e^(−(4x+3y)),  x > 0, y > 0
        = 0,  elsewhere

1) Find k

2) Determine the joint mgf of X and Y

3) Find the conditional density of Y given X.

4) Are X and Y independent?

Question

Find the probability density function of Y = X₁ + X₂ if the joint probability density of X₁ and X₂ is

f(x₁, x₂) = k e^(−(x₁+x₂)),  x₁ > 0, x₂ > 0
          = 0,  elsewhere
Examination for April 2006

SMA 2231 Probability and Statistics III

Question 1

a). The joint probability distribution of two random variables X1 and X2 is shown in the following table:

(x1, x2)    (0,0)   (0,1)   (1,0)   (1,1)   (2,0)   (2,1)
f(x1, x2)   1/18    3/18    4/18    3/18    6/18    1/18

Find

1) The marginal distribution of X1

2) The marginal distribution of X2

3) The conditional distribution of X1 given that X2 = 1

4) The conditional distribution of X2 given that X1 = 1     10mks

b). A soft drink machine has a random amount Y2 in supply at the beginning of a given day and dispenses a random amount Y1 during the day (with measurement in gallons). It is not re-supplied during the day, and hence Y1 ≤ Y2; they have joint density

f(y1, y2) = 1/2,  0 ≤ y1 ≤ y2, 0 ≤ y2 ≤ 2
          = 0,  elsewhere

Find

i) The conditional density of Y1 given Y2 = y2

ii) The probability that less than 1/2 gallon is sold, given that the machine contains 1 gallon at the start of the day.

iii) P(Y1 ≥ 1/2 | Y2 ≤ 1/4)     8 mks

iv) P(Y1 ≤ 1/2 | Y2 = 2)

c). Suppose that the random variables X1 and X2 have the joint probability density function

f(x1, x2) = 12x1x2(1 − x1),  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,  elsewhere

Show that X1 and X2 are independent random variables. (5 marks)

d). The random variables X and Y have chi-square distributions with n and m degrees of

freedom respectively where n > m. Find the distribution of X-Y using the method of moment

generating function.

e). If two random variables are independent, are they also uncorrelated? Is the converse true?

2. a). Define and explain the following terms:

i) Bivariate distribution

ii) Correlation coefficient

iii) Regression coefficient

iv) Conditional distribution 7 marks

b). X1 and X2 are normally distributed random variables with means and standard deviations μ1, σ1 and μ2, σ2 respectively.

i) Using the method of moment generating functions, find the probability distribution of Y = X1 − X2. (Write down the probability density function.)

ii) If μ1 = 6, μ2 = 7, σ1² = 1 and σ2² = 1, find the probability that X1 > X2.     8mks

3. a). X1 and X2 are independent random variables, each with a chi-square distribution with r = 2 degrees of freedom. Find the distribution of Y = (1/2)(X1 − X2) using the transformation of variables technique.

(Recall that if X is a chi-square random variable with r degrees of freedom, then its probability density function is given by

f(x) = (1/2)^(r/2) x^(r/2 − 1) e^(−x/2) / Γ(r/2),  x > 0,

where Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx.)     10mks

b). Let X1, X2 be a random sample from a distribution having the probability density function

f(x) = e^(−x),  0 < x < ∞
     = 0,  elsewhere

If Y1 = X1 + X2 and Y2 = X1/(X1 + X2), find the joint distribution of Y1, Y2. (10 marks)

4. a). If a bivariate normal density has the exponent

−(1/102)[(x + 2)² − 2.8(x + 2)(y − 1) + 4(y − 1)²]
Find the values of

i. The means μ1 and μ 2

ii. The standard deviations σ1 and σ2

iii. The correlation coefficient, ρ . 10 mks

b). In a certain population of married couples, the height X of the husband and the height Y of the wife have a bivariate normal distribution with parameters μ1 = 5.8 units and μ2 = 5.3 units, standard deviations σ1 = σ2 = 0.2, and correlation coefficient ρ = 0.6. Find the probability that the height of the wife lies between 5.28 and 5.92 units given that the height of the husband is 6.3 units.     10 mks

5. a). Let U and V be two independent chi-square random variables with respective means r1 and r2.

i. What is the distribution of F = (U/r1) / (V/r2)?

ii. Write down an expression for finding the marginal density of F.

iii. What is the joint density of U and V?

iv. Introducing Z = V and F = (U/r1) / (V/r2), find the joint density of F and Z.

v. Write down an expression for finding the marginal density of F.     10mks

b). Let X be a standard normal random variable and Y a chi-square random variable with k degrees of freedom, X and Y independent.

i. State the distribution of X / √(Y/k).

ii. Consider T = X / √(Y/k). Introduce Z = Y and find the joint distribution of T and Z.

iii. Write down an expression for finding the marginal density of T.     10 mks

CAT QUESTIONS

1. Given the function

f(x, y) = 6x²y,  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

a. Show that f(x, y) is a probability density function

b. Calculate the variance of X and the variance of Y

c. Find P(0 < X < 3/4, 1/3 < Y < 2)

Question

2. If two random variables X and Y have the joint probability distribution function

P(x, y) = (1/30)(x + y),  for x = 0, 1, 2, 3 and y = 0, 1, 2
        = 0,  elsewhere

a) Show that P(x, y) satisfies the properties of a discrete joint distribution function     4mks

b) Find the probability that X = 3

c) Find the probability that Y = 1

d) Find F(2, 1)

e) Find:

i. The marginal distribution of X

ii. The marginal distribution of Y

Question

If X is the proportion of persons who will respond to one kind of mail-order solicitation, Y is the proportion of persons who will respond to another kind of mail-order solicitation, and the joint probability density of X and Y is given by

f(x, y) = (2/5)(x + 4y),  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

Find

a) The marginal densities of X and Y;

b) The conditional density of Y given that X takes on the value x;

c) The conditional density of X given that Y takes on the value y.

Question

Check for each of the following probability densities whether the two random variables are

independent:

a) f(x1, x2) = (1/81)x1²x2²,  for 0 < x1 < 3, 0 < x2 < 3
             = 0,  elsewhere

b) f(x1, x2) = (2/81)x1²x2²,  for 0 < x1 < x2 < 3
             = 0,  elsewhere

CAT III

SMA 2231

Q1. Let x1, x2, ..., xn be a random sample from a normal population with mean μ and variance σ². Show that the sample mean x̄ = (1/n) Σᵢ xᵢ and the sample variance S² = Σᵢ (xᵢ − x̄)²/(n − 1) are independent (use sample size n = 2).

Q2. Let x1, x2, ..., xn be independently and identically distributed with mean μ and variance σ². Let Q = Σᵢ (xᵢ − x̄)², where x̄ is the sample mean. Find E(Q).

Q3. The following are (random) observations from a normal population with mean 22 and variance 10. They are 25, 17, 23, 20, 18, 15, 24, and 21. Calculate a statistic that is a function of all the observations which has:

a) Standard normal distribution

b) Chi-square distribution with 7 degrees of freedom

c) T distribution with 7 degrees of freedom

Q4.

(a) Define the order statistics.

(b) Find the probability density function of Yn = max(x1, x2, ..., xn) if x1, x2, ..., xn is a random sample from the distribution f(x) = (1/θ) e^(−x/θ), x > 0.

