SMA 2231 Probability and Statistics III

COURSE OUTLINE

1. Bivariate probability mass and distribution function (Discrete case)

2. Bivariate probability density function (continuous case)

3. Joint, marginal and conditional distribution function.

4. Bivariate moment generating function (MGF) and change-of-variable techniques for bivariate distributions.

5. Stochastic independence.

6. Multiple regression and correlation.

7. Bivariate normal distribution.

8. Independence of the sample mean and sample variance for the normal distribution.

9. The t, chi-square and F distribution.

10. Distribution of order statistics.

References:

1. Hogg, R. V. and Tanis, E. A., Probability and Statistical Inference.

2. Hogg, R. V. and Craig, A. T., Introduction to Mathematical Statistics.

3. Mood, A. M., Graybill, F. A. and Boes, D. C., Introduction to the Theory of Statistics.

4. Wackerly, D. D., Mendenhall, W. and Scheaffer, R. L., Mathematical Statistics with Applications.

Discrete Bivariate Probability Distribution

Let X and Y be discrete random variables. Denote by x a realizable value of X and by y a realizable value of Y, and let the probability that X takes the value x and Y takes the value y be denoted by

P(X = x, Y = y)

Then the function f ( x, y ) = P( X = x, Y = y ) is said to be the joint probability function of X

and Y if it satisfies the following two conditions

i) f(x, y) ≥ 0

ii) ∑_x ∑_y f(x, y) = 1

The double summation extends over all possible pairs ( x, y ) .

Example

Consider an experiment of tossing a pair of dice. The sample space contains 36 sample points, corresponding to the 6 × 6 = 36 ways in which the numbers may appear on the faces of the two dice.

                        2nd die
             1      2      3      4      5      6
         1  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
         2  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
1st die  3  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
         4  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
         5  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
         6  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)
Any of the following random variables could be defined over the sample space and might be of

interest to the experimenter.

X1: the number of dots appearing on die 1.

X2: the number of dots appearing on die 2.

X3: the sum of the number of dots on both dice.

X4: the product of the number of dots on both dice.

The 36 sample points associated with the experiment are equiprobable and correspond to the 36 numerical events (x1, x2). Thus, if ones are obtained on both dice, the sample event is (1, 1); throwing a 2 on die 1 and a 3 on die 2 gives the sample event (2, 3). Because all pairs (x1, x2) occur with the same relative frequency, a probability of 1/36 is assigned to each sample point.

For this example, each event (x1, x2) contains only one sample point. Hence the bivariate probability function is

P(x1, x2) = 1/36 for x1 = 1, 2, ..., 6; x2 = 1, 2, ..., 6, and 0 otherwise.

Definition:

Let the random variables X1, X2 take a countable number of pairs of real values (x1, x2). If there is a function P(x1, x2) = P(X1 = x1, X2 = x2) with the following properties:

a) P(x1, x2) ≥ 0

b) ∑_{x1} ∑_{x2} P(x1, x2) = 1, and

c) for any constants a, b, c and d, P(a ≤ X1 ≤ b and c ≤ X2 ≤ d) = ∑_{x1=a}^{b} ∑_{x2=c}^{d} P(x1, x2),

then X1 and X2 are said to have a joint (or bivariate) discrete probability distribution with joint probability mass function P(x1, x2).

Example:

Using the results of tossing of two dice, calculate the P(2 ≤ X 1 ≤ 3,1 ≤ X 2 ≤ 2)

Solution:

P(2 ≤ X1 ≤ 3, 1 ≤ X2 ≤ 2) = P(2, 1) + P(2, 2) + P(3, 1) + P(3, 2)

= 4/36 = 1/9
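As an illustrative aside (not part of the original notes), the joint PMF of the two dice and the probability just computed can be checked with a short Python sketch:

```python
# Minimal sketch: the joint PMF of two fair dice, verified numerically.
from fractions import Fraction

pmf = {(x1, x2): Fraction(1, 36) for x1 in range(1, 7) for x2 in range(1, 7)}

assert sum(pmf.values()) == 1  # condition (b): the probabilities sum to 1
p = sum(v for (x1, x2), v in pmf.items() if 2 <= x1 <= 3 and 1 <= x2 <= 2)
print(p)  # 1/9, agreeing with the worked example
```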

Definition:

For any random variables X1 and X2, the joint (bivariate) distribution function, F (a, b ) is

given by

F(a, b) = P(X1 ≤ a, X2 ≤ b)

For two discrete variables X1 and X2,

F(a, b) = P(X1 ≤ a, X2 ≤ b) = ∑_{x1 ≤ a} ∑_{x2 ≤ b} P(x1, x2)

Example

Tossing of two – dice experiment

F(2, 3) = P(X1 ≤ 2, X2 ≤ 3)

= P(1,1) + P(1,2) + P(1,3) + P(2,1) + P(2,2) + P(2,3)

= 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36

= 6/36 = 1/6

Example

Suppose the random variables X1 and X2 have the following probability distribution.

-4-
X1

0 1 2

0 1 2 1
X2 9 9 9

1 2 2 0
9 9

2 1 0 0
9

Find

(a) F (- 1, 2) (b). F (1.5, 2) , (c). F (5, 7 )

Solution

a) F(−1, 2) = P(X1 ≤ −1, X2 ≤ 2) = P(∅) = 0

b) F(1.5, 2) = P(X1 ≤ 1.5, X2 ≤ 2)

= P(0,0) + P(0,1) + P(0,2) + P(1,0) + P(1,1) + P(1,2)

= 8/9

c) In a similar way as in (b) above.

F (5,7) = P(X 1 ≤ 5 , X 2 ≤ 7) = 1
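A minimal Python sketch (not from the notes) of how F(a, b) can be computed from the table, summing P(x1, x2) over all x1 ≤ a and x2 ≤ b:

```python
# Sketch: F(a, b) = P(X1 <= a, X2 <= b) computed from the PMF table above.
from fractions import Fraction

P = {(0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
     (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9), (2, 1): Fraction(0),
     (0, 2): Fraction(1, 9), (1, 2): Fraction(0),    (2, 2): Fraction(0)}

def F(a, b):
    return sum(p for (x1, x2), p in P.items() if x1 <= a and x2 <= b)

print(F(-1, 2), F(1.5, 2), F(5, 7))  # 0, 8/9, 1
```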

Continuous Bivariate Distribution

Definition:

The random variables X1, X2 are said to have a bivariate (joint) continuous distribution with probability density function f(x1, x2) if the following are satisfied:

1. f(x1, x2) ≥ 0 on the given domain

2. f(x1, x2) is continuous except along a countable number of points and curves.

3. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2) dx1 dx2 = 1

4. For constants a, b, c and d, P(a ≤ X1 ≤ b and c ≤ X2 ≤ d) = ∫_a^b ∫_c^d f(x1, x2) dx2 dx1

Definition

Let X1 and X2 be continuous random variables with joint distribution function F(x1, x2). If there exists a non-negative function f(x1, x2) such that

F(x1, x2) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} f(t1, t2) dt2 dt1

for any real numbers x1 and x2, then X1 and X2 are said to be jointly continuous random variables. The function f(x1, x2) is called the joint probability density function.

Example:

Suppose that a radioactive particle is randomly located in a square with sides of unit length; that is, if two regions of equal area are considered, the particle is equally likely to be in either. Let X1 and X2 denote the coordinates locating the particle. A reasonable model for the relative frequency histogram of X1 and X2 is the bivariate analogue of the univariate uniform distribution:

f(x1, x2) = 1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.

a) Sketch the probability density Surface

b) Find F (0.2, 0.4 )

c) Find P(0.1 ≤ x1 ≤ 0.3 ; 0 ≤ x 2 ≤ 0.5)

-6-
Solution

a) Diagram: the density surface is the plane f(x1, x2) = 1 over the unit square 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1.

b) F(0.2, 0.4) = ∫_{−∞}^{0.4} ∫_{−∞}^{0.2} f(x1, x2) dx1 dx2

= ∫_0^{0.4} ∫_0^{0.2} 1 dx1 dx2

= ∫_0^{0.4} [x1]_0^{0.2} dx2

= ∫_0^{0.4} 0.2 dx2

= [0.2 x2]_0^{0.4} = 0.08

Note: The probability F (0.2, 0.4) corresponds to the volume under f (x1 , x 2 ) = 1 over the

region (0 ≤ x1 ≤ 0.2 , 0 ≤ x 2 ≤ 0.4 )

c) P(0.1 ≤ X1 ≤ 0.3, 0 ≤ X2 ≤ 0.5)

= ∫_0^{0.5} ∫_{0.1}^{0.3} f(x1, x2) dx1 dx2

= ∫_0^{0.5} ∫_{0.1}^{0.3} dx1 dx2

= 0.2 × 0.5 = 0.1

This probability corresponds to the volume under f(x1, x2) = 1 over the region 0.1 ≤ x1 ≤ 0.3, 0 ≤ x2 ≤ 0.5.
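Both volumes can be spot-checked numerically; the following sketch (an addition, not from the notes) uses scipy's double integrator:

```python
# Sketch: numeric check of the two volumes with scipy's double integral.
from scipy.integrate import dblquad

f = lambda x2, x1: 1.0  # the uniform density on the unit square

F, _ = dblquad(f, 0, 0.2, 0, 0.4)    # x1 over [0, 0.2], x2 over [0, 0.4]
p, _ = dblquad(f, 0.1, 0.3, 0, 0.5)  # x1 over [0.1, 0.3], x2 over [0, 0.5]
print(F, p)  # 0.08 and 0.1
```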


The properties of a bivariate cumulative distribution function are given in the following

theorem

-7-
Theorem

Let X1 and X2 be random variables, discrete or continuous, with joint distribution function F(x1, x2). Then

a) F(−∞, −∞) = F(−∞, x2) = F(x1, −∞) = 0

b) F(∞, ∞) = 1

c) If a2 ≥ a1 and b2 ≥ b1, then F(a2, b2) − F(a2, b1) − F(a1, b2) + F(a1, b1) ≥ 0

Example

Suppose X1 and X2 have the joint bivariate PDF given as

f(x1, x2) = 3x1 for 0 ≤ x2 ≤ x1 ≤ 1, and 0 otherwise.

Find P (0.25 ≤ X 1 ≤ 0.5, X 2 ≥ 0.25 )

Solution

P(0.25 ≤ X1 ≤ 0.5, X2 ≥ 0.25)

= ∫_{0.25}^{0.5} ∫_{0.25}^{x1} 3x1 dx2 dx1

= ∫_{0.25}^{0.5} 3x1 [x2]_{0.25}^{x1} dx1

= ∫_{0.25}^{0.5} 3x1 (x1 − 1/4) dx1

= [x1³ − (3/8)x1²]_{0.25}^{0.5}

= [1/8 − (3/8)(1/4)] − [1/64 − (3/8)(1/16)]

= 5/128

Question

Three fair coins are tossed independently. One variable of interest is X1 = the number of heads. Let X2 denote the amount of money won on a side bet, set in the following manner: if the first head occurs on the first toss, you win $1; if the first head occurs on toss 2 or on toss 3, you win $2 or $3 respectively; if no head appears, you lose $1 (that is, you win −$1).

a) Find the joint probability distribution function of X1 and X2.

b) What is the probability that fewer than three heads occur and you win $1 or less? (i.e. F(2, 1))

Solution

a)

                 X1
            0     1     2     3
      −1   1/8    0     0     0
       1    0    1/8   2/8   1/8
X2     2    0    1/8   1/8    0
       3    0    1/8    0     0

b) F(2, 1) = 1/2

Question

Let X1 and X2 have the joint PDF given by

-9-
⎧k (1 - x 2 ) ,0 ≤ x1 ≤ x 2 ≤ 1
f (x1 , x 2 ) = ⎨
⎩0 ,Otherwise

a) Find the values of K that makes this a PDF

1 x2

∫ ∫ k (1 − x
0 0
2 )dx 1 dx 2 = 1

b) Find P(X1 ≤ 0.75, X2 ≥ 0.5). [Answers: k = 6; (b) 31/64]

Question

Let X1 and X2 denote the proportions of time, out of one working day, that employees A and B, respectively, actually spend performing their assigned tasks. The joint relative frequency behaviour of X1 and X2 is modeled by the density function

f(x1, x2) = x1 + x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

a) Find P ( X 1 ≤ 0.5, X 2 ≥ 0.25 ) answer 21/64

b) Find P ( X 1 + X 2 ≤ 1) answer 1/3

Question

A joint probability distribution function for random variables X, Y is given by

f(x, y) = 6xy² for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 elsewhere.

i) Check that f ( x , y ) is a probability density function

ii) Find P[0 ≤ x ≤ 0.5 and 0 .5 ≤ y ≤ 0.75]

iii) Find P(x + y ≥ 1)

Solution

i) Check the conditions given above; in particular

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^1 6xy² dy dx = 1

ii) The required probability is

∫_0^{0.5} ∫_{0.5}^{0.75} 6xy² dy dx = ∫_0^{0.5} 2x [y³]_{0.5}^{0.75} dx

= ∫_0^{0.5} 2x (27/64 − 8/64) dx = ∫_0^{0.5} 2x (19/64) dx

= (19/64) [x²]_0^{0.5} = (19/64)(1/4) = 19/256

Question

Find k for f ( x, y ) to be a density function of x and y .

i) f(x, y) = k(x² + 2y) for 0 < x < 1, 1 < y < 3, and 0 elsewhere.

ii) f(x, y) = k(x + e^{2x}) for 0 < x < 2, 0 < y < 1, and 0 elsewhere.

iii) For the above, find P[0.5 < x < 1 and 0.5 ≤ y ≤ 1].

Question

Show that f ( x, y ) is a joint density function

f(x, y) = (3/5) x(y + x) for 0 < x < 1, 0 < y < 2, and 0 elsewhere.

Question

Let f(x, y) = y + x for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Find F(x, y).
Question

The joint distribution function of X and Y is

F(x, y) = (1 − e^{−x})(1 − e^{−y}) for x > 0, y > 0, and 0 elsewhere.

Find the density f(x, y) and P[1 < x < 2 and 3 < y < 5].

Marginal Probability Function

Definition

Let X1 and X2 be jointly discrete random variables with probability function P(x1, x2). Then the marginal probability functions of X1 and X2, respectively, are given by

P1(x1) = ∑_{x2} P(x1, x2)  and  P2(x2) = ∑_{x1} P(x1, x2)

Similarly, if X1 and X2 are jointly continuous random variables with joint density function f(x1, x2), then the marginal density functions of X1 and X2, respectively, are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2  and  f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1

Example

For the joint PMF of X and Y,

f(x, y) = a(y + 3x + 1) for x = 0, 1, 2; y = 1, 3, and 0 elsewhere,

i) evaluate the constant a;

ii) find the marginal PMFs, and Var(X).
Solution

         x = 0   x = 1   x = 2   f(y)
y = 1     2a      5a      8a     15a
y = 3     4a      7a     10a     21a
f(x)      6a     12a     18a     36a

Since the probabilities must sum to 1, 36a = 1, so

a = 1/36

The marginal PMF f(x) is given by

f(0) = 6a = 6/36 = 1/6, f(1) = 12a = 12/36 = 1/3, f(2) = 18a = 18/36 = 1/2

i.e. f(x) = 6(x + 1)a = (x + 1)/6 for x = 0, 1, 2, and 0 elsewhere.

The marginal PMF f(y) is given by

f(1) = 15a = 15/36 = 5/12, f(3) = 21a = 21/36 = 7/12

i.e. f(y) = (y + 4)/12 for y = 1, 3, and 0 elsewhere.

E(X) = 4/3 and E(X²) = 7/3, so

Var(X) = E(X²) − (E(X))² = 7/3 − 16/9 = 5/9. (Check similarly that Var(Y) = 35/36.)

Example

The probability distribution of X1 and X2 is given below

              X1
           0      1      2     P2(x2)
      0    0     3/15   3/15   6/15
X2    1   2/15   6/15    0     8/15
      2   1/15    0      0     1/15
P1(x1)    3/15   9/15   3/15     1

Find the marginal probability distribution function of

a) X1

b) X2

Solution

I.

X1 0 1 2

P1(x1) 3/15 9/15 3/15

II.

X2 0 1 2

P2(x2) 6/15 8/15 1/15

Example

The random variables X and Y have the joint distribution

             X
           1      2      3     f2(y)
      2   1/12   1/6    1/12   1/3
Y     3   1/6     0     1/6    1/3
      4    0     1/3     0     1/3
f1(x)     1/4    1/2    1/4      1

i) Find the marginal PMF values f2(4) and f1(2). [f2(4) = 1/3, f1(2) = 1/2]

ii) Are X and Y independent? [No: f(2, 4) = 1/3 ≠ f1(2) f2(4) = 1/6]

Example

The random variables X, Y have the joint PMF

f(x, y) = (x + y)/21 for x = 1, 2, 3; y = 1, 2, and 0 elsewhere.

Find P(X = 1), P(X = 2), P(X = 3), P(Y = 1) and P(Y = 2).

Can you find formulas for the marginal PMFs f1(x), f2(y)?

[Answer: f2(y) = (3y + 6)/21 = (y + 2)/7; f1(x) = (2x + 3)/21]

Example

The joint PMF

f(x, y) = λ^{x+y} e^{−2λ}/(x! y!) for x = 0, 1, 2, ...; y = 0, 1, 2, ..., and 0 elsewhere.
Example

Let X1 and X2 have probability density function

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

Sketch f(x1, x2) and find the marginal density functions of X1 and X2.

Solution

The marginal densities are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2 = ∫_0^1 2x1 dx2 = [2x1 x2]_0^1 = 2x1

⇒ f1(x1) = 2x1 for 0 ≤ x1 ≤ 1, and 0 elsewhere.

Similarly,

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1 = ∫_0^1 2x1 dx1 = [x1²]_0^1 = 1

⇒ f2(x2) = 1 for 0 ≤ x2 ≤ 1, and 0 elsewhere.
Conditional Distribution Function

Definition

Suppose X1 and X2 are jointly discrete random variables with probability function P(x1, x2) and marginal probability functions P1(x1) and P2(x2) respectively. Then the conditional discrete probability function of X1 given X2 is

P(x1 | x2) = P(X1 = x1 | X2 = x2) = P(X1 = x1, X2 = x2)/P(X2 = x2) = P(x1, x2)/P2(x2), provided P2(x2) > 0.

Similarly, the conditional probability function of X2 = x2 given X1 = x1 is

P(x2 | x1) = P(x1, x2)/P1(x1), provided P1(x1) > 0.

Special cases

If X and Y are independent, then

f(x | y) = f(x, y)/f(y) = f(x) f(y)/f(y) = f(x).

Similarly,

f(y | x) = f(x, y)/f(x) = f(x) f(y)/f(x) = f(y).
Example

Consider the distribution of X1 and X2 given below

              x1 = 0   x1 = 1   x1 = 2   P2(x2)
   x2 = 0       0       3/15     3/15     6/15
   x2 = 1      2/15     6/15      0       8/15
   x2 = 2      1/15      0        0       1/15
   P1(x1)      3/15     9/15     3/15       1

a) Find the conditional distribution of X1 given that (i). X2=1, (ii). X2=2

b) Find the conditional distribution of X2 given that X1=1

Solution

a) (i) P(x1 | x2) = P(x1, x2)/P2(x2), with x2 = 1:

P(x1 | x2 = 1) = P(x1, 1)/P2(1)

P(0 | 1) = (2/15)/(8/15) = 1/4

P(1 | 1) = (6/15)/(8/15) = 3/4

P(2 | 1) = 0/(8/15) = 0

X1            0     1     2
P(x1|x2=1)   1/4   3/4    0

b) Complete

Definition

Let X1 and X2 be jointly continuous random variables with joint density f(x1, x2) and marginal densities f1(x1) and f2(x2) respectively. Then the conditional density of X1 given X2 = x2 is given by

f1(x1 | x2) = f(x1, x2)/f2(x2) for f2(x2) > 0, and 0 elsewhere,

and the conditional density of X2 given X1 = x1 is given by

f2(x2 | x1) = f(x1, x2)/f1(x1) for f1(x1) > 0, and 0 elsewhere.

Example

Suppose X1 and X2 have the joint PDF

f(x1, x2) = 1/2 for 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2, and 0 elsewhere.

Find

a) the conditional density of X1 given X2 = x2, and evaluate P(X1 ≤ 0.5 | X2 = 1).
Solution

The marginal density of X2 is given by

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1 = ∫_0^{x2} (1/2) dx1 = x2/2

i.e. f2(x2) = x2/2 for 0 ≤ x2 ≤ 2, and 0 elsewhere.

By definition,

f1(x1 | x2) = f(x1, x2)/f2(x2) = (1/2)/(x2/2) = 1/x2

i.e. f1(x1 | x2) = 1/x2 for 0 ≤ x1 ≤ x2 ≤ 2, and 0 elsewhere.

Now

P(X1 ≤ 0.5 | X2 = 1) = ∫_{−∞}^{0.5} f(x1 | x2 = 1) dx1 = ∫_0^{0.5} 1 dx1 = 1/2

Example

In a group of nine executives of a certain business firm, four are married, three have never married and two are divorced. Three of the executives are to be selected for promotion. Let X1 denote the number of married executives and X2 the number of never-married executives among the three selected for promotion. Assuming that the three are randomly selected from the nine available,

a) Find the joint probability distribution of X1 and X2

b) Find the marginal probability distribution of X1, the number of married executives

among the three selected

c) Find (i). P (X 1 = 1 X 2 = 2 ) (ii) P ( X 2 = 2 X 1 = 1)

d) Let X3 denote the number of divorced executives among the three selected for

promotion, then X 3 = 3 − X 1 − X 2 . Find P ( X 3 = 1 X 2 = 1)

Solution

a) The joint probability distribution is hypergeometric with N = 9, n = 3, r1 = 4, r2 = 3, r3 = 2, r1 + r2 + r3 = N:

P(x1, x2) = C(4, x1) C(3, x2) C(2, 3 − x1 − x2)/C(9, 3)

for 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 3 and 0 ≤ x1 + x2 ≤ 3, and 0 elsewhere.

b) P1(x1) = ∑_{x2} P(x1, x2) = ∑_{x2=0}^{3} P(x1, x2). Then

X1          0      1      2      3
P1(x1)    5/42  20/42  15/42   2/42

c) (i) P(X1 = 1 | X2 = 2) = P(x1 = 1, x2 = 2)/P2(x2 = 2)

P(1, 2) = C(4,1) C(3,2) C(2,0)/C(9,3) = 12/84 and P2(x2 = 2) = ∑_{x1} P(x1, 2) = 18/84, so

P(X1 = 1 | X2 = 2) = 12/18 = 2/3

(ii) P(X2 = 2 | X1 = 1) = ?

d) P(X3 = 1 | X2 = 1) = P(X3 = 1, X2 = 1)/P2(X2 = 1)

Note that X3 = 1 and X2 = 1 force X1 = 1, so

P(X3 = 1, X2 = 1) = C(4,1) C(3,1) C(2,1)/C(9,3) = 24/84 and P2(x2 = 1) = ∑_{x1} P(x1, 1) = 45/84

⇒ P(X3 = 1 | X2 = 1) = 24/45 = 8/15

Question

For X, Y having a joint probability distribution functions

f(x, y) = e^{−y} for 0 < x < y < ∞, and 0 elsewhere.

Determine the conditional PDF for y given x

Solution

f(y | x) = f(x, y)/f(x)

f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^{∞} e^{−y} dy = [−e^{−y}]_x^{∞} = e^{−x}

The conditional PDF of y given x is

f(y | x) = f(x, y)/f(x) = e^{−y}/e^{−x} = e^{x−y} for y > x, 0 < x < y < ∞.

Question

For random variables X, Y having a joint probability distribution function

f(x, y) = x + y for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Determine the conditional PDF for y given x and P ( y ≤ 0.5 x = 0.6 )

Solution

f(y | x) = f(x, y)/f(x)

f(x) = ∫_0^1 (x + y) dy = [xy + y²/2]_0^1 = x + 1/2

f(y | x) = (x + y)/(x + 1/2) = 2(x + y)/(2x + 1)

F(y | x) = ∫_0^y 2(x + t)/(2x + 1) dt = (2/(2x + 1))[xt + t²/2]_0^y = (2xy + y²)/(2x + 1)

Note: F(x, y) = P[X ≤ x, Y ≤ y] = ∫_{−∞}^x ∫_{−∞}^y f(s, t) dt ds

F(0.5 | X = 0.6) = (2(0.6)(0.5) + 0.25)/(2(0.6) + 1) = 0.85/2.2 ≈ 0.386

Distribution of continuous variables

Definition

If X1 and X2 are jointly continuous random variables with joint density function f(x1, x2), then the conditional distribution function of X1 given X2 = x2 is given by

F(x1 | x2) = ∫_{−∞}^{x1} f(t, x2)/f2(x2) dt


Example

The random variables X1 and X2 have the following joint probability density function

f(x1, x2) = x1 + x2 for 0 < x1 < 1, 0 < x2 < 1, and 0 elsewhere.

a)

i) Find the conditional probability density of X2 given X1, f(x2 | x1).

ii) Find P(0 < X2 < 0.5 | X1 = 0.25).

Solution

The conditional probability density function is

f(x2 | x1) = f(x1, x2)/f1(x1), with f(x1, x2) = x1 + x2 for 0 < x1 < 1, 0 < x2 < 1.

Now

f1(x1) = ∫_0^1 (x1 + x2) dx2 = [x1 x2 + x2²/2]_0^1 = x1 + 1/2

so

f(x2 | x1) = (x1 + x2)/(x1 + 0.5) for 0 < x2 < 1, and 0 otherwise.

ii) P(0 < X2 < 0.5 | X1 = 0.25)

= ∫_0^{0.5} (0.25 + x2)/(0.25 + 0.5) dx2

= (1/0.75)[0.25 x2 + x2²/2]_0^{0.5}

= (1/0.75)(0.125 + 0.125) = 1/3

Question

If X1 is the total time between a customer's arrival in the store and leaving the service window and X2 is the time spent in line before reaching the window, then the joint density of these variables is given by

f(x1, x2) = e^{−x1} for 0 ≤ x2 ≤ x1 < ∞, and 0 elsewhere.

a) Find P(X1 < 2, X2 > 1). [Answer: e^{−1} − 2e^{−2}]

b) Find P(X1 > 2X2). [Answer: 1/2]

c) Find P(X1 − X2 ≥ 1). (Note that X1 − X2 is the time spent at the service window.) [Answer: e^{−1}]

d) If 2 minutes elapse between a customer's arrival at the store and his departure from the service window, find the probability that he waited in line less than one minute to reach the window. [Answer: 1/2]

e) Are X1 and X2 independent variables?

Independent (Stochastic) random variables

Definition

Suppose X1 has distribution function f(x1), X2 has distribution function f(x2), and X1 and X2 have joint distribution function f(x1, x2). Then X1 and X2 are said to be independent iff

f(x1, x2) = f(x1) f(x2) for every pair of real numbers (x1, x2).

Note

1. If X1 and X2 are discrete random variables with joint probability function P(x1, x2) and marginal probability functions P1(x1) and P2(x2) respectively, then X1 and X2 are independent iff P(x1, x2) = P1(x1) P2(x2) for all real numbers (x1, x2).

2. If X1 and X2 are continuous random variables with joint density function f(x1, x2) and marginal density functions f1(x1) and f2(x2) respectively, then X1 and X2 are independent iff

f(x1, x2) = f1(x1) f2(x2)

for all pairs of real numbers (x1, x2).

3. If X1 and X2 are not independent, they are said to be dependent.

Question

Random variables X1 and X2 have the joint probability density function.

f(x1, x2) = 4x1x2 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

1. Show that X1 and X2 are independent.

2. Show that f ( x1 , x 2 ) is a valid probability density function.

Solution

f1(x1) = ∫_0^1 f(x1, x2) dx2 = ∫_0^1 4x1 x2 dx2 = [2x1 x2²]_0^1 = 2x1 for 0 ≤ x1 ≤ 1.

Similarly,

f2(x2) = ∫_0^1 f(x1, x2) dx1 = 2x2 for 0 ≤ x2 ≤ 1.

Hence f(x1, x2) = f1(x1) f2(x2) for all real numbers (x1, x2), and therefore X1 and X2 are independent.
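The factorisation can also be confirmed symbolically; a minimal sketch (an addition, not from the notes):

```python
# Sketch: factorisation check of f(x1, x2) = 4*x1*x2 into its marginals.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', nonnegative=True)
f = 4 * x1 * x2
f1 = sp.integrate(f, (x2, 0, 1))   # 2*x1
f2 = sp.integrate(f, (x1, 0, 1))   # 2*x2
print(sp.simplify(f1 * f2 - f) == 0)  # True, so X1 and X2 are independent
```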

Question

Let f(x1, x2) = 2 for 0 ≤ x2 ≤ x1, 0 ≤ x1 ≤ 1, and 0 elsewhere.

Are X1 and X2 independent? (Show)

Question

Determine whether random variables X, Y are independent if

f(x, y) = 2e^{−x−y} for 0 < x < y < ∞, and 0 elsewhere.

Solution

Question

Determine whether random variables X, Y are independent if

f(x, y) = x + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 elsewhere.

Solution

Expected Value of a function of random variables

Trinomial distribution

Definition

Discrete random variables X1, X2 are said to have the trinomial distribution with parameters n, p1, p2, where n is a positive integer and 0 < p1 + p2 < 1, if the joint probability distribution of X1 and X2 satisfies

P(x1, x2) = n!/(x1! x2! (n − x1 − x2)!) · p1^{x1} p2^{x2} (1 − p1 − p2)^{n − x1 − x2}

for non-negative integers x1, x2 with x1 + x2 ≤ n, and 0 elsewhere.
Note:

The trinomial distribution is appropriate if in n independent trials

a) On each trial there is a probability P1 of outcome of type O1

b) On each trial there is a probability p2 of an outcome of type O2

c) O1 and O2 are mutually exclusive

Question

Derive the conditional distribution of X1 given X2 = x2 if X1 and X2 jointly have a trinomial distribution.

Solution

The marginal distribution of X2 is

P2(x2) = ∑_{x1=0}^{n−x2} n!/(x1! x2! (n − x1 − x2)!) p1^{x1} p2^{x2} (1 − p1 − p2)^{n − x1 − x2}

= n!/(x2! (n − x2)!) p2^{x2} ∑_{x1=0}^{n−x2} (n − x2)!/(x1! (n − x1 − x2)!) p1^{x1} (1 − p1 − p2)^{n − x1 − x2}

= n!/(x2! (n − x2)!) p2^{x2} (1 − p1 − p2 + p1)^{n − x2}

= n!/(x2! (n − x2)!) p2^{x2} (1 − p2)^{n − x2}

Now

P(x1 | x2) = P(x1, x2)/P2(x2)

= [(n − x2)!/(x1! (n − x1 − x2)!)] p1^{x1} (1 − p1 − p2)^{n − x1 − x2} / (1 − p2)^{n − x2}

= C(n − x2, x1) (p1/(1 − p2))^{x1} ((1 − p2 − p1)/(1 − p2))^{n − x1 − x2}

This is a binomial distribution with parameters n − x2 and p1/(1 − p2).
Note

Using a similar procedure, the marginal probability distribution function of X1 is

P1(x1) = n!/(x1! (n − x1)!) p1^{x1} (1 − p1)^{n − x1}

Since P(x1, x2) ≠ P1(x1) P2(x2), X1 and X2 are not independent.

Example

A bag contains three white, two black and four red marbles. Four marbles are drawn at random

with replacement; calculate the probability that the sample contains just one white marble

given that it contains just one red marble.

Example

Discrete random variables X1 and X2 have joint probability distribution function

P(x1, x2) = λ^{x2} e^{−2λ}/(x1! (x2 − x1)!) for x1 = 0, 1, 2, ..., x2; x2 = 0, 1, 2, ..., and 0 otherwise.

Find the marginal distribution of X1 and X2 and the conditional distribution of X1 given X2

Solution


λx e−2λ
2

P1 ( x1 ) = ∑ x !(x
x2 = x1 − x1 )!
1 2

λ x e −2 λ
1 ∞
λx 2 − x1
=
x1 !
∑ (x
x 2 = x1 − x1 )!
2

λ x e −2 λ
1

= eλ
x1!

- 30 -
λx e −λ
1

= x 1 = 0 ,1 , 2 ....
x1 !

X1 has a poison distribution with parameter λ

x2
λ x e −2 λ
2

P2 ( x2 ) = ∑
x1 = 0 x1!( x2 − x1 )!

λ x e −2 λ
2 x2
x2!
=
x2!

x1 = 0 x1 ! ( x 2 − x1 )!
1 x11 x 2 − x1

But

x2
⎛ x2 ⎞ x1 x2 − x1
∑ ⎜⎜ x ⎟⎟1 1 = (1 + 1) 2 = 2 x2
x

x1 = 0 ⎝ 1 ⎠

Thus

P2 ( x 2 ) =
(2λ )
x 2
e −2 λ
, x 2 = 0,1,2,3....
x2 !

Which is a poison distribution with parameters 2λ .Are X1 and X2 independent?

P( x1 , x2 )
P(x1 x2 ) =
P2 ( x2 )

λ x e −2 λ
2

x 1 ! ( x 2 − x 1 )!
=
(2 λ )x 2 e − 2 λ
x2!

x2
⎛1⎞ x2 !
=⎜ ⎟
⎝2⎠ x1!( x 2 − x1 )!

⎛ x2 ⎞
x2
⎛1⎞
=⎜ ⎟ ⎜⎜ ⎟⎟
⎝2⎠ ⎝ x1 ⎠

This is a binomial with parameters X2 and 1/2

- 31 -
Bivariate Expectations

Let the random variables X, Y have joint probability mass/density function (PMF/PDF) f(x, y) and marginal PMFs/PDFs f(x) and f(y) respectively. Further, let g(x, y) be any function of X and Y. The expected value of g(x, y) is

E[g(x, y)] = ∑_x ∑_y g(x, y) f(x, y) (discrete case)

E[g(x, y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dy dx (continuous case)

CASE 1

g(x, y) = (x − μx)(y − μy), where μx = E[X] and μy = E[Y].

E[g(x, y)] = E[(x − μx)(y − μy)] = ∑_y ∑_x (x − μx)(y − μy) f(x, y) (the covariance of X, Y)

= ∑_y ∑_x (xy − yμx − xμy + μxμy) f(x, y)

= ∑_y ∑_x xy f(x, y) − μx ∑_y ∑_x y f(x, y) − μy ∑_y ∑_x x f(x, y) + μxμy ∑_y ∑_x f(x, y)

= ∑_y ∑_x xy f(x, y) − μx ∑_y y f(y) − μy ∑_x x f(x) + μxμy

= ∑_y ∑_x xy f(x, y) − μxμy − μyμx + μxμy

= ∑_y ∑_x xy f(x, y) − μxμy

= ∑_y ∑_x xy f(x, y) − (∑_x x f(x))(∑_y y f(y))

This is often written as Cov(X, Y) = E(XY) − E(X)E(Y).
In the special case where X = Y and μx = μy, Cov(X, Y) = Var(X) = E(X − μx)².

If X, Y are independent then Cov(X, Y) = 0, but the converse is NOT true: Cov(X, Y) = 0 does not imply that X, Y are independent.

If Var(X) = σx² and Var(Y) = σy², the correlation coefficient between X and Y is denoted by

ρ_{x,y} = Cov(X, Y)/(σx σy)

Expectation and Bivariate Moment Generating Function

Definition

Let g(X1, X2, ..., Xk) be a function of the random variables X1, X2, ..., Xk with probability distribution function P(x1, x2, ..., xk). Then the expected value of g(X1, X2, ..., Xk) is

E[g(X1, ..., Xk)] = ∑_{xk} ... ∑_{x1} g(x1, ..., xk) P(x1, ..., xk)

If X1, X2, ..., Xk are continuous random variables with PDF f(x1, ..., xk), then

E[g(X1, ..., Xk)] = ∫ ... ∫ g(x1, ..., xk) f(x1, ..., xk) dx1 ... dxk

Note

In this unit we deal with k = 2. The case k > 2 will be dealt with in Probability and Statistics IV.

Example

Let X1 and X2 have a joint probability distribution function given by

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.
Find

1. f(x1 | x2)

2. E(X1X2)

3. E(X1)

4. Var(X1)

5. Var(X1X2)

Solution

1. Can be done using the previous method

1 1

2. E ( X 1 , X 2 ) = ∫ ∫ x1 , x2 f ( x1 , x2 )d x1 d x2
0 0

1
1 1
⎡ 3⎤
= ∫ ∫ x1 , x 2 (2 x1 )d x , d x = x 2 x1 d
1

1
∫⎢ 22
⎥ x2
0 0
⎣ 3 ⎦ 0
0

1
⎛2⎞
1

2 ⎡ x2 ⎤
2
= ∫ ⎜ ⎟ x2 d x2 = ⎢
3
0⎝ ⎠ 3 ⎥
⎣ ⎦2 0

1
=
3

1 1

3. E ( X 1 ) = g (x1 , x 2 ) = x1 = ∫ ∫ x1 (2 x1 )d x1 , d x2
0 0

1
1⎡ 2 x13 ⎤ 1
2
= ∫⎢ ⎥d x2 = ∫ 3d x2
0
⎢⎣ 3 ⎥⎦ 0
0

1
2
= ∫
0
3
d x 2 =
2
3
x2 ]
1

- 34 -
2
=
3

( )
1 1

4. E X 1 = ∫∫ x1 f ( x1 , x2 )d x1 , d x2
2 2

0 0

1
1 1 ⎡ 2 x14 ⎤
1
1
⎛1⎞ ⎡ ⎤
1

= ∫ ∫ 2 x1 d x1 , d x2
3
= ∫⎢ ⎥d = ∫ ⎜ ⎟d x2 x2
2 =⎢ ⎥
0⎝ ⎠
x2
0 0
⎣ 4 ⎦
0
0
2
⎣ ⎦0
1
=
2

( )
5. Var ( X 1 ) = E x1 − (E ( x1 ))
2 2

2
1 ⎛2⎞
= −⎜ ⎟ =
1
2 ⎝3⎠ 18

6. Var ( X 1 X 2 ) = ?

Question

The random variables X1 and X2 have the joint probability distribution function given by

f(x1, x2) = 2(1 − x1) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 otherwise.

Find

1. f(x1 | x2)

2. E ( X 1 X 2 )

3. E ( X 1 )

4. E ( X 2 )

5. Var( X 2 )

Properties of Expected Value of Random Variables

1. Let C be a constant. Then E(C) = C, where g(X1, X2, ..., Xk) = C.

2. Let g ( X 1 , X 2 ) be a function of the random variables X 1 , X 2 and C be a constant. Then

E [Cg ( X 1 , X 2 )] = CE [g ( X 1 , X 2 )]

3. Let X 1 and X 2 be random variables with joint probability distribution function

of f(x1, x2). Let g1(X1, X2), ..., gk(X1, X2) be functions of X1 and X2. Then

E[g1(X1, X2) + g2(X1, X2) + ... + gk(X1, X2)] = E g1(X1, X2) + E g2(X1, X2) + ... + E gk(X1, X2)

Covariance of Two Random Variables

Definition

The covariance of X1 and X2 is defined as the expected value of (X1 − μ1)(X2 − μ2). In notation,

Cov(X1, X2) = E[(X1 − μ1)(X2 − μ2)]

or, equivalently,

Cov(X1, X2) = E(X1X2) − E(X1)E(X2) (show this), where μ1 = E(X1) and μ2 = E(X2).

Note

The larger the absolute value of the covariance of X1 and X2, the greater the linear dependence between X1 and X2. Positive values of the covariance indicate that X1 increases as X2 increases; negative values indicate that X1 decreases as X2 increases. A zero value indicates no linear dependence between X1 and X2.

Unfortunately, it is difficult to use the covariance as a measure of dependence because its value depends upon the scale of measurement, and it is therefore hard to determine at first glance whether a particular covariance is large. This problem can be eliminated by standardizing its value, using the coefficient of linear correlation.

Definition

Let X1 and X2 be two random variables. The correlation coefficient between X1 and X2 is

defined as

ρ = Cov(X1, X2)/√(Var(X1) Var(X2)) = σ_{x1x2}/(σ_{x1} σ_{x2})

where σ_{x1} and σ_{x2} are the standard deviations of X1 and X2, respectively.

Note

1. −1 ≤ ρ ≤ 1

2. When ρ = ±1, all points fall on a straight line.

3. When ρ = 0, the covariance is zero and there is no correlation between the two variables.

4. When ρ > 0, X2 increases as X1 increases.

5. When ρ < 0, X2 decreases as X1 increases.

Example

The joint PDF of X1 and X2 is given by

f(x1, x2) = 3x1 for 0 ≤ x2 ≤ x1 ≤ 1, and 0 elsewhere.

Find

Cov(X1, X2) and the correlation coefficient ρ.

Solution

Cov(X1, X2) = E(X1X2) − E(X1)E(X2)

Now E(X1X2) = ∫_0^1 ∫_0^{x1} x1 x2 (3x1) dx2 dx1

= ∫_0^1 3x1² [x2²/2]_0^{x1} dx1

= (3/2) ∫_0^1 x1⁴ dx1 = (3/2)[x1⁵/5]_0^1 = 3/10 (check)

E(X1) = 3/4, E(X2) = 3/8

Then Cov(X1, X2) = 3/10 − (3/4)(3/8) = 3/160 ≈ 0.02 (check)

Theorem

Let Y1, Y2, ..., Yn and X1, X2, ..., Xm be random variables with E(Yi) = μi and E(Xj) = εj. Define

U1 = ∑_{i=1}^{n} ai Yi and U2 = ∑_{j=1}^{m} bj Xj

for constants a1, ..., an, b1, ..., bm. Then the following hold:

a) E(U1) = ∑_{i=1}^{n} ai μi

b) Var(U1) = ∑_{i=1}^{n} ai² Var(Yi) + 2 ∑∑_{i<j} ai aj Cov(Yi, Yj), where the double sum is over all pairs (i, j) with i < j

c) Cov(U1, U2) = ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj Cov(Yi, Xj)
i j =1

Proof

a) Follows from Probability and Statistics II:

E(U1) = E(∑_{i=1}^{n} ai Yi) = ∑_i ai E(Yi) = ∑_i ai μi

b) The variance is defined as

Var(U1) = E(U1²) − (E(U1))²

= E(∑_{i=1}^{n} ai Yi − ∑_{i=1}^{n} ai μi)²

= E(∑_{i=1}^{n} ai (Yi − μi))²

= E[∑_{i=1}^{n} ai² (Yi − μi)² + ∑∑_{i≠j} ai aj (Yi − μi)(Yj − μj)]

= ∑_{i=1}^{n} ai² E(Yi − μi)² + ∑∑_{i≠j} ai aj E[(Yi − μi)(Yj − μj)]

By the definitions of variance and covariance, we have

Var(U1) = ∑_i ai² Var(Yi) + ∑∑_{i≠j} ai aj Cov(Yi, Yj)

Note that Cov(Yi, Yj) = Cov(Yj, Yi); hence we can write

Var(U1) = ∑_i ai² Var(Yi) + 2 ∑∑_{i<j} ai aj Cov(Yi, Yj)

c) Using similar steps as in (b), we have

Cov(U1, U2) = E[(U1 − E(U1))(U2 − E(U2))]

= E[(∑_i ai Yi − ∑_i ai μi)(∑_{j=1}^{m} bj Xj − ∑_{j=1}^{m} bj εj)]

= E[(∑_i ai (Yi − μi))(∑_j bj (Xj − εj))]

= E[∑_{i=1}^{n} ∑_{j=1}^{m} ai bj (Yi − μi)(Xj − εj)]

= ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj E[(Yi − μi)(Xj − εj)]

= ∑_{i=1}^{n} ∑_{j=1}^{m} ai bj Cov(Yi, Xj)

Note:

Cov(Yi, Yi) = Var(Yi)
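The variance formula in (b) is easy to verify by simulation. The sketch below (an addition, not from the notes, with illustrative parameters) compares a Monte Carlo estimate of Var(U1) with the matrix form a′Σa of the theorem:

```python
# Sketch: Monte Carlo check of Var(U1) = sum a_i^2 Var(Y_i) + 2 sum_{i<j} a_i a_j Cov(Y_i, Y_j),
# written compactly as a' @ cov @ a, for correlated normal Y's.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.0, 0.4],
                [0.1, 0.4, 1.5]])
Y = rng.multivariate_normal(np.zeros(3), cov, size=200_000)
U1 = Y @ a
print(U1.var(), a @ cov @ a)  # the two values agree closely
```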

Bivariate Normal Distribution

In general, the multivariate normal density function is defined for k continuous random variables X1, X2, ..., Xk. For this unit we require k = 2 (the bivariate case), which is defined as

f(x1, x2) = [1/(2π σ1 σ2 √(1 − ρ²))] exp{−Q/2} for −∞ < x1 < ∞, −∞ < x2 < ∞, and 0 elsewhere,

i.e., written out in full,

f(x1, x2) = [1/(2π σ1 σ2 √(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [(x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²]}

for −∞ < x1 < ∞, −∞ < x2 < ∞, −∞ < μ1 < ∞, −∞ < μ2 < ∞, σ1, σ2 > 0 ……(*)
In matrix form, it can be written as

f(x) = [1/(2π |Σ|^{1/2})] exp{−(1/2)(x − μ)′ Σ^{−1} (x − μ)} ……(**)

where x = (x1, x2)′, −∞ ≤ xi ≤ ∞, i = 1, 2,

Σ is the 2×2 variance–covariance matrix of X, i.e. Σ = [[σ11, σ12], [σ12, σ22]], and μ = (μ1, μ2)′.

Question

Show that (**) is the same as (*)

Solution

Σ = [[σ11, σ12], [σ12, σ22]]

Σ^{−1} = 1/(σ11σ22 − σ12²) [[σ22, −σ12], [−σ12, σ11]]

But

ρ12 = ρ = σ12/√(σ11σ22) ⇒ σ12 = ρ12 √(σ11σ22)

Then

Σ^{−1} = 1/(σ11σ22(1 − ρ12²)) [[σ22, −σ12], [−σ12, σ11]]

The standardized squared distance becomes

(x − μ)′ Σ^{−1} (x − μ) = (x1 − μ1, x2 − μ2) · [1/(σ11σ22 − σ12²)] [[σ22, −ρ12√(σ11σ22)], [−ρ12√(σ11σ22), σ11]] · (x1 − μ1, x2 − μ2)′

= [1/(1 − ρ12²)] [(x1 − μ1)²/σ11 − 2ρ12 (x1 − μ1)(x2 − μ2)/√(σ11σ22) + (x2 − μ2)²/σ22] ……(***)

|Σ| = σ11σ22 − σ12² = σ11σ22(1 − ρ12²) ……(****)

Putting (***) and (****) in (**), we get

f(x1, x2) = [1/(2π √(σ11σ22(1 − ρ12²)))] exp{−[1/(2(1 − ρ12²))] [(x1 − μ1)²/σ11 − 2ρ12 (x1 − μ1)(x2 − μ2)/√(σ11σ22) + (x2 − μ2)²/σ22]}
Note:

If X1 and X2 are uncorrelated (ρ = 0), then the joint PDF of X1 and X2 can be written as the product of univariate normal densities: f(x1, x2) = f(x1) f(x2).

Here

Q = [1/(1 − ρ²)] [(x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2²]

The bivariate normal density is a function of five parameters: μ1, μ2, σ1², σ2² and ρ. This is usually denoted as (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ).
Assignment:

If (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ),

show that X1 ≈ N(μ1, σ1²), X2 ≈ N(μ2, σ2²), and that ρ is the correlation coefficient of X1 and X2.

Question

(a) Derive the marginal densities of X1and X2

(b) Find the conditional density function of X1 given X2 = x2

Joint Moment Generating Function

The moment generating function, discussed in probability and statistics II can be generalized to

k- dimensional random variables.

Definition

Let X = (X1, ..., Xk) be a vector of k random variables. The MGF of X, if it exists, is defined as

M_X(t) = E[exp(∑_{i=1}^{k} ti Xi)]

where t = (t1, ..., tk).

NOTE:

- The bivariate MGF has properties analogous to those of the univariate MGF.

- Mixed moments such as E(Xi^r Xj^s) are obtained by differentiating the joint MGF r times with respect to ti and s times with respect to tj and then setting all ti = tj = 0.

- The joint MGF also uniquely determines the joint distribution of the variables X1, ..., Xk.

- The MGFs of the marginal distributions can be obtained from the joint MGF, e.g.

M_x(t1) = M_{x,y}(t1, 0) and M_y(t2) = M_{x,y}(0, t2)

If M_{x,y}(t1, t2) exists, then the random variables X and Y are independent iff

M_{x,y}(t1, t2) = M_x(t1) M_y(t2)


Example

Suppose X and Y have joint density function f(x, y) = λ²e^{−λy} for 0 < x < y < ∞. Find the joint MGF.

Solution

M_{x,y}(t1, t2) = E(e^{t1X + t2Y}) = ∫_0^∞ ∫_0^y e^{t1x + t2y} λ² e^{−λy} dx dy

= λ² ∫_0^∞ e^{t2y} e^{−λy} [e^{t1x}/t1]_0^y dy

= (λ²/t1) ∫_0^∞ (e^{t1y} − 1) e^{−y(λ − t2)} dy

= (λ²/t1) ∫_0^∞ [e^{−y(λ − t1 − t2)} − e^{−y(λ − t2)}] dy

= (λ²/t1) [e^{−y(λ − t1 − t2)}/(−(λ − t1 − t2)) − e^{−y(λ − t2)}/(−(λ − t2))]_0^∞

= (λ²/t1) [(0 − 0) − (1/(−(λ − t1 − t2)) − 1/(−(λ − t2)))]

= (λ²/t1) [1/(λ − t1 − t2) − 1/(λ − t2)]

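The closed form can be spot-checked against direct numerical integration; a sketch (an addition, not from the notes, at illustrative parameter values):

```python
# Sketch: numeric spot-check of the boxed MGF at lam = 2, t1 = 0.3, t2 = 0.4.
import numpy as np
from scipy.integrate import dblquad

lam, t1, t2 = 2.0, 0.3, 0.4
num, _ = dblquad(lambda x, y: np.exp(t1*x + t2*y) * lam**2 * np.exp(-lam*y),
                 0, np.inf, 0, lambda y: y)  # outer y in (0, inf), inner x in (0, y)
closed = lam**2 / t1 * (1/(lam - t1 - t2) - 1/(lam - t2))
print(num, closed)  # both ~1.92
```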
Question

Let f(x, y) = k e^{−2x−5y} for x > 0, y > 0.

Find

1. k

2. Find the marginal densities of x and y

3. Find the conditional density of X given Y = 6.

4. Find E(X | Y = 1) and E(Y² | X = 2).

Question

Let f(x, y) = k(2x + y) for 0 < x < 1, 0 < y < 2.

Find

1. k

2. E ( x )

3. E ( y )

4. Var ( x )

5. Var ( y )

6. E(XY), and hence Cov(X, Y)

7. Correlation coefficient.

Question

A continuous random variable X has the probability density function:

f(x) = λx e^{−λx} for x > 0, and 0 elsewhere.

Find the moment generating function of X. (7 marks)

Solution


( )
M x (t ) = E e tx = ∫ e tx λxe −λx dx
0



− x ( λ −t )
⎡ xe − x (λ −t ) ∞ e − x (λ −t ) ⎤
= λ ∫ xe dx = λ ⎢ −∫ dx ⎥
0 ⎣ − (λ − t ) 0 − (λ − t ) ⎦ 0


⎡ xe − x (λ −t ) e − x (λ −t ) ⎤ ⎡ ⎛ 1 ⎞⎤
= λ⎢ − 2⎥
= λ ⎢ (0 − 0 ) − ⎜ 0 − ⎟⎥
⎣ − (λ − t ) (λ − t ) ⎦ 0 ⎢⎣

⎝ (λ − t )2 ⎟⎠⎦⎥

- 45 -
⎡ ⎛ 1 ⎞⎤ λ
= λ ⎢− ⎜⎜ 0 − ⎟ =
2 ⎟⎥
⎣⎢ ⎝ (λ − t ) ⎠⎦⎥ (λ − t )2

Conditional Expectations

Conditional Mean and Variance

Let X, Y be random variables with joint PMF or PDF f(x, y).

E[X | Y] = ∑_x x f(x | y) for X, Y discrete

E[X | Y] = ∫_{−∞}^{∞} x f(x | y) dx for X, Y continuous

E[X | Y] is called the conditional mean of X given Y. Also,

Var[X | Y] = E[X² | Y] − (E[X | Y])²

= ∑_x x² f(x | y) − (∑_x x f(x | y))² for X, Y discrete

= ∫_{−∞}^{∞} x² f(x | y) dx − (∫_{−∞}^{∞} x f(x | y) dx)² for X, Y continuous

This expression is called the conditional variance of X given Y.

Theorem 1

E [E ( X Y )] = E [X ]

Proof (for discrete case)

E[E(X | Y)] = ∑_y [∑_x x f(x | y)] f(y)

= ∑_y ∑_x x f(x | y) f(y)

= ∑_y ∑_x x [f(x, y)/f(y)] f(y) = ∑_y ∑_x x f(x, y)

= ∑_x x ∑_y f(x, y) = ∑_x x f(x) = E[X]

Theorem 2

E [X E (Y X )] = E [ XY ]

Proof (for discrete case)

Definition

If X1 and X2 are any two random variables, the conditional expectation of X1 given X2 = x2 is defined as

E[X1 | X2 = x2] = ∑_{x1} x1 P(x1 | x2) if X1 and X2 are discrete,

and

E[X1 | X2 = x2] = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1 if X1 and X2 are continuous.

Example:

Suppose X1and X2 are random variables with joint PDF given by

f(x1, x2) = 1/2 for 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2, and 0 elsewhere.

Find

1. The conditional expectation of X1 given that X2=1.

2. The MGF of X1 and X2

3. The MGF of X1 given X2=1

Solution

First find f(x2):

f(x2) = ∫_0^{x2} (1/2) dx1 = [x1/2]_0^{x2} = x2/2

f(x1 | x2) = 1/x2 for 0 ≤ x1 ≤ x2 ≤ 2, and 0 elsewhere.

Then

E(X1 | X2 = 1) = ∫_{−∞}^{∞} x1 f(x1 | x2 = 1) dx1 = ∫_0^1 x1 (1) dx1

= [x1²/2]_0^1 = 1/2, because x2 = 1 < 2.

Theorem

If X and Y are jointly distributed random variables and h(x, y) is a function, then

E[h(X, Y)] = E_X(E[h(X, Y) | X])

The theorem says that a joint expectation, such as the one on the left side of the equation, can be found by first finding the conditional expectation E[h(X, Y) | X] and then taking its expectation relative to the marginal distribution of X.

Theorem

If X and Y are jointly distributed random variables and g(x) is a function, then

E[g(X) Y | X] = g(X) E(Y | X)

Example

If (X, Y) ≈ MULT(n, p1, p2), find Cov(X, Y).

Solution

By straightforward derivation we have (show this):

X ≈ BIN(n, p1), Y ≈ BIN(n, p2), and conditional on X = x,

Y | x ≈ BIN(n − x, p), where p = p2/(1 − p1)

Note

E(Y | X) = (n − X) p2/(1 − p1)

Using the latter theorem,

E(XY) = E(E(XY | X)) = E[X E(Y | X)]

= E[X(n − X) p2/(1 − p1)]

= [p2/(1 − p1)] [n E(X) − E(X²)] ……(*)

Now E(X) = np1 and E(X²) = Var(X) + (np1)² = np1(1 + (n − 1)p1)

Therefore (*) becomes

E(XY) = n(n − 1) p1 p2

Thus Cov(X, Y) = E(XY) − E(X)E(Y)

= n(n − 1) p1 p2 − (np1)(np2)

= −n p1 p2

Example

If μ1 = E(X), μ2 = E(Y) and E(Y | x) is a linear function of x, show that

E(Y | x) = μ2 + ρ(σ2/σ1)(x − μ1) and E_X(Var(Y | x)) = σ2²(1 − ρ²)
δ1

Solution

Suppose E(Y | x) = ax + b. Then

μ2 = E(Y) = E_X(E(Y | x)) = E_X(aX + b) = aμ1 + b

Now

a = Cov(X, Y)/Var(X) = σ_{XY}/σ1² = ρσ2/σ1, where σ_{XY} = ρσ1σ2, and

b = E(Y) − aE(X) = μ2 − ρ(σ2/σ1)μ1

Then

E(Y | x) = ax + b = ρ(σ2/σ1)x + μ2 − ρ(σ2/σ1)μ1 = μ2 + ρ(σ2/σ1)(x − μ1)

(To see that a = σ_{XY}/σ1², note that

σ_{XY} = E[(X − μ1)(Y − μ2)] = E[(X − μ1)Y] − 0

= E_X{E[(X − μ1)Y | X]} = E_X[(X − μ1)E(Y | X)] = E_X[(X − μ1)(aX + b)] = aσ1².)

For the conditional variance,

E_X[Var(Y | X)] = E_X{E(Y² | X) − (E(Y | X))²}

= E(Y²) − E_X{(E(Y | X))²}

= E(Y²) − (E(Y))² − {E_X[(E(Y | X))²] − (E(Y))²}

= Var(Y) − Var_X(E(Y | X))

= Var(Y) − Var_X[μ2 + ρ(σ2/σ1)(X − μ1)]

= Var(Y) − ρ²(σ2²/σ1²)σ1²

= σ2²(1 − ρ²)
Theorem

Let X, Y be random variables with E(X) = μ1 and E(Y) = μ2. If E(Y | x) is a linear function of x, show that

E(Y | x) = μ2 + ρ(σ2/σ1)(x − μ1) and E_X(Var(Y | x)) = σ2²(1 − ρ²)

Proof

If E(Y | x) = ax + b, then

μ2 = E(Y) = E_X(E(Y | x)) = E_X(aX + b) = aμ1 + b, and

σ_{XY} = E[(X − μ1)(Y − μ2)] = E[(X − μ1)Y] − 0

= E_X{E[(X − μ1)Y | X]} = E_X[(X − μ1)E(Y | X)] = E_X[(X − μ1)(aX + b)] = aσ1²

Thus

a = Cov(X, Y)/Var(X) = σ_{XY}/σ1² = ρσ2/σ1 and b = E(Y) − aE(X) = μ2 − ρ(σ2/σ1)μ1

i.e. E(Y | x) = ax + b = ρ(σ2/σ1)x + μ2 − ρ(σ2/σ1)μ1 = μ2 + ρ(σ2/σ1)(x − μ1)

NOTE:

E(Y | X) is sometimes referred to as the regression function, i.e. E(Y | x) = ax + b; multiple regression has an analogous expression.

Conditional Distribution for Bivariate Normal Random Variables

Theorem:

If (X1, X2) ≈ BVN(μ1, μ2, σ1², σ2², ρ), then

i) conditional on X1 = x1,

X2 | X1 = x1 ≈ N(μ2 + ρ(σ2/σ1)(x1 − μ1), σ2²(1 − ρ²))

ii) conditional on X2 = x2,

X1 | X2 = x2 ≈ N(μ1 + ρ(σ1/σ2)(x2 − μ2), σ1²(1 − ρ²))

Show this!

Note

E(X1 | X2) = μ1 + ρ(σ1/σ2)(X2 − μ2) is sometimes referred to as the regression function of X1 on X2.

Example

Suppose X1, X2 have the bivariate normal distribution with parameters

μ1 = μ2 = 2, σ1 = σ2 = 2 and ρ = 3/5.

Calculate

i. P(X1 > 4)

ii. P(X1 > 4 | X2 = 3)
Solution

i. X1 is distributed as N(2, 2²), so

P(X1 > 4) = P((X1 − 2)/2 > (4 − 2)/2) = P(Z > 1) = ?

ii. The conditional mean is

μ = μ1 + ρ(σ1/σ2)(x2 − μ2) = 2 + (3/5)(2/2)(3 − 2) = 2.6

and the conditional standard deviation is

σ = σ1 √(1 − ρ²) = 2 √(1 − 9/25) = 1.6

Hence the distribution of X1 given X2 = 3 is N(2.6, 1.6²), and

P(X1 > 4 | X2 = 3) = P((X1 − 2.6)/1.6 > (4 − 2.6)/1.6 | X2 = 3) = P(Z > 0.875) = ?

Change of Variable Technique (For Continuous Case)

Suppose we are given the joint PDF of X1, X2, ..., Xp and we wish to determine the joint distribution of Y1 = g1(X1, ..., Xp), ..., Yr = gr(X1, ..., Xp), where r is some integer with 1 ≤ r ≤ p. If r < p, we introduce additional new random variables

Y_{r+1} = g_{r+1}(X1, ..., Xp), ..., Yp = gp(X1, ..., Xp)

Then we find the joint distribution of Y1, ..., Yp, and finally find the marginal distribution of Y1, ..., Yr. The transformation

Y1 = g1(X1, ..., Xp), Y2 = g2(X1, ..., Xp), ..., Yp = gp(X1, ..., Xp)

has a solution which can be written as

X1 = w1(Y1, ..., Yp), ..., Xp = wp(Y1, ..., Yp)
The Jacobian of the transformation is the determinant

J = | ∂X1/∂Y1  ∂X1/∂Y2  ...  ∂X1/∂Yp |
    |   ...      ...    ...    ...   |
    | ∂Xp/∂Y1  ∂Xp/∂Y2  ...  ∂Xp/∂Yp |

The joint PDF of Y1, ..., Yp is given by

g(y1, ..., yp) = f(w1(y), w2(y), ..., wp(y)) |J| for y ∈ R^p, and 0 elsewhere,

where y = (y1, y2, ..., yp)′.

Note

In this unit, p = 2 (bivariate), and therefore we have

g(y1, y2) = f(w1(y1, y2), w2(y1, y2)) |J| for (y1, y2) ∈ R², and 0 elsewhere.

Example

Suppose the joint distribution of X1 and X2 is given by

f(x1, x2) = 2(1 − x1) for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

i) Find the density function of the variable U = X1X2.

ii) Hence or otherwise find E(U) and Var(U).

Solution

U = X1X2. Let V = X1; then X1 = V and X2 = U/V.

The Jacobian of the transformation is

J = | ∂X1/∂V  ∂X1/∂U |  =  |    1     0   |  =  1/V
    | ∂X2/∂V  ∂X2/∂U |     | −U/V²   1/V  |

The joint PDF of V and U is

f(v, u) = f(w1(v, u), w2(v, u)) |J| = 2(1 − v)(1/v) for 0 ≤ u ≤ v ≤ 1, and 0 elsewhere.

The PDF of U is

f_U(u) = ∫_{−∞}^{∞} f(v, u) dv = ∫_u^1 2(1 − v)(1/v) dv = 2 ∫_u^1 (1/v − 1) dv

= 2[ln v − v]_u^1 = 2{(ln 1 − 1) − (ln u − u)}

= 2(u − ln u − 1) for 0 ≤ u ≤ 1, and 0 elsewhere.

E(U) = ∫_{−∞}^{∞} u f_U(u) du = ∫_0^1 2u(u − ln u − 1) du

= 2{∫_0^1 u² du − ∫_0^1 u ln u du − ∫_0^1 u du}

= 2{[u³/3]_0^1 − ∫_0^1 u ln u du − [u²/2]_0^1}

By integration by parts, the middle integral becomes

∫_0^1 u ln u du = [u²(ln u)/2]_0^1 − ∫_0^1 (u²/2)(1/u) du = [0 − u²/4]_0^1 = −1/4

Thus

E(U) = 2[1/3 − (−1/4) − 1/2] = 2(1/12) = 1/6
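A quick Monte Carlo sanity check of E(U) (an addition, not from the notes): because f(x1, x2) = 2(1 − x1) · 1 factorises, X1 ~ Beta(1, 2) and X2 ~ Uniform(0, 1) are independent here, so U can be simulated directly:

```python
# Sketch: Monte Carlo check of E(U) = 1/6 for U = X1*X2.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.beta(1, 2, size=1_000_000)   # density 2(1 - x) on [0, 1]
x2 = rng.uniform(size=1_000_000)
print((x1 * x2).mean())  # ~0.1667
```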

Question

Let X1and X2 have a joint PDF given by

f(x1, x2) = 2x1 for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 elsewhere.

Find the PDF of Y1 = X 1 X 2

Hence or otherwise, find E (Y1 )

Question

Suppose X1and X2 have a joint PDF given by

f(x1, x2) = e^{−(x1 + x2)} for 0 ≤ x1, 0 ≤ x2, and 0 elsewhere.

Find the PDF of Y1 = X 1 + X 2

T –Distribution

Recall from SMA 2230:

If X ≈ N(μ, σ²), then Z = (X − μ)/σ.

If X̄ is the mean of a sample X1, X2, ..., Xn drawn randomly from a normal population with mean μ and variance σ², then

Z = (X̄ − μ)/(σ/√n)

If the variance of the population is unknown and n is large, then we replace σ by S, where

S² = ∑(Xi − X̄)²/(n − 1), i.e. Z ≈ (X̄ − μ)/(S/√n)

If n is small and σ is unknown, then we have a t-distributed random variable with n − 1 degrees of freedom, i.e.

t = (X̄ − μ)/(S/√n)

Note

If X1, X2, ..., Xn is a random sample from a normal population with E(Xi) = μ, then to test the hypothesis

1. H0: μ = μ0 (specific value) against H1: μ > μ0 (upper tail), or

2. H0: μ = μ0 against H1: μ < μ0 (lower tail), or

3. H0: μ = μ0 against H1: μ ≠ μ0 (two-tailed),

we calculate

t = (X̄ − μ0)/(S/√n) and

1. reject H0 in (1) if t > t_{α, n−1};

2. reject H0 in (2) if t < −t_{α, n−1}; and

3. reject H0 in (3) if |t| > t_{α/2, n−1} (two-tailed rejection region).

Comparing Means of Two Normal Population

Suppose that independent random samples are selected from each of two normal populations: X11, X12, ..., X1n1 from the first and X21, X22, ..., X2n2 from the second, where the mean and variance of the i-th population are μi and σi², i = 1, 2. Further assume that X̄i and Si², i = 1, 2, are the corresponding sample means and variances.

X̄1 = (1/n1) ∑_{i=1}^{n1} X_{1i} and X̄2 = (1/n2) ∑_{i=1}^{n2} X_{2i}

The unbiased estimate of the (common) variance is obtained by pooling the sample data:

S² = [∑_{i=1}^{n1} (X_{1i} − X̄1)² + ∑_{i=1}^{n2} (X_{2i} − X̄2)²]/(n1 + n2 − 2)

= [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2)

The test statistic in this case is given by

t = [(X̄1 − X̄2) − (μ1 − μ2)]/(S √(1/n1 + 1/n2))

This has a Student's t distribution with n1 + n2 − 2 d.f.

To test the null hypothesis H0: μ1 − μ2 = D0 for some fixed value D0, it follows that if H0 is true, then the test statistic

t = [(X̄1 − X̄2) − D0]/(S √(1/n1 + 1/n2))

has a t distribution with n1 + n2 − 2 degrees of freedom.

Example

In an experiment to test two procedures, the following information was obtained

                        Standard procedure    New procedure
n                           n1 = 9               n2 = 9
Mean (seconds)           X̄1 = 35.22           X̄2 = 31.56
∑(Xi − X̄)²                 195.56               160.22
Test the hypothesis that the two populations have the same mean.

Take α = 0.05 Level of significance.

Solution

H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0.

The test statistic is

t = [(X̄1 − X̄2) − D0]/(S √(1/n1 + 1/n2)), with D0 = 0.

Now

S² = [∑(X_{1i} − X̄1)² + ∑(X_{2i} − X̄2)²]/(n1 + n2 − 2)

= (195.56 + 160.22)/(9 + 9 − 2) = 22.24

⇒ t = (35.22 − 31.56)/(4.71 √(1/9 + 1/9)) = 1.65

The tabulated t value is t_{0.025, 16} = 2.120. Since t calculated = 1.65 < 2.120, we do not reject H0; there is insufficient evidence to indicate a difference between the two procedures.
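The same test can be run from the summary statistics with scipy; a sketch (an addition, not from the notes):

```python
# Sketch: pooled two-sample t-test from the summary statistics above.
from scipy.stats import ttest_ind_from_stats

s1 = (195.56 / 8) ** 0.5   # sample standard deviations from the sums of squares
s2 = (160.22 / 8) ** 0.5
t, p = ttest_ind_from_stats(mean1=35.22, std1=s1, nobs1=9,
                            mean2=31.56, std2=s2, nobs2=9)
print(t, p)  # t ~ 1.65, p > 0.05: do not reject H0
```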

Question

The strength of concrete depends, to some extent, on the method used for drying. Two different

drying methods showed the following results for independently tested specimens

              Method 1     Method 2
n              n1 = 7      n2 = 10
Mean         X̄1 = 3250    X̄2 = 3240
Std. dev.     S1 = 210     S2 = 190

1. Do the methods appear to produce concrete with different mean strengths? Use α = 0.05.

2. Does method 1 produce stronger concrete than method 2?

Paired T-Test

The paired t-test is based on the differences; the procedure is the same as for the univariate t-test.

Example

An industry, in deciding whether to purchase a machine of design A or B, checks the time for completing a certain task on each machine. Nine technicians were used in the experiment, with each technician using both machine A and machine B in a randomized order. The times (in seconds) to completion of the task are given in the table below.

Technicians 1 2 3 4 5 6 7 8 9

A 327.6 327.7 327.7 327.9 327.4 327.7 327.8 327.8 327.4

B 327.6 327.7 327.6 327.8 327.4 327.6 327.8 327.7 327.3

Test if there is a significant difference between the completion times at the 5% significance

level.

Solution

In a paired t-test we use the differences:

Sample        1    2    3    4    5    6    7    8    9
di = xA − xB  0    0   0.1  0.1   0   0.1   0   0.1  0.1

Now the hypothesis is

H0: μd = 0 versus H1: μd ≠ 0.

If the differences are normally distributed, the test statistic is

t_d = (d̄ − μd)/(s_d/√n)

where d̄ = ∑di/n = 0.5/9 = 0.056

s_d² = ∑_{i=1}^{n} (di − d̄)²/(n − 1) = 0.002778, so s_d = 0.053.

Then

t_d = 0.056/(0.053/√9) = 3.17

The tabulated value (two-tailed) is t_{0.025, 8} = 2.306.

Because t calculated = 3.17 > 2.306, we reject the null hypothesis that μA − μB = 0.

Conclusion:

The two machines have different mean responses.
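The paired test can be reproduced directly from the raw data; a sketch (an addition, not from the notes):

```python
# Sketch: the paired t-test computed directly from the two samples.
from scipy.stats import ttest_rel

a = [327.6, 327.7, 327.7, 327.9, 327.4, 327.7, 327.8, 327.8, 327.4]
b = [327.6, 327.7, 327.6, 327.8, 327.4, 327.6, 327.8, 327.7, 327.3]
t, p = ttest_rel(a, b)
print(t, p)  # t ~ 3.16, p < 0.05: reject H0
```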

Question:

Test if machine A takes longer than machine B

Question

Consider an experiment to test the effects of a particular drug on the human pulse rate. Six subjects are chosen and their pulse rates measured both before and after the treatment, with the following results.

Subjects 1 2 3 4 5 6

Before 73 69 70 64 69 66

After 78 73 70 69 68 72

Do the pulse rates taken after the stimulus differ significantly from those taken before it?

Take α = 0.05 .

Solution

Hypothesis to be tasted is H 0 : μ d = 0 vs. H1 : μ d ≠ 0 , where μ d is the difference between the

two pulses.

xd 5 4 0 5 -1 6

1 6 (5 + 4 + ... + 6) = 3.17
xd = ∑
6 1
xi =
6

2
sd =
1 6

5 1 5
[ ]
(xi − x )2 = 1 (5 − 3.17 )2 + ⋅ ⋅ ⋅ + (6 − 3.17 )2 = 2.932

s d = 2.93

The test statistic is t =


(x d − 0) 3.17
= = 2.64
sd / n 2.93 / 6

The tabulated value, t5 ,0.025 = 2.571 since t=2.64> t5 ,0.025 = 2.571 reject H 0 .

Conclusion:

The effect of the treatment on pulse rates is significant. It is reasonable to conclude that there

has been an increase in pulse rate after taking the drug.

Chi- Square Distribution

Assume that we have a random sample X1, X2, ..., Xn from a normal distribution with unknown mean μ and unknown variance σ². Suppose we wish to test

H0: σ² = σ0² for some fixed value σ0², versus H1: σ² ≠ σ0².

Then, under H0,

X² = (n − 1)S²/σ0²

has a χ² distribution with n − 1 degrees of freedom.

Suppose X1, X2, ..., Xn is a random sample from a normal distribution with E(Xi) = μ and Var(Xi) = σ². To test the hypothesis

i. H0: σ² = σ0² against H1: σ² > σ0² (upper tail), or

ii. H0: σ² = σ0² against H1: σ² < σ0² (lower tail), or

iii. H0: σ² = σ0² against H1: σ² ≠ σ0²,

calculate the test statistic

X² = (n − 1)S²/σ0²

For (i), reject H0 if X² calculated > χ²_{α, n−1}.

For (ii), reject H0 if X² calculated < χ²_{1−α, n−1} (lower tail).

For (iii), reject H0 if X² calculated > χ²_{α/2, n−1} or X² < χ²_{1−α/2, n−1}.

Example

A machine engine part produced by a company is claimed to have diameter variance no larger

than 0.0002(diameter measured in inches). A random sample of 10 parts gave a sample

variance of 0.0003. Test, at the 5% level, H 0 : σ 2 = 0.0002 against H1 : σ 2 > 0.0002 .

Solution

Assume the measured diameters are normally distributed. The test statistic is

X² = (n − 1)s²/σ0² = 9(0.0003)/0.0002 = 13.5

The tabulated value is χ²_{0.05, 9} = 16.919. Since X² calculated = 13.5 < 16.919, we do not reject H0.

Conclusion:

There is not enough evidence to indicate that σ² exceeds 0.0002.
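The statistic and critical value can be computed directly; a sketch (an addition, not from the notes):

```python
# Sketch: the variance test statistic and its 5% critical value via scipy.
from scipy.stats import chi2

n, s2, sigma0_sq = 10, 0.0003, 0.0002
X2 = (n - 1) * s2 / sigma0_sq
print(X2, chi2.ppf(0.95, df=n - 1))  # 13.5 < 16.919: do not reject H0
```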

Question

An experimenter was convinced that his measuring equipment possessed variability which resulted in a standard deviation of 2. Sixteen measurements resulted in a value of S² = 6.1. Do the data disagree with his claim? Take α = 0.01. What would you conclude if you chose α = 0.05?

F-Test

We may be interested in comparing the variances of two normal distributions, often to determine whether or not they are equal. This problem is encountered when comparing the precision of two measuring instruments, the variation in quality characteristics of a manufactured product, or the variation in scores for two testing procedures. For example, suppose x11, x12, ..., x1n1 and x21, x22, ..., x2n2 are independent random samples from normal distributions with unknown means and Var(X1i) = σ1², Var(X2i) = σ2², where σ1² and σ2² are unknown. Suppose we are interested in testing the null hypothesis H0: σ1² = σ2² against the alternative hypothesis H1: σ1² > σ2². Now, σ1² and σ2² can be estimated by S1² and S2² respectively.

We would reject H0 in favour of H1 if S1² is much larger than S2², i.e. reject H0 if

F = [(n1 − 1)s1²/σ1²]/(n1 − 1) ÷ [(n2 − 1)s2²/σ2²]/(n2 − 1) ……(***)

= (s1²/s2²)(σ2²/σ1²) > k

where k depends upon the probability distribution of the statistic s1²/s2².

Note that (n1 − 1)s1²/σ1² and (n2 − 1)s2²/σ2² are independent chi-square random variables. Therefore *** has an F-distribution with n1 − 1 numerator degrees of freedom and n2 − 1 denominator degrees of freedom. Under the null hypothesis σ1² = σ2², F = s1²/s2² has an F-distribution with n1 − 1 numerator d.f. and n2 − 1 denominator d.f.

The rejection region becomes F > F_{α, n1−1, n2−1}, i.e. k = F_{n1−1, n2−1, α}.

Example

Consider two random samples, X1 and X2 of sizes 10 and 20 with sample variances given as

0.0003 and 0.0001 respectively. Assuming that the populations, from which the samples have

been drawn, are normal, determine whether the variance of the first population is significantly

greater than the second one. Take a = 0.05 .

Solution

Let σ1² and σ2² denote the variances of the first and second populations from which the samples were taken. Then the hypothesis to be tested is H0: σ1² = σ2² against H1: σ1² > σ2².

The test statistic is

F = s1²/s2², based on v1 = 9 and v2 = 19 d.f.

Now

F = s1²/s2² = 0.0003/0.0001 = 3

The tabulated value is F_{9, 19, 0.05} = 2.42.

Since F calculated = 3 > 2.42, we reject the null hypothesis.

Conclusion

The variation of the first population is greater than the second one.

Note:

1. If X₁, X₂, X₃, ..., Xₙ is a random sample of size n from a normal distribution with mean μ and variance σ², then

X̄ = (1/n) Σᵢ Xᵢ is normally distributed with mean μ and variance σ²/n.

(Show that.)

2. Suppose X₁, X₂, X₃, ..., Xₙ is as defined in 1. Then Zᵢ = (Xᵢ − μ)/σ, i = 1, 2, ..., n, are independent standard normal random variables, and Σᵢ Zᵢ² = Σᵢ ((Xᵢ − μ)/σ)² has a chi-square distribution with n d.f.

3. Let Z be a standard normal random variable and let χ²ᵥ be a chi-square random variable with v d.f. Then, if Z and χ²ᵥ are independent,

T = Z / √(χ²ᵥ / v) has a t-distribution with v d.f.

4. Let χ₁² and χ₂² be chi-square random variables with v₁ and v₂ d.f respectively. Then, if χ₁² and χ₂² are independent,

F = (χ₁² / v₁) / (χ₂² / v₂) is said to have an F-distribution with v₁ numerator d.f and v₂ denominator d.f.
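Notes 3 and 4 can be checked by simulation: building T and F from their defining ratios should reproduce the tabulated quantiles. A minimal sketch (an addition to the notes, assuming NumPy; the degrees of freedom and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim, v, v1, v2 = 200_000, 7, 9, 19

# Note 3: T = Z / sqrt(chi^2_v / v) should follow a t-distribution with v d.f.
z = rng.standard_normal(n_sim)
t_sample = z / np.sqrt(rng.chisquare(v, n_sim) / v)

# Note 4: F = (chi^2_{v1}/v1) / (chi^2_{v2}/v2) should follow F(v1, v2)
f_sample = (rng.chisquare(v1, n_sim) / v1) / (rng.chisquare(v2, n_sim) / v2)

print(np.quantile(t_sample, 0.95))   # close to t_{0.05, 7} = 1.895
print(np.quantile(f_sample, 0.95))   # close to F_{0.05; 9, 19} = 2.42
```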

Question

Consider two random samples X1 and X2 of sizes 9 and 5 with sample variances 115 and 24

respectively.

Assuming that the populations, from which the samples have been drawn, are normal,

determine whether the samples could have come from populations with a common variance.

Question

Eight students took two complete science practicals in successive weeks and obtained the following marks out of 20.

Students First Practical Second Practical

1 12 11

2 12 11

3 13 15

4 10 11

5 12 12

6 14 10

7 13 14

8 10 12

Assuming that the marks are normally distributed, carry out a paired sample t-test to determine

whether there is a significant difference between performance in the first and second practical.
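For reference, a sketch of this paired t-test with SciPy (an addition to the notes; ttest_rel works on the paired differences):

```python
import numpy as np
from scipy.stats import ttest_rel

first  = np.array([12, 12, 13, 10, 12, 14, 13, 10])
second = np.array([11, 11, 15, 11, 12, 10, 14, 12])

# Paired t-test on the differences first - second; for these marks the
# differences sum to zero, so t = 0 and the p-value is 1: no evidence
# of a difference between the two practicals.
t_stat, p_value = ttest_rel(first, second)
print(t_stat, p_value)
```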

Question

For the random variables X and Y, the covariance matrix is

Σ = [  25   −12 ]
    [ −12    16 ]

Determine the standard deviation of X and of Y, and also the correlation coefficient between X and Y.
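For this question, the standard deviations are the square roots of the diagonal entries and ρ = Cov(X, Y)/(σ_X σ_Y); a short NumPy check, added here for illustration:

```python
import numpy as np

cov = np.array([[25.0, -12.0],
                [-12.0, 16.0]])

sd_x, sd_y = np.sqrt(np.diag(cov))        # 5.0 and 4.0
rho = cov[0, 1] / (sd_x * sd_y)           # -12 / 20 = -0.6
print(sd_x, sd_y, rho)
```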

Ordered Statistics

Let X₁, X₂, X₃, ..., Xₙ denote independent continuous random variables with distribution function F(x) and density function f(x).

Denote the ordered random variables Xᵢ by X(1), X(2), X(3), ..., X(n), where X(1) ≤ X(2) ≤ X(3) ≤ .... ≤ X(n) (for continuous random variables, equality signs can be ignored), i.e.

X(1) = min(X₁, X₂, X₃, ..., Xₙ) is the minimum of the Xᵢ's, and X(n) = max(X₁, X₂, X₃, ..., Xₙ) is the maximum of the Xᵢ's.

Question

Find the probability density function for X(n)

Solution

Because X(n) is the maximum of X₁, X₂, X₃, ..., Xₙ, the event (X(n) ≤ x) will occur iff the event (Xᵢ ≤ x) occurs for every i = 1, 2, 3, ..., n, i.e.

P(X (n) ≤ x) = P ( X 1 ≤ x, X 2 ≤ x, X 3 ≤ x ,...... X n ≤ x)

Since the Xᵢ's are independent and P(Xᵢ ≤ x) = F(x) for i = 1, 2, 3, ..., n, it follows that

P(X(n) ≤ x) = P(X₁ ≤ x) P(X₂ ≤ x) P(X₃ ≤ x) ...... P(Xₙ ≤ x)

            = [F(x)]ⁿ

Let f_n(x) denote the density of X(n). Since f(x) = dF(x)/dx, taking derivatives on both sides we get

f_n(x) = n[F(x)]^(n−1) f(x)

Question

Find the density function for X(1)

Solution

Because X(1) is the minimum of X₁, X₂, X₃, ..., Xₙ, the event (X(1) > x) occurs iff the events (Xᵢ > x) occur for i = 1, 2, 3, ..., n. Because the Xᵢ are independent and

P(Xᵢ > x) = 1 − P(Xᵢ ≤ x) = 1 − F(x), for i = 1, 2, 3, ..., n, then

P(X (1 ) ≤ x) = 1 − P ( X 1 > x, X 2 > x, X 3 > x ,......, X n > x)

= 1 − P ( X 1 > x)P ( X 2 > x)P ( X 3 > x )...... P ( X n > x)

= 1 − [1 − F ( x )][1 − F ( x )][1 − F ( x )]......... [1 − F ( x )]

= 1 − [1 − F(x)]ⁿ

Let f₁(x) denote the density of X(1); then, differentiating both sides, we get

f₁(x) = n[1 − F(x)]^(n−1) f(x)
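Both formulas are easy to verify by simulation. For a Uniform(0, 1) sample, F(x) = x, so P(X(n) ≤ x) = xⁿ and P(X(1) ≤ x) = 1 − (1 − x)ⁿ; the sketch below (an illustrative addition, not part of the notes) compares these with empirical frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sim = 5, 200_000

# n independent Uniform(0,1) observations per replication
u = rng.uniform(size=(n_sim, n))
maxima, minima = u.max(axis=1), u.min(axis=1)

# Theoretical CDFs at x = 0.5: P(X_(n) <= x) = x^n, P(X_(1) <= x) = 1 - (1-x)^n
print(np.mean(maxima <= 0.5), 0.5 ** n)        # both near 0.03125
print(np.mean(minima <= 0.5), 1 - 0.5 ** n)    # both near 0.96875
```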

Example:

A computer component has length of life X, measured in hours, with probability density function

f(x) = (1/100) e^(−x/100),  x > 0
     = 0,  elsewhere

Suppose that two such components operate in parallel, i.e. the computer does not fail until both

components fail.

Find

1. The density function of Y, the length of life of the computer

2. The median of Y

Solution

Now Y = max(X₁, X₂) and f_Y(x) = n[F(x)]^(n−1) f(x).

But f(x) = (1/100) e^(−x/100)

⇒ F(x) = ∫₀ˣ (1/100) e^(−t/100) dt

       = [−e^(−t/100)]₀ˣ

       = 1 − e^(−x/100)

The density function of Y is

f_Y(x) = n[F(x)]^(n−1) f(x)

       = 2[1 − e^(−x/100)] (1/100) e^(−x/100),  x > 0

       = (1/50)(e^(−x/100) − e^(−x/50)),  x > 0
       = 0,  elsewhere

2. To find the median, we first find the distribution function of Y.

F_Y(x) = ∫₀ˣ (1/50)(e^(−t/100) − e^(−t/50)) dt

       = (1/50){∫₀ˣ e^(−t/100) dt − ∫₀ˣ e^(−t/50) dt}

       = (1/50){[−100 e^(−t/100)]₀ˣ − [−50 e^(−t/50)]₀ˣ}

       = (1/50){(−100 e^(−x/100) + 100) − (−50 e^(−x/50) + 50)}

       = (1/50){50 + 50 e^(−x/50) − 100 e^(−x/100)}

       = 1 + e^(−x/50) − 2 e^(−x/100),  x > 0

Let x₀.₅ denote the median; then

0.5 = 1 + e^(−x₀.₅/50) − 2 e^(−x₀.₅/100)

Let m = e^(−x₀.₅/100), so that e^(−x₀.₅/50) = m². Then

0.5 = 1 + m² − 2m

⇒ m² − 2m + 0.5 = 0

m = [−b ± √(b² − 4ac)] / 2a

  = [2 ± √(4 − 4(1)(0.5))] / 2

  = [2 ± √2] / 2 = 1.707 or 0.293

Since m = e^(−x₀.₅/100) must lie between 0 and 1, take m = 0.293, so

x₀.₅ = −100 ln(0.293) ≈ 122.76 hours
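The median can also be obtained numerically by solving F_Y(x) = 0.5; a sketch using SciPy's root finder, added here to check the hand computation:

```python
import math
from scipy.optimize import brentq

# CDF of the parallel-system lifetime derived above
def F(x):
    return 1 + math.exp(-x / 50) - 2 * math.exp(-x / 100)

median = brentq(lambda x: F(x) - 0.5, 1, 1000)
print(median)  # about 122.8; the hand value 122.76 reflects rounding m to 0.293
```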

Question

A computer component has length of life X, measured in hours, with probability density function

f(x) = (1/100) e^(−x/100),  x > 0
     = 0,  elsewhere

Suppose that two such components operate independently and in series in a certain system, i.e. the system fails when either component fails.

Find

1. The density function for Y, the length of life of the system

2. The median of Y

Solution

Because the system fails at the first component failure, Y = min(X₁, X₂), where X₁ and X₂ are independent random variables with the given density. Then

f_Y(x) = n[1 − F(x)]^(n−1) f(x)

       = 2[1 − (1 − e^(−x/100))] (1/100) e^(−x/100)

       = (1/50) e^(−x/50),  x > 0
       = 0,  elsewhere

2. Since F_Y(x) = 1 − e^(−x/50), the median of Y satisfies 1 − e^(−x₀.₅/50) = 0.5, giving x₀.₅ = 50 ln 2 ≈ 34.66 hours.

Revision questions for Probability and Statistics III

1. Let Y₁ and Y₂ be random variables with means and variances (μ₁, σ₁²) and (μ₂, σ₂²) respectively, and correlation coefficient ρ. Show that

E_{Y₂}(Var(Y₁ | Y₂)) = σ₁²(1 − ρ²).

Solution

E_{Y₂}(Var(Y₁ | Y₂)) = E_{Y₂}{E(Y₁² | Y₂) − [E(Y₁ | Y₂)]²}

= E(Y₁²) − E_{Y₂}{[E(Y₁ | Y₂)]²}

= E(Y₁²) − [E(Y₁)]² − {E_{Y₂}[E(Y₁ | Y₂)]² − [E_{Y₂}(E(Y₁ | Y₂))]²}

= Var(Y₁) − Var_{Y₂}[E(Y₁ | Y₂)]

= Var(Y₁) − Var_{Y₂}{μ₁ + ρ (σ₁/σ₂)(Y₂ − μ₂)},  using the conditional mean of the bivariate normal distribution,

= σ₁² − ρ² (σ₁²/σ₂²) σ₂²

= σ₁²(1 − ρ²)

2. Suppose Y₁ and Y₂ have a bivariate normal distribution with parameters μ₁ = μ₂ = 2, σ₁ = σ₂ = 4 and ρ = 3/4. Calculate P(Y₁ > 4 | Y₂ = 3).

Solution

The conditional mean is given by

μ = μ₁ + ρ (σ₁/σ₂)(Y₂ − μ₂)

  = 2 + (3/4)(4/4)(3 − 2)

  = 2.75

The conditional variance is

σ² = σ₁²(1 − ρ²)

   = 16(1 − 9/16)

   = 7

Hence (Y₁ | Y₂ = 3) ~ N(2.75, 7), and

P(Y₁ > 4 | Y₂ = 3) = P((Y₁ − μ)/σ > (4 − μ)/σ)

                   = P(Z > (4 − 2.75)/√7)

                   = P(Z > 0.472)

                   = 0.3192
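A quick check of this conditional probability with SciPy (an illustrative addition; norm.sf is the upper-tail probability):

```python
import math
from scipy.stats import norm

mu1 = mu2 = 2.0
sigma1 = sigma2 = 4.0
rho = 0.75

cond_mean = mu1 + rho * (sigma1 / sigma2) * (3 - mu2)   # 2.75
cond_var = sigma1 ** 2 * (1 - rho ** 2)                 # 7.0

p = norm.sf(4, loc=cond_mean, scale=math.sqrt(cond_var))
print(p)  # about 0.318; the table value 0.3192 rounds z to 0.47
```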

3. Discrete random variables Y₁ and Y₂ have the joint pmf

p(y₁, y₂) = λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!),  y₁ = 0, 1, 2, ..., y₂;  y₂ = 0, 1, 2, ....
          = 0,  otherwise

Find the conditional distribution of Y1 given Y2

Solution

P(Y₁ = y₁ | Y₂ = y₂) = P(Y₁ = y₁, Y₂ = y₂) / P(Y₂ = y₂)

P(Y₂ = y₂) = Σ_{all y₁} p(y₁, y₂)

           = Σ_{y₁=0}^{y₂} λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!)

           = (λ^(y₂) e^(−2λ) / y₂!) Σ_{y₁=0}^{y₂} y₂! / (y₁!(y₂ − y₁)!)

but Σ_{y₁=0}^{y₂} C(y₂, y₁) = (1 + 1)^(y₂) = 2^(y₂), thus

P(y₂) = (2λ)^(y₂) e^(−2λ) / y₂!,  y₂ = 0, 1, 2, ...

P(Y₁ | Y₂) = P(Y₁, Y₂) / P(Y₂) = [λ^(y₂) e^(−2λ) / (y₁!(y₂ − y₁)!)] ÷ [(2λ)^(y₂) e^(−2λ) / y₂!]

           = (1/2)^(y₂) y₂! / (y₁!(y₂ − y₁)!)

           = C(y₂, y₁) (1/2)^(y₁) (1/2)^(y₂ − y₁)

This is a binomial probability distribution with parameters y₂ and 1/2.
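Notice that the joint pmf factors as the product of two independent Poisson(λ) pmfs for Y₁ and Y₂ − Y₁, which suggests a direct simulation check of the Binomial(y₂, 1/2) conditional. A sketch, added here for illustration (λ and the conditioning value are arbitrary choices):

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
lam, n_sim = 3.0, 500_000

# Y1 ~ Poisson(lam) and Y2 - Y1 ~ Poisson(lam) independently
y1 = rng.poisson(lam, n_sim)
y2 = y1 + rng.poisson(lam, n_sim)

# Conditional on Y2 = 6, Y1 should be Binomial(6, 1/2)
sel = y1[y2 == 6]
for k in range(7):
    print(k, round(np.mean(sel == k), 4), round(binom.pmf(k, 6, 0.5), 4))
```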

4. Suppose Y₁ and Y₂ are independent exponentially distributed random variables, each with mean 2, so that their joint density is

f(y₁, y₂) = (1/4) e^(−(y₁+y₂)/2),  y₁ ≥ 0, y₂ ≥ 0
          = 0,  elsewhere

Find the joint PDF of U = Y₁ − Y₂ and W = Y₁ + Y₂.

Solution

The solutions for Y₁ and Y₂ in terms of U and W are

Y₁ = (1/2)(U + W)

Y₂ = (1/2)(W − U)

The Jacobian of the transformation is

J = | ∂Y₁/∂U  ∂Y₁/∂W | = |  1/2  1/2 | = 1/2
    | ∂Y₂/∂U  ∂Y₂/∂W |   | −1/2  1/2 |

The joint PDF of U and W is

f(u, w) = f(y₁(u, w), y₂(u, w)) · |J|

        = (1/4) e^(−w/2) · (1/2)

        = (1/8) e^(−w/2),  w ≥ 0, −w ≤ u ≤ w  (since y₁ ≥ 0 and y₂ ≥ 0 require |u| ≤ w)
        = 0,  otherwise

5. Suppose X 1 and X 2 have a joint PDF given by

f(x₁, x₂) = e^(−(x₁+x₂)),  0 ≤ x₁, 0 ≤ x₂
          = 0,  elsewhere

Find the PDF of Y1 = X 1 + X 2

Solution

Let Y₂ = X₂. Then X₁ = Y₁ − Y₂.

The Jacobian of the transformation is

J = | ∂X₁/∂Y₁  ∂X₁/∂Y₂ | = | 1  −1 | = 1
    | ∂X₂/∂Y₁  ∂X₂/∂Y₂ |   | 0   1 |

f(y₁, y₂) = f(x₁(y₁, y₂), x₂(y₁, y₂)) · |J|

          = e^(−(y₁−y₂+y₂)) · 1 = e^(−y₁)

Limits: 0 ≤ x₁ and 0 ≤ x₂ give 0 ≤ y₁ − y₂ and 0 ≤ y₂, hence 0 ≤ y₂ ≤ y₁.

The PDF of Y₁ is g(y₁) = ∫₀^{y₁} f(y₁, y₂) dy₂

                       = ∫₀^{y₁} e^(−y₁) dy₂ = e^(−y₁) [y₂]₀^{y₁}

g(y₁) = y₁ e^(−y₁),  0 ≤ y₁
      = 0,  otherwise
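Here g(y₁) = y₁e^(−y₁) is the Gamma(2, 1) density, so a quick simulation check is possible (an illustrative addition, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.exponential(scale=1.0, size=500_000)
x2 = rng.exponential(scale=1.0, size=500_000)
y = x1 + x2

# Gamma(shape 2, scale 1): mean 2, variance 2, P(Y <= 1) = 1 - 2e^{-1}
print(np.mean(y), np.var(y))                  # both close to 2
print(np.mean(y <= 1), 1 - 2 * np.exp(-1))    # both close to 0.2642
```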

6. Let X₁ and X₂ have a joint PDF given by

f(x₁, x₂) = 2x₁,  0 ≤ x₁ ≤ 1, 0 ≤ x₂ ≤ 1
          = 0,  elsewhere

Find the PDF of Y1 = X 1 X 2

Solution

Let Y₂ = X₂. Then X₁ = Y₁/Y₂ and X₂ = Y₂.

The Jacobian of the transformation is

J = | ∂X₁/∂Y₁  ∂X₁/∂Y₂ | = | 1/Y₂  −Y₁/Y₂² | = 1/Y₂
    | ∂X₂/∂Y₁  ∂X₂/∂Y₂ |   | 0      1      |

f(y₁, y₂) = f(x₁(y₁, y₂), x₂(y₁, y₂)) · |J|

          = 2(y₁/y₂) · (1/y₂) = 2y₁/y₂²

Limits: 0 ≤ x₁ ≤ 1 and 0 ≤ x₂ ≤ 1 give 0 ≤ y₁/y₂ ≤ 1 and 0 ≤ y₂ ≤ 1, hence 0 ≤ y₁ ≤ y₂ ≤ 1.

The PDF of Y₁ is g(y₁) = ∫_{y₁}^{1} f(y₁, y₂) dy₂

= ∫_{y₁}^{1} (2y₁/y₂²) dy₂ = 2y₁ ∫_{y₁}^{1} y₂^(−2) dy₂

= 2y₁ [−1/y₂]_{y₁}^{1}

= 2y₁ (1/y₁ − 1)

g(y₁) = 2(1 − y₁),  0 ≤ y₁ ≤ 1
      = 0,  otherwise

Hence or otherwise, find E(Y₁).

E(Y₁) = ∫₀¹ y₁ g(y₁) dy₁

      = ∫₀¹ 2y₁(1 − y₁) dy₁

      = 2[y₁²/2 − y₁³/3]₀¹ = 1/3
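The value E(Y₁) = 1/3 also follows from independence, E(X₁)E(X₂) = (2/3)(1/2), and can be checked by simulation (an addition to the notes; X₁ with density 2x is simulated by the inverse-CDF transform x = √u):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 500_000

x1 = np.sqrt(rng.uniform(size=n_sim))   # density 2x on (0,1) via inverse CDF
x2 = rng.uniform(size=n_sim)            # the marginal of X2 is Uniform(0,1)
y = x1 * x2

print(np.mean(y))                       # close to 1/3
```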

Question

The joint probability density of X and Y is

f(x, y) = k(5x + y),  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

1) Find the value of k

2) Find P(0 < X < 1/3, 1/2 < Y < 1)

3) Determine the marginal densities of X and Y

4) Find E(X), E(Y), E(X²) and E(Y²)

5) Calculate the correlation coefficient between X and Y.

Question

The joint probability density function of X and Y is

f(x, y) = k e^(−(4x+3y)),  x > 0, y > 0
        = 0,  elsewhere

1) Find k

2) Determine the joint mgf of X and Y

3) Find the conditional density of Y given X.

4) Are X and Y independent?

Question

Find the probability density function of Y = X₁ + X₂ if the joint probability density of X₁ and X₂ is

f(x₁, x₂) = k e^(−(x₁+x₂)),  x₁ > 0, x₂ > 0
          = 0,  elsewhere
Examination for April 2006

SMA 2231 Probability and Statistics III

Question 1

a). The joint probability distribution of two random variables X1 and X2 is shown in the following table:

(x1, x2)    (0,0)   (0,1)   (1,0)   (1,1)   (2,0)   (2,1)
f(x1, x2)   1/18    3/18    4/18    3/18    6/18    1/18

Find

1) The marginal distribution of X1

2) The marginal distribution of X2

3) The conditional distribution of X1 given that X2 = 1

4) The conditional distribution of X2 given that X1 = 1     10mks

b). A soft drink machine has a random amount Y2 in supply at the beginning of a given day and dispenses a random amount Y1 during the day (with measurement in gallons). It is not re-supplied during the day, and hence Y1 ≤ Y2; they have joint density

f(y1, y2) = 1/2,  0 ≤ y1 ≤ y2, 0 ≤ y2 ≤ 2
          = 0,  elsewhere

Find

i) The conditional density of Y1 given Y2 = y2

ii) The probability that less than 1/2 gallon is sold, given that the machine contains 1 gallon at the start of the day.

iii) P(Y1 ≥ 1/2 | Y2 ≤ 1/4)     8 mks

iv) P(Y1 ≤ 1/2 | Y2 = 2)

c). Suppose that the random variables X1 and X2 have the joint probability density function

f(x1, x2) = 12x1x2(1 − x1),  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,  elsewhere

Show that X1 and X2 are independent random variables. (5 marks)

d). The random variables X and Y have chi-square distributions with n and m degrees of

freedom respectively where n > m. Find the distribution of X-Y using the method of moment

generating function.

e). If two random variables are independent, are they also uncorrelated? Is the converse true?

2. a). Define and explain the following terms:

i) Bivariate distribution

ii) Correlation coefficient

iii) Regression coefficient

iv) Conditional distribution 7 marks

b). X1 and X2 are normally distributed random variables with means and standard deviations μ1, σ1 and μ2, σ2 respectively.

i) Using the method of moment generating functions, find the probability distribution of Y = X1 − X2. (Write down the probability density function.)

ii) If μ1 = 6, μ2 = 7, σ1² = 1 and σ2² = 1, find the probability that X1 > X2.     8mks

3. a). X1 and X2 are independent random variables, each with a chi-square distribution with r = 2 degrees of freedom. Find the distribution of Y = (1/2)(X1 − X2) using the transformation of variables technique.

(Recall that if X is a chi-square random variable with r degrees of freedom, then its probability density function is given by

f(x) = (1/2)^(r/2) x^(r/2 − 1) e^(−x/2) / Γ(r/2),  x > 0,

where Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx.)     10mks

b). Let X1, X2 be a random sample from a distribution having the probability density function

f(x) = e^(−x),  0 < x < ∞
     = 0,  elsewhere

If Y1 = X1 + X2 and Y2 = X1/(X1 + X2), find the joint distribution of Y1, Y2. (10 marks)

4. a). If a bivariate normal density has the exponent

−(1/102)[(x + 2)² − 2.8(x + 2)(y − 1) + 4(y − 1)²]
Find the values of

i. The means μ1 and μ 2

ii. The standard deviations σ1 and σ2

iii. The correlation coefficient, ρ . 10 mks

b). In a certain population of married couples, the height X of the husband and the height Y of the wife have a bivariate normal distribution with parameters μ1 = 5.8 units and μ2 = 5.3 units, standard deviations σ1 = σ2 = 0.2, and correlation coefficient ρ = 0.6. Find the probability that the height of the wife lies between 5.28 and 5.92 units given that the height of the husband is 6.3 units.     10 mks

5. a). Let U and V be two independent chi-square random variables with respective means r1 and r2.

i. What is the distribution of F = (U/r1) / (V/r2)?

ii. Write down an expression for finding the marginal density of F.

iii. What is the joint density of U and V?

iv. Introducing Z = V and F = (U/r1) / (V/r2), find the joint density of F and Z.

v. Write down an expression for finding the marginal density of F.     10mks

b). Let X be a standard normal random variable and Y a chi-square random variable with k degrees of freedom, X and Y independent.

i. State the distribution of X / √(Y/k).

ii. Consider T = X / √(Y/k). Introduce Z = Y and find the joint distribution of T and Z.

iii. Write down an expression for finding the marginal density of T.     10 mks

CAT QUESTIONS

1. Given the function

f(x, y) = 6x²y,  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

a. Show that f(x, y) is a probability density function

b. Calculate the variance of X and the variance of Y

c. Find P(0 < X < 3/4, 1/3 < Y < 2)

Question

2. If two random variables X and Y have the joint probability distribution function

P(x, y) = (1/30)(x + y),  for x = 0, 1, 2, 3 and y = 0, 1, 2
        = 0,  elsewhere

a) Show that P(x, y) satisfies the properties of a discrete joint distribution function     4mks

b) Find the probability that X = 3

c) Find the probability that Y = 1

d) Find F(2, 1)

e) Find:

i. The marginal distribution of X

ii. The marginal distribution of Y

Question

If X is the proportion of persons who will respond to one kind of mail-order solicitation, Y is the proportion of persons who will respond to another kind of mail-order solicitation, and the joint probability density of X and Y is given by

f(x, y) = (2/5)(x + 4y),  0 < x < 1, 0 < y < 1
        = 0,  elsewhere

Find

a) The marginal densities of X and Y;

b) The conditional density of Y given that X takes on the value x;

c) The conditional density of X given that Y takes on the value y.

Question

Check for each of the following probability densities whether the two random variables are

independent:

a) f(x1, x2) = (1/81)x1²x2²,  for 0 < x1 < 3, 0 < x2 < 3
             = 0,  elsewhere

b) f(x1, x2) = (2/81)x1²x2²,  for 0 < x1 < x2 < 3
             = 0,  elsewhere

CAT III

SMA 2231

Q1. Let x1, x2, ..., xn be a random sample from a normal population with mean μ and variance σ². Show that the sample mean x̄ = (1/n) Σᵢ xᵢ and the sample variance S² = Σᵢ (xᵢ − x̄)²/(n − 1) are independent (use sample size n = 2).

Q2. Let x1, x2, ..., xn be independently and identically distributed with mean μ and variance σ². Let Q = Σᵢ (xᵢ − x̄)², where x̄ is the sample mean. Find E(Q).

Q3. The following are (random) observations from a normal population with mean 22 and variance 10. They are 25, 17, 23, 20, 18, 15, 24, and 21. Calculate a statistic that is a function of all the observations which has:

a) Standard normal distribution

b) Chi-square distribution with 7 degrees of freedom

c) T distribution with 7 degrees of freedom

Q4.

(a) Define the order statistics.

(b) Find the probability density function of Yn = max(x1, x2, ..., xn) if x1, x2, ..., xn is a random sample from the distribution f(x) = (1/θ) e^(−x/θ), x > 0.

