
SOME THEORY AND PRACTICE OF STATISTICS

by Howard G. Tucker

CHAPTER 3. EXPECTATION

3.1 Definition of the Expectation of a Random Variable. As we
have seen, the distribution function F_X of a random variable X spreads a
unit of mass over the real line by assigning to every interval [a, b] of real
numbers the mass F_X(b) - F_X(a). The expectation of this random variable
will be defined as the first moment of the spread of this mass about the
origin, i.e., the expectation will be defined as the center of gravity of this
spread of mass. This is not an artificial notion; it will play an important role
with respect to the notion of relative frequency, a notion that is a central
concept of statistics. So, after exhibiting what appears at first to be a rather
strange definition of expectation, we shall show how it is computed in the
two special cases of absolutely continuous and discrete distributions. Finally,
we develop basic properties of expectation when it exists.
As we pointed out above, a random variable X determines a distribution
function F_X which spreads one unit of mass along the real line according
to the formula P([a < X ≤ b]) = F_X(b) - F_X(a). The expectation of
this random variable, or of its distribution function, is nothing other than
the centroid or center of gravity of this spread of unit mass. The problem
is in defining it. If, say, the random variable were discrete, taking values
a_1, a_2, a_3, a_4 with corresponding probabilities p_1, p_2, p_3, p_4 (where \sum_{i=1}^{4} p_i = 1),
then the centroid, or center of gravity, or first moment of this spread of
mass is \sum_{i=1}^{4} a_i p_i. Let us suppose that a_1 < a_2 < 0 < a_3 < a_4. Then the
distribution function F_X would be as graphed in figure 1.

SPACE FOR FIGURE 1

Now one notices that the center of gravity given by the above formula is
nothing other than the sum of the areas in the rectangles denoted by 3 and 4
minus the sum of the areas in the rectangles marked off as 1 and 2. In other
words, the center of gravity is the area between the graph of y = F_X(x), the
line y = 1 and the y-axis, minus the area below the graph of y = F_X(x), above
the x-axis and to the left of the y-axis. For this and any discrete distribution
function, this is always the case. Since any distribution function can be
approximated quite closely by a discrete distribution function, it would seem
almost natural to define the centroid in the general case by such a difference
in areas. Accordingly, we arrive at the following definition.
Definition. Let X be a random variable with distribution function
F_X(x). The expectation E(X) or EX of X is defined by

    E(X) = \int_0^{\infty} (1 - F_X(x))\,dx - \int_{-\infty}^{0} F_X(x)\,dx
         = \int_0^{\infty} P([X > x])\,dx - \int_{-\infty}^{0} P([X \le x])\,dx,

and is said to exist if the two integrals are finite. In cases where at least one
of the integrals is infinite, we shall say that the expectation of X does not
exist.
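As a numerical sketch added here (not part of the original text), the two tail integrals in this definition can be approximated for an exponential distribution with rate 2 (see example 8 below), whose expectation should come out near 1/2:

```python
# Numerical check of E(X) = ∫_0^∞ P([X > x]) dx − ∫_{−∞}^0 P([X ≤ x]) dx
# for an exponential distribution with rate 2, whose mean is 1/2.
import math

rate = 2.0

def survival(x):
    # P([X > x]); the distribution puts no mass on the negative axis
    return math.exp(-rate * x) if x >= 0 else 1.0

dx = 1e-4
# left Riemann sum over [0, 20]; the tail beyond 20 is negligible
positive_part = sum(survival(k * dx) for k in range(int(20 / dx))) * dx
negative_part = 0.0  # P([X ≤ x]) = 0 for every x < 0
expectation = positive_part - negative_part
print(round(expectation, 2))  # 0.5
```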
In other words, we say that the expectation of X exists if and only if the
two integrals in the above definition are finite. For technical reasons later on
we shall need the following proposition.
Lemma 1. If X is a random variable that satisfies

    \int_0^{\infty} x^k P([X > x])\,dx < \infty  and  \int_{-\infty}^{0} |x|^k P([X \le x])\,dx < \infty

for some integer k ≥ 0, then

    \int_0^{\infty} x^k P([X > x])\,dx = \int_0^{\infty} x^k P([X \ge x])\,dx

and

    \int_{-\infty}^{0} |x|^k P([X < x])\,dx = \int_{-\infty}^{0} |x|^k P([X \le x])\,dx.

Proof: We shall only prove the second equality. We first prove this in the
case where k = 0. Let ε > 0 be arbitrary. Since [X < x] ⊂ [X ≤ x] ⊂ [X < x + ε],
we have

    \int_{-\infty}^{0} P([X < x])\,dx \le \int_{-\infty}^{0} P([X \le x])\,dx \le \int_{-\infty}^{0} P([X < x + \varepsilon])\,dx.

Let us make the change of variable y = x + ε in this last integral. Then this
last integral becomes

    \int_{-\infty}^{\varepsilon} P([X < y])\,dy = \int_{-\infty}^{0} P([X < x])\,dx + \int_0^{\varepsilon} P([X < x])\,dx \le \int_{-\infty}^{0} P([X < x])\,dx + \varepsilon.

These last two displays imply

    \int_{-\infty}^{0} P([X < x])\,dx \le \int_{-\infty}^{0} P([X \le x])\,dx \le \int_{-\infty}^{0} P([X < x])\,dx + \varepsilon.

Letting ε → 0, we obtain the conclusion in the case of k = 0. We now prove
the second equation in the conclusion of the lemma in the case where k ≥ 1.
Letting ε > 0 be arbitrary, but keeping ε < 1, we have

    \int_{-\infty}^{0} |x|^k P([X < x])\,dx \le \int_{-\infty}^{0} |x|^k P([X \le x])\,dx \le \int_{-\infty}^{0} |x|^k P([X < x + \varepsilon])\,dx.

Letting y = x + ε, this last integral is

    \int_{-\infty}^{\varepsilon} |y - \varepsilon|^k P([X < y])\,dy = \int_{-\infty}^{0} |y - \varepsilon|^k P([X < y])\,dy + \int_0^{\varepsilon} |y - \varepsilon|^k P([X < y])\,dy.

Let us bound the second integral from above. If 0 ≤ y ≤ ε, then |y - ε| < ε.
Also, P([X < y]) ≤ 1, so

    \int_0^{\varepsilon} |y - \varepsilon|^k P([X < y])\,dy \le \varepsilon^{k+1}.

Note that for y ≤ 0 we have |y - ε|^k ≥ |y|^k ≥ 0, so the first integral
is equal to

    \int_{-\infty}^{0} |y|^k P([X < y])\,dy + \int_{-\infty}^{0} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy.

We shall bound the second integral in this last expression. By the triangle
inequality, |y - ε| ≤ |y| + ε, and so by the binomial theorem,

    |y - \varepsilon|^k \le (|y| + \varepsilon)^k = \sum_{j=0}^{k} \binom{k}{j} |y|^j \varepsilon^{k-j}.

Thus,

    \big| |y-\varepsilon|^k - |y|^k \big| \le \sum_{j=0}^{k-1} \binom{k}{j} |y|^j \varepsilon^{k-j}.

Now we may write

    \int_{-\infty}^{0} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy
      = \int_{-\infty}^{-1} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy + \int_{-1}^{0} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy.

Over the interval (-∞, -1] we have |y| ≥ 1, and over [-1, 0] we have
|y| ≤ 1; keeping in mind that 0 < ε < 1, we have

    \int_{-\infty}^{-1} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy
      \le \sum_{j=0}^{k-1} \binom{k}{j} \varepsilon^{k-j} \int_{-\infty}^{-1} |y|^j P([X < y])\,dy
      \le \varepsilon (2^k - 1) \int_{-\infty}^{-1} |y|^k P([X < y])\,dy.

Also, as above, but keeping in mind that |y| ≤ 1 over [-1, 0], we have

    \int_{-1}^{0} \big| |y-\varepsilon|^k - |y|^k \big| P([X < y])\,dy \le \varepsilon (2^k - 1) \int_{-1}^{0} P([X < y])\,dy \le \varepsilon (2^k - 1).

Combining the above, we obtain

    \int_{-\infty}^{0} |x|^k P([X < x])\,dx \le \int_{-\infty}^{0} |x|^k P([X \le x])\,dx
      \le \int_{-\infty}^{0} |x|^k P([X < x])\,dx + \varepsilon^{k+1} + \varepsilon (2^k - 1) \Big( 1 + \int_{-\infty}^{-1} |y|^k P([X < y])\,dy \Big).

Since by hypothesis \int_{-\infty}^{0} |x|^k P([X \le x])\,dx < \infty, we obtain the conclusion
by taking the limit as ε → 0.
This proposition is of great use to us. It allows us to replace ≤ by < and
> by ≥ inside the probabilities in such integrals.
Lemma 2. If {a_n} is a denumerable sequence of nonnegative real num-
bers, and if Q_n = \sum_{j=n}^{\infty} a_j, then

    \sum_{n=1}^{\infty} Q_n = \sum_{n=1}^{\infty} n a_n.

Proof: Since infinite series of nonnegative terms can be rearranged without
altering convergence properties, we have

    \sum_{n=1}^{\infty} Q_n = a_1 + a_2 + a_3 + a_4 + \cdots
                            + a_2 + a_3 + a_4 + \cdots
                                  + a_3 + a_4 + \cdots
                                        + a_4 + \cdots
                       = a_1 + 2a_2 + 3a_3 + 4a_4 + \cdots,

which completes the proof.
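The tail-sum identity of lemma 2 can be checked directly (a small sketch added here, not in the original) on a nonnegative sequence with finite support:

```python
# Check of lemma 2: the sum of tails Q_n = Σ_{j≥n} a_j equals Σ n·a_n,
# for a nonnegative sequence that vanishes beyond n = 6.
a = {1: 0.5, 2: 0.25, 3: 0.0, 4: 0.125, 5: 0.0625, 6: 0.0625}

Q = {n: sum(a[j] for j in range(n, 7)) for n in range(1, 7)}
lhs = sum(Q.values())              # Σ Q_n
rhs = sum(n * a[n] for n in a)     # Σ n·a_n
print(abs(lhs - rhs) < 1e-12)  # True
```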


Theorem 1. Let X be a random variable with a discrete distribution,
i.e., there exists a finite or denumerable sequence of distinct real numbers
{x_n} and corresponding probabilities {p_n} such that P([X = x_n]) = p_n for
all values of n and \sum_{n \ge 1} p_n = 1. Then the expectation of X exists and

    E(X) = \sum_{n \ge 1} x_n p_n

if and only if this series is absolutely convergent.


Proof: Let

    I_1 = \int_0^{\infty} P([X > x])\,dx  and  I_2 = \int_{-\infty}^{0} P([X \le x])\,dx.

We shall use the fact that P([X > x]) is nonincreasing in x, i.e., if x_1 < x_2,
then P([X > x_1]) ≥ P([X > x_2]). Because of the definition of the improper
Riemann integral, we may write for every positive integer n,

    I_1 = \sum_{k \ge 1} \int_{(k-1)/n}^{k/n} P([X > x])\,dx.

Because of the nonincreasing property of P([X > x]) in x that was mentioned
above, we have the following inequality:

    \frac{1}{n} P([X > \tfrac{k}{n}]) \le \int_{(k-1)/n}^{k/n} P([X > x])\,dx \le \frac{1}{n} P([X > \tfrac{k-1}{n}]).

Thus, if we denote

    L_n = \sum_{k \ge 1} \frac{1}{n} P([X > \tfrac{k}{n}])  and  U_n = \sum_{k \ge 1} \frac{1}{n} P([X > \tfrac{k-1}{n}]),

we easily see because of the above inequality that for every positive integer
n,

    L_n \le I_1 \le U_n.

However, we may write

    L_n = \frac{1}{n} \sum_{k \ge 1} \sum_{j \ge k} P([\tfrac{j}{n} < X \le \tfrac{j+1}{n}]).

Now if we let a_j = P([\tfrac{j}{n} < X \le \tfrac{j+1}{n}]), we may apply lemma 2 to obtain

    L_n = \sum_{k=1}^{\infty} \frac{k-1}{n} P([\tfrac{k-1}{n} < X \le \tfrac{k}{n}])
        = \sum_{k=1}^{\infty} \frac{k}{n} P([\tfrac{k-1}{n} < X \le \tfrac{k}{n}]) - \frac{1}{n} \sum_{k=1}^{\infty} P([\tfrac{k-1}{n} < X \le \tfrac{k}{n}]).

However,

    \frac{k}{n} P([\tfrac{k-1}{n} < X \le \tfrac{k}{n}]) = \frac{k}{n} \sum \{p_j : (k-1)/n < x_j \le k/n\} \ge \sum \{x_j p_j : (k-1)/n < x_j \le k/n\}.

Hence

    L_n \ge \sum_{k \ge 1} \sum \{x_j p_j : (k-1)/n < x_j \le k/n\} - \frac{1}{n} P([X > 0])
        \ge \sum \{x_m p_m : x_m > 0\} - \frac{1}{n}.

In a similar fashion one can prove that

    U_n \le \sum \{x_m p_m : x_m > 0\} + \frac{1}{n}.

Thus, for every positive integer n,

    \sum \{x_m p_m : x_m > 0\} - \frac{1}{n} \le I_1 \le \sum \{x_m p_m : x_m > 0\} + \frac{1}{n},

from which it follows, by taking the limit as n → ∞, that

    I_1 = \sum \{x_m p_m : x_m > 0\}.

In an entirely similar manner, one can prove that

    I_2 = -\sum \{x_m p_m : x_m \le 0\}.

Thus absolute convergence of the series \sum_{m \ge 1} x_m p_m is necessary and sufficient
for the integrals I_1 and I_2 to exist and be finite, and we obtain

    E(X) = I_1 - I_2 = \sum_{m \ge 1} x_m p_m,

which proves the theorem.


Theorem 2. Let X be a random variable with an absolutely continuous
distribution function whose density f_X(x) is piecewise continuous and is
bounded over every bounded interval. Then the expectation of X exists and

    E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx

if and only if this integral exists as an improper Riemann integral.


Proof: By the definition of expectation given above, the expectation
E(X) exists and equals \int_0^{\infty} P([X > x])\,dx - \int_{-\infty}^{0} P([X \le x])\,dx, provided that
both of these improper Riemann integrals exist and are finite. For A > 0, and
making use of integration by parts, the hypothesis that f_X(x) is bounded over
every bounded interval and is continuous everywhere except at a countable
set of points, and the fundamental theorem of calculus, we obtain

    \int_0^{A} P([X > x])\,dx = \int_0^{A} \Big( \int_x^{\infty} f_X(t)\,dt \Big)\,dx
      = \Big[ x \int_x^{\infty} f_X(t)\,dt \Big]_0^{A} + \int_0^{A} x f_X(x)\,dx
      = A \int_A^{\infty} f_X(t)\,dt + \int_0^{A} x f_X(x)\,dx.

Both quantities on the right hand side are nonnegative. First let us suppose
that E(X) exists and is finite. Then as A → ∞, we have \int_0^{A} P([X > x])\,dx →
\int_0^{\infty} P([X > x])\,dx < \infty. Since 0 \le \int_0^{A} x f_X(x)\,dx \le \int_0^{A} P([X > x])\,dx,
we have, taking A → ∞, \int_0^{\infty} x f_X(x)\,dx \le \int_0^{\infty} P([X > x])\,dx < \infty, i.e.,
\int_0^{\infty} x f_X(x)\,dx < \infty. Since A \int_A^{\infty} f_X(t)\,dt \le \int_A^{\infty} x f_X(x)\,dx < \infty, it follows that
A \int_A^{\infty} f_X(t)\,dt → 0. Thus by the displayed equation above, \int_0^{\infty} x f_X(x)\,dx =
\int_0^{\infty} P([X > x])\,dx < \infty. Now let us assume that \int_0^{\infty} x f_X(x)\,dx < \infty. Then be-
cause A \int_A^{\infty} f_X(t)\,dt \le \int_A^{\infty} x f_X(x)\,dx → 0, we have A \int_A^{\infty} f_X(t)\,dt → 0 as A → ∞.
By the displayed equation above, it follows that \int_0^{\infty} P([X > x])\,dx < \infty and
\int_0^{\infty} P([X > x])\,dx = \int_0^{\infty} x f_X(x)\,dx. Similarly, \int_{-\infty}^{0} P([X \le x])\,dx < \infty if and
only if \int_{-\infty}^{0} (-x) f_X(x)\,dx < \infty, in which case the two integrals are equal, which
proves the theorem.
We now give some examples of expectations of random variables.

Example 1. The binomial distribution. Suppose X is a random variable
with the binomial distribution, Bin(n, p), i.e.,

    P([X = x]) = \binom{n}{x} p^x (1-p)^{n-x}  if x = 0, 1, 2, \ldots, n,  and 0 otherwise.

In this case we have a discrete distribution, so

    E(X) = \sum_{k=0}^{n} k P([X = k])
         = \sum_{k=1}^{n} k \binom{n}{k} p^k (1-p)^{n-k}
         = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}
         = np \sum_{k=0}^{n-1} \binom{n-1}{k} p^k (1-p)^{n-1-k}
         = np (p + (1-p))^{n-1}
         = np.
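This formula is easy to confirm numerically (a check added here, not in the original); the choices n = 7, p = 0.3 below are arbitrary:

```python
# Check of E(X) = np for the binomial distribution by direct summation
# of k · C(n, k) · p^k · (1−p)^(n−k) over k = 0, ..., n.
from math import comb

n, p = 7, 0.3
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(abs(mean - n * p) < 1e-12)  # True
```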

Example 2. The geometric distribution. Let X be a random variable
whose distribution is geom(p). Then P([X = n]) = (1-p)^{n-1} p for n =
1, 2, \ldots. Recalling that the derivative of a power series is equal to the series of
the derivatives inside the interval of absolute convergence, and recalling that
the geometric series \sum_{n \ge 0} x^n has radius of convergence 1, then for 0 < p < 1,

    E(X) = \sum_{n=1}^{\infty} n (1-p)^{n-1} p = -p \frac{d}{dp} \sum_{n=0}^{\infty} (1-p)^n = -p \frac{d}{dp} \frac{1}{1-(1-p)} = \frac{1}{p}.
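A truncated version of this series agrees with 1/p to high accuracy (a numerical sketch added here; p = 0.4 is an arbitrary choice):

```python
# Check of E(X) = 1/p for the geometric distribution on {1, 2, ...}:
# truncate Σ n (1−p)^(n−1) p at a large N; the discarded tail is negligible.
p = 0.4
mean = sum(n * (1 - p)**(n - 1) * p for n in range(1, 2000))
print(abs(mean - 1 / p) < 1e-9)  # True
```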

Example 3. The Poisson distribution. Let X be a random variable whose
distribution is P(λ) for some λ > 0, i.e., P([X = n]) = \lambda^n e^{-\lambda}/n! for n =
0, 1, 2, \ldots. Then

    E(X) = \sum_{n=0}^{\infty} n \frac{\lambda^n e^{-\lambda}}{n!} = \lambda e^{-\lambda} \sum_{n=1}^{\infty} \frac{\lambda^{n-1}}{(n-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.
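As with the previous examples, a truncated numerical sum (added here, not in the original) confirms the computation; λ = 3.5 is arbitrary:

```python
# Check of E(X) = λ for the Poisson distribution: truncate
# Σ n λ^n e^(−λ) / n! at n = 100, where the tail is negligible.
from math import exp, factorial

lam = 3.5
mean = sum(n * lam**n * exp(-lam) / factorial(n) for n in range(101))
print(abs(mean - lam) < 1e-9)  # True
```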

We next consider expectation of random variables with discrete distribu-
tions that arise from sampling without replacement. We shall first find the
expectation of the general sample survey sum distribution and then draw
off as special cases the expectations of the Wilcoxon and hypergeometric
distributions. But first we need a combinatorial lemma.

Lemma 3. If x_1, x_2, \ldots, x_N are any numbers, and if 1 ≤ n ≤ N, then

    \sum_{1 \le i_1 < i_2 < \cdots < i_n \le N} \sum_{k=1}^{n} x_{i_k} = \binom{N-1}{n-1} \sum_{j=1}^{N} x_j.

Proof: For every j, 1 ≤ j ≤ N, there are obviously \binom{N-1}{n-1} sums of the
form \sum_{k=1}^{n} x_{i_k}, where 1 ≤ i_1 < i_2 < \cdots < i_n ≤ N, that include x_j. This proves
the lemma.
Example 4. The sample survey sum distribution. Let X denote the sum of
n numbers selected at random without replacement from the (not necessarily
distinct) set of N numbers x_1, x_2, \ldots, x_N. Each sum of the form \sum_{k=1}^{n} x_{i_k} is
selected with probability 1/\binom{N}{n}. Therefore, by lemma 3,

    E(X) = \sum_{1 \le i_1 < \cdots < i_n \le N} \frac{1}{\binom{N}{n}} \sum_{k=1}^{n} x_{i_k} = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} \sum_{j=1}^{N} x_j = \frac{n}{N} \sum_{j=1}^{N} x_j = n \bar{x},

where \bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j.
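Because the sample space here is finite, the identity E(X) = n·x̄ can be verified by brute-force enumeration of all equally likely samples (an added sketch; the five numbers below are arbitrary):

```python
# Check of E(X) = n·x̄ for the sample survey sum: enumerate all C(N, n)
# equally likely samples without replacement and average their sums.
from itertools import combinations

xs = [2.0, 3.5, 3.5, 7.0, 10.0]   # N = 5 numbers, repeats allowed
n = 3
sums = [sum(c) for c in combinations(xs, n)]
mean_sum = sum(sums) / len(sums)
xbar = sum(xs) / len(xs)
print(abs(mean_sum - n * xbar) < 1e-12)  # True
```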
j=1
Example 5. The Wilcoxon distribution. In this case, a random variable
W is the sum of n numbers selected at random without replacement from
1, 2, \ldots, N. We recall that

    \sum_{j=1}^{N} j = \frac{N(N+1)}{2},

and so, applying the result of example 4, if W has the Wilcoxon(n, N) dis-
tribution, then E(W) = n(N+1)/2.
Example 6. The hypergeometric distribution. Recall that X is said to
have the hypergeometric distribution if it has the same distribution as the
number of red balls in a sample of size n taken at random without replacement
from an urn that contains r red balls and b black balls, where 1 ≤ n ≤ r + b.
In this case, we assign the number 1 to each of the r red balls and the number
0 to each of the black balls. Now the number X of red balls in the sample
of size n is equal to the sum of the numbers in the sample. Thus, taking
N = r + b, we have

    E(X) = n \frac{r}{r+b}.
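The same answer can be recovered directly from the hypergeometric probabilities (an added numerical check; r = 6, b = 4, n = 5 are arbitrary):

```python
# Check of E(X) = n·r/(r+b) for the hypergeometric distribution, summing
# k · C(r, k) C(b, n−k) / C(r+b, n) over all feasible values of k.
from math import comb

r, b, n = 6, 4, 5
mean = sum(k * comb(r, k) * comb(b, n - k) / comb(r + b, n)
           for k in range(max(0, n - b), min(n, r) + 1))
print(abs(mean - n * r / (r + b)) < 1e-12)  # True
```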

Example 7. The normal distribution, N(μ, σ²). This absolutely continu-
ous distribution has a density given by

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2},  -\infty < x < \infty,

where μ ∈ (-∞, ∞) and σ² > 0 are constants. Using theorem 2, we obtain

    E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x e^{-(x-\mu)^2/2\sigma^2}\,dx = \mu.
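Theorem 2 can also be exercised numerically here (a sketch added to the text): a crude midpoint Riemann sum of x·f_X(x) over a wide interval reproduces μ, with μ = 1.7 and σ = 2.0 chosen arbitrarily.

```python
# Check of E(X) = μ for N(μ, σ²): midpoint Riemann sum of x·f_X(x)
# over [μ − 10σ, μ + 10σ]; the truncated tails are negligible.
import math

mu, sigma = 1.7, 2.0

def f(x):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

N = 100000
lo, hi = mu - 10 * sigma, mu + 10 * sigma
dx = (hi - lo) / N
mean = 0.0
for k in range(N):
    x = lo + (k + 0.5) * dx
    mean += x * f(x) * dx
print(abs(mean - mu) < 1e-5)  # True
```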

Example 8. The exponential distribution. This absolutely continuous
distribution has a density given by

    f_X(x) = \lambda e^{-\lambda x}  if x \ge 0,  and  f_X(x) = 0  if x < 0,

where λ > 0 is some constant. Using theorem 2, we obtain

    E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}.

Expectations do not always exist. As an example of a random variable
with an absolutely continuous distribution for which the expectation does not
exist, let us consider the Cauchy distribution given in the previous section.
Recall that its density is given by

    f_X(x) = \frac{1}{\pi (1 + x^2)}

for all real values of x. Now remembering the care we must take with an
improper integral, we note that

    \int_0^{\infty} x f_X(x)\,dx = \int_0^{\infty} \frac{x}{\pi (1 + x^2)}\,dx = \infty,

and \int_{-\infty}^{0} \frac{x}{\pi (1 + x^2)}\,dx = -\infty. Thus the condition given by theorem 2 is vi-
olated. As an example of a random variable with a discrete distribution for
which the expectation does not exist, consider a random variable Y with
discrete density given by P([Y = 2^n]) = 2^{-n} for n = 1, 2, \ldots. In this case,

    \sum_{n=1}^{\infty} 2^n P([Y = 2^n]) = \infty,

thus violating the condition of theorem 1.
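The divergence in the Cauchy case can be made concrete (an illustration added here, not in the original): the partial integrals ∫₀^A x/(π(1+x²)) dx, which equal log(1+A²)/(2π) in closed form, keep growing as A increases.

```python
# Illustration that ∫_0^A x/(π(1+x²)) dx grows without bound for the
# Cauchy density, so its expectation does not exist.
import math

def partial(A, steps=100000):
    # midpoint Riemann sum of x/(π(1+x²)) over [0, A]
    dx = A / steps
    return sum((k + 0.5) * dx / (math.pi * (1 + ((k + 0.5) * dx)**2))
               for k in range(steps)) * dx

vals = [partial(A) for A in (10, 100, 1000)]
print(vals[0] < vals[1] < vals[2])  # True: the partial integrals keep growing
```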

EXERCISES

1. Let X be a random variable with distribution function defined by

    F_X(x) = 0      if x < -3.5
             0.20   if -3.5 \le x < -1
             0.35   if -1 \le x < 0
             0.45   if 0 \le x < 2
             0.65   if 2 \le x < 3.5
             1.00   if x \ge 3.5.

(i) Compute E(X) by using theorem 1.
(ii) Compute E(X) by using the definition of expectation.
2. Let X be a random variable with distribution function defined by

    F_X(x) = \tfrac{1}{2} e^x                       if x < 0
             \tfrac{1}{2} + \tfrac{1}{2} \sqrt{x}   if 0 \le x < 1
             1                                      if x \ge 1.

(i) Find the density of F_X(x).
(ii) Compute E(X) using theorem 2.
(iii) Compute E(X) using the definition of expectation.
3. Let X be a random variable with absolutely continuous distribution
with density given by

    f_X(x) = x^3  if 0 \le x \le \sqrt{2},  and  f_X(x) = 0  otherwise.

Determine F_X(x) and E(X).
4. Let X be a random variable with absolutely continuous distribution
with density given by

    f_X(x) = \tfrac{1}{2} e^{-|x-2|}  for all real x.

(i) Find the distribution function of X.
(ii) Compute E(X).

3.2 Basic Properties of Expectation. We now develop some linearity
and order properties of expectation that will be used repeatedly from here
on.
Theorem 1. If X is a random variable that is bounded, i.e., if there
exists a constant K > 0 such that P([|X| < K]) = 1, then the expectation
of X exists, and -K < E(X) < K.
Proof: In this case it is easy to see that P([X > K]) = 0 and P([X < -K]) = 0.
Thus

    \int_0^{\infty} P([X > x])\,dx = \int_0^{K} P([X > x])\,dx \le K

and

    \int_{-\infty}^{0} P([X \le x])\,dx = \int_{-K}^{0} P([X \le x])\,dx \le K,

which proves the finiteness of both integrals in the definition of expectation.
The second conclusion is easily obtained from the above inequalities.
Theorem 2. If X is a random variable whose expectation exists, and if
c is a constant, then the expectation of cX exists and E(cX) = cE(X).
Proof: We shall prove this only for the case c < 0; the proof for c ≥ 0 is
very much the same and is left as an exercise. We observe that \int_0^{\infty} P([cX > x])\,dx =
\int_0^{\infty} P([X < x/c])\,dx. Making the change of variable y = x/c, we obtain, using
lemma 1 in section 3.1,

    \int_0^{\infty} P([X < x/c])\,dx = -c \int_{-\infty}^{0} P([X < y])\,dy = -c \int_{-\infty}^{0} P([X \le y])\,dy,

which is finite by hypothesis and the definition of expectation. Similarly,
\int_{-\infty}^{0} P([cX \le x])\,dx = -c \int_0^{\infty} P([X > y])\,dy, which yields our conclusion.

Theorem 3. If X, Y and Z are random variables, if expectations exist
for X and Z, and if X ≤ Y ≤ Z (i.e., if

    P([X \le Y \le Z]) = 1),

then the expectation of Y exists, and E(X) ≤ E(Y) ≤ E(Z).
Proof: The hypothesis X ≤ Y ≤ Z is easily seen to imply

    [X > x] \subset [Y > x] \subset [Z > x]  and  [Z \le x] \subset [Y \le x] \subset [X \le x]

for every real number x. Thus

    P([X > x]) \le P([Y > x]) \le P([Z > x])

and

    P([Z \le x]) \le P([Y \le x]) \le P([X \le x]).

The hypothesis that X and Z have expectations, the definition of expectation
and the above inequalities easily imply the conclusions.
Corollary. If X and Z are random variables whose expectations exist,
and if X ≤ Z, then E(X) ≤ E(Z).
Proof: This follows from theorem 3 by taking Y = X.
Lemma 1. If X and Y are random variables, then

    \lim_{K \to \infty} \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx = \int_0^{\infty} P([X > x])\,dx

and

    \lim_{K \to \infty} \int_{-\infty}^{0} P([X \le x] \cap [|Y| < K])\,dx = \int_{-\infty}^{0} P([X \le x])\,dx,

no matter whether in each case the right hand side is finite or infinite.
Proof: We shall only prove the first conclusion; a similar proof holds for
the second conclusion. We first observe that [X > x] ∩ [|Y| < K] ⊂ [X > x],
so that P([X > x] ∩ [|Y| < K]) ≤ P([X > x]). From this we obtain that

    \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx \le \int_0^{\infty} P([X > x])\,dx

for all K > 0. Since the right side does not depend on K, and since the left
side is nondecreasing in K (prove this!), it follows that the following limit
exists and

    \lim_{K \to \infty} \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx \le \int_0^{\infty} P([X > x])\,dx.

We now prove the reverse inequality. Note that

    P([X > x]) = P([X > x] \cap [|Y| < K]) + P([X > x] \cap [|Y| \ge K]).

Now P([X > x] ∩ [|Y| ≥ K]) ≤ P([|Y| ≥ K]), so

    P([X > x] \cap [|Y| < K]) \ge P([X > x]) - P([|Y| \ge K]).

Let A > 0 and ε > 0 be fixed (with A as large as one wishes, and ε > 0
being as small as one wishes). Then, since P([|Y| < K]) → 1 as K → ∞,
it follows that P([|Y| ≥ K]) → 0 as K → ∞. Hence there exists a K_0 > 0
such that for all K ≥ K_0, the inequality P([|Y| ≥ K]) < ε/A is true. Hence
for all K ≥ K_0,

    \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx \ge \int_0^{A} (P([X > x]) - P([|Y| \ge K]))\,dx
      \ge \int_0^{A} P([X > x])\,dx - \varepsilon.

Since the first expression in this last display is nondecreasing as K increases,
we have

    \lim_{K \to \infty} \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx \ge \int_0^{A} P([X > x])\,dx - \varepsilon.

Since this holds for all A, we can take the limit of both sides as A → ∞ to
obtain

    \lim_{K \to \infty} \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx \ge \int_0^{\infty} P([X > x])\,dx - \varepsilon.

Next, take the limit of both sides as ε → 0, and the reverse inequality is
established, which concludes the proof of the first conclusion.
Corollary. If X is a random variable whose expectation exists, then

    E(X) = \lim_{K \to \infty} E(X I_{[|X|<K]}).

Proof: We first observe the following easily proved identities: for x > 0
and K > 0,

    [X > x] \cap [|X| < K] = [X I_{[|X|<K]} > x],

and for x < 0,

    [X \le x] \cap [|X| < K] = [X I_{[|X|<K]} \le x].

Using these identities, lemma 1 and the definition of expectation, we
obtain

    E(X) = \int_0^{\infty} P([X > x])\,dx - \int_{-\infty}^{0} P([X \le x])\,dx
         = \lim_{K \to \infty} \int_0^{\infty} P([X > x] \cap [|X| < K])\,dx
           - \lim_{K \to \infty} \int_{-\infty}^{0} P([X \le x] \cap [|X| < K])\,dx
         = \lim_{K \to \infty} \Big( \int_0^{\infty} P([X I_{[|X|<K]} > x])\,dx - \int_{-\infty}^{0} P([X I_{[|X|<K]} \le x])\,dx \Big)
         = \lim_{K \to \infty} E(X I_{[|X|<K]}),

which proves the corollary.


Theorem 4. If X and Y are random variables whose expectations exist,
then the expectation of X + Y exists, and E(X + Y) = E(X) + E(Y).
Proof: We first prove that the expectation of X + Y exists. In order to
do this, we need only prove that

    \int_0^{\infty} P([X + Y > x])\,dx < \infty  and  \int_{-\infty}^{0} P([X + Y \le x])\,dx < \infty.

We shall prove only the second inequality. We observe that for every x ≤ 0,

    [X + Y \le x] \subset [X \le \tfrac{x}{2}] \cup [Y \le \tfrac{x}{2}].

Thus

    P([X + Y \le x]) \le P([X \le \tfrac{x}{2}]) + P([Y \le \tfrac{x}{2}]),

and, by properties of integration,

    \int_{-\infty}^{0} P([X + Y \le x])\,dx \le \int_{-\infty}^{0} P([X \le \tfrac{x}{2}])\,dx + \int_{-\infty}^{0} P([Y \le \tfrac{x}{2}])\,dx.

By the change of variable y = x/2, this last inequality becomes

    \int_{-\infty}^{0} P([X + Y \le x])\,dx \le 2 \int_{-\infty}^{0} P([X \le y])\,dy + 2 \int_{-\infty}^{0} P([Y \le y])\,dy,

which is finite by hypothesis. We now prove the theorem in special cases.


Case (i): Let X be as in the hypothesis, and let Y = cI_A, where c is a
constant and A is an event (for definiteness take c > 0; the case c < 0 is
handled in the same way). We next observe that

    E(X + cI_A) = \int_0^{\infty} P([X + cI_A > x])\,dx - \int_{-\infty}^{0} P([X + cI_A \le x])\,dx
     = \int_0^{\infty} P([X + cI_A > x] \cap A)\,dx + \int_0^{\infty} P([X > x] \cap A^c)\,dx
       - \int_{-\infty}^{0} P([X + cI_A \le x] \cap A)\,dx - \int_{-\infty}^{0} P([X \le x] \cap A^c)\,dx
     = \int_0^{\infty} P([X + c > x] \cap A)\,dx + \int_0^{\infty} P([X > x] \cap A^c)\,dx
       - \int_{-\infty}^{0} P([X + c \le x] \cap A)\,dx - \int_{-\infty}^{0} P([X \le x] \cap A^c)\,dx
     = \int_{-c}^{\infty} P([X > y] \cap A)\,dy + \int_0^{\infty} P([X > y] \cap A^c)\,dy
       - \int_{-\infty}^{-c} P([X \le y] \cap A)\,dy - \int_{-\infty}^{0} P([X \le y] \cap A^c)\,dy
     = \int_{-c}^{0} P([X > y] \cap A)\,dy + \int_0^{\infty} P([X > y])\,dy
       + \int_{-c}^{0} P([X \le y] \cap A)\,dy - \int_{-\infty}^{0} P([X \le y])\,dy
     = \int_{-c}^{0} P(A)\,dy + E(X) = E(X) + cP(A) = E(X) + E(cI_A).

Thus we have proved the theorem in case (i).
Case (ii): Again, let X be as in the hypothesis, and let Y be a discrete
random variable of the form

    Y = \sum_{j=1}^{n} y_j I_{A_j},

where n is finite, A_1, \ldots, A_n are events, and y_1, \ldots, y_n are constants. The
random variable Y is bounded, so its expectation exists. Applying case (i)
again and again, we have E(X + y_1 I_{A_1}) = E(X) + y_1 P(A_1), E(X + y_1 I_{A_1} +
y_2 I_{A_2}) = E(X + y_1 I_{A_1}) + y_2 P(A_2) = E(X) + y_1 P(A_1) + y_2 P(A_2), and after
n - 2 more applications of case (i) we obtain

    E(X + Y) = E\Big( X + \sum_{j=1}^{n} y_j I_{A_j} \Big) = E(X) + \sum_{j=1}^{n} y_j P(A_j) = E(X) + E(Y),

which proves the theorem in case (ii).
Case (iii): Now suppose X is as in the hypothesis, and suppose Y is a
bounded random variable, i.e., suppose there is an integer N > 0 such that
|Y(ω)| < N for all ω ∈ Ω. The expectation of Y exists since it is bounded.
In this case we define two random variables, Y_n^+ and Y_n^-, for every positive
integer n by:

    Y_n^+ = \sum_{j=-N2^n+1}^{N2^n} \frac{j}{2^n} I_{[(j-1)/2^n < Y \le j/2^n]}

and

    Y_n^- = \sum_{j=-N2^n+1}^{N2^n} \frac{j-1}{2^n} I_{[(j-1)/2^n < Y \le j/2^n]}.

It is easy to verify that Y_n^-(ω) ≤ Y(ω) ≤ Y_n^+(ω) over Ω, and 0 ≤ Y_n^+(ω) -
Y_n^-(ω) ≤ 1/2^n for all n and for all ω ∈ Ω. Thus by theorem 3 and case (ii)
we have

    E(Y_n^+ - Y_n^-) \le 1/2^n,
    E(Y_n^+ - Y_n^-) = E(Y_n^+) - E(Y_n^-), and
    E(Y_n^-) \le E(Y) \le E(Y_n^+),

which imply

    0 \le E(Y) - E(Y_n^-) \le 1/2^n  and  0 \le E(Y_n^+) - E(Y) \le 1/2^n.

Since X + Y_n^- ≤ X + Y ≤ X + Y_n^+, we have by case (ii) and theorem 3
that X + Y has finite expectation, and hence by the inequalities above and
theorem 3, one obtains

    E(X + Y) \le E(X + Y_n^+) = E(X) + E(Y_n^+) \le E(X) + E(Y) + 1/2^n

for every positive integer n. Similarly E(X + Y) ≥ E(X) + E(Y) - 1/2^n,
and thus for every positive integer n,

    -1/2^n \le E(X + Y) - E(X) - E(Y) \le 1/2^n.

Letting n → ∞, we obtain E(X + Y) = E(X) + E(Y) for case (iii).
Case (iv): We now prove the theorem in the general case. By lemma 1,
we obtain

    \int_0^{\infty} P([X + Y > x])\,dx = \lim_{K \to \infty} \int_0^{\infty} P([X + Y > x] \cap [|Y| < K])\,dx
     = \lim_{K \to \infty} \int_0^{\infty} P([X + Y I_{[|Y|<K]} > x] \cap [|Y| < K])\,dx,

no matter whether the left side is finite or infinite. Similarly, finite or infinite,
we have by lemma 1,

    \int_{-\infty}^{0} P([X + Y \le x])\,dx = \lim_{K \to \infty} \int_{-\infty}^{0} P([X + Y I_{[|Y|<K]} \le x] \cap [|Y| < K])\,dx.

Now let us denote D_K = U_K - L_K, where

    U_K = \int_0^{\infty} P([X + Y I_{[|Y|<K]} > x] \cap [|Y| < K])\,dx

and

    L_K = \int_{-\infty}^{0} P([X + Y I_{[|Y|<K]} \le x] \cap [|Y| < K])\,dx.

We first notice that this difference is finite. This follows from case (iii). We
next observe that, since Y I_{[|Y|<K]} = 0 over [|Y| ≥ K],

    D_K = \int_0^{\infty} P([X + Y I_{[|Y|<K]} > x])\,dx - \int_0^{\infty} P([X > x] \cap [|Y| \ge K])\,dx
      - \int_{-\infty}^{0} P([X + Y I_{[|Y|<K]} \le x])\,dx + \int_{-\infty}^{0} P([X \le x] \cap [|Y| \ge K])\,dx
     = E(X + Y I_{[|Y|<K]}) - \int_0^{\infty} P([X > x])\,dx + \int_{-\infty}^{0} P([X \le x])\,dx
       + \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx - \int_{-\infty}^{0} P([X \le x] \cap [|Y| < K])\,dx,

which, by case (iii),

     = E(X) + E(Y I_{[|Y|<K]}) - E(X)
       + \int_0^{\infty} P([X > x] \cap [|Y| < K])\,dx - \int_{-\infty}^{0} P([X \le x] \cap [|Y| < K])\,dx.

Now, applying lemma 1 and its corollary, we obtain

    \lim_{K \to \infty} D_K = E(Y) + \int_0^{\infty} P([X > x])\,dx - \int_{-\infty}^{0} P([X \le x])\,dx = E(X) + E(Y).

Since the expectation of X + Y was shown above to exist, the two limits
displayed at the beginning of this case give E(X + Y) = \lim_{K \to \infty} D_K. Thus
E(X + Y) = E(X) + E(Y), which proves case (iv) and the theorem.
Corollary. If X_1, \ldots, X_n are random variables whose expectations exist,
and if c_1, \ldots, c_n are constants, then the expectation of \sum_{k=1}^{n} c_k X_k exists,
and

    E\Big( \sum_{k=1}^{n} c_k X_k \Big) = \sum_{k=1}^{n} c_k E(X_k).

Proof: This is an easy consequence of theorems 2 and 4.
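Linearity holds even for dependent random variables, which a small finite example makes concrete (an added sketch, not in the original; the dice setup below is only illustrative):

```python
# Check of linearity, E(c1·X + c2·Y) = c1·E(X) + c2·E(Y), for two dependent
# discrete random variables on the same finite sample space: ordered pairs
# of two fair dice, each outcome with probability 1/36.
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
prob = Fraction(1, 36)

def expect(Z):
    return sum(prob * Z(w) for w in outcomes)

def X(w):          # value of the first die
    return w[0]

def Y(w):          # total of both dice (dependent on X)
    return w[0] + w[1]

c1, c2 = 3, -2
lhs = expect(lambda w: c1 * X(w) + c2 * Y(w))
rhs = c1 * expect(X) + c2 * expect(Y)
print(lhs == rhs)  # True
```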

EXERCISES

1. Use the definition of expectation to prove: if 0 < a < b are constants,
if X is a random variable, and if P([a ≤ X ≤ b]) = 1, then the expectation
of X exists and a ≤ E(X) ≤ b.
2. Prove: if X is a random variable whose expectation exists, if b is a
constant, and if P([X ≤ b]) = 1, then E(X) ≤ b.
3. A random variable X is said to be symmetric or symmetrically distrib-
uted about the number c if, for every x > 0, P([c - x ≤ X ≤ c]) = P([c ≤
X ≤ c + x]). Prove: if X is a random variable whose expectation exists, and
if X is symmetric about c, then E(X) = c.
4. Prove: if X is a random variable whose expectation exists, and if c is
a constant, then X + c is a random variable whose expectation exists, and
E(X + c) = E(X) + c.
5. Suppose X is a random variable with absolutely continuous distribu-
tion and whose density is

    f_X(x) = \tfrac{1}{2} \sin(x - 2.3)  if 2.3 \le x \le 2.3 + \pi,  and  f_X(x) = 0  otherwise.

Evaluate E(X).
6. Prove: if K and x are positive constants, and if X is a random variable,
then

    [X > x] \cap [|X| < K] = [X I_{[|X|<K]} > x].

7. Verify the following: if c and x are real numbers, if A is an event, and
if X is a random variable, then
(i) [X + cI_A > x] \cap A = [X + c > x] \cap A,
(ii) [X + cI_A > x] \cap A^c = [X > x] \cap A^c, and
(iii) \int_{-c}^{0} P([X \le y] \cap A)\,dy + \int_{-c}^{0} P([X > y] \cap A)\,dy = cP(A).
8. Prove: if Y is a random variable that satisfies 0 ≤ Y(ω) ≤ n for all
ω ∈ Ω, where n is a positive integer, then

    \sum_{j=1}^{n} (j-1) I_{[j-1 < Y \le j]}(\omega) \le Y(\omega) \le \sum_{j=1}^{n} j I_{[j-1 < Y \le j]}(\omega)

for all ω ∈ Ω.
9. In problem 5 above, prove that the random variable X is symmetrically
distributed about 2.3 + π/2.
3.3 Moments and Central Moments. The purpose of this section is
to define the various kinds of moments used in statistics and to derive some
of their properties. We note first that powers of random variables are also
random variables; we obtained this in theorem 5 in section 2.1.
Definition. If r is a positive integer, and if X is a random variable, we
define the rth moment of X to be m_r = E(X^r), provided the expectation
exists. We define the rth central moment to be \mu_r = E((X - E(X))^r),
provided E(X) and E((X - E(X))^r) both exist. The second central moment,
\mu_2, of X is called the variance of X and is denoted by Var X or Var(X).
Theorem 1. If X is a random variable, if r < s are positive integers,
and if E(X^s) exists, then E(X^r) exists.
Proof: Since by hypothesis E(X^s) exists, then by definition,

    E(X^s) = \int_0^{\infty} P([X^s > x])\,dx - \int_{-\infty}^{0} P([X^s \le x])\,dx,

and both integrals are finite. We now prove that \int_0^{\infty} P([X^r > x])\,dx < \infty.
Indeed, since probabilities are bounded by 1, we have

    0 \le \int_0^{1} P([X^r > x])\,dx \le 1

and

    0 \le \int_{-1}^{0} P([X^r \le x])\,dx \le 1.

Now for 1 ≤ x < ∞, x^{1/r} ≥ x^{1/s}, and thus P([X > x^{1/r}]) ≤ P([X > x^{1/s}])
and P([X < -x^{1/r}]) ≤ P([X < -x^{1/s}]). Hence if r is odd,

    \int_0^{\infty} P([X^r > x])\,dx \le 1 + \int_1^{\infty} P([X > x^{1/r}])\,dx
      \le 1 + \int_1^{\infty} P([X > x^{1/s}])\,dx
      \le 1 + \int_0^{\infty} P([X^s > x])\,dx < \infty.

But if r is even, then

    \int_0^{\infty} P([X^r > x])\,dx \le 1 + \int_1^{\infty} \big( P([X < -x^{1/r}]) + P([X > x^{1/r}]) \big)\,dx
      \le 1 + \int_1^{\infty} \big( P([X < -x^{1/s}]) + P([X > x^{1/s}]) \big)\,dx.

If s is odd, then [X < -x^{1/s}] = [X^s < -x], so by lemma 1 in section 3.1,

    \int_1^{\infty} P([X < -x^{1/s}])\,dx = \int_{-\infty}^{-1} P([X^s < u])\,du \le \int_{-\infty}^{0} P([X^s \le u])\,du < \infty,

and if s is even, then [X < -x^{1/s}] ⊂ [X^s > x] for x ≥ 1, so

    \int_1^{\infty} P([X < -x^{1/s}])\,dx \le \int_1^{\infty} P([X^s > x])\,dx < \infty.

No matter whether s is odd or even, we have

    \int_1^{\infty} P([X > x^{1/s}])\,dx \le \int_1^{\infty} P([X^s > x])\,dx < \infty,

and thus we have shown that \int_0^{\infty} P([X^r > x])\,dx < \infty. In a similar manner,
one can show that \int_{-\infty}^{0} P([X^r \le x])\,dx < \infty, and the theorem is proved.
Corollary to Theorem 1. If the rth moment of a random variable
exists, then its rth central moment exists.
Proof: This follows easily by theorem 1 and the binomial theorem.
Theorem 2. If X is a discrete random variable that takes values x_1, x_2, \ldots
with probabilities p_1, p_2, \ldots respectively, i.e., P([X = x_n]) = p_n for all n and
\sum_{n \ge 1} p_n = 1, then the rth moment of X exists if and only if the series
\sum_{n \ge 1} |x_n|^r p_n < \infty, in which case E(X^r) = \sum_{n \ge 1} x_n^r p_n.
Proof: By the definition of a discrete random variable, Ω = \bigcup_{n \ge 1} [X = x_n].
Now

    [X^r = x] = \bigcup_{\{n : x_n^r = x\}} [X = x_n].

Hence if {y_1, y_2, \ldots} are the distinct values of the set {x_1^r, x_2^r, \ldots}, and if we
denote q_n = \sum \{p_j : x_j^r = y_n\}, we have

    P([X^r = y_n]) = P\Big( \bigcup_{\{j : x_j^r = y_n\}} [X = x_j] \Big) = \sum_{\{j : x_j^r = y_n\}} p_j = q_n.

Hence by theorem 1 in section 3.1, E(X^r) exists if and only if the series
\sum_n |y_n| q_n < \infty. But

    \sum_n |y_n| q_n = \sum_n |y_n| \sum_{\{j : x_j^r = y_n\}} p_j = \sum_n \sum_{\{j : x_j^r = y_n\}} |y_n| p_j = \sum_j |x_j^r| p_j,

from which we obtain the theorem.
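A small exact computation (added here as an illustration) shows the theorem in action on a fair die, whose second moment is (1 + 4 + ... + 36)/6 = 91/6:

```python
# Check of theorem 2 on a fair die X uniform on {1, ..., 6}:
# the second moment E(X²) = Σ x²·p_x equals 91/6.
from fractions import Fraction

p = Fraction(1, 6)
second_moment = sum(Fraction(x * x) * p for x in range(1, 7))
print(second_moment)  # 91/6
```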

Lemma 1. If Z is a random variable with an absolutely continuous dis-
tribution, and if h : [a, b] → R^1 is a nondecreasing function over the interval
[a, b], then

    h(a) P([a \le Z < b]) \le \int_a^b h(x) f_Z(x)\,dx \le h(b) P([a \le Z < b]).

Proof: This easily follows from the following:

    h(a) P([a \le Z < b]) = h(a) \int_a^b f_Z(x)\,dx = \int_a^b h(a) f_Z(x)\,dx \le \int_a^b h(x) f_Z(x)\,dx,

the last inequality following from the fact that the density is nonnegative
and h is nondecreasing. The second inequality follows by similar reasoning.
Lemma 2. If Z is a random variable, then

    \sum_{k=1}^{\infty} \frac{1}{n} P\big( \big[ Z \ge \tfrac{k}{n} \big] \big) \le \int_0^{\infty} P([Z > x])\,dx.

Proof: Let N be a positive integer, and define

    g(x) = P\big( \big[ Z \ge \tfrac{k}{n} \big] \big)  whenever  \tfrac{k-1}{n} \le x < \tfrac{k}{n},  for all x \in [0, \infty).

It follows that g(x) ≤ P([Z > x]) for all x ≥ 0. Thus

    \int_0^{N} g(x)\,dx = \sum_{k=1}^{Nn} \frac{1}{n} P\big( \big[ Z \ge \tfrac{k}{n} \big] \big).

But \int_0^{N} g(x)\,dx \le \int_0^{N} P([Z > x])\,dx, and by the definition of the improper in-
tegral, the conclusion follows.
Theorem 3. If X is a random variable with absolutely continuous dis-
tribution function, and if r is a positive integer, then E(X^r) exists if and
only if \int_{-\infty}^{\infty} |x|^r f_X(x)\,dx < \infty, in which case

    E(X^r) = \int_{-\infty}^{\infty} x^r f_X(x)\,dx.

Proof: Case (i): r is even. In this case, X^r(ω) ≥ 0 for every ω ∈ Ω, and
h(x) defined by h(x) = x^r is increasing over [0, ∞) and is decreasing over
(-∞, 0]. Thus, if we denote I = \int_0^{\infty} P([X^r > x])\,dx, it follows from the
definition of expectation that E(X^r) exists if and only if I < ∞, in which
case E(X^r) = I. Let us denote

    \Delta(x) = (-\infty, -x^{1/r}) \cup (x^{1/r}, \infty)  if x \ge 0,  and  \Delta(x) = (-\infty, \infty)  if x < 0.

Then, if 0 ≤ x' < x'', it follows that Δ(x'') ⊂ Δ(x'). Also P([X^r > x]) =
P([X ∈ Δ(x)]). Let us define g(x) = P([X^r > x]); then g(x) = \int_{\Delta(x)} f_X(t)\,dt
is a nonincreasing function of x, g(0) = 1, and g(x) → 0 as x → ∞.
Now denote

    L_n = \sum_{k=1}^{\infty} \frac{1}{n} \int_{\Delta(k/n)} f_X(t)\,dt

and

    U_n = \sum_{k=1}^{\infty} \frac{1}{n} \int_{\Delta((k-1)/n)} f_X(t)\,dt.

By the above properties of g, it follows that L_n ≤ I ≤ U_n for n = 1, 2, \ldots.
(Proof: L_n ≤ I follows from lemma 2 and the fact that r is
even:

    \sum_{k=1}^{\infty} \frac{1}{n} \int_{\Delta(k/n)} f_X(t)\,dt = \sum_{k=1}^{\infty} \frac{1}{n} P([X \in \Delta(\tfrac{k}{n})])
      = \sum_{k=1}^{\infty} \frac{1}{n} P([X^r > \tfrac{k}{n}]) \le \int_0^{\infty} P([X^r > x])\,dx.)

Now let a_k = \frac{1}{n} \int_{\Delta((k-1)/n) \setminus \Delta(k/n)} f_X(t)\,dt, and applying lemma 1 above and lemma
2 in section 3.1, we obtain

    L_n = \sum_{k=1}^{\infty} \frac{k-1}{n} \int_{\Delta((k-1)/n) \setminus \Delta(k/n)} f_X(t)\,dt
        = \sum_{k=1}^{\infty} \frac{k}{n} \int_{\Delta((k-1)/n) \setminus \Delta(k/n)} f_X(t)\,dt - \frac{1}{n}
        \ge \sum_{k=1}^{\infty} \int_{\Delta((k-1)/n) \setminus \Delta(k/n)} t^r f_X(t)\,dt - \frac{1}{n}
        = \int_{-\infty}^{\infty} t^r f_X(t)\,dt - \frac{1}{n}.

In a similar manner, U_n \le \int_{-\infty}^{\infty} t^r f_X(t)\,dt + \frac{1}{n}. Thus, for every positive integer
n,

    \int_{-\infty}^{\infty} t^r f_X(t)\,dt - \frac{1}{n} \le I \le \int_{-\infty}^{\infty} t^r f_X(t)\,dt + \frac{1}{n},

which yields the conclusions of the theorem in case r is even. Case (ii): r is
odd. In this case, define

    I_1 = \int_0^{\infty} P([X^r > x])\,dx  and  I_2 = \int_{-\infty}^{0} P([X^r \le x])\,dx.

Then by definition of expectation, E(X^r) exists if and only if both I_1 < ∞
and I_2 < ∞. For every x > 0, define \Delta_1(x) = (x^{1/r}, \infty), and for x < 0 define
\Delta_2(x) = (-\infty, x^{1/r}). Using \Delta_1(x) and \Delta_2(x) as we did in proving the
theorem in case (i) when r was even, we obtain

    I_1 = \int_0^{\infty} x^r f_X(x)\,dx  and  I_2 = -\int_{-\infty}^{0} x^r f_X(x)\,dx,

which yields the conclusion of the theorem in this case, thus concluding the
proof of the theorem.

EXERCISES

1.Prove: if U is a random variable whose nth moment exists, then the


nth central moment exists and
Xn
n
n
E((U E(U )) ) = E(U j )( E(U ))n j
.
j=0
j

2. Suppose Y is a random variable that has the uniform distribution over


[0; 1), i.e., it has an absolutely continuous distribution with density given by
1 if 0 y < 1
fY (y) =
0 otherwise.

Compute E(Y n ) for any positive integer n.


3. In problem 2, compute the expectation and the variance of $Y$.
4. Suppose $Z$ is a random variable that is uniformly distributed over the set of numbers $\{\frac{1}{n}, \frac{2}{n}, \cdots, \frac{n}{n}\}$, i.e., $P([Z = \frac{i}{n}]) = \frac{1}{n}$ for $1 \le i \le n$, where $n$ is a positive integer. Derive the formula for $E(Z)$.

5. Prove: if $Y$ and $Z_n$ are random variables as in problems 2 and 4, then $0 \le F_Y(y) - F_{Z_n}(y) < \frac{1}{n}$ for all real values of $y$.
6. Prove: if $Y$ and $Z_n$ are random variables as in problems 2 and 4, then $F_{Z_n}(y) \to F_Y(y)$ uniformly in $y$ as $n \to \infty$, i.e., prove that for every $\epsilon > 0$ there exists a positive integer $N$ such that $|F_Y(y) - F_{Z_n}(y)| < \epsilon$ for all values of $y$ and for all $n > N$.
7. Definition. If $X$ is a nonnegative integer-valued random variable, its probability generating function $g_X(u)$ is defined by
$$g_X(u) = \sum_{n=0}^{\infty} P([X = n])\,u^n = E(u^X)$$
for all values of $u \in [-1, 1]$. Prove that
(i) $g_X(1) = 1$, and
(ii) if the expectation of $X$ exists, then $E(X) = g_X'(1)$.
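Both parts of exercise 7 are easy to verify on a small example. The sketch below (a toy $Bin(2, \frac{1}{2})$ random variable of my own choosing, not from the text) evaluates $g_X$ and its term-by-term derivative at $u = 1$ in exact arithmetic.

```python
from fractions import Fraction

# Sketch: verify (i) g_X(1) = 1 and (ii) E(X) = g_X'(1) for a
# Bin(2, 1/2) random variable (an assumed toy example).
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # P(X=0), P(X=1), P(X=2)

def g(u):
    # g_X(u) = sum_n P([X = n]) u^n
    return sum(p * u ** n for n, p in enumerate(probs))

def g_prime(u):
    # term-by-term derivative: sum_n n P([X = n]) u^(n-1)
    return sum(n * p * u ** (n - 1) for n, p in enumerate(probs) if n >= 1)

mean = sum(n * p for n, p in enumerate(probs))  # E(X) = 1 here
print(g(1), g_prime(1), mean)  # 1 1 1
```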

3.4 Variances of Some Special Distributions. In section 3.3 we introduced the notions of moments and central moments. A special case of central moment was singled out, namely, that which is usually referred to as the variance of a random variable. This is an important concept in probability and statistics. It is a natural measure of the dispersion of probability mass along the real line, and in mechanics it is called the moment of inertia. Our purpose in this section is to establish a few basic properties of variance and to derive the formulas for the variances of six important discrete distributions. Let us recall that the variance of a random variable $X$, denoted $Var(X)$, is defined by $Var(X) = E((X - E(X))^2)$.
Theorem 1. If $X$ is a random variable with finite second moment, then $Var(X) = E(X^2) - (E(X))^2$.
Proof: By the corollary to theorem 1 in section 3.3, the existence of $E(X^2)$ implies the same for $Var(X)$. Using results already established for expectation, we have
$$Var(X) = E((X - E(X))^2) = E(X^2 - 2E(X)X + (E(X))^2)
= E(X^2) - 2(E(X))^2 + (E(X))^2 = E(X^2) - (E(X))^2 ,$$
which yields our formula.

Theorem 2. If $X$ is a random variable with finite second moment, and if $C$ is any constant, then $Var(CX) = C^2\,Var(X)$.
Proof: By already established properties of expectation, we see that $E((CX)^2) = E(C^2X^2) = C^2E(X^2)$ and $E(CX) = CE(X)$. By these results and by theorem 1,
$$Var(CX) = E((CX)^2) - (E(CX))^2 = C^2E(X^2) - C^2(E(X))^2 = C^2\,Var(X) .$$
Theorem 3. If $X$ is a random variable with finite second moment, and if $C$ is any constant, then $X + C$ has a finite second moment, and $Var(X + C) = Var(X)$.
Proof: If $X$ has a finite second moment, then $E(X^2) + 2CE(X) + C^2$ is a finite, well-defined number. But $E(X^2) + 2CE(X) + C^2 = E(X^2 + 2CX + C^2) = E((X + C)^2)$, so $X + C$ has a finite second moment. The second conclusion follows directly from the definition of variance and the fact that $E(X + C) = E(X) + C$: indeed, $Var(X + C) = E((X + C - E(X + C))^2) = E((X - E(X))^2) = Var(X)$.
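Theorems 1, 2, and 3 can all be checked in exact arithmetic on a toy distribution (the three-point distribution below is an arbitrary choice of mine, not from the text):

```python
from fractions import Fraction

# Sketch: check theorems 1-3 for X uniform on {1, 2, 3} (a toy example).
values = [1, 2, 3]
prob = Fraction(1, 3)

def expect(f):
    # E(f(X)) for the discrete distribution above
    return sum(prob * f(x) for x in values)

EX = expect(lambda x: x)                                 # E(X) = 2
var_def = expect(lambda x: (x - EX) ** 2)                # definition of Var(X)
var_thm1 = expect(lambda x: x * x) - EX ** 2             # theorem 1

C = 5
var_scaled = expect(lambda x: (C * x - C * EX) ** 2)     # Var(CX), theorem 2
var_shifted = expect(lambda x: (x + C - (EX + C)) ** 2)  # Var(X + C), theorem 3

print(var_def, var_thm1, var_scaled, var_shifted)  # 2/3 2/3 50/3 2/3
```

Using `fractions.Fraction` keeps the comparison free of floating-point doubt.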
Now let us derive formulas for variances of the six important discrete
distributions.
Example 1. The binomial distribution. Suppose $X$ is a random variable with the binomial distribution, i.e.,
$$P([X = k]) = \binom{n}{k} p^k (1-p)^{n-k} \quad\text{for } k = 0, 1, 2, \cdots, n .$$
We shall compute $Var(X)$. In section 3.1 we found that $E(X) = np$. Now
$$\begin{aligned}
E(X^2) &= \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}
= \sum_{k=1}^{n} (k(k-1) + k) \binom{n}{k} p^k (1-p)^{n-k} \\
&= np \sum_{k-1=0}^{n-1} (k-1) \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)}
+ np \sum_{k-1=0}^{n-1} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} .
\end{aligned}$$
Now the first of the two sums in this last expression is the expectation of a random variable whose distribution is $Bin(n-1, p)$, whose expectation is $(n-1)p$. The second sum is just $(p + (1-p))^{n-1} = 1$. Thus
$$E(X^2) = np(n-1)p + np = (np)^2 - np^2 + np .$$
Using theorem 1, we obtain $Var(X) = (np)^2 - np^2 + np - (np)^2 = np(1-p)$.
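The formula $Var(X) = np(1-p)$ can be confirmed by exact summation over the binomial density ($n = 7$ and $p = 3/10$ below are arbitrary choices of mine):

```python
from fractions import Fraction
from math import comb

# Sketch: exact check of E(X) = np and Var(X) = np(1 - p) for Bin(7, 3/10).
n, p = 7, Fraction(3, 10)
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2

print(mean, variance)  # 21/10 147/100
```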
Example 2. The geometric distribution. We now find the variance of a random variable, $X$, whose distribution is $geom(p)$, i.e., $P([X = n]) = (1-p)^{n-1}p$ for $n = 1, 2, \cdots$, where $0 < p < 1$. By theorem 2,
$$E(X^2) = \sum_{n=1}^{\infty} n^2 (1-p)^{n-1} p = \sum_{n=1}^{\infty} n((n-1)+1)(1-p)^{n-1} p
= (1-p)p \sum_{n=2}^{\infty} n(n-1)(1-p)^{n-2} + p \sum_{n=1}^{\infty} n(1-p)^{n-1} .$$
Since $0 < 1-p < 1$, the two series in this last expression can be expressed as term-by-term second and first derivatives, respectively, and thus
$$\begin{aligned}
E(X^2) &= (1-p)p \frac{d^2}{dp^2} \sum_{n=0}^{\infty} (1-p)^n - p \frac{d}{dp} \sum_{n=0}^{\infty} (1-p)^n \\
&= (1-p)p \frac{d^2}{dp^2} \frac{1}{1-(1-p)} - p \frac{d}{dp} \frac{1}{1-(1-p)} \\
&= (1-p)p \frac{2}{p^3} + p \frac{1}{p^2}
= (1-p)\frac{2}{p^2} + \frac{1}{p} = \frac{2-p}{p^2} .
\end{aligned}$$
By example 2 in section 3.1, $E(X) = \frac{1}{p}$. Thus, by theorem 1,
$$Var(X) = E(X^2) - (E(X))^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2} .$$
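Truncating the series far into the tail gives a numerical check of $E(X) = 1/p$, $E(X^2) = (2-p)/p^2$, and $Var(X) = (1-p)/p^2$ ($p = 0.3$ is an arbitrary choice of mine):

```python
# Sketch: numerical check of the geometric moments for p = 0.3 by
# truncating the series; (1 - p)^500 is negligibly small here.
p = 0.3
N = 500

mean = sum(n * (1 - p) ** (n - 1) * p for n in range(1, N + 1))
second = sum(n * n * (1 - p) ** (n - 1) * p for n in range(1, N + 1))
variance = second - mean ** 2

print(mean, second, variance)  # about 1/p, (2-p)/p^2, (1-p)/p^2
```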
Example 3. The Poisson distribution. The discrete density of the Poisson distribution is given by
$$P([X = n]) = e^{-\lambda} \frac{\lambda^n}{n!} \quad\text{for } n = 0, 1, 2, \cdots .$$
In example 3 in section 3.1, we found that $E(X) = \lambda$. In order to find the formula for $Var(X)$, we shall find $E(X^2)$ and then apply theorem 1. By theorem 2, by the same differentiating techniques as in the previous example, and remembering the expansion $e^z = \sum_{n=0}^{\infty} z^n/n!$, we have
$$\begin{aligned}
E(X^2) &= \sum_{n=0}^{\infty} n^2 e^{-\lambda} \frac{\lambda^n}{n!}
= e^{-\lambda} \sum_{n=0}^{\infty} n((n-1)+1) \frac{\lambda^n}{n!} \\
&= e^{-\lambda} \sum_{n=2}^{\infty} n(n-1) \frac{\lambda^n}{n!} + e^{-\lambda} \sum_{n=1}^{\infty} n \frac{\lambda^n}{n!} \\
&= e^{-\lambda} \lambda^2 \frac{d^2}{d\lambda^2} e^{\lambda} + e^{-\lambda} \lambda \frac{d}{d\lambda} e^{\lambda}
= \lambda^2 + \lambda .
\end{aligned}$$
Thus, by theorem 1, we have $Var(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$.
We next wish to derive formulas for the variances of the sample survey sum distribution and its special cases. For this we need the following lemma.
Lemma 1. If $x_1, x_2, \cdots, x_N$ is a finite sequence of not necessarily distinct numbers, if $2 \le n \le N$, and if $f(x, y)$ is any function of two variables, then
$$\sum_{1 \le i_1 < \cdots < i_n \le N} \;\sum_{1 \le u < v \le n} f(x_{i_u}, x_{i_v})
= \binom{N-2}{n-2} \sum_{1 \le u < v \le N} f(x_u, x_v) .$$
Proof: Let $u, v$ be integers that satisfy $1 \le u < v \le N$. Then $f(x_u, x_v)$ can appear in exactly $\binom{N-2}{n-2}$ sums of the form
$$\sum_{1 \le u < v \le n} f(x_{i_u}, x_{i_v}) ,$$
which yields the above result.
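Lemma 1 can be verified by brute force on a small list (the numbers and the function $f$ below are arbitrary choices of mine):

```python
from itertools import combinations
from math import comb

# Sketch: brute-force check of lemma 1 on a small example.
xs = [2, 5, 5, 7, 11]        # not necessarily distinct
N, n = len(xs), 3

def f(a, b):
    # any function of two variables will do
    return a * b + 1

# left side: sum over all n-subsets of indices, then over pairs inside
lhs = sum(f(sub[u], sub[v])
          for sub in combinations(xs, n)
          for u, v in combinations(range(n), 2))

# right side: C(N-2, n-2) times the sum over all pairs from the whole list
rhs = comb(N - 2, n - 2) * sum(f(xs[u], xs[v])
                               for u, v in combinations(range(N), 2))

print(lhs, rhs)  # equal
```

Note that `itertools.combinations` enumerates by position, which matches the index subsets $1 \le i_1 < \cdots < i_n \le N$ even when the $x_i$ repeat.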


Example 4. The sample survey sum distribution. Let $X$ denote the sum of a simple random sample of size $n$ taken without replacement from the $N$ not necessarily distinct numbers $x_1, x_2, \cdots, x_N$. In section 3.1 we obtained $E(X) = n\bar{x}$, where $\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j$. By the lemma from section 3.1 and by the lemma above, we have
$$\begin{aligned}
E(X^2) &= \sum_{1 \le i_1 < \cdots < i_n \le N} \left( \sum_{j=1}^{n} x_{i_j} \right)^2 \frac{1}{\binom{N}{n}} \\
&= \sum_{1 \le i_1 < \cdots < i_n \le N} \sum_{j=1}^{n} x_{i_j}^2 \frac{1}{\binom{N}{n}}
+ 2 \sum_{1 \le i_1 < \cdots < i_n \le N} \;\sum_{1 \le u < v \le n} x_{i_u} x_{i_v} \frac{1}{\binom{N}{n}} \\
&= \frac{\binom{N-1}{n-1}}{\binom{N}{n}} \sum_{j=1}^{N} x_j^2
+ 2 \frac{\binom{N-2}{n-2}}{\binom{N}{n}} \sum_{1 \le u < v \le N} x_u x_v \\
&= \frac{n}{N} \sum_{j=1}^{N} x_j^2 + \frac{2n(n-1)}{N(N-1)} \sum_{1 \le u < v \le N} x_u x_v .
\end{aligned}$$
Now
$$\begin{aligned}
Var(X) &= E(X^2) - (E(X))^2 \\
&= \frac{n}{N} \sum_{j=1}^{N} x_j^2 + \frac{n(n-1)}{N(N-1)} \left\{ \left( \sum_{j=1}^{N} x_j \right)^2 - \sum_{j=1}^{N} x_j^2 \right\} - n^2 \bar{x}^2 \\
&= \frac{n(N-n)}{N(N-1)} \sum_{j=1}^{N} x_j^2 - \bar{x}^2 \frac{n(N-n)}{N-1} .
\end{aligned}$$
Thus
$$Var(X) = n \frac{N-n}{N} \left\{ \frac{1}{N-1} \left( \sum_{j=1}^{N} x_j^2 - N\bar{x}^2 \right) \right\} .$$
Recalling that
$$\sum_{j=1}^{N} x_j^2 - N\bar{x}^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 ,$$
we also have
$$Var(X) = n \left( 1 - \frac{n}{N} \right) \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 .$$
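The variance formula just derived can be checked exactly by enumerating every equally likely sample (the small data set below is an arbitrary choice of mine):

```python
from fractions import Fraction
from itertools import combinations

# Sketch: exact check of the sample-survey-sum variance formula by
# enumerating all C(N, n) samples of a small list.
xs = [1, 4, 4, 9, 12]
N, n = len(xs), 3

sample_sums = [sum(xs[i] for i in s) for s in combinations(range(N), n)]
prob = Fraction(1, len(sample_sums))

mean = sum(prob * s for s in sample_sums)
variance = sum(prob * s * s for s in sample_sums) - mean ** 2

xbar = Fraction(sum(xs), N)
formula = (Fraction(n * (N - n), N * (N - 1))
           * (sum(x * x for x in xs) - N * xbar ** 2))
print(mean, variance, formula)  # mean = n * xbar; variance equals formula
```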

Example 5. The Wilcoxon distribution. If $W$ is a random variable with the $Wilcoxon(n, N)$ distribution, then it is the sum of a simple random sample of size $n$ taken from the integers $1, 2, \cdots, N$ without replacement. We obtained in section 3.1 that $E(W) = n(N+1)/2$. Recall from freshman calculus courses that
$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2} \quad\text{and}\quad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} .$$
We now apply the formula for variance obtained in example 4, where we take $x_i = i$ for $1 \le i \le N$, to obtain
$$Var(W) = n \frac{N-n}{N} \frac{1}{N-1} \left\{ \frac{N(N+1)(2N+1)}{6} - N \frac{(N+1)^2}{4} \right\} .$$
After a little algebra, we arrive at
$$Var(W) = \frac{1}{12}\, n(N-n)(N+1) .$$
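An exact enumeration check of $Var(W) = n(N-n)(N+1)/12$ ($N = 7$, $n = 3$ are arbitrary choices of mine):

```python
from fractions import Fraction
from itertools import combinations

# Sketch: enumerate every size-3 sample from {1, ..., 7} without
# replacement and compare with the closed-form Wilcoxon mean and variance.
N, n = 7, 3

sample_sums = [sum(s) for s in combinations(range(1, N + 1), n)]
prob = Fraction(1, len(sample_sums))

mean = sum(prob * s for s in sample_sums)
variance = sum(prob * s * s for s in sample_sums) - mean ** 2

print(mean, variance)  # 12 and 8, i.e. n(N+1)/2 and n(N-n)(N+1)/12
```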
Example 6. The hypergeometric distribution. Let $H$ denote the number of red balls in a simple random sample of size $n$ selected without replacement from an urn that contains $N$ balls, of which $R$ are red. The problem is to find the formula for $Var(H)$. We determined in section 3.1 that $E(H) = n\frac{R}{N}$. We now apply the result of example 4, where we let $R$ of the $x_i$'s equal $1$ and the remaining $N - R$ of them equal $0$. Then $\sum_{j=1}^{N} x_j = \sum_{j=1}^{N} x_j^2 = R$ and
$$Var(H) = n \frac{N-n}{N} \frac{1}{N-1} \left\{ R - N\frac{R^2}{N^2} \right\} .$$
Again, after a little algebra, we obtain the formula
$$Var(H) = n \frac{N-n}{N^2(N-1)}\, R(N-R) .$$
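The hypergeometric variance formula can be confirmed exactly from the density $P([H = k]) = \binom{R}{k}\binom{N-R}{n-k}/\binom{N}{n}$ ($N = 10$, $R = 4$, $n = 3$ are arbitrary choices of mine):

```python
from fractions import Fraction
from math import comb

# Sketch: exact check of Var(H) = n(N - n) R (N - R) / (N^2 (N - 1)).
N, R, n = 10, 4, 3
pmf = [Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))
       for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2
formula = Fraction(n * (N - n) * R * (N - R), N * N * (N - 1))

print(mean, variance, formula)  # 6/5 14/25 14/25
```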
Example 7. The uniform $[0, 1]$ distribution. Let $X$ be a random variable that is uniformly distributed over $[0, 1]$, i.e.,
$$f_X(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{if } x < 0 \text{ or } x > 1. \end{cases}$$
Then
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x\,dx = \frac{1}{2} .$$
Also,
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^1 x^2\,dx = \frac{1}{3} .$$
Thus
$$Var(X) = E(X^2) - (E(X))^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} .$$
Example 8. The exponential distribution. This absolutely continuous distribution function has a density given by
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0, \end{cases}$$
where $\lambda > 0$ is a constant. By example 8 in section 3.1, $E(X) = 1/\lambda$. Now
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\,dx
= \frac{1}{\lambda^2} \int_0^{\infty} y^2 e^{-y}\,dy = \frac{2}{\lambda^2} .$$
Thus,
$$Var(X) = \frac{2}{\lambda^2} - \left( \frac{1}{\lambda} \right)^2 = \frac{1}{\lambda^2} .$$
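A quick Riemann-sum check of $E(X) = 1/\lambda$ and $Var(X) = 1/\lambda^2$ ($\lambda = 2$ is an arbitrary choice of mine):

```python
import math

# Sketch: numerical check of the exponential mean and variance for lam = 2.
lam = 2.0
h, upper = 1e-4, 30.0

mean = 0.0
second = 0.0
for k in range(1, int(upper / h)):
    x = k * h
    w = lam * math.exp(-lam * x) * h   # density times step width
    mean += x * w
    second += x * x * w
variance = second - mean ** 2

print(mean, variance)  # approximately 0.5 and 0.25
```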

EXERCISES

1. Let $X$ be a random variable with discrete density $f_X(k) = \frac{1}{3}$ for $k = 1, 2$, and $3$. Define $h(x) = E((X - x)^2)$.
(i) Express $h$ in the form $h(x) = ax^2 + bx + c$, and solve for $a, b, c$.
(ii) Compute $E(X)$ and $Var(X)$.
(iii) Find the minimum value of $h(x)$, and find the value of $x$ at which this minimum value is achieved.
2. Prove: if $X$ is a random variable with finite second moment, then the smallest value that the function $h(x) = E((X - x)^2)$ can achieve is $h(E(X)) = Var(X)$.
3. Prove: if $X$ is a random variable with finite variance, then $-\sqrt{E(X^2)} \le E(X) \le \sqrt{E(X^2)}$.
4. Compute the variance of a random variable $Z$ that has the uniform distribution over $[0, 1]$.
5. Compute the variance of a random variable $W$ with an absolutely continuous distribution with density
$$f_W(w) = \begin{cases} \frac{1}{6} & \text{if } 14 \le w \le 20 \\ 0 & \text{otherwise.} \end{cases}$$
6. Let $U$ and $V$ be random variables with absolutely continuous distributions, and suppose that $f_U(t) = f_V(-t)$ for all real $t$. Prove that $E(U) = -E(V)$ and $Var(U) = Var(V)$.
7. Find the expectation and the variance of a random variable $Z$ that has an absolutely continuous distribution with density given by
$$f_Z(z) = \begin{cases} \frac{1}{2} z^2 e^{-z} & \text{if } z \ge 0 \\ 0 & \text{if } z < 0. \end{cases}$$
8. Find the expectation and variance of a random variable $W$, the sum of the upturned faces resulting from tossing a pair of fair dice.
9. Prove that if $X$ is a non-constant random variable whose expectation exists, then $P([X < E(X)]) > 0$.
