
LESSON 4: MEAN AND VARIANCE OF A DISCRETE RANDOM VARIABLE

 Recall Lesson 2 on this topic (the probability distribution of discrete random variables).
The table below lists the distribution of the number of heads in a toss of three fair coins
(or three independent tosses of one fair coin). The third column is the product of the
entries of the first and second columns, (X)P(X).

X = number of heads   P(X)   (X)P(X)
0                     1/8    0
1                     3/8    3/8
2                     3/8    6/8
3                     1/8    3/8
Total                 1      12/8 = 1.5

Definition: Given a discrete random variable X, the mean, denoted by µ, is the sum of the products
formed by multiplying the possible values of X with their corresponding probabilities. It is
also called the expected value of X, and is given the symbol E(X).
More formally:

µ = E(X) = Σᵢ i P(X = i)

where the sum is over all possible values i of X.
 Recall that empirical probabilities tend toward theoretical probabilities and, in
consequence, the mean is also a long-run average. This can be observed from the results
of the activity: as the number of trials of a statistical experiment increases, the empirical
average gets closer and closer to the theoretical average. This is why we can interpret
the mean as a long-run average.
If three fair coins are tossed, there are eight equally likely outcomes: HHH, HHT, HTH,
HTT, THH, THT, TTH, TTT. If we were to repeat tossing these coins 8,000 times, we would
expect about 1,000 occurrences of each outcome and, thus, the expected frequency of
3 heads would be 1,000 tosses; of 2 heads, 3,000 tosses; of 1 head, 3,000 tosses; and of
no heads, 1,000 tosses. Averaging these, we would have

[(1000)(3) + (3000)(2) + (3000)(1) + (1000)(0)] / 8000 = 1.5
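This long-run behavior can also be illustrated with a small simulation (a sketch only; the function name and repetition counts are arbitrary choices, not part of the lesson):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def heads_in_three_tosses():
    """Simulate three independent fair-coin tosses and count the heads."""
    return sum(random.randint(0, 1) for _ in range(3))

# As the number of repetitions grows, the empirical average of the
# number of heads approaches the theoretical mean of 1.5.
for n in [100, 10_000, 1_000_000]:
    avg = sum(heads_in_three_tosses() for _ in range(n)) / n
    print(n, round(avg, 3))
```

Running this, the averages for the larger repetition counts cluster tightly around 1.5.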
Although the mean is called the "expected value", this should not be interpreted as the
actual result expected when we do the experiment. In the example of tossing three coins,
the mean is 1.5, yet when you toss three coins you cannot get 1.5 heads. This indicates
that the mean is not necessarily a possible value of the random variable. So you cannot
simply say that the mean is the number of heads you expect when you toss three coins.
Rather, it is to be interpreted as a long-run average: the mean is the value that we expect
the long-run average to approach, not the value of the random variable X that we expect
to observe in any single trial.
Next, recall that the average of a given set of data is a measure of central tendency. The
expected value, being an average, measures the center of the distribution of the possible
values of X.
The mean of a (discrete) random variable X can be given a physical interpretation.
Suppose we imagine the x-axis as a see-saw extending infinitely in each direction, and at
each possible value of X we place a weight equal to the corresponding probability. Then
the mean is the point at which the see-saw balances. In other words, it is the center of
gravity of the system.

 Examples of Finding and Interpreting the Mean and Variance


I. Example with a biased die
 Recall the biased six-sided die with the following probability distribution for X, the
number of spots on the upward face when the die is rolled:

i        1         2         3         4         5         6
P(X=i)   (1-θ)/6   (1-θ)/6   (1-θ)/6   (1+θ)/6   (1+θ)/6   (1+θ)/6

The expected value of the distribution may be calculated as follows:


Solution:

µ = E(X) = Σᵢ i P(X = i)
= 1(1-θ)/6 + 2(1-θ)/6 + 3(1-θ)/6 + 4(1+θ)/6 + 5(1+θ)/6 + 6(1+θ)/6
= [6(1-θ) + 15(1+θ)]/6 = (21 + 9θ)/6 = (7 + 3θ)/2
If θ = 0, then this reduces to a fair die, for which we would have a long-run average of 3.5
for the number of spots on the upward face.
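As a numerical check, the mean can be evaluated directly from the probability table above; θ = 0.2 below is an arbitrary illustrative value, not one from the lesson:

```python
# Mean of the biased die, computed directly from its distribution.
theta = 0.2  # illustrative bias value
probs = [(1 - theta) / 6] * 3 + [(1 + theta) / 6] * 3  # faces 1-3, then 4-6
mean = sum(i * p for i, p in zip(range(1, 7), probs))
print(round(mean, 6))                 # direct computation
print(round((7 + 3 * theta) / 2, 6))  # closed form from the table
```

Both lines print the same value, confirming the closed-form expression for this θ.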
II. Practical Example Used in Insurance
An insurance company sells a life insurance policy of Php500,000 for a premium (payment)
of Php10,400 per year. Actuarial tables show that the probability of a normal death in the
year following the purchase of this policy is 0.1%. What is the expected "gain" for this life
insurance policy?
There are two simple events here: either the customer will live through the year or will die
(a normal death). The probability of normal death, as given in the problem, is 0.001, which
would yield a negative gain to the insurance company of Php10,400 - Php500,000 =
-Php489,600. The probability that the customer will live is 1 - 0.001 = 0.999. Thus, the
insurance company's expected gain X from this insurance policy in the year after the
purchase has the following probability distribution:

Gain       Outcome        Probability
10,400     Lives          0.999
-489,600   Normal death   0.001
Solution:
µ = (10,400)(0.999) + (-489,600)(0.001) = 9,900
 Take note that if the insurance company were to sell a very large number N of these
Php500,000 insurance policies, with a long-run average profit of Php9,900 per policy, the
company would expect to make a total profit of N times Php9,900.
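The expected gain can be checked with a short computation:

```python
# Expected gain per policy: the weighted average of the two outcomes.
gains = {10_400: 0.999, -489_600: 0.001}  # gain -> probability
mean = sum(g * p for g, p in gains.items())
print(round(mean, 2))  # 9900.0
```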
Next, consider whether a measure of central tendency is the only relevant summary
measure. You should remember that for a set of data you also need other summary
measures, such as measures of variability. You have already met the concepts of variance
and standard deviation when summarizing data: the variance and standard deviation of a
set of data are measures of spread. Random variables also have a variance (and a standard
deviation). The variance is obtained by taking the expected value of (X - µ)², where µ is the
mean.
To illustrate, go back to the table on flipping three coins, where X is the number of heads.
Now add the two columns shown at the right below.

X = number of heads   P(X)   (X)P(X)      (X - µ)²            (X - µ)²P(X)
0                     1/8    0            (0 - 1.5)² = 2.25   0.28125
1                     3/8    3/8          (1 - 1.5)² = 0.25   0.09375
2                     3/8    6/8          (2 - 1.5)² = 0.25   0.09375
3                     1/8    3/8          (3 - 1.5)² = 2.25   0.28125
Total                        12/8 = 1.5                       0.75

The total of the last column is called the variance of the random variable, and its square
root, 0.866, is the standard deviation.
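The table arithmetic can be reproduced as a short Python sketch:

```python
# Mean, variance, and standard deviation of X = number of heads
# in three tosses of a fair coin, computed from its distribution.
dist = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}  # value -> probability
mu = sum(x * p for x, p in dist.items())
var = sum((x - mu) ** 2 * p for x, p in dist.items())
sd = var ** 0.5
print(mu, var, round(sd, 3))  # 1.5 0.75 0.866
```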
We now define the variance of a random variable as the weighted average of the squared
deviations of the values of X from the mean, where the weights are the respective
probabilities. The variance, usually denoted by the symbol σ², is also written Var(X) and is
formally defined as

σ² = Var(X) = E(X - µ)² = Σᵢ (i - µ)² P(X = i)
The variance gives a measure of how far the values of X are from the mean. In nontrivial
cases (i.e., when there is more than one possible distinct value of X), the variance is a
positive value. The bigger the variance, the farther the values of X can get from the mean.
The standard deviation is defined as the square root of the variance of X. That is,

σ = √Var(X)
III. Example of Gains in Life Insurance
The following table shows the "deviations" formed by subtracting the mean (Php9,900)
from the gains, as well as the squared deviations and the weighted squared deviations.

Gain       Probability   Deviation   Squared Deviation   Weighted Squared Deviation
10,400     0.999         500         250,000             249,750
-489,600   0.001         -499,500    249,500,250,000     249,500,250

Solution:
The variance is the sum of the entries in the last column, i.e.,
σ² = 249,750 + 249,500,250 = 249,750,000
while the standard deviation is the square root of the variance,
σ ≈ 15,803.48
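The variance can be recomputed directly from its definition as a check:

```python
# Variance of the insurance gain, from the definition
# Var(X) = sum over outcomes of (x - mu)^2 * P(x).
gains = {10_400: 0.999, -489_600: 0.001}  # gain -> probability
mu = sum(g * p for g, p in gains.items())
var = sum((g - mu) ** 2 * p for g, p in gains.items())
print(round(mu), round(var), round(var ** 0.5, 2))
```

This gives a mean of 9,900, a variance of 249,750,000, and a standard deviation of about Php15,803.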
Remember that the standard deviation is the more understandable of the two measures of
spread, since the standard deviation is in the same units as X. For example, if X is a random
variable representing the number of heads in three tosses of a fair coin, then the unit of the
standard deviation is "heads", while the variance is in "square heads" (heads²).
Unlike the mean, there is no simple interpretation of the variance or standard deviation.
In relative terms, however,

 a small standard deviation (and variance) means that the distribution of the random
variable is quite concentrated around the mean
 a large standard deviation (and variance) means that the distribution is rather
spread out, with some chance of observing values at some distance from the mean
In practice, the variance is usually not computed from the definition, but rather with the
following result:

σ² = Var(X) = E(X²) - µ²

Thus, the variance is the difference between the expected value of X² and the square of the
mean.

Note: This can be derived from the definition, using some algebraic expansion of a
binomial expression and some properties of expected values (such as that the mean of a
constant is the constant):

σ² = Var(X) = E(X - µ)²
= E(X² - 2µX + µ²) = E(X²) - 2µE(X) + E(µ²)
= E(X²) - 2µ² + µ² = E(X²) - µ²

The derivation itself need not be dwelt on; it is more helpful to use the computational
formula, and computers, whenever possible.
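A quick numerical confirmation that the computational formula agrees with the definition, using the three-coin distribution from earlier:

```python
# Check that E(X^2) - mu^2 matches the definitional variance
# for the distribution of heads in three fair-coin tosses.
dist = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
mu = sum(x * p for x, p in dist.items())
by_def = sum((x - mu) ** 2 * p for x, p in dist.items())
shortcut = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2
print(by_def, shortcut)  # 0.75 0.75
```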

KEY POINTS:

 The mean (or expected value) of a discrete random variable, say X, is a weighted
average of the possible values of the random variable, where the weights are the
respective probabilities

µ = E(X) = Σᵢ i P(X = i)
 The variance is the expected value of the squared deviations from the mean.

σ² = Var(X) = E(X - µ)² = Σᵢ (i - µ)² P(X = i)

 The standard deviation is the square root of the variance

𝝈 = √𝑽𝒂𝒓(𝑿)

MORE ABOUT MEANS AND VARIANCE


 In the previous lesson (above), we saw that adding or subtracting a constant from data
shifts the mean but does not change the variance and standard deviation. This is also the
case for random variables:
E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)
If a teacher decides to give extra points to everyone in an exam, the exam average increases
by the number of extra points given, but the variability of the new, increased scores stays
the same.
If a company (or the government) decides to double the income of its employees, this would
double the average income and double the standard deviation of incomes, quadrupling the
variance. (The latter is one reason why a government should be careful about doubling
incomes, as this would increase income inequality.)
You may have observed that multiplying or dividing data by a constant changes both the
mean and the standard deviation by the same factor. The variance, being the square of the
standard deviation, is affected even more: by the square of the constant. This is also the
case for random variables.
E(aX) = a E(X)
Var(aX) = a² Var(X)
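These shift and scale rules can be verified numerically on the three-coin distribution; the shift c = 10 and scale a = 2 below are arbitrary illustrative choices:

```python
# Verify E(X + c) = E(X) + c, Var(X + c) = Var(X),
# and E(aX) = a E(X), Var(aX) = a^2 Var(X).
dist = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}  # heads in three fair tosses

def mean_var(d):
    """Return (mean, variance) of a value -> probability mapping."""
    m = sum(x * p for x, p in d.items())
    v = sum((x - m) ** 2 * p for x, p in d.items())
    return m, v

shifted = {x + 10: p for x, p in dist.items()}  # add a constant c = 10
scaled = {2 * x: p for x, p in dist.items()}    # multiply by a = 2
print(mean_var(dist))     # (1.5, 0.75)
print(mean_var(shifted))  # (11.5, 0.75)
print(mean_var(scaled))   # (3.0, 3.0)
```

The shift moves the mean by 10 and leaves the variance at 0.75; the scaling doubles the mean and quadruples the variance.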
Let the random variable X denote the number of heads that occur when a fair coin is
tossed three times. It is natural to view X as the sum of the random variables X1, X2, X3,
where Xi is defined to be 1 if the ith toss comes up heads, and 0 if it comes up tails. The
expected values of the Xi's are extremely easy to compute, and it turns out that the expected
value of X can be obtained by simply adding the expected values of the Xi's.
To make it simple, consider a case of two independent random variables, X and Y. The expected value
of the sum of independent random variables X and Y is the sum of the expected values:
E(X + Y) = E(X) + E(Y)
while the expected value of the difference of X and Y is the difference of the expected values:
E(X - Y) = E(X) - E(Y)
How about the variance? If the random variables are independent, then there is a simple Addition
Rule for variance (for a sum of random variables):
Var(X + Y) = Var(X) + Var(Y)
What about the variance of a difference? Variances also add for a difference of random
variables:
Var(X - Y) = Var(X) + Var(Y)
Variances are added for both the sum and the difference of two independent random
variables because the variation in each variable contributes to the variation in both cases:
the variability of differences increases just as the variability of sums does.
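A concrete check of this addition rule, using two independent fair dice as an illustrative pair of random variables (full enumeration of the 36 equally likely outcomes):

```python
# Enumerate two independent fair dice X and Y and confirm that
# variances add for both the sum X + Y and the difference X - Y.
from itertools import product

faces = range(1, 7)

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

sums = [x + y for x, y in product(faces, faces)]
diffs = [x - y for x, y in product(faces, faces)]
one_die = var(list(faces))  # 35/12, about 2.9167
print(round(var(sums), 4), round(2 * one_die, 4))
print(round(var(diffs), 4), round(2 * one_die, 4))
```

Both the sum and the difference have variance 2 × (35/12), twice that of a single die.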
To illustrate this notion about sums (or differences of random variables),
EXAMPLE 1: Consider a team of four swimmers who perform a 4 × 100 meter medley relay.
The swimmers' performances are independent, with the following means and
standard deviations of the times (in seconds) to finish 100 meters.

Swimmer            Mean    Standard Deviation
1 (freestyle)      45.02   0.20
2 (butterfly)      50.25   0.26
3 (backstroke)     51.45   0.24
4 (breaststroke)   56.38   0.22

Solution:
The mean of the team's total time in the relay is the sum of the means,
45.02 + 50.25 + 51.45 + 56.38 = 203.10 seconds,
with a variance equal to the sum of the variances, i.e.,
0.20² + 0.26² + 0.24² + 0.22² = 0.2136,
so that the standard deviation is √0.2136 ≈ 0.46 seconds. The team's best time of 201.62
seconds is 3.2 standard deviations below the mean, so it would be very unlikely for the team
to swim faster than this best time.
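The relay arithmetic can be reproduced as follows:

```python
# Team total time: means add; for independent swimmers, variances add.
means = [45.02, 50.25, 51.45, 56.38]
sds = [0.20, 0.26, 0.24, 0.22]
total_mean = sum(means)
total_var = sum(s ** 2 for s in sds)
total_sd = total_var ** 0.5
print(round(total_mean, 2), round(total_var, 4), round(total_sd, 2))
# How far below the mean is the best time of 201.62 seconds?
print(round((total_mean - 201.62) / total_sd, 1))  # about 3.2 SDs
```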
EXAMPLE 2: The crucial assumption is the independence of the random variables. Suppose
the amount of money a group spends for lunch is represented by the random
variable X, and the amount of money the same group spends on afternoon snacks
is represented by Y. Then the variance of the sum X + Y is not the sum of the
variances, since X and Y are not independent random variables.
Now consider tossing a fair coin 10 times: how many heads would be expected?
The answer is 5.
Solution:
Define Xi as 1 if the ith toss comes up heads, and 0 if it comes up tails. Assuming in general
that the coin has chance p of yielding heads (with p = 1/2 when the coin is fair), the
probability mass function of each Xi is

x            0       1
P(Xi = x)    1 - p   p

for all values of i = 1 up to 10 (or whatever number of tosses we make).


Here the mean of Xi is
E(Xi) = 0(1 - p) + 1(p) = p,
while the variance of Xi is
Var(Xi) = E(Xi²) - (E(Xi))² = (0²)(1 - p) + (1²)(p) - p² = p(1 - p)
 Recall from the previous topic that for tossing a fair coin (where p = 1/2) three times, the
expected value of the number of heads is 1.5. We can also derive this as
E(X1 + X2 + X3) = E(X1) + E(X2) + E(X3) = (1/2) + (1/2) + (1/2) = 1.5,
while the variance was 0.75, which we can get as
Var(X1 + X2 + X3) = Var(X1) + Var(X2) + Var(X3) = (1/2)(1/2) + (1/2)(1/2) + (1/2)(1/2) = 0.75
For tossing a fair coin ten times, the expected value of the number of heads is
E(X1 + X2 + … + X10) = E(X1) + E(X2) + … + E(X10) = 10(1/2) = 5,
while the variance is
Var(X1 + X2 + … + X10) = Var(X1) + Var(X2) + … + Var(X10) = 10(1/2)(1/2) = 5/2 = 2.5,
and thus the standard deviation is approximately 1.58.
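The same computation as a sketch, with p and n set to the fair-coin values used above:

```python
# Mean and variance of the number of heads in n tosses, built from
# the Bernoulli indicators X_i (each with mean p, variance p(1 - p)).
p, n = 0.5, 10
mean = n * p           # E(X_1) + ... + E(X_n)
var = n * p * (1 - p)  # variances add for independent tosses
print(mean, var, round(var ** 0.5, 2))  # 5.0 2.5 1.58
```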
Using Chebyshev's Inequality, we know that when tossing a fair coin ten times (and
repeating this coin-tossing process many, many times), at least three fourths of the time the
number of heads would fall within about 3 heads (3 ≈ 2 × 1.58, twice the standard deviation)
of the expected value of 5 heads.
In general, when we have a sequence of independent random variables X1, X2, X3, …, Xn
with a common mean µ and a common standard deviation σ, then the sum

S = X1 + X2 + … + Xn

will have an expected value of nµ and a variance of nσ².


EXAMPLE: If we were to toss a fair coin 100 times, then the expected value of the number of heads
obtained is 100(1/2) = 50, while the variance is 100(1/2)(1/2) = 25.
Solution: According to Chebyshev’s Inequality, at least three fourths of the distribution of the
number of heads in 100 tosses of a fair coin is within 50 – 2(5) = 40 heads to 50 + 2(5)
= 60 heads.
For tossing a coin n times where the probability of getting a head is p, if S is the number of heads,
then
E(S) = n(p) while Var(S) = n(p)(1 – p)
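As a sanity check, these formulas can be compared against a direct computation over the binomial probability mass function for the 100-toss example:

```python
# Cross-check E(S) = np and Var(S) = np(1 - p) against the binomial
# probability mass function for n = 100 tosses of a fair coin.
from math import comb

n, p = 100, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(round(mean, 6), round(var, 6))  # 50.0 25.0
```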
REMINDER: It is the variances of independent random variables that add up (not
the standard deviations: variances have squared units, so the intuition
here is the underlying use of the Pythagorean Theorem, where the square
of the hypotenuse is the sum of the squares of the legs). In addition,
variances of independent random variables add even when we are
considering differences between them.
KEY POINTS

 Adding or subtracting a constant from the distribution of a random variable X
shifts the mean E(X) by the constant, but does not change the variance.

E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)
 Multiplying or dividing the distribution of X by a constant changes the mean by a
factor equal to the constant, and the variance by the square of the constant.

E(aX) = a E(X)
Var(aX) = a2 Var(X)
 The expected value of the sum (difference) of independent random variables X
and Y is the sum (difference) of the expected values.

E(X ± Y) = E(X) ± E(Y)


while the variance of the sum (difference) of the random variables is the sum of
the variances.

Var(X ± Y) = Var(X) + Var(Y)


APPLICATION AND ASSESSMENT
Do the following:
1. The probability distributions of four random variables are shown below
For each of these probability distributions:
a. Confirm that the graph represents a probability distribution
b. Guess what the means are.
c. Compute the actual value of the mean.
d. Provide a guess on which has the smallest variance among the distributions, and
the one with the largest variance.
e. Calculate the variance and standard deviation.

2. A Grade 12 student uses the Internet to get information on temperatures in the city where he
intends to go for college. He finds the information in degrees Fahrenheit.
Determine the equivalent summary statistics in the Celsius scale, given °C = (°F - 32)(5/9).

Maximum = 82.4   Range = 23.4               Median = 71.6
Mean = 73.4      Standard Deviation = 7.2   IQR = 10.8

3. Suppose that in a casino, a certain slot machine pays out an average of Php15 per play, with a
standard deviation of Php5,000. Every play of the game costs a gambler Php20.

a. Why is the standard deviation so large?


b. If your parent decides to play with this slot machine 5 times, what are the mean
and standard deviation of the casino’s profit?
c. If gamblers play with this slot machine 1000 times in a day, what are the mean
and standard deviation of the casino’s profit?
