
Advanced Statistics

Discrete Random Variables

Economics
University of Manchester

Discrete Probability Distributions
The axioms (rules) of probability tell us how to combine
and use probabilities; but where do probabilities come from?

We develop MODELS of PROBABILITY: statistical models

Discrete random variables: this session
Continuous random variables: subsequent sessions
Random Variables
Example: Flipping a coin 3 times
The SAMPLE SPACE is
Ω = {(H,H,H), (H,H,T), (H,T,H), (T,H,H), etc...}
Eight possible outcomes
Each EQUALLY LIKELY, with probability of 1/8

But may be interested in the NUMBER OF HEADS

By convention, a random variable is indicated by
an UPPER CASE letter (X, Y, Z and so forth)
A specific outcome is denoted by the corresponding
LOWER CASE letter (x, y, z, etc)
Statistical models
(mathematical functions)

(1) sample outcomes x in the sample space S
(2) X, the random variable of interest
(3) p(x) assigns probabilities to each value x
Discrete random variables I
X = number of HEADS obtained

X(H,H,H) = 3
X(H,H,T)=X(H,T,H)=X(T,H,H) = 2
and so forth . . .
X(T,T,T) = 0

Values taken by X are: x = 0, 1, 2 or 3

Example of a DISCRETE RANDOM VARIABLE

Discrete random variables II
The possible outcomes for a discrete random
variable, X, are isolated values

Often only integer values (…, -2, -1, 0, 1, 2, …)

A specific value is denoted x

Probabilities are assigned to possible x values


p(x) = Pr(X=x), for any value x
e.g., p(1) = Pr(X=1)
p(x) is a mathematical function
p(x) must be bounded between 0 and 1

Probability mass function (pmf)
Function p(x) that assigns probabilities to each
possible (discrete) outcome x is a PROBABILITY
MASS FUNCTION
Or “probability distribution”
Each value p(x) must be a valid probability
[Figure: bar chart of the probability mass function for the number of HEADS in 3 tosses: p(0) = 0.125, p(1) = 0.375, p(2) = 0.375, p(3) = 0.125]
Properties of a pmf
Properties of p(x):

p(x) ≥ 0, for all values of x

Σx p(x) = 1,

where the sum is taken over all possible values of X

p(r) = 0 ⟺ the value r is NOT one of the possible outcomes
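These properties are easy to verify numerically. A minimal Python sketch for the three-coin pmf (the dictionary simply encodes the probabilities 1/8, 3/8, 3/8, 1/8 from the bar chart):

```python
# pmf for X = number of HEADS in 3 fair coin tosses
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Property 1: every p(x) is a valid probability, 0 <= p(x) <= 1
assert all(0 <= p <= 1 for p in pmf.values())

# Property 2: the probabilities sum to 1 over all possible values of X
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# p(r) = 0 for any r outside the support, e.g. r = 5
print(pmf.get(5, 0))  # → 0
```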

Example
A company claims 95% of its light-bulbs will last
longer than 10,000 hours
From a large consignment of light-bulbs, 3 are
randomly chosen
Of these 3 light-bulbs, what is the probability that
only ONE will last longer than 10,000 hours?
Let X = “number of light-bulbs which last longer
than 10,000 hours”
Possible x = 0, 1, 2, 3
Require Pr(X=1).

Example contd.
The probability that any one light bulb lasts longer than
10,000 hours is 0.95
Pr(Failure)=0.05 Pr(Success)=0.95
The three light-bulbs are chosen independently

x = 1 if we observe any of the following mutually
exclusive events:
(F, F, S) or (F, S, F) or (S, F, F)
Pr(F ∩ F ∩ S) = Pr(F) × Pr(F) × Pr(S), by independence
Pr(X=1) = Pr(F ∩ F ∩ S) + Pr(F ∩ S ∩ F) + Pr(S ∩ F ∩ F)
        = 3 × (0.05 × 0.05 × 0.95)
        = 0.007125
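The calculation can be reproduced directly in Python (a minimal sketch; the variable names are illustrative):

```python
p_success = 0.95           # Pr(one bulb lasts longer than 10,000 hours)
p_fail = 1 - p_success     # Pr(one bulb fails to last that long)

# Three mutually exclusive orderings give exactly one success:
# (F,F,S), (F,S,F), (S,F,F), each with probability 0.05 * 0.05 * 0.95
pr_one_success = 3 * (p_fail * p_fail * p_success)
print(round(pr_one_success, 6))  # → 0.007125
```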

Example pmf
Pr(X=1) = 0.007125
Pr(X=2) = 3 x (0.05 x 0.95 x 0.95)
= 0.135375
Pr(X=3) = 0.95 x 0.95 x 0.95 = 0.857375
Pr(X=0) = 0.05 x 0.05 x 0.05 = 0.000125
these FOUR probabilities add to 1, and they define
the pmf
example of a BINOMIAL RANDOM VARIABLE
A binomial random variable is the “number of
successes in n independent experiments”

Tabulating the pmf
If entire pmf is required, often best to present in
tabular form:
x Pr (X = x)
0 0.0001
1 0.0071
2 0.1354
3 0.8574
1.0000

Cumulative distribution
function (cdf)
The cdf is defined as P(x) = Pr(X ≤ x)
Previous example:
Pr(X ≤ 1) = Pr(X=0) + Pr(X=1)
          = 0.000125 + 0.007125
          = 0.00725

Pr(X ≤ 2) = Pr(X=0) + Pr(X=1) + Pr(X=2)
          = 0.142625
          = 1 − Pr(X=3)

Pr(X > 1) = 1 − Pr(X ≤ 1)
          = 0.99275
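The cdf can be built by accumulating the pmf. A sketch using the light-bulb probabilities:

```python
# pmf of X = number of bulbs lasting > 10,000 hours (from the example)
pmf = {0: 0.000125, 1: 0.007125, 2: 0.135375, 3: 0.857375}

def cdf(x):
    """P(x) = Pr(X <= x): sum the pmf over all outcomes k <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(round(cdf(1), 6))      # Pr(X <= 1) → 0.00725
print(round(cdf(2), 6))      # Pr(X <= 2) → 0.142625
print(round(1 - cdf(1), 6))  # Pr(X > 1)  → 0.99275
```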

Mean of a random variable
The mean of a random variable is analogous to
the sample mean
But computed using probabilities of outcomes
Refer to the “mean of X” or “mean of the
distribution” or “population mean”
Notation: the mean of a random variable is μ = E[X],
the EXPECTED VALUE of X
Note upper case X in E[X]
Population mean is value anticipated “on
average”

Mean of a discrete random
variable

E[X] = μ = Σx x Pr(X = x), where the sum is taken over all possible values of x

Example: heads in 3 coin tosses (the BINOMIAL DISTRIBUTION)

E[X] = Σ_{x=0}^{3} x Pr(X = x)
     = (0 × 1/8) + (1 × 3/8) + (2 × 3/8) + (3 × 1/8)
     = (1/8) × (0 + 3 + 6 + 3)
     = 1½

[Figure: pmf of X = number of heads in 3 tosses]
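The same sum is straightforward to compute in Python (a minimal sketch for the coin-toss pmf):

```python
# pmf of X = number of heads in 3 fair coin tosses
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# E[X] = sum over x of x * Pr(X = x)
mean = sum(x * p for x, p in pmf.items())
print(mean)  # → 1.5
```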
Variance of a random variable
Variance of the random variable X, var[X],
characterises the SPREAD of the distribution

Variance of a random variable is a non-negative value

Analogous to the sample variance, but relates to all


possible outcomes

var[X] = σ² is the expected squared distance of X
from the mean μ = E[X]:

var[X] = σ² = E[(X − μ)²] ≥ 0

Variance of a discrete random
variable
var[X] = σ² = Σx (x − μ)² Pr(X = x)

Example: number of heads in 3 coin tosses
E[X] = μ = 1½

var[X] = Σ_{x=0}^{3} (x − 1.5)² p(x)
       = (0 − 1.5)² × 1/8 + (1 − 1.5)² × 3/8
       + (2 − 1.5)² × 3/8 + (3 − 1.5)² × 1/8
       = 3/4
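A sketch of the same calculation, reusing the coin-toss pmf:

```python
# pmf of X = number of heads in 3 fair coin tosses
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu = sum(x * p for x, p in pmf.items())               # E[X] = 1.5
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # expected squared distance
print(var)  # → 0.75
```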
Standard Deviation
Standard deviation defined as:
 =+  2

Also a measure of spread


Always positive
Same units of measurement as original
variable
easier to interpret than variance
 = (3/4)=0.866 for coin tossing example

Calculating the variance

Var[X] can be calculated as:

Var[X] = E[(X − μ)²]
       = E[X²] − μ²
       = E[X²] − (E[X])² ≥ 0
This identity will be shown later

Often easier to compute the variance using the latter form

Calculating variance: example
Number of heads in 3 coin tosses
 x      p(x)    x·p(x)   x²·p(x)
 0      0.125   0.000    0.000
 1      0.375   0.375    0.375
 2      0.375   0.750    1.500
 3      0.125   0.375    1.125
Total   1.000   1.500    3.000

Var[X] = E[X²] − μ²
       = 3.0 − (1.5)² = 0.75, as before
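The shortcut can be checked against the table (a minimal sketch):

```python
# pmf of X = number of heads in 3 fair coin tosses
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

e_x  = sum(x * p for x, p in pmf.items())     # E[X]   = 1.5
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 3.0
var = e_x2 - e_x**2                           # Var[X] = E[X^2] - mu^2
print(var)  # → 0.75
```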
Bernoulli distribution
A single experiment, outcomes Y=1 (success) &
Y=0 (failure)
probability of “success” π & “failure” (1 – π)
For example: Toss a coin once
Success = head; π = 0.5
pmf can be written: p(y) = π^y (1 − π)^(1−y)

E[Y] = Σ_{y=0}^{1} y p(y)
     = 1 × π + 0 × (1 − π) = π
Bernoulli distribution: variance
var[Y] = σ² = E[(Y − π)²]
       = Σ_{y=0}^{1} (y − π)² p(y)
       = (0 − π)² × (1 − π) + (1 − π)² × π
       = (1 − π) × [π² + π(1 − π)]
       = π(1 − π)

Variance is smaller as π → 1 or π → 0
Largest variance when π = 0.5
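The behaviour of the variance as π varies can be seen with a short sketch (the function name is illustrative):

```python
def bernoulli_mean_var(pi):
    """Mean pi and variance pi * (1 - pi) of a Bernoulli(pi) variable."""
    return pi, pi * (1 - pi)

# Variance shrinks as pi -> 0 or pi -> 1, and peaks at pi = 0.5
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    mean, var = bernoulli_mean_var(pi)
    print(pi, mean, round(var, 2))
```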

Binomial distribution
Perform a Bernoulli experiment n times and record
X = TOTAL NUMBER of SUCCESSES

The probability of success in any one experiment is π. Then in n experiments:

p(x) = Pr(X = x) = (n choose x) × π^x × (1 − π)^(n−x);  0 ≤ π ≤ 1;  x = 0, 1, ..., n

where

(n choose x) = n! / (x! (n − x)!),  with n! = n × (n − 1) × (n − 2) × ... × 2 × 1 and 0! ≡ 1
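The formula can be implemented directly with Python's `math.comb`. Applied to the earlier light-bulb example (n = 3, π = 0.95), it reproduces the tabulated pmf (a minimal sketch):

```python
from math import comb

def binom_pmf(x, n, pi):
    """Pr(X = x) = C(n, x) * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

# Light-bulb example: n = 3 bulbs, success probability pi = 0.95
for x in range(4):
    print(x, round(binom_pmf(x, 3, 0.95), 6))
```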
Binomial distribution:
mean & variance
A binomial variable X has
E[X] = nπ
var[X] = nπ(1 − π)

For example, with n = 5 & π = 0.2:
E[X] = nπ = 5 × 0.2 = 1.0
var[X] = nπ(1 − π) = 5 × 0.2 × 0.8 = 0.8

Mean & variance depend on the parameters n & π
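Both formulas can be cross-checked against the definitions of the mean and variance, by rebuilding the pmf with `math.comb` and summing directly (a quick numerical sketch):

```python
from math import comb

n, pi = 5, 0.2
# Binomial pmf: Pr(X = x) = C(n, x) * pi**x * (1 - pi)**(n - x)
pmf = {x: comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)}

mean = sum(x * p for x, p in pmf.items())               # should equal n*pi
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # should equal n*pi*(1-pi)

print(round(mean, 6))  # → 1.0
print(round(var, 6))   # → 0.8
```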

