DiscreteDistributions SAKAI

Mathematical Statistics 1
STAT 1003
School of Statistics and Actuarial Science
Wits University
E-mail: Herbert.Hove@wits.ac.za
DISCRETE RANDOM VARIABLES AND PROBABILITY
DISTRIBUTIONS LECTURE SERIES
Devore & Berk: Pages 94 – 137
1.0 Introduction
Up to this point, we have seen how to relate events to sets and how to calculate probabilities
by working with the sets that represent them. Often, we are NOT interested in the
particular outcome of an experiment that occurs, but rather in some number associated
with that outcome.
Illustration:
Suppose one plays some game which involves tossing a fair coin 3 times. For each
“heads”, they win $1 and for each “tails”, they lose $1. The player is only interested in
their net winnings AND NOT in the particular sequence of heads and tails that constitute
the result of 3 tosses.
It is important to distinguish between random variables and the values they take.
• Notationally, we use upper case letters such as X, Y, … for random variables and
lower case letters such as x, y, … for the actual values that the random variables
may assume.
The expression {X = x} is the set of all points in S assigned the value x by the random
variable X.
• It is therefore meaningful to talk about the probability that X takes on the value x,
denoted by P(X = x).
Definition: A random variable (rv, henceforth), say X, is a real-valued function whose
domain is the sample space and whose range is a subset of the real numbers.
Remark: Random variables are defined in such a way that if the experiment is
performed, then one and only one of the events, say “X = x”, occurs.
• That is, the events {X = x}, taken over the possible values x of X, form a partition of
the points in S for the experiment under consideration.
In the game involving the experiment of tossing a fair coin 3 times, let
X = “net winnings”. Each of the outcomes “X = x” represents an event and hence a
real number can be associated with each point in S.
1.1 Discrete Random Variables

A rv X is said to be discrete if it can assume only a finite number or a countably infinite
number of possible distinct values.
Definition: The probability that X takes on the value x, denoted P(X = x), is
defined as the sum of the probabilities of all sample points in S that are assigned the
value x. Often, P(X = x) is denoted by p(x).
p(x) is a function that assigns a probability to each value x of the rv X, and is often
called the probability mass function for X.
Definition: If X is a discrete rv, then the function
p(x) = P(X = x)
is called the probability mass function (pmf) of X.
The pmf can be presented by a table, a graph (line or bar graph) or a formula that
provides p(x) = P(X = x) for all x.
Computing a pmf:
Calculating the pmf for a rv X is conceptually straightforward. For each possible value x of
X:
• Collect all possible outcomes for which X is equal to x
• Add their probabilities to obtain p(x)
• Repeat for all x
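The three steps can be sketched in code for the coin-tossing game of Section 1.0 (3 tosses of a fair coin, win $1 per head, lose $1 per tail). This is only an illustrative sketch: the function names `net_winnings` and `pmf_by_enumeration` are made up for this example, and the approach assumes all outcomes in the sample space are equally likely.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def net_winnings(outcome):
    """Win $1 for each head ('H') and lose $1 for each tail ('T')."""
    return sum(1 if toss == "H" else -1 for toss in outcome)


def pmf_by_enumeration(outcomes, rv):
    """Add the probabilities of equally likely outcomes that share a value of rv."""
    prob = Fraction(1, len(outcomes))
    pmf = defaultdict(Fraction)  # Fraction() is zero
    for outcome in outcomes:
        pmf[rv(outcome)] += prob
    return dict(pmf)


# Sample space: all 8 sequences of H/T in 3 tosses
sample_space = list(product("HT", repeat=3))
pmf = pmf_by_enumeration(sample_space, net_winnings)
for x in sorted(pmf):
    print(x, pmf[x])
```

The net winnings can only be -3, -1, 1 or 3, with probabilities 1/8, 3/8, 3/8 and 1/8, which sum to 1 as a pmf must.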
Exercise:
A local movie store periodically puts aside its used movies and offers to sell them at a
reduced price. Twelve copies of a popular movie have just been put aside, but 3 of these
are damaged. A customer randomly selects two of the copies for gifts. Let X be the
number of damaged movies the customer selected. Find the pmf for X and graph the
function.
1.2 The Cumulative Distribution Function
Sometimes, we study the behaviour of rvs by looking at the cumulative probabilities.

That is, for any rv X, the cumulative probability for X evaluated at x is defined as:
\[ F_X(x) = P(X \le x) \]
If X is discrete, the cumulative distribution function is
\[ F_X(x) = \sum_{\text{all } x_i \le x} p(x_i) \]
where p(x) is the pmf.
1.21 Connection Between F(x) and p(x)

Let X be a discrete rv whose possible values are $x_1, x_2, \ldots$ where
$x_1 < x_2 < \ldots$. For any real value x, if $x < x_1$, then $F_X(x) = 0$, and for $x \ge x_1$,
\[ F_X(x) = \sum_{x_i \le x} p(x_i) \]
so that
\[ p(x_i) = F_X(x_i) - F_X(x_{i-1}), \quad i = 2, 3, \ldots \]
with, for completeness, $p(x_1) = F_X(x_1)$.
Illustration:- The rv X denoting the number of damaged movies the customer selected
from the movie store, as defined earlier, has pmf
x       0       1       2
p(x)    12/22   9/22    1/22
Since positive probability is assigned only to x = 0, 1 and 2, the cumulative
distribution function changes value only at those points.
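As a small sketch of how the CDF is built from this pmf, the code below sums p(x_i) over all support points not exceeding x. The helper name `cdf` is chosen for this example only; exact fractions are used so that the jumps can be compared with the pmf values directly.

```python
from fractions import Fraction

# pmf of X = number of damaged movies selected (values from the illustration)
pmf = {0: Fraction(12, 22), 1: Fraction(9, 22), 2: Fraction(1, 22)}


def cdf(x, pmf):
    """F_X(x) = sum of p(x_i) over all support points x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


# The CDF is a step function: flat between support points, jumping at 0, 1 and 2
for x in (-1, 0, 0.5, 1, 2, 3):
    print(x, cdf(x, pmf))
```

The size of the jump at each support point recovers the pmf, for example F(1) - F(0) = 21/22 - 12/22 = 9/22 = p(1).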
Exercise:-
Compute
a) P  X  1 b) P  X  1.2  c) P  X  1.9  d) P 1.3  X  2.6 
1.22 Properties of Discrete CDF FX
Every distribution function must satisfy the following 3 properties and similarly, any
function satisfying these 3 properties is a distribution function.
1. In the limiting cases,
\[ \lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to \infty} F_X(x) = 1 \]
2. The function $F_X(\cdot)$ is continuous from the right,
\[ \lim_{h \to 0^+} F_X(x + h) = F_X(x) \]
but is not continuous on $\mathbb{R}$.
3. $F_X(\cdot)$ is a nondecreasing function. That is, if $x_1 < x_2$, then $F_X(x_1) \le F_X(x_2)$.
1.3 Expected Values of Discrete Random Variables

The “Frequentist” interpretation of probabilities assumes that if an infinite sequence of
independent replications of an experiment is performed, then for an event E, the
proportion of the time that E occurs will be P(E). This provides a motivation for the
definition of the “Expectation”.
Illustration: You and a friend are matching balanced coins. You each flip a coin. If the
upper faces match, you win $1 (your friend loses $1); if they don’t, you lose $1 (your
friend wins $1). On average, how much will you win per game over the long run?
Definition: Let X be a discrete rv with set of possible values D and pmf p(x). Then the
expected value of X, denoted by E(X), is
\[ E(X) = \sum_{x \in D} x\,p(x) \]
assuming the expectation exists. Note that E(X) need not be one of the possible values of X.
If p(x) is an accurate characterisation of the population frequency distribution, then
$E(X) = \mu$, the population mean.
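The definition can be tried out on the coin-matching illustration, where the faces match with probability 1/2. The sketch below is illustrative only; the function name `expected_value` is made up here, and the fair die is an added example (not from the notes) chosen to show that the mean need not be a possible value of X.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(X) = sum of x * p(x) over the possible values x."""
    return sum(x * p for x, p in pmf.items())


# Matching game: win $1 if the faces match (probability 1/2), otherwise lose $1
matching_game = {1: Fraction(1, 2), -1: Fraction(1, 2)}
print(expected_value(matching_game))  # 0

# Assumed extra example: a fair die has mean 7/2, which is not a possible value
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(fair_die))  # 7/2
```

In the long run the matching game is fair: the average winnings per game are $0, even though a single game never pays $0.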
Exercise:
There are two questions in a quiz show. A contestant gets to choose the order in which to answer
them. If the contestant tries question 1 first, s/he will be allowed to proceed to question 2
only if her/his answer to question 1 is correct, and vice versa. If s/he is p1% certain of
answering question 1, worth $V1, correctly and p2% certain of answering question
2, worth $V2, correctly, which question should s/he choose first?
1.31 Expected Value of a Function:

Suppose that X is a discrete rv with pmf p(x) and that Y is another rv that is a known
function of X. That is, Y = g(X) for some function $g(\cdot)$. What is the expectation of Y?
Since Y = g(X) is a function of a discrete rv, it is itself a discrete rv. The pmf of g(X) can
be determined from the pmf of X, and E[g(X)] can be computed by using the definition of
the expected value.
Proposition: Let X be a rv with pmf p(x) and let g(X) be a function of X. Then the
expected value of the rv g(X) is given by
\[ E[g(X)] = \sum_i g(x_i)\,p(x_i) \]
assuming the expectation exists.
Often, the function g(X) is a linear function aX + b. Hence a simple logical consequence
of the above Proposition is the following Corollary.
Corollary: If a and b are constants, then
\[ E(aX + b) = aE(X) + b \]
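A quick numerical sketch of the Proposition and the Corollary, using the net-winnings pmf of the 3-toss coin game (values -3, -1, 1, 3 with probabilities 1/8, 3/8, 3/8, 1/8). The helper names `expect` and `pmf_of`, the choice g(x) = x², and the constants a = 2, b = 5 are all assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {-3: Fraction(1, 8), -1: Fraction(3, 8), 1: Fraction(3, 8), 3: Fraction(1, 8)}


def expect(g, pmf):
    """E[g(X)] = sum over i of g(x_i) * p(x_i)."""
    return sum(g(x) * p for x, p in pmf.items())


def pmf_of(g, pmf):
    """pmf of Y = g(X): pool the probability of every x mapped to the same y."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)


def square(x):
    return x * x


# Two routes to E[X^2]: the Proposition, and the definition applied to the pmf of Y
direct = expect(square, pmf_x)
via_pmf_y = expect(lambda y: y, pmf_of(square, pmf_x))

# Corollary: E(aX + b) = a E(X) + b
a, b = 2, 5
lhs = expect(lambda x: a * x + b, pmf_x)
rhs = a * expect(lambda x: x, pmf_x) + b
print(direct, via_pmf_y, lhs, rhs)
```

Both routes give E[X²] = 3, which shows that the pmf of Y never has to be worked out explicitly when only E[g(X)] is needed.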
1.4 Variance
The expected value of a rv gives a measure of the centre of its distribution. It does not
give information about the spread (or variation) of the possible values of the rv.
Formally, variation is often measured by the variance and a related quantity called the
standard deviation.
Definition: If X is a rv with mean (or expected value) $\mu$, then the variance of X is
given by:
\[ Var(X) = E[(X - \mu)^2] = \sigma^2 \]
• Variance can be thought of as the average squared distance between values of X
and the expected value $\mu$
• The units associated with variance are therefore the square of the units of
measurement for X
• The variance assumes a value of zero (the smallest possible value) when all the
probability is concentrated at a single point. That is, when the rv, say X, takes on a
constant value with probability 1. The variance becomes larger as points of
positive probability spread out more
A computationally convenient alternative formula for the variance of a rv X is
\[ Var(X) = E(X^2) - [E(X)]^2 \]
Proposition: For any rv X and constants a and b,
\[ Var(aX + b) = a^2\,Var(X) \]
Standard Deviation: We may prefer to measure spread of X in the same units as X. The
standard deviation is a measure of variation that maintains the original units of measure.
Definition: The standard deviation of a rv X is the square root of the variance. It is given
by
\[ \sigma = \sqrt{\sigma^2} = \sqrt{E[(X - \mu)^2]} \]
• Standard deviation can be thought of as the size of a “typical” deviation between
an observed outcome and the expected value
• Together with the mean, the standard deviation often yields a useful summary of
the probability distribution of a rv
• Rule of thumb:- For many distributions, most of the probability mass lies within two
standard deviations of the mean
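The variance formulas can be checked numerically on the damaged-movies pmf from Section 1.21 (p(0) = 12/22, p(1) = 9/22, p(2) = 1/22). This is a sketch under assumed names: `expect` is a helper written for this example, and a = 3, b = 7 are arbitrary constants.

```python
from fractions import Fraction
from math import sqrt

pmf = {0: Fraction(12, 22), 1: Fraction(9, 22), 2: Fraction(1, 22)}


def expect(g, pmf):
    """E[g(X)] = sum of g(x) * p(x) over the possible values x."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expect(lambda x: x, pmf)

# Variance by the definition and by the shortcut formula
var_definition = expect(lambda x: (x - mu) ** 2, pmf)
var_shortcut = expect(lambda x: x * x, pmf) - mu ** 2

# Standard deviation, in the same units as X
sd = sqrt(var_shortcut)

# Variance of aX + b, computed from the definition with mean a*mu + b
a, b = 3, 7
var_linear = expect(lambda x: (a * x + b - (a * mu + b)) ** 2, pmf)

print(mu, var_definition, var_shortcut, round(sd, 4), var_linear)
```

The definition and the shortcut agree (both give 15/44), and the variance of 3X + 7 is 9 times the variance of X: the shift b has no effect on spread.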
Discussion problem:- An oil exploration firm is to drill 10 wells, each well having
probability 0.1 of successfully producing oil. It costs the firm R10 million to drill each
well. A successful well will bring in R500 million. Calculate:
a) The firm’s expected gain from the 10 wells Answer: R400 million
b) The standard deviation of the firm’s gain Answer: R474.34 million
2.0 Discrete Probability Distributions

2.1 Introduction
We now have a good understanding of discrete rvs and their properties. Some random
experiments occur often and their probability distributions have been expressed in a
general form. In this lecture series, we cover a few of the standard families of discrete rvs
which we may encounter when analysing discrete data. They are:
• The Discrete uniform distribution
• Bernoulli distribution
• Binomial distribution
• Hypergeometric distribution
• Geometric distribution
• Negative binomial distribution
Discrete distributions play important roles in statistics and our aim is to be able to apply
these simple probability models to real-world problems.
Useful Series in Probability

Many series crop up in the study of probability. Below is a list of a few of the more
commonly encountered series.
i) $\sum_{i=1}^{n} a^{i-1} = \dfrac{1 - a^n}{1 - a}$, $0 < a < 1$
ii) $\sum_{i=1}^{\infty} i\,a^{i-1} = \dfrac{1}{(1 - a)^2}$, $0 < a < 1$
iii) $\sum_{i=1}^{\infty} i^2 a^{i-1} = \dfrac{1 + a}{(1 - a)^3}$, $0 < a < 1$
iv) $\sum_{i=1}^{n} i = \dfrac{n(n + 1)}{2}$
v) $\sum_{i=1}^{n} i^2 = \dfrac{n(n + 1)(2n + 1)}{6}$
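The closed forms can be sanity-checked numerically. The sketch below assumes that i), iv) and v) are finite sums up to n, while ii) and iii) are sums to infinity (approximated here by truncating at 200 terms); the values a = 0.3 and n = 10 are arbitrary choices for illustration.

```python
from math import isclose

a, n = 0.3, 10   # arbitrary illustrative values with 0 < a < 1
terms = 200      # truncation point for the infinite sums; a**200 is negligible

finite_geometric = sum(a ** (i - 1) for i in range(1, n + 1))      # series i)
sum_i_a = sum(i * a ** (i - 1) for i in range(1, terms + 1))       # series ii)
sum_i2_a = sum(i * i * a ** (i - 1) for i in range(1, terms + 1))  # series iii)
sum_i = sum(range(1, n + 1))                                       # series iv)
sum_i2 = sum(i * i for i in range(1, n + 1))                       # series v)

checks = {
    "i": isclose(finite_geometric, (1 - a ** n) / (1 - a)),
    "ii": isclose(sum_i_a, 1 / (1 - a) ** 2),
    "iii": isclose(sum_i2_a, (1 + a) / (1 - a) ** 3),
    "iv": sum_i == n * (n + 1) // 2,
    "v": sum_i2 == n * (n + 1) * (2 * n + 1) // 6,
}
print(checks)
```

Series ii) and iii) are the ones needed later for the mean and variance of the geometric distribution.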
2.2 The Discrete Uniform Distribution
A rv that can assume, say n different values with equal probability is said to have a
discrete uniform distribution.
Definition: A rv X has a discrete uniform distribution and is said to be a discrete uniform
rv if and only if its probability distribution is given by
\[ p_X(x) = \begin{cases} \dfrac{1}{n}, & x = x_1, x_2, \ldots, x_n \\ 0, & \text{otherwise} \end{cases} \]
where $x_i \ne x_j$ when $i \ne j$ and the parameter n ranges over the positive integers.
In the special case where $x_i = i$, the discrete uniform distribution reduces to:
\[ p_X(x) = \begin{cases} \dfrac{1}{n}, & x = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
This applies, for example, to the number of points we roll with a balanced die, coin etc.,
where the possible values are equally likely each time the experiment is performed.
Exercise:
Let X have the discrete uniform distribution
\[ p_X(x) = \begin{cases} \dfrac{1}{n}, & x = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
1. Show that
a) $E(X) = \dfrac{n + 1}{2}$
b) $Var(X) = \dfrac{n^2 - 1}{12}$
2. Assume a die roll (n = 6). Graph the discrete pmf $p_X(x)$ and CDF $F_X(x)$.
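Before attempting the proof, the two formulas can be checked numerically for a die roll (n = 6). This sketch is a check, not a derivation; the function names `uniform_pmf` and `mean_and_variance` are made up for the example.

```python
from fractions import Fraction


def uniform_pmf(n):
    """Discrete uniform pmf on 1, 2, ..., n."""
    return {x: Fraction(1, n) for x in range(1, n + 1)}


def mean_and_variance(pmf):
    """Mean, and variance via the shortcut E(X^2) - [E(X)]^2."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mu ** 2
    return mu, var


n = 6
pmf = uniform_pmf(n)
mu, var = mean_and_variance(pmf)

# CDF at each support point: a staircase rising by 1/6 at x = 1, ..., 6
cdf = {x: sum(p for xi, p in pmf.items() if xi <= x) for x in pmf}

print(mu, var)  # 7/2 35/12
print(cdf)
```

For n = 6 the formulas give (6 + 1)/2 = 7/2 and (36 - 1)/12 = 35/12, matching the computed values.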
2.3 The Bernoulli Distribution
A random experiment that has two possible outcomes, “success (S)” and “failure (F)”
with respective probabilities P(S) = p and P(F) = 1-p is called a Bernoulli trial.
For example, an item leaving a manufacturing production line is either defective or good;
a new born child will be either female or male etc.
Let X be a rv which takes on the value 1 if a Bernoulli trial results in a success and 0 if the
same Bernoulli trial results in a failure. Then X has a Bernoulli distribution with
parameter p = P(S) and pmf
\[ p_X(x) = \begin{cases} p^x (1 - p)^{1 - x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases} \tag{2.31} \]
We write X ~ Bernoulli(p).
Theorem:
Let X ~ Bernoulli(p). Then the mean and variance of the Bernoulli rv are:
E(X) = p and Var(X) = p(1 − p)
• Often, we are interested in the outcomes of more than one Bernoulli trial.
• We use the Bernoulli rv as a building block to form other probability distributions,
such as the binomial distribution to be discussed next
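The theorem follows directly from the definitions of Section 1.3 and Section 1.4. A short derivation, using only the Bernoulli pmf and the shortcut formula Var(X) = E(X²) − [E(X)]², is:

```latex
\begin{align*}
E(X)   &= 0 \cdot (1 - p) + 1 \cdot p = p, \\
E(X^2) &= 0^2 \cdot (1 - p) + 1^2 \cdot p = p, \\
\operatorname{Var}(X) &= E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p).
\end{align*}
```

The variance is largest at p = 1/2 and falls to zero as p approaches 0 or 1, where the outcome is nearly certain.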
2.4 The Binomial Distribution

The assumptions leading to the binomial distribution are as follows:
• The experiment consists of a fixed number, n, of identical trials
• Each trial results in one of two outcomes, success (S) or failure (F)
• The probability of success, p, remains the same from trial to trial
• The trials are independent of each other
• The random variable of interest is the number of successes observed for the n
trials
Illustration:-
• Consider the Bernoulli rv X whose pmf is given in Equation 2.31
• Now suppose that n such independent Bernoulli trials, each resulting in a success
with probability p and a failure with probability (1 – p), are to be performed.
  ◦ For example, instead of inspecting a single item leaving the production
  line, we independently inspect n items and record values for $X_i$,
  $i = 1, \ldots, n$, where $X_i = 1$ if the ith inspected item results in a success
  and $X_i = 0$ otherwise.
  ◦ The sum $Y = \sum_{i=1}^{n} X_i$ denotes the number of successes among the n
  sampled items from the production line. That is, Y is a binomial rv with
  parameters (n, p)
• We write Y ~ Bin(n, p). Accordingly, the Bernoulli rv X is a binomial rv with
parameters (1, p)
Definition: A rv Y has a binomial distribution and is referred to as a binomial rv if and
only if its probability distribution is given by
\[ p_Y(y) = \begin{cases} \dbinom{n}{y} p^y (1 - p)^{n - y}, & y = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]
We write Y ~ Bin(n, p).
Remark:- To use binomial rvs as a chance model, look for independent trials with equal
probability of success.
Theorem:
Let X ~ Bin(n, p). Then the mean and variance of the binomial rv are:
E(X) = np and Var(X) = np(1 − p)
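A minimal sketch of the binomial pmf, with a numerical check of the theorem. The function name `binomial_pmf` is chosen for this example, and n = 20, p = 0.4 are illustrative parameter values; the only library call is the standard `math.comb`.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p); zero outside y = 0, 1, ..., n."""
    if not (isinstance(y, int) and 0 <= y <= n):
        return 0.0
    return comb(n, y) * p ** y * (1 - p) ** (n - y)


n, p = 20, 0.4  # illustrative parameter values
probs = [binomial_pmf(y, n, p) for y in range(n + 1)]

# Mean and variance computed directly from the pmf
mean = sum(y * q for y, q in enumerate(probs))
var = sum(y * y * q for y, q in enumerate(probs)) - mean ** 2

# P(Y >= 3) via the complement 1 - P(Y <= 2)
at_least_3 = 1 - sum(probs[:3])

print(round(sum(probs), 6), round(mean, 6), round(var, 6), round(at_least_3, 4))
```

The probabilities sum to 1, and the computed mean and variance agree with np = 8 and np(1 − p) = 4.8 up to floating-point rounding.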
Discussion problems
1. Twenty Grade 12 learners are randomly selected and asked whether or not they
would wish to pursue a scientific career. Historical records show that 2 in every 5
Grade 12 learners prefer a scientific career.
a. What is the probability that at least 3 learners will indicate their preference
for a scientific career? Answer: 0.9964
b. What are the expected value and variance of the number of learners preferring
a non-scientific career? Answers: 12 and 4.8
2. Oliver is a 1st year student registered in the Humanities faculty. He misreads the time
table and ends up in a venue where a 4th year Electrical Engineering exam is being
held. The exam is a multiple choice type and students are required to choose the
correct answer from 5 possible answers to each question. Assuming Oliver is clueless
altogether and the exam has 25 questions,
a. What is the probability that Oliver will get at least one answer right if he
embarks on a guesswork strategy? Answer: 0.9962
b. What mark should Oliver expect to get for the exam? Answer: 5
2.5 The Hypergeometric Distribution

Recall that the binomial distribution can be regarded as sampling with replacement from
a finite population. But in practice, selections are usually made without replacement:
• There is no need to test the item leaving the production line again
• It is unlikely that the same person wants to answer the same survey questions
more than once
The Hypergeometric distribution arises when sampling is performed from a finite
population without replacement. The general set-up is as follows:
• The set to be sampled (population) has N items in total
• Of these N items, let m be of a type of interest (called successes) and the
remaining “N – m” be of another type (called failures)
• A sample of n items is randomly drawn (without replacement) from this set of
N items in such a way that each subset of size n is equally likely to be chosen
• The random variable of interest, X, denotes the number of items of the type of
interest that are in the sample
• We require the distribution of X
The significant difference between a binomial probability and a Hypergeometric
probability is that binomial picks from a finite population are done “with replacement” and
Hypergeometric picks are done “without replacement”.
Hypergeometric distribution pmf

Before giving the pmf of the Hypergeometric distribution, let’s look at the possible values of
X first.
If the number of failures (N − m) is greater than or equal to the sample size (n), then we
can have all failures, and X can be 0
• But if N − m < n, the smallest possible value of X is n − (N − m)
Similarly, if the sample size (n) is less than or equal to the number of successes (m), then we
can have all successes, and X can be n
• But if n > m, then X can be at most m
Hence the possible values of X satisfy the restriction: max(0, n − (N − m)) ≤ x ≤ min(n, m)
Definition: If X is the number of successes in a random sample of size n, drawn from a
set consisting of m successes and N – m failures, then the rv X is defined to have a
hypergeometric distribution with pmf
\[ p_X(x; N, m, n) = \begin{cases} \dfrac{\dbinom{m}{x}\dbinom{N - m}{n - x}}{\dbinom{N}{n}}, & \max(0,\, n - N + m) \le x \le \min(m, n) \\ 0, & \text{otherwise} \end{cases} \]
Theorem:
Let X be a hypergeometric rv with parameters N, m and n. Then the mean and variance
of the hypergeometric rv are:
\[ E(X) = n\,\frac{m}{N} \quad \text{and} \quad Var(X) = \frac{nm(N - n)(N - m)}{N^2 (N - 1)} \]
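As a sketch, the pmf and the theorem can be checked on the movie-store exercise of Section 1.1 (N = 12 copies, m = 3 damaged, n = 2 selected), whose pmf was quoted in Section 1.21. The function name `hypergeom_pmf` is chosen for this example; exact fractions are used so the comparison with the formulas is exact.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, m, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items of which m are successes."""
    if not (max(0, n - (N - m)) <= x <= min(m, n)):
        return Fraction(0)
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))


N, m, n = 12, 3, 2  # movie-store exercise: 12 copies, 3 damaged, 2 selected
support = range(max(0, n - (N - m)), min(m, n) + 1)
pmf = {x: hypergeom_pmf(x, N, m, n) for x in support}

# Mean and variance computed directly from the pmf
mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean ** 2

print(pmf)
print(mean, var)
```

The pmf reproduces 12/22, 9/22 and 1/22 (printed in lowest terms), and the mean 1/2 and variance 15/44 agree with nm/N and nm(N − n)(N − m)/(N²(N − 1)).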
Exercises:
A tyre dealership has 12 size-14 tyres in stock; 3 Dunlop and 9 Continental. Each Dunlop
tyre costs R1050 whereas each Continental tyre costs R1100. Peter runs a catering
company and would like to buy 5 size-14 spare tyres (either brand) for his delivery
vehicles. He randomly selects 5 tyres from the dealership.
a) What is the probability that Peter selects at least 3 Continental tyres? Answer: 0.9545
b) Find the expected value and standard deviation of the amount of money Peter will
have to pay to buy these 5 tyres. Answers: R5437.50 & R38.62 respectively
2.6 Geometric Distribution

The rv with the geometric probability function is associated with experiments that share
some of the characteristics of the binomial experiment:
• The experiment consists of identical trials
• The probability of success, p, remains constant from trial to trial
• The trials are independent of each other
• The random variable of interest is the number of the trial on which the first
success occurs
Definition: A discrete rv X is defined to have a geometric distribution with parameter p
if the pmf of X is given by
\[ P(X = x) = p_X(x; p) = \begin{cases} p(1 - p)^{x - 1}, & x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \qquad 0 < p < 1 \]
Properties of a Geometric rv
When studying the properties of a geometric rv, we need to recall the sum of a geometric
series with first term a and common ratio r.
• Sum to infinity (for |r| < 1): $S_\infty = \dfrac{a}{1 - r} = a(1 - r)^{-1}$
• Partial sum: $S_n = \dfrac{a(1 - r^n)}{1 - r}$ or equivalently $S_n = \dfrac{a(r^n - 1)}{r - 1}$
Theorem:
Let X ~ Geom(p). Then the mean and variance of the geometric rv are:
\[ E(X) = \frac{1}{p} \quad \text{and} \quad Var(X) = \frac{1 - p}{p^2} \]
Theorem: Markov Property

Let X ~ Geom(p) and suppose that m and n are positive integers. Then
\[ P(X > n + m \mid X > n) = P(X > m). \]
That is, the probability of events happening in the future is independent of what went
before. For example, if you are told that there have been “n” failures initially, the chance
of at least “m” more failures before the first success is exactly the same as if you started
the experiment for the first time and the information of initial “n” failures is not given to
you.
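The property can be verified numerically from the tail probability P(X > k) = (1 − p)^k, which holds because X > k means the first k trials were all failures. In the sketch below, the function names `geom_pmf` and `geom_tail` and the values p = 1/3, n = 5, m = 4 are assumptions for illustration; the mean and variance are approximated by truncating the infinite sums.

```python
from fractions import Fraction


def geom_pmf(x, p):
    """P(X = x) = p (1 - p)^(x - 1) for x = 1, 2, ...; zero otherwise."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0 * p


def geom_tail(k, p):
    """P(X > k) = (1 - p)^k: the first k trials are all failures."""
    return (1 - p) ** k


p = Fraction(1, 3)  # illustrative success probability
n, m = 5, 4

# P(X > n + m | X > n) = P(X > n + m) / P(X > n)
conditional = geom_tail(n + m, p) / geom_tail(n, p)
print(conditional, geom_tail(m, p))

# Truncated sums approximate the mean 1/p and variance (1 - p)/p^2
xs = range(1, 400)
mean = float(sum(x * geom_pmf(x, p) for x in xs))
var = float(sum(x * x * geom_pmf(x, p) for x in xs)) - mean ** 2
print(round(mean, 6), round(var, 6))
```

Both probabilities equal (2/3)⁴ = 16/81, so knowing that the first 5 trials failed does not change the distribution of the remaining waiting time.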
Exercises:
A six-sided die is biased such that even numbered faces are twice as likely to occur as
odd numbered faces. Let X be the number of times you toss the die until you get the first
odd number:
a) What is the probability that you will toss the die at least 4 times? Answer: 8/27
b) Compute the expected value and standard deviation of X Answers: 3 and 2.449
c) The first 5 tosses have resulted in even numbered faces. What is the probability
that the first odd number will occur in the 10th toss? Answer: 0.0658
2.7 Negative Binomial Distribution
The Negative Binomial distribution describes the number of trials
needed to achieve a fixed number of successes. For example:
1. How many times will I throw a coin until it lands on heads for the 15th time?
2. How many children will I have until I get my 2nd daughter?
3. How many cards will I have to draw from a pack (with replacement) until I get the 3rd ace?
Clearly, the negative binomial distribution is related to the geometric distribution: both count the
number of trials needed to get a fixed number of successes, with the geometric distribution being
the case of a single success. Consequently, a rv with a
negative binomial distribution originates from a context similar to the one that yields the geometric
distribution:
• The experiment consists of a sequence of independent and identical trials
• The probability of success, p, remains constant from trial to trial
• The random variable of interest is the number of the trial on which the rth success
occurs (r = 2, 3, …)
Illustration:-
It is helpful to begin with an example
Example: Weld strength

A test of weld strength involves loading welded joints until a fracture occurs. For a
certain type of weld, 80% of the fractures occur in the weld itself, while the other 20%
occur in the beam. Weld tests are independent. Find the probability that the 3rd beam
fracture occurs on the 6th trial. Answer: 0.04096
The negative binomial distribution has two controlling parameters: the probability of
success p in any independent trial and the desired number of successes r
Definition: A discrete rv X is said to have a negative binomial probability distribution if
the pmf of X is given by
\[ P(X = x) = nb(x; r, p) = \begin{cases} \dbinom{x - 1}{r - 1} p^r (1 - p)^{x - r}, & x = r, r + 1, \ldots \\ 0, & \text{otherwise} \end{cases} \qquad 0 < p < 1 \]
Theorem:
If X is a random variable with a negative binomial distribution, then the mean and
variance of the negative binomial rv are:
\[ E(X) = \frac{r}{p} \quad \text{and} \quad Var(X) = \frac{r(1 - p)}{p^2} \]
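A small sketch of the negative binomial pmf, applied to the weld-strength example (a "success" is a beam fracture, so p = 0.2, and we want the 3rd success on trial 6). The function name `neg_binomial_pmf` is made up for this example, and the mean and variance are approximated by truncating the infinite sums at 2000 trials.

```python
from math import comb


def neg_binomial_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r + 1, ...)."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)


# Weld example: 3rd beam fracture (p = 0.2) on the 6th trial
weld = neg_binomial_pmf(6, 3, 0.2)
print(round(weld, 5))

# Truncated sums approximate E(X) = r/p and Var(X) = r(1 - p)/p^2
r, p = 3, 0.2
xs = range(r, 2000)
mean = sum(x * neg_binomial_pmf(x, r, p) for x in xs)
var = sum(x * x * neg_binomial_pmf(x, r, p) for x in xs) - mean ** 2
print(round(mean, 4), round(var, 4))
```

The factor C(x − 1, r − 1) counts the ways to place the first r − 1 successes among the first x − 1 trials; the last trial must itself be the rth success. With r = 1 the formula reduces to the geometric pmf.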
Discussion problem:
A paediatrician wishes to recruit 5 couples, each of whom is expecting their first child to
take part in a new childbirth regimen. For each couple, the probability of agreement is
0.4. Assuming that couples are asked to participate independently, what is the probability
of recruiting the last couple when 11 couples have been asked? Answer: 0.1003