
Chapter 5 – Discrete Random Variables and

Probability Distributions

Definition

A random variable is a variable which


assigns a numerical value to every outcome
of a random experiment

There are two types of random variables:

Discrete and Continuous


Probability Distributions for Discrete
Random Variables (section 5.2)

A discrete random variable, x, has a finite


number of possible values (or a countably
infinite number of values)

Example 5.1 (example 4.1 revisited)


In examining an item with respect to the 3
categories, let

x = number of satisfactory categories

What values can x take on? What are the


probabilities of each value?
Example 5.2
A factory manager is interested in how
many accidents occur in the factory in a
given year.

x = number of accidents in a year

Probability Distribution of a Discrete r.v.

The probability distribution of a discrete r.v.


x lists all possible values of x and their
probabilities:

x x1 x2 x3 ... xk
P(x=xi) P(x1) P(x2) P(x3) ... P(xk)

P(xi)= P(x = xi) is called a probability


function (pf)
Properties of a pf

1. 0 ≤ P(xi) ≤ 1

2. P(x1) + P(x2) +…+ P(xk) = Σ P(xi) = 1

Back to Example 5.1


The probability distribution of
x = number of satisfactory categories:

x 0 1 2 3
P(x) 0.006 0.092 0.398 0.504
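
The two properties of a pf can be checked mechanically. The following is a minimal Python sketch using the Example 5.1 table; the helper name `is_valid_pf` and the tolerance are illustrative assumptions, not part of the course notes.

```python
import math

def is_valid_pf(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Example 5.1: x = number of satisfactory categories
pf_51 = {0: 0.006, 1: 0.092, 2: 0.398, 3: 0.504}
print(is_valid_pf(pf_51))  # True
```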

Back to Example 5.2


Suppose the probability function for
x = number of accidents per year is:
$$P(x_i) = P(x = x_i) = \frac{1}{2^{x_i + 1}}, \quad x_i = 0, 1, 2, \ldots$$

The probability distribution is:


x 0 1 2 3 4 5 …
P(x) 1/2 1/4 1/8 1/16 1/32 1/64 …
Example 5.3 (back to example 4.2)
In example 4.2, you are given the following events:

W = poor weight item = item that is unsatisfactory


in weight
R = reworkable item = item that is satisfactory
only in dimension or weight
but not both
V = salvageable item = item that is satisfactory
only in texture
J = rejected item = item that is totally unsatisfactory
in all categories

A penalty point system is used by the quality


assurance department, with the following penalty
points assigned to each item:
If unsatisfactory in dimension 1 point
If unsatisfactory in weight 1 point
If unsatisfactory in texture 2 points
Additional penalty points
If item is reworkable (event R) 3 points
If item is salvageable (event V) 5 points
If item is rejected (event J) 10 points
Let x = number of penalty points assessed for any
future item. What is the probability function for x?
Solution to 5.3
Uses of the pf
We can use the probability function to
answer questions about x.

Back to Example 5.2


(a) What is the probability of 3 or 4
accidents in a year?
(b) What is the probability of fewer than 4
accidents in a year?
(c) What is the probability of 3 or more
accidents in a year?
(d) What is the probability of more than 2
accidents in a year, given that you know
there has been at least one accident in a
year?
Solution
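
One way to check answers to (a) through (d) is to evaluate the pf exactly. This is a sketch only, assuming the pf P(x) = 1/2^(x+1) given earlier for Example 5.2; the names `p`, `part_a`, and so on are illustrative.

```python
from fractions import Fraction

def p(x):
    """pf from Example 5.2: P(x) = 1 / 2**(x + 1), x = 0, 1, 2, ..."""
    return Fraction(1, 2 ** (x + 1))

# (a) P(x = 3 or x = 4)
part_a = p(3) + p(4)
# (b) P(x < 4) = P(0) + P(1) + P(2) + P(3)
part_b = sum(p(x) for x in range(4))
# (c) P(x >= 3), via the complement because x has no upper bound
part_c = 1 - sum(p(x) for x in range(3))
# (d) P(x > 2 | x >= 1) = P(x >= 3) / P(x >= 1)
part_d = part_c / (1 - p(0))

print(part_a, part_b, part_c, part_d)  # 3/32 15/16 1/8 1/4
```

In (d), the event "more than 2 and at least 1" is the same as "3 or more", which is why the numerator reuses part (c).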
Bonus Example
A municipal bus company has started operations
in a new subdivision. Records were kept over
several months on the number of riders on one
bus route during the early morning weekday
service. The results are:

Number of riders   20    21    22    23    24    25    26    27
Proportion         0.02  0.12  0.23  0.31  0.19  0.08  0.03  0.02

(a) What is the probability that on a randomly


chosen weekday, there will be at least 24
riders?
(b) Two weekdays are chosen at random.
(i)What is the probability that on both of
these days, there will be fewer than 23
riders? (ii) How about on exactly one of the
days?
Solution to Bonus Example
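
A possible numerical check, treating the recorded proportions as probabilities. Part (b) additionally assumes that ridership on the two chosen days is independent; that assumption is not stated in the problem and is flagged in the code.

```python
riders = {20: 0.02, 21: 0.12, 22: 0.23, 23: 0.31,
          24: 0.19, 25: 0.08, 26: 0.03, 27: 0.02}

# (a) P(at least 24 riders)
p_at_least_24 = sum(p for x, p in riders.items() if x >= 24)

# (b) q = P(fewer than 23 riders) on a single day
q = sum(p for x, p in riders.items() if x < 23)
p_both = q * q                   # (i) both days (ASSUMES independent days)
p_exactly_one = 2 * q * (1 - q)  # (ii) exactly one of the two days

print(round(p_at_least_24, 4), round(p_both, 4), round(p_exactly_one, 4))
# 0.32 0.1369 0.4662
```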
Mean and Variance of a Discrete Random
Variable (section 5.2)

Two numbers that are often used to


summarize a probability distribution for a
random variable, x, are:

1. Mean

The mean of a r.v. is a measure of the


centre of the probability distribution
(it is the average value of the x’s)
• This is also referred to as the expected
value of x

2. Variance

The variance of a r.v. is a measure of


how spread out the distribution is
(the variance measures how far the
values of x are from their mean)
1. Mean of x

The mean, or expected value, of x is

μx = E(x) = Σ xi P(xi) = Σ xi P(x = xi)

The mean can be thought of as a weighted average of all possible values of x, with the
weights being the probabilities
Back to Example 5.1 (number of satisfactory
categories)

Back to Example 5.3 (penalty points)

Back to Bonus Example (bus riders)


2. Variance of x

σx² = Σ (xi − μx)² P(xi)

Computing Formula for the Variance

σx² = [ Σ xi² P(xi) ] − μx²

Related Definition: Standard Deviation

σx = standard deviation of x = √(σx²)
Back to Example 5.1
Calculate the standard deviation of the number
of satisfactory categories.

Bonus Example
Calculate the standard deviation of the early
morning route of bus riders.
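
The mean, variance, and standard deviation formulas translate directly into code. The sketch below applies them to the Example 5.1 and Bonus Example tables; the function names are illustrative, and the variance uses the computing formula.

```python
import math

def mean(pf):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in pf.items())

def variance(pf):
    """Computing formula: sum of x^2 * P(x), minus mu^2."""
    mu = mean(pf)
    return sum(x * x * p for x, p in pf.items()) - mu ** 2

def std_dev(pf):
    return math.sqrt(variance(pf))

pf_51 = {0: 0.006, 1: 0.092, 2: 0.398, 3: 0.504}      # Example 5.1
riders = {20: 0.02, 21: 0.12, 22: 0.23, 23: 0.31,
          24: 0.19, 25: 0.08, 26: 0.03, 27: 0.02}      # Bonus Example

for name, pf in (("Example 5.1", pf_51), ("Bus riders", riders)):
    print(name, round(mean(pf), 4), round(variance(pf), 4), round(std_dev(pf), 4))
```

For Example 5.1 this gives a mean of 2.4 and a variance of 0.46; for the bus route, a mean of 22.99 and a variance of about 1.99.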
Example 5.4
A company is to compare two new product
designs based on revenue potential and only one
design can be chosen. The following are the
estimated probability distributions based on
input from the company’s marketing
department.

Product x (x = amount of annual revenue)


x 3 million 5 million 7 million 10 million
pi 0.1 0.4 0.3 0.2

Product y (y = amount of annual revenue)


y 4 million 5 million 6 million 7 million
pi 0.1 0.1 0.1 0.7

Which product should be chosen?


Solution to 5.4
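
A hedged way to compare the two designs is to compute the mean and standard deviation of each revenue distribution (in millions). The helper `mean_and_sd` is an illustrative name; which product "should" be chosen still depends on the company's attitude to risk.

```python
import math

def mean_and_sd(values, probs):
    """Return (mean, standard deviation) of a discrete distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return mu, math.sqrt(var)

mu_x, sd_x = mean_and_sd([3, 5, 7, 10], [0.1, 0.4, 0.3, 0.2])  # Product x
mu_y, sd_y = mean_and_sd([4, 5, 6, 7], [0.1, 0.1, 0.1, 0.7])   # Product y

print(round(mu_x, 2), round(sd_x, 3))
print(round(mu_y, 2), round(sd_y, 3))
```

Both products have the same expected revenue (6.4 million), but product y has the smaller standard deviation, so on these numbers it is the less variable option.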
Special Discrete Probability Distributions

The Binomial Distribution (Section 5.3)

Bernoulli Random Variables

Let x be a random variable that can take on


only two possible values:

x = 1 if a “success”
0 if a “failure”

Let
p = P(success) = P(x = 1)
1− p = P(failure) = P(x = 0)
If the experiment is performed just once
(n = 1), then

(i) μ = p

(ii) σ² = p (1 − p)

When you repeat a Bernoulli experiment


more than once (that is, n ≥ 2), you get what
is known as a binomial distribution
The following are the characteristics of a
discrete random variable that has a
binomial distribution

1. There are n independent trials

2. There are only 2 possible outcomes on


each trial: success or failure

3. The probability of success on each trial


is p

4. The random variable of interest is


x = number of successes in n trials

We say that x ~ Bin(n, p)


The probability function (pf) of x is:

$$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$$

Note
The mean and variance of a binomial
random variable are:

μx = n p

σx² = n p (1−p)
Example 5.5
A man and woman, each with one recessive
(blue – happens 1 out of 4 times) and one
dominant (brown – happens 3 out of 4
times) gene for eye colour, have 3 children.
What is the probability distribution for the
number of blue-eyed children? Graph the
resulting distribution.
Solution to 5.5
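
A sketch of the distribution, assuming each child is blue-eyed with probability 1/4 independently of the others, so that x ~ Bin(3, 1/4). The function name `binom_pf` and the text bar chart are illustrative choices.

```python
from fractions import Fraction
from math import comb

def binom_pf(x, n, p):
    """Binomial pf: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_blue = Fraction(1, 4)
dist = {x: binom_pf(x, 3, p_blue) for x in range(4)}

# Print each value of x, its exact probability, and a crude text bar
for x, prob in dist.items():
    print(x, prob, "#" * round(float(prob) * 40))
```

The exact probabilities are 27/64, 27/64, 9/64 and 1/64 for x = 0, 1, 2, 3.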
Example 5.6
A large retailer purchases a shipment of a certain
type of electronic device from a manufacturer.
The manufacturer indicates that each shipment
contains about 3% defective devices.

(a) If the inspector of the retailer randomly


picks 20 devices from a shipment and tests
them, what is the probability there will be
(i) Exactly 3 defective devices?
(ii) At least 1, but no more than 3,
defective devices?
(iii) At least 1 defective device?

(b) Suppose the retailer receives 10 shipments


in a month and the inspector randomly tests
20 devices per shipment. What is the
probability that there will be 3 shipments
(out of the 10) containing at least one
defective device?

(c) If the inspector randomly selects and tests


50 devices in a shipment, how many are
expected to be defective? What is the
standard deviation?
Solution to 5.6
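
A possible numerical check. It assumes the number of defectives in a sample of 20 is Bin(20, 0.03), and in part (b) that the 10 shipments are independent, with "success" meaning a shipment whose sample contains at least one defective. Variable names are illustrative.

```python
from math import comb, sqrt

def binom_pf(x, n, p):
    """Binomial pf: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 20, 0.03

a_i = binom_pf(3, n, p)                            # exactly 3 defective
a_ii = sum(binom_pf(x, n, p) for x in (1, 2, 3))   # at least 1, at most 3
a_iii = 1 - binom_pf(0, n, p)                      # at least 1

# (b) 10 shipments, each a trial with success probability a_iii
b = binom_pf(3, 10, a_iii)

# (c) sample of 50 devices: mean n*p and sd sqrt(n*p*(1-p))
mean_c = 50 * p
sd_c = sqrt(50 * p * (1 - p))

print(round(a_i, 4), round(a_ii, 4), round(a_iii, 4), round(b, 4))
print(mean_c, round(sd_c, 4))
```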
Example 5.7
Define the random variable x as:
x = number of people who have ever read a
text message while driving a car, out of a
random sample of n people.
You are given:
• The expected value of x = 2.25
• The variance of x = 1.9125

What is P(x = 4)?

Solution to 5.7
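
One way to approach this, assuming x is binomial: since the mean is np and the variance is np(1 − p), their ratio gives 1 − p, and then n follows from the mean. The sketch below carries out that arithmetic; the variable names are illustrative.

```python
from math import comb

mean_x, var_x = 2.25, 1.9125

# For a binomial, variance / mean = 1 - p
p = 1 - var_x / mean_x
# Then n = mean / p, rounded to absorb floating-point error
n = round(mean_x / p)

p4 = comb(n, 4) * p ** 4 * (1 - p) ** (n - 4)
print(n, round(p, 4), round(p4, 4))
```

This recovers p = 0.15 and n = 15, giving P(x = 4) of roughly 0.116.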
The Poisson Distribution (section 5.4)

It is often useful to determine the number of


times that a certain event occurs per unit of
time, distance or volume.

For example
A researcher may be interested in the
• number of defects in an item of a certain
length
• number of radioactive particles emitted by
a substance in a 10 minute period
• number of telephone calls received by an
operator within an hour
• number of people arriving in a lineup per
hour

The Poisson distribution is often used to


model such situations
The probability function (pf) of a Poisson
random variable is:

$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

where λ is the parameter of the


distribution.

Mean and Variance of a Poisson


Distribution

1. μx = E(x) = λ

2. σx² = λ
Example 5.8
A bank has a drive-through window. On
average, 4 customers per hour arrive at the
window during non-business hours. If we
assume that arrivals at the window during the
non-business hours follow a Poisson
distribution, what is the probability that
(a) A randomly selected non-business hour will
have between 5 and 7 customers (inclusive)
arrive at the window?
(b) No more than 2 customers will arrive during
the next 30 minutes?
(c) Exactly 6 customers arrive during any 2-
hour time period?
Solution to 5.8
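
A sketch of the calculations, assuming the Poisson rate scales with the length of the interval (4 per hour, so 2 per half hour and 8 per two hours). The function name `poisson_pf` is illustrative.

```python
from math import exp, factorial

def poisson_pf(x, lam):
    """Poisson pf: lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

# (a) one hour, lam = 4: P(5 <= x <= 7)
a = sum(poisson_pf(x, 4) for x in range(5, 8))
# (b) 30 minutes, lam = 4 * 0.5 = 2: P(x <= 2)
b = sum(poisson_pf(x, 2) for x in range(3))
# (c) two hours, lam = 4 * 2 = 8: P(x = 6)
c = poisson_pf(6, 8)

print(round(a, 4), round(b, 4), round(c, 4))
```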
Example 5.9
Miss I. M. Lonely is the owner and sole
employee of the Heavensent Computer Dating
Service. She receives, on average, 18 telephone
calls per nine-hour business day. Next
Tuesday, Miss Lonely has a dental appointment
and estimates that she will be away from the
office for 3 hours.

(a) What is the expected number of calls she


will miss while she is away? What is the
standard deviation?
(b) What is the probability that at least one call
will be missed during the time she will be
away? More than 4 calls?
Solution to 5.9
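
A possible check, assuming calls arrive as a Poisson process at a constant rate over the business day, so that 18 calls per 9 hours gives λ = 6 for a 3-hour absence. Names are illustrative.

```python
from math import exp, factorial, sqrt

def poisson_pf(x, lam):
    """Poisson pf: lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

rate_per_hour = 18 / 9        # 2 calls per hour
lam = rate_per_hour * 3       # expected calls in 3 hours

# (a) mean and standard deviation of a Poisson(lam) variable
expected_calls = lam
sd_calls = sqrt(lam)

# (b) P(at least 1) and P(more than 4), both via the complement
p_at_least_one = 1 - poisson_pf(0, lam)
p_more_than_4 = 1 - sum(poisson_pf(x, lam) for x in range(5))

print(expected_calls, round(sd_calls, 4))
print(round(p_at_least_one, 4), round(p_more_than_4, 4))
```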
Poisson Approximation to the Binomial

When you have a binomial situation, it can sometimes be difficult to evaluate the
probability function when n is large

What can you do in this situation?

Suppose x ~ Bin(n, p)

1. If np ≥ 7 and n(1−p) ≥ 7, then you can use
a normal approximation (see section 6.3)

2. If either np < 7 or n(1−p) < 7, and
n > 20, then you can use a Poisson
distribution to approximate the binomial
distribution, with λ = np.
Example 5.10
A milk processing plant fills 2-litre cartons of
milk. About 1% of the time, the carton is not
sealed properly and the milk spoils before it
reaches the grocery store. If a grocery store
receives a shipment of 250 2-litre cartons of
milk, what is the probability that more than 5
cartons will contain spoiled milk?

Solution to 5.10
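
Here np = 2.5 < 7 and n = 250 > 20, so the Poisson approximation with λ = np applies. The sketch below computes the approximation and, for comparison only, the exact binomial value; the comparison is an addition for illustration, not part of the original example.

```python
from math import comb, exp, factorial

n, p = 250, 0.01
lam = n * p   # 2.5

# P(x > 5) = 1 - P(x <= 5), Poisson approximation
p_poisson = 1 - sum(lam ** x * exp(-lam) / factorial(x) for x in range(6))

# Same probability from the exact binomial pf
p_binomial = 1 - sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(6))

print(round(p_poisson, 4), round(p_binomial, 4))
```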
The Hypergeometric Distribution (sect. 5.5)

What do you do when you have what looks


like a binomial situation (that is, only two
possible outcomes on each trial or
experiment), BUT you do not have
independent events OR the probability of
success changes with each trial or repetition
of the experiment?

This happens when you sample without


replacement

You can use the hypergeometric distribution


For example
Suppose you have a shipment of 100 items,
of which 10 of them are defective.
Suppose you randomly select 15 of them.
What is the probability that 3 of them are
defective?

P(1st item is defective) = 10/100 = 0.10

P(2nd item is defective|1st item is defective) = 9/99 = 0.0909


P(2nd item is defective|1st item is not) = 10/99 = 0.1010

P(3rd item is defective|1st 2 items defective) = 8/98 = 0.0816


P(3rd item is defective|1st 2 are not) = 10/98 = 0.1020
P(3rd item is defective|1st is, 2nd is not) = 9/98 = 0.0918
:
:

We can see that the probability of success


(defective item) is not the same each time an
item is randomly selected; it depends on
whether or not “success” happened on the
previous trial
Thus we should not use the binomial
distribution to answer the question.

Instead we should use:

Hypergeometric Distribution

Characteristics
1. We have a population of N items

2. It is known that A of them are of a


certain kind (“success”) and N−A of
them are of another kind (“failure”)

3. We take a random sample of n items,
drawn without replacement, so
p = P(success) is not the same from trial
to trial

4. The random variable of interest is


x = number of successes in n items
Then x has a hypergeometric distribution
with probability mass function:

$$P(x) = \frac{\binom{A}{x} \binom{N - A}{n - x}}{\binom{N}{n}}$$

where

max [0, n + A − N ] ≤ x ≤ min [n , A ]
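
The pf can be evaluated directly for the shipment example above (N = 100 items, A = 10 defective, n = 15 selected, x = 3). This is a minimal sketch using exact fractions; the function name `hypergeom_pf` is illustrative. It also checks numerically that the probabilities over the allowed range of x sum to 1 and that the mean equals nA/N.

```python
from fractions import Fraction
from math import comb

def hypergeom_pf(x, N, A, n):
    """P(x successes) when n items are drawn without replacement
    from N items, A of which are successes."""
    return Fraction(comb(A, x) * comb(N - A, n - x), comb(N, n))

N, A, n = 100, 10, 15                      # shipment example
lo, hi = max(0, n + A - N), min(n, A)      # range of possible x values
dist = {x: hypergeom_pf(x, N, A, n) for x in range(lo, hi + 1)}

p3 = dist[3]                               # P(exactly 3 defective)
total = sum(dist.values())                 # exactly 1
mu = sum(x * p for x, p in dist.items())   # equals n * A / N

print(round(float(p3), 4), total, mu)
```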

Mean and Variance

1. μx = nA / N

2. $\sigma_x^2 = \left( \frac{N - n}{N - 1} \right) n \left( \frac{A}{N} \right) \left( 1 - \frac{A}{N} \right)$
Example 5.11
In a ground water contamination study,
researchers identify 25 possible sites for drilling
and sample collection. Unknown to the
researchers, 19 of these sites have ground water
with a high contamination, while the other 6
sites have low contamination. The researchers
have a budget that only allows them to drill at 5
sites, so they randomly choose these 5 sites from
their list of 25.

What is the probability that at least 4 out of


these 5 sites have ground water with a high
contamination?
Solution to 5.11
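
A sketch of the calculation with N = 25 sites, A = 19 highly contaminated sites and n = 5 sites drilled, so that P(x ≥ 4) = P(4) + P(5). The helper name is illustrative.

```python
from fractions import Fraction
from math import comb

N, A, n = 25, 19, 5   # sites, highly contaminated sites, sites drilled

def hypergeom_pf(x):
    """Hypergeometric pf for this example's N, A and n."""
    return Fraction(comb(A, x) * comb(N - A, n - x), comb(N, n))

p_at_least_4 = hypergeom_pf(4) + hypergeom_pf(5)
print(p_at_least_4, round(float(p_at_least_4), 4))  # 5814/8855 0.6566
```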
Note
When do you use the binomial distribution
and when do you use the hypergeometric
distribution?

You must have a situation where there are


only two possible outcomes on each trial:
success or failure

1. If you are sampling with replacement


OR if you are told the probability of
success is the same on each trial OR if
you do not know the value of N, then use
BINOMIAL

2. If you know the value of N, then you


need to compare the value of n (the
sample size) to N before deciding which
distribution to use.
(i) If n is large relative to N, then use
the hypergeometric
General rule: n/N > 0.05

(ii) If n is small relative to N, then use


the binomial
General rule: n/N ≤ 0.05

Back to Example 5.6


A large retailer purchases a shipment of
electronic devices from a manufacturer, where it
is known that 3% of the devices are defective.
The inspector of the retailer randomly selects 20
devices and tests them. What is the probability
that exactly 2 of them will be defective if

(a) The shipment contains 1000 devices?


(b) The shipment contains 200 devices?
Solution
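
A hedged sketch applying the n/N rule. For N = 1000, n/N = 0.02 ≤ 0.05, so the binomial is used; for N = 200, n/N = 0.10 > 0.05, so the hypergeometric is used. The hypergeometric parts assume the shipment contains exactly 3% defectives (A = 6 of 200, A = 30 of 1000), which is an interpretation of "3% of the devices are defective".

```python
from math import comb

def binom_pf(x, n, p):
    """Binomial pf: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def hypergeom_pf(x, N, A, n):
    """Hypergeometric pf: C(A, x) * C(N - A, n - x) / C(N, n)."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

n, p, x = 20, 0.03, 2

# (a) N = 1000: n/N = 0.02 <= 0.05, binomial
p_a = binom_pf(x, n, p)

# (b) N = 200: n/N = 0.10 > 0.05, hypergeometric (ASSUMES A = 6 defectives)
p_b = hypergeom_pf(x, N=200, A=6, n=n)

# Comparison only: exact hypergeometric for (a), ASSUMING A = 30 defectives
p_a_exact = hypergeom_pf(x, N=1000, A=30, n=n)

print(round(p_a, 4), round(p_b, 4), round(p_a_exact, 4))
```

The comparison line illustrates why the rule works: when n is small relative to N, the binomial and hypergeometric answers are close.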
