ENGINEERING DATA ANALYSIS

CHAPTER 3: DISCRETE PROBABILITY DISTRIBUTION

Random variables
 A random variable X is a quantity whose value cannot be predicted with certainty.
 A random variable is a quantity that may take any of a given range of values; these
values cannot be predicted exactly but can be described in terms of their probabilities.
 A random variable is a variable whose value is determined by the outcome of a
random procedure.
 A random variable is discrete if its value is taken from a fixed set of numerical
values. Discrete variables generally arise from experiments involving counting, for
example, road deaths, car production, and aircraft sales.
 A random variable is continuous if its values can fall anywhere over a range and
the scale is restricted only by the accuracy of measurement, for example, voltage,
corrosion, and oil pressure.
Example
A fair coin is spun vertically on a flat surface. Here are two related random variables:

• Let X be the number of heads showing when the coin comes to rest. Then X takes
the value x=0 if the coin finishes up ‘tails’, or x=1 if the coin finishes up ‘heads’.

• Let Y be the time between the commencement of the spin and the coin coming to
rest, measured in seconds.

Here X is discrete and Y is continuous.

 We will use the capital letter X as our placeholder for the random variable, the
small letter x as our placeholder for whatever value the random variable X might
take, and P(X=x) to denote the probability that X has the value x.

Example:
Let X be the sex of a student selected at random from the collection of all
possible students at a given university.

x=0, when the gender is male
x=1, when the gender is female

P(X=0) is the probability that a student selected at random from the collection of all
possible students is a male.
P(X=1) is the probability that a student selected at random from the collection of all
possible students is a female.
Probability Distribution

 A probability distribution defines the relationship between the outcomes and their
likelihood of occurrence. To define a probability distribution we make a probability
model which is the set of assumptions used to assign probabilities to each
outcome in the sample space.

 A probability distribution is a function that describes the likelihood of obtaining the
possible values that a random variable can assume; the notation P(X=x) introduced
above is used to describe these probabilities.

 Probability distributions describe the dispersion of the values of a random
variable. Consequently, the kind of variable determines the type of probability
distribution. For a single random variable, statisticians divide distributions into the
following two types: discrete probability distributions and continuous probability
distributions.

Example of a probability distribution: Let X be the sex of a student selected at random
from the collection of all possible students at a given university.

x        0     1
P(X=x)   0.53  0.47

Expectation

 Statistical expectation is the “long range average”.


 The mean is also sometimes called the expected value or expectation of X and
denoted by E(X). These are both somewhat curious terms to use; it is important
to understand that they refer to the long-run average. The mean is the value that
we expect the long-run average to approach. It is not the value of X that we
expect to observe.
 In general, for a discrete random variable X, which can take specific values of x,
the expected value of the random variable is defined by
𝐸(𝑋) = ∑_{all x} 𝑥 ∙ 𝑃(𝑋 = 𝑥)
 Where the summation over all x means all values of x for which the random
variable X has a nonzero probability.

Example: When throwing a normal die, let X be the score shown on the die.
Construct a probability distribution of X. What is the expectation of X?
Example: When throwing a normal die, let X be the square of the score shown on
the die. Construct a probability distribution of X. What is the expectation of X?

Example: Let X=number of heads obtained when tossing a fair coin 3 times.
 What are the possible values of X?
 What are the associated probabilities?
 Determine the mean value of X.
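The die and coin-toss examples above can be checked numerically. The following sketch applies E(X) = ∑ x ∙ P(X = x) to both distributions; the helper name `expectation` and the dictionary representation are our illustrative choices, not from the text:

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum over all x of x * P(X = x), for a dict {value: probability}."""
    return sum(x * p for x, p in dist.items())

# Fair die: faces 1..6, each with probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))    # 7/2, i.e. 3.5

# X = number of heads in 3 tosses of a fair coin
coin3 = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(expectation(coin3))  # 3/2, i.e. 1.5
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare the result against a hand computation.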

Variance of X

 We have seen that the mean of a random variable X is a measure of the central
location of the distribution of X.
 The second most important feature is the spread of the distribution.
 These ideas lead to the most important measure of spread, the variance, and a
closely related measure, the standard deviation.
 A small standard deviation (or variance) means that the distribution of the
random variable is narrowly concentrated around the mean.
 A large standard deviation (or variance) means that the distribution is spread out,
with some chance of observing values at some distance from the mean.
 In general, the variance is defined by
𝑉(𝑋) = 𝐸(𝑋²) − [𝐸(𝑋)]²
𝑜𝑟
𝑉(𝑋) = 𝐸[(𝑋 − 𝐸(𝑋))²]

Example: When throwing a normal die, let X be the score shown on the die. What is
the variance of X?

Example: Let X=number of heads obtained when tossing a fair coin 3 times.
Determine the variance of X.
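The shortcut formula V(X) = E(X²) − [E(X)]² can be applied directly to the two examples above; as before, the helper name `variance` and the dictionary representation are ours:

```python
from fractions import Fraction

def variance(dist):
    """V(X) = E(X^2) - [E(X)]^2, for a dict {value: probability}."""
    ex = sum(x * p for x, p in dist.items())
    ex2 = sum(x * x * p for x, p in dist.items())
    return ex2 - ex ** 2

# Fair die
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance(die))    # 35/12

# X = number of heads in 3 tosses of a fair coin
coin3 = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(variance(coin3))  # 3/4
```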
DISCRETE PROBABILITY DISTRIBUTIONS

 Discrete Uniform Distribution


 Bernoulli and Binomial Distributions
 Hypergeometric Distribution
 Geometric and Negative Binomial Distributions
 Poisson Distribution

Probability Mass Function (PMF)


 Let X be a discrete random variable with range 𝑅𝑋 = {𝑥1, 𝑥2, 𝑥3, … } (finite or
countably infinite). The function

𝑓𝑋(𝑥𝑘) = 𝑃(𝑋 = 𝑥𝑘), for 𝑘 = 1, 2, 3, …,

is called the discrete probability mass function (PMF) of X.

 The PMF is a probability measure that gives us the probabilities of the possible
values for a random variable.

 Since values of the PMF represent probabilities, PMFs enjoy certain properties.
In particular, writing S for the set of possible values of X, all PMFs satisfy

1. 𝑓𝑋(𝑥) > 0 for 𝑥 ∈ 𝑆
2. 𝑓𝑋(𝑥) = 0 for 𝑥 ∉ 𝑆
3. ∑_{𝑥 ∈ 𝑆} 𝑓𝑋(𝑥) = 1
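These properties can be checked mechanically for any candidate distribution. The sketch below uses an illustrative helper, `is_valid_pmf` (our name, not from the text), on the male/female example given earlier:

```python
from fractions import Fraction

def is_valid_pmf(dist):
    """Check PMF properties on the support: all values positive and summing to 1."""
    return all(p > 0 for p in dist.values()) and sum(dist.values()) == 1

# The male/female distribution from the earlier example
sex = {0: Fraction(53, 100), 1: Fraction(47, 100)}
print(is_valid_pmf(sex))                   # True

# Not a PMF: the probabilities sum to 0.9, not 1
print(is_valid_pmf({0: Fraction(9, 10)}))  # False
```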

Discrete Uniform distribution


 A random variable X follows the discrete uniform distribution on the set of integers
{a, a+1, …, b} if it may attain each of these values with equal probability.

 A random variable X with the discrete uniform distribution on the integers
1, 2, …, m has PMF

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = 𝟏/𝒎, 𝒙 = 𝟏, 𝟐, … , 𝒎

𝐸(𝑋) = (𝑚 + 1)/2
𝑉(𝑋) = 𝐸(𝑋²) − [𝐸(𝑋)]² = (𝑚² − 1)/12

Example: Roll a die and let X be the upward face showing. Find the E(X) and the V(X).

Example: A random experiment where this distribution occurs is the choice of an integer
at random between 1 and 100, inclusive. Let X be the number chosen. Find the E(X)
and the V(X).
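Both examples above are discrete uniform distributions (m = 6 for the die, m = 100 for the random integer), so the closed-form mean and variance apply directly. The helper names below are ours:

```python
from fractions import Fraction

def uniform_mean(m):
    return Fraction(m + 1, 2)        # E(X) = (m + 1) / 2

def uniform_var(m):
    return Fraction(m**2 - 1, 12)    # V(X) = (m^2 - 1) / 12

print(uniform_mean(6), uniform_var(6))      # die: 7/2 35/12
print(uniform_mean(100), uniform_var(100))  # integers 1..100: 101/2 3333/4
```

Note that the die result matches the E(X) and V(X) computed from the definition in the earlier examples, as it should.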
Bernoulli and Binomial Distribution

 The binomial distribution is based on a Bernoulli trial, which is a random
experiment in which there are only two possible outcomes: success (S) and
failure (F). We conduct the Bernoulli trial and let

𝑋 = 1 if the outcome is S, and 𝑋 = 0 if the outcome is F.
 An example of a Bernoulli random variable is the result of a toss of a coin with
Head, say, equal to one and Tail equal to zero. Some example uses include a
random binary digit, whether a disk drive crashed, and whether someone likes a
Netflix movie.

The Binomial Model


The Binomial model has three defining properties:
• Bernoulli trials are conducted n times,
• The trials are independent,
• The probability of success p does not change between trials.

If X counts the number of successes in the n independent trials, then the PMF of X is

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = C(𝒏, 𝒙) 𝒑^𝒙 (𝟏 − 𝒑)^(𝒏−𝒙), 𝒙 = 𝟎, 𝟏, 𝟐, … , 𝒏

𝐸(𝑋) = 𝑛𝑝
𝑉(𝑋) = 𝑛𝑝𝑞, where 𝑞 = 1 − 𝑝

Example: Two percent of a product is defective. If a lot of 100 items is ordered, what
is the probability that there are no defective items? What is the probability that there
are exactly two defective items?

Example: A basketball player takes 4 independent free throws with a probability of 0.7
of getting a basket on each shot. Let X be the number of baskets he gets. Write out the
full probability distribution for X.

Example: A bag contains 6 red Bingo chips, 4 blue Bingo chips, and 7 white Bingo
chips. What is the probability of drawing a red Bingo chip at least 3 out of 5 times?

Example: A family consists of 3 children. What is the probability that at most 2 of the
children are boys?
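The binomial PMF can be evaluated directly for the defectives and free-throw examples above. The helper name `binom_pmf` is our illustrative choice:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two percent defectives, lot of 100: n = 100, p = 0.02
print(round(binom_pmf(0, 100, 0.02), 4))  # P(no defectives)      ≈ 0.1326
print(round(binom_pmf(2, 100, 0.02), 4))  # P(exactly 2 defective) ≈ 0.2734

# Free throws: n = 4, p = 0.7; the full distribution over x = 0..4
dist = {x: round(binom_pmf(x, 4, 0.7), 4) for x in range(5)}
print(dist)
```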
Hypergeometric Distribution

 When a sample of size n is drawn without replacement from a dichotomous (S–F)
population of size N, the hypergeometric distribution is the exact probability model
for the number of S’s in the sample.

 The assumptions leading to the hypergeometric distribution are as follows:

1. The population or set to be sampled consists of N individuals, objects,
or elements (a finite population).

2. Each individual can be characterized as a success (S) or a failure (F),
and there are M successes in the population.

3. A sample of n individuals is selected without replacement in such a
way that each subset of size n is equally likely to be chosen.

 If X is the number of S’s in a completely random sample of size n drawn from a
population consisting of M S’s and (N – M) F’s, then the probability distribution of
X, called the hypergeometric distribution, is given by

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = C(𝑴, 𝒙) ∙ C(𝑵 − 𝑴, 𝒏 − 𝒙) / C(𝑵, 𝒏), 𝒙 = 𝟎, 𝟏, … , 𝒏

𝐸(𝑋) = 𝑛𝑝
𝑉(𝑋) = ((𝑁 − 𝑛)/(𝑁 − 1)) 𝑛𝑝(1 − 𝑝), where 𝑝 = 𝑀/𝑁
Example:
1. In the manufacture of car tyres, a particular production process is known to
yield 10 tyres with defective walls in every batch of 100 tyres produced. From
a production batch of 100 tyres, a sample of 4 is selected for testing to
destruction.
Find:
(a) the probability that the sample contains 1 defective tyre
(b) the expectation of the number of defectives in samples of size 4
(c) the variance of the number of defectives in samples of size 4.

2. An urn contains 4 red balls and 10 blue balls. Five balls are drawn at random
without replacement from this urn. What is the probability that exactly two red
balls are drawn?

3. What is the probability of getting at most 2 diamonds in the 5 cards selected
without replacement from a well-shuffled deck?
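The hypergeometric PMF applies directly to the tyre and urn examples above (identify N, M, and n in each). The helper name `hyper_pmf` is our illustrative choice:

```python
from math import comb
from fractions import Fraction

def hyper_pmf(x, N, M, n):
    """P(X = x) = C(M, x) C(N - M, n - x) / C(N, n)."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

# Tyres: N = 100, M = 10 defective, sample of n = 4; P(exactly 1 defective)
print(float(hyper_pmf(1, 100, 10, 4)))

# Urn: N = 14 balls, M = 4 red, draw n = 5; P(exactly 2 red)
print(hyper_pmf(2, 14, 4, 5))   # 360/1001
```

Returning a `Fraction` keeps the answer exact; wrap it in `float(...)` when a decimal is wanted.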
Geometric Distribution
 The geometric is also called a “waiting-time” distribution
 Consider repeating a Bernoulli trial with success probability p until the first
success occurs, and let X be the number of trials needed to obtain the first
success. Then X follows a geometric distribution with parameter p, defined by

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = 𝒑(𝟏 − 𝒑)^(𝒙−𝟏), 𝒙 = 𝟏, 𝟐, …

𝐸(𝑋) = 1/𝑝
𝑉(𝑋) = (1 − 𝑝)/𝑝²
Example:
1. A representative from the National Football leagues Marketing Division randomly
selects people on a random street in Kansas City until he finds a person who
attended the last home football game. Let p, the probability that he succeeds in
finding such a person equal to 0.20. Let X denotes the number of people he
selects until he finds his first success.
a. What is the probability that the marketing representative must select 4 people
before he finds one who attended the last home football game?
b. How many people should we expect the marketing representative needs to
select before he finds one who attended the last home football game?
c. What is the variance?

2. You play a game of chance that you can either win or lose (there are no other
possibilities) until you lose. Your probability of losing is 0.57. What is the
probability that it takes five games until you lose? Find the expected number of
games you are going to play until you lose. Find also the variance.
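Both exercises above fit the geometric PMF with X = trial number of the first success. A sketch (the helper name `geom_pmf` is ours):

```python
def geom_pmf(x, p):
    """P(X = x) = p (1 - p)^(x - 1): first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)

# Marketing example: p = 0.20
print(geom_pmf(4, 0.20))         # P(first success on the 4th person) = 0.2 * 0.8^3
print(1 / 0.20)                  # E(X) = 1/p = 5.0
print((1 - 0.20) / 0.20 ** 2)    # V(X) = (1 - p)/p^2 = 20.0

# Game example: "success" here means losing, p = 0.57
print(geom_pmf(5, 0.57))         # P(first loss on the 5th game)
```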
Negative Binomial Distribution

 The negative binomial rv and distribution are based on an experiment satisfying
the following conditions:
1. The experiment consists of a sequence of independent trials.
2. Each trial can result in either a success (S) or a failure (F).
3. The probability of success p is constant from trial to trial.
4. The experiment continues (trials are performed) until a total of r
successes have been observed, where r is a specified positive integer.

 The random variable of interest is X = the number of failures that precede the rth
success; X is called a negative binomial random variable because, in contrast
to the binomial rv, the number of successes is fixed and the number of trials is
random.
 The PMF of the negative binomial rv X with parameters r = number of S’s and
p = P(S) is

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = C(𝒙 + 𝒓 − 𝟏, 𝒓 − 𝟏) 𝒑^𝒓 (𝟏 − 𝒑)^𝒙, 𝒙 = 𝟎, 𝟏, 𝟐, …

𝐸(𝑋) = 𝑟𝑞/𝑝
𝑉(𝑋) = 𝑟𝑞/𝑝², where 𝑞 = 1 − 𝑝
Example:
1. A pediatrician wishes to recruit 5 couples, each of whom is expecting their
first child, to participate in a new natural childbirth regimen. Let p = P(a
randomly selected couple agrees to participate).
If p = .2, what is the probability that 15 couples must be asked before 5 are
found who agree to participate? That is, with S = {agrees to participate}, what
is the probability that 10 F’s occur before the fifth S?

2. We flip a coin repeatedly and let X count the number of Tails until we get
seven Heads. What is P(X = 5)?
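Both exercises above count failures (F's, or Tails) before the r-th success, which is exactly the PMF given. A sketch (the helper name `negbin_pmf` is ours):

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x) = C(x + r - 1, r - 1) p^r (1 - p)^x: x failures before the r-th success."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

# Pediatrician: p = 0.2, r = 5; 10 failures before the 5th success
print(round(negbin_pmf(10, 5, 0.2), 4))   # ≈ 0.0344

# Coin: 5 Tails before the 7th Head, p = 0.5
print(negbin_pmf(5, 7, 0.5))              # 462/4096 ≈ 0.1128
```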

Poisson Distribution
• The Poisson distribution is used for modelling the occurrence of events
during a fixed time interval or the number of rare successes in a very large
number of trials, such as:
 the number of misprints on a book page,
 traffic accidents,
 typing errors, or
 customers arriving in a bank.
 the number of telephone calls during a fixed time interval
• The following conditions must be fulfilled for the Poisson distribution to be
applicable:
1. The number of events in nonoverlapping time intervals must be
independent.
2. The probability of occurrence must be the same in all time
intervals of the same length.
3. The probability of more than one event occurring during a short
interval must be small relative to the occurrence of only one event.
4. The probability of an event occurring in a short interval must be
approximately proportional to the interval's length.

• Let 𝜆 be the average number of events in the time interval. Let the random
variable X count the number of events occurring in the interval and e =
2.71828 . . . is the base of the natural logarithm. Then the PMF of X is
given by

𝒇𝑿(𝒙) = 𝑷(𝑿 = 𝒙) = 𝝀^𝒙 𝒆^(−𝝀) / 𝒙!, 𝒙 = 𝟎, 𝟏, 𝟐, …

𝐸(𝑋) = 𝜆
𝑉(𝑋) = 𝜆

Example:
1. Suppose that in a certain area there are on average five traffic accidents per
month, that is X=number of traffic accidents. What is the probability that there
are 4 accidents in a given month?
2. On the average, five cars arrive at a particular car wash every hour. Let X
count the number of cars that arrive from 10AM to 11AM. What is the
probability that no car arrives during this period?
3. Suppose the car wash above is in operation from 8AM to 6PM. What is the
probability that 55 cars arrive during this period?
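The three exercises above differ only in the value of λ (for the third, the 8AM to 6PM window is 10 hours, so λ = 5 × 10 = 50). A sketch of the Poisson PMF (the helper name `poisson_pmf` is ours):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

# 1. Five accidents per month on average: P(4 accidents in a month)
print(round(poisson_pmf(4, 5), 4))    # ≈ 0.1755
# 2. Five cars per hour: P(no car between 10AM and 11AM)
print(round(poisson_pmf(0, 5), 4))    # e^-5 ≈ 0.0067
# 3. Ten-hour day, lam = 50: P(55 cars)
print(round(poisson_pmf(55, 50), 4))
```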
