
Lecture Notes 4 – Continuous Probability Distributions 1

Engr. Caesar Pobre Llapitan

Topics:
I. Continuous Random Variables and their Probability Distribution
II. Expected Values of Continuous Random Variables
III. Normal Distribution
IV. Normal Approximation to the Binomial and Poisson Distribution
V. Exponential Distribution

I. CONTINUOUS RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTION

A continuous random variable has a probability of 0 of assuming exactly any of its values. Consequently, its probability distribution cannot be given in tabular form.

Introduction
Many random variables observed in real life are not discrete random variables because the number of
values they can assume is not countable. In contrast to discrete random variables, these variables can
take on any value within an interval. For example: the daily rainfall at some location, the strength of a
steel bar and the intensity of sunlight at a particular time of day. These random variables are called
continuous random variables.

Definition 1
Let X be a continuous random variable assuming any value in the interval (-∞, +∞).
Then the cumulative distribution function F(x) of the variable X is defined as follows:

F(x) = P(X ≤ x)

i.e., F(x) is equal to the probability that the variable X assumes values less than or equal to x.

Note that here and from now on we denote by the letter X a continuous random variable, and by x a point on the number line.

From the definition of the cumulative distribution function F(x), it is easy to show the following properties.

Properties of the cumulative distribution function F(x) for a continuous random variable X
1. 0≤F ( x )≤1 ,
2. F(x) is a monotonically non-decreasing function, that is, if a≤b then F( a)≤F (b ) for any
real numbers a and b.
3. P(a < X ≤ b) = F(b) - F(a)
4. F( x )→0 as x →−∞ and F( x )→1 as x →+∞

A large data set can be described by means of a relative frequency distribution. If the data represent
measurements on a continuous random variable and if the amount of data is very large, we can reduce
the width of the class intervals until the distribution appears to be a smooth curve. A probability
density is a theoretical model for this distribution.

The distinction between discrete random variables and continuous random variables is usually based
on the difference in their cumulative distribution functions.

Let X be such a random variable. We say that X is a continuous random variable if there exists a nonnegative function f, defined for all real x ∈ (-∞, ∞), having the property that, for any set B of real numbers,

P(X ∈ B) = ∫_B f(x) dx     (1)

The function f is called the probability density function of the random variable X.

In words, Equation (1) states that the probability that X will be in B may be obtained by integrating
the probability density function over the set B. Since X must assume some value, f must satisfy

1 = P(X ∈ (-∞, ∞)) = ∫_{-∞}^{∞} f(x) dx     (2)

The density function for a continuous random variable X, the model for some real-life population of
data, will usually be a smooth curve as shown in Figure 1.

Figure 1 Density function f(x) for a continuous random variable

It follows that

F(x) = ∫_{-∞}^{x} f(t) dt

Thus, the cumulative area under the curve between -∞ and a point x0 is equal to F(x0).

All probability statements about X can be answered in terms of f. If an interval is likely to contain a
value for X, its probability is large and it corresponds to large values for f(x). The probability that X is
between a and b is determined as the integral of f(x) from a to b.

P(a ≤ X ≤ b) = ∫_a^b f(x) dx     (3)

P(a ≤ X ≤ b) = area of shaded region

Figure 2 Probability density function f



The density function for a continuous random variable must always satisfy the two properties given in
the box.
Definition 2
For a continuous random variable X, a probability density function is a function such that
1. f(x) ≥ 0
2. ∫_{-∞}^{∞} f(x) dx = 1
3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx = area under f(x) from a to b, for any a and b

Properties of a density function

1. f(x) ≥ 0
2. ∫_{-∞}^{∞} f(x) dx = F(∞) = 1

Examples
1. Suppose that X is a continuous random variable whose probability density function is given by

f(x) = C(4x - 2x²), 0 < x < 2
f(x) = 0, otherwise

a) What is the value of C?
b) Find P{X > 1}.

Solution:
a) Since f is a probability density function, we must have ∫_{-∞}^{∞} f(x) dx = 1, implying that

C ∫_0^2 (4x - 2x²) dx = 1

C [2x² - 2x³/3] evaluated from x = 0 to x = 2, equals 1

Hence,

C = 3/8

b) P{X > 1} = ∫_1^∞ f(x) dx = (3/8) ∫_1^2 (4x - 2x²) dx = 1/2
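Both integrals are easy to verify numerically. The sketch below uses only the Python standard library (the `simpson` helper is ours, not a library function) to check that C = 3/8 normalizes the density and that P{X > 1} = 1/2.

```python
# Numerical check of Example 1: f(x) = C(4x - 2x^2) on (0, 2), 0 elsewhere.

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

g = lambda x: 4 * x - 2 * x ** 2

C = 1 / simpson(g, 0, 2)   # integral of g over (0, 2) is 8/3, so C = 3/8
p = C * simpson(g, 1, 2)   # P{X > 1} = C * (integral from 1 to 2) = 1/2
print(C, p)
```

Simpson's rule is exact for polynomials of degree ≤ 3, so both results agree with the hand calculation to machine precision.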

2. The amount of time in hours that a computer functions before breaking down is a continuous random variable with probability density function given by

f(x) = λ e^{-x/100}, x ≥ 0
f(x) = 0, x < 0

What is the probability that
a) a computer will function between 50 and 150 hours before breaking down?
b) it will function for fewer than 100 hours?

Solution:
a. Since

1 = ∫_{-∞}^{∞} f(x) dx = λ ∫_0^∞ e^{-x/100} dx

we obtain

1 = -λ(100) e^{-x/100} evaluated from 0 to ∞ = 100λ, or λ = 1/100

Hence, the probability that a computer will function between 50 and 150 hours before breaking down is given by

P(50 < X < 150) = ∫_{50}^{150} (1/100) e^{-x/100} dx = -e^{-x/100} evaluated from 50 to 150

= e^{-1/2} - e^{-3/2} ≈ 0.383

b. Similarly,

P(X < 100) = ∫_0^{100} (1/100) e^{-x/100} dx = -e^{-x/100} evaluated from 0 to 100 = 1 - e^{-1} ≈ 0.632

In other words, approximately 63.2 percent of the time, a computer will fail before registering 100 hours of use.
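Because the antiderivative of (1/100)e^{-x/100} is -e^{-x/100}, both probabilities reduce to differences of exponentials, which a few lines of standard-library Python confirm (the helper name `prob_between` is ours):

```python
from math import exp

# Example 2: f(x) = (1/100) e^{-x/100} for x >= 0, so
# P(a < X < b) = e^{-a/100} - e^{-b/100}.
def prob_between(a, b, scale=100.0):
    return exp(-a / scale) - exp(-b / scale)

p_mid = prob_between(50, 150)   # e^{-1/2} - e^{-3/2}, about 0.383
p_low = prob_between(0, 100)    # 1 - e^{-1}, about 0.632
print(p_mid, p_low)
```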

3. The lifetime in hours of a certain kind of radio tube is a random variable having a probability density function given by

f(x) = 0, x ≤ 100
f(x) = 100/x², x > 100

What is the probability that exactly 2 of 5 such tubes in a radio set will have to be replaced within the first 150 hours of operation? Assume that the events Ei, i = 1, 2, 3, 4, 5, that the i-th such tube will have to be replaced within this time are independent.

Solution:
From the statement of the problem, we have

P(Ei) = ∫_0^{150} f(x) dx = ∫_{100}^{150} 100 x^{-2} dx = 1/3

Hence, from the independence of the events Ei, it follows that the desired probability is

(5 choose 2) (1/3)² (2/3)³ = 80/243

The relationship between the cumulative distribution F and the probability density f is expressed by

F(a) = P(X ∈ (-∞, a]) = ∫_{-∞}^{a} f(x) dx

Differentiating both sides of the preceding equation yields

d/da F(a) = f(a)

That is, the density is the derivative of the cumulative distribution function. A somewhat more intuitive interpretation of the density function may be obtained from

P(a - ε/2 ≤ X ≤ a + ε/2) = ∫_{a-ε/2}^{a+ε/2} f(x) dx ≈ ε f(a)

when ε is small and when f(·) is continuous at x = a. In other words, the probability that X will be contained in an interval of length ε around the point a is approximately εf(a). From this result we see that f(a) is a measure of how likely it is that the random variable will be near a.

4. If X is continuous with distribution function F_X and density function f_X, find the density function of Y = 2X.

Solution:
We determine f_Y by deriving, and then differentiating, the distribution function of Y:

F_Y(a) = P(Y ≤ a) = P(2X ≤ a) = P(X ≤ a/2) = F_X(a/2)

Differentiation gives

f_Y(a) = (1/2) f_X(a/2)

Exercises:
1. Let the continuous random variable X denote the current measured in a thin copper wire in
milliamperes. Assume that the range of X is [0, 20 mA], and assume that the probability
density function of X is f(x) = 0.05 for 0 ≤ x ≤ 20. What is the probability that a current
measurement is less than 10 milliamperes?
2. Let the continuous random variable X denote the diameter of a hole drilled in a sheet metal
component. The target diameter is 12.5 millimeters. Most random disturbances to the process
result in larger diameters. Historical data show that the distribution of X can be modeled by a
probability density function f(x) = 20 e^{-20(x - 12.5)}, x ≥ 12.5.
a. If a part with a diameter larger than 12.60 millimeters is scrapped, what proportion of
parts is scrapped?
b. What proportion of parts is between 12.5 and 12.6 millimeters?

Definition 3
The cumulative distribution function of a continuous random variable X is

F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(u) du

for -∞ < x < ∞.

Extending the definition of f(x) to the entire real line enables us to define the cumulative distribution function for all real numbers.

Example:
1. Suppose the cumulative distribution function of the random variable X is

F(x) = 0, x < -2
F(x) = 0.25x + 0.5, -2 ≤ x < 2
F(x) = 1, x ≥ 2

Determine the following:

a. P(X < 1.8)    c. P(X > -1.5)
b. P(X < -2)     d. P(-1 < X < 1)

Solution:

2. The gap width is an important property of a magnetic recording head. In coded units, if the
width is a continuous random variable over the range from 0 < x < 2 with f(x) = 0.5x,
determine the cumulative distribution function of the gap width.

Solution:
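A sketch of both solutions in Python (the function names `F` and `F_gap` are ours): the first example simply evaluates the given piecewise F, and the second integrates f(u) = 0.5u from 0 to x to get F(x) = 0.25x² on 0 ≤ x < 2.

```python
# Example 1: evaluate the given piecewise CDF.
def F(x):
    if x < -2:
        return 0.0
    if x < 2:
        return 0.25 * x + 0.5
    return 1.0

print(F(1.8))         # a. P(X < 1.8)    = 0.95
print(F(-2))          # b. P(X < -2)     = 0 (no probability mass below -2)
print(1 - F(-1.5))    # c. P(X > -1.5)   = 0.875
print(F(1) - F(-1))   # d. P(-1 < X < 1) = 0.5

# Example 2: integrating f(u) = 0.5u from 0 to x gives F(x) = 0.25 x^2.
def F_gap(x):
    if x < 0:
        return 0.0
    if x < 2:
        return 0.25 * x ** 2
    return 1.0
```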

II. EXPECTED VALUES OF CONTINUOUS RANDOM VARIABLES

Definition 4
Let X be a continuous random variable with density function f(x). Then the mean or the expected value of X is

E(X) = ∫_{-∞}^{∞} x f(x) dx

Definition 5
Let X be a continuous random variable with density function f(x), and let g(x) be a function of x. Then the mean or the expected value of g(X) is

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx

Definition 6
Let X be a continuous random variable with expected value E(X) = μ. Then the variance of X is

σ² = E[(X - μ)²]

The standard deviation of X is the positive square root of the variance, σ = √(σ²).



Uniform Distribution
The continuous random variable X has the Uniform distribution on the interval [θ1, θ2], with θ1 < θ2, if

f(x) = 1/(θ2 - θ1), θ1 ≤ x ≤ θ2
f(x) = 0, otherwise

X ~ U(θ1, θ2) for short

Mean and variance:

μ = (θ1 + θ2)/2     σ² = (θ2 - θ1)²/12
Occurrence of the Uniform distribution
1. Waiting times from random arrival time until a regular event (see below)
2. Engineering tolerances: e.g. if a diameter is quoted "0.1mm", it is sometimes assumed (probably incorrectly) that the error has a U(-0.1, 0.1) distribution.
3. Simulation: programming languages often have a standard routine for simulating the U(0, 1)
distribution. This can be used to simulate other probability distributions.

Example: Disk wait times


In a hard disk drive, the disk rotates at 7200rpm. The wait time is defined as the time between the
read/write head moving into position and the beginning of the required information appearing under
the head.
a. Find the distribution of the wait time.
b. Find the mean and standard deviation of the wait time.
c. Booting a computer requires that 2000 pieces of information are read from random
positions.
d. What is the total expected contribution of the wait time to the boot time, and rms
deviation?

Solution:
Rotation time = 60/7200 s = 8.33 ms. The wait time can be anything between 0 and 8.33 ms, and each time in this range is as likely as any other.

Therefore, the distribution of the wait time is U(0, 8.33 ms) (i.e. θ1 = 0 and θ2 = 8.33 ms).

μ = (0 + 8.33)/2 ms = 4.17 ms

σ² = (8.33 - 0)²/12 ms² = 5.79 ms², so σ = 2.41 ms

For 2000 reads the mean total wait time is 2000 × 4.17 ms = 8.33 s.



The variance of the total is 2000 × 5.79 ms² ≈ 0.0116 s², so σ ≈ 0.11 s.
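The arithmetic above can be reproduced in a few lines (standard library only; variable names are illustrative). Note that means and variances of independent wait times add, so the standard deviation of the total grows only like √n:

```python
from math import sqrt

# Disk wait time ~ U(0, rotation time); 7200 rpm -> 60000/7200 ms per turn.
theta1, theta2 = 0.0, 60_000 / 7200       # rotation time in ms (about 8.33)
mean_ms = (theta1 + theta2) / 2           # about 4.17 ms
var_ms2 = (theta2 - theta1) ** 2 / 12     # about 5.79 ms^2
sd_ms = sqrt(var_ms2)                     # about 2.41 ms

n_reads = 2000
total_mean_s = n_reads * mean_ms / 1000       # about 8.33 s
total_sd_s = sqrt(n_reads * var_ms2) / 1000   # about 0.108 s (variances add)
print(mean_ms, sd_ms, total_mean_s, total_sd_s)
```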

III. NORMAL DISTRIBUTION

The normal (or Gaussian) density function was proposed by C. F. Gauss (1777-1855) as a model for the relative frequency distribution of errors, such as errors of measurement. Amazingly, this bell-shaped
curve provides an adequate model for the relative frequency distributions of data collected from many
different scientific areas.

The Density Function, Mean and Variance for A Normal Random Variable

A density curve is a curve that:


- is always on or above the horizontal axis
- has an area of exactly 1 underneath it
- Measures of center and spread apply to density curves as well as to actual sets of observations

o Density curves are lines that show the location of the individuals along the horizontal axis and
within the range of possible values.
o They help researchers to investigate the distribution of a variable.
o Some density curves have certain properties that help researchers draw conclusions about the
entire population.
A density curve describes the overall pattern of a distribution. The area under the curve and above
any range of values on the horizontal axis is the proportion of all observations that fall in that range.

Distinguishing the Median and Mean of a Density Curve


 The median of a density curve is the equal-areas point, the point that divides the area under
the curve in half.
 The mean of a density curve is the balance point, at which the curve would balance if made of
solid material.
 The median and the mean are the same for a symmetric density curve. They both lie at the
center of the curve.

A Normal distribution is described by a Normal density curve. Any particular Normal distribution is
completely specified by two numbers: its mean 𝜇 and its standard deviation 𝜎.
• The mean of a Normal distribution is the center of the symmetric Normal curve.
• The standard deviation is the distance from the center to the change-of-curvature points on
either side.
• The Normal distribution is abbreviated with mean 𝜇 and standard deviation 𝜎 as 𝑁(𝜇, 𝜎)

The density function:

f(x) = (1/(σ√(2π))) e^{-(x - μ)²/(2σ²)}

The parameters μ and σ² are the mean and the variance, respectively, of the normal random variable.

There are infinitely many normal density functions, one for each combination of μ and σ. The mean measures the location and the variance measures the spread. Several different normal density functions are shown in Figure 2.


Figure 2 Several normal distributions: Curve 1 with μ = 3, σ = 1; Curve 2 with μ = -1, σ = 0; and Curve 3 with μ = 0, σ = 1.5

The 68-95-99.7 Rule

P(μ - σ < X < μ + σ) ≈ 0.6826
P(μ - 2σ < X < μ + 2σ) ≈ 0.9544
P(μ - 3σ < X < μ + 3σ) ≈ 0.9973

These equalities are known as the σ, 2σ and 3σ rules, respectively, and are often used in statistics. Namely, if a population of measurements has approximately a normal distribution, the probability that a randomly selected observation falls within the intervals (μ - σ, μ + σ), (μ - 2σ, μ + 2σ), and (μ - 3σ, μ + 3σ) is approximately 0.6826, 0.9544 and 0.9973, respectively.
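These three probabilities follow from the standard normal CDF, which Python exposes through `math.erf`; a quick check (the helper `phi` is ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(mu - n*sigma < X < mu + n*sigma) is the same for every mu and sigma:
for n in (1, 2, 3):
    print(n, phi(n) - phi(-n))   # about 0.6827, 0.9545, 0.9973
```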

Characteristics of the Normal distribution


1. Normal distributions are symmetric around their mean.
2. The mean, median, and mode of a normal distribution are equal.

3. The area under the normal curve is equal to 1.0.


4. Normal distributions are denser in the center and less dense in the tails.
5. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation
(σ).
6. 68% of the area of a normal distribution is within one standard deviation of the mean.
7. Approximately 95% of the area of a normal distribution is within two standard deviations of
the mean.

Areas Under Normal Distributions


Figure 4 shows a normal distribution with a mean of 50 and a standard deviation of 10. The shaded
area between 40 and 60 contains 68% of the distribution.

Figure 4 Normal distribution with a mean of 50 and standard deviation of 10. 68% of the area is within one standard deviation (10) of the mean (50).

Figure 5 shows a normal distribution with a mean of 100 and a standard deviation of 20. As in Figure
4, 68% of the distribution is within one standard deviation of the mean.

Figure 5 Normal distribution with a mean of 100 and standard deviation of 20. 68% of the area is within one standard deviation (20) of the mean (100).

The normal distributions shown in Figures 4 and 5 are specific examples of the general rule that 68% of
the area of any normal distribution is within one standard deviation of the mean.

Figure 6 shows a normal distribution with a mean of 75 and a standard deviation of 10. The shaded
area contains 95% of the area and extends from 55.4 to 94.6. For all normal distributions, 95% of the
area is within 1.96 standard deviations of the mean. For quick approximations, it is sometimes useful
to round off and use 2 rather than 1.96 as the number of standard deviations you need to extend from
the mean so as to include 95% of the area.

Figure 6 A normal distribution with a mean of 75 and a standard deviation of 10. 95% of the area is within 1.96 standard deviations of the mean.

Standard Normal Distribution

If μ = 0 and σ = 1 then

f(x) = (1/√(2π)) e^{-x²/2}

The distribution with this density function is called the standardized normal distribution. The graph of the standardized normal density distribution is shown in Figure 3.


Figure 3 The standardized normal density distribution

If X is a normal random variable with mean μ and variance σ², then

1) the variable

z = (X - μ)/σ

is the standardized normal random variable.

2) P(μ - nσ < X < μ + nσ) = 2Φ(n), where

Φ(x) = (1/√(2π)) ∫_0^x e^{-t²/2} dt

This function is called the Laplace function, and it is tabulated.

Table 1. A portion of a table of the standard normal distribution.



The first column titled “Z” contains values of the standard normal distribution; the second column
contains the area below Z. Since the distribution has a mean of 0 and a standard deviation of 1, the Z
column is equal to the number of standard deviations below (or above) the mean.

For example, a Z of -2.5 represents a value 2.5 standard deviations below the mean. The area below Z
is 0.0062.

A value from any normal distribution can be transformed into its corresponding value on a standard
normal distribution using the following formula:
z = (X - μ)/σ
where Z is the value on the standard normal distribution, X is the value on the original distribution, μ
is the mean of the original distribution, and σ is the standard deviation of the original distribution.
If all the values in a distribution are transformed to Z scores, then the distribution will have a mean of 0 and a standard deviation of 1. This process of transforming a distribution to one with a mean of 0 and a standard deviation of 1 is called standardizing the distribution.

General Procedure
1. We first convert the problem into an equivalent one dealing with a normal variable measured
in standardized deviation units, called a standardized normal variable. To do this, if X ∼ N(μ,
σ2), then
z = (X - μ)/σ
2. A table of standardized normal values can then be used to obtain an answer in terms of the
converted problem.
3. If necessary, we can then convert back to the original units of measurement. To do this, simply
note that, if we take the formula for Z, multiply both sides by σ, and then add μ to both sides,
we get
X = Zσ + μ

4. The interpretation of Z values is straightforward. Since σ = 1, if Z = 2, the corresponding X


value is exactly 2 standard deviations above the mean. If Z = -1, the corresponding X value is
one standard deviation below the mean. If Z = 0, X = the mean, i.e. μ.
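Steps 1 and 3 are one-line transformations; a minimal sketch (the function names are ours):

```python
# Convert X ~ N(mu, sigma^2) to Z ~ N(0, 1) and back.
def to_z(x, mu, sigma):
    return (x - mu) / sigma          # step 1: Z = (X - mu) / sigma

def from_z(z, mu, sigma):
    return z * sigma + mu            # step 3: X = Z*sigma + mu

# e.g. with mu = 500, sigma = 100 (the GRE example further below):
print(to_z(665, 500, 100))           # 1.65
print(from_z(1.65, 500, 100))        # 665.0
```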

Rules for using the standardized normal distribution

Recall that, for a random variable X: F(x) = P(X ≤ x)

1. P(Z ≤ a)
= F(a) (use when a is positive)
= 1 - F(-a) (use when a is negative)

Example 1: Find P(Z ≤ a) for a = 1.65, -1.65, 1.0, -1.0

P(Z ≤ 1.65) = F(1.65) = .95


P(Z ≤ -1.65) = F(-1.65) = 1 - F(1.65) = .05

Example 2: Find a for P(Z ≤ a) = .6026, .9750, .3446

P(Z ≤ .26) = .6026


P(Z ≤ 1.96) = .9750
P(Z ≤ -.40) = .3446 (since 1 - .3446 = .6554 = F(.40))

2. P(Z ≥ a)
= 1 - F(a) (use when a is positive)
= F(-a) (use when a is negative)

Example 3: Find P(Z ≥ a) for a = 1.5, -1.5

P(Z ≥ 1.5) = 1 - F(1.5) = 1 - .9332 = .0668


P(Z ≥ -1.5) = F(1.5) = .9332

3. P(a ≤ Z ≤ b) = F(b) - F(a)



Example 4: Find P(a ≤ Z ≤ b) for a = -1 and b = 1.5

To solve: determine F(b) and F(a), and subtract.


P(-1 ≤ Z ≤ 1.5) = F(1.5) - F(-1) = F(1.5) - (1 - F(1)) = .9332 - 1 + .8413 = .7745

4. For a positive, P(-a ≤ Z ≤ a) = 2F(a) - 1

Proof:
P(-a ≤ Z ≤ a)
= F(a) - F(-a) (by rule 3)
= F(a) - (1 - F(a)) (by rule 1)
= F(a) - 1 + F(a)
= 2F(a) - 1

Example 5: Find P(-a ≤ Z ≤ a) for a = 1.96, a = 2.58

P(-1.96 ≤ Z ≤ 1.96) = 2F(1.96) - 1 = (2 * .975) - 1 = .95


P(-2.58 ≤ Z ≤ 2.58) = 2F(2.58) - 1 = (2 * .995) - 1 = .99

For a positive, F(a) = [1 + P(-a ≤ Z ≤ a)] / 2

Example 6: Find a for P(-a ≤ Z ≤ a) = .90, .975

F(a) = (1 + .90)/2 = .95, implying a = 1.65.


For P(-a ≤ Z ≤ a) = .975,
F(a) = (1 + .975)/2 = .9875, implying a = 2.24

Note:
Suppose we were asked to find a and b for P(a ≤ Z ≤ b) = .90. There are an infinite number of
values that we could use; for example, we could have a = negative infinity and b = 1.28, or a =
-1.28 and b = positive infinity, or a = -1.34 and b = 2.32, etc.

The smallest interval between a and b will always be found by choosing values for a and b such
that a = -b.

For example, for P(a ≤ Z ≤ b) = .90, a = -1.65 and b = 1.65 are the “best” values to choose, since
they yield the smallest possible value for b - a.
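Instead of a printed table, the CDF F can be computed from `math.erf`, which lets you check the worked examples above (the helper name `F` is ours; table values are rounded to four places):

```python
from math import erf, sqrt

def F(z):
    """Standard normal CDF: F(z) = P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(F(1.65))           # Example 1: about 0.9505 (table: .95)
print(1 - F(1.5))        # Example 3: P(Z >= 1.5), about 0.0668
print(F(1.5) - F(-1))    # Example 4: about 0.7745
print(2 * F(1.96) - 1)   # Example 5: about 0.9500
```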

Exercises:
Find the following probabilities.
1. P(0 < Z < 1.5) 5. P(Z > 1.8)
2. P(1.5 < Z < 1.8) 6. P(Z < 1.8)
3. P(−1.5 < Z < 0) 7. P(Z < −1.5)
4. P(−1.8 < Z < −1.5) 8. P(−1.5 < Z < 1.8)

Using the standardized normal distribution



1. The top 5% of applicants (as measured by GRE scores) will receive scholarships. If GRE ~ N(500, 100²), how high does your GRE score have to be to qualify for a scholarship?

Solution
Let X = GRE. We want to find x such that
P(X ≥ x) = .05

This is too hard to solve as it stands - so instead, compute


Z = (X - 500)/100 (NOTE: Z ~ N(0,1) )

and find z for the problem,


P(Z ≥ z) = .05

Note that P(Z ≥ z) = 1 - F(z) (Rule 2). If 1 - F(z) = .05, then F(z) = .95.

Looking at Table I in Appx E, F(z) = .95 for z = 1.65 (approximately).

Hence, z = 1.65.

To find the equivalent x, compute


x = (z × 100) + 500 = (1.65 × 100) + 500 = 665.

Thus, your GRE score needs to be 665 or higher to qualify for a scholarship.

2. Family income ~ N($25,000, $10,000²). If the poverty level is $10,000, what percentage of the population lives in poverty?

Solution
Let X = Family income. We want to find P(X ≤ $10,000). This is too hard to compute directly,
so let
Z = (X - $25,000)/$10,000.

If x = $10,000, then z = ($10,000 - $25,000)/$10,000 = -1.5.

So,
P(X ≤ $10,000) = P(Z ≤ -1.5) = F(-1.5) = 1 - F(1.5) = 1 - .9332 = .0668.

Hence, a little under 7% of the population lives in poverty.

3. A new tax law is expected to benefit "middle income" families, those with incomes between $20,000 and $30,000. If Family income ~ N($25,000, $10,000²), what percentage of the population will benefit from the law?

Solution
Let X = Family income. We want to find P($20,000 ≤ X ≤ $30,000).

To solve, let
Z = (X - $25,000)/$10,000.

Note that when x = $20,000, z = ($20,000 - $25,000)/$10,000 = -0.5,


and when x = $30,000, z = +0.5.

Hence,
P($20,000 ≤ X ≤ $30,000) = P(-.5 ≤ Z ≤.5) = 2F(.5) - 1 = 1.383 - 1 = .383.

Thus, about 38% of the taxpayers will benefit from the new law.

4. An expert witness in a paternity suit testifies that the length (in days) of human gestation is
approximately normally distributed with parameters μ = 270 and σ2 = 100. The defendant in the
suit is able to prove that he was out of the country during a period that began 290 days before the
birth of the child and ended 240 days before the birth. If the defendant was, in fact, the father of
the child, what is the probability that the mother could have had the very long or very short
gestation indicated by the testimony?

Solution
Let X denote the length of the gestation, and assume that the defendant is the father. Then the
probability that the birth could occur within the indicated period is
P  X > 290 or X < 240  P  X  290  P  X  240
 X  270   X  270 
 P  2  P   3 
 10   10 
 1    2  1    3 
~ 0.0241

IV. NORMAL APPROXIMATION TO THE BINOMIAL AND POISSON DISTRIBUTION

Although the normal distribution is continuous, it is interesting to note that it can sometimes be used to approximate discrete distributions. Namely, we can use the normal distribution to approximate the binomial probability distribution.

Suppose we have a binomial distribution defined by two parameters: the number of trials n and the probability of success p. The normal distribution with parameters μ and σ will be a good approximation for that binomial distribution if both

μ - 2σ = np - 2√(np(1 - p))   and   μ + 2σ = np + 2√(np(1 - p))

lie between 0 and n.

If X ~ B(n, p) and n is large and p is not too near 0 or 1, then X is approximately N(np, np(1 - p)).

For example, the binomial distribution with n = 10 and p = 0.5 is well approximated by the normal distribution with μ = np = 10 × 0.5 = 5.0 and σ = √(np(1 - p)) = 0.5 × √10 = 1.58. See Figure 7 or Table 2.


Figure 7 Approximation of binomial distribution (bar graph) with n=10, p=0.5 by a normal distribution
(smoothed curve)

Table 2 The binomial and normal probability distributions for the same values of x

x     Binomial distribution   Normal distribution
0     0.000977                0.001700
1     0.009766                0.010285
2     0.043945                0.041707
3     0.117188                0.113372
4     0.205078                0.206577
5     0.246094                0.252313
6     0.205078                0.206577
7     0.117188                0.113372
8     0.043945                0.041707
9     0.009766                0.010285
10    0.000977                0.001700

The probability of getting k from the binomial distribution can be approximated as the probability under a normal distribution of getting x in the range from k - ½ to k + ½.

For example, P(k ≤ 6) can be approximated as ∫_{-∞}^{6.5} f(x) dx, where f(x) is the normal density.

The normal approximation will, in general, be quite good for values of n satisfying σ² = np(1 - p) ≥ 10.
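The continuity-corrected approximation can be compared against the exact binomial pmf directly. A sketch for n = 10, p = 0.5 (note that Table 2's normal column tabulates the density at each integer, while this code uses the area from k - ½ to k + ½, so the numbers differ slightly):

```python
from math import comb, erf, sqrt

n, p = 10, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for k in range(n + 1):
    exact = comb(n, k) * p ** k * (1 - p) ** (n - k)
    approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)
    print(k, round(exact, 6), round(approx, 6))
```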
Examples.
1. Suppose 50% of the population approves of the job the governor is doing, and that 20 individuals
are drawn at random from the population. Solve the following, using both the binomial
distribution and the normal approximation to the binomial.
a. What is the probability that exactly 7 people will support the governor?
b. What is the probability that 7 or fewer people will support the governor?
c. What is the probability that exactly 11 will support the governor?
d. What is the probability that 11 or fewer will support the governor?

Solution:
Note that N = 20, p = .5, so μ = Np = 10 and σ² = Npq = 5, σ = 2.236. Since Npq ≥ 3, it is probably safe to assume that X has approximately a N(10, 5) distribution.

a. For the binomial, find P(X = 7). Appx. E, Table II shows P(7) = .0739.
For the normal, find P(6.5 ≤ X ≤ 7.5).

We convert 6.5 and 7.5 to their corresponding z-scores (-1.57 and -1.12), and the problem
becomes finding
P(-1.57 ≤ Z ≤ -1.12) = F(1.57) - F(1.12) = .9418 - .8686 = .0732.

b. To use the binomial distribution, find P (X ≤ 7). Using Appx. E, we get


P(7) + P(6) + P(5) + P(4) + P(3) + P(2) + P(1) + P(0)
= .0739 + .0370 + .0148 + .0046 + .0011 + .0002 + 0 + 0 = .1316.

To use the normal approximation to the binomial, find P(X ≤ 7.5).

As noted above, the z-score that corresponds to 7.5 is -1.12.

F(-1.12) = 1 - F(1.12) = 1 - .8686 = .1314.

c. For the binomial, find P(X = 11). Appx. E shows P(11) = .1602.
For the normal, find P(10.5 ≤ X ≤ 11.5).

If we convert 10.5 and 11.5 to their corresponding z-scores, the problem becomes a matter of
finding
P(.22 ≤ Z ≤.67) = F(.67) - F(.22) = .7486 - .5871 = .1615.

d. For the binomial, find P(X ≤ 11). From Appx E Table 2, you can determine that this is .7483.
For the normal, find P(X ≤ 11.5).

The z-score that corresponds to 11.5 is .67, and F(.67) = .7486.

In all of the above, note that the results obtained using the binomial distribution and the
normal approximation to the binomial are almost identical.

2. In each of 25 races, the Democrats have a 60% chance of winning. What are the odds that the
Democrats will win 19 or more races? Use the normal approximation to the binomial.

Solution.
Np = 15, Npq = 6, so X ~ N(15, 6).

Using the normal approximation to the binomial, we want to find P(X ≥ 18.5).

Let Z = (X - 15)/√6. When x = 18.5, z = 3.5/√6 = 1.43.

Hence,
P(X ≥ 18.5) = P(Z ≥ 1.43) = 1 - F(1.43) = 1 - .9236 = .0764.

Hence, Democrats have a little less than an 8% chance of winning 19 or more races.

Incidentally, note that, since N = 25 is not included in Appendix E, Table II, it would be very
tedious to calculate this using the binomial distribution.

3. In a family of 11 children, what is the probability that there will be more boys than girls? Use the
normal approximation to the binomial.

Solution
μ = Np = 5.5, σ² = Npq = 2.75, so X ∼ N(5.5, 2.75).

If we were using the binomial distribution, we would find P(X ≥ 6); since we are using the
normal approximation to the binomial, we find P(X ≥ 5.5).

Hence,
P(X ≥ 5.5) = P(Z ≥ 0) = .5.

4. Let X be the number of times that a fair coin that is flipped 40 times lands on heads. Find the
probability that X = 20. Use the normal approximation and then compare it with the exact
solution.

Solution
To employ the normal approximation, note that because the binomial is a discrete integer-valued random variable, whereas the normal is a continuous random variable, it is best to write P{X = i} as P{i - 1/2 < X < i + 1/2} before applying the normal approximation (this is called the continuity correction). Doing so gives

P{X = 20} ≈ P{19.5 < X < 20.5}
= P((19.5 - 20)/√10 < (X - 20)/√10 < (20.5 - 20)/√10)
≈ Φ(0.16) - Φ(-0.16) ≈ 0.1272

The exact result is

P{X = 20} = (40 choose 20) (1/2)^40 ≈ 0.1254
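Both numbers are easy to reproduce. The approximation below keeps z = 0.5/√10 ≈ 0.158 unrounded, so it gives about 0.1256 rather than the text's 0.1272, which rounds z up to 0.16:

```python
from math import comb, erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

exact = comb(40, 20) * 0.5 ** 40   # about 0.1254
z = 0.5 / sqrt(10)                 # about 0.158
approx = phi(z) - phi(-z)          # about 0.1256
print(exact, approx)
```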

5. The ideal size of a first-year class at a particular college is 150 students. The college, knowing from
past experience that, on the average, only 30 percent of those accepted for admission will actually
attend, uses a policy of approving the applications of 450 students. Compute the probability that
more than 150 first-year students attend this college.

Solution
If X denotes the number of students that attend, then X is a binomial random variable with
parameters n = 450 and p = .3. Using the continuity correction, we see that the normal
approximation yields

P(X > 150.5) = P( (X - 450×0.3)/√(450×0.3×0.7) > (150.5 - 450×0.3)/√(450×0.3×0.7) )
≈ 1 - Φ(1.59)
≈ 0.0559

Hence, less than 6 percent of the time do more than 150 of the first 450 accepted actually
attend. (What independence assumptions have we made?)

6. To determine the effectiveness of a certain diet in reducing the amount of cholesterol in the
bloodstream, 100 people are put on the diet. After they have been on the diet for a sufficient
length of time, their cholesterol count will be taken. The nutritionist running this experiment has
decided to endorse the diet if at least 65 percent of the people have a lower cholesterol count after
going on the diet. What is the probability that the nutritionist endorses the new diet if, in fact, it
has no effect on the cholesterol level?

Solution

Let us assume that if the diet has no effect on the cholesterol count, then, strictly by chance,
each person’s count will be lower than it was before the diet with probability 1/2. Hence, if X is
the number of people whose count is lowered, then the probability that the nutritionist will
endorse the diet when it actually has no effect on the cholesterol count is

   P(X ≥ 65) = Σ(i=65 to 100) C(100, i)(1/2)^100 ≈ P(X ≥ 64.5)
             = P( (X − 100(1/2)) / √(100(1/2)(1/2)) ≥ (64.5 − 50)/5 )
             ≈ 1 − Φ(2.9)
             = 0.0019
Normal approximation to the Poisson
If Y ~ Poisson with parameter λ, and λ is large (> 7, say), then Y has approximately a N(λ, λ) distribution.

Example: Stock Control


At a given hospital, patients with a particular virus arrive at an average rate of once every five days.
Pills to treat the virus (one per patient) have to be ordered every 100 days. You are currently out of
pills; how many should you order if the probability of running out is to be less than 0.005?

Solution
Assume the patients arrive independently, so this is a Poisson process, with rate 0.2/day.

Therefore Y, the number of pills needed in 100 days, is Poisson with λ = 100 × 0.2 = 20.

We want P(Y > n) < 0.005, i.e. P(Y ≤ n + ½) > 0.995 under the normal approximation,
where a probability of 0.995 corresponds (from tables) to z = 2.575.

Since μ = λ = 20 and σ = √λ = √20, this corresponds to n + ½ = 20 + 2.575√20 ≈ 31.5, so we
need to order n = 32 pills.
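The same answer can be confirmed by computing the exact Poisson quantile alongside the normal approximation; this sketch finds the smallest n with P(Y ≤ n) ≥ 0.995:

```python
import math

lam = 100 * 0.2   # expected demand over 100 days: Poisson mean 20

# Normal approximation: n + 1/2 = lam + 2.575 * sqrt(lam)
n_normal = math.ceil(lam + 2.575 * math.sqrt(lam) - 0.5)

# Exact Poisson: smallest n with P(Y <= n) >= 0.995, accumulating
# p(k) = e^(-lam) * lam^k / k! via the recurrence p(k+1) = p(k) * lam / (k+1)
cdf, k, term = 0.0, 0, math.exp(-lam)
while True:
    cdf += term
    if cdf >= 0.995:
        n_exact = k
        break
    k += 1
    term *= lam / k

print(n_normal, n_exact)   # both give 32 here
```

In this case the exact Poisson quantile agrees with the normal approximation, though for smaller λ the two can differ.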

Don’t use approximations that are too simple if their failure might be important! Rare
events in particular are often far more likely than predicted by overly simple approximations
to the probability distribution.

V. EXPONENTIAL DISTRIBUTION

Definition
The random variable X that equals the distance between successive counts of a Poisson
process with mean λ > 0 is an exponential random variable with parameter λ. The
probability density function of X is

f  x   ex for 0  x  

The exponential distribution obtains its name from the exponential function in the probability
density function.

For any value of λ, the exponential distribution is quite skewed. The following results are easily
obtained.

If the random variable X has an exponential distribution with parameter λ, then

   μ = E(X) = 1/λ   and   σ² = V(X) = 1/λ²
It is important to use consistent units in the calculation of probabilities, means, and variances
involving exponential random variables.
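These two results can be verified numerically by integrating x·f(x) and x²·f(x) against the density; the rate λ = 2 below is just an assumed example value:

```python
import math

lam = 2.0   # an assumed example rate

def integrate(g, a, b, n=100_000):
    """Midpoint-rule numerical integration of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f = lambda x: lam * math.exp(-lam * x)            # exponential density

mean = integrate(lambda x: x * f(x), 0.0, 50.0)   # E(X)
second = integrate(lambda x: x * x * f(x), 0.0, 50.0)
var = second - mean ** 2                          # V(X) = E(X^2) - E(X)^2

print(round(mean, 4), round(var, 4))   # 0.5 0.25, i.e. 1/lam and 1/lam^2
```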

Four exponential probability distribution functions are shown in figure 1 on the same scale. Note that
they all have the same shape. The greater the rate λ, the more likely it is that the corresponding
exponential random variable takes a small value. This makes sense: if the events are occurring at a
high rate, it will tend to be a short time until the first event, and vice versa.

[Figure 1: Exponential density functions with λ = 1.33, λ = 1, λ = 0.8, and λ = 0.67]



An exponential random variable can be regarded as the waiting time until the first event in a Poisson
process with rate λ.

It is appropriate to think of a ‘random process’ in which events occur in time, independently of each
other, at a rate λ per unit time. This means that processes that are systematic (such as train timetables)
or approximately regular (the arrival of waves on a beach) are not Poisson processes.

Examples of phenomena that might be suitably modelled with this distribution include:
 radioactive decay
 the occurrences of a rare disease in a large population
 arrival of a packet of information on the internet.

Lack of Memory Property


An even more interesting property of an exponential random variable is concerned with conditional
probabilities.

Roughly speaking, it is as the name suggests: the process ‘does not remember what has happened up
until now’ and the distribution of the waiting time, given that it has already exceeded some amount of
time t0, has the same exponential-distribution form, just translated by t0.

The lack of memory property is quite readily established. For t0 > 0 and t > t0:

   P(X > t | X > t0) = P(X > t and X > t0) / P(X > t0)   (rule of conditional probability)
                     = P(X > t) / P(X > t0)              (since "X > t" implies "X > t0")
                     = e^(−λt) / e^(−λt0)
                     = e^(−λ(t − t0))
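The derivation can be checked numerically using the survival function P(X > x) = e^(−λx); the values of λ, t0, and t below are assumed examples:

```python
import math

lam, t0, t = 0.5, 2.0, 5.0   # assumed example values, with t > t0 > 0

def survival(x):
    """P(X > x) for an exponential random variable with rate lam."""
    return math.exp(-lam * x)

# Conditional probability from the definition...
conditional = survival(t) / survival(t0)

# ...matches the unconditional probability of waiting a further t - t0
print(math.isclose(conditional, survival(t - t0)))   # True
```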
