ES209 Module 4

ES209

Engineering Data Analysis


Module No. 04
Topic Continuous Probability Distributions
Period Week no. 04 Date: September 26 - October 1, 2022

Continuous Probability Distributions

Introduction
Hello dear young engineers!
The module on discrete probability distributions introduced the fundamentals of
random variables, noting that a random variable is the numerical outcome of a random procedure.

Most of the examples considered in the last module involve counts of some sort: the number
of things, or people, or occurrences, and so on. When we count, the outcome is necessarily
discrete: it can take only integer values, and no other numerical values.

However, many measured variables are not like this. Rather, they take a value in a specified range
(for example, a variable might be positive), but within that range they can take any numerical
value. The paradigm phenomenon in this category is ‘time’.

Many other variables are measured on a continuum like this. These variables include height,
weight, blood pressure, temperature, distance, speed and many others. We need a way to
represent the probability distribution of such continuous variables, and the purpose of this
module is to describe this.

There are different ways to describe the probability distribution of a continuous random
variable. In this module, we introduce the cumulative distribution function and the probability
density function. We shall see that probabilities associated with a continuous random variable
are given by integrals. This module also covers the mean and variance of a continuous random
variable.

Objective/Intended Learning Outcomes

At the end of this module, you are expected to:

• understand the use of continuous probability distributions and the use of area to calculate
probabilities
• be able to use probability functions to calculate probabilities and find measures such as the
mean and variance
Continuous Probability Distribution
A continuous distribution has an infinite, and therefore uncountable, range of possible values.
For example, time is unbounded: you could count from 0 seconds to a billion seconds, a trillion
seconds, and so on, forever.

For a discrete random variable X the probability that X assumes one of its possible values
on a single trial of the experiment makes good sense. This is not the case for a continuous
random variable. For example, suppose X denotes the length of time a commuter just
arriving at a bus stop has to wait for the next bus. If buses run every 30 minutes without
fail, then the set of possible values of X is the interval denoted [0,30], the set of all decimal
numbers between 0 and 30. But although the number 7.211916 is a possible value of X,
there is little or no meaning to the concept of the probability that the commuter will wait
precisely 7.211916 minutes for the next bus. If anything the probability should be zero,
since if we could meaningfully measure the waiting time to the nearest millionth of a minute
it is practically inconceivable that we would ever get exactly 7.211916 minutes. More
meaningful questions are those of the form: What is the probability that the commuter's
waiting time is less than 10 minutes, or is between 5 and 10 minutes? In other words, with
continuous random variables one is concerned not with the event that the variable assumes
a single particular value, but with the event that the random variable assumes a value in a
particular interval.

Continuous Random Variables

Continuous random variables are used to model continuous phenomena or quantities, such as
time, length, mass, and so on, that depend on chance. We refer to continuous random variables
with capital letters, typically X, Y, Z, ... .

For instance, the heights of people selected at random would correspond to possible values
of the continuous random variable X defined as:

X : height, in cm, of a person selected at random

When working with continuous random variables, such as X, we only calculate the
probability that X lies within a certain interval, such as P(X ≤ k) or P(a ≤ X ≤ b).

We do not calculate the probability of X being equal to a specific value x. In fact, the
following result will always be true:
P(X = k) = 0

This can be explained by the fact that the total number of possible values of a continuous
random variable X is infinite, so the likelihood of any one single outcome tends towards 0.
Calculating Probabilities
Probability Density Function (PDF)
For discrete random variables, the probability mass function (PMF) and the (probability)
density function are two names for the same notion. We say PDF, or simply density function,
for a general random variable, and we use PMF only for discrete random variables.

Cumulative Distribution Function (CDF)

a.k.a. the cumulative density function

The PDF and CDF are defined further down, but the idea is to integrate the probability density
function f(x) to obtain a new function F(x), known as the cumulative distribution function.

To calculate the probability that X lies within a certain range, say a ≤ X ≤ b, we calculate

F(b) − F(a), using the cumulative distribution function.

Put "simply", we calculate probabilities as:

P(a ≤ X ≤ b) = F(b) − F(a) = ∫ f(x) dx, integrated from x = a to x = b

where f(x) is the variable's probability density function.

Probability Density Function (PDF)

Given a continuous random variable X, its probability density function f(x) is the function
whose integral allows us to calculate the probability that X lies within a certain range,
P(a ≤ X ≤ b).

The curve y = f(x) serves as the "envelope", or contour, of the probability distribution.

Properties of the PDF

❏ Probability density functions are always greater than or equal to 0:
f(x) ≥ 0, for all values of x

❏ The area enclosed by a probability density function and the horizontal axis is equal to 1:
∫ f(x) dx = 1, integrated over all values of x (from −∞ to +∞)

❏ If the area is not equal to 1, then f(x) cannot be the probability density function of a
continuous random variable X.
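
If you want to check these two properties on a computer, here is a minimal Python sketch
(assuming SciPy is available). It uses a made-up density, f(x) = 2x on [0, 1], which is not the
module's example; it only illustrates how the check can be carried out.

    from scipy.integrate import quad

    def f(x):
        # hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere
        return 2 * x if 0 <= x <= 1 else 0.0

    # Property 1: f(x) >= 0 everywhere (spot-check a grid of points)
    assert all(f(x) >= 0 for x in [-1, 0, 0.25, 0.5, 0.75, 1, 2])

    # Property 2: the total area under the curve equals 1
    area, _ = quad(f, 0, 1)   # integrate over the support [0, 1]
    print(round(area, 6))     # prints 1.0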

Examples:
The piecewise function defined as:

could be the probability density function for some continuous random variable X.

Indeed, we can see from its graph (Fig. 4.1) that f(x) ≥ 0. Furthermore, we can check
that the area enclosed by the curve and the x-axis equals 1:

Figure 4.1
Cumulative Distribution Function (CDF)

Given a continuous random variable X and its probability density function f(x), the
cumulative distribution function, written F(x), allows us to calculate the probability that X is
less than, or equal to, any value x, in other words: P(X ≤ x) = F(x).
Where:

F(x) = P(X ≤ x) = ∫ f(t) dt, integrated from −∞ to x
Three types of probability calculations:

Calculating P(X ≤ k)
Since F(x) = P(X ≤ x), we write:

P(X ≤ k) = F(k) = ∫ f(x) dx, integrated from −∞ to k

This "tells us" that the probability that the continuous random variable X is less than or equal
to some value k equals the area enclosed by the probability density function and the
horizontal axis, between −∞ and k.

Calculating P(a ≤ X ≤ b)
To calculate the probability that a continuous random variable X lies between two values, say a
and b, we use the following result:

P(a ≤ X ≤ b) = F(b) − F(a) = ∫ f(x) dx, integrated from a to b

Calculating P(X ≥ k)
To calculate the probability that a continuous random variable X is greater than some value k,
we use the following result:

P(X ≥ k) = 1 − P(X ≤ k) = 1 − F(k) = ∫ f(x) dx, integrated from k to +∞
Expected Value and Variance of Continuous Random Variables

If X is a continuous random variable with pdf f(x), then the expected value (or mean) of
X is given by

µ = E[X] = ∫ x f(x) dx, integrated over all values of x

For the variance of a continuous random variable:

σ² = Var(X) = E[(X − µ)²] = ∫ (x − µ)² f(x) dx = E[X²] − µ²
Example:
Let the random variable X denote the time a person waits for an elevator to arrive. Suppose the
longest one would need to wait for the elevator is 2 minutes, so that the possible values of X (in
minutes) are given by the interval [0,2]. A possible pdf for X is given by

The graph of f(x) is given in Figure 4.2, and we verify that f(x) satisfies the properties of a
PDF:
1. From the graph, it is clear that f(x) ≥ 0, for all x ∈ R.
2. Since there are no holes, jumps, or asymptotes, we see that f(x) is (piecewise)
continuous.
3. Finally, we compute that the total area under f(x) equals 1:

Figure 4.2
So, if we wish to calculate the probability that a person waits less than 30 seconds (or 0.5
minutes) for the elevator to arrive, we calculate this probability by integrating the pdf from 0
to 0.5. If we wish to calculate the expected value of X, we integrate x·f(x) over the interval
[0, 2]:

Thus, we expect a person will wait 1 minute for the elevator on average. Figure 4.3
demonstrates the graphical representation of the expected value as the center of mass of the
pdf.

Figure 4.3: The red arrow represents the center of mass, or the expected value, of X

Now we calculate the variance and standard deviation of X, by first finding the expected value
of X² and then using σ² = E[X²] − µ².

Thus, we have

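The formula for this pdf did not survive in this copy of the module, but the stated facts (waiting
times on [0, 2] and a mean of 1 minute) are consistent with the triangular density f(x) = x for
0 ≤ x ≤ 1 and f(x) = 2 − x for 1 < x ≤ 2 used in the LibreTexts reference listed at the end of the
module. Under that assumption, the following Python sketch (requiring SciPy) reproduces
P(X ≤ 0.5), E[X], and the variance.

    from scipy.integrate import quad

    def f(x):
        # Assumed triangular pdf on [0, 2]; the original formula is not legible
        # in this extract, so treat this as an illustrative assumption.
        if 0 <= x <= 1:
            return x
        if 1 < x <= 2:
            return 2 - x
        return 0.0

    p_half, _ = quad(f, 0, 0.5)                                   # P(X <= 0.5)
    mean, _   = quad(lambda x: x * f(x), 0, 2, points=[1])        # E[X]
    ex2, _    = quad(lambda x: x**2 * f(x), 0, 2, points=[1])     # E[X^2]
    var = ex2 - mean**2

    print(round(p_half, 4))   # 0.125
    print(round(mean, 4))     # 1.0
    print(round(var, 4))      # ~0.1667, so the standard deviation is ~0.408
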
Example 2:
A continuous random variable X has probability density function defined as:

1. Find P(X ≤ 1.5).


2. Find P(0.5 ≤ X ≤ 1).
3. Find P(X ≥ 1).

Solution:

1. To find P(X ≤ 1.5), we write:

Graphically, this result can be interpreted as follows:
The area enclosed by the probability density function's curve and the horizontal axis, from
−∞ up to x = 1.5, is equal to 0.844 (rounded to 3 significant figures).
The probability is equal to the area, so: P(X ≤ 1.5) = 0.844

2. To find P(0.5 ≤ X ≤ 1) we write:

Graphically, this result can be interpreted as follows:
The area enclosed by the probability density function's curve and the horizontal axis,
between x = 0.5 and x = 1, is equal to 0.344 (rounded to 3 significant figures). The
probability is equal to the area, so: P(0.5 ≤ X ≤ 1) = 0.344

3. To find P(X ≥ 1) we write:

Graphically, this result can be interpreted as follows:
The area enclosed by the probability density function's curve and the horizontal axis, from
x = 1 onwards, is equal to 0.5.
The probability is equal to the area, so: P(X ≥ 1) = 0.5.

Note: we could have stated this result directly, without integrating, as x = 1 is the axis of
symmetry of the parabola y = f(x).
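
The density formula for this example is also missing from this copy of the module, but the
stated results (total area 1, symmetry about x = 1, and P(X ≥ 1) = 0.5) are consistent with a
parabolic density of the form f(x) = (3/4)x(2 − x) on [0, 2]. Assuming that form purely for
illustration, the Python sketch below (requiring SciPy) reproduces the three answers.

    from scipy.integrate import quad

    def f(x):
        # Assumed parabolic density, symmetric about x = 1 with total area 1:
        # f(x) = (3/4) x (2 - x) on [0, 2], and 0 elsewhere.
        return 0.75 * x * (2 - x) if 0 <= x <= 2 else 0.0

    print(round(quad(f, 0, 1.5)[0], 3))   # P(X <= 1.5)      -> 0.844
    print(round(quad(f, 0.5, 1)[0], 3))   # P(0.5 <= X <= 1) -> 0.344
    print(round(quad(f, 1, 2)[0], 3))     # P(X >= 1)        -> 0.5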

Important Result

When working with continuous random variables the following results will always be true:
P(X ≤ k) = P(X < k)
and
P(X ≥ k) = P(X > k)

Consequently, the following four probabilities are equal and calculated in exactly the same
way:
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)


Practice Activity 01
Write your answers on a piece of paper. Show your solution. Take a picture of your answers
and submit it in our Google Classroom.
filename format: Lastname_Activity_1.pdf


Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a probability distribution
that is symmetric about the mean, showing that data near the mean are more frequent in
occurrence than data far from the mean. We write X ~ N(µ, σ).

In graphical form, the normal distribution appears as a "bell curve".


The Normal Distribution has:
● mean = median = mode
● symmetry about the center
● 50% of values less than the mean and 50% greater than the mean

All normal distributions can be described by just two parameters: the mean
and the standard deviation.

Skewness measures the degree of symmetry of a distribution. The normal distribution is
symmetric and has a skewness of zero.

Kurtosis measures the thickness of the tails of a distribution relative to those of the normal
distribution. The normal distribution has a kurtosis equal to 3.0.

Formula of Normal Distribution
The normal distribution has the following probability density function. Note that only the
values of the mean (μ) and standard deviation (σ) are needed:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

where:
x = the value of the variable or data being examined, and f(x) its probability density
μ = the mean
σ = the standard deviation

The Empirical Rule

For all normal distributions, 68.2% of the observations will appear within plus or minus
one standard deviation of the mean; 95.4% of the observations will fall within +/- two
standard deviations; and 99.7% within +/- three standard deviations. This fact is
sometimes referred to as the "empirical rule," a heuristic that describes where most of the
data in a normal distribution will appear.

This means that data falling outside of three standard deviations ("3-sigma") would signify
rare occurrences.
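
You can reproduce these percentages with a few lines of Python (assuming SciPy is available).
Note that the one-standard-deviation figure comes out as 68.3% under this computation,
essentially the same as the 68.2% quoted above.

    from scipy.stats import norm

    # Probability of falling within k standard deviations of the mean;
    # the values are the same for every normal distribution.
    for k in (1, 2, 3):
        p = norm.cdf(k) - norm.cdf(-k)
        print(f"within +/- {k} sigma: {p * 100:.1f}%")
    # within +/- 1 sigma: 68.3%
    # within +/- 2 sigma: 95.4%
    # within +/- 3 sigma: 99.7%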

The number of standard deviations from the mean is also called the "standard score",
"sigma", or "z-score".

The z-score gives you an idea of how far from the mean a data point is. More technically, it is
a measure of how many standard deviations below or above the population mean a raw
score is.

z-score formula:
z = (x − μ) / σ
where:
z is the "z-score" (standard score)
x is the value to be standardized
μ ("mu") is the mean
σ ("sigma") is the standard deviation
Example 1:

Given X ~ N(50, 10): (a) What are the values of the mean and standard deviation? (b)
What value of x has a z-score of 1.4? (c) What is the z-score that corresponds to x = 30?
(d) What is the difference between positive and negative z values?

Solution:
a. X ~ N(µ, σ), so: µ = 50 and σ = 10

b. Given z = 1.4, x = ?
We know that the formula for the z-score is z = (x − µ) / σ, so:
x = µ + zσ = 50 + 1.4 × 10
x = 64

c. Given x = 30, find z = ?

Using the z-score formula:
z = (x − µ) / σ
= (30 − 50) / 10
z = −2

d. As shown in the number line below, negative z values correspond to x values below the
mean, while positive z values correspond to x values above the mean.

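A tiny Python sketch that reproduces parts (b) and (c) of this example with plain arithmetic:

    # Reproducing Example 1 (X ~ N(50, 10)).
    mu, sigma = 50, 10

    def z_score(x):
        return (x - mu) / sigma

    def x_from_z(z):
        return mu + z * sigma

    print(x_from_z(1.4))   # (b) x with z = 1.4  -> 64.0
    print(z_score(30))     # (c) z for x = 30    -> -2.0
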
Example 2:

95% of students at a school are between 1.1m and 1.7m tall.

Assuming this data is normally distributed, can you calculate the mean and standard
deviation?

The mean is halfway between 1.1m and 1.7m:

Mean = (1.1m + 1.7m) / 2 = 1.4m


95% is 2 standard deviations either side of the mean (a total of 4 standard deviations) so:

1 standard deviation = (1.7m-1.1m) / 4


= 0.6m / 4
= 0.15m

And this is the result:

In that same school, one of your friends is 1.85m tall.

You can see on the bell curve that 1.85m is 3 standard deviations from the mean of 1.4, so:

Your friend's height has a "z-score" of 3.0

It is also possible to calculate how many standard deviations 1.85 is from the mean

How far is 1.85 from the mean?


It is 1.85 - 1.4 = 0.45m from the mean

How many standard deviations is that? The standard deviation is 0.15m,

so: 0.45m / 0.15m = 3 standard deviations


Normal Approximation to the Binomial and Poisson Distribution


The normal distribution can be a good approximation to a discrete distribution when the discrete
distribution takes on a symmetric bell shape.

Formula for Normal Approximation:

Binomial Approximation: if X is a binomial random variable with parameters n and p, then

Z = (X − np) / √(np(1 − p))

is approximately a standard normal random variable when n is large.
The approximation is good when np > 5 and n(1 − p) > 5, with µ = np and σ² = np(1 − p).

Poisson Approximation:
The Poisson distribution was developed as the limit of a binomial distribution as
the number of trials increases to infinity. Consequently, the normal distribution can also
approximate probabilities of a Poisson random variable.

If X is a Poisson random variable with µ = E[X] = V[X] = λ, then

Z = (X − λ) / √λ

is approximately a standard normal random variable when λ > 5 (with µ = λ and σ² = λ).
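
As a rough check of the Poisson case, the sketch below (assuming SciPy, and a made-up rate
λ = 20 chosen only for illustration) compares the exact Poisson probability P(X ≤ 25) with its
normal approximation, using the continuity correction introduced in the next part of this module.

    from math import sqrt
    from scipy.stats import poisson, norm

    lam = 20   # hypothetical Poisson rate, for illustration only
    k = 25

    exact = poisson.cdf(k, lam)                       # exact P(X <= 25)
    approx = norm.cdf((k + 0.5 - lam) / sqrt(lam))    # normal approx. with continuity correction

    print(round(exact, 4), round(approx, 4))
    # both values come out close to 0.89, showing the approximation is reasonable here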

Using the Continuity Correction Factor:

Example
A golfer hits his driver shot into the fairway 72% of the time.
a. What is the probability he hits 43 fairway drives in 54 total driver shots?
b. What is the probability he hits at most 39 fairway drives in 54 total driver shots?
c. What is the probability he hits more than 41 fairway drives in 54 total driver shots?

Solution:
Given n = 54, p = 72% (0.72), and q = 1 − p = 0.28, so:
mean = np = 54 × 0.72 = 38.88
standard deviation = √(npq) = √(54 × 0.72 × 0.28) = 3.299

Does it meet the conditions for Normal Approximation?
1. It meets the requirements for Binomial distribution
a. fixed number of trials
b. only 2 outcomes
c. independent
d. probability is the same
2. n × p ≥ 5 and n × q ≥ 5
Since both conditions are met, we can use the normal approximation, applying the
correction for continuity.

a. P(X = 43): applying the correction for continuity, we use P(42.5 < X < 43.5). To
calculate this, we convert 42.5 and 43.5 to z-scores:
z = (x − µ) / σ
z1 = (42.5 − 38.88) / 3.299 = 1.10
z2 = (43.5 − 38.88) / 3.299 = 1.40

From the z-score table, 1.10 gives 0.8643 and 1.40 gives 0.9192, so the probability is the
difference of the two areas:
0.9192 − 0.8643 = 0.0549, or about 5.5%

So, there is about a 5.5% probability that the golfer hits 43 of the 54 drives in the fairway.

b. P(X ≤ 39): applying the correction for continuity, we use P(X < 39.5), then convert to a
z-score:

z = (39.5 − 38.88) / 3.299 = 0.19, which gives a probability of 0.5753 using the
z-score table.
So, there is a 0.5753 (57.53%) probability that the golfer hits at most 39 of the 54 drives
in the fairway.

c. P(X > 41): applying the correction for continuity, we use P(X > 41.5).

z = (41.5 − 38.88) / 3.299 = 0.79, which gives a probability of 0.7852 using the
z-score table.
Since we want the area greater than this z value:
P(X > 41.5) = 1 − 0.7852 = 0.2148, or 21.48%

So, there is a 0.2148 (21.48%) probability that the golfer hits more than 41 of the 54
drives in the fairway.
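
The three golfer probabilities can also be checked with SciPy, as in the sketch below; the small
differences from the answers above come from rounding z to two decimal places before reading
the z-table.

    from math import sqrt
    from scipy.stats import norm

    n, p = 54, 0.72
    mu = n * p                      # 38.88
    sigma = sqrt(n * p * (1 - p))   # ~3.299

    # (a) P(X = 43) with continuity correction: P(42.5 < X < 43.5)
    p_a = norm.cdf(43.5, mu, sigma) - norm.cdf(42.5, mu, sigma)
    # (b) P(X <= 39): P(X < 39.5)
    p_b = norm.cdf(39.5, mu, sigma)
    # (c) P(X > 41): P(X > 41.5)
    p_c = 1 - norm.cdf(41.5, mu, sigma)

    print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
    # approximately 0.0556, 0.5745, 0.2136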


Practice Activity 02
1. Suppose a manufacturing company specializing in semiconductor chips
produces 50 defective chips out of 1,000. If 100 chips are sampled
randomly, without replacement, approximate the probability that at least
1 of the chips is flawed in the sample.

2. Suppose the manufacturing company specializing in semiconductor chips
has a mean production of 10,000 chips per day. Approximate the
expected number of days in a year that the company produces more than
10,200 chips in a day.

Write your answers on a piece of paper. Show your solution. Take a picture of your answers
and submit it in our Google Classroom.
filename format: Lastname_Activity_2.pdf


Exponential Distribution

The exponential distribution is a continuous probability distribution that often concerns the
amount of time until some specific event happens. It describes a process in which events happen
continuously and independently at a constant average rate. The exponential distribution
has the key property of being memoryless. An exponential random variable tends to take many
small values and comparatively few large values.

A continuous random variable X is said to have an exponential distribution if it has the
following probability density function:

f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 for x < 0

The exponential distribution has a parameter λ > 0, written X ∼ Exponential(λ), where λ is
called the rate of the distribution.

Figure 4.5 shows the PDF of the exponential distribution for several values of λ.

Fig. 4.5 - PDF of the exponential random variable

Mean and Variance of the Exponential Distribution

Mean:
The mean of the exponential distribution is calculated using integration by parts:

µ = E[X] = ∫ x λe^(−λx) dx, integrated from 0 to ∞, which gives µ = 1/λ

Variance:
To find the variance, we first compute E[X²] = ∫ x² λe^(−λx) dx (integrated from 0 to ∞),
which gives 2/λ², and then use σ² = E[X²] − µ².

Thus, we obtain: if X ∼ Exponential(λ), then µ = 1/λ and σ² = 1/λ².
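
If you have SymPy available (an assumption; it is not required by the module), you can verify
these integration-by-parts results symbolically:

    import sympy as sp

    x, lam = sp.symbols('x lambda', positive=True)
    pdf = lam * sp.exp(-lam * x)            # exponential density on [0, oo)

    mean = sp.integrate(x * pdf, (x, 0, sp.oo))       # expected value
    ex2  = sp.integrate(x**2 * pdf, (x, 0, sp.oo))    # E[X^2]
    var  = sp.simplify(ex2 - mean**2)                 # variance

    print(mean, var)   # 1/lambda and 1/lambda**2, as stated above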

Example
Laptops produced by company XYZ last, on average, for 5 years. The life span of each
laptop follows an exponential distribution.
(a). Calculate the rate parameter.
(b) Write the probability density function and graph it.
(c) What is the probability that a laptop will last less than 3 years?
(d) What is the probability that a laptop will last more than 10 years?
(e) What is the probability that a laptop will last between 4 and 7
years?

Solution:

(a) Given that the mean µ = 5 years, the rate parameter is:
λ = 1/µ = 1/5 = 0.20 per year

(b) Using the pdf of the exponential distribution:
f(x) = λe^(−λx) = 0.20e^(−0.20x), for x ≥ 0
so the graph will look like this:

(c) What is the probability that a laptop will last less than 3 years?
Looking at the figure below, the probability we want is the area shaded in red, so we look for
the area to the left:

A_L = P(X < x) = 1 − e^(−λx)
P(X < 3) = 1 − e^(−0.20(3)) = 0.4512, or 45.12%
Example (continued)
(d) What is the probability that a laptop will last more than 10 years?

Here we look for the area to the right:

A_R = P(X > x) = e^(−λx)
P(X > 10) = e^(−0.20(10)) = 0.1353, or 13.53%

(e) What is the probability that a laptop will last between 4 and 7 years?
P(4 < X < 7) = P(X < 7) − P(X < 4)

so,
P(4 < X < 7) = (1 − e^(−0.20(7))) − (1 − e^(−0.20(4))) = 0.75340 − 0.55067
P(4 < X < 7) = 0.20273, or 20.27%
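
Parts (c) to (e) can also be checked with SciPy's built-in exponential distribution; note that
SciPy parameterizes it by the scale 1/λ (here, the mean of 5 years) rather than by the rate λ.

    from scipy.stats import expon

    mean_life = 5                  # average laptop lifetime in years (given)
    rate = 1 / mean_life           # lambda = 0.20 per year
    X = expon(scale=1 / rate)      # SciPy uses scale = 1/lambda

    print(round(X.cdf(3), 4))              # (c) P(X < 3)     -> 0.4512
    print(round(X.sf(10), 4))              # (d) P(X > 10)    -> 0.1353
    print(round(X.cdf(7) - X.cdf(4), 4))   # (e) P(4 < X < 7) -> 0.2027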


Practice Activity 03
The time spent waiting between events is often modeled using the exponential
distribution. For example, suppose that an average of 30 customers per hour
arrive at a store and the time between arrivals is exponentially distributed.

Problem
a. On average, how many minutes elapsed between two successive arrivals?
b. When the store first opens, how long on average does it take for three
customers to arrive?
c. After a customer arrives, find the probability that it takes less than one
minute for the next customer to arrive.
d. After a customer arrives, find the probability that it takes more than five
minutes for the next customer to arrive.
e. Is an exponential distribution reasonable for this situation?
Write your answers on a piece of paper. Show your solution. Take a picture of your answers
and submit it in our Google Classroom.
filename format: Lastname_Activity_3.pdf


References

• https://www.radfordmathematics.com/probabilities-and-statistics/continuous-probability-distributions/continuous-random-variables-probability-ditributions/probability-density-functions-continuous-random-variables.html
• https://www.mathsisfun.com/data/standard-normal-distribution.html
• https://byjus.com/maths/exponential-distribution/
• https://stats.libretexts.org/Courses/Saint_Mary's_College_Notre_Dame/MATH_345__-_Probability_(Kuter)/4%3A_Continuous_Random_Variables/4.2%3A_Expected_Value_and_Variance_of_Continuous_Random_Variables