RM185101 - Applied Statistics and Probability

Ira Mutiara Anjasmara, PhD

Department of Geomatics Engineering

Faculty of Civil, Planning, and Geo Engineering
Institut Teknologi Sepuluh Nopember

Measurements are assumed to be random variables, with

each measurement representing an individual sample in a
random distribution.

Probability is the numerical measure of the chance or

likelihood that a particular event will occur.

A random variable is the numerical description of the

outcome of an experiment.

Random Variable

There are two types of random variable:

• discrete: the variable may only assume certain particular values
• continuous: the variable can assume any real value within a certain range
Discrete and continuous random variables are treated differently.
They are termed “random” because they are a result of chance.

Example 1

Consider an experiment that consists of flipping a coin twice. If

H indicates a head and T a tail, the possible outcomes for this
experiment are:

(H, H) (H, T) (T, H) (T, T)

The number of heads occurring in this experiment can be 0, 1

or 2. Therefore, the number of heads is a random variable in
that it can assume values of 0, 1, or 2.

Example 2

The mean score of Statistics students in the mid-semester test

can be anything between 0% and 100%. This value is a random
variable that can assume any value in the range 0 - 100.

Discrete Probability Distributions

A probability function, f (x), gives the probability that a

particular random variable will assume a particular value.
A probability distribution is a table, graph or mathematical
formula that shows all possible values of the random variable, x,
and the associated probability function, f (x).
The probabilities of all possible outcomes in a probability distribution sum to 1:

$$\sum_{i=1}^{\infty} f(x_i) = 1 \qquad (1)$$

There are two types of probability distributions: discrete and

continuous. Here we focus upon the discrete case.
Example 3

Tossing 2 coins
From 1 coin: p(H) = 0.5 , p(T) = 0.5.
Outcomes for 2 coins: (H, H) (H, T) (T, H) (T, T)
If we let our discrete random variable, x, be the number of heads, then:

f(2) = p(x = 2) = 0.5 × 0.5 = 0.25
f(1) = p(x = 1) = (0.5 × 0.5) + (0.5 × 0.5) = 0.50
f(0) = p(x = 0) = p(2T) = 0.5 × 0.5 = 0.25

Note that $\sum f(x) = 1$
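The two-coin distribution can also be built by listing the outcomes and counting heads. The following is a minimal Python sketch, assuming a fair coin; the variable names (`outcomes`, `counts`, `f`) are illustrative only.

```python
from collections import Counter
from itertools import product

# Enumerate every equally likely outcome of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# x = number of heads in an outcome; count how many outcomes give each x.
counts = Counter(outcome.count("H") for outcome in outcomes)

# f(x) = (number of outcomes with x heads) / (total number of outcomes)
f = {x: counts[x] / len(outcomes) for x in sorted(counts)}

print(f)                # {0: 0.25, 1: 0.5, 2: 0.25}
print(sum(f.values()))  # 1.0
```

The counts 1, 2, 1 out of 4 outcomes reproduce f(0), f(1) and f(2), and the probabilities sum to 1.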
Example 4

Roll of one 6-sided die.

There is the same probability for each outcome:

x 1 2 3 4 5 6
f (x) 1/6 1/6 1/6 1/6 1/6 1/6

The probability function is:

$$f(x) = \frac{\text{no. of occurrences of } x}{\text{total no. of outcomes}}$$

Example 5
Probability distribution for the sum of two 6-sided dice.
It can be given as a table:
x f (x) Dice rolls
2 1/36 1,1
3 2/36 1,2 2,1
4 3/36 1,3 2,2 3,1
5 4/36 1,4 2,3 3,2 4,1
6 5/36 1,5 2,4 3,3 4,2 5,1
7 6/36 1,6 2,5 3,4 4,3 5,2 6,1
8 5/36 2,6 3,5 4,4 5,3 6,2
9 4/36 3,6 4,5 5,4 6,3
10 3/36 4,6 5,5 6,4
11 2/36 5,6 6,5
12 1/36 6,6
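A table like this can be generated by enumerating all 36 equally likely rolls. The sketch below is one way to do it in Python, assuming two fair dice; `counts` and `f` are names chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 (first die, second die) pairs give each sum.
counts = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(counts.values())  # 36 outcomes in all

# f(x) = occurrences of x / total outcomes, kept as exact fractions.
f = {x: Fraction(n, total) for x, n in sorted(counts.items())}

for x in f:
    # Print unreduced fractions so the rows match the table layout.
    print(f"{x:2d}  {counts[x]}/{total}")
```

Exact fractions are used so that the probabilities sum to exactly 1 with no rounding error.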

Or as a graph: [figure: f(x) plotted against x for the sum of two dice]

Expected value

The expected value of a random variable gives us the mean

value for the random variable:
$$E[x] = \mu = \sum x\, f(x)$$

NOTE: µ is population mean, not sample mean.

For discrete random variables the expected value is not
necessarily discrete:
e.g., expected number of children for an Australian family
= 2.4.
This figure does not make sense in the original data set.

The variance is used to provide a measure of the dispersion or
variability of the random variable, x. As with the mean, we refer
to the population variance. As with frequency distributions,
there are two ways to compute it:
$$\mathrm{var}[x] = \sigma^2 = \sum \{(x - \mu)^2 f(x)\} = \sum \{x^2 f(x)\} - \mu^2$$

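The variance is the probability-weighted average of the squared deviations of the random variable from the expected value (mean), so it measures how widely the values are spread about the mean.
The standard deviation of the probability distribution is the square root of the variance:

$$\sigma = \sqrt{\sigma^2}$$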
Example 6
Probability distribution for the sum of two 6-sided dice.
x     f(x)    x f(x)    x² f(x)
2     1/36    2/36      4/36
3     2/36    6/36      18/36
4     3/36    12/36     48/36
5     4/36    20/36     100/36
6     5/36    30/36     180/36
7     6/36    42/36     294/36
8     5/36    40/36     320/36
9     4/36    36/36     324/36
10    3/36    30/36     300/36
11    2/36    22/36     242/36
12    1/36    12/36     144/36
Σ     1       7         54.833

$$\mu = \sum_{i=1}^{N} x_i f(x_i) = 7$$

$$\sigma^2 = \sum \{x^2 f(x)\} - \mu^2 = 54.833 - 7^2 = 5.833$$
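The column sums can be checked numerically. This is a small Python sketch, assuming two fair dice; it computes the variance both from the definition and from the shortcut form, and the names `var_definition` and `var_shortcut` are illustrative.

```python
from fractions import Fraction

# f(x) for the sum of two dice: 1/36 at x = 2, rising to 6/36 at x = 7,
# then falling back to 1/36 at x = 12.
f = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

# Expected value: sum of x * f(x)
mu = sum(x * p for x, p in f.items())

# Variance from the definition: sum of (x - mu)^2 * f(x)
var_definition = sum((x - mu) ** 2 * p for x, p in f.items())

# Variance from the shortcut: sum of x^2 * f(x), minus mu^2
var_shortcut = sum(x ** 2 * p for x, p in f.items()) - mu ** 2

sigma = float(var_shortcut) ** 0.5

print(mu, var_definition, var_shortcut)  # 7 35/6 35/6
print(round(float(var_shortcut), 3), round(sigma, 3))
```

Both routes give 35/6, which is the 5.833 obtained from the table.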
Binomial Distribution

The binomial probability distribution is a discrete

probability distribution that has many applications.
It is associated with a multiple-step experiment that we call
the binomial experiment.
Its properties are:
1. The experiment consists of a sequence of n identical trials.
2. Each trial has only two possible outcomes: success or failure.
3. The probability of success does not change from trial to trial.
4. The trials are independent.

Property 2, that each trial has only two possible outcomes (success or failure), is not as restrictive as it first looks if you think in terms of things either happening or not happening.
E.g., true or false, heads or tails, male or female.
We define: probability of success = p; probability of failure = q.
Since there are only 2 possible outcomes: p + q = 1
When observing dice rolls, we can score 1, 2, 3, 4, 5 or 6. If we want to see how many times 5 comes up in n rolls, we can use the binomial probability distribution if we think in terms of two outcomes:
• 5 occurring, with p = 1/6; or
• all other results (1, 2, 3, 4, 6) as not 5, with q = 5/6.
The number of outcomes

The number of outcomes of a binomial experiment that result

in exactly x successes from n trials is computed from:

$$^{n}C_{x} = \frac{n!}{x!\,(n - x)!}$$

This expression is called a combination, or the binomial

coefficient, and is the number of ways of selecting x objects
from n objects, without replacement, and irrespective of order.
(As opposed to the permutation, $^{n}P_{x}$, which takes order into account.)

Some simple values of the binomial coefficient are:

$$^{n}C_{0} = {}^{n}C_{n} = 1$$

$$^{n}C_{1} = {}^{n}C_{n-1} = n$$

The expression n! is called n factorial, and is computed by:

n! = n(n − 1)(n − 2)(n − 3) . . . 3 × 2 × 1

with 1! = 1, and 0! = 1

For very large n the Stirling formula gives n! :

$$n! \approx \sqrt{2\pi}\; n^{\,n + 1/2}\, e^{-n}$$
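To get a feel for how close the Stirling formula is, it can be compared with the exact factorial for a few values of n. The sketch below is illustrative only; `stirling` is a helper name introduced here.

```python
import math

def stirling(n: int) -> float:
    """Stirling approximation: sqrt(2*pi) * n**(n + 1/2) * exp(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (5, 10, 20):
    exact = math.factorial(n)
    approx = stirling(n)
    # The ratio approaches 1 from below as n grows.
    print(n, exact, f"{approx:.6g}", f"ratio = {approx / exact:.4f}")
```

The approximation slightly underestimates n!, and the relative error shrinks as n increases.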

A coin is flipped 3 times. In how many ways can we get exactly

2 heads?
So the number of ways we can get exactly 2 heads is 3 (HHT, HTH, THH).
Or, using the combination equation, with n = 3, x = 2:

$$^{3}C_{2} = \frac{3!}{2!\,(3 - 2)!} = \frac{6}{2 \times 1} = 3$$
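Both routes to the answer, listing the sequences and using the combination formula, can be reproduced in a few lines of Python. This is a sketch; `math.comb` requires Python 3.8 or later.

```python
import math
from itertools import product

n, x = 3, 2

# List all 2**n head/tail sequences for n flips.
sequences = ["".join(s) for s in product("HT", repeat=n)]

# Keep only the sequences with exactly x heads.
two_heads = [s for s in sequences if s.count("H") == x]

print(two_heads)        # ['HHT', 'HTH', 'THH']
print(math.comb(n, x))  # 3
```

For larger n, listing sequences quickly becomes impractical, which is why the combination formula is used instead.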

Binomial probability function

To determine the probability of x successes we need to

know the probability of success and failure.
Since the trials of a binomial experiment are independent, we can multiply the probabilities associated with each trial outcome to find the probability of a particular sequence of outcomes.
The binomial probability distribution is given by:

$$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$$

f (x) gives the probability of x successes from n trials.

Find the probability of scoring exactly 2 heads from 3 tosses of

a coin.
We have: p = 0.5, q = 1 − 0.5 = 0.5, n = 3.

$$f(2) = {}^{3}C_{2}\,(0.5)^2 (0.5)^1 = 3 \times 0.25 \times 0.5 = 0.375$$
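The binomial probability function translates directly into code. The following Python sketch uses a helper named `binomial_pmf`, which is a name chosen here rather than one from the course material.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure
    return math.comb(n, x) * p ** x * q ** (n - x)

# Exactly 2 heads from 3 tosses of a fair coin.
print(binomial_pmf(2, 3, 0.5))  # 0.375

# The probabilities over x = 0..n should sum to 1.
print(sum(binomial_pmf(x, 3, 0.5) for x in range(4)))  # 1.0
```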

Mean and variance

The expected number of successes in a binomial experiment is

given as:
$$E[x] = \mu = np$$

The variance of a binomial distribution is given as:

$$\mathrm{var}[x] = \sigma^2 = npq$$

Example 1

A particular class has 35 students. From past experience it is

known that 7% of the students fail the course.
What is the probability of a) 0 students, and b) 5 students
failing the course? c) What is the expected number of students
who will fail the course, and the standard deviation of this?
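One possible way to check answers to this example numerically is sketched below in Python. It assumes each student fails independently with probability 0.07, so that the number of failures is binomial with n = 35; the helper `binomial_pmf` is a name introduced for this sketch.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

n, p = 35, 0.07  # "success" here means a student failing the course
q = 1.0 - p

p_none = binomial_pmf(0, n, p)  # a) no students fail
p_five = binomial_pmf(5, n, p)  # b) exactly five students fail
mean = n * p                    # c) expected number failing, np
sd = math.sqrt(n * p * q)       #    standard deviation, sqrt(npq)

print(f"p(0) = {p_none:.4f}, p(5) = {p_five:.4f}")
print(f"mean = {mean:.2f}, sd = {sd:.3f}")
```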

Sara has a Rocket Science exam tomorrow, and she hasn’t done
any revision. So she decides to answer the 20 multiple choice
questions randomly. Each question has 4 options: A, B, C or D,
only one of which is the correct answer.
a) What is the probability distribution function for Sara
getting a correct answer?
b) What is the probability that she will get every question right?
c) What is the probability that she will get every question wrong?
d) What is the probability that she will scrape half marks (i.e.,
get 10 questions right)?
e) What is her expected mark?
f) What is the standard deviation on this expected value?
Continuous Probability Distributions

A continuous probability distribution is the probability

distribution that applies to a continuous random variable.
Note: a continuous random variable is a random variable
that can take on all values over a range (e.g., distance,
time, temperature).
For continuous random variables, the probability
distribution function, f (x), is usually called the
probability density function.

Uniform probability distribution

The continuous uniform probability distribution is used in all situations in which all values of the random variable are equally likely.
E.g., the random number generator on a calculator can be
used to generate random numbers between 0 and 1.
Each number has an equal probability of being generated.
The uniform probability density function for a random number
generator has the formula:

$$f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$

A plot of the probability density function looks like: [figure: f(x) = 1 for 0 ≤ x ≤ 1, and 0 elsewhere]

For each possible outcome the probability density function is the same.

Area as a measure of probability

The value of the probability density function does not represent a probability.


The probability that a continuous random variable will assume a

value between given limits a and b is given by the area under
the graph of its probability density function between a and b.

Uniform distribution:

So, if a = 0.25, and b = 0.75, then:

p(0.25 ≤ x ≤ 0.75) = 1 × (0.75 − 0.25) = 0.5
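The area calculation for the uniform distribution on [0, 1] can be compared with a simulation of a random number generator. This is a rough Python sketch; the function name `uniform01_probability`, the seed and the sample size are arbitrary choices made here.

```python
import random

def uniform01_probability(a: float, b: float) -> float:
    """Area under f(x) = 1 on [0, 1] between the limits a and b."""
    lo, hi = max(a, 0.0), min(b, 1.0)  # the density is 0 outside [0, 1]
    return max(hi - lo, 0.0)           # height 1 times width

exact = uniform01_probability(0.25, 0.75)

# Estimate the same probability by drawing uniform random numbers.
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
hits = sum(0.25 <= rng.random() <= 0.75 for _ in range(n))
estimate = hits / n

print(exact)     # 0.5
print(estimate)  # close to 0.5
```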
In general, for any probability density function, f(x):

$$p(a \le x \le b) = \int_a^b f(x)\,dx, \qquad f(x) \ge 0$$

Since the total probability of all outcomes must equal 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

i.e., the total area under the probability density function must
equal 1.
Consider the probability that the random variable assumes a distinct value, a:

Discrete: p(x = a) = f(x = a), which has a definite value, the value of the discrete probability function at a.
Continuous: $p(x = a) = \int_a^a f(x)\,dx = 0$, so the value f(a) does not represent a probability.

We must therefore compute the probability of the random variable taking a value in a particular interval surrounding a, say:

$$p(x = a) \approx p(a - \epsilon \le x \le a + \epsilon) = \int_{a-\epsilon}^{a+\epsilon} f(x)\,dx$$

where ε is some small number.

The Normal Distribution

The normal probability distribution (also called the

Gaussian distribution) is the most widely used distribution
in statistical analysis.
Two parameters define this distribution: the mean (µ), and
the standard deviation (σ). The probability density
function of the normal distribution is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

The distribution is symmetrical about the mean, with the

mean = median = mode.
i.e., the mean is the most frequently occurring value (the
mode), and lies at the point that divides the curve exactly
in half (the median).
The curve is asymptotic, extending from the mean towards
infinity in both directions, never quite reaching zero.

Here are three normal curves of same σ, different µ:

The standard deviation indicates the spread of the

measurements (the width of the normal curve).

Here are three normal curves of same µ, different σ:

For any normally-distributed random variable, the area under the curve within:
• ±1σ of the mean is about 68.27% of the total area
• ±2σ of the mean is about 95.45% of the total area
• ±3σ of the mean is about 99.73% of the total area
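These percentages can be reproduced with the error function from Python's standard library, using the fact that for a normal distribution the probability of lying within k standard deviations of the mean is erf(k/√2). The helper name `prob_within` is introduced here for illustration. Values read from four-figure tables can differ from these in the last digit.

```python
import math

def prob_within(k: float) -> float:
    """p(mu - k*sigma <= x <= mu + k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {100 * prob_within(k):.2f}%")
```

The result does not depend on µ or σ, which is why the same three percentages apply to every normal curve.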

The standard normal distribution
As mentioned, the area beneath the curve represents the probability of a measurement falling within a given interval:

$$p(a \le x \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\, dx$$

Fortunately, this horrible integral has already been worked out

for the standard normal probability distribution.

This is a normal distribution scaled, or standardised, to have

µ = 0; σ = 1

The values of the integral above, for the standard normal distribution, are available in tables.

Probabilities for all normal distributions are computed using the

standard normal distribution, but since most real data sets do
not have a standard normal distribution, we must transform our
real normal distribution (with mean µ, and SD σ) so it has a
mean of 0, and a SD of 1.

This scaling is done using the equation:

$$z = \frac{x - \mu}{\sigma}$$

Therefore, z can be interpreted as the number of standard deviations that the random variable x lies from the mean.

Now we can use the tables to work out p(a′ ≤ z ≤ b′), where:

$$a' = \frac{a - \mu}{\sigma} \quad \text{and} \quad b' = \frac{b - \mu}{\sigma}$$
The tables for the standard normal distribution look something
like this:

The highlighted value gives the blue shaded area between the mean and z = 1.06, i.e.:

A = p(0 ≤ z ≤ 1.06) = 0.3554
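A table entry of this kind, the area between the mean and z, can be checked without the printed tables. The Python sketch below builds the standard normal cumulative probability from `math.erf`; the function names `phi` and `area_mean_to_z` are chosen for this example.

```python
import math

def phi(z: float) -> float:
    """Cumulative probability p(Z <= z) for the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def area_mean_to_z(z: float) -> float:
    """Area between the mean (z = 0) and z, as listed in the tables."""
    return phi(z) - 0.5

print(round(area_mean_to_z(1.06), 4))  # 0.3554
```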

The scores (as a %) in the Statistics exam were normally

distributed with a mean of 60% and a standard deviation of
10%. What is the probability that a student scored:
a) greater than 80%
b) between 40 and 50%?

a) For p(x > 80):
First compute the z-score corresponding to x = 80:

$$z = \frac{80 - 60}{10} = 2$$

This means that the red shaded area below represents p(x > 80) = p(z > 2).

On the supplied standard normal tables, look up the probability for z = 2.00: the answer is 0.4772. This value is the area of the blue shaded region beneath the curve:

The blue area is A1 = p(0 ≤ z ≤ 2) = p(60 ≤ x ≤ 80) = 0.4772. So if we subtract the blue area (A1) from the total area to the right of the mean (0.5), we'll get the answer:

p(x > 80) = 0.5 − 0.4772 = 0.0228
b) For p(40 < x < 50):
First compute the z-scores:

$$x = 40 \Rightarrow z = \frac{40 - 60}{10} = -2$$

$$x = 50 \Rightarrow z = \frac{50 - 60}{10} = -1$$

The red shaded area represents

p(40 ≤ x ≤ 50) = p(−2 ≤ z ≤ −1)
To find the red shaded area from the normal tables, we first use
the symmetry of the normal distribution:

p(−2 ≤ z ≤ −1) = p(1 ≤ z ≤ 2)

Then, to find our required area we take the difference of two areas:

p(1 ≤ z ≤ 2) = p(0 ≤ z ≤ 2) − p(0 ≤ z ≤ 1)

From the standard normal tables, z = 2 gives A1 = 0.4772; and z = 1 gives A2 = 0.3413. Hence:

p(40 ≤ x ≤ 50) = 0.4772 − 0.3413 = 0.1359
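Both parts of the exam-score example can be cross-checked without tables by standardising x and evaluating the cumulative normal probability. This is a minimal Python sketch; `normal_cdf` is a helper name introduced here, and the results should land close to the table-based answers.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """p(X <= x) for a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma  # number of SDs that x lies from the mean
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

mu, sigma = 60.0, 10.0

# a) probability of a score greater than 80%
p_a = 1.0 - normal_cdf(80.0, mu, sigma)

# b) probability of a score between 40% and 50%
p_b = normal_cdf(50.0, mu, sigma) - normal_cdf(40.0, mu, sigma)

print(round(p_a, 4))  # 0.0228
print(round(p_b, 4))  # 0.1359
```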

