Semester 2 2021
STAT600: Probability
2021 Study Guide
This is a second year undergraduate course on probability offered by the School of Engineering,
Computer and Mathematical Sciences at AUT.
Prerequisites
To enrol in this course, you need to have successfully completed STAT500 (Applied Statistics),
MATH501 (Differential and Integral Calculus) and COMP500 or ENSE501 (Programming 1)
or equivalent. If you have not completed these papers, you should contact the lecturer.
Lecturer
Dr Robin Hankin, WT Level 1, ext. 5106, email: robin.hankin@aut.ac.nz
Class Times
Lecture: Monday 4:10pm - 6:00pm, WB327
Lab: Wednesday 4:10pm - 6:00pm, WZ519
Office Hours
Students are very welcome to discuss questions and issues regarding the course with their
lecturer. Office hours will be posted on Blackboard.
Learning Hours
STAT600 is a 15 point paper and this corresponds to 150 learning hours across the semester.
Self-directed learning includes reading the textbook, revising lecture notes and lab exercises, practising exercises, and completing assessments.
Course Outline
Week Topic
1–2 Introduction to Probability
3–4 Discrete Random Variables
5–6 Continuous Random Variables
7 Reliability
Mid semester break
8 – 10 Markov Chains
11 Further Properties of Random Variables
12 Revision
STAT600: Probability 2021 Study Guide Page 2
Assessment
Assessment Weighting Hand Out Hand In
Assignment 1 25 week 4 week 7
Assignment 2 25 week 8 week 11
Exam 50 N/A Exam period
Total 100
*Exact due dates will be confirmed when assignments are handed out.
Late Assignments: Late assignments, without an approved extension, will be subject to a
deduction of 5% (one grade e.g. from C+ to C) of the total mark available for each 24-hour
period, or part thereof, up to a maximum of five calendar days. Assignments over five days late
will not normally be accepted or marked and students will receive a DNC (Did Not Complete)
for that assessment. Note: this policy does not apply to quizzes, the mid-semester test or the
exam.
Extenuating Circumstances: You may apply for special consideration for assessment events
when exceptional circumstances beyond your control, including illness or injury, seriously
affect your physical or mental/emotional ability to: attempt an assessment, or prepare for an
assessment, or perform successfully during an assessment, or complete an assessment on or
by the due date. To apply for special consideration you should complete the online special
consideration form via Blackboard.
Academic Integrity & Plagiarism: Students who are found to have plagiarised in this
paper will be treated very seriously and may be subject to academic disciplinary procedures.
For more information see the AUT Calendar and the faculty assessment polices and regulations
(available on Blackboard under “Assessment”). Students should complete the “Academic
Integrity” module on Blackboard before submitting their first assignment.
Blackboard:
Students are encouraged to regularly check the course website on AUTonline at http://autonline.aut.ac.nz/.
This website contains class announcements, discussion forums, assignment information, class
resources, and updated class marks.
Software
This paper will use the open-source programming language R. R is installed in the School's computer
labs. We will also introduce you to RStudio, which provides a convenient interface for working with
R. Both are available free of charge.
http://cran.r-project.org/ https://www.rstudio.com/products/rstudio-desktop/
Reference books
Readings will be selected from books such as:
• Higgins and Keller-McNulty (1995). Concepts in Probability and Stochastic Modelling. Belmont, CA: Wadsworth.
• Ross, S. (2014). A First Course in Probability (9th ed.). Harlow: Pearson Education Ltd.
• Ross, S. (2014). Introduction to Probability Models (11th ed.). Boston, MA: Academic Press.
• Scheaffer, R.L. and Young, L.J. (2010). Introduction to Probability and Its Applications (3rd ed.). Boston: Cengage Learning.
Contents
Study Guide 2
Contents 4
1 Introduction to Probability 10
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Why Study Probability? . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Deterministic Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Applications of Probability . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Sample Space & Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Sample Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Events of an Experiment . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Set Operators & Notation . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Foundations of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.1 Axioms of Probability . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.2 Probability of Events . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.3 Equally Likely Outcomes . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Counting Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.1 Basic Principle of Counting . . . . . . . . . . . . . . . . . . . . . 21
1.5.2 Generalised Basic Principle of Counting . . . . . . . . . . . . . . 21
1.5.3 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.4 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.5.5 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5.6 Summary of Counting Rules . . . . . . . . . . . . . . . . . . . . . 29
1.5.7 Application to Probability - Examples . . . . . . . . . . . . . . . 29
2.3.2 Multiplicative Rule of Probability . . . . . . . . . . . . . . . . . . 43
2.4 Bayes’ Theorem and the Law of Total Probability . . . . . . . . . . . . . 43
2.4.1 Law of Total Probability . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.2 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Odds, Odds Ratios and Relative Risk . . . . . . . . . . . . . . . . . . . . 45
4 Discrete Distributions 64
4.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Bernoulli Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.2 Binomial Distribution in R . . . . . . . . . . . . . . . . . . . . . . 67
4.3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.2 Geometric Distribution in R . . . . . . . . . . . . . . . . . . . . . 69
4.4.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.4 Properties of the Geometric Distribution . . . . . . . . . . . . . . 74
4.4.5 Alternative Parameterization . . . . . . . . . . . . . . . . . . . . 75
4.5 Negative Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.2 Negative Binomial Distribution in R . . . . . . . . . . . . . . . . 76
4.5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.4 Properties of the Negative Binomial Distribution . . . . . . . . . 78
4.5.5 Alternative Parameterization . . . . . . . . . . . . . . . . . . . . 78
4.6 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.2 Poisson Distribution in R . . . . . . . . . . . . . . . . . . . . . . 79
4.6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.6.4 Properties of the Poisson Distribution . . . . . . . . . . . . . . . . 82
4.7 Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7.2 Hypergeometric Distribution in R . . . . . . . . . . . . . . . . . . 87
4.7.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.8 Distributions in R: Summary . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.9.1 Simulating a Discrete Distribution . . . . . . . . . . . . . . . . . . 90
4.10 Activity: Monty Hall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.5.4 Properties of the Weibull Distribution . . . . . . . . . . . . . . . 138
6.6 Beta distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.6.2 Beta Distribution in R . . . . . . . . . . . . . . . . . . . . . . . . 141
6.6.3 Properties of the Beta distribution . . . . . . . . . . . . . . . . . 141
6.6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.7 Distributions in R: Summary . . . . . . . . . . . . . . . . . . . . . . . . 143
6.8 Simulating Continuous Distributions . . . . . . . . . . . . . . . . . . . . 144
6.8.1 Inverse transformation method . . . . . . . . . . . . . . . . . . . 144
6.8.2 Rejection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7 Reliability: An Application of Continuous Distributions 146
7.1 Introduction to Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.1.1 Reliability Function . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.2 Mean Time to Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.2 Integration: Recap . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.3 Mean Time to Failure for Common Distributions . . . . . . . . . 149
7.2.4 Repairable vs Non Repairable Systems . . . . . . . . . . . . . . . 150
7.3 Modelling Reliability of Systems . . . . . . . . . . . . . . . . . . . . . . . 151
7.3.1 Series Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.3.2 Parallel Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.3.3 Complex Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9 Classification of Markov Chains 187
9.1 State Transition Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.2 Computing Probabilities for Markov Chains . . . . . . . . . . . . . . . . 189
9.2.1 Initial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.3 Classification of States . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.3.1 Accessibility, Communication and Irreducibility . . . . . . . . . . 195
9.3.2 Absorbing, Transient, Recurrent States . . . . . . . . . . . . . . . 197
9.3.3 Periodic States and Ergodic Chains . . . . . . . . . . . . . . . . . 199
9.4 Steady State Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.4.2 Finding the Steady State Probabilities . . . . . . . . . . . . . . . 200
9.4.3 Application of state probabilities . . . . . . . . . . . . . . . . . . 202
9.4.4 Using linear algebra to find steady state probabilities . . . . . . . 203
9.4.5 Using R to find steady state probabilities . . . . . . . . . . . . . . 203
9.5 Simulating a Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.5.1 Simulating a Markov Chain By Hand . . . . . . . . . . . . . . . . 205
9.5.2 Simulating a Markov Chain Using R . . . . . . . . . . . . . . . . 207
9.5.3 Estimating a transition matrix . . . . . . . . . . . . . . . . . . . . 211
A Useful Formula 264
A.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Bibliography 285
Chapter 1
Introduction to Probability
References:
[Ros13, Chapters 1 & 2] [SY10, Chapters 1 & 2]
1.1 Introduction
Exercise: Birthdays
Does anyone in this class have the same birthday?
What is the probability that 2 people in this class have the same birthday?
Discussion: What decisions have you made today? What factors did you take
into account when making these decisions?
• Theory vs Reality
• “All models are wrong, but some are useful” (George Box, 1979)
1.2.2 Probabilistic Models
Example 1.2 Suppose we are going to flip a coin. What will the outcome be?
We do not know for certain, but we can make a statement like, in the long run 1/2 the
outcomes will be heads.
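This long-run idea can be illustrated with a short simulation in R (the seed is arbitrary):

```r
# Flip a fair coin many times and look at the proportion of heads
set.seed(600)                                  # arbitrary seed, for reproducibility
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)
mean(flips == "H")                             # settles near 0.5 in the long run
```

Increasing the number of flips brings the observed proportion closer and closer to 1/2.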
Example 1.3 How many bottles of milk will Newsfeed Cafe use today?
We do not know for certain, but with some data we can make similar statements regarding
the probability of particular outcomes.
• Marketing – will Netflix customer X watch "Designated Survivor" given that they
have watched "Suits"?
Example 1.4
Experiment Sample Space S Cardinality n(S)
Rolling a Die S = {1, 2, 3, 4, 5, 6} 6
An event can also be described as a set of outcomes of an experiment. Events are usually
denoted by uppercase letters: A, B, C . . . An event A occurs whenever the outcome of an
experiment is contained in A.
Example 1.5 Suppose we define an experiment which consists of rolling a six-sided die.
We can define the following events:
A : Even outcomes A = {2, 4, 6} n(A) = 3
B : Outcomes less than or equal to 3, B = {1, 2, 3} n(B) = 3
C : Outcome is greater than 8, C=∅ n(C) = 0
Event C cannot happen and is referred to as the null event or empty set and is denoted
by ∅.
Example 1.6
Experiment: Roll a 6-sided die and observe the number on the uppermost face.
The sample space is S = {1, 2, 3, 4, 5, 6}.
Define the following sets:
A : Even outcomes A = {2, 4, 6}
B : Outcomes less than or equal to 3, B = {1, 2, 3}
C : Outcomes containing 4 or 6, C = {4, 6}
Example 1.7
Experiment: Flip two coins and observe the outcomes.
The sample space is S = {(H, H), (H, T), (T, H), (T, T)}.
Define the following sets:
D : Outcomes with a head first D=
E : Outcomes with exactly one tail, E =
Example 1.8
Experiment: Roll a 6-sided die and observe the number on the uppermost face and then
flip a coin and observe the outcome.
The sample space is S = {(1, H), (2, H), (3, H), (4, H), (5, H), (6, H),
(1, T), (2, T), (3, T), (4, T), (5, T), (6, T)}.
Define the following sets:
G : Outcomes which include an even number G =
H : Outcomes which include exactly one tail, H =
Example 1.9
• A ∪ B = {1, 2, 3, 4, 6}
• G ∪ H =
The complement of event A, denoted Aᶜ (sometimes Ā or A′), is the
set consisting of all outcomes in the sample space which are not in A. Or
equivalently, Aᶜ = {s ∈ S : s ∉ A}.
Example 1.10
• Aᶜ = {1, 3, 5}
• Gᶜ =
Example 1.11
• A ∩ B = AB = {2}
• D ∩ E = DE = {(H, T )}
• G ∩ H =
If two events are disjoint or mutually exclusive, i.e., they have no elements
in common, then their intersection is called the empty set or null event and is
denoted with the symbol ∅.
• F = {(T, T )}
• D ∩ F = DF = ∅
When there are more than two sets, the union and intersections can be defined as follows:
If E1, E2, . . . are events, then the union of these events, denoted ⋃_{n=1}^∞ En,
is defined to be the event that consists of all outcomes that are in En for at
least one value of n = 1, 2, . . ..

If E1, E2, . . . are events, then the intersection of these events, denoted ⋂_{n=1}^∞ En,
is defined to be the event that consists of all outcomes that are in all of the
events En for n = 1, 2, . . ..
[Ros13, p 25]
The set operators (union, intersection and complement) have the following
properties.
Commutative laws: A ∩ B = B ∩ A
A∪B =B∪A
Associative laws: (A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∪ C = A ∪ (B ∪ C)
Distributive laws: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
General case: (⋃_{i=1}^n Ei)ᶜ = ⋂_{i=1}^n Eiᶜ
(⋂_{i=1}^n Ei)ᶜ = ⋃_{i=1}^n Eiᶜ
Example 1.13
• A ∩ B = {2}
• A ∩ C = {4, 6}
• A ∪ B = {1, 2, 3, 4, 6}
• B ∪ C = {1, 2, 3, 4, 6}
Distributive Law
• A ∩ (B ∪ C) = {2, 4, 6}
• (A ∩ B) ∪ (A ∩ C) = {2, 4, 6}
De Morgan's Law
• (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ = {5}
• A ∪ B = {1, 2, 3, 4, 6}
• Aᶜ = {1, 3, 5}
• Bᶜ = {4, 5, 6}
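These identities can be checked numerically in R with its built-in set functions; the helper `comp()` below is our own shorthand for the complement relative to S:

```r
S <- 1:6
A <- c(2, 4, 6); B <- 1:3; C <- c(4, 6)
comp <- function(E) setdiff(S, E)       # complement relative to S

# Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
intersect(A, union(B, C))
## [1] 2 4 6
union(intersect(A, B), intersect(A, C))
## [1] 2 4 6

# De Morgan: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
comp(union(A, B))
## [1] 5
intersect(comp(A), comp(B))
## [1] 5
```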
Example 1.14 The primary power supply to a hospital is provided by the national grid.
If the primary power supply fails (i.e. there is a power cut), then there are two back-up
generators which can provide power to the hospital. Periodically, all three power systems
are tested and each could be found to be operational (o) or not (n).
The sample space for this experiment is:
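One way to enumerate this sample space is with `expand.grid`, with one column per power system; the column names here are illustrative:

```r
# Each of the three power systems is operational (o) or not (n)
systems <- expand.grid(primary = c("o", "n"),
                       gen1    = c("o", "n"),
                       gen2    = c("o", "n"))
nrow(systems)      # 2 x 2 x 2 outcomes
## [1] 8
```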
1.4 Foundations of Probability
1.4.1 Axioms of Probability
Formally, probability must satisfy the following axioms. These axioms were defined by
Kolmogorov so are sometimes referred to as “Kolmogorov’s Axioms”.
1. 0 ≤ P (E) ≤ 1
2. P (S) = 1
3. For any sequence of mutually exclusive events E1, E2, . . . (i.e. Ei ∩ Ej = ∅ when i ≠ j),
P (⋃_{i=1}^∞ Ei) = Σ_{i=1}^∞ P (Ei)
1.4.2 Probability of Events
• P (∅) = 0
• If E ⊂ F , then P (E) ≤ P (F )
Example 1.15
Experiment: Roll a 6-sided die and observe the number on the uppermost face.
The sample space is S = {1, 2, 3, 4, 5, 6}. Consider the following events, A, B, C.
A : Even outcomes A = {2, 4, 6}
B : Outcomes less than or equal to 3, B = {1, 2, 3}
C : Outcome is greater than 8, C=∅
Assuming that each side of the die is equally likely, then P ({1}) = P ({2}) = P ({3}) =
P ({4}) = P ({5}) = P ({6}) = 1/6
• P (C) = 0
Example 1.16 A farmer has decided to plant a new variety of corn on some of his land,
and he has narrowed his choice to one of three varieties, which are numbered 1, 2, and
3. All three varieties have produced good yields in variety trials. Which corn variety
produces the greatest yield depends on the weather. The optimal conditions for each are
equally likely to occur, and none does poorly when the weather is not optimal.
Being unable to choose, the farmer writes the name of each variety on a piece of paper,
mixes the pieces, and blindly selects one. The variety that is selected is purchased and
planted. Let Ei denote the event that variety i is selected (i = 1, 2, 3), let A denote the
event that variety 2 or 3 is selected, and let B denote the event that variety 3 is not
selected. [SY10, p 26]
Find the probabilities of Ei , A, and B.
P (A) = n(A)/n(S) = 3/6
Example 1.18 Consider a class of 20 university students. Students elect one subject
to major in. Of the 20 students, 10 are studying analytics, 6 are studying maths and 4
are studying engineering. A student is chosen at random from this class. What is the
probability that the student is studying engineering?
1.5 Counting Rules
1.5.1 Basic Principle of Counting
Example 1.19 Suppose a retail chain has decided to build one new store in the North
Island and one in the South Island. There are 4 possible locations for the North Island
store (Auckland, Tauranga, Hamilton, Wellington) and 2 possible locations for the South
Island store (Christchurch, Dunedin). How many possible locations are there for the new
stores?
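By the basic principle of counting there are 4 × 2 = 8 possibilities; they can be listed in R with `expand.grid`:

```r
north <- c("Auckland", "Tauranga", "Hamilton", "Wellington")
south <- c("Christchurch", "Dunedin")
pairs <- expand.grid(North = north, South = south)   # every (North, South) pairing
nrow(pairs)
## [1] 8
```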
1.5.3 Permutations
Example 1.21 Consider the letters a, b, c. How many different ordered arrangements
of these letters are possible?
Pᵣⁿ = n(n − 1) · · · (n − r + 1) = n!/(n − r)!
Example 1.22 How many distinct two letter arrangements can be made from the letters
a, b, c, d, e, if each letter can only be selected once?
factorial(5)/factorial(3)
## [1] 20
choose(5, 2)*factorial(2)
## [1] 20
R Demo
# Install package if not already installed and load
if(!("gtools" %in% installed.packages()[,1])) install.packages("gtools")
library(gtools)

# Generate list of permutations
obj <- letters[1:5]
obj.perms <- permutations(5, 2, v=obj, set=TRUE, repeats.allowed=FALSE)
dim(obj.perms)
## [1] 20 2
obj.perms
## [,1] [,2]
## [1,] "a" "b"
## [2,] "a" "c"
## [3,] "a" "d"
## [4,] "a" "e"
## [5,] "b" "a"
## [6,] "b" "c"
## [7,] "b" "d"
## [8,] "b" "e"
## [9,] "c" "a"
## [10,] "c" "b"
## [11,] "c" "d"
## [12,] "c" "e"
## [13,] "d" "a"
## [14,] "d" "b"
## [15,] "d" "c"
## [16,] "d" "e"
## [17,] "e" "a"
## [18,] "e" "b"
## [19,] "e" "c"
## [20,] "e" "d"
1.5.4 Combinations
Cᵣⁿ = n!/((n − r)! r!) = n(n − 1) · · · (n − r + 1)/r!

The expression Cᵣⁿ can be read as "n choose r". Notice that Cᵣⁿ = Pᵣⁿ/r!.
In R, the function choose(n, r) can be used to compute Cᵣⁿ.
Example 1.23 How many groups of two letters can be selected from a, b, c, d, e, if each
letter can only be selected once?
choose(5, 2)
## [1] 10
R demo
# Install package if not already installed and load
# if(!("gtools" %in% installed.packages()[,1]))install.packages("gtools")
library(gtools)
# Generate list of combinations
(obj.combins <- combinations(5, 2, v=obj, set=TRUE, repeats.allowed=FALSE))
## [,1] [,2]
## [1,] "a" "b"
## [2,] "a" "c"
## [3,] "a" "d"
## [4,] "a" "e"
## [5,] "b" "c"
## [6,] "b" "d"
## [7,] "b" "e"
## [8,] "c" "d"
## [9,] "c" "e"
## [10,] "d" "e"
dim(obj.combins)
## [1] 10 2
Example 1.24 The Mathematical Sciences Department has 15 academic staff members.
A group of 4 must be selected to help with the upcoming open day. How many different
groups of staff could be selected for this task?
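Order does not matter here, so the answer is C₄¹⁵; in R:

```r
choose(15, 4)   # number of 4-person groups from 15 staff
## [1] 1365
```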
When choosing r items from n with replacement, where order is not important, the number of selections is
C(n + r − 1, r) = (n + r − 1)!/((n − 1)! r!)
Example 1.25 An entomologist is studying the spatial distribution of insects across
plants. Suppose in a sample of 5 plants, 3 insects have been found. The locations of
these 3 insects across the 5 plants are not known. A plant could support any number of insects
so could have 0, 1, 2 or 3 of the insects. How many different arrangements of the insects
across the plants are possible, if:
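If the insects are treated as indistinguishable (one reading of the question), this is selecting r = 3 plants from n = 5 with replacement, order unimportant:

```r
n <- 5; r <- 3
choose(n + r - 1, r)    # combinations with replacement
## [1] 35
```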
1.5.5 Partitions
The number of ways of partitioning n distinct objects into k groups of sizes n1, n2, . . . , nk (with n1 + n2 + · · · + nk = n) is
n!/(n1! n2! · · · nk!)
This result can also be explained as the number of different permutations of n objects,
of which n1 are alike, n2 are alike, . . ., nk are alike.
Example 1.26 Suppose 10 employees are to be divided among three teams, with 3 going
to team I, 4 going to team II and 3 going to team III. In how many ways can the team
assignments be made? (Adapted from [SY10, p 40])
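This is a partition of 10 people into groups of sizes 3, 4 and 3, so the count is 10!/(3! 4! 3!); in R:

```r
factorial(10) / (factorial(3) * factorial(4) * factorial(3))
## [1] 4200
```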
Example 1.27 How many different arrangements can be formed from the letters:
PEPPER?
1.5.6 Summary of Counting Rules
The number of ways of selecting r items from n, r ≤ n:

                      Order Is Important        Order Is Not Important
With Replacement      nʳ                        C(n + r − 1, r) = (n + r − 1)!/((n − 1)! r!)
Without Replacement   Pᵣⁿ = n!/(n − r)!         Cᵣⁿ = n!/((n − r)! r!)
b. Suppose that within this group there are 7 men and 3 women. All 3 women were
awarded a bonus of $5000. If the bonuses are awarded randomly, what is the proba-
bility that this would happen?
Example 1.29 An import company has received a delivery of 20 products. It will check
the quality of the delivery by inspecting three products. The inspection process involves
destroying the product. The delivery will only be accepted if all three products are
non-defective. Suppose that there are 5 defective items within the delivery of 20. What is the
probability that the delivery will be accepted?
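The delivery is accepted only when all three inspected products come from the 15 non-defective items, so the probability is C₃¹⁵/C₃²⁰; in R:

```r
choose(15, 3) / choose(20, 3)   # all 3 inspected items non-defective
## [1] 0.3991228
```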
Example 1.30 Birthday
What is the probability that two people in this class have the same birthday?
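For a class of n students, assuming 365 equally likely birthdays and ignoring leap years, the complement rule gives P(at least one shared birthday) = 1 − 365 · 364 · · · (365 − n + 1)/365ⁿ. A sketch for an illustrative class size of n = 30:

```r
n <- 30                               # illustrative class size
1 - prod((365 - n + 1):365) / 365^n   # complement of "all birthdays distinct"
## [1] 0.7063162
```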
Chapter 2
Conditional Probability and Independence
References:
[Ros13, Chapter 3] [SY10, Chapter 3]
2.1 Introduction
Example 2.1 The Titanic was a large luxury ocean liner that was declared to be an
“unsinkable ship.” During its maiden voyage across the Atlantic Ocean, it hit an iceberg
and sank on April 14, 1912. Large numbers of people lost their lives. The economic status
of the passengers has been roughly grouped according to whether they were travelling
first class, second class, or third class. The crew has been reported separately. Although
the exact numbers are still a matter of debate, one report (Dawson, 1995) of the numbers
of those who did and did not survive, by economic status and gender is displayed in the
table below. [SY10, p 87]
This data is available in R.
apply(Titanic, c("Sex", "Survived"), sum)
## Survived
## Sex No Yes
## Male 1364 367
## Female 126 344
apply(Titanic, c("Age", "Survived"), sum)
## Survived
## Age No Yes
## Child 52 57
## Adult 1438 654
Use ?Titanic to get more information about the data. This dataset can be printed in
different ways depending on the question of interest.
# Class vs Gender
apply(Titanic, c("Class", "Sex"), sum)
## Sex
## Class Male Female
## 1st 180 145
## 2nd 179 106
## 3rd 510 196
## Crew 862 23
# Class vs Gender vs Survival
apply(Titanic, c("Class", "Sex", "Survived"), sum)
## , , Survived = No
##
## Sex
## Class Male Female
## 1st 118 4
## 2nd 154 13
## 3rd 422 106
## Crew 670 3
##
## , , Survived = Yes
##
## Sex
## Class Male Female
## 1st 62 141
## 2nd 25 93
## 3rd 88 90
## Crew 192 20
2.2 Conditional Probability
2.2.1 Conditional Probability Definition
The vertical line | is read as “given”, so P (A|B) is the probability that event A has
occurred, given that event B has occurred.
To compute a conditional probability, B becomes the new sample space, and the event
of interest is the intersection of A and B.
Example 2.2 A die is thrown and the number on the uppermost face is observed. If it
is known that the number is even, what is the probability that the number is a 2?
A = {2}
B = {2, 4, 6}
P (A|B) = n(A ∩ B)/n(B) = 1/3.
Example 2.3 Refer to the following table of Titanic data and the tables shown earlier.
## Survived
## Class No Yes Sum
## 1st 122 203 325
## 2nd 167 118 285
## 3rd 528 178 706
## Crew 673 212 885
## Sum 1490 711 2201
a. Given that a passenger was travelling in first class, what is the probability that he/she
survived?
b. Given that a passenger was male and travelling in third class, what is the probability
that he survived?
Example 2.4 There are four batteries and one is defective. Two are to be selected at
random for use on a particular day. Find the probability that the second battery selected
is not defective, given that the first was not defective. [SY10, ex3.2 p 60]
Example 2.5 It is known that 0.5% of the population have disease A. Of those that have
the disease, 99% will test positive for the disease. Of those that do not have the disease
2% test positive for the disease.
b. What is the probability that a person chosen at random has the disease?
c. What is the probability that a person chosen at random received a positive test?
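Parts b and c follow directly from the stated rates; P(positive) uses the law of total probability (treated formally in Section 2.4):

```r
p_disease <- 0.005                    # P(disease)
p_pos_d   <- 0.99                     # P(positive | disease)
p_pos_nd  <- 0.02                     # P(positive | no disease)

# Law of total probability: P(+) = P(+|D)P(D) + P(+|no D)P(no D)
p_pos <- p_pos_d * p_disease + p_pos_nd * (1 - p_disease)
p_pos
## [1] 0.02485
```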
2.2.3 Types of Errors
                    True Diagnosis
                Positive         Negative
Test Result
  Positive      True Positive    False Positive
  Negative      False Negative   True Negative
Example 2.6 Previously we considered testing for Disease A. For this example, con-
struct a table to show the rates of true positive, true negative, false positive and false
negatives.
                    True Diagnosis
                Positive   Negative   Sum
Test Result
  Positive
  Negative
  Sum
A “good” test should have sensitivity and specificity values close to 1.
A third measure which is important when considering a test is the predictive value.
Ideally, sensitivity, specificity and predictive value should all be close to 1. However, the
predictive value is strongly influenced by the prevalence of the disease.
Example 2.7 Previously we considered testing for Disease A. For this example, compute
the sensitivity, specificity and predictive value for the test.
The following example highlights the importance of considering prevalence of a disease
when interpreting screening tests.
Example 2.8 For the three scenarios below compute the sensitivity, specificity, prevalence
and predictive value of the test.
Scenario 1:
                    True Diagnosis
                Positive   Negative   Sum
Test Result
  Positive          90         10     100
  Negative          10         90     100
  Sum              100        100     200

Scenario 2:
                    True Diagnosis
                Positive   Negative   Sum
Test Result
  Positive          90        100     190
  Negative          10        900     910
  Sum              100       1000    1100

Scenario 3:
                    True Diagnosis
                Positive   Negative   Sum
Test Result
  Positive          90       1000    1090
  Negative          10       9000    9010
  Sum              100      10000   10100
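A sketch that computes all four measures from a 2 × 2 table; the helper name and its argument order (true/false positives and negatives) are our own:

```r
summarise_test <- function(tp, fp, fn, tn) {
  c(sensitivity = tp / (tp + fn),                 # P(test + | diseased)
    specificity = tn / (tn + fp),                 # P(test - | not diseased)
    prevalence  = (tp + fn) / (tp + fp + fn + tn),
    predictive  = tp / (tp + fp))                 # P(diseased | test +)
}
summarise_test(tp = 90, fp = 10,   fn = 10, tn = 90)     # Scenario 1
summarise_test(tp = 90, fp = 100,  fn = 10, tn = 900)    # Scenario 2
summarise_test(tp = 90, fp = 1000, fn = 10, tn = 9000)   # Scenario 3
```

Sensitivity and specificity are 0.9 in every scenario, yet the predictive value falls from 0.9 towards zero as the prevalence drops.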
2.2.4 Is Conditional Probability a Probability?
We can assess P (A|B) = P (AB)/P (B) against the probability axioms to determine
whether or not it is a probability.
Since AB ⊂ B, we have P (AB) ≤ P (B), and also P (AB) ≥ 0. Therefore the first
axiom is satisfied, i.e.
0 ≤ P (A|B) = P (AB)/P (B) ≤ 1
The second axiom:
P (S|B) = P (SB)/P (B) = P (B)/P (B) = 1
For the third axiom, if A1, A2, . . . are mutually exclusive events, then A1B, A2B, . . . are
also mutually exclusive, and:

P (⋃_{i=1}^∞ Ai | B) = P ((⋃_{i=1}^∞ Ai)B)/P (B)
                     = (Σ_{i=1}^∞ P (Ai B))/P (B)
                     = Σ_{i=1}^∞ P (Ai B)/P (B)
                     = Σ_{i=1}^∞ P (Ai |B)

So P (·|B) satisfies all three axioms and is itself a probability.
2.3 Independence
2.3.1 Independent Events
Two events A and B are independent if knowledge about one does not affect knowledge
about the other. If A occurs, it does not change the probability of B, and vice versa.
Example 2.9 Which of the following do you think would be independent events?
• Day of week on which a person was born and whether they like chocolate.
Example 2.11 Consider the Titanic survival data. Are the events “being a female
passenger” and “surviving” independent?
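One informal check in R, reading "female passenger" as excluding the crew (an interpretation): compare P(survived | female passenger) with the overall passenger survival rate. If the events were independent, the two would be equal.

```r
tab <- apply(Titanic, c("Class", "Sex", "Survived"), sum)
pax <- tab[c("1st", "2nd", "3rd"), , ]             # passengers only (drop crew)
p_surv_female <- sum(pax[, "Female", "Yes"]) / sum(pax[, "Female", ])
p_surv        <- sum(pax[, , "Yes"]) / sum(pax)
c(p_surv_female, p_surv)    # roughly 0.72 vs 0.38: far apart, so not independent
```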
Example 2.12 Blood type, the best known of the blood factors, is determined by a
single allele. Each person has blood type A, B, AB or O. Type O represents the absence
of a factor and is recessive to factors A and B. Thus, a person with type A blood may
be either homozygous (AA) or heterozygous (AO) for this allele; similarly, a person with
type B blood may be either homozygous (BB) or heterozygous (BO). Type AB occurs if
a person is given an A factor by one parent and a B factor by the other parent. To have
type O blood an individual must be homozygous O (OO). Suppose a couple is preparing
to have a child. One parent has blood type AB and the other is heterozygous B. [SY10,
p 72]
What are the possible blood types the child will have and what is the probability of each?
2.3.2 Multiplicative Rule of Probability
P (A ∩ B) = P (A|B)P (B) = P (B|A)P (A)
[Ros13]

2.4 Bayes' Theorem and the Law of Total Probability
2.4.1 Law of Total Probability
The law of total probability states that for mutually exclusive events
B1, B2, . . . , Bn such that Σ_{i=1}^n P (Bi) = 1,
P (A) = Σ_{i=1}^n P (A ∩ Bi)
Or equivalently,
P (A) = Σ_{i=1}^n P (A|Bi)P (Bi).
2.4.2 Bayes' Theorem
P (Bj|E) = P (E|Bj)P (Bj) / Σ_{i=1}^n P (E|Bi)P (Bi)

If the events Bi, i = 1, . . . , n are competing hypotheses, then Bayes' formula gives the
conditional probabilities of these hypotheses when evidence E becomes available.
Notice that the denominator of this formula uses the law of total probability.
Example 2.13 A company buys microchips from three suppliers – I, II, and III. Supplier
I has a record of providing microchips that contain 10% defectives; Supplier II has a
defective rate of 5%; and Supplier III has a defective rate of 2%. Suppose 20%, 35%, and
45% of the current supply came from Suppliers I, II, and III, respectively. [SY10, p 79]
b. If a microchip is selected at random from this supply, what is the probability that it
is defective?
c. If a randomly selected microchip is defective, what is the probability that it came from
supplier II?
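Parts b and c in R, using the supplier proportions as priors:

```r
prior  <- c(I = 0.20, II = 0.35, III = 0.45)   # P(supplier)
defect <- c(I = 0.10, II = 0.05, III = 0.02)   # P(defective | supplier)

# b. Law of total probability
p_defective <- sum(defect * prior)
p_defective
## [1] 0.0465

# c. Bayes' theorem
unname(defect["II"] * prior["II"] / p_defective)
## [1] 0.3763441
```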
2.5 Odds, Odds Ratios and Relative Risk
Let F be an event. The odds of event F, or the odds in favour of event F,
are given by
P (F )/P (Fᶜ)
Example 2.14 Consider a fair coin in which P (Heads) = P (Tails) = 0.5. The odds in
favour of heads are
P (H)/P (T ) = 0.5/0.5 = 1
We can say the odds of obtaining heads are 1 to 1, or "even".
Example 2.15 Consider a horse race in which the probability of horse A winning is 0.8.
Then the odds in favour of horse A are
P (A)/P (Aᶜ) = 0.8/0.2 = 4
Chapter 3
Random Variables
References:
[Ros13, Chapter 4] [SY10, Chapter 4]
3.1 Random variables
In other words, a random variable is a function that assigns a real number to every
member of the sample space.
The value X = x is the event that the random variable X takes the value x. It is
convention that:
• Continuous random variable — X is continuous and takes values over a real interval.
• Time (in minutes) until the arrival of the next bus at a given bus stop, X ∈ R+
3.2 Discrete Random Variables
3.2.1 Probability Mass Function
(i) P (X = x) = p(x) ≥ 0
(ii) Σ_x p(x) = 1
Example 3.1 Two coins are flipped. Let X denote the number of heads observed. Find
and graph the pmf of X.
[pmf graph: x = 0, 1, 2]
Example 3.2 A local video store periodically puts its used movies in a bin and offers to
sell them to customers at a reduced price. Twelve copies of a popular movie have just
been added to the bin, but three of these are defective. A customer randomly selects two
of the copies for gifts. Let X be the number of defective movies the customer purchased.
Find and graph the probability function for X. [SY10, p 96]
[pmf graph: x = 0, 1, 2]
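X follows a hypergeometric distribution (2 copies sampled from 12, of which 3 are defective), so the pmf can be sketched with dhyper:

```r
# X = number of defective copies in a sample of 2 from 12 (3 defective)
x  <- 0:2
px <- dhyper(x, m = 3, n = 9, k = 2)   # m = defectives, n = good copies, k = sample size
MASS::fractions(px)                    # 6/11, 9/22, 1/22
barplot(px, names.arg = x, ylab = "p(x)")
```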
3.2.2 Distribution Function
F (b) = P (X ≤ b)
If X is discrete, then
F(b) = Σ_{x ≤ b} p(x)
1. lim_{x→−∞} F(x) = 0
2. lim_{x→∞} F(x) = 1
3. The distribution function is a nondecreasing function; that is, if a < b then F(a) ≤ F(b).
Example 3.3 Two coins are flipped. Let X denote the number of heads observed. Find
the distribution function (cdf) of X.
F(x) = 0,     x < 0
       0.25,  0 ≤ x < 1
       0.75,  1 ≤ x < 2
       1,     x ≥ 2
Example 3.4 Recall the video store example.
px <- c(36, 27, 3)/66   # pmf from Example 3.2
px
cumsum(px)
MASS::fractions(cumsum(px))
[Figure: step plot of the cdf F(x)]
c. Verify that the function F (x) is indeed a distribution function using the properties.
1. lim_{x→−∞} F(x) = 0
2. lim_{x→∞} F(x) = 1
3. The distribution function is a nondecreasing function; that is, if a < b then F(a) ≤ F(b).
4. The distribution function is right-hand continuous; that is, lim_{h→0⁺} F(x + h) = F(x)
3.3 Expected Value and Moments
3.3.1 Expected Value
The expected value of X can be thought of as the average of the random variable and is
often called the mean of X and is denoted by µ.
Example 3.5 Two coins are flipped. Let X denote the number of heads observed. Find
the expected value E[X].
Example 3.6 Compute the expected value of X for the video store example, i.e. compute
the expected number of defective movies bought.
In R:
x <- 0:2
px <- c(36, 27, 3)/66   # pmf from Example 3.2
MASS::fractions(px)
EX <- sum(x*px)
EX
## [1] 0.5
3.3.2 Expected Value of a Function
Example 3.7 Suppose you decided to play the following game with a friend. Two coins
are flipped and the number of heads is observed. Let X denote the number of heads. For
each of the following games, compute your expected winnings and determine if the game
is fair (i.e. has an expected value of 0).
a. If no heads are observed, you must pay your friend $1. If one head is observed, your
friend pays you $1 and if two heads are observed, your friend pays you $2.
b. If no heads are observed, you must pay your friend $4. If one head is observed, your
friend pays you $1 and if two heads are observed, your friend pays you $2.
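The expected winnings can be computed directly from the pmf of X; a sketch with the payoffs as stated above:

```r
px <- c(0.25, 0.5, 0.25)   # pmf of X = number of heads (0, 1, 2)

winA <- c(-1, 1, 2)        # game a payoffs
sum(winA * px)             # 0.75: game a favours you, so it is not fair

winB <- c(-4, 1, 2)        # game b payoffs
sum(winB * px)             # 0: game b is fair
```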
3.3.3 Variance and Standard Deviation
STD[X] = √VAR[X] = √0.5 = 0.7071068
Example 3.9 Compute the variance and standard deviation of X for the video store
example.
Example 3.10 Verify that VAR[X] = E[(X − µ)2 ] does in fact equal E[X 2 ] − µ2 .
3.3.4 Functions of a Random Variable
Let X be a random variable with expected value E[X] and variance VAR[X].
Then:
Tchebysheff’s Theorem is useful when the mean and variance are known, but the distri-
bution is unknown.
Example 3.11 The daily production of electric motors at a certain factory averages 120
with a standard deviation of 10. [SY10, p 113]
1. What can be said about the fraction of days on which the production level falls
between 100 and 140?
2. Find the shortest interval certain to contain at least 90% of the daily production
levels.
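Tchebysheff's bound 1 − 1/k² can be evaluated directly; a sketch of both parts with mean 120 and standard deviation 10:

```r
mu <- 120; sigma <- 10

# 1. The interval 100 to 140 is mu +/- 2*sigma, so k = 2
k <- (140 - mu)/sigma
1 - 1/k^2                        # at least 0.75 of days

# 2. Solve 1 - 1/k^2 = 0.90  =>  k = sqrt(10)
k90 <- sqrt(1/(1 - 0.90))
mu + c(-1, 1) * k90 * sigma      # approximately (88.38, 151.62)
```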
3.3.5 Moments
The second moment may be familiar from the following definition of the variance of X.
VAR[X] = E[X 2 ] − µ2
Example 3.12 Let the discrete random variable X have the following probability mass
function:
x 0 1 2 3 4
p(x) 0.1 0.3 0.4 0.1 0.1
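The first two moments of this pmf can be computed directly; a sketch:

```r
x  <- 0:4
px <- c(0.1, 0.3, 0.4, 0.1, 0.1)

EX  <- sum(x * px)     # first moment (mean): 1.8
EX2 <- sum(x^2 * px)   # second moment: 4.4
VX  <- EX2 - EX^2      # variance: 1.16
c(EX, EX2, VX)
```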
3.3.6 Example: Production Line
A manufacturing company in concerned about the number of defects on its production
lines. Let:
F (x) = P (X ≤ x)
F (y) = P (Y ≤ y)
3. The production of defects costs the company money. The financial director has
authorised the repair of one production line. Which production line do you think
should be repaired? Justify your answer.
4. Find the expected value and variance of X and Y .
5. Suppose that to repair a defect created by machine X costs $5. Find the expected
value and variance of the cost of repairing defects per hour.
6. Verify your answer to the previous question by simulating this scenario in R.
## [1] 3.49145
## [1] 15.2202
Chapter 4
Discrete Distributions
References:
[Ros13, Chapter 4]
[SY10, Chapter 4]
4.1 Discrete Distributions
There are several fundamental discrete distributions that apply for a large number of
practical problems. These include:
• Bernoulli
• Binomial
• Geometric
• Negative Binomial
• Poisson
• Hypergeometric
4.2 Bernoulli Distribution
A Bernoulli trial is a random experiment with exactly two possible outcomes. For
example, tossing a coin and observing either a head or a tail; a cow being pregnant
or not; a product being defective or not. The two outcomes are often referred to as a
success (often denoted by a 1) and a failure (often denoted by a 0).
Suppose one Bernoulli trial is conducted, in which the probability of success is p. Let X
be a random variable in which
X = 0, if the outcome of the trial is a failure
    1, if the outcome of the trial is a success
p(x) = px (1 − p)1−x , x = 0, 1
The expected value of X is E[X] = Σ_x x p(x) = 0 × (1 − p) + 1 × p = p.
Bernoulli Distribution:
p(x) = px (1 − p)1−x , x = 0, 1
Mean: E[X] = p
Variance: VAR[X] = p(1 − p)
4.3 Binomial Distribution
4.3.1 Definition
2. Each trial has only two outcomes (i.e. each trial is a Bernoulli trial)
Mean: E[X] = np
Variance: VAR[X] = np(1 − p)
Notice that: P(X = k + 1)/P(X = k) = [p/(1 − p)] × [(n − k)/(k + 1)]

In the probability function for X, the term C(n, k) represents the number of ways of
allocating k successes in n trials: C(n, k) = n!/(k!(n − k)!).
• Number of rivers out of a sample of n that breached their banks last winter
4.3.2 Binomial Distribution in R
dbinom(x, n, p)
pbinom(x, n, p)
qbinom(Fx, n, p)
rbinom(numSims, n, p)
Example 4.2 Use R to find the probability function and distribution function of the
binomial distribution with n = 10 and p = 0.2 for values of x = 0, 1, . . . , 10:
4.3.3 Examples
Example 4.3 An industrial firm supplies 10 manufacturing plants with a certain chem-
ical. The probability that any one plant calls in an order on a given day is 0.2, and this
is the same for all 10 plants. [SY10, ex 4.12]
a. Find the probability that on the given day, the number of plants calling in orders is:
i. Exactly 3
ii. At most 3
iii. At least 3
b. Find the expected value and variance for number of plants calling in an order on a
given day.
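A sketch of these calculations in R, with X ∼ Binomial(10, 0.2):

```r
n <- 10; p <- 0.2

dbinom(3, n, p)       # i.   P(X = 3)  = 0.2013
pbinom(3, n, p)       # ii.  P(X <= 3) = 0.8791
1 - pbinom(2, n, p)   # iii. P(X >= 3) = 0.3222
n * p                 # E[X]   = 2
n * p * (1 - p)       # VAR[X] = 1.6
```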
4.4 Geometric Distribution
4.4.1 Definition
Example 4.5 Use R to find the probability of observing 3 failures before the first success,
if the probability of success is 0.2.
probSuccess <- 0.2
numFailures <- 3
dgeom(numFailures, probSuccess)
## [1] 0.1024
probSuccess*(1-probSuccess)^numFailures
## [1] 0.1024
4.4.3 Examples
Example 4.6 A recruiting firm finds that 20% of the applicants for a particular sales
position are fluent in both English and Spanish. Applicants are selected at random from
the pool and interviewed sequentially. [SY10, ex 4.15 and 4.16]
1. Find the probability that five applicants are interviewed before finding the first
applicant who is fluent in both English and Spanish.
2. Let X denote the number of unqualified applicants interviewed prior to the first
qualified one. Suppose that the first applicant who is fluent in both English and
Spanish is offered the position, and the applicant accepts. Suppose each interview
costs $125.
(a) Find the expected value and the standard deviation of the total cost of inter-
viewing until the job is filled.
(b) The mean and variance of the total cost are known, but the distribution is
not. Use Tchebysheff’s Theorem to determine the interval in which this cost
should be expected to fall at least 75% of the time.
3. Find the probability that at least five applicants are interviewed before finding the
first applicant who is fluent in both English and Spanish.
4. Find the probability that at least four applicants are interviewed before finding the
first applicant who is fluent in both English and Spanish.
5. Given at least five applicants are interviewed before finding the first applicant who
is fluent in both English and Spanish, find the probability that at least 9 unqualified
applicants are interviewed before the first qualified one.
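A sketch of these calculations in R, with X ∼ Geometric(0.2) counting unqualified applicants before the first qualified one:

```r
p <- 0.2

dgeom(5, p)                           # 1. P(X = 5) = 0.2 * 0.8^5
EX <- (1 - p)/p                       # mean number of failures = 4
cost_mean <- 125 * (EX + 1)           # 2a. E[cost] = 625
cost_sd   <- 125 * sqrt((1 - p)/p^2)  #     sd of cost, about 559
cost_mean + c(-2, 2) * cost_sd        # 2b. Tchebysheff 75% interval (k = 2)
(1 - p)^5                             # 3. P(X >= 5) = 0.328
(1 - p)^4                             # 4. P(X >= 4) = 0.41
(1 - p)^4                             # 5. memoryless: P(X >= 9 | X >= 5) = P(X >= 4)
```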
4.4.4 Properties of the Geometric Distribution
The geometric distribution is the only discrete distribution to have the memoryless property.
If we have observed j failures, then the probability of observing at least k more failures
(i.e. j + k failures in total) before a success is the same as the probability, from the
start, of observing at least k failures. That is, for integers j, k > 0

P(X ≥ j + k | X ≥ j) = P(X ≥ k)

P(X ≥ j + k | X ≥ j) = P((X ≥ j + k) ∩ (X ≥ j)) / P(X ≥ j)
                     = P(X ≥ j + k) / P(X ≥ j)
                     = (1 − p)^{j+k} / (1 − p)^j
                     = (1 − p)^k
                     = P(X ≥ k)
4.4.5 Alternative Parameterization
If instead, X is defined as the number of trials until the first success (rather than the
number of failures), then the X also has a Geometric distribution.
Then for X ∼ Geometric(p)

P(X = x) = p(1 − p)^{x−1}, x = 1, 2, . . . ; 0 ≤ p ≤ 1

Mean: E[X] = 1/p
Variance: VAR[X] = (1 − p)/p²
• Number of balls a cricket player will face before getting out (if each ball is indepen-
dent and probability of getting out is the same on each ball)
Example 4.8 The number of weeds within a randomly selected square meter of a pasture
has been found to be well modelled using the geometric distribution. For a given
pasture, the number of weeds per square meter averages 0.5. What is the probability
that no weeds will be found in a randomly selected square meter of this pasture? [SY10,
ex 4.18]
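Here the mean count fixes p: with mean (1 − p)/p = 0.5 we get p = 2/3, and P(X = 0) = p. A sketch:

```r
# mean number of weeds per square metre = (1 - p)/p = 0.5  =>  p = 2/3
p <- 1/(1 + 0.5)
dgeom(0, p)   # P(X = 0) = 2/3
```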
4.5 Negative Binomial Distribution
4.5.1 Definition
Example 4.10 Use R to find the probability of observing 3 failures before the second
success, if the probability of success is 0.2.
probSuccess <- 0.2
numFailures <- 3
numSuccesses <- 2
dnbinom(numFailures, numSuccesses, probSuccess)
## [1] 0.08192
choose(numFailures+numSuccesses-1, numSuccesses-1)*
(probSuccess^numSuccesses)*(1-probSuccess)^numFailures
## [1] 0.08192
4.5.3 Examples
Example 4.11 Suppose that 20% of the applicants for a certain sales position are fluent
in English and Spanish. Suppose that four jobs requiring fluency in English and Spanish
are open. Find the probability that two unqualified applicants are interviewed before
finding the fourth qualified applicant, if the applicants are interviewed sequentially and
at random.
R code:
probSuccess <- 0.2
numFailures <- 2
numSuccesses <- 4
dnbinom(numFailures, numSuccesses, probSuccess)
## [1] 0.01024
choose(numFailures+numSuccesses-1,
numSuccesses-1)*probSuccess^numSuccesses*
(1-probSuccess)^numFailures
## [1] 0.01024
4.5.4 Properties of the Negative Binomial Distribution
• Extension of the geometric distribution
• Very flexible distribution as it can take different shapes depending on the parameter
values.
P(X = x) = C(x − 1, r − 1) p^r (1 − p)^{x−r}, x = r, r + 1, . . .

Mean: E[X] = r/p
Variance: VAR[X] = r(1 − p)/p²
4.6 Poisson Distribution
4.6.1 Definition
Poisson random variables are useful for modelling the occurrence of random phenomena.
The random variable X is the number of events observed and λ can be interpreted as the
rate at which events occur.
Example 4.13 Applications of the Poisson distribution
Example 4.14 A certain type of event occurs at a rate of 2 per day. What is the
probability that on a particular day, 3 events are observed?
lambda <- 2
numEvents <- 3
# P(X = 3)
dpois(numEvents, lambda)
## [1] 0.180447
exp(-lambda)*(lambda^numEvents)/factorial(numEvents)
## [1] 0.180447
History.
The Poisson distribution is named after Siméon Denis Poisson
(1781 – 1840), a French mathematician. In addition to giving
his name to this important distribution he made contributions in
other areas of science. He developed an expression for the force
of gravity (in terms of the distribution of mass within a planet),
which has been used for determining details of the Earth’s shape,
by measuring the paths of orbiting satellites.
Borovkov, K. (2003), Elements of Stochastic Modelling, World Scientific, Singapore. Image from:
http://en.wikipedia.org/wiki/Simeon_Denis_Poisson
4.6.3 Examples
Example 4.15 During business hours, the number of calls passing through a particular
cellular relay system averages five per minute. [SY10, ex 4.22]
1. Find the probability that no call will pass through the relay system during a given
minute.
2. Find the probability that no call will pass through the relay system during a 2-
minute period.
3. Find the probability that three calls will pass through the relay system during a
2-minute period.
4. Find the probability that no more than two calls pass through the system in a given
minute.
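A sketch of all four parts; over t minutes the number of calls is Poisson with mean 5t:

```r
lambda <- 5              # calls per minute

dpois(0, lambda)         # 1. P(no calls in 1 minute)  = e^-5
dpois(0, 2 * lambda)     # 2. P(no calls in 2 minutes) = e^-10
dpois(3, 2 * lambda)     # 3. P(3 calls in 2 minutes)
ppois(2, lambda)         # 4. P(at most 2 calls in 1 minute)
```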
Example 4.16 Derive the mean of the Poisson distribution
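A sketch of the derivation, using the series expansion of e^λ:

```latex
E[X] = \sum_{x=0}^{\infty} x \, \frac{e^{-\lambda}\lambda^{x}}{x!}
     = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
     = \lambda e^{-\lambda} e^{\lambda}
     = \lambda
```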
If we assume:
• that the subintervals are small enough that the probability that one contains more
than one accident is 0
Then the number of subintervals containing accidents, and thus the number of accidents
in a week, is a binomial random variable. However, in this scenario we do not know n or
p, but we would expect that as n increases, p decreases.
Poisson Approximation to the Binomial Distribution
Some probability distributions, like the Poisson, come about by limiting the arguments
applied to other distributions. Let’s examine what happens to the binomial distribution
as n increases and p decreases.
For large n, the binomial distribution can be approximated by the Poisson distribution.
This suggests that Poisson distribution would be a good model when there is a large
number of independent trials each with the same probability.
P(X = k) = C(n, k) (λ/n)^k (1 − λ/n)^{n−k}
As n increases whilst holding the mean, np, constant:
The Poisson distribution can be used to measure the occurrence of rare events in
space and volume as well as in time.
Example 4.18 Applications of the Poisson distribution to modelling “rare” events
Example 4.19 Suppose each of n people is equally likely to have his or her birthday
on any of the 365 days of the year. What is the probability that a set of n independent
people all have different birthdays?
We solved this problem in week 1, but here we will solve it using the Poisson approxima-
tion. [Ros13, p 149]
R simulation of the birthday problem
set.seed(123123)
options(scipen=10, digits=7)
numSims <- 100000
people <- c(10, 20, 30, 40)   # group sizes to compare (values assumed)
results <- data.frame(numPeople = numeric(length(people)),
                      simulation = numeric(length(people)),
                      actual = numeric(length(people)),
                      PoissonApprox = numeric(length(people)))
for(i in 1:length(people)){
numPeople <- people[i]
results[i, "numPeople"] <- numPeople
# Simulation
uniqueBirthday <- replicate(numSims,
length(unique(sample(1:365, numPeople,
replace=TRUE)))==numPeople)
results[i, "simulation"] <- 1-table(uniqueBirthday)["FALSE"]/numSims
# Actual probability
results[i, "actual"] <- choose(365, numPeople )*factorial(numPeople)/(365^numPeople)
# Poisson Approximation
lambda <- choose(numPeople, 2)/365
results[i, "PoissonApprox"] <- dpois(0, lambda)
}
results
4.7 Hypergeometric Distribution
4.7.1 Definition
Suppose that a lot consists of N items, of which k are of one type (success)
and N − k are of another type (failures). Suppose that n items are sampled
randomly and sequentially from the lot, without replacement. Let X denote
the number of successes amongst the n sampled items, the X is said to have a
hypergeometric distribution.
P(X = x) = C(k, x) C(N − k, n − x) / C(N, n),  for x = 0, 1, . . . , k
           0,                                   otherwise
(with the convention C(a, b) = 0 if b > a)

Mean: E[X] = n(k/N)
Variance: VAR[X] = n (k/N) (1 − k/N) ((N − n)/(N − 1))
The pmf of the hypergeometric distribution can be understood intuitively as follows. For
x = 0, 1, 2, . . . , k (with C(a, b) = 0 if b > a):

P(X = x) = C(k, x) C(N − k, n − x) / C(N, n)
         = (choose x successes from k) × (choose n − x failures from N − k)
           / (choose n items from N)
4.7.2 Hypergeometric Distribution in R
dhyper(x, k, N-k, n)
phyper(x, k, N-k, n)
qhyper(x, k, N-k, n)
rhyper(numSims, k, N-k, n)
x = number of successes
k = total number of successes
N = total number of items
N - k = total number of failures
n = number of items selected
4.7.3 Examples
Example 4.21 Two positions are open in a company. Ten men and five women have
applied for a job at this company, and all are equally qualified for either position. The
manager randomly hires two people from the applicant pool to fill the positions. What
is the probability that a man and a woman were chosen? [SY10, ex 4.25]
R code:
totalNumSuccesses <- 10 #k
totalNumFailures <- 5 #N-k
numSuccesses <- 1 #x
numTrials <- 2 #n
dhyper(numSuccesses, totalNumSuccesses, totalNumFailures, numTrials)
## [1] 0.47619
10/21
## [1] 0.47619
choose(totalNumSuccesses, numSuccesses)*
choose(totalNumFailures, numTrials-numSuccesses)/
choose(totalNumSuccesses+totalNumFailures, numTrials)
## [1] 0.47619
Example 4.22 An auditor checking the accounting practices of a firm samples 4 accounts
from an accounts receivable list of 12. Find the probability that the auditor sees at least
one past-due account under the following conditions: [SY10, ex 4.111]
Let X denote the number of past-due accounts in a sample of 4 from 12, in which
2 of 12 are past-due.
N = 12, n = 4, k = 2
P(X ≥ 1) = 1 − P(X = 0)
         = 1 − C(2, 0)C(10, 4)/C(12, 4)
         = 0.575758
1-dhyper(0, 2, 10, 4)
## [1] 0.575758
1-choose(2,0)*choose(12-2, 4-0)/choose(12, 4)
## [1] 0.575758
3. There are 8 such accounts among the 12
4.9 Simulation
• Widely used technique
• Considerations include:
  – number of runs
  – starting conditions
  – length of each simulation run
  – accuracy of the model (compared with system being modelled)
The random variables X1 , X2 , . . . , Xn are a random sample if they are independent and
identically distributed (iid).
The distribution resulting from the random variables in a random sample is called the
empirical probability distribution.
This function will differ each time a new sample is taken. When we simulate a random
discrete variable, we are collecting a random sample of values from that distribution. As
the size of the sample increases the empirical probability distribution will converge to
the theoretical distribution. As n → ∞, empirical probability distribution → theoretical
distribution.
set.seed(12345)
dbinom(0:5, 5, 0.3)
table(rbinom(10000, 5, 0.3))
##
## 0 1 2 3 4 5
## 1653 3624 3135 1290 276 22
2. Write your own simulation using uniform random numbers between 0 and 1.
Example 4.23 Let X be a random variable with the following probability mass function:
x 10 20 30 40 50
p(x) 0.2 0.3 0.1 0.3 0.1
F (x) 0.2 0.5 0.6 0.9 1
R code for simulation
set.seed(9293)
x <- seq(10, 50, 10)
px <- c(0.2, 0.3, 0.1, 0.3, 0.1)
Fx <- cumsum(px)
Fx
# Method 1
numSims <- 10000                        # matches the tabulated counts below
u.all <- runif(numSims)                 # uniform random numbers
results <- data.frame(u = u.all, x1 = NA_real_)
for(i in 1:numSims){
results[i, "x1"] <- x[min(which(u.all[i] < Fx))]
}
table(results[,"x1"])
##
## 10 20 30 40 50
## 2042 2951 995 3020 992
# Method 2
results$x2 <- sapply(u.all, function(u)x[min(which(u < Fx))])
table(results[,"x2"])
##
## 10 20 30 40 50
## 2042 2951 995 3020 992
head(results)
## u x1 x2
## 1 0.4068846 20 20
## 2 0.3058260 20 20
## 3 0.7390002 40 40
## 4 0.8640917 40 40
## 5 0.6610237 40 40
## 6 0.1691787 10 10
4.10 Activity: Monty Hall
Suppose you are a contestant on a game show and the host (Monty) shows you 3 doors.
Behind 2 of the doors are goats and behind 1 of the doors is a car. You do not know the
location of the car, but Monty does.
At the start of the game, you choose a door. Monty then opens one of the other two
doors to reveal a goat. He offers you the choice of sticking with your original door choice
or switching to the other closed door.
Should you stick with your original decision or switch to the other door?
http://www.shodor.org/interactivate/activities/SimpleMontyHall/
https://www.youtube.com/watch?v=mhlc7peGlGg
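One way to settle the question is by simulation; a sketch in R (switching wins exactly when the first pick was a goat, so it should win about 2/3 of the time):

```r
set.seed(1)
numGames <- 10000
car  <- sample(1:3, numGames, replace = TRUE)   # door hiding the car
pick <- sample(1:3, numGames, replace = TRUE)   # contestant's first choice

stickWins  <- mean(pick == car)   # sticking wins iff the first pick was the car
switchWins <- mean(pick != car)   # switching wins iff the first pick was a goat
c(stick = stickWins, switch = switchWins)       # approximately 1/3 vs 2/3
```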
Chapter 5
Continuous Random Variables
References:
[Ros13, Chapter 5]
[SY10, Chapter 5]
5.1 Continuous Random Variables
A random variable, X may be continuous or discrete (or a mixture of the two).
• Continuous random variable — X is continuous and takes values over a real interval.
[Figure: example probability density functions — dexp(x_seq, 0.5), dnorm(x_seq, 0, 3), dgamma(x_seq, 2, 1) and dunif(x_seq, 2, 8)]
5.1.1 Relative Frequency
Example 5.2 Suppose the lifetimes of 50 batteries were recorded (in hundreds of hours).
What can we say about the lifetime of a battery?
[Figure: frequency histogram of the 50 battery lifetimes, hours (hundreds)]
Relative Frequency
[Figure: density-scale histogram of the battery lifetimes, hours (hundreds)]
[SY10, p 194]
Example 5.3 Suppose we are going to model the battery lifetimes (in hundreds of hours)
using the following probability density function.
f(x) = (1/2)e^{−x/2},  x > 0
       0,              otherwise
1. What is the probability that a battery lifetime is less than 200 hours?
P(X ≤ 2) = ∫₀² (1/2)e^{−x/2} dx
         = [(1/(−1/2))(1/2)e^{−x/2}]₀²
         = [−e^{−x/2}]₀²
         = −e^{−2/2} + e⁰
         = 0.6321206
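The remaining parts follow the same pattern; a sketch checking them with pexp (rate 1/2):

```r
rate <- 1/2

pexp(2, rate)                         # 1. P(X <= 2) = 0.6321
1 - pexp(4, rate)                     # 2. P(X > 4) = e^-2
pexp(2, rate) + (1 - pexp(4, rate))   # 3. P(X < 2 or X > 4)
# 4. P(X > 3 | X > 2), which by the memoryless property equals P(X > 1)
(1 - pexp(3, rate)) / (1 - pexp(2, rate))
1 - pexp(1, rate)
```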
2. What is the probability that a battery lifetime is greater than 400 hours?
3. What is the probability that a battery lifetime is less than 200 hours or greater
than 400 hours?
4. What is the probability that a battery lasts more than 300 hours, given that it has
already been in use for more than 200 hours?
5. Show that the function (1/2)e^{−x/2}, x > 0 is a probability density function.
Recall that: ∫ f′(x) e^{f(x)} dx = e^{f(x)} + C.
1. lim_{x→−∞} F(x) = 0
2. lim_{x→∞} F(x) = 1
3. The distribution function is a nondecreasing function; that is, if a < b then F(a) ≤ F(b).
P(a ≤ X ≤ b) = P(a < X ≤ b)
             = P(a ≤ X < b)
             = P(a < X < b)
             = ∫ₐᵇ f(x) dx
             = F(b) − F(a),  a ≤ b
The cumulative distribution function (cdf) of X is the area under the density function for
values less than or equal to x. The definition stated above gives the relationship:

f(x) = (d/dx) F(x) = F′(x)
Note that under the definition of the probability density function above, the probability
that a continuous random variable X equals a given value b exactly, is 0, i.e.
P(X = b) = P(b ≤ X ≤ b) = ∫_b^b f(x) dx = 0
Therefore the probabilities of a continuous random variable do not change if strict (<, >)
and nonstrict (≤, ≥) inequalities are interchanged. Note: this is not the case for discrete
random variables.
Example 5.4 Let X be the lifetime (in hours) of a lightbulb and the pdf of X be:
f(x) = 0,                  x < 0
       0.001e^{−0.001x},   x ≥ 0
2. What is the probability that a randomly selected light bulb lasts less than 1000
hours?
3. What is the probability that a randomly selected light bulb lasts less than 100 hours?
4. What is the probability that a randomly selected light bulb lasts between 100 and
1000 hours?
5. Find a number x such that a lightbulb survives the age x with probability 0.5.
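These can all be read off the exponential cdf F(x) = 1 − e^{−0.001x}; a sketch:

```r
rate <- 0.001

pexp(1000, rate)                     # 2. P(X < 1000) = 1 - e^-1
pexp(100, rate)                      # 3. P(X < 100)  = 1 - e^-0.1
pexp(1000, rate) - pexp(100, rate)   # 4. P(100 < X < 1000)
qexp(0.5, rate)                      # 5. median lifetime, about 693.1 hours
```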
Special case:
If X is a non-negative continuous random variable then
E[X] = ∫₀^∞ P(X > x) dx
[SY10, p 202]
5.2.2 Variance
For a random variable X with probability density function f (x), the variance
of X is given by:
where µ = E[X].
[SY10, p 202]
Observe that the results for the expected value and variance of a continuous random
variable follow from the discrete case, whereby p(x) is replaced by f(x) dx and Σ is
replaced by ∫.
The expected value of X can be thought of as the average of the random variable and is
often called the mean of X and is denoted by µ. The variance is often denoted by σ 2 .
• 1st moment: E[X] = ∫_{−∞}^{∞} x f(x) dx (i.e. the mean of X).
• 2nd moment: E[X²] = ∫_{−∞}^{∞} x² f(x) dx
• kth moment: E[X^k] = ∫_{−∞}^{∞} x^k f(x) dx
The theorems relating to the expectation of discrete random variables apply to the
continuous case as well:
f (x) = 0.001e−0.001x , x ≥ 0
and
F (x) = 1 − e−0.001x
The expected lifetime of the lightbulb is:
E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫₀^∞ P(X > x) dx
     = ∫₀^∞ e^{−0.001x} dx
     = [−e^{−0.001x}/0.001]₀^∞
     = 0 − (−1/0.001) = 1000
Example 5.6 Suppose that a random variable X has a probability density function given
by
f(x) = x²/3,  −1 < x < 2
       0,     otherwise
[SY10, ex 5.3, p199]
[Figure: cdf F(x) of Example 5.6, rising from 0 at x = −1 to 1 at x = 2]
In R:
x <- seq(-2, 3, 0.01)
x_range1 <- x <= -1
x_range2 <- x > -1 & x < 2
x_range3 <- x >= 2
Fx <- numeric(length(x))
Fx[x_range1] <- 0
Fx[x_range2] <- (x[x_range2]^3 + 1)/9
Fx[x_range3] <- 1
plot(x, Fx, type="l")
Chapter 6
Continuous Distributions
References:
[Ros13, Chapter 5]
[SY10, Chapter 5]
6.1 Uniform Distribution
6.1.1 Definition
[Figure: pdf f(x) and cdf F(x) of a uniform distribution]
6.1.2 Uniform Distribution in R
dunif(x, min, max)
punif(x, min, max)
qunif(Fx, min, max)
runif(numSims, min, max)
(0.5-0)/(2-0)
## [1] 0.25
punif(0.5, 0, 2)
## [1] 0.25
6.1.3 Examples
Example 6.2 A farmer living in western Nebraska has an irrigation system to provide
water for crops, primarily corn, on a large farm. Although he has thought about buying
a backup pump, he has not done so. If the pump fails, delivery time X for a new pump
to arrive is uniformly distributed over the interval from 1 to 4 days. The pump fails. It is
a critical time in the growing season in that the yield will be greatly reduced if the crop
is not watered within the next 3 days. [SY10, p. 213, ex 5.7]
1. Assuming that the pump is ordered immediately and that installation time is
negligible, what is the probability that the farmer will suffer major yield loss?
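Major yield loss occurs if delivery takes longer than 3 days; with X ∼ Uniform(1, 4) this is a one-line check:

```r
# P(X > 3) for delivery time X ~ Uniform(1, 4)
1 - punif(3, 1, 4)   # 1/3
```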
[Figure: pdf f(x) and cdf F(x) of exponential distributions]
Example 6.5 Consider an exponential random variable X with mean 2. Find P (X > 3).
lambda <- 1/2   # rate; mean = 1/lambda = 2
exp(-lambda*3)
## [1] 0.22313
1 - pexp(3, lambda)
## [1] 0.22313
6.2.3 Examples
Example 6.6 A sugar refinery has three processing plants, all of which receive raw sugar
in bulk. The amount of sugar that one plant can process in one day can be modeled as
having an exponential distribution with a mean of 4 tons for each of the three plants.
[SY10, p. 220, ex 5.9]
1. What is the probability that a plant will process more than 4 tons on a given day?
2. If the plants operate independently, find the probability that exactly two of the three plants will process more than 4 tons on a given day.
3. If the plants operate independently, find the probability that an odd number of
plants will process more than 4 tons on a given day.
4. Which of the following pdfs corresponds to the amount of sugar processed by one
plant?
[Figure: candidate pdfs dexp(x, 0.25) and dexp(x, 4)]
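A sketch of parts 1 to 3; each plant exceeds 4 tons with probability e^{−1}, and the three plants behave like independent Bernoulli trials:

```r
p <- 1 - pexp(4, rate = 1/4)          # 1. P(X > 4) = e^-1 = 0.3679

dbinom(2, 3, p)                       # 2. exactly two of three plants exceed 4 tons
dbinom(1, 3, p) + dbinom(3, 3, p)     # 3. an odd number of plants exceed 4 tons
```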
Memoryless Property
Example 6.7 Suppose a customer has been waiting in line to be served for x time units
and would like to know the probability that he or she will be required to wait a further t
units of time. The calculation of this probability does not depend on the length of time
already spent waiting, i.e. the distribution does not “remember” how long the customer
has been waiting.
Example 6.8 Show that an exponential random variable with cdf F (x) = 1−e−λx , x ≥ 0
has the memoryless property.
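A sketch of the argument, mirroring the geometric case:

```latex
P(X > x + t \mid X > x) = \frac{P(X > x + t)}{P(X > x)}
                        = \frac{e^{-\lambda(x+t)}}{e^{-\lambda x}}
                        = e^{-\lambda t}
                        = P(X > t)
```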
Variance: σ 2
Z ∼ N (0, 1)
[Figure: pdf f(x) and cdf F(x) of Normal(0, 1), Normal(0, 9), Normal(4, 25) and Normal(10, 25)]
Example 6.10 Consider a normal random variable X with mean 3 and variance 0.2.
Find P (X < 2.8).
pnorm(2.8, 3, sqrt(0.2))
## [1] 0.32736
mu <- 0
sigma <- 1
x <- seq(-4, 4, 0.001)
cols <- c(gray.colors(4, start = 0.1, end = 0.9, gamma = 2.2, alpha = NULL),
gray.colors(4, start = 0.9, end = 0.1, gamma = 2.2, alpha = NULL))
colText <- c("black")
#Create plot
plot(x, dnorm(x, mu, sigma), type="l", lwd=3, ylab="f(x)", yaxt="n", xaxt="n")
#Compute probabilities
Fx <- pnorm(-4:4, mu, sigma)
#Shade areas (loop reconstructed; the original loop bounds are assumed)
for(i in -4:3){
j <- i + 1
ijvals <- seq(i, j, 0.001)
polygon(x = c(i, ijvals, j),
y = c(0, dnorm(ijvals, 0, 1), 0),
col = cols[i + 5], border = "white")
}
[Figure: standard normal density shaded in bands from µ − 3σ to µ + 3σ]
The cumulative distribution function is not available in closed form, but can be
evaluated numerically (in R, using pnorm).
Properties
6.3.5 Examples
Example 6.11 A machining operation produces steel shafts where diameters have a
normal distribution with a mean of 1.005 inches and a standard deviation of 0.01 inch.
(Adapted from [SY10, p. 351, ex 5.87])
1. What is the probability that a randomly selected steel shaft will be less than 1.005
inches in diameter?
3. Specifications call for diameters to fall within the interval 1.00 ± 0.02 inches. What
percentage of the output of this operation will fail to meet specifications?
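A sketch of parts 1 and 3 using pnorm:

```r
mu <- 1.005; sigma <- 0.01

pnorm(1.005, mu, sigma)   # 1. P(X < 1.005) = 0.5, since 1.005 is the mean

# 3. Specification limits 1.00 +/- 0.02: fraction outside (0.98, 1.02)
pnorm(0.98, mu, sigma) + (1 - pnorm(1.02, mu, sigma))   # about 0.073, i.e. 7.3%
```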
(i) Γ(1) = 1
gamma(1)
## [1] 1
gamma(10)
## [1] 362880
#Inspect properties
factorial(9)
## [1] 362880
9*gamma(9)
## [1] 362880
[Figure: pdf f(x) and cdf F(x) of Gamma(0.5, 1), Gamma(1, 1), Gamma(2, 1) and Gamma(4, 1)]
• Component lifetimes (a few fail early, many have an “average” lifetime, and a few
last a very long time)
• Survival times
• Fish lengths
Example 6.13 Consider a gamma random variable X with shape parameter 2 and rate
parameter 3. Find P (X < 2.5).
pgamma(2.5, 2, 3)
## [1] 0.9953
Example 6.14 The weekly downtime X (in hours) for a certain industrial machine has
approximately a gamma distribution with α = 3.5 and λ = 2/3. (Adapted from [SY10,
p. 232, ex 5.67])
2. What is the probability that the weekly downtime will exceed 5 hours? Use the
relevant R code below to compute your answer.
3. Suppose the loss L (in dollars) to the industrial operation as a result of this
downtime is given by
L = 30X + 2X 2
Find the expected value of L.
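A sketch of parts 2 and 3; E[L] follows from E[L] = 30E[X] + 2E[X²] with E[X²] = VAR[X] + (E[X])²:

```r
alpha <- 3.5; lambda <- 2/3

1 - pgamma(5, alpha, lambda)   # 2. P(X > 5)

EX  <- alpha/lambda            # E[X] = 5.25
EX2 <- alpha/lambda^2 + EX^2   # E[X^2] = 35.4375
30*EX + 2*EX2                  # 3. E[L] = 228.375
```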
Y ∼ Gamma(Σ_{i=1}^n α_i, λ)

E[Y] = (Σ_{i=1}^n α_i)/λ    VAR[Y] = (Σ_{i=1}^n α_i)/λ²
Example 6.15 A certain electronic system has a life length of X1 , which has an expo-
nential distribution with a mean of 450 hours. The system is supported by an identical
backup system that has a life length of X2 . The backup system takes over immediately
when the system fails. Assume that the systems operate independently. [SY10, ex 5.11,
p229]
1. Find the probability distribution and expected value for the total life length of the
primary and backup systems.
EX <- 450
lambda <- 1/EX
alpha <- 2
y_sim <- rgamma(100000, alpha, lambda)   # simulate total life lengths (sample size assumed)
mean(y_sim)
## [1] 900.86
var(y_sim)
## [1] 408897
alpha/lambda
## [1] 900
alpha/lambda^2
## [1] 405000
[Figure: histogram of y_sim (total length) with the Gamma(2, 0.0022) density overlaid]
mean(y_sim); var(y_sim)
## [1] 1199.3
## [1] 119983
## [1] 1200
## [1] 120000
[Figure: histogram of the simulated totals (total length) with the Gamma(12, 0.01) density overlaid]
Mean: E[X] = αβ
Variance: VAR[X] = αβ²
Standard deviation: STD[X] = √(αβ²) = β√α
If the gamma distribution is parametrised in this way, then in R the following command
can be used:
pgamma(x, shape = alpha, scale = beta)
alpha <- 2
lambda <- 4
beta <- 1/lambda
## [1] TRUE
## [1] 0.50308
## [1] 0.49513
alpha/lambda; alpha*beta
## [1] 0.5
## [1] 0.5
## [1] 7.9115
## [1] 0.5
[Figure: pdf f(x) and cdf F(x) of Weibull(0.5, 1), Weibull(1, 1), Weibull(2, 1) and Weibull(4, 1)]
• Reliability
• Lifetime of a system
Example 6.19 Consider a Weibull random variable X with shape parameter α = 4 and
scale parameter θ = 3. Find P (X < 2.5).
1 - exp(-(2.5/3)^4)
## [1] 0.38261
pweibull(2.5, 4, 3)
## [1] 0.38261
1. Find the probability that a randomly selected bearing of this type will fail in less
than 100 hours.
2. Find the expected value of the fatigue life for these bearings.
Example 6.21 Suppose that Y ∼ Exp(1/3) and X = Y 1/4 . Generate 10000 random
variables from the exponential distribution, compute X, and show graphically that the
density of X is consistent with a Weibull distribution with θ = 31/4 and α = 4, i.e.
X ∼ Weibull(31/4 , 4).
lambda <- 1/3
alpha <- 4
y <- rexp(10000, lambda)   # sample size assumed
x <- y^(1/alpha)
hist(x, prob=TRUE, main =
bquote("Y~Exp("~.(round(lambda, 4))~"),"~X == Y^{ .(1/alpha)}))
[Figure: histograms of y and x with the corresponding exponential and Weibull densities overlaid]
alpha <- 2
theta <- 3
pweibull(seq(0, 10, 2), shape = alpha, scale = theta)
[Figure: pweibull(x, shape = alpha, scale = theta) and pweibull(x/theta, shape = alpha, scale = 1) coincide]
This example demonstrates the behaviour of the scale parameter, in particular, that

F(x; θ, α) = F(x/θ; 1, α)
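This identity can also be checked numerically; a small sketch using the same alpha and theta values as the code above:

```r
alpha <- 2
theta <- 3
x <- seq(0, 10, 2)
# F(x; theta, alpha) and F(x/theta; 1, alpha) agree term by term
all.equal(pweibull(x, shape = alpha, scale = theta),
          pweibull(x/theta, shape = alpha, scale = 1))
```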
Variance: VAR[X] = αβ / [(α + β)²(α + β + 1)]
Special case: The uniform distribution is a special case of the beta distribution
with α = 1 and β = 1.
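This special case can be verified directly in R: the Beta(1, 1) density is constant at 1 on [0, 1], matching Unif(0, 1).

```r
x <- seq(0, 1, 0.25)
dbeta(x, shape1 = 1, shape2 = 1)     # all equal to 1
all.equal(dbeta(x, 1, 1), dunif(x))  # TRUE
```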
[Figure: two panels showing X ~ Beta(α, β) for Beta(2, 2), Beta(3, 3), Beta(5, 3) and Beta(1, 1), plotted for −0.5 ≤ x ≤ 2]
pbeta(0.5, 2, 3)
## [1] 0.6875
6.6.4 Examples
Example 6.25 A gasoline wholesale distributor uses bulk storage tanks to hold a fixed
supply. The tanks are filled every Monday. Of interest to the wholesaler is the proportion
of the supply sold during the week. Over many weeks, this proportion has been observed
to match fairly well a beta distribution with α = 4 and β = 2. (Adapted from [SY10, p.
256, ex 5.19].)
1. Find the expected value of the proportion of the supply sold each week. Verify your
answer using simulation.
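A sketch of the simulation check for part 1: the theoretical mean of a Beta(α, β) random variable is α/(α + β) = 4/6 = 2/3 (the seed below is an arbitrary choice).

```r
set.seed(1)
alpha <- 4; beta <- 2
alpha/(alpha + beta)              # theoretical mean: 2/3
mean(rbeta(100000, alpha, beta))  # simulated mean, close to 2/3
```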
P(X > x) = 1 − ∫₀ˣ [Γ(α + β)/(Γ(α)Γ(β))] t^(α−1) (1 − t)^(β−1) dt

P(X > 0.9) = ∫_{0.9}^{1} [Γ(4 + 2)/(Γ(4)Γ(2))] t^(4−1) (1 − t)^(2−1) dt
           = ∫_{0.9}^{1} [120/(6 × 1)] t³(1 − t) dt
           = 20 ∫_{0.9}^{1} (t³ − t⁴) dt
           = 20 [t⁴/4 − t⁵/5]_{0.9}^{1}
           = 20 [(1/4 − 1/5) − (0.9⁴/4 − 0.9⁵/5)]
           = 0.08146
Check computation in R
1-pbeta(0.9, 4, 2)
## [1] 0.08146
Answer: No, it is unlikely that 90% of the stock will be sold, since P (X > 0.9) =
0.08146 is small.
pbeta(seq(0, 1, 0.1), 4, 2)
dbeta(seq(0, 1, 0.1), 4, 2)
## [1] 0.000 0.018 0.128 0.378 0.768 1.250 1.728 2.058 2.048
## [10] 1.458 0.000
qbeta(seq(0, 1, 0.1), 4, 2)
2. Set u = F (x)
3. Solve for x
This method is only appropriate when we know the functional form of F (x), as we need
it to be able to solve u = F (x) for x.
[Figure: the cdf F(x) of X ~ Exponential(λ), plotted for 0 ≤ x ≤ 5]
Uniform:

u = (x − a)/(b − a)
u(b − a) = x − a
x = u(b − a) + a,  0 ≤ u < 1
Exponential:
For X ∼ Exp(λ) with F(x) = 1 − e^(−λx), x ≥ 0:

u = 1 − e^(−λx)
1 − u = e^(−λx)
ln(1 − u) = −λx
x = −ln(1 − u)/λ,  0 ≤ u < 1
Let u be a unit random number and set u = F (x). Solve for x and use the result to
determine the value of x if u = 0.51679.
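The steps above can be sketched in R for the exponential case; the value of λ below is an assumption for illustration only.

```r
set.seed(1)
lambda <- 2                # assumed rate, for illustration
u <- runif(10000)          # unit random numbers
x <- -log(1 - u)/lambda    # inverse-CDF transform
mean(x); 1/lambda          # simulated vs theoretical mean
# A single draw with u = 0.51679 (using the assumed lambda):
-log(1 - 0.51679)/lambda
```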
Chapter 7
Reliability
An Application of Continuous
Distributions
References:
[HKM, Chapter 10]
7.1 Introduction to Reliability
Recent reliability failures
• Airbags – recall
The probability that an item does not fail during some period of time t is often used as
a measure of reliability.
R(t) = P (T > t)
The reliability at time t, R(t), is called the reliability function and can be
expressed in terms of the distribution function of T , i.e.
R(t) = 1 − P (T ≤ t) = 1 − F (t)
[HKM, pp. 373 – 374]
Example 7.1 If R(t) = 0.999 then on average only 1 item in every 1000 will fail during
t time units.
7.2 Mean Time to Failure
7.2.1 Definition
A useful antiderivative (valid when f(x) is linear):

∫ e^(f(x)) dx = e^(f(x))/f′(x) + C
Example 7.3 Find ∫₀^∞ e^(−λx) dx, λ > 0.

∫₀^∞ e^(−λx) dx = [e^(−λx)/(−λ)]₀^∞ = 0 − 1/(−λ) = 1/λ
Distribution   F(t)                          R(t)                          MTTF
Exponential    1 − e^(−λt)                   e^(−λt)                       1/λ
Uniform        (t − a)/(b − a), a ≤ t ≤ b    (b − t)/(b − a), a ≤ t ≤ b    (a + b)/2
Normal         pnorm(t, mu, sigma)           1 - pnorm(t, mu, sigma)       µ
Weibull        1 − e^(−(t/α)^β)              e^(−(t/α)^β)                  αΓ(1 + 1/β)
gamma(5)

## [1] 24

factorial(4)

## [1] 24

4*3*2*1

## [1] 24
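The Γ function needed for the Weibull MTTF is available in R as gamma(); a small sketch with assumed parameter values:

```r
# MTTF of a Weibull lifetime with scale alpha = 2, shape beta = 1.5
# (assumed values, for illustration)
alpha <- 2; beta <- 1.5
alpha * gamma(1 + 1/beta)
# Numerical check: integrate R(t) = exp(-(t/alpha)^beta) over (0, Inf)
integrate(function(t) exp(-(t/alpha)^beta), 0, Inf)$value
```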
To assess the reliability of a system, the reliability of each component is typically
measured and then used to compute the reliability of the system.
[HKM, p. 375]
4. Find the probability that the system lasts more than 1000 hours.
set.seed(12354)
# Define parameters
lambda <- c(0.5, 0.4, 0.25)
nSims <- 100000
# Simulate component lifetimes, one column per component (these lines are
# reconstructed); a series system fails when its first component fails
sim_components <- sapply(lambda, function(l) rexp(nSims, l))
sim_systemlife <- apply(sim_components, 1, min)
head(sim_components)
colMeans(sim_components)
# Estimate P(system lasts more than 1000 hours)
table(sim_systemlife > 1)/nSims
##
## FALSE TRUE
## 0.68351 0.31649
#Theoretical value
exp(-1.15*1)
## [1] 0.31664
# Estimate MTTF
mean(sim_systemlife)
## [1] 0.87046
#Theoretical value
1/1.15
## [1] 0.86957
[HKM, p. 375]
Example 7.8 The times to failure of three components are exponentially distributed with
means in thousands of hours given by 2, 2.5 and 4 respectively. Assume the components
are arranged in parallel and fail independently of each other.
4. Find the probability that the system lasts more than 1000 hours.
set.seed(12354)
lambda <- c(0.5, 0.4, 0.25)
nSims <- 100000
# Simulate component lifetimes (reconstructed); a parallel system fails
# only when its last component fails
sim_components <- sapply(lambda, function(l) rexp(nSims, l))
sim_systemlife <- apply(sim_components, 1, max)
head(sim_components)
colMeans(sim_components)
# Estimate P(system lasts more than 1000 hours)
table(sim_systemlife > 1)/nSims
##
## FALSE TRUE
## 0.02896 0.97104
#Theoretical value
1 - (1-exp(-0.5*1))*(1- exp(-0.4*1))*(1- exp(-0.25*1))
## [1] 0.97131
# Estimate MTTF
mean(sim_systemlife)
## [1] 5.4024
#Theoretical value
0 - (1/(-0.25) + 1/(-0.4) - 1/(-0.65) + 1/(-0.5) - 1/(-0.75) - 1/(-0.9) + 1/(-1.15))
## [1] 5.3867
Discussion If component 1 fails, will this system fail? If component 3 fails, will
this system fail?
Example 7.9 The times to failure of three components are exponentially distributed with
means in thousands of hours given by 2, 2.5 and 4 respectively. Assume the components
are arranged as shown in the following diagram and fail independently of each other.
2. Find the probability that the system lasts more than 1000 hours.
MTTF = ∫₀^∞ R(t) dt
     = ∫₀^∞ (e^(−0.75t) + e^(−0.65t) − e^(−1.15t)) dt
     = [e^(−0.75t)/(−0.75) + e^(−0.65t)/(−0.65) − e^(−1.15t)/(−1.15)]₀^∞
     = 0 − (1/(−0.75) + 1/(−0.65) − 1/(−1.15))
     = 2.00223
set.seed(12354)
lambda <- c(0.5, 0.4, 0.25)
nSims <- 100000
# Simulate component lifetimes (reconstructed); components 1 and 2 are in
# parallel, and this block is in series with component 3
sim_components <- sapply(lambda, function(l) rexp(nSims, l))
sim_systemlife <- pmin(pmax(sim_components[, 1], sim_components[, 2]),
                       sim_components[, 3])
head(sim_components)
colMeans(sim_components)
# Estimate P(system lasts more than 1000 hours)
table(sim_systemlife > 1)/nSims
##
## FALSE TRUE
## 0.32192 0.67808
#Theoretical value
exp(-0.75*1) + exp(-0.65*1) - exp(-1.15*1)
## [1] 0.67778
# Estimate MTTF
mean(sim_systemlife)
## [1] 2.0036
#Theoretical value
- (1/(-0.75) + 1/(-0.65) - 1/(-1.15))
## [1] 2.0022
2. Find the probability that the system lasts more than 2000 hours.
set.seed(12354)
# Component rates (reconstructed from the theoretical value below):
# mean lifetimes 1, 1.5 and 2 thousand hours
lambda <- c(1, 2/3, 0.5)
nSims <- 100000
sim_components <- sapply(lambda, function(l) rexp(nSims, l))
# Components 1 and 2 in parallel, in series with component 3
sim_systemlife <- pmin(pmax(sim_components[, 1], sim_components[, 2]),
                       sim_components[, 3])
head(sim_components)
colMeans(sim_components)
head(sim_systemlife)
# Estimate P(system lasts more than 2000 hours)
table(sim_systemlife > 2)/nSims
##
## FALSE TRUE
## 0.86557 0.13443
#Theoretical value
(1-(1-exp(-2))*(1-exp(-2*2/3)))*exp(-0.5*2)
## [1] 0.13364
# Estimate MTTF
mean(sim_systemlife)
## [1] 1.0632
#Theoretical value
-(1/(-1.5)+ 1/(-7/6) - 1/(-13/6))
## [1] 1.0623
Chapter 8
References:
[Win04, Chapter 17] (Available on the STAT600 Blackboard page
under Course Resources or in the Library)
8.1 A Simple Game
8.1.1 Game Introduction
Consider the following game. The player is presented with a 1 x 5 grid. Starting at square
1, the player must reach square 5 as quickly as possible. The player rolls a fair 6-sided die
to determine which move to make, and each roll is independent of previous rolls. The
player may move forward by either 1 or 2, stay in the same place, or move backward by
1, 2 or 3. The player cannot move below square 1, and the game ends when the player
reaches square 5.
1 2 3 4 5
Start End
8.1.3 Playing the Game
Game time! Roll the die and determine your next move. Complete the table below
after each roll:
Suppose you are in square 1, what is the probability of being in square 5 (i.e. winning
the game) after:
• 1 roll
• 2 rolls
The set S of all possible values of X(t) is called the state space. The state space can
be:
• Discrete
• Continuous
Properties:
• The number in the ith row and jth column of a matrix A is called the ijth
element of A and is written aij .
• Two matrices A and B are equal if and only if aij = bij for all i and j
    [ a11 a12 ... a1n ]
A = [ a21 a22 ... a2n ]
    [  :   :        :  ]
    [ am1 am2 ... amn ]
B = [ 2 −4 5 ]      C = [ 0.2 1.4 ]
    [ 5  1 0 ]          [ 5.5 1.0 ]
Suppose that m, n and r are positive integers and suppose A is an m × r matrix and B is
an r × n matrix. Then the product AB is an m × n matrix.
[ a11 ... a1r ] [ b11 ... b1n ]   [ c11 ... c1n ]
[  :        :  ] [  :        :  ] = [  :        :  ]
[ am1 ... amr ] [ br1 ... brn ]   [ cm1 ... cmn ]
# Define matrix A
(A <- matrix(c(1, 1, 2,
2, 1, 3), nrow=2, byrow=TRUE))
# Define matrix B
(B <- matrix(c(1, 1,
2, 3,
1, 2), nrow=3, byrow=TRUE))
## [,1] [,2]
## [1,] 1 1
## [2,] 2 3
## [3,] 1 2
# Multiply matrices
A %*% B
## [,1] [,2]
## [1,] 5 8
## [2,] 7 11
B %*% A
# Define matrix A
(A <- matrix(c(2, 4,
3, 1), nrow=2, byrow=TRUE))
## [,1] [,2]
## [1,] 2 4
## [2,] 3 1
# Define matrix B
(B <- matrix(c(1, 2,
2, 3), nrow=2, byrow=TRUE))
## [,1] [,2]
## [1,] 1 2
## [2,] 2 3
# Multiply matrices
A %*% B
## [,1] [,2]
## [1,] 10 16
## [2,] 5 9
B %*% A
## [,1] [,2]
## [1,] 8 6
## [2,] 13 11
4x = 3
Ax = b
A−1 Ax = A−1 b
x = A−1 b
A square matrix is any matrix that has an equal number of rows and
columns.
The diagonal elements of a square matrix are those elements aij such that
i = j.
A square matrix for which all diagonal elements are equal to 1 and all
nondiagonal elements are equal to 0 is called an identity matrix. [Win04,
p36]
Example:

I2 = [ 1 0 ]    I3 = [ 1 0 0 ]    I5 = [ 1 0 0 0 0 ]
     [ 0 1 ]         [ 0 1 0 ]         [ 0 1 0 0 0 ]
                     [ 0 0 1 ]         [ 0 0 1 0 0 ]
                                       [ 0 0 0 1 0 ]
                                       [ 0 0 0 0 1 ]
• If A is an m × m matrix, then
Im A = AIm = A
BA = AB = Im
[Win04, p37]
If there is a matrix B that satisfies BA = AB = Im , then we say B = A−1 and call A−1
the inverse of A. Example: Consider the matrices A and B. Verify that B = A−1 .
A = [ 2 5 ]      B = [  3 −5 ]
    [ 1 3 ]          [ −1  2 ]

If B = A⁻¹, then AB = I2:

AB = [ 2 5 ] [  3 −5 ]  =  [ 1 0 ]
     [ 1 3 ] [ −1  2 ]     [ 0 1 ]
• Gauss-Jordan Method (beyond the scope of this course - see [Win04, p39] for an
example)
• Using R
Example: Consider the matrix A = [ 2 5 ; 1 3 ]. Use R to find A⁻¹.
A = matrix(c(2, 5, 1, 3), nrow = 2, byrow = TRUE)
solve(A)
## [,1] [,2]
## [1,] 3 -5
## [2,] -1 2
A %*% solve(A)
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
solve(A) %*% A
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
The future of the process {Xn+1 = j} depends only on the present {Xn = i}, not the past
{Xn−1 = in−1 , . . . , X0 = i0 }.
That is, given the present state of the process, the future state is independent of the past
states.
Consider a discrete time, discrete space stochastic process {Xn }. Let Pi,j
denote the probability that the process will move to state j given that it is
currently in state i, such that:

Pi,j = P(Xn+1 = j | Xn = i)
• Stock prices
• Inventory levels
• Population growth
• Queues
• Speech recognition
• Bioinformatics
For all states i, j ∈ S, the transition probabilities can be displayed in the (one-step)
transition matrix, P.
    [ p0,0 p0,1 p0,2 · · · p0,j · · · ]
    [ p1,0 p1,1 p1,2 · · · p1,j · · · ]
P = [  ···  ···  ···  ···  ···  ···  ]
    [ pi,0 pi,1 pi,2 · · · pi,j · · · ]
    [  ···  ···  ···  ···  ···  ···  ]
Let Xn denote the nth person to throw the ball. The state space for this problem is:
S = {A, B, C}.
This is a Markov chain since the person throwing the ball is not directly influenced by
those who previously had the ball.
Transition Matrix
The probabilities can also be represented as a matrix called the transition matrix P:
P = [  p   1−p ]
    [ 1−p   p  ]
Probability Pij is the probability of moving from state i to state j. For example:
• the probability of being in state 0 at the next stage, given that the signal is currently
in state 0 is P00 = p
• the probability of being in state 1 at the next stage, given that the signal is currently
in state 0 is P01 = 1 − p
At each stage, the signal has to either stay the same, or change, so the rows of the
transition matrix must sum to 1, i.e. P00 + P01 = 1.
Let Xn denote the state of the system at stage n = 0, 1, 2, . . ..
1. P (X1 = 0|X0 = 0) = p
2. P (X1 = 1|X0 = 0) = 1 − p
3. P (X1 = 0|X0 = 1) = 1 − p
4. P (X1 = 1|X0 = 1) = p
• P (X2 = 0|X0 = 0) =
• P (X2 = 1|X0 = 0) =
State at stage n = 2
0 1
State at stage n = 0 0
1
The probability of moving from state i to state j after 2 stages can be represented in a
matrix:

P² = [  p   1−p ] [  p   1−p ]
     [ 1−p   p  ] [ 1−p   p  ]

   = [ p² + (1−p)²    2p(1−p)     ]
     [ 2p(1−p)        p² + (1−p)² ]
The n-step transition probability P^n_{i,j} of the Markov Chain is the
probability that the chain will be in state j after n transitions, given that it
is currently in state i.

P^n_{i,j} = P(Xn+m = j | Xm = i),  n ≥ 0, i, j ≥ 0

The probability of moving from state i to state j after n transitions is given by the
n-step transition matrix Pⁿ, where Pⁿ is the transition matrix P multiplied by itself
n times.

P⁽²⁾ = P⁽¹⁺¹⁾ = P · P = P²
P⁽ⁿ⁾ = Pⁿ
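The closed form for P² above can be checked in R for a particular p (p = 0.3 is an arbitrary choice):

```r
p <- 0.3
P <- matrix(c(p, 1-p,
              1-p, p), nrow = 2, byrow = TRUE)
P %*% P                       # two-step transition matrix
c(p^2 + (1-p)^2, 2*p*(1-p))   # closed-form diagonal and off-diagonal entries
```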
• if it has rained for the past two days, then it will rain tomorrow with probability
0.7
• if it rained today but not yesterday, then it will rain tomorrow with probability 0.5
• if it rained yesterday but not today, then it will rain tomorrow with probability 0.4
• if it has not rained in the past two days, then it will rain tomorrow with probability
0.2
Let the state at time n be determined by the weather conditions during that day and the
previous day. For instance the process is in:
state 0 if it rained both today and yesterday; state 1 if it rained today but not
yesterday; state 2 if it rained yesterday but not today; and state 3 if it did not
rain on either day. The transition matrix is then:

    [ 0.7  0   0.3  0  ]
P = [ 0.5  0   0.5  0  ]
    [  0  0.4   0  0.6 ]
    [  0  0.2   0  0.8 ]
Therefore the two-step transition matrix is:

     [ 0.49 0.12 0.21 0.18 ]
P² = [ 0.35 0.20 0.15 0.30 ]
     [ 0.20 0.12 0.20 0.48 ]
     [ 0.10 0.16 0.10 0.64 ]
P(X2 = 0|X0 = 0) + P(X2 = 1|X0 = 0) = P²₀,₀ + P²₀,₁
                                    = 0.49 + 0.12
                                    = 0.61
c. Suppose that it rained on Monday and Tuesday. What is the probability that it will
rain on Friday? (See week 8 lab for further details)
Therefore the probability it will rain on Friday, given it has rained on Monday and
Tuesday is:
Chapter 9
References:
[Win04, Chapter 17]
9.1 State Transition Diagram
The transition probability matrix of a Markov chain can be represented graphically using
a directed graph.
• Arcs: The transition probability Pi,j is represented by the arc (i, j).
[State transition diagram for a three-state chain (states include B and C), with arc probabilities 1, 0.5 and 0.5]
9.2 Computing Probabilities for Markov Chains
Recall that:
P (A ∩ B) = P (A)P (B|A)
and
It can be shown by induction that this result holds for more than n = 2 transitions.
The probability distribution P⁽⁰⁾_j = P(X0 = j), j ∈ S is called the
initial distribution of the chain.

P(Xn = sn , . . . , X2 = s2 , X1 = s1 , X0 = s0 ) = P⁽⁰⁾_{s0} P_{s0,s1} P_{s1,s2} · · · P_{s_{n−1},s_n}
Questions of interest:
a. If a person is currently a cola 2 purchaser, what is the probability that she will purchase
cola 1 two purchases from now?
b. If a person is currently a cola 1 purchaser, what is the probability that she will purchase
cola 1 three purchases from now?
library(shape)
library(diagram)
# Default plot
plotmat(t(P))
# Customise plot
plotmat(t(P), curve=0.3, pos=c(2), box.size=0.1,
self.shifty = c(0.1, 0.1),
self.shiftx = c(-0.1, 0.1),
self.lwd = 2,
self.arrpos = c(1, 1),
shadow.size = 0,
cex = 1,
box.cex = 1.5)
a. If a person is currently a cola 2 purchaser, what is the probability that she will purchase
cola 1 two purchases from now?
2. If a person is currently a cola 1 purchaser, what is the probability that she will
purchase cola 1 three purchases from now?
P³ = P² · P = [ 0.83 0.17 ] [ 0.9 0.1 ]  =  [ 0.781 0.219 ]
              [ 0.34 0.66 ] [ 0.2 0.8 ]     [ 0.438 0.562 ]

We seek P³₁,₁ = 0.781.
# Define matrix P
P <- matrix(data=c(0.9, 0.1,
0.2, 0.8),
nrow=2, ncol=2, byrow=TRUE)
P2 <- P %*% P
P3 <- P2 %*% P
P; P2; P3
## [,1] [,2]
## [1,] 0.9 0.1
## [2,] 0.2 0.8
## [,1] [,2]
## [1,] 0.83 0.17
## [2,] 0.34 0.66
## [,1] [,2]
## [1,] 0.781 0.219
## [2,] 0.438 0.562
## [1] 5
## [,1] [,2]
## [1,] 0.72269 0.27731
## [2,] 0.55462 0.44538
## [1] 10
## [,1] [,2]
## [1,] 0.67608 0.32392
## [2,] 0.64783 0.35217
## [1] 15
## [,1] [,2]
## [1,] 0.66825 0.33175
## [2,] 0.66350 0.33650
## [1] 20
## [,1] [,2]
## [1,] 0.66693 0.33307
## [2,] 0.66613 0.33387
pi_est
library(expm)  # to use matrix power %^%
p50 <- P %^% 50
p50
## [,1] [,2]
## [1,] 0.66667 0.33333
## [2,] 0.66667 0.33333
If state j is accessible from i and state i is accessible from j, then states i and
j communicate. Symbolically, communication of states i and j is represented:
i↔j
1. i ↔ i
2. if i ↔ j then j ↔ i
3. if i ↔ j and j ↔ k, then i ↔ k
The state space can be divided up into classes depending on whether or not states
communicate.
• As a result of the properties above, any two classes of states are either
identical or disjoint.
• A set of states is closed if, once the chain enters it, the chain can never
leave.
( 0  1/3  2/3 )
Yes. Even though P1,3 = 0, we can access state 3 by first going via state 2. Since
P^n_{1,3} > 0 (for some n), state 3 is accessible from state 1. In this case we can see
that P²₁,₃ > 0.
Example 9.5 Consider a Markov chain with four states and having transition probability
matrix:

    [ 1/2 1/2  0   0  ]
P = [ 1/4 1/4 1/2  0  ]
    [  0  1/4 1/4 1/2 ]
    [  0   0   0   1  ]
– State 1 is transient because state 4 is reachable from state 1, but state 1 is not
reachable from state 4.
– State 2 and 3 are also transient for the same reason as state 1.
– State 4 is not transient because there is no other state j such that j is reachable
from 4 and 4 is not reachable from j, i.e. there is no j such that P^n_{4,j} > 0 and
P^n_{j,4} = 0. Therefore, state 4 is recurrent.
Let Xn denote his total stake (winnings plus initial capital) after n games. {Xn } is a
Markov chain with state space S = {0, 1, ..., N }.
Transition probabilities:
In this example, states 0 and 4 are absorbing states and are also recurrent states. States
1, 2, 3 are transient states.
1. If I start with $1 what is the probability that I will have $4 in, say, 6 steps?
3. On the average how long will it take for me to be ruined or to win $N?
Each state has period 3. Starting in state 1, the only way to return is to take the path
1 → 2 → 3 → 1 some number of times m. It will take 3m transitions to return to state
1, so state 1 has period 3.
• 1→2→1
• 1→2→1→2→1
• 1→2→1→2→1→2→1
States 1 and 2 are periodic with period 2. State 3 is aperiodic. Starting in state 1, the
only way to return is to take the path 1 → 2 → 1 some number of times m. It will take
2m transitions to return to state 1, so state 1 has period 2. Similarly for state 2.
If all states in a chain are recurrent, aperiodic and communicate with each
other, then the chain is said to be ergodic.
For an irreducible, ergodic Markov chain, lim_{n→∞} P^n_{i,j} exists and is independent
of i. Furthermore, letting

πj = lim_{n→∞} P^n_{i,j}

π = πP
The steady state probability πj can also be interpreted as the long-run proportion of
time that the chain is in state j.
π1 = 0.9π1 + 0.2π2
π2 = 0.1π1 + 0.8π2
1 = π1 + π2
This chain has s = 2 states, so we use the first (s − 1) equations and the last equation:
π1 = 0.9π1 + 0.2π2
1 = π1 + π2
π1 = 2/3
π2 = 1/3

After a long time, the probability that a person will purchase cola 1 is 2/3 and the
probability that they will purchase cola 2 is 1/3.
Should the company that makes cola 1 hire the advertising firm?
0.9π1 + 0.2π2 = π1
π1 + π 2 = 1
−0.1π1 + 0.2π2 = 0
π1 + π 2 = 1
Or equivalently

[ −0.1 0.2 ] [ π1 ]  =  [ 0 ]
[   1   1  ] [ π2 ]     [ 1 ]
This has the form Aπ = b. So π = A−1 b.
It can be shown that:

A⁻¹ = [ −10/3 2/3 ]
      [  10/3 1/3 ]

Therefore

π = A⁻¹b = [ −10/3 2/3 ] [ 0 ]  =  [ 2/3 ]
           [  10/3 1/3 ] [ 1 ]     [ 1/3 ]
solve(A) %*% b
## [,1]
## [1,] 0.66667
## [2,] 0.33333
# Method 1
# Compute P - I(n) and select first n-1 rows
A = t(P - diag(nStates))[1:(nStates-1),]
# Add "sum to 1" constraints
A = rbind(A, rep(1, nStates))
# Define RHS of system of equations
b <- c(rep(0, nStates-1), 1)
# Solve
pi_theoretical <- solve(A,b)
round(pi_theoretical,4)
MASS::fractions(pi_theoretical)
# Method 2
# Compute P - I(n)
a = t(P - diag(nStates))
# Add "sum to 1" constraints
a = rbind(a, rep(1, nStates))
# Define RHS of system of equations
d <- c(rep(0, nStates), 1)
qr.solve(a, d)
# Method 3
eigenP <- eigen(t(P))
ev <- eigenP$vectors[,1] / sum(eigenP$vectors[,1])
ev
MASS::fractions(ev)
Simulate this Markov Chain “by hand” using the probabilities below.
For example, if in state 1 and the generated probability is in (0, 0.9), the next state is
state 1; if the probability is in (0.9, 1), then the next state is state 2.
Stage State Probability Next State
0 1 0.0702
1 0.233
2 0.0479
3 0.9999
4 0.7361
5 0.8035
6 0.0249
7 0.8194
8 0.6268
9 0.3876
10 0.0501
set.seed(1234567)
#Specify states
S <- c(1, 2)
M <- length(S)
#Transition matrix (reconstructed from the printed output below)
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow=2, byrow=TRUE,
            dimnames=list(S, S))
P

##     1   2
## 1 0.9 0.1
## 2 0.2 0.8

numStages <- 100000   # number of stages to simulate (assumed)
#Initialize vector x
x <- rep(NA, numStages+1)
names(x) <- 0:numStages
#Simulate the chain (reconstructed)
x[1] <- 1
for(n in 1:numStages){
  x[n+1] <- sample(S, 1, prob=P[x[n], ])
}
Inspect output
head(x, 10)
## 0 1 2 3 4 5 6 7 8 9
## 1 1 1 2 2 2 2 2 1 1
## x
## 1 2
## 67080 32921
#Plot trajectory
plotLen <- 50
plot(names(x[1:plotLen]), x[1:plotLen], type='s',
xlab="Time", ylab='State' , yaxt='n')
axis(2, at = c(1,2), labels = S)
[Figure: trajectory plot of the first 50 simulated states; x-axis: Time, y-axis: State]
#head(mat_sum)
abline(h=pi_theoretical[i], col=mycols[i])
text(numStagesPlot*(1-0.03*i), pi_theoretical[i],
parse(text =paste("pi[", i, "]")),
col=mycols[i], cex = 1.5)
}
# Add legend
legend("topright",
leg=c(paste0("State ", S), "Theory", "Simulation"),
col=c(mycols, 1, 1),
lty=c(rep(NA, M), 1, 2),
pch=c(rep(16, M), NA, NA),
cex = 1)
[Figure: running proportion of time spent in each state, converging to the theoretical values π1 and π2]
Discussion Would we get the same results if we run the simulation again?
head(x, 10)
## 0 1 2 3 4 5 6 7 8 9
## 1 1 1 2 2 2 2 2 1 1
table(x[1:9],
x[2:10])
##
## 1 2
## 1 3 1
## 2 1 4
##
## 1 2
## 1 60476 6604
## 2 6603 26317
##
## 1 2
## 1 0.90155 0.09845
## 2 0.20058 0.79942
MASS::fractions(P_sim)
##
## 1 2
## 1 1163/1290 127/1290
## 2 6603/32920 26317/32920
P - P_sim
## 1 2
## 1 -0.00155039 0.00155039
## 2 -0.00057716 0.00057716
To demonstrate the process of estimating the transition matrix we will simulate some
data. We will assume that the difference between the arrival time and the scheduled time
(minutes) can be modelled by a normally distributed random variable with mean 0 and
standard deviation of 5.
Simulate data
set.seed(123345456)
# Generate data
n <- 20
df <- data.frame(x = rnorm(n, 0 , 5))
library(tidyverse)
df <- mutate(df,
cat_x = cut(x, c(-Inf, -1, 1, Inf),
ordered_result=TRUE,
labels=c("early", "on-time", "late")) )
## x cat_x
## 1 0.802981 on-time
## 2 2.305315 late
## 3 -9.012655 early
## 4 4.814735 late
## 5 2.292868 late
## 6 4.412924 late
## 7 5.541629 late
## 8 3.866881 late
## 9 -9.069756 early
## 10 6.327343 late
## 11 5.345333 late
## 12 -2.140519 early
## 13 -0.063766 on-time
## 14 4.289332 late
## 15 -5.666453 early
## 16 4.628747 late
## 17 4.771764 late
## 18 1.774790 late
## 19 -2.661365 early
## 20 2.211989 late
##
## early on-time late
## 5 2 13
[State transition diagram with states Early, On-Time and Late]
# Explore transitions
trans_freq <- table(df$cat_x[1:(nrow(df)-1)],
df$cat_x[2:nrow(df)])
trans_freq
##
## early on-time late
## early 0 1 4
## on-time 0 0 2
## late 5 0 7
P <- trans_freq/rowSums(trans_freq)
P
##
## early on-time late
## early 0.00000 0.20000 0.80000
## on-time 0.00000 0.00000 1.00000
## late 0.41667 0.00000 0.58333
MASS::fractions(P)
##
## early on-time late
## early 0 1/5 4/5
## on-time 0 0 1
## late 5/12 0 7/12
Discussion What would make the estimation of this transition matrix more
accurate?
Chapter 10
References:
[Win04, Chapter 17]
10.1 Example: Stock Prices
Example 10.1 Consider two stocks. Stock 1 always sells for $10 or $20. If stock 1 is
selling for $10 today, there is a .70 chance that it will sell for $10 tomorrow. If it
is selling for $20 today, there is a .75 chance that it will sell for $20 tomorrow. Stock
2 always sells for $10 or $25. If stock 2 sells today for $10, there is a .90 chance that it
will sell tomorrow for $10. If it sells today for $25, there is a .85 chance that it will sell
tomorrow for $25.
The price of the stocks over time can be modelled by two Markov chains.
• Does the steady state distribution exist for this Markov chain?
• Find the steady state probabilities.
• In the long-run, what is the average price of the stock?
• If the stock is currently selling for $10, on average, how long until it will next sell
for $10?
• If the stock is currently selling for $10, on average, how long until it will next sell
for $20?
Does the steady state distribution exist for this Markov chain?
• Recurrent: There are no transient states (there is only one class) so all states are
recurrent.
• Aperiodic: No states are periodic (no k > 1 such that all paths back to a state
are a multiple of k), so the Markov chain is aperiodic.
• Communicate: All states communicate with each other, so there is one class and
thus the Markov chain is irreducible.
• Ergodic: All states communicate with each other, are recurrent, and are aperiodic
so the Markov chain is ergodic, therefore the steady state distribution exists.
Find the steady state probabilities.
This chain has s = 2 states, so we use the first (s − 1) equations and the last equation:
π1 = 0.7π1 + 0.25π2
1 = π1 + π 2
Solving for π1 and π2 gives:
π1 = 5/11 = 0.4545455
π2 = 6/11 = 0.5454545
In the long run the price will be $10 about 45% of the time and $20 about 55% of the
time. Therefore the average stock price is:
For an ergodic Markov chain, let the mean first passage time m_{i,j} =
the expected number of transitions before we reach state j, given that we are
currently in state i. [Win04, p939]

The mean first passage times can be found by solving the following set of
equations:

m_{i,j} = 1 + Σ_{k≠j} p_{i,k} m_{k,j}    (10.1)

The mean first passage time to state i given that the system is currently in
state i is:

m_{i,i} = 1/π_i    (10.2)
If the stock is currently selling for $10, on average, how long until it will next
sell for $10?
m_{i,i} = 1/π_i
m_{1,1} = 1/0.4545455 = 2.2 days
If the stock is currently selling for $10, on average, how long until it will next
sell for $20?
m_{i,j} = 1 + Σ_{k≠j} p_{i,k} m_{k,j}

m_{1,2} = 1 + Σ_{k≠2} p_{1,k} m_{k,2}

m_{2,2} = 1/0.54545 = 1.833 days

m_{2,1} = 1 + Σ_{k≠1} p_{2,k} m_{k,1}
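For the two-state stock 1 chain from Example 10.1 these equations can be solved directly, since each sum has a single term; a sketch in R:

```r
# Stock 1 transition matrix: $10 -> $10 w.p. 0.70, $20 -> $20 w.p. 0.75
P <- matrix(c(0.70, 0.30,
              0.25, 0.75), nrow = 2, byrow = TRUE)
# m12 = 1 + p11*m12  =>  m12 = 1/(1 - p11)
m12 <- 1/(1 - P[1, 1])   # expected days until the price next reaches $20
# m21 = 1 + p22*m21  =>  m21 = 1/(1 - p22)
m21 <- 1/(1 - P[2, 2])   # expected days until the price next reaches $10
c(m12, m21)
```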
Example 10.3 A gambler plays a series of independent games. At each game he either
wins $1 with probability p or loses $1 with probability 1 − p. The gambler stops playing
either if he goes broke or as soon as he amasses N = $4, whichever happens first.
    [  1    0    0    0   0 ]
    [ 1−p   0    p    0   0 ]
P = [  0   1−p   0    p   0 ]
    [  0    0   1−p   0   p ]
    [  0    0    0    0   1 ]
If {Xn } is an absorbing Markov chain, then we can classify the states as follows:
If we list all of the states S, first by the transient states and then by the absorbing states,
we can rewrite the transition matrix as follows:
• I is an m × m identity matrix
          1    2    3    0    4
    1 [   0    p    0  1−p   0 ]
    2 [ 1−p    0    p    0   0 ]
P = 3 [   0  1−p    0    0   p ]
    0 [   0    0    0    1   0 ]
    4 [   0    0    0    0   1 ]

        1    2   3            0   4
    1 [ 0    p   0 ]     1 [ 1−p  0 ]
Q = 2 [ 1−p  0   p ]  R = 2 [  0   0 ]
    3 [ 0   1−p  0 ]     3 [  0   p ]

        0  4
I = 0 [ 1  0 ]
    4 [ 0  1 ]
If the chain begins in a given transient state ti , what is the expected number of times
that each transient state tj will be entered before we reach an absorbing state?
If we are in transient state ti , the expected number of periods that will be spent in
transient state tj before absorption is the ijth element of the matrix (I − Q)−1 .
Note: the matrix (I − Q)−1 is often called the Markov chain’s fundamental matrix.
Here I is a (s − m) × (s − m) identity matrix.
Example 10.5 The law firm of Mason and Burger employs three types of lawyers: junior
lawyers, senior lawyers, and partners. During a given year, there is a .15 probability that
a junior lawyer will be promoted to senior lawyer and a .05 probability that he or she will
leave the firm. Also, there is a .20 probability that a senior lawyer will be promoted to
partner and a .10 probability that he or she will leave the firm. There is a .05 probability
that a partner will leave the firm. The firm never demotes a lawyer. [Win04, p 943]
1. What is the average length of time that a newly hired junior lawyer spends working
for the firm?
2. What is the probability that a junior lawyer will be promoted to partner before
leaving the firm?
3. What is the average length of time that a partner spends with the firm (as a
partner)?
Transition Matrix
Transition Diagram
[Transition diagram: transient states J, S, P with promotion probabilities 0.15 (J to S) and 0.2 (S to P), leaving probabilities 0.05, 0.1 and 0.05, and absorbing states LeaveNP and LeaveP]
Transient states:
Absorbing states:
Q= R= I=
1. What is the average length of time that a newly hired junior lawyer spends working
for the firm?
Expected time junior lawyer spends with firm = expected time as a junior
                                             + expected time as a senior
                                             + expected time as a partner
  = (I − Q)⁻¹₁₁ + (I − Q)⁻¹₁₂ + (I − Q)⁻¹₁₃
  = 5 + 2.5 + 10
  = 17.5 years
2. What is the probability that a junior lawyer will be promoted to partner before
leaving the firm?
             [ 5  2.5   10   ] [ 0.05  0    ]
(I − Q)⁻¹R = [ 0  10/3  40/3 ] [ 0.1   0    ]
             [ 0   0    20   ] [ 0     0.05 ]

            Leave as NP  Leave as P
  Junior  [    0.5          0.5    ]
= Senior  [    1/3          2/3    ]
  Partner [     0            1     ]
3. What is the average length of time that a partner spends with the firm (as a partner)?
## J S P LeaveNP LeaveP
## J 0.8 0.15 0.00 0.05 0.00
## S 0.0 0.70 0.20 0.10 0.00
## P 0.0 0.00 0.95 0.00 0.05
## LeaveNP 0.0 0.00 0.00 1.00 0.00
## LeaveP 0.0 0.00 0.00 0.00 1.00
## J S P LeaveNP LeaveP
## TRUE TRUE TRUE TRUE TRUE
#Alternative method
all.equal(rowSums(P), rep(1,5), check.names = FALSE)
## [1] TRUE
#Define matrices
Q = P[1:3, 1:3]
R = P[1:3, 4:5]
I = diag(dim(Q)[1])
## J S P
## J 5 2.5000 10.000
## S 0 3.3333 13.333
## P 0 0.0000 20.000
fractions(N)
## J S P
## J 5 5/2 10
## S 0 10/3 40/3
## P 0 0 20
## LeaveNP LeaveP
## J 0.50000 0.50000
## S 0.33333 0.66667
## P 0.00000 1.00000
R demo
## J S P
## J 1 5.5511e-17 0
## S 0 1.0000e+00 0
## P 0 0.0000e+00 1
## [1] 5.5511e-17
max(abs(N%*%(I-Q) -diag(3)))
## [1] 1.1102e-16
#Alternative method
all.equal((I-Q)%*%N , diag(3), check.attributes = FALSE)
## [1] TRUE
## [1] TRUE
#Define matrices
Q = P[1:3, 1:3]
R = P[1:3, 4:5]
I = diag(dim(Q)[1])
## J S P
## J 5 2.5000 10.000
## S 0 3.3333 13.333
## P 0 0.0000 20.000
# Multiply inverse by R
N %*% R
Example 10.6 Consider an application of the gambler's ruin problem in section 9.3.2
called the "drunkard's walk". A patron of a local drinking establishment is walking home
and is stumbling along. He is equally likely to stumble left or right. To get home he must
cross a bridge. The bridge is 10 steps long. Unfortunately, if he stumbles 5 steps right he
will fall off the bridge into the water; if he stumbles 5 steps left he will hit a fence, which
will propel him back towards the middle of the bridge on his next step. Assuming that he
starts in the middle of the bridge, what is the probability that he will fall off the bridge
before reaching the other side?
Simulate this Markov chain using R. Repeat the simulation a large number of times.
#Specify states
S <- -5:5
M <- length(S)
#Build transition matrix (reconstructed from the printed output): interior
#states move left or right with probability 0.5; the water end is absorbing;
#the fence reflects
P <- matrix(0, M, M, dimnames=list(S, S))
for(i in 2:(M-1)){
  P[i, i-1] <- 0.5
  P[i, i+1] <- 0.5
}
P[1, 1] <- 1      # water (absorbing)
P[M, M-1] <- 1    # fence (reflecting)
print(round(P, 4))
## -5 -4 -3 -2 -1 0 1 2 3 4 5
## -5 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -4 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -3 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -2 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## -1 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0
## 0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
## 1 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0
## 2 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0
## 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0
## 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5
## 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
set.seed(123465)
numSimulations <- 1000   # assumed; not shown in the original
maxNumSteps <- 10
xWater <- -5
xFence <- 5
#Initialize vector
fallInWater <- rep(NA, numSimulations)
x0 <- 0
for(j in 1:numSimulations){
  #Specify starting value x0 and assign to first element of x
  x <- rep(NA, maxNumSteps + 1)
  names(x) <- 0:maxNumSteps
  x[1] <- x0
  for(i in 1:maxNumSteps){
    #Select row of P for the current state and sample the next state
    rowIndex <- which(x[i]==S)
    x[i+1] <- sample(S, 1, prob=P[rowIndex, ])
    if(x[i+1]==xWater){
      fallInWater[j] <- TRUE
      #break; #uncomment to stop simulation once in water
    }
  }
}
## fallInWater
## TRUE
## 106
## 0 1 2 3 4 5 6 7 8 9 10
## 0 -1 -2 -3 -4 -5 -5 -5 -5 -5 -5
#Plot trajectory
plot(names(x), x, type='s', xlab="Time",
ylab='State', ylim=c(xWater, xFence))
abline(h=xWater, col="blue", lwd=6)
abline(h=xFence, col="gray", lwd=6)
[Figure: one simulated trajectory; x-axis: Time (0 to 10), y-axis: State, with the water (−5) and fence (+5) boundaries marked]
• Gambler’s Ruin
• Inventory planning
The probability that the chain will be back in its initial state after 2n transitions is given
by [Ros02, p 111]:

P^{2n}_{0,0} = C(2n, n) pⁿ(1 − p)ⁿ = [(2n)!/(n! n!)] [p(1 − p)]ⁿ
Using this result it can be shown:
• If p = 0.5 then the chain is recurrent and the chain is called a symmetric random walk.
Example 10.7 Consider a simple random walk, with p = 0.4. Suppose this Markov
chain is in state 0 at time 0.
1. What is the probability that the Markov chain will be in state 0 at time 1?
P0,0 = 0
2. What is the probability that the Markov chain will be in state 1 at time 1?
P0,1 = p = 0.4
P^{2n}_{0,0} = C(2n, n) pⁿ(1 − p)ⁿ = [(2n)!/(n! n!)] [p(1 − p)]ⁿ

P^{10}_{0,0} = P^{2×5}_{0,0} = C(10, 5) × 0.4⁵ × (1 − 0.4)⁵
             = 0.20066

Answer: P^{10}_{0,0} = 0.20066
In R:
choose(10, 5)*(0.4^5)*(0.6^5)
## [1] 0.20066
## [1] 0.20066
4. What is the probability that the Markov chain will be in state 0 at an odd-numbered
time?

P^{2n−1}_{0,0} = 0,  n = 1, 2, . . .
• With probability 1/3, 0 units are demanded during the period; with probability 1/3,
1 unit is demanded during the period; with probability 1/3, 2 units are demanded
during the period.
Chapter 11
References:
[Ros13, §8.3]
[SY10, §8.4]
11.1 Examples
11.1.1 Normal Distribution
Example 11.1 A coffee machine has two mechanisms: one which provides milk and one
which provides coffee. Under the "flat white" setting, the amount of coffee dispensed is
normally distributed with mean 40 mls and standard deviation of 5 mls; and the amount
of milk dispensed is normally distributed with mean 100 mls and standard deviation of 8
mls. The amounts of coffee and milk dispensed are independent. George places a 150ml
cup under this machine and requests a flat white. What is the probability that his cup
will overflow?
Explore using simulation

nSims <- 100000
x <- rnorm(nSims, 40, 5)    # coffee (simulation code reconstructed)
y <- rnorm(nSims, 100, 8)   # milk
w <- x + y                  # total liquid
mean(w)

## [1] 140

sd(w)

## [1] 9.434

hist(x)
hist(y)
hist(w)
mean(w > 150)

## [1] 0.14457

table(w > 150)

##
## FALSE TRUE
## 85615 14385
[Figure: histograms of x (coffee), y (milk) and w (total liquid); y-axis: Frequency]
Theoretical answers
Let W be the total amount of liquid dispensed in a "flat white", therefore W = X + Y .
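Since X and Y are independent normals, W ∼ N(140, 25 + 64) = N(140, 89), and the overflow probability follows directly:

```r
# P(W > 150) for W ~ N(140, sd = sqrt(89))
1 - pnorm(150, mean = 140, sd = sqrt(89))
# agrees with the simulated proportion of about 0.144
```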
Suppose the number of vehicles that are owned by university students has an exponential
distribution with λ = 2.
## lambda = 2
cat("n =", n)
## n = 36
# Plot x
cat("E(X) =", mean(x))
## E(X) = 0.50167
## SD(X) = 0.50243
## E(Y) = 18 , SD(Y)= 3
[Histograms (density scale) of the simulated x values, colMeans(x) and colSums(x)]
Theoretical answers
X ∼ Exp(2)
Y = a1 X1 + a2 X2 + · · · + an Xn
Then, for independent X1, . . . , Xn:
E[Y] = a1 E[X1] + · · · + an E[Xn] and VAR[Y] = a1² VAR[X1] + · · · + an² VAR[Xn]
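The printed output above (n = 36, λ = 2, E(Y) = 18, SD(Y) = 3) can be reproduced by simulating the sum Y = X1 + · · · + X36 of independent Exp(2) variables; a sketch:

```r
lambda <- 2; n <- 36
x <- replicate(1e4, rexp(n, rate = lambda))  # n-by-10000 matrix of Exp(2) draws
y <- colSums(x)                              # 10000 simulated values of Y
mean(x)         # close to E(X) = 1/lambda = 0.5
mean(y); sd(y)  # close to E(Y) = n/lambda = 18 and SD(Y) = sqrt(n)/lambda = 3
hist(y, prob = TRUE)
curve(dnorm(x, n/lambda, sqrt(n)/lambda), col = "red", add = TRUE)
```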
Example 11.3 Let W denote the number of faults observed each week on a particular
printer. The probability mass function of W is:
w 0 1 2
p(w) 0.8 0.15 0.05
2. Find E[W ].
3. Find VAR[W ].
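Both quantities can be computed directly from the probability mass function table; the values 0.25 and 0.2875 match the answers at the end of the notes:

```r
w <- 0:2
p <- c(0.8, 0.15, 0.05)
E_W <- sum(w * p)              # E[W] = 0.25
VAR_W <- sum(w^2 * p) - E_W^2  # VAR[W] = E[W^2] - (E[W])^2 = 0.2875
E_W; VAR_W
```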
In particular:
Let Xi ∼ N(µi, σi²), i = 1, 2, . . . , n be independent random variables and define
Y = X1 + X2 + · · · + Xn. Then Y ∼ N(µ1 + · · · + µn, σ1² + · · · + σn²).
# Define X
E_X <- 1.40
STD_X <- .05
# Define Y
n <- 72
E_Y <- n*E_X
VAR_Y <- n*STD_X^2
STD_Y <- sqrt(VAR_Y)
E_Y; VAR_Y; STD_Y
## [1] 100.8
## [1] 0.18
## [1] 0.42426
# Simulate X and Y
x <- replicate(1e4, rnorm(n, E_X, STD_X))
y <- colSums(x)
hist(y, prob=TRUE)
curve(dnorm(x, E_Y, STD_Y), col="red", add=TRUE)
[Histograms (density scale) of the simulated x values (1.2–1.6) and of y = colSums(x)
(99–102.5), the latter overlaid with the N(100.8, 0.424²) density]
## [1] 0.97033
##
## FALSE TRUE
## 0.0276 0.9724
Let X ∼ Bin(n, p), and Y ∼ N(np, np(1 − p)), then for suitable n and p:
• For large values of n and values of p that are not extreme, X will have an approximate
  normal distribution with mean µ = np and standard deviation σ = √(np(1 − p)).
• Various rules are suggested for the required values of n and p. A good rule of thumb
  is np > 5 and n(1 − p) > 5.
## [1] 50
## [1] 6.892
## [1] 0.87312
## [1] 0.87312
## [1] 0.87237
## [1] 0.85321
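The code producing the output above is not shown here. The printed mean 50 and standard deviation 6.892 are consistent with, for example, n = 1000 and p = 0.05 (an assumption for this sketch, as is the cut-off x = 55); the exact and approximate probabilities can then be compared:

```r
n <- 1000; p <- 0.05            # assumed values: np = 50, sqrt(np(1-p)) = 6.892
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
mu; sigma
x <- 55                         # arbitrary cut-off for illustration
pbinom(x, n, p)                 # exact P(X <= x)
pnorm(x + 0.5, mu, sigma)       # normal approximation with continuity correction
pnorm(x, mu, sigma)             # normal approximation without continuity correction
```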
N <- solve(I-Q)
N
(Note: computation of the variance of the time till absorption is not assessed, but is
included here for completeness.)
Let T denote the time that a junior lawyer spends working for the firm. T is a discrete
random variable. We seek E[T ].
Expected time junior lawyer spends with firm = N1,1 + N1,2 + N1,3 = 5 + 2.5 + 10
                                             = 17.5 years
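Using the transition probabilities among the transient states (Junior, Senior, Partner) from Exercise 10.5, the fundamental matrix N = (I − Q)⁻¹ reproduces these figures:

```r
# Transition probabilities among the transient states (from Exercise 10.5)
Q <- matrix(c(0.80, 0.15, 0.00,
              0.00, 0.70, 0.20,
              0.00, 0.00, 0.95),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Junior", "Senior", "Partner"),
                            c("Junior", "Senior", "Partner")))
N <- solve(diag(3) - Q)  # fundamental matrix: expected visits to each transient state
N["Junior", ]            # 5, 2.5, 10
sum(N["Junior", ])       # expected total time at the firm: 17.5 years
```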
One way to investigate this is by simulating this Markov chain repeatedly and inspecting
the results.
Define values and compute theoretical answers
# Define states
transientStates <- S[1:3]
absorbStates <- S[4:5]
Q <- P[1:3, 1:3]
R <- P[1:3, 4:5]
I <- diag(3)
for(j in 1:numLawyers){
#Print i after 10% of lawyers
if(j %% (numLawyers/10)==0 & verbose) cat("Sim", j)
break;
}#END ELSE
i <- i+1
}#END WHILE
}#END j
return(list(law=law, x = x))
}# END lawFirmSim
Run simulation
#Run Simulation
set.seed(123456789)
num_lawyers <- 1000
max_years <- 500
Inspect data
# Inspect career of lawyer j=numLawyers
lengthOfCareer_finalLawyer <- min(which(is.na(law_data$x)))-2
law_data$x[lengthOfCareer_finalLawyer+2] <- law_data$x[lengthOfCareer_finalLawyer+1] #for nicer plot
law_data$x <- na.omit(law_data$x)
plot(names(law_data$x), factor(law_data$x, levels=S),
type="s", xlab="Years", ylab="State",
yaxt="n", ylim=c(1, length(S)))
axis(2, labels=S, at=1:length(S))
[Step plot of the final simulated lawyer’s career: state (Junior, Senior, Partner,
LeaveNP, LeaveP) against years, 0–15]
# Compare theoretical results to simulated results for sample of size, numLawyers -----
#################################
## How long does a junior spend at the firm? ----
#Theoretical Answer
expTimeTillAbsorb["Junior",]
varTimeTillAbsorb["Junior",]
#Simulated Answer
mean(law_data$law$timeAtFirm)
var(law_data$law$timeAtFirm)
#################################
## What is the probability that a junior lawyer will be promoted to partner before leaving? ----
#Theoretical Answer
(N%*%R)["Junior", "LeaveP"]
#################################
## Average length of time a partner spends with the firm. -----
#Theoretical Answer
N["Partner", "Partner"]
expTimeTillAbsorb["Partner",]
varTimeTillAbsorb["Partner",]
#Simulated Answer
mean(law_data$law$timeAsPartner, na.rm=TRUE)
var(law_data$law$timeAsPartner, na.rm=TRUE)
#################################
[Histograms of simulated time at the firm and time as partner (Years)]
R demo
# Define dataframe
meanTime <- data.frame(firmNum=1:num_reps,
meanTimeAtFirm=NA,
meanTimeAsPartner=NA)
for(k in 1:num_reps){
#Print k after 10% of cohorts
if(k %% (num_reps/10)==0) cat("Firm", k, "\n")
law_data <- lawFirmSim(numLawyers = num_lawyers, maxYears = max_years,
x0 = x_0,
P = P,
S = S,
transientStates = transientStates,
absorbStates = absorbStates, verbose = FALSE)
# Compute mean time at firm and as partner for cohort k
meanTime$meanTimeAtFirm[k] <- mean(law_data$law[,"timeAtFirm"], na.rm=TRUE)
meanTime$meanTimeAsPartner[k] <- mean(law_data$law[,"timeAsPartner"], na.rm=TRUE)
}#END k
Analyse results
##################
# How long does a junior spend at the firm? -----
# Distribution of "Tbar"
hist(meanTime$meanTimeAtFirm, prob=TRUE,
main=substitute(
paste(bar(T), " ~ N(",
mu, "=", m,
", ", sigma^2, "=", s2, ") ",
", n =", n,
" lawyers, n_reps = ", nc),
list(m = expTimeTillAbsorb["Junior",],
s2 = round(varTimeTillAbsorb["Junior",]/num_lawyers, 3),
n = num_lawyers,
nc = num_reps)),
xlab="Years")
[Histogram (density scale) of the cohort mean times at the firm, roughly 14–21 years]
This example has a small number of lawyers and a small number of firms/repetitions.
We will now explore what happens as both of these values increase.
load("STAT600_week11_lawFirm_CLT_sim_100_1000.Rdata")
head(meanTime$meanTimeAtFirm)
[Default histogram of meanTime$meanTimeAtFirm, 10–25 years]
hist(meanTime$meanTimeAtFirm, prob=TRUE,
main=paste("Mean time at firm over n =",
num_lawyers, "lawyers, \n (number of repetitions = ",
num_reps, ")"),
xlab="Years", ylim=c(0,0.4), xlim=c(10, 25))
[Histogram (density scale) of the cohort mean times at the firm, 10–25 years]
hist(meanTime$meanTimeAtFirm, prob=TRUE,
     main=paste("Mean time at firm over n =",
                num_lawyers, "lawyers, \n (number of repetitions = ",
                num_reps, ")"),
     xlab="Years", ylim=c(0,0.4), xlim=c(10, 25))
[Histogram (density scale) of the cohort mean times at the firm, 10–25 years]
Let Ti denote the time a newly appointed junior spends with the firm.
Define T̄n = (1/n) Σ_{i=1}^{n} Ti as the mean time that a group of n newly appointed
juniors spend with the firm.
• For a sample of size n = 100 employees, the sample mean time at the firm is
approximately normally distributed with:
• For a sample of size n = 1000 employees, the sample mean time at the firm is
approximately normally distributed with:
As n → ∞, VAR[T̄n] = VAR[T]/n decreases towards 0.
load("STAT600_week11_lawFirm_CLT_sim_100_1000.Rdata")
[Histogram (density scale) of the cohort mean times at the firm, 10–25 years]
Let X and Y be two discrete random variables. Then, the joint probability mass function
of X and Y is:
p(x, y) = P(X = x, Y = y)
The conditional probability mass functions are:
P(X = x | Y = y) = p(x, y) / pY(y), for all y such that pY(y) > 0
P(Y = y | X = x) = p(x, y) / pX(x), for all x such that pX(x) > 0
Example 11.8
A manufacturing company is concerned about the number of defects on its production
lines. Let:
• X denote the number of defects per hour on production line 1
• Y denote the number of defects per hour on production line 2
The two production lines operate independently, so it can be assumed that the random
variables X and Y are independent.

x      0    1    2
p(x)   0.5  0.3  0.2

y                 0    1
F(y) = P(Y ≤ y)   0.9  1

Joint probability mass function p(x, y):

x \ y    0     1
0        0.45  0.05
1        0.27  0.03
2        0.18  0.02
2. What is the probability that both production lines produce exactly 1 defect each?
4. What is the probability that there are 2 defects on line 1 given that there is 1 defect
on line 2?
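Because X and Y are independent, the joint table above is the outer product of the marginal pmfs; a quick check in R (pY is recovered from the cdf values F(0) = 0.9, F(1) = 1):

```r
px <- c(0.5, 0.3, 0.2)   # marginal pmf of X over x = 0, 1, 2
py <- c(0.9, 0.1)        # marginal pmf of Y over y = 0, 1 (from F(0)=0.9, F(1)=1)
joint <- outer(px, py)   # p(x, y) = pX(x) pY(y) under independence
dimnames(joint) <- list(x = 0:2, y = 0:1)
joint
joint["1", "1"]          # P(X = 1, Y = 1) = 0.03
joint["2", "1"] / py[2]  # P(X = 2 | Y = 1) = 0.02 / 0.1 = 0.2
```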
Let X and Y be two random variables. Then, the joint distribution function
of X and Y is:
F(x, y) = P((X ≤ x) ∧ (Y ≤ y))
Let X and Y be two jointly continuous random variables with joint cdf F and
joint pdf f. Then X and Y are continuous random variables with marginal pdfs:
fX(x) = ∫_{−∞}^{∞} f(x, y) dy,  x ∈ R
fY(y) = ∫_{−∞}^{∞} f(x, y) dx,  y ∈ R
Appendix A
Useful Formulae
Mathematical
Binomial Theorem: Σ_{x=0}^{n} C(n, x) a^x b^(n−x) = (a + b)^n, n ∈ Z+
                  Σ_{x=0}^{n} C(n, x) a^x = (1 + a)^n, n ∈ Z+
Exponential Series: Σ_{n=0}^{∞} x^n / n! = e^x
Geometric Series: Σ_{x=0}^{n−1} a r^x = a(1 − r^n)/(1 − r), for r ≠ 1
Infinite Geometric Series: Σ_{x=0}^{∞} a r^x = a/(1 − r), for |r| < 1
Arithmetic Series: Σ_{x=1}^{n} [a + (x − 1)b] = n[2a + (n − 1)b]/2
Logarithmic Series: Σ_{n=1}^{∞} (−1)^(n+1) x^n / n = ln(1 + x), −1 < x ≤ 1
Taylor’s Series: f(x) = f(a) + (x − a)f′(a) + (x − a)² f′′(a)/2! + · · · , for small |x − a|
Gamma function: Γ(α) = ∫_{0}^{∞} y^(α−1) e^(−y) dy
  Γ(1) = 1; Γ(α) = (α − 1)Γ(α − 1) for α > 1; Γ(n) = (n − 1)! for any positive integer n
Derivatives:
  d/dx (x^n) = n x^(n−1)
  d/dx (f(x) g(x)) = f′(x)g(x) + f(x)g′(x)
  d/dx (f(x)/g(x)) = [f′(x)g(x) − f(x)g′(x)] / g(x)²
  d/dx (f(g(x))) = f′(g(x)) g′(x)
  d/dx (ln x) = 1/x
  d/dx (ln g(x)) = g′(x)/g(x)
  d/dx (e^x) = e^x
  d/dx (e^(f(x))) = f′(x) e^(f(x))

Integrals:
  ∫ a dx = ax + C
  ∫ a x^n dx = a x^(n+1)/(n + 1) + C, n ≠ −1
  ∫ (1/x) dx = ln |x| + C
  ∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx + C
  ∫ (f(x) − g(x)) dx = ∫ f(x) dx − ∫ g(x) dx + C
  ∫ e^x dx = e^x + C
  ∫ e^(f(x)) dx = e^(f(x))/f′(x) + C (valid when f′ is constant, i.e. f is linear)
  ∫ u dv = uv − ∫ v du
  ∫ u v dx = u ∫ v dx − ∫ [u′ ∫ v dx] dx + C
Conditional probability: P(A|B) = P(A ∩ B) / P(B)
Bayes’ Theorem: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)]
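As a worked check of Bayes’ Theorem, using the diagnostic-test figures from Exercise 2.7 (sensitivity 0.99, specificity 0.98, prevalence 0.5%); the variable names are illustrative:

```r
sens <- 0.99   # P(positive | disease)
spec <- 0.98   # P(negative | no disease)
prev <- 0.005  # P(disease)
p_pos <- sens * prev + (1 - spec) * (1 - prev)  # P(positive) = 0.02485
ppv <- sens * prev / p_pos                      # P(disease | positive), about 0.199
p_pos; ppv
```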
Related results:
VAR[X] = E[X²] − µ²
• If Xi, i = 1, . . . , n are independent random variables and W = Σ_{i=1}^{n} ai Xi, then:
  E[W] = Σ_{i=1}^{n} ai E[Xi]  and  VAR[W] = Σ_{i=1}^{n} ai² VAR[Xi]
Discrete distributions:

Discrete random variable X:
  P(X = x) = p(x)
  E[X] = Σ_x x p(x); more generally E[X^n] = Σ_x x^n p(x)
  VAR[X] = E[X²] − (E[X])²

Binomial, X ∼ Bin(n, p), n = 1, 2, . . ., 0 ≤ p ≤ 1:
  P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, . . . , n, where C(n, x) = n!/(x!(n − x)!)
  E[X] = np; VAR[X] = np(1 − p)

Geometric, X ∼ Geometric(p), 0 ≤ p ≤ 1 (number of failures before the first success):
  P(X = x) = p(1 − p)^x, x = 0, 1, 2, . . .
  F(x) = P(X ≤ x) = 1 − (1 − p)^(x+1), x = 0, 1, 2, . . .
  E[X] = (1 − p)/p; VAR[X] = (1 − p)/p²

Negative Binomial, X ∼ Neg Bin(r, p), 0 ≤ p ≤ 1, r > 0 (number of failures before the
rth success):
  P(X = x) = C(x + r − 1, r − 1) p^r (1 − p)^x for x = 0, 1, . . ., and 0 otherwise
  E[X] = r(1 − p)/p; VAR[X] = r(1 − p)/p²

Negative Binomial, Y ∼ Neg Bin(r, p), 0 ≤ p ≤ 1, r > 0 (number of trials until the rth
success):
  P(Y = y) = C(y − 1, r − 1) p^r (1 − p)^(y−r) for y = r, r + 1, . . ., and 0 otherwise
  E[Y] = r/p; VAR[Y] = r(1 − p)/p²

Hypergeometric, X ∼ Hypergeometric, N = 0, 1, . . ., k = 0, 1, . . . , N, n = 0, 1, . . . , N:
  P(X = x) = C(k, x) C(N − k, n − x) / C(N, n) for x = 0, 1, . . . , k, with C(a, b) = 0 if
  b > a, and 0 otherwise
  E[X] = n k/N; VAR[X] = n (k/N)(1 − k/N)(N − n)/(N − 1)

Poisson, X ∼ Pois(λ):
  P(X = k) = λ^k e^(−λ)/k!, k = 0, 1, 2, . . .
  E[X] = λ; VAR[X] = λ
Continuous distributions:

Any continuous random variable X with pdf f(x):
  F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
  E[X] = ∫_{−∞}^{∞} x f(x) dx; more generally E[X^n] = ∫_{−∞}^{∞} x^n f(x) dx
  VAR[X] = E[(X − E[X])²] = E[X²] − (E[X])²
  Special case for a non-negative RV: E[X] = ∫_{0}^{∞} P(X > x) dx

Beta, X ∼ Beta(α, β), α > 0, β > 0:
  f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) for 0 ≤ x ≤ 1, and 0 elsewhere
  E[X] = α/(α + β); VAR[X] = αβ/[(α + β)²(α + β + 1)]

Exponential, X ∼ Exp(λ), λ > 0:
  f(x) = λe^(−λx), x > 0; F(x) = 1 − e^(−λx), x ≥ 0
  E[X] = 1/λ; VAR[X] = 1/λ²

Gamma, X ∼ Gamma(α, λ), α, λ > 0:
  f(x) = x^(α−1) λ^α e^(−λx)/Γ(α), x ≥ 0
  E[X] = α/λ; VAR[X] = α/λ²

Normal, X ∼ N(µ, σ²), x ∈ R, σ > 0:
  f(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
  E[X] = µ; VAR[X] = σ²

Uniform, X ∼ Unif(c, d):
  f(x) = 1/(d − c) for c ≤ x ≤ d, and 0 otherwise; F(x) = (x − c)/(d − c), c ≤ x ≤ d
  E[X] = (c + d)/2; VAR[X] = (d − c)²/12

Weibull, X ∼ Weibull(α, θ), α > 0, θ > 0:
  f(x) = (α/θ)(x/θ)^(α−1) e^(−(x/θ)^α) for x ≥ 0, and 0 for x < 0
  F(x) = 1 − e^(−(x/θ)^α) for x ≥ 0, and 0 for x < 0
  E[X] = θ Γ(1 + 1/α); VAR[X] = θ² [Γ(1 + 2/α) − Γ(1 + 1/α)²]

Erlang, X ∼ Erlang(k, λ), x ≥ 0, k = 1, 2, . . ., λ > 0:
  f(x) = λe^(−λx) (λx)^(k−1)/(k − 1)!
  F(x) = P(X ≤ x) = 1 − e^(−λx) Σ_{i=0}^{k−1} (λx)^i / i!
  E(X) = k/λ; VAR(X) = k/λ²
Appendix B
Useful R commands
Examples
2+2
## [1] 4
(1+4)*(3-2)
## [1] 5
10^6
## [1] 1000000
1/10000
## [1] 0.0001
B.3 Assignments
y <- 2 y becomes 2
2 -> y 2 goes to y
y = 2 y becomes 2
B.4 Sequences
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
700:690
## [1] 700 699 698 697 696 695 694 693 692 691 690
seq(1, 7, by=0.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Examples
2>5
## [1] FALSE
## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [10] TRUE
x <- 1:10
table(x < 5)
##
## FALSE TRUE
## 6 4
which(x > 5)
## [1] 6 7 8 9 10
Examples:
cars
iris
For a vector x:
x[n] returns the nth element.
x[-n] returns a vector with the nth element removed.
x[n:m] returns a vector consisting of elements in the range n to m.
To place a group of elements into a matrix x, use the matrix function to create an n x m
matrix:
matrix(x, nr = n, nc = m, byrow=FALSE)
For a matrix x:
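The indexing rules for a matrix did not survive in this extract; a short sketch of the standard forms, using a hypothetical matrix m:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # 2 x 3 matrix, filled by column
m[1, 2]    # element in row 1, column 2
m[1, ]     # row 1 as a vector
m[, 2]     # column 2 as a vector
dim(m)     # dimensions: 2 3
```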
To create a dataframe:
mydf = data.frame(name1=x, name2=y, name3=z)
Creates a dataframe from vectors x, y, z
mydf$name1 returns the column name1 as a vector.
mydf[,"name1"] also returns the column name1 as a vector.
Examples
x[2]
## [1] 0.33
indices =c(3,2,2)
x[indices]
Examples
sqrt(3)
## [1] 1.7321
sqrt(1:10)
B.9 Combinatorics
factorial(x) x!
sample(x, k, replace=FALSE) Takes a sample of size k from set x without replacement
choose(n, k) Binomial coefficient C(n, k) = n!/(k!(n − k)!)
replicate(n, x) Repeats expression x n times
combn(x, m) Returns the combinations of the elements of x of size m
Examples
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 1 1 1 2 2 2 3 3 4
## [2,] 2 3 4 5 3 4 5 4 5 5
## [1] 10
# Simulate
vowels <- c("a", "e", "i", "o", "u")
n <- 5; r <- 2
sample(vowels, size=r)
sample(vowels, size=r)
#install if necessary
#install.packages('gtools')
#load library
library(gtools)
combinations(n, r, v=vowels, set=TRUE, repeats.allowed=FALSE)
## [,1] [,2]
## [1,] "a" "e"
## [2,] "a" "i"
## [3,] "a" "o"
## [4,] "a" "u"
## [5,] "e" "i"
## [6,] "e" "o"
## [7,] "e" "u"
## [8,] "i" "o"
## [9,] "i" "u"
## [10,] "o" "u"
## [,1] [,2]
## [1,] "a" "e"
## [2,] "a" "i"
## [3,] "a" "o"
## [4,] "a" "u"
## [5,] "e" "a"
## [6,] "e" "i"
## [7,] "e" "o"
## [8,] "e" "u"
## [9,] "i" "a"
## [10,] "i" "e"
## [11,] "i" "o"
# Number of permutations
factorial(n)/factorial(n-r)
## [1] 20
Examples
rnorm(10) sample of 10 from the standard normal distribution with
mean=0, std deviation=1
rnorm(10,5,2) sample of 10 from the normal distribution with mean=5,
standard deviation=2
Examples
plot(cars)
plot(cars[,1], cars[,2])
x = seq(-8, 8, by=0.1)
plot(x, dnorm(x, mean=0, sd=2))
[Three plots: plot(cars); plot(cars[,1], cars[,2]); and dnorm(x, mean = 0, sd = 2)
against x]
if(logical-statement){
#do something
}else{
#do something else
}
Loops
for(i in 1:n){
#do something
}
while(logical-statement){
#do something
}
Examples
x <- 2
y <- 10
if(x > y){
z <- x + y
}else{
z <- x*y
}
z
## [1] 20
for(i in 1:10){
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 1
## [1] 1 2
## [1] 1 2 3
## [1] 1 2 3 4
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5 6
## [1] 1 2 3 4 5 6 7
## [1] 1 2 3 4 5 6 7 8
## [1] 1 2 3 4 5 6 7 8 9
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] "o"
## [1] "a"
## [1] "i"
## [1] "i"
Custom functions
#Define function
fun <- function(x){
# do something
}
#Call function
fun(x)
Examples
getBMI(2.05, 80)
## [1] 19.036
my.mean(c(1,2,3))
## [1] 2
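The definitions of getBMI and my.mean are not included in this extract; definitions consistent with the outputs above (names and argument order inferred from the calls) would be:

```r
# BMI = weight (kg) / height (m)^2; argument order inferred from getBMI(2.05, 80)
getBMI <- function(height, weight){
  weight / height^2
}

# arithmetic mean computed without using mean()
my.mean <- function(x){
  sum(x) / length(x)
}

getBMI(2.05, 80)     # about 19.036
my.mean(c(1, 2, 3))  # 2
```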
1.7:
D: Outcomes with a head first D = {(H, H), (H, T )}
E: Outcomes with exactly one tail, E = {(H, T ), (T, H)}
1.8:
G: Outcomes which include an even number
G = {(2, H), (2, T ), (4, H), (4, T ), (6, H), (6, T )}
H : Outcomes which include exactly one tail,
H = {(1, T ), (2, T ), (3, T ), (4, T ), (5, T ), (6, T )}
1.9: G ∪ H = {(2, H), (2, T ), (4, H), (4, T ), (6, H), (6, T ), (1, T ), (3, T ), (5, T )}
1.10: Gc = {(1, H), (1, T ), (3, H), (3, T ), (5, H), (5, T )}
1.11: G ∩ H = {(2, T ), (4, T ), (6, T )}
1.14:
A : primary system is operational, A = {ooo, oon, ono, onn}
B : first generator is operational, B = {ooo, oon, noo, non}
C : second generator is operational, C = {ooo, ono, noo, nno}
1.14:
• Primary or first generator are operational, A ∪ B = {ooo, oon, ono, onn, noo, non}
• At least one of the systems is operational, A∪B∪C = {ooo, oon, ono, onn, noo, non, nno}
281
1.27: 60
1.28: 4200
1.28: 1/120
1.29: 0.399123
2.3: a. 0.624615
2.3: b. 0.172549
2.4: 2/3
2.5: b. 0.005; c. 0.02485
2.7:
• Sensitivity: 0.99
• Specificity: 0.98
• Prevalence: 0.5%
0 0.2 0 0.8
8.6: c. 0.523
9.2:
P = [ 0.9  0.1
      0.2  0.8 ]
9.10: Company 1 should hire the advertising agency, as the profit with the agency is
$36,600,000 compared with $34,666,667 without the agency.
10.1: S = {1, 2} where state 1 = $10 and state 2 = $25.
P = [ 0.9   0.1
      0.15  0.85 ]
10.5:
                 Junior  Senior  Partner  Leave as NP  Leave as P
    Junior        .80     .15     0        .05          0
    Senior        0       .70     .20      .10          0
P = Partner       0       0       .95      0            .05
    Leave as NP   0       0       0        1            0
    Leave as P    0       0       0        0            1
10.4.1. ?.b: P^{10}_{0,0} = 0.20066
10.8:
        0    1    2    3    4
    0   0    0    1/3  1/3  1/3
    1   0    0    1/3  1/3  1/3
P = 2   1/3  1/3  1/3  0    0
    3   0    1/3  1/3  1/3  0
    4   0    0    1/3  1/3  1/3
11.3: 2. 0.25
11.3: 3. 0.2875
11.3: 4. 13, 3.866523
11.4: 0.97062
11.8: 0.03
11.8: 0.23
11.8: 0.2
[Ros02] S. Ross. Probability Models for Computer Science. Harcourt Academic Press,
San Diego, CA, 2002.
[Ros13] S. Ross. A First Course in Probability. Pearson Higher Education, USA, 9th
edition, 2013.
[SY10] R.L. Scheaffer and L.J. Young. Introduction to Probability and its Applications.
Brooks/Cole, Boston, MA, 3rd edition, 2010.