
Auckland University of Technology

School of Engineering, Computer and Mathematical Sciences

STAT600 Probability
Semester 2 2021
STAT600: Probability
2021 Study Guide
This is a second year undergraduate course on probability offered by the School of Engineering,
Computer and Mathematical Sciences at AUT.

Prerequisites
To enrol in this course, you need to have successfully completed STAT500 (Applied Statistics),
MATH501 (Differential and Integral Calculus) and COMP500 or ENSE501 (Programming 1)
or equivalent. If you have not completed these papers, you should contact the lecturer.

Lecturer
Dr Robin Hankin, WT Level 1, ext. 5106, email: robin.hankin@aut.ac.nz

Class Times
Lecture: Monday 4:10pm - 6:00pm, WB327
Lab: Wednesday 4:10pm - 6:00pm, WZ519

Office Hours
Students are very welcome to discuss questions and issues regarding the course with their
lecturer. Office hours will be posted on Blackboard.

Learning Hours
STAT600 is a 15 point paper and this corresponds to 150 learning hours across the semester.

Learning Activity    Hours
Lectures             24
Labs                 24
Self-directed        102
Total                150

Self-directed learning includes reading the textbook, revising lecture notes and lab exercises,
practising exercises, and completing assessments.

Course Outline
Week Topic
1–2 Introduction to Probability
3–4 Discrete Random Variables
5–6 Continuous Random Variables
7 Reliability
Mid semester break
8 – 10 Markov Chains
11 Further Properties of Random Variables
12 Revision

STAT600: Probability 2021 Study Guide Page 2

Assessment
Assessment     Weighting   Hand Out   Hand In*
Assignment 1   25%         week 4     week 7
Assignment 2   25%         week 8     week 11
Exam           50%         N/A        Exam period
Total          100%
*Exact due dates will be confirmed when assignments are handed out.
Late Assignments: Late assignments without an approved extension will be subject to a
deduction of 5% (one grade, e.g. from C+ to C) of the total mark available for each 24-hour
period, or part thereof, up to a maximum of five calendar days. Assignments over five days late
will not normally be accepted or marked and students will receive a DNC (Did Not Complete)
for that assessment. Note: this policy does not apply to quizzes, the mid-semester test or the
exam.
Extenuating Circumstances: You may apply for special consideration for assessment events
when exceptional circumstances beyond your control, including illness or injury, seriously
affect your physical or mental/emotional ability to: attempt an assessment, or prepare for an
assessment, or perform successfully during an assessment, or complete an assessment on or
by the due date. To apply for special consideration you should complete the online special
consideration form via Blackboard.

Academic Integrity & Plagiarism: Students who are found to have plagiarised in this
paper will be treated very seriously and may be subject to academic disciplinary procedures.
For more information see the AUT Calendar and the faculty assessment policies and regulations
(available on Blackboard under “Assessment”). Students should complete the “Academic
Integrity” module on Blackboard before submitting their first assignment.

Blackboard:
Students are encouraged to check the course website regularly on AUT Online at http://autonline.aut.ac.nz/.
This website contains class announcements, discussion forums, assignment information, class
resources, and updated class marks.

Software
This paper uses the open-source programming language R, which is installed in the School's
computer labs. We will also introduce you to RStudio, which provides a convenient interface
for working with R. Both are available free of charge.

http://cran.r-project.org/
https://www.rstudio.com/products/rstudio-desktop/

Reference books
Readings will be selected from books such as:
• Higgins, J.J. and Keller-McNulty, S. (1995). Concepts in Probability and Stochastic
Modelling. Belmont, CA: Wadsworth.
• Ross, S. (2014). A First Course in Probability (9th ed.). Harlow: Pearson Education.
• Ross, S. (2014). Introduction to Probability Models (11th ed.). Boston, MA: Academic Press.
• Scheaffer, R.L. and Young, L.J. (2010). Introduction to Probability and Its Applications
(3rd ed.). Boston: Cengage Learning.
Contents

Study Guide 2

Contents 4

1 Introduction to Probability 10
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Why Study Probability? . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Deterministic Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Applications of Probability . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Sample Space & Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Sample Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Events of an Experiment . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Set Operators & Notation . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Foundations of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.1 Axioms of Probability . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.2 Probability of Events . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.3 Equally Likely Outcomes . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Counting Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.1 Basic Principle of Counting . . . . . . . . . . . . . . . . . . . . . 21
1.5.2 Generalised Basic Principle of Counting . . . . . . . . . . . . . . 21
1.5.3 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.4 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.5.5 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5.6 Summary of Counting Rules . . . . . . . . . . . . . . . . . . . . . 29
1.5.7 Application to Probability - Examples . . . . . . . . . . . . . . . 29

2 Conditional Probability and Independence 31


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2.1 Conditional Probability Definition . . . . . . . . . . . . . . . . . . 34
2.2.2 Conditioning on an Event . . . . . . . . . . . . . . . . . . . . . . 35
2.2.3 Types of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.4 Is Conditional Probability a Probability? . . . . . . . . . . . . . . 40
2.3 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.1 Independent Events . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.3.2 Multiplicative Rule of Probability . . . . . . . . . . . . . . . . . . 43
2.4 Bayes’ Theorem and the Law of Total Probability . . . . . . . . . . . . . 43
2.4.1 Law of Total Probability . . . . . . . . . . . . . . . . . . . . . . . 43
2.4.2 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5 Odds, Odds Ratios and Relative Risk . . . . . . . . . . . . . . . . . . . . 45

3 Discrete Random Variables 46


3.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2.1 Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . 48
3.2.2 Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3 Expected Value and Moments . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.1 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.2 Expected Value of a Function . . . . . . . . . . . . . . . . . . . . 54
3.3.3 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . 55
3.3.4 Functions of a Random Variable . . . . . . . . . . . . . . . . . . . 57
3.3.5 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.6 Example: Production Line . . . . . . . . . . . . . . . . . . . . . . 60

4 Discrete Distributions 64
4.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Bernoulli Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.2 Binomial Distribution in R . . . . . . . . . . . . . . . . . . . . . . 67
4.3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.2 Geometric Distribution in R . . . . . . . . . . . . . . . . . . . . . 69
4.4.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.4 Properties of the Geometric Distribution . . . . . . . . . . . . . . 74
4.4.5 Alternative Parameterization . . . . . . . . . . . . . . . . . . . . 75
4.5 Negative Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.2 Negative Binomial Distribution in R . . . . . . . . . . . . . . . . 76
4.5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5.4 Properties of the Negative Binomial Distribution . . . . . . . . . 78
4.5.5 Alternative Parameterization . . . . . . . . . . . . . . . . . . . . 78
4.6 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6.2 Poisson Distribution in R . . . . . . . . . . . . . . . . . . . . . . 79
4.6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.6.4 Properties of the Poisson Distribution . . . . . . . . . . . . . . . . 82
4.7 Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.7.2 Hypergeometric Distribution in R . . . . . . . . . . . . . . . . . . 87
4.7.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.8 Distributions in R: Summary . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.9.1 Simulating a Discrete Distribution . . . . . . . . . . . . . . . . . . 90
4.10 Activity: Monty Hall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5 Continuous Random Variables 94


5.1 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1.1 Relative Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.1.2 Probability Density Function . . . . . . . . . . . . . . . . . . . . 97
5.1.3 Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1.4 Properties of Continuous Random Variables . . . . . . . . . . . . 101
5.2 Expected Value and Variance . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.1 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.3 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2.4 Expected Value of a Function . . . . . . . . . . . . . . . . . . . . 105

6 Continuous Distributions 110


6.1 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.1.2 Uniform Distribution in R . . . . . . . . . . . . . . . . . . . . . . 112
6.1.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.1.4 Properties of the Uniform Distribution . . . . . . . . . . . . . . . 113
6.2 Exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.2 Exponential Distribution in R . . . . . . . . . . . . . . . . . . . . 115
6.2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3 Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.3.2 Normal Distribution in R . . . . . . . . . . . . . . . . . . . . . . 120
6.3.3 Relationship between Normal and Standard Normal . . . . . . . . 120
6.3.4 Properties of the Normal Distribution . . . . . . . . . . . . . . . . 122
6.3.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.4 Gamma distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.1 Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.2 Definition - Gamma Distribution . . . . . . . . . . . . . . . . . . 126
6.4.3 Gamma Distribution in R . . . . . . . . . . . . . . . . . . . . . . 127
6.4.4 Properties of the Gamma Distribution . . . . . . . . . . . . . . . 127
6.4.5 Alternative Parameterization . . . . . . . . . . . . . . . . . . . . 133
6.5 Weibull distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5.2 Weibull Distribution in R . . . . . . . . . . . . . . . . . . . . . . 136
6.5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6.5.4 Properties of the Weibull Distribution . . . . . . . . . . . . . . . 138
6.6 Beta distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.6.2 Beta Distribution in R . . . . . . . . . . . . . . . . . . . . . . . . 141
6.6.3 Properties of the Beta distribution . . . . . . . . . . . . . . . . . 141
6.6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.7 Distributions in R: Summary . . . . . . . . . . . . . . . . . . . . . . . . 143
6.8 Simulating Continuous Distributions . . . . . . . . . . . . . . . . . . . . 144
6.8.1 Inverse transformation method . . . . . . . . . . . . . . . . . . . 144
6.8.2 Rejection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

7 Reliability: An Application of Continuous Distributions 146
7.1 Introduction to Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.1.1 Reliability Function . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.2 Mean Time to Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.2 Integration: Recap . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.2.3 Mean Time to Failure for Common Distributions . . . . . . . . . 149
7.2.4 Repairable vs Non Repairable Systems . . . . . . . . . . . . . . . 150
7.3 Modelling Reliability of Systems . . . . . . . . . . . . . . . . . . . . . . . 151
7.3.1 Series Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.3.2 Parallel Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.3.3 Complex Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

8 Introduction to Markov Chains 164


8.1 A Simple Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.1.1 Game Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.1.2 Game Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.1.3 Playing the Game . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.1.4 Probability of Outcomes . . . . . . . . . . . . . . . . . . . . . . . 167
8.2 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.2.1 Introduction to Stochastic Processes . . . . . . . . . . . . . . . . 168
8.2.2 Classification of Stochastic Processes . . . . . . . . . . . . . . . . 168
8.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.3.2 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.3.3 Matrix Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.4 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.4.1 Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.4.2 Markov Chains: Definition . . . . . . . . . . . . . . . . . . . . . . 176
8.4.3 Applications of Markov Chains . . . . . . . . . . . . . . . . . . . 177
8.4.4 Transition Probabilities . . . . . . . . . . . . . . . . . . . . . . . . 177
8.4.5 n-Step Transition Probabilities . . . . . . . . . . . . . . . . . . . 181
8.4.6 Chapman-Kolmogorov Equations . . . . . . . . . . . . . . . . . . 182

9 Classification of Markov Chains 187
9.1 State Transition Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.2 Computing Probabilities for Markov Chains . . . . . . . . . . . . . . . . 189
9.2.1 Initial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.3 Classification of States . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.3.1 Accessibility, Communication and Irreducibility . . . . . . . . . . 195
9.3.2 Absorbing, Transient, Recurrent States . . . . . . . . . . . . . . . 197
9.3.3 Periodic States and Ergodic Chains . . . . . . . . . . . . . . . . . 199
9.4 Steady State Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.4.2 Finding the Steady State Probabilities . . . . . . . . . . . . . . . 200
9.4.3 Application of state probabilities . . . . . . . . . . . . . . . . . . 202
9.4.4 Using linear algebra to find steady state probabilities . . . . . . . 203
9.4.5 Using R to find steady state probabilities . . . . . . . . . . . . . . 203
9.5 Simulating a Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.5.1 Simulating a Markov Chain By Hand . . . . . . . . . . . . . . . . 205
9.5.2 Simulating a Markov Chain Using R . . . . . . . . . . . . . . . . 207
9.5.3 Estimating a transition matrix . . . . . . . . . . . . . . . . . . . . 211

10 Properties and Applications of Markov Chains 215


10.1 Example: Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
10.2 Mean First Passage Times . . . . . . . . . . . . . . . . . . . . . . . . . . 218
10.3 Absorbing Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
10.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
10.3.2 Classification of States . . . . . . . . . . . . . . . . . . . . . . . . 220
10.3.3 Properties of Absorbing Chains . . . . . . . . . . . . . . . . . . . 222
10.3.4 Simulating an Absorbing Markov Chain Using R . . . . . . . . . . 229
10.4 Applications of Markov Chains . . . . . . . . . . . . . . . . . . . . . . . 232
10.4.1 Simple Random Walk . . . . . . . . . . . . . . . . . . . . . . . . . 232
10.4.2 Inventory Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

11 Further Properties of Random Variables 235


11.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.1.1 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.1.2 Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . 237
11.2 Sum of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11.2.1 Sum of Independent Random Variables . . . . . . . . . . . . . . . 240
11.2.2 Sum of Independent Normal Random Variables . . . . . . . . . . 241
11.3 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
11.3.1 Central Limit Theorem for Means and Sums . . . . . . . . . . . . 244
11.3.2 Normal Approximation of Binomial Distribution . . . . . . . . . . 245
11.4 CLT and Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
11.5 Bivariate distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
11.5.1 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . 259
11.5.2 Continuous random variables . . . . . . . . . . . . . . . . . . . . 263

A Useful Formula 264
A.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

B Useful R commands 270


B.1 Opening R on SECMS computers . . . . . . . . . . . . . . . . . . . . . . 270
B.2 Using R as a Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
B.3 Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B.4 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B.5 Logical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B.6 In-built datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
B.7 Creating data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
B.8 Statistical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
B.9 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
B.10 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 276
B.11 Creating plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
B.12 Control Statements & Functions . . . . . . . . . . . . . . . . . . . . . . . 278
B.13 Reading and Writing Data . . . . . . . . . . . . . . . . . . . . . . . . . 280

C Solutions to Selected Examples 281

Bibliography 285

Notes compiled by Dr Sarah Marshall: July 1, 2021

Chapter 1

Introduction to Probability

References:
[Ros13, Chapters 1 & 2]; [SY10, Chapters 1 & 2]

1.1 Introduction
Exercise: Birthdays
Does anyone in this class have the same birthday?
What is the probability that 2 people in this class have the same birthday?
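This question can be explored numerically in R, the language used throughout this paper. The sketch below is illustrative only: the class size of 30 and the assumption of 365 equally likely birthdays are assumed values, not facts about this class.

```r
# Probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays and ignoring leap years.
# The class size n = 30 is an assumed value for illustration.
n <- 30
p_all_distinct <- prod((365 - 0:(n - 1)) / 365)  # P(no shared birthday)
p_shared <- 1 - p_all_distinct                   # P(at least one shared)
round(p_shared, 3)  # about 0.706
```

With 30 people the probability is roughly 0.71, which surprises most students.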

1.2 Why Study Probability?

Discussion What decisions have you made today? What factors did you take
into account when making these decisions?

We live in an Information Society – we must use the information presented to us to make
intelligent decisions.

• Theory vs Reality

• “All models are wrong, but some are useful” (George Box, 1979)

• Deterministic Models vs Probabilistic Models

1.2.1 Deterministic Models


Example 1.1 Lake Yeak Laom, Cambodia has a diameter of 0.72 km. What is the
surface area of this lake?

1.2.2 Probabilistic Models
Example 1.2 Suppose we are going to flip a coin. What will the outcome be?
We do not know for certain, but we can make a statement such as: in the long run, half of
the outcomes will be heads.
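This long-run interpretation can be checked by simulation; the following R sketch (not part of the worked example) flips a simulated fair coin 10,000 times:

```r
# Simulating repeated coin flips: the proportion of heads settles
# near 1/2 as the number of flips grows.
set.seed(1)  # for reproducibility; the seed value is an arbitrary choice
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)
mean(flips == "H")  # close to 0.5
```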

Example 1.3 How many bottles of milk will Newsfeed Cafe use today?
We do not know for certain, but with some data we can make similar statements regarding
the probability of particular outcomes.

1.2.3 Applications of Probability


• Credit Risk Assessment – should person X be given a loan?

• Maintenance Planning – should a part be replaced routinely or when it fails?

• Population Modelling – will a population become extinct?

• Disease Control – what will the impact be of an outbreak of a pandemic disease?

• Customer Service – how long will it take to serve a customer?

• Marketing – will NetFlix customer X watch “Designated Survivor” given that they
have watched “Suits”?

Two uses of probability models

• Use a given probabilistic model to gain insight

• Determine whether a given probabilistic model is correct, based on data (Statistical
Inference)

1.3 Sample Space & Events


1.3.1 Sample Space

 An experiment is any process that produces an observation or outcome.
The set of all possible outcomes of an experiment is called the sample space.
[Ros13, p 146]

 The notation n(A) denotes the cardinality of a set A, i.e., the number of
outcomes in A.

Example 1.4
Experiment                       Sample Space S                                       Cardinality n(S)
Rolling a die                    S = {1, 2, 3, 4, 5, 6}                               6
Rolling two dice                 S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}               36
Race position of 7 horses        S = {the 7! permutations of (1, 2, 3, 4, 5, 6, 7)}   7! = 5040
Lifetime of battery (in hours)   S = {x : 0 < x < ∞}                                  ∞

1.3.2 Events of an Experiment

 Any subset E of the sample space is known as an event. [Ros13, p 24]

An event can also be described as a set of outcomes of an experiment. Events are usually
denoted by uppercase letters: A, B, C . . . An event A occurs whenever the outcome of an
experiment is contained in A.

Example 1.5 Suppose we define an experiment which consists of rolling a six-sided die.
We can define the following events:
A : Even outcomes A = {2, 4, 6} n(A) = 3
B : Outcomes less than or equal to 3, B = {1, 2, 3} n(B) = 3
C : Outcome is greater than 8, C=∅ n(C) = 0
Event C cannot happen and is referred to as the null event or empty set and is denoted
by ∅.

1.3.3 Set Operators & Notation


Events and sample spaces are sets, therefore the following set notation and related
concepts will be useful.

 The notation a ∈ A means that a is an element in set A and similarly, the
notation b ∉ A means that b is not an element in set A.

To explain the following techniques, we will consider three examples.

Example 1.6
Experiment: Roll a 6-sided die and observe the number on the uppermost face.
The sample space is S = {1, 2, 3, 4, 5, 6}.

Define the following sets:
A : Even outcomes A = {2, 4, 6}
B : Outcomes less than or equal to 3, B = {1, 2, 3}
C : Outcomes containing 4 or 6, C = {4, 6}

Example 1.7
Experiment: Flip two coins and observe the outcomes.
The sample space is S = {(H, H), (H, T ), (T, H), (T, T )}.
Define the following sets:
D : Outcomes with a head first D=
E : Outcomes with exactly one tail, E =

Example 1.8
Experiment: Roll a 6-sided die and observe the number on the uppermost face and then
flip a coin and observe the outcome.
The sample space is S = {(1, H), (2, H), (3, H), (4, H), (5, H), (6, H),
(1, T ), (2, T ), (3, T ), (4, T ), (5, T ), (6, T )}.
Define the following sets:
G : Outcomes which include an even number G =
H : Outcomes which include exactly one tail, H =
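Sample spaces like these can be enumerated in R with `expand.grid`; the snippet below is an illustrative sketch for the experiment of Example 1.8:

```r
# Enumerate the sample space of Example 1.8: roll a die, then flip a coin
S <- expand.grid(die = 1:6, coin = c("H", "T"))
nrow(S)  # n(S) = 12, matching the 12 outcomes listed above
```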

 The union of two events A and B, denoted A ∪ B, is the set of outcomes
that are in A or B or both. Equivalently, A ∪ B = {s ∈ S : s ∈ A or s ∈ B}.

Example 1.9

• A ∪ B = {1, 2, 3, 4, 6}

• D ∪ E = {(H, H), (H, T ), (T, H)}

• G∪H =

 The complement of event A, denoted Ac (sometimes Ā or A′), is the
set consisting of all outcomes in the sample space which are not in A.
Equivalently, Ac = {s ∈ S : s ∉ A}.

Example 1.10

• Ac = {1, 3, 5}

• Dc = {(T, H), (T, T )}

• Gc =

 The intersection of two events A and B, denoted A ∩ B or AB, is the set
of outcomes that are in both A and B. Equivalently,
A ∩ B = {s ∈ S : s ∈ A and s ∈ B}.

Example 1.11

• A ∩ B = AB = {2}

• D ∩ E = DE = {(H, T )}

• G∩H =

 If two events are disjoint or mutually exclusive, i.e., they have no elements
in common, then their intersection is called the empty set or null event and is
denoted with the symbol ∅.

Since an event A and its complement Ac are mutually exclusive, A ∩ Ac = ∅.


Example 1.12

• D = {(H, H), (H, T )}

• F = {(T, T )}

• D ∩ F = DF = ∅

When there are more than two sets, the union and intersections can be defined as follows:


 If E1 , E2 , . . . are events, then the union of these events, denoted ⋃n≥1 En ,
is defined to be the event that consists of all outcomes that are in En for at
least one value of n = 1, 2, . . ..

If E1 , E2 , . . . are events, then the intersection of these events, denoted ⋂n≥1 En ,
is defined to be the event that consists of all outcomes that are in all of the
events En for n = 1, 2, . . ..

[Ros13, p 25]

 The set operators (union, intersection and complement) have the following
properties.
Commutative laws: A ∩ B = B ∩ A
A ∪ B = B ∪ A
Associative laws: (A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∪ C = A ∪ (B ∪ C)
Distributive laws: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
De Morgan’s laws: (A ∪ B)c = Ac ∩ Bc
(A ∩ B)c = Ac ∪ Bc
General case: (E1 ∪ E2 ∪ · · · ∪ En)c = E1c ∩ E2c ∩ · · · ∩ Enc
(E1 ∩ E2 ∩ · · · ∩ En)c = E1c ∪ E2c ∪ · · · ∪ Enc

Example 1.13

• A ∩ B = {2}

• A ∩ C = {4, 6}

• A ∪ B = {1, 2, 3, 4, 6}

• B ∪ C = {1, 2, 3, 4, 6}

Distributive Law

• A ∩ (B ∪ C) = {2, 4, 6}

• (A ∩ B) ∪ (A ∩ C) = {2, 4, 6}

De Morgan’s Law

• (A ∪ B)c = Ac ∩ Bc = {5}

• A ∪ B = {1, 2, 3, 4, 6}

• Ac = {1, 3, 5}

• Bc = {4, 5, 6}
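These identities are easy to check numerically. A minimal R sketch for the sets of Example 1.6, using the built-in `union`, `intersect` and `setdiff` functions:

```r
# Numerical check of De Morgan's law (A ∪ B)c = Ac ∩ Bc
# for the die-rolling sets of Example 1.6
S <- 1:6
A <- c(2, 4, 6)
B <- c(1, 2, 3)
lhs <- setdiff(S, union(A, B))                  # complement of A ∪ B
rhs <- intersect(setdiff(S, A), setdiff(S, B))  # Ac ∩ Bc
identical(lhs, rhs)  # TRUE; both sides equal {5}
```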

Example 1.14 The primary power supply to a hospital is provided by the national grid.
If the primary power supply fails (i.e. there is a power cut), then there are two back-up
generators which can provide power to the hospital. Periodically, all three power systems
are tested, and each could be found to be operational (o) or not operational (n).
The sample space for this experiment is:

S = {ooo, oon, ono, onn, noo, non, nno, nnn}

(Adapted from Milton & Young, 2003, p. 6)

Define the following events:


A : primary system is operational, A=
B : first generator is operational, B=
C : second generator is operational, C =

Find the following:

• Primary or first generator are operational, A ∪ B =

• Primary and first generator are operational, A ∩ B =

• Primary or first generator are operational, but second is not, (A ∪ B) ∩ C c =

• At least one of the systems is operational, A ∪ B ∪ C =

1.4 Foundations of Probability
1.4.1 Axioms of Probability

History. Who is Kolmogorov?

Andrey Kolmogorov (1903–1987) was a Russian mathematician. Over his career
he made a number of important contributions to probability as well as to other
areas of mathematics.

Formally, probability must satisfy the following axioms. These axioms were defined by
Kolmogorov so are sometimes referred to as “Kolmogorov’s Axioms”.

 Consider an experiment with sample space S. A probability is a numerically
valued function that assigns a number P (E) to every event E so that the
following axioms hold:

1. 0 ≤ P (E) ≤ 1

2. P (S) = 1

3. For any sequence of mutually exclusive events E1 , E2 , . . . (i.e. events for
which Ei ∩ Ej = ∅ when i ≠ j):

P (E1 ∪ E2 ∪ · · · ) = P (E1 ) + P (E2 ) + · · ·

([Ros14, p28]; [SY10, p23])
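For a finite sample space, the axioms can be checked directly on the elementary probabilities. A minimal R sketch for a fair die, where the assignment P({i}) = 1/6 is assumed:

```r
# Checking Kolmogorov's axioms for the fair-die assignment P({i}) = 1/6
p <- rep(1/6, 6)                  # probability of each elementary outcome
all(p >= 0 & p <= 1)              # elementary probabilities lie in [0, 1] (axiom 1)
isTRUE(all.equal(sum(p), 1))      # axiom 2: P(S) = 1, up to floating point
```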

1.4.2 Probability of Events

 Many useful properties are provided by the three axioms of probability. A
few are stated below.

• P (∅) = 0

• 1 = P (S) = P (E ∪ Ec ) = P (E) + P (Ec ), or equivalently, P (Ec ) = 1 − P (E)

• If E ⊂ F , then P (E) ≤ P (F )

• The inclusion–exclusion principle is stated below for 2 and 3 sets, but
can be extended to n sets:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C)

Example 1.15
Experiment: Roll a 6-sided die and observe the number on the uppermost face.
The sample space is S = {1, 2, 3, 4, 5, 6}. Consider the following events, A, B, C.
A : Even outcomes A = {2, 4, 6}
B : Outcomes less than or equal to 3, B = {1, 2, 3}
C : Outcome is greater than 8, C=∅

Assuming that each side of the die is equally likely, then P ({1}) = P ({2}) = P ({3}) =
P ({4}) = P ({5}) = P ({6}) = 1/6

• P (A) = P ({2, 4, 6}) = P ({2}) + P ({4}) + P ({6}) = 1/2

• P (B) = P ({1, 2, 3}) = P ({1}) + P ({2}) + P ({3}) = 1/2

• P (C) = 0

• P (A ∩ B) = P (AB) = P ({2}) = 1/6

• P (A ∪ B) = P (A) + P (B) − P (AB) = 1/2 + 1/2 − 1/6 = 5/6
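The inclusion–exclusion calculation above can be verified in R; this sketch assumes the same fair-die sets A and B:

```r
# Verifying P(A ∪ B) = P(A) + P(B) - P(A ∩ B) for the fair die
p <- rep(1/6, 6)   # P({1}), ..., P({6})
A <- c(2, 4, 6)
B <- c(1, 2, 3)
direct <- sum(p[union(A, B)])                                 # P(A ∪ B)
incl_excl <- sum(p[A]) + sum(p[B]) - sum(p[intersect(A, B)])
isTRUE(all.equal(direct, incl_excl))  # TRUE; both equal 5/6
```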

Example 1.16 A farmer has decided to plant a new variety of corn on some of his land,
and he has narrowed his choice to one of three varieties, which are numbered 1, 2, and
3. All three varieties have produced good yields in variety trials. Which corn variety

produces the greatest yield depends on the weather. The optimal conditions for each are
equally likely to occur, and none does poorly when the weather is not optimal.
Being unable to choose, the farmer writes the name of each variety on a piece of paper,
mixes the pieces, and blindly selects one. The variety that is selected is purchased and
planted. Let Ei denote the event that variety i is selected (i = 1, 2, 3), let A denote the
event that variety 2 or 3 is selected, and let B denote the event that variety 3 is not
selected. [SY10, p 26]
Find the probabilities of Ei , A, and B.

1.4.3 Equally Likely Outcomes


The outcomes of an experiment are equally likely if each outcome has the same probability
of occurring. When outcomes are equally likely and the sample space is finite, the
probability of event A can be defined:

P (A) = (number of outcomes in A) / (total number of outcomes in the sample space S) = n(A)/n(S)
Example 1.17 Experiment: Roll a 6-sided die and observe the number on the uppermost
face.The sample space is S = {1, 2, 3, 4, 5, 6}. Suppose Event A consists of all even
outcomes, i.e. A = {2, 4, 6}. Then

P (A) = n(A)/n(S) = 3/6 = 1/2

Example 1.18 Consider a class of 20 university students. Students elect one subject
to major in. Of the 20 students, 10 are studying analytics, 6 are studying maths and 4
are studying engineering. A student is chosen at random from this class. What is the
probability that the student is studying engineering?

This formula is fundamental in probability. In some cases counting the number of


outcomes associated with a particular event cannot be performed by simply listing the
outcomes and counting them. In the next section we will examine some “counting rules”.

1.5 Counting Rules
1.5.1 Basic Principle of Counting

 The basic principle of counting states that if experiment 1 has n1


outcomes and experiment 2 has n2 outcomes, then together there are n1 n2
outcomes of the two experiments.

Example 1.19 Suppose a retail chain has decided to build one new store in the North
Island and one in the South Island. There are 4 possible locations for the North Island
store (Auckland, Tauranga, Hamilton, Wellington) and 2 possible locations for the South
Island store (Christchurch, Dunedin). How many possible locations are there for the new
stores?

1.5.2 Generalised Basic Principle of Counting

 The generalised principle of counting states that if experiment 1 has n1
outcomes, experiment 2 has n2 outcomes, . . . , and experiment r has nr outcomes,
then together there are n1 · n2 · · · nr outcomes of the r experiments.

Example 1.20 Suppose a committee has 2 representatives from Engineering, 3 from


Applied Mathematics, 5 from Analytics and 4 from Computer Science. A subcommittee
must be formed consisting of one representative from each department. How many
different subcommittees are possible?

1.5.3 Permutations

 The number of ordered arrangements or permutations of n distinct objects


is:
n! = n(n − 1)(n − 2) · · · 3 · 2 · 1

Example 1.21 Consider the letters a, b, c. How many different ordered arrangements
of these letters are possible?

 The number of ordered arrangements or permutations of r objects selected


from n distinct objects (r ≤ n) is:

P^n_r = n(n − 1) · · · (n − r + 1) = n!/(n − r)!

Example 1.22 How many distinct two letter arrangements can be made from the letters
a, b, c, d, e, if each letter can only be selected once?

Using R to Generate Permutations


R can be used to generate and to count permutations.

# Compute number of permutations


factorial(5)/factorial(5-2)

## [1] 20

choose(5, 2)*factorial(2)

## [1] 20

R Demo

# Install package if not already installed and load
if(!("gtools" %in% installed.packages()[,1]))install.packages("gtools")

library(gtools)

# Define list of objects


obj <- letters[1:5]
obj

## [1] "a" "b" "c" "d" "e"

# Generate list of permutations


obj.perms <- permutations(5, 2, v=obj, set=TRUE, repeats.allowed=FALSE)
dim(obj.perms)

## [1] 20 2

obj.perms

## [,1] [,2]
## [1,] "a" "b"
## [2,] "a" "c"
## [3,] "a" "d"
## [4,] "a" "e"
## [5,] "b" "a"
## [6,] "b" "c"
## [7,] "b" "d"
## [8,] "b" "e"
## [9,] "c" "a"
## [10,] "c" "b"
## [11,] "c" "d"
## [12,] "c" "e"
## [13,] "d" "a"
## [14,] "d" "b"
## [15,] "d" "c"
## [16,] "d" "e"
## [17,] "e" "a"
## [18,] "e" "b"
## [19,] "e" "c"
## [20,] "e" "d"

1.5.4 Combinations

 The number of distinct subsets or combinations of size r that can be selected


from n distinct objects (r ≤ n) is:

C^n_r = n!/((n − r)! r!) = n(n − 1) · · · (n − r + 1)/r!

The expression C^n_r can be read as “n choose r”. Notice that C^n_r = P^n_r /r!.
In R, the function choose(n, r) can be used to compute C^n_r .
Example 1.23 How many groups of two letters can be selected from a, b, c, d, e, if each
letter can only be selected once?

Using R to Generate Combinations


R can be used to generate and to count combinations.
# Compute number of combinations
choose(5, 2)

## [1] 10

R demo
# Install package if not already installed and load
# if(!("gtools" %in% installed.packages()[,1]))install.packages("gtools")

library(gtools)

# Define list of objects


(obj <- letters[1:5])

## [1] "a" "b" "c" "d" "e"

# Generate list of combinations
(obj.combins <- combinations(5, 2, v=obj, set=TRUE, repeats.allowed=FALSE))

## [,1] [,2]
## [1,] "a" "b"
## [2,] "a" "c"
## [3,] "a" "d"
## [4,] "a" "e"
## [5,] "b" "c"
## [6,] "b" "d"
## [7,] "b" "e"
## [8,] "c" "d"
## [9,] "c" "e"
## [10,] "d" "e"

dim(obj.combins)

## [1] 10 2

Example 1.24 The Mathematical Sciences Department has 15 academic staff members.
A group of 4 must be selected to help with the upcoming open day. How many different
groups of staff could be selected for this task?

Combinations with Replacement

 The number of ways of making r selections from n objects when selection is


made with replacement and order is not important is:

C(n + r − 1, r) = (n + r − 1)!/((n − 1)! r!)
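A minimal check of this count in R; the values n = 5 and r = 2 are illustrative choices, not taken from any example in these notes.

```r
# Number of ways of selecting r objects from n with replacement,
# when order is not important: C(n + r - 1, r)
n <- 5
r <- 2
choose(n + r - 1, r)
## [1] 15
```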

Example 1.25 An entomologist is studying the spatial distribution of insects across
plants. Suppose in a sample of 5 plants, 3 insects have been found. The locations of
these 3 insects across the 5 plants are not known. A plant could support any number of insects
so could have 0, 1, 2 or 3 of the insects. How many different arrangements of the insects
across the plants are possible, if:

a. the insects are distinguishable?

b. the insects are indistinguishable?

(Adapted from [SY10, p 48])

1.5.5 Partitions

 The number of ways of partitioning n distinct objects into k groups containing


n1 , n2 , . . . , nk objects respectively is:
n!/(n1 ! n2 ! · · · nk !),  where n = n1 + n2 + · · · + nk

This result can also be explained as the number of different permutations of n objects,
of which n1 are alike, n2 are alike, . . ., nk are alike.
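This count (the multinomial coefficient) is easily computed in R; the group sizes below are arbitrary choices for illustration.

```r
# Number of ways to partition n = 7 distinct objects into groups of
# sizes 2, 2 and 3: n! / (n1! n2! n3!)
sizes <- c(2, 2, 3)
factorial(sum(sizes)) / prod(factorial(sizes))
## [1] 210
```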

Example 1.26 Suppose 10 employees are to be divided among three teams, with 3 going
to team I, 4 going to team II and 3 going to team III. In how many ways can the team
assignments be made? (Adapted from [SY10, p 40])

Example 1.27 How many different arrangements can be formed from the letters:
PEPPER?

1.5.6 Summary of Counting Rules
The number of ways of selecting r items from n, r ≤ n.

                       Order Is Important         Order Is Not Important
With Replacement       n^r                        C(n + r − 1, r)
Without Replacement    P^n_r = n!/(n − r)!        C^n_r = n!/((n − r)! r!)

1.5.7 Application to Probability - Examples


Example 1.28
Suppose that there are a group of 10 employees. This year employees will receive a bonus:
3 will receive $0, 4 will receive $1000, and 3 will receive $5000. (Adapted from [SY10, p
41])

a. How many ways can the bonuses be allocated?

b. Suppose that within this group there are 7 men and 3 women. All 3 women were
awarded a bonus of $5000. If the bonuses are awarded randomly, what is the proba-
bility that this would happen?

Example 1.29 An import company has received a delivery of 20 products. It will check
the quality of the delivery by inspecting three products. The inspection process involves
destroying the product. The delivery will only be accepted if all three products are non-
defective. Suppose that there are 5 defective items within the delivery of 20. What is the
probability that the delivery will be accepted?

STAT600 Page 29
Example 1.30 Birthday
What is the probability that two people in this class have the same birthday?
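The class size is left open in the question; as a sketch, assume n = 30 people and 365 equally likely birthdays. The probability can be computed directly, or with pbirthday from R's stats package.

```r
# P(at least two of n people share a birthday), assuming n = 30 and
# 365 equally likely birthdays
n <- 30
1 - prod((365 - 0:(n - 1)) / 365)  # complement of "all birthdays differ"
pbirthday(n)                       # built-in equivalent (stats package)
```

Both lines give roughly 0.71, which many people find surprisingly large for a group of 30.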

Chapter 2

Conditional Probability and


Independence

References:
[Ros13, chapter 3] [SY10, Chapter 3]

2.1 Introduction
Example 2.1 The Titanic was a large luxury ocean liner that was declared to be an
“unsinkable ship.” During its maiden voyage across the Atlantic Ocean, it hit an iceberg
and sank on April 14, 1912. Large numbers of people lost their lives. The economic status
of the passengers has been roughly grouped according to whether they were travelling
first class, second class, or third class. The crew has been reported separately. Although
the exact numbers are still a matter of debate, one report (Dawson, 1995) of the numbers
of those who did and did not survive, by economic status and gender is displayed in the
table below. [SY10, p 87]

                   Number of Passengers    Number of Deaths
Economic Status    Male       Female       Male       Female
First Class        180        145          118        4
Second Class       179        106          154        13
Third Class        510        196          422        106
Crew               862        23           670        3

Discussion Was a passenger’s chance of survival dependent on their gender and


economic status?
Compute:

P (passenger died | female and first class )

P (passenger died | male and third class )

This data is available in R.

## Higher survival rates in females?


apply(Titanic, c("Sex", "Survived"), sum)

## Survived
## Sex No Yes
## Male 1364 367
## Female 126 344

## Higher survival rates in children?


apply(Titanic, c("Age", "Survived"), sum)

## Survived
## Age No Yes
## Child 52 57
## Adult 1438 654

Use ?Titanic to get more information about the data. This dataset can be printed in
different ways depending on the question of interest.

#Class vs Gender
apply(Titanic, c("Class", "Sex"), sum)

## Sex
## Class Male Female
## 1st 180 145
## 2nd 179 106
## 3rd 510 196
## Crew 862 23

#Class vs Gender vs Survived


apply(Titanic, c("Class", "Sex", "Survived"), sum)

## , , Survived = No
##
## Sex
## Class Male Female
## 1st 118 4
## 2nd 154 13
## 3rd 422 106
## Crew 670 3
##
## , , Survived = Yes
##
## Sex
## Class Male Female
## 1st 62 141
## 2nd 25 93
## 3rd 88 90
## Crew 192 20

2.2 Conditional Probability
2.2.1 Conditional Probability Definition

 For events A and B with P (B) ≠ 0, the conditional probability of
A given B is:

P (A|B) = P (AB)/P (B)

The vertical line | is read as “given”, so P (A|B) is the probability that event A has
occurred, given that event B has occurred.

To compute a conditional probability, B becomes the reduced sample space, and the
event of interest is the intersection of A and B.

This formula can be rearranged to find the intersection: P (A ∩ B) = P (A|B)P (B)

Example 2.2 A die is thrown and the number on the uppermost face is observed. If it
is known that the number is even, what is the probability that the number is a 2?
A = {2}
B = {2, 4, 6}
P (A|B) = n(A)/n(B) = 1/3.
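The idea that conditioning restricts the sample space to B can be seen in a short simulation; the sample size is an arbitrary choice.

```r
# Estimate P(A|B) for the die example by keeping only rolls where B
# occurred, then asking how often A occurred within that subset
set.seed(1)
rolls   <- sample(1:6, 1e5, replace = TRUE)
given_B <- rolls[rolls %in% c(2, 4, 6)]  # condition on B = {2, 4, 6}
mean(given_B == 2)                       # estimates P(A|B) = 1/3
```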

Example 2.3 Refer to the following table of Titanic data and the tables shown earlier.

classSurvival <- apply(Titanic, c("Class", "Survived"), sum)


addmargins(classSurvival)

## Survived
## Class No Yes Sum
## 1st 122 203 325
## 2nd 167 118 285
## 3rd 528 178 706
## Crew 673 212 885
## Sum 1490 711 2201

a. Given that a passenger was travelling in first class, what is the probability that he/she
survived?

b. Given that a passenger was male and travelling in third class, what is the probability
that he survived?

Example 2.4 There are four batteries and one is defective. Two are to be selected at
random for use on a particular day. Find the probability that the second battery selected
is not defective, given that the first was not defective. [SY10, ex3.2 p 60]

2.2.2 Conditioning on an Event

 Let E and F be two events, then

P (E) = P (E|F )P (F ) + P (E|F c )P (F c )
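A minimal numerical sketch of this formula; the probabilities below are made-up values for illustration, not taken from any example in these notes.

```r
# P(E) = P(E|F)P(F) + P(E|F^c)P(F^c), with illustrative values
p_F    <- 0.3   # P(F)
p_E_F  <- 0.9   # P(E | F)
p_E_Fc <- 0.2   # P(E | F^c)
(p_E <- p_E_F * p_F + p_E_Fc * (1 - p_F))
## [1] 0.41
```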

Example 2.5 It is known that 0.5% of the population have disease A. Of those that have
the disease, 99% will test positive for the disease. Of those that do not have the disease
2% test positive for the disease.

• Event D = a randomly chosen person has the disease

• Event E = a positive test

a. Display this information in a tree diagram.

b. What is the probability that a person chosen at random has the disease?

c. What is the probability that a person chosen at random received a positive test?

2.2.3 Types of Errors
                            True Diagnosis
                        Positive            Negative
Test Result  Positive   True Positive       False Positive
             Negative   False Negative      True Negative

Example 2.6 Previously we considered testing for Disease A. For this example, con-
struct a table to show the rates of true positive, true negative, false positive and false
negatives.
                            True Diagnosis
                        Positive    Negative    Sum
Test Result  Positive
             Negative
             Sum

 The sensitivity of a test is the probability that a person randomly selected


from among those who have the disease will have a positive test. If T P is the
number of people who have the disease and have a positive test, and F N is the
number of people who have the disease but have a negative test, then:
Sensitivity = TP/(TP + FN)

 The specificity of a test is the probability that a person randomly selected


from among those who do not have the disease will have a negative test. If F P is
the number of people who do not have the disease but have a positive test, and
T N is the number of people who do not have the disease and have a negative
test, then:
Specificity = TN/(FP + TN)

A “good” test should have sensitivity and specificity values close to 1.
A third measure which is important when considering a test is the predictive value.

 The predictive value of a test is the conditional probability that a randomly


selected person has the disease, given that he or she tested positive. If T P is the
number of people who have the disease and have a positive test, and F P is the
number of people who do not have the disease but have a positive test, then:
Predictive Value = TP/(TP + FP)

Ideally, sensitivity, specificity and predictive value should all be close to 1. However the
predictive value is strongly influenced by the prevalence of the disease.

 The prevalence of a disease is the probability that a randomly selected
person has the disease. If TP + FN is the number of people who have the
disease and FP + TN is the number of people who do not have the disease, then:

Prevalence = (TP + FN)/(TP + FN + FP + TN)
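The four measures can be wrapped in a small helper function. The function name screen_measures and the counts below are illustrative only, not taken from any example in these notes.

```r
# Compute the four screening measures from a 2x2 table of counts
screen_measures <- function(TP, FN, FP, TN) {
  c(sensitivity      = TP / (TP + FN),
    specificity      = TN / (FP + TN),
    predictive_value = TP / (TP + FP),
    prevalence       = (TP + FN) / (TP + FN + FP + TN))
}

# Illustrative counts: 100 diseased and 900 healthy people
screen_measures(TP = 80, FN = 20, FP = 30, TN = 870)
```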

Example 2.7 Previously we considered testing for Disease A. For this example, compute
the sensitivity, specificity and predictive value for the test.

The following example highlights the importance of considering prevalence of a disease
when interpreting screening tests.

Example 2.8 For the three scenarios below compute the sensitivity, specificity, preva-
lence and predictive value of the test.

Scenario 1:
                            True Diagnosis
                        Positive    Negative    Sum
Test Result  Positive   90          10          100
             Negative   10          90          100
             Sum        100         100         200

Scenario 2:
                            True Diagnosis
                        Positive    Negative    Sum
Test Result  Positive   90          100         190
             Negative   10          900         910
             Sum        100         1000        1100

Scenario 3:
                            True Diagnosis
                        Positive    Negative    Sum
Test Result  Positive   90          1000        1090
             Negative   10          9000        9010
             Sum        100         10000       10100

2.2.4 Is Conditional Probability a Probability?
We can assess P (A|B) = P (AB)/P (B) against the probability axioms to determine
whether or not it is a probability.

Since AB ⊂ B, then P (AB) ≤ P (B). We also have P (AB) ≥ 0. Therefore the first
axiom is satisfied, i.e.
0 ≤ P (A|B) = P (AB)/P (B) ≤ 1
The second axiom:

P (S|B) = P (SB)/P (B) = P (B)/P (B) = 1
For the third axiom, if A1 , A2 , . . . are mutually exclusive events, then A1 B, A2 B, . . . are
also mutually exclusive, and:

P (A1 ∪ A2 ∪ · · · |B) = P ((A1 ∪ A2 ∪ · · · )B)/P (B)
                      = P (A1 B ∪ A2 B ∪ · · · )/P (B)
                      = [P (A1 B) + P (A2 B) + · · · ]/P (B)
                      = P (A1 |B) + P (A2 |B) + · · ·

Therefore all axioms are satisfied.

2.3 Independence
2.3.1 Independent Events
Two events A and B are independent if knowledge about one does not affect knowledge
about the other. If A occurs, it does not change the probability of B, and vice versa.

 Two events A and B are independent if:

P (AB) = P (A)P (B)

Equivalently, if P (A ∩ B) = P (A)P (B) then A and B are independent.

Example 2.9 Which of the following do you think would be independent events?

• Day of week on which a person was born and whether they like chocolate.

• Gender and party voted for at the last election

• Age and income

• Car colour and exam mark

• Getting shot by a police officer and ethnicity

• Gender and survival on the Titanic

Example 2.10 If A and B are independent events, what is P (A|B)?

Example 2.11 Consider the Titanic survival data. Are the events “being a female
passenger” and “surviving” independent?

Example 2.12 Blood type, the best known of the blood factors, is determined by a
single allele. Each person has blood type A, B, AB or O. Type O represents the absence
of a factor and is recessive to factors A and B. Thus, a person with type A blood may
be either homozygous (AA) or heterozygous (AO) for this allele; similarly, a person with
type B blood may be either homozygous (BB) or heterozygous (BO). Type AB occurs if
a person is given an A factor by one parent and a B factor by the other parent. To have
type O blood an individual must be homozygous O (OO). Suppose a couple is preparing
to have a child. One parent has blood type AB and the other is heterozygous B. [SY10,
p 72]

What are the possible blood types the child will have and what is the probability of each?

2.3.2 Multiplicative Rule of Probability

 The multiplicative rule states that for events E1 and E2 , then

P (E1 E2 ) = P (E1 )P (E2 |E1 ) = P (E2 )P (E1 |E2 )

If E1 and E2 are independent, then P (E1 E2 ) = P (E1 )P (E2 ).


In general, for events E1 , E2 , . . . , En then

P (E1 E2 · · · En ) = P (E1 )P (E2 |E1 ) · · · P (En |E1 E2 · · · En−1 )

[Ros13]

2.4 Bayes’ Theorem and the Law of Total Probability


2.4.1 Law of Total Probability

 The law of total probability states that for mutually exclusive events
B1 , B2 , . . . , Bn such that P (B1 ) + P (B2 ) + · · · + P (Bn ) = 1,

P (A) = P (A|B1 )P (B1 ) + P (A|B2 )P (B2 ) + · · · + P (A|Bn )P (Bn )

Or equivalently, P (A) = Σ P (A|Bi )P (Bi ), summing over i = 1, . . . , n.

History. Who is Bayes?

Thomas Bayes (c. 1701 – 1761) was an English
statistician, philosopher and Presbyterian minister.
He is known for formulating a specific case of the
theorem that bears his name: Bayes’ theorem.

P (image is actually of Bayes) < 1


Reference: https://en.wikipedia.org/wiki/Thomas_Bayes

2.4.2 Bayes’ Theorem

 Bayes’ theorem states that for mutually exclusive events B1 , B2 , . . . , Bn
such that P (B1 ) + P (B2 ) + · · · + P (Bn ) = 1,

P (Bj |E) = P (E|Bj )P (Bj ) / [P (E|B1 )P (B1 ) + · · · + P (E|Bn )P (Bn )]

If the events Bi , i = 1, . . . , n are competing hypotheses, then Bayes’ formula gives the
conditional probabilities of these hypotheses when evidence E becomes available.

Notice that the denominator of this formula uses the law of total probability.
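Bayes' theorem is a one-line vectorised calculation in R. The priors and likelihoods below are made-up values for the sketch, not taken from any example in these notes.

```r
# Posterior probabilities P(B_j | E) from priors P(B_j) and
# likelihoods P(E | B_j); the denominator is the law of total probability
prior      <- c(0.5, 0.3, 0.2)   # P(B_1), P(B_2), P(B_3)
likelihood <- c(0.1, 0.4, 0.8)   # P(E|B_1), P(E|B_2), P(E|B_3)
posterior  <- prior * likelihood / sum(prior * likelihood)
round(posterior, 4)
## [1] 0.1515 0.3636 0.4848
```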

Example 2.13 A company buys microchips from three suppliers – I, II, and III. Supplier
I has a record of providing microchips that contain 10% defectives; Supplier II has a
defective rate of 5%; and Supplier III has a defective rate of 2%. Suppose 20%, 35%, and
45% of the current supply came from Suppliers I, II, and III, respectively. [SY10, p 79]

a. Represent this scenario using a tree diagram.

b. If a microchip is selected at random from this supply, what is the probability that it
is defective?

c. If a randomly selected microchip is defective, what is the probability that it came from
supplier II?

2.5 Odds, Odds Ratios and Relative Risk

 Let F be an event; the odds of event F , or the odds in favour of event
F , are given by

P (F )/P (F c )

Example 2.14 Consider a fair coin in which P (Heads) = P (Tails) = 0.5. The odds in
favour of heads is
P (H)/P (T ) = 0.5/0.5 = 1
We can say, the odds of obtaining heads are 1 to 1 or “even”.
Example 2.15 Consider a horse race in which the probability of horse A winning is 0.8.
Then the odds in favour of horse A are
P (A)/P (Ac ) = 0.8/0.2 = 4

We can say, the odds of horse A winning are 4 to 1.


Example 2.16 Consider another horse race in which the probability of horse B winning
is 0.25. Then the odds in favour of horse B are
P (B)/P (B c ) = 0.25/0.75 = 1/3

We can say, the odds of horse B winning are 1 to 3.

Chapter 3

Discrete Random Variables

References:
[Ros13, chapter 4] [SY10, Chapter 4]

3.1 Random variables

 A random variable is a real-valued function whose domain is a sample
space. (Scheaffer and Young, 2010, p. 94)

In other words, a random variable is a function that assigns a real number to every
member of the sample space.

The value X = x is the event that the random variable X takes the value x. It is
convention that:

• upper case – a random variable X, Y, Z

• lower case – the value that the random variable takes x, y, z

A random variable, X may be continuous or discrete (or a mixture of the two).

• Discrete random variable — X has a finite or countably infinite range of values.

• Continuous random variable — X is continuous and takes values over a real interval.

Examples of discrete random variables

• Number of defects in 10 products, X ∈ {0, 1, 2, ..., 10}.

• Number of customers served in a hour, X ∈ {0, 1, 2, . . .}

• Number of days that river level is above 3m, X ∈ {0, 1, 2, . . .}

• Number of heads minus number of tails in 3 tosses of a coin, X ∈ {−1, −3, 1, 3}

Examples of continuous random variables

• Time (in minutes) until the arrival of the next bus at a given bus stop, X ∈ R+

• Height of a randomly selected university student, X ∈ R+

• Difference between actual weight of chocolate bar and advertised weight, X ∈ R

• Time a customer waits to get through to a call centre operator, X ∈ R+

• Temperature (Celsius) at a randomly selected geographical location, X ∈ R

3.2 Discrete Random Variables
3.2.1 Probability Mass Function

 A random variable X is said to be a discrete random variable if it has a


finite or countably infinite range of values.

 The probability mass function (pmf) or probability function p(x), of a


discrete random variable X assigns a probability to each value x of X such that:

(i) P (X = x) = p(x) ≥ 0

(ii) Σx p(x) = 1

Example 3.1 Two coins are flipped. Let X denote the number of heads observed. Find
and graph the pmf of X.

px <- c(0.25, 0.5, 0.25)


barplot(px, names.arg=0:2, ylim=c(0,1), xlab="x", ylab="p(x)",
cex.axis=1.5, cex.names=1.5, cex.lab = 1.5, axis.lty = 1)
[Figure: bar plot of the pmf p(x) against x = 0, 1, 2]

Example 3.2 A local video store periodically puts its used movies in a bin and offers to
sell them to customers at a reduced price. Twelve copies of a popular movie have just
been added to the bin, but three of these are defective. A customer randomly selects two
of the copies for gifts. Let X be the number of defective movies the customer purchased.
Find and graph the probability function for X. [SY10, p 96]

(px <- c((9/12)*(8/11), (9/12)*(3/11) + (3/12)*(9/11), (3/12)*(2/11)))

## [1] 0.5454545 0.4090909 0.0454545

barplot(px, names.arg=0:2, ylim=c(0,1), xlab="x", ylab="p(x)")


MASS::fractions(px)

## [1] 6/11 9/22 1/22


[Figure: bar plot of the pmf p(x) against x = 0, 1, 2]

3.2.2 Distribution Function

 The (cumulative) distribution function (cdf) of X is:

F (b) = P (X ≤ b)

If X is discrete, then

F (b) = Σx≤b p(x)

where p(x) is the probability function.

A distribution function F (x) has the following properties:

1. lim x→−∞ F (x) = 0

2. lim x→∞ F (x) = 1

3. The distribution function is a nondecreasing function; that is, if a < b then F (a) ≤
F (b).

4. The distribution function is right-hand continuous; that is, lim h→0+ F (x + h) = F (x)

Example 3.3 Two coins are flipped. Let X denote the number of heads observed. Find
the distribution function (cdf) of X.


F (x) = 0,     x < 0
      = 0.25,  0 ≤ x < 1
      = 0.75,  1 ≤ x < 2
      = 1,     x ≥ 2

Example 3.4 Recall the video store example.

a. Compute the distribution function F (x).

b. Graph the distribution function F (x).

px

## [1] 0.5454545 0.4090909 0.0454545

cumsum(px)

## [1] 0.545455 0.954545 1.000000

MASS::fractions(cumsum(px))

## [1] 6/11 21/22 1

sfun0 <- stepfun(0:2, c(0, cumsum(px)), f = 0)


plot(sfun0, verticals=FALSE, xlim=c(0, 3), pch = 19, main=NULL,
ylab="F(x)")

[Figure: step plot of the distribution function F (x) for 0 ≤ x ≤ 3]

c. Verify that F (x) is indeed a distribution function using the four properties above.

A distribution function should have the following properties:

1. lim x→−∞ F (x) = 0
   ✓ F (x) = 0 for x < 0

2. lim x→∞ F (x) = 1
   ✓ F (x) = 1 for x ≥ 2

3. The distribution function is a nondecreasing function; that is, if a < b then F (a) ≤ F (b).
   ✓ As x increases, F (x) increases

4. The distribution function is right-hand continuous; that is, lim h→0+ F (x + h) = F (x)
   ✓ The function is discontinuous at points 0, 1 and 2. At each of these points F (x)
   is right-hand continuous, e.g. at x = 1, lim h→0+ F (1 + h) = 21/22 ≈ 0.955 = F (1)

3.3 Expected Value and Moments
3.3.1 Expected Value

 A random variable X with probability mass function p(x) has an
expected value, E[X], whereby:

E[X] = Σx x p(x)

The expected value of X can be thought of as the average of the random variable and is
often called the mean of X and is denoted by µ.

Example 3.5 Two coins are flipped. Let X denote the number of heads observed. Find
the expected value E[X].

E[X] = (0 × 0.25) + (1 × 0.5) + (2 × 0.25) = 1

Example 3.6 Compute the expected value of X for the video store example, i.e. compute
the expected number of defective movies bought.

In R:

x <- 0:2
MASS::fractions(px)

## [1] 6/11 9/22 1/22

EX <- sum(x*px)
EX

## [1] 0.5

3.3.2 Expected Value of a Function

 If X is a discrete random variable with probability function p(x) and g(X)
is a real-valued function of X, then:

E[g(X)] = Σx g(x) p(x)

Example 3.7 Suppose you decided to play the following game with a friend. Two coins
are flipped and the number of heads is observed. Let X denote the number of heads. For
each of the following games, compute your expected winnings and determine if the game
is fair (i.e. has an expected value of 0).
a. If no heads are observed, you must pay your friend $1. If one head is observed, your
friend pays you $1 and if two heads are observed, your friend pays you $2.
b. If no heads are observed, you must pay your friend $4. If one head is observed, your
friend pays you $1 and if two heads are observed, your friend pays you $2.

3.3.3 Variance and Standard Deviation

 The variance of a random variable X, with expected value µ, is denoted by


VAR[X] and is given by:
VAR[X] = E[(X − µ)²] = E[X²] − µ²

The variance is sometimes denoted by σ².

 The standard deviation of a random variable X is the square root of the
variance and is given by:

STD[X] = σ = √σ² = √E[(X − µ)²]

The standard deviation can be thought of as a “typical” deviation between an observed


outcome and the expected value.
Example 3.8 Two coins are flipped. Let X denote the number of heads observed. Find
the variance and standard deviation of X. E[X] = µ = 1

VAR[X] = (0 − 1)² × 0.25 + (1 − 1)² × 0.5 + (2 − 1)² × 0.25 = 0.5

STD[X] = √VAR[X] = √0.5 ≈ 0.7071068
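The same calculation maps directly to R, working from the pmf:

```r
# Mean, variance and standard deviation computed from the pmf
x  <- 0:2
px <- c(0.25, 0.5, 0.25)
EX   <- sum(x * px)            # E[X] = 1
VARX <- sum((x - EX)^2 * px)   # VAR[X] = 0.5
sqrt(VARX)                     # STD[X], approximately 0.707
```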

Example 3.9 Compute the variance and standard deviation of X for the video store
example.

Example 3.10 Verify that VAR[X] = E[(X − µ)²] does in fact equal E[X²] − µ².

3.3.4 Functions of a Random Variable

 Let X be a random variable with expected value E[X] and variance VAR[X].
Then:

• If Y = a + bX, then E[Y ] = a + bE[X]

• If Y = a + bX, then VAR[Y ] = b2 VAR[X]

 Tchebysheff ’s Theorem: Let X be a random variable with mean µ and
variance σ². Then for any positive k,

P (|X − µ| < kσ) ≥ 1 − 1/k²
[SY10, p 113]

Tchebysheff’s Theorem is useful when the mean and variance are known, but the distri-
bution is unknown.
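As a quick numerical sanity check, the bound can be compared with an exact probability. Using the two-coin pmf from Example 3.8 (µ = 1, σ = √0.5) and k = 2, the theorem guarantees at least 0.75; here the exact probability is 1, illustrating that the bound is often conservative.

```r
# Tchebysheff: P(|X - mu| < k*sigma) >= 1 - 1/k^2 for any distribution
x  <- 0:2
px <- c(0.25, 0.5, 0.25)
mu    <- sum(x * px)                     # 1
sigma <- sqrt(sum((x - mu)^2 * px))      # sqrt(0.5)
k <- 2
c(bound = 1 - 1/k^2,                     # 0.75
  exact = sum(px[abs(x - mu) < k * sigma]))  # 1
```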

Example 3.11 The daily production of electric motors at a certain factory averages 120
with a standard deviation of 10. [SY10, p 113]

1. What can be said about the fraction of days on which the production level falls
between 100 and 140?

2. Find the shortest interval certain to contain at least 90% of the daily production
levels.

3.3.5 Moments

 Let X be a discrete random variable with probability mass function p(x).

• 1st moment: E[X] = Σx x p(x) (i.e. the mean of X)

• 2nd moment: E[X²] = Σx x² p(x)

• kth moment: E[X^k] = Σx x^k p(x)

The second moment may be familiar from the following definition of the variance of X.

VAR[X] = E[X²] − µ²

Example 3.12 Let the discrete random variable X have the following probability mass
function:
x      0     1     2     3     4
p(x)   0.1   0.3   0.4   0.1   0.1

The first 3 moments of X are:


E[X] = Σx x p(x) = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.1) + 4(0.1) = 1.8

E[X²] = Σx x² p(x) = 0²(0.1) + 1²(0.3) + 2²(0.4) + 3²(0.1) + 4²(0.1) = 4.4

E[X³] = Σx x³ p(x) = 0³(0.1) + 1³(0.3) + 2³(0.4) + 3³(0.1) + 4³(0.1) = 12.6
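The moments of Example 3.12 map directly to R:

```r
# kth moments as sums of x^k weighted by the pmf
x  <- 0:4
px <- c(0.1, 0.3, 0.4, 0.1, 0.1)
sum(x * px)    # E[X]   = 1.8
sum(x^2 * px)  # E[X^2] = 4.4
sum(x^3 * px)  # E[X^3] = 12.6
```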

3.3.6 Example: Production Line
A manufacturing company is concerned about the number of defects on its production
lines. Let:

• X denote the number of defects per hour on production line 1

• Y denote the number of defects per hour on production line 2

1. Calculate the distribution function of X .


x 0 1 2
p(x) 0.5 0.3 0.2

F (x) = P (X ≤ x)

2. Calculate the distribution function of Y .


y 0 1
p(y) 0.9 0.1

F (y) = P (Y ≤ y)

3. The production of defects costs the company money. The financial director has
authorised the repair of one production line. Which production line do you think
should be repaired? Justify your answer.

4. Find the expected value and variance of X and Y .

5. Suppose that to repair a defect created by machine X costs $5. Find the expected
value and variance of the cost of repairing defects per hour.

6. Verify your answer to the previous question by simulating this scenario in R.

p_defect <- c(0.5, 0.3, 0.2)


x_defect <- 0:2
cost <- 5

num_sims <- 1e5


trials <- replicate(num_sims,
sample(x_defect, 1, prob= p_defect))
(mean_cost_trials <- mean(trials*cost))

## [1] 3.49145

(var_cost_trials <- var(trials*cost))

## [1] 15.2202

Chapter 4

Discrete Distributions

References:
[Ros13, Chapter 4]
[SY10, Chapter 4]

4.1 Discrete Distributions
There are several fundamental discrete distributions that apply for a large number of
practical problems. These include:

• Bernoulli

• Binomial

• Geometric

• Negative Binomial

• Poisson

• Hypergeometric

4.2 Bernoulli Distribution


4.2.1 Definition

 A Bernoulli trial is an experiment with only two outcomes.

For example, tossing a coin and observing either a head or a tail; a cow being pregnant
or not; a product being defective or not. The two outcomes are often referred to as a
success (often denoted by a 1) and a failure (often denoted by a 0).

Suppose one Bernoulli trial is conducted, in which the probability of success is p. Let X
be a random variable in which
X = 0, if the outcome of the trial is a failure
  = 1, if the outcome of the trial is a success

The probability distribution of X is

p(x) = px (1 − p)1−x , x = 0, 1
The expected value of X is E[X] = Σx x p(x) = 0 × (1 − p) + 1 × p = p.

 Bernoulli Distribution:

p(x) = px (1 − p)1−x , x = 0, 1

Mean: E[X] = p
Variance: VAR[X] = p(1 − p)
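A short simulation illustrating these formulas; a Bernoulli trial is a binomial with n = 1, and p = 0.3 and the number of trials are arbitrary choices.

```r
# Simulate Bernoulli trials and compare sample moments with p and p(1-p)
set.seed(1)
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)
mean(x)  # approximately p = 0.3
var(x)   # approximately p*(1 - p) = 0.21
```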

4.3 Binomial Distribution
4.3.1 Definition

 A random variable X has a binomial distribution if the following


conditions are met:

1. The experiment consists of a fixed number n of identical trials

2. Each trial has only two outcomes (i.e. each trial is a Bernoulli trial)

3. The probability of success p is constant from trial to trial

4. The trials are independent

5. X is defined as the number of successes among n trials.

 A Binomial random variable X is the number of successes in n


independent and identical Bernoulli trials in which the probability of a success
is p, 0 < p < 1. Then, for X ∼ B(n, p),
P (X = k) = C(n, k) p^k (1 − p)^(n−k) for k = 0, . . . , n
P (X = k) = 0 otherwise

Mean: E[X] = np
Variance: VAR[X] = np(1 − p)
 
Notice that: P (X = k + 1) = [p/(1 − p)] × [(n − k)/(k + 1)] × P (X = k)

In the probability function for X, the term C(n, k) represents the number of ways of
allocating k successes in n trials: C(n, k) = n!/(k!(n − k)!).

Example 4.1 Applications of binomial random variables.

• Number of defects in a sample of n components

• Number of times a head is obtained in n coin tosses

• Number of rivers out of a sample of n that breached their banks last winter

4.3.2 Binomial Distribution in R
dbinom(x, n, p)
pbinom(x, n, p)
qbinom(Fx, n, p)
rbinom(numSims, n, p)

Example 4.2 Use R to find the probability function and distribution function of the
binomial distribution with n = 10 and p = 0.2 for values of x = 0, 1, . . . , 10:

dbinom(0:10, 10, 0.2)

## [1] 0.1073741824 0.2684354560 0.3019898880 0.2013265920


## [5] 0.0880803840 0.0264241152 0.0055050240 0.0007864320
## [9] 0.0000737280 0.0000040960 0.0000001024

pbinom(0:10, 10, 0.2)

## [1] 0.107374 0.375810 0.677800 0.879126 0.967207 0.993631


## [7] 0.999136 0.999922 0.999996 1.000000 1.000000

For example: If X ∼ Bin(10, 0.2), then P (X = 2) = 0.30199 and P (X ≤ 2) = 0.6778.

4.3.3 Examples
Example 4.3 An industrial firm supplies 10 manufacturing plants with a certain chemical.
The probability that any one plant calls in an order on a given day is 0.2, and this
is the same for all 10 plants. [SY10, ex 4.12]

a. Find the probability that on the given day, the number of plants calling in orders is:

i. Exactly 3
ii. At most 3
iii. At least 3

b. Find the expected value and variance for number of plants calling in an order on a
given day.

c. Verify your answers using simulation
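One way these answers could be checked in R, with X ∼ Binomial(10, 0.2):

```r
n <- 10; p <- 0.2

# a. i-iii: exact binomial probabilities
dbinom(3, n, p)               # P(X = 3)  ~ 0.2013
pbinom(3, n, p)               # P(X <= 3) ~ 0.8791
1 - pbinom(2, n, p)           # P(X >= 3) ~ 0.3222

# b. theoretical mean np and variance np(1 - p)
c(mean = n*p, var = n*p*(1 - p))   # 2 and 1.6

# c. simulation check
sims <- rbinom(1e5, n, p)
c(mean(sims), var(sims))      # close to 2 and 1.6
```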

4.4 Geometric Distribution
4.4.1 Definition

 Suppose that a sequence of independent and identical Bernoulli trials, each


having probability of success p, 0 < p < 1, are performed until a success is
observed. If X denotes the number of failures obtained prior to the first success,
then X is said to be a geometric random variable.
P (X = x) = p(1 − p)^x for x = 0, 1, 2, . . . ; 0 < p < 1
P (X = x) = 0 otherwise

F (x) = P (X ≤ x) = 1 − (1 − p)^(x+1) for x = 0, 1, 2, . . .

Mean: E[X] = (1 − p)/p
Variance: VAR[X] = (1 − p)/p²

Notice that: P (X ≥ x) = 1 − F (x − 1) = 1 − (1 − (1 − p)^x) = (1 − p)^x for x = 0, 1, 2, . . .

Example 4.4 Applications of geometric random variables


• Number of customers contacted before the first sale is made
• Number of times a child is exposed to measles before contracting the disease
• Number of unqualified applicants interviewed prior to the first qualified one

4.4.2 Geometric Distribution in R


dgeom(x, p)
pgeom(x, p)
qgeom(Fx, p)
rgeom(numSims, p)

Example 4.5 Use R to find the probability of observing 3 failures before the first success,
if the probability of success is 0.2.
probSuccess <- 0.2
numFailures <- 3
dgeom(numFailures, probSuccess)

## [1] 0.1024

probSuccess*(1-probSuccess)^numFailures

## [1] 0.1024

4.4.3 Examples
Example 4.6 A recruiting firm finds that 20% of the applicants for a particular sales
position are fluent in both English and Spanish. Applicants are selected at random from
the pool and interviewed sequentially. [SY10, ex 4.15 and 4.16]

1. Find the probability that five applicants are interviewed before finding the first
applicant who is fluent in both English and Spanish.

2. Let X denote the number of unqualified applicants interviewed prior to the first
qualified one. Suppose that the first applicant who is fluent in both English and
Spanish is offered the position, and the applicant accepts. Suppose each interview
costs $125.

(a) Find the expected value and the standard deviation of the total cost of inter-
viewing until the job is filled.

(b) The mean and variance of the total cost are known, but the distribution is
not. Use Tchebysheff’s Theorem to determine the interval in which this cost
should be expected to fall at least 75% of the time?

3. Find the probability that at least five applicants are interviewed before finding the
first applicant who is fluent in both English and Spanish.

4. Find the probability that at least four applicants are interviewed before finding the

first applicant who is fluent in both English and Spanish.

5. Given at least five applicants are interviewed before finding the first applicant who
is fluent in both English and Spanish, find the probability that at least 9 unqualified
applicants are interviewed before the first qualified one.
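One reading of these questions in R, taking X to be the number of unqualified applicants before the first qualified one, so X ∼ Geometric(p = 0.2) in the "number of failures" parameterization used by dgeom (the interpretation of part 1 as exactly five unqualified applicants is an assumption):

```r
p <- 0.2

# 1. P(X = 5): five unqualified applicants before the first qualified one
dgeom(5, p)                         # 0.2 * 0.8^5 ~ 0.0655

# 2a. cost C = 125(X + 1); E[X] = (1-p)/p = 4, VAR[X] = (1-p)/p^2 = 20
cost <- 125
c(mean = cost*((1 - p)/p + 1),      # 625
  sd   = cost*sqrt((1 - p)/p^2))    # ~ 559.0

# 3, 4. P(X >= 5) and P(X >= 4), using P(X >= x) = (1 - p)^x
pgeom(4, p, lower.tail = FALSE)     # P(X > 4) = P(X >= 5) = 0.8^5
pgeom(3, p, lower.tail = FALSE)     # P(X >= 4) = 0.8^4

# 5. memoryless: P(X >= 9 | X >= 5) = P(X >= 4)
pgeom(8, p, lower.tail = FALSE) / pgeom(4, p, lower.tail = FALSE)
```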

4.4.4 Properties of the Geometric Distribution
The geometric distribution is the only discrete distribution with the memoryless property.
If we have already observed j failures, then the probability of observing at least k more
failures (i.e. at least j + k failures in total) before a success is the same as the
probability, starting from scratch, of observing at least k failures. That is, for integers j, k > 0

P (X ≥ j + k|X ≥ j) = P (X ≥ k)

P (X ≥ j + k | X ≥ j) = P ((X ≥ j + k) ∩ (X ≥ j)) / P (X ≥ j)
                      = P (X ≥ j + k) / P (X ≥ j)
                      = (1 − p)^(j+k) / (1 − p)^j
                      = (1 − p)^k
                      = P (X ≥ k)

This property is demonstrated in example 5 above.
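The memoryless property can also be checked numerically with pgeom, using P(X ≥ x) = P(X > x − 1); here j = 5 and k = 4 are arbitrary illustrative values:

```r
p <- 0.2; j <- 5; k <- 4

# P(X >= j + k | X >= j) = P(X >= j + k) / P(X >= j)
lhs <- pgeom(j + k - 1, p, lower.tail = FALSE) / pgeom(j - 1, p, lower.tail = FALSE)
rhs <- pgeom(k - 1, p, lower.tail = FALSE)      # P(X >= k) = (1 - p)^k
c(lhs, rhs)                                     # both equal 0.8^4 = 0.4096
```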

4.4.5 Alternative Parameterization
If instead X is defined as the number of trials until the first success (rather than the
number of failures), then X also has a geometric distribution.
Then for X ∼ Geometric(p)

P (X = x) = p(1 − p)^(x−1), x = 1, 2, . . . ; 0 < p < 1

Mean: E[X] = 1/p
Variance: VAR[X] = (1 − p)/p²

Example 4.7 Applications of alternative parameterization

• Number of coin flips until the first head is observed

• Number of rolls of a die required to roll a 6

• Number of balls a cricket player will face before getting out (if each ball is indepen-
dent and probability of getting out is the same on each ball)

The drawback of this parameterization is that X cannot be equal to 0. Therefore, when


the application cannot be explained in terms of Bernoulli trials (as in the following
example), we may need to allow for X = 0.

Example 4.8 The number of weeds within a randomly selected square meter of a pasture
has been found to be well modelled using the geometric distribution. For a given
pasture, the number of weeds per square meter averages 0.5. What is the probability
that no weeds will be found in a randomly selected square meter of this pasture? [SY10,
ex 4.18]
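A sketch of a solution in R, assuming X counts weeds per square metre in the "number of failures" parameterization, so the mean E[X] = (1 − p)/p = 0.5 can be solved for p:

```r
# E[X] = (1 - p)/p = 0.5  =>  p = 1/1.5 = 2/3
p <- 1/1.5
dgeom(0, p)   # P(X = 0) = p = 2/3
```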

4.5 Negative Binomial Distribution
4.5.1 Definition

 Suppose that a sequence of independent and identical Bernoulli trials, each
having probability of success p, 0 < p < 1, is performed until r successes are
observed. If X denotes the number of failures obtained prior to the rth success,
then X is said to be a negative binomial random variable.

P (X = x) = C(x + r − 1, r − 1) p^r (1 − p)^x for x = 0, 1, . . .
P (X = x) = 0 otherwise

Mean: E[X] = r(1 − p)/p
Variance: VAR[X] = r(1 − p)/p²

Example 4.9 Applications of the negative binomial distribution

• Number of customers contacted before the rth sale is made

• Number of unqualified applicants interviewed prior to the rth qualified one

4.5.2 Negative Binomial Distribution in R


dnbinom(x, r, p)
pnbinom(x, r, p)
qnbinom(Fx, r, p)
rnbinom(numSims, r, p)

Example 4.10 Use R to find the probability of observing 3 failures before the second
success, if the probability of success is 0.2.

probSuccess <- 0.2


numSuccesses <- 2
numFailures <- 3
dnbinom(numFailures, numSuccesses, probSuccess)

## [1] 0.08192

choose(numFailures+numSuccesses-1, numSuccesses-1)*
(probSuccess^numSuccesses)*(1-probSuccess)^numFailures

## [1] 0.08192

4.5.3 Examples
Example 4.11 Suppose that 20% of the applicants for a certain sales position are fluent
in English and Spanish. Suppose that four jobs requiring fluency in English and Spanish
are open. Find the probability that two unqualified applicants are interviewed before
finding the fourth qualified applicant, if the applicants are interviewed sequentially and
at random.

R code:

probSuccess <- 0.2


numFailures <- 2
numSuccesses <- 4

dnbinom(numFailures, numSuccesses, probSuccess)

## [1] 0.01024

choose(numFailures+numSuccesses-1,
numSuccesses-1)*probSuccess^numSuccesses*
(1-probSuccess)^numFailures

## [1] 0.01024

4.5.4 Properties of the Negative Binomial Distribution
• Extension of the geometric distribution

• Very flexible distribution as it can take different shapes depending on the parameter
values.

4.5.5 Alternative Parameterization


A negative binomial random variable is the number of independent and identical
Bernoulli trials required to obtain the rth success (for an integer r > 1). The probability
of a success is p.

Then for X ∼ Neg Bin(r, p)

P (X = x) = C(x − 1, r − 1) p^r (1 − p)^(x−r), x = r, r + 1, . . .

Mean: E[X] = r/p
Variance: VAR[X] = r(1 − p)/p²

Example 4.12 Applications of alternative parameterization

• Number of children born into a family before having 3 daughters

• Number of rolls of a die required to obtain 10 sixes

• Number of products to sample before obtaining 5 defective items

4.6 Poisson Distribution
4.6.1 Definition

 A random variable X that takes values 0, 1, 2, . . . is said to be a Poisson


random variable with parameter λ, if for some λ > 0
P (X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, . . .
P (X = x) = 0 otherwise
Mean: E[X] = λ
Variance: VAR[X] = λ

Poisson random variables are useful for modelling the occurrence of random phenomena.
The random variable X is the number of events observed and λ can be interpreted as the
rate at which events occur.
Example 4.13 Applications of the Poisson distribution

• Number of car accidents at a given intersection per week

• Number of customers arriving for service at a bank

• Number of phone calls per minute coming into an office

• Number of cyclones in the Pacific Ocean per year

4.6.2 Poisson Distribution in R


dpois(x, lambda)
ppois(x, lambda)
qpois(Fx, lambda)
rpois(numSims, lambda)

Example 4.14 A certain type of event occurs at a rate of 2 per day. What is the
probability that on a particular day, 3 events are observed?

lambda <- 2
numEvents <- 3
# P(X = 3)
dpois(numEvents, lambda)

## [1] 0.180447

exp(-lambda)*(lambda^numEvents)/factorial(numEvents)

## [1] 0.180447

History.
The Poisson distribution is named after Siméon Denis Poisson
(1781 – 1840), a French mathematician. In addition to giving
his name to this important distribution he made contributions in
other areas of science. He developed an expression for the force
of gravity (in terms of the distribution of mass within a planet),
which has been used for determining details of the Earth’s shape,
by measuring the paths of orbiting satellites.
Borovkov, K. (2003), Elements of Stochastic Modelling, World Scientific, Singapore. Image from:

http://en.wikipedia.org/wiki/Simeon_Denis_Poisson

4.6.3 Examples
Example 4.15 During business hours, the number of calls passing through a particular
cellular relay system averages five per minute. [SY10, ex 4.22]

1. Find the probability that no call will pass through the relay system during a given
minute.

2. Find the probability that no call will pass through the relay system during a 2-
minute period.

3. Find the probability that three calls will pass through the relay system during a
2-minute period.

4. Find the probability that no more than two calls pass through the system in a given
minute.
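A sketch of these four calculations in R, using the fact that over a 2-minute period the rate doubles to λ = 10:

```r
lambda <- 5    # calls per minute

dpois(0, lambda)        # 1. P(no calls in a minute) = e^-5 ~ 0.0067
dpois(0, 2*lambda)      # 2. P(no calls in 2 minutes) = e^-10
dpois(3, 2*lambda)      # 3. P(3 calls in 2 minutes) ~ 0.0076
ppois(2, lambda)        # 4. P(no more than 2 calls in a minute) ~ 0.1247
```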

Example 4.16 Derive the mean of the Poisson distribution
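A sketch of the derivation, shifting the summation index and recognising the series for e^λ:

```latex
\begin{aligned}
E[X] &= \sum_{x=0}^{\infty} x \, e^{-\lambda}\frac{\lambda^x}{x!}
      = \sum_{x=1}^{\infty} e^{-\lambda}\frac{\lambda^x}{(x-1)!} \\
     &= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
      = \lambda e^{-\lambda} e^{\lambda}
      = \lambda
\end{aligned}
```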

4.6.4 Properties of the Poisson Distribution


Example 4.17 Suppose we are interested in the number of accidents at a particular
intersection over the period of a week. We could split this interval into a series of n
subintervals such that:

P (one accident in a subinterval) = p


P (no accidents in a subinterval) = 1 − p

time

If we assume:

• that the probability p is the same across all subintervals and

• that the subintervals are small enough that the probability that one contains more
than one accident is 0

• the occurrence of accidents is independent from subinterval to subinterval

Then, the number of subintervals containing accidents, and thus the number of accidents in
a week, is a binomial random variable. However, in this scenario, we do not know n or p,
but we would expect that as n increases, p would decrease.

Poisson Approximation to the Binomial Distribution
Some probability distributions, like the Poisson, come about by limiting the arguments
applied to other distributions. Let’s examine what happens to the binomial distribution
as n increases and p decreases.
For large n, the binomial distribution can be approximated by the Poisson distribution.
This suggests that Poisson distribution would be a good model when there is a large
number of independent trials each with the same probability.

Let X ∼ Binomial(n, p) and let λ = np.

P (X = k) = C(n, k) (λ/n)^k (1 − λ/n)^(n−k)

As n increases whilst holding the mean, np, constant:

lim_{n→∞} C(n, k) (λ/n)^k (1 − λ/n)^(n−k)
    = lim_{n→∞} [n!/(k!(n − k)!)] (λ^k/n^k) (1 − λ/n)^n (1 − λ/n)^(−k)
    = lim_{n→∞} [n!/(n^k (n − k)!)] (λ^k/k!) (1 − λ/n)^n (1 − λ/n)^(−k)
    = 1 · (λ^k/k!) · e^(−λ) · 1
    = e^(−λ) λ^k/k!

Since:

lim_{n→∞} n!/(n^k (n − k)!) = lim_{n→∞} [n(n − 1)(n − 2) · · · (n − k + 1)]/n^k = n^k/n^k = 1

lim_{n→∞} (1 − λ/n)^n = e^(−λ)

lim_{n→∞} (1 − λ/n)^(−k) = 1

 As n → ∞, if np is held constant then the Poisson distribution is the limiting


distribution of the Binomial distribution.

The Poisson distribution can be used to approximate the binomial distribution


for large n and small p. For this reason, the Poisson distribution is sometimes
called the distribution of rare events.

The Poisson distribution can be used to measure the occurrence of rare events in
space and volume as well as in time.
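The approximation can be seen numerically in R by holding λ = np fixed and letting n grow (λ = 2 is an arbitrary illustrative choice):

```r
lambda <- 2
k <- 0:6
# maximum absolute error of the Poisson approximation for each n
sapply(c(10, 100, 1000),
       function(n) max(abs(dbinom(k, n, lambda/n) - dpois(k, lambda))))
# the error shrinks as n increases
```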

Example 4.18 Applications of the Poisson distribution to modelling “rare” events

• Number of flaws in a square metre of fabric

• Number of bacteria colonies in a cubic centimetre of water

• Number of times a machine fails over a 24 hour period

 Poisson Paradigm: Consider n events, with p_i equal to the probability that
event i occurs, i = 1, 2, . . . , n. If all the p_i are “small” and the trials are either
independent or at most “weakly dependent”, then the number of these events
that occur approximately has a Poisson distribution with mean Σ_{i=1}^n p_i.

Example 4.19 Suppose each of n people is equally likely to have his or her birthday
on any of the 365 days of the year. What is the probability that a set of n independent
people all have different birthdays?
We solved this problem in week 1, but here we will solve it using the Poisson approxima-
tion. [Ros13, p 149]

R simulation of the birthday problem

set.seed(123123)
options(scipen=10, digits=7)
numSims <- 100000

people <- c(23, 35, 50, 100)

results <- data.frame(numPeople=NA, simulation=NA, actual=NA, PoissonApprox=NA)

for(i in 1:length(people)){
numPeople <- people[i]
results[i, "numPeople"] <- numPeople

# Simulation
uniqueBirthday <- replicate(numSims,
length(unique(sample(1:365, numPeople,
replace=TRUE)))==numPeople)
results[i, "simulation"] <- 1-table(uniqueBirthday)["FALSE"]/numSims

# Actual probability
results[i, "actual"] <- choose(365, numPeople )*factorial(numPeople)/(365^numPeople)

# Poisson Approximation
lambda <- choose(numPeople, 2)/365
results[i, "PoissonApprox"] <- dpois(0, lambda)
}

results

## numPeople simulation actual PoissonApprox


## 1 23 0.49255 0.4927027656760 0.499998248
## 2 35 0.18304 0.1856167611253 0.195902736
## 3 50 0.03063 0.0296264204220 0.034868746
## 4 100 0.00000 0.0000003072489 0.000001289

4.7 Hypergeometric Distribution
4.7.1 Definition

 Suppose that a lot consists of N items, of which k are of one type (successes)
and N − k are of another type (failures). Suppose that n items are sampled
randomly and sequentially from the lot, without replacement. Let X denote
the number of successes amongst the n sampled items; then X is said to have a
hypergeometric distribution.

P (X = x) = C(k, x) C(N − k, n − x) / C(N, n) for x = 0, 1, . . . , k (with C(a, b) = 0 if b > a)
P (X = x) = 0 otherwise

Mean: E[X] = n (k/N)
Variance: VAR[X] = n (k/N) (1 − k/N) (N − n)/(N − 1)

The pmf of the hypergeometric distribution can be understood intuitively as follows. For
x = 0, 1, 2, . . . , k:

P (X = x) = C(k, x) C(N − k, n − x) / C(N, n)
          = (choose x successes from k) × (choose n − x failures from N − k) / (choose n items from N)

Example 4.20 Applications of the hypergeometric distribution

• Sampling from small populations without replacement

• Discrimination - e.g. probability of employing 2 male candidates from a pool of 5


males and 3 females

• Quality Control - e.g. Probability of observing defects from a pool of objects in


which the total number of defects is known

4.7.2 Hypergeometric Distribution in R
dhyper(x, k, N-k, n)
phyper(x, k, N-k, n)
qhyper(x, k, N-k, n)
rhyper(numSims, k, N-k, n)

x = number of successes
k = total number of successes
N = total number of items
N - k = total number of failures
n = number of items selected

4.7.3 Examples
Example 4.21 Two positions are open in a company. Ten men and five women have
applied for a job at this company, and all are equally qualified for either position. The
manager randomly hires two people from the applicant pool to fill the positions. What
is the probability that a man and a woman were chosen? [SY10, ex 4.25]

R code:
totalNumSuccesses <- 10 #k
totalNumFailures <- 5 #N-k
numSuccesses <- 1 #x
numTrials <- 2 #n

dhyper(numSuccesses, totalNumSuccesses, totalNumFailures, numTrials)

## [1] 0.47619

10/21

## [1] 0.47619

choose(totalNumSuccesses, numSuccesses)*
choose(totalNumFailures, numTrials-numSuccesses)/
choose(totalNumSuccesses+totalNumFailures, numTrials)

## [1] 0.47619

Example 4.22 An auditor checking the accounting practices of a firm samples 4 accounts
from an accounts receivable list of 12. Find the probability that the auditor sees at least
one past-due account under the following conditions: [SY10, ex 4.111]

1. There are 2 such accounts among the 12.

Let X denote the number of past-due accounts in a sample of 4 from 12, in which
2 of 12 are past-due.
N = 12, n = 4, k = 2

P (X ≥ 1) = 1 − P (X = 0)
          = 1 − C(2, 0) C(10, 4) / C(12, 4)
          = 0.575758

1-dhyper(0, 2, 10, 4)

## [1] 0.575758

1-choose(2,0)*choose(12-2, 4-0)/choose(12, 4)

## [1] 0.575758

2. There are 6 such accounts among the 12

3. There are 8 such accounts among the 12
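Parts 2 and 3 follow the same pattern as part 1; a sketch in R:

```r
# 2. k = 6 past-due accounts among N = 12, sample of n = 4
1 - dhyper(0, 6, 6, 4)    # P(X >= 1) ~ 0.9697

# 3. k = 8 past-due accounts among N = 12
1 - dhyper(0, 8, 4, 4)    # P(X >= 1) ~ 0.9980
```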

4.8 Distributions in R: Summary


Distributions in R: Summary
d = density/mass function
p = distribution function
q = quantile function
r = random number

Key Discrete Distributions


dbinom — Binomial (including Bernoulli) distribution
dgeom — Geometric distribution
dnbinom —Negative binomial distribution
dpois — Poisson distribution
dhyper — Hypergeometric distribution

Other useful functions


factorial(x): x!
choose(n, k): binomial coefficient C(n, k) = n!/(k!(n − k)!)
sample(x, k, replace=FALSE): takes a sample of size k from set x without replacement
replicate(n, x): repeats expression x n times
combn(x, m): returns the combinations of the elements of x of size m

4.9 Simulation
• Widely used technique

• Often used to analyze “what-if” type questions

• Useful when problems cannot be solved analytically

– due to complexity of problem


– because to do so would require simplifying assumptions

• Results influenced by:

– number of runs
– starting conditions
– length of each simulation run
– accuracy of the model (compared with system being modelled)

• Use pseudorandom numbers generated by a computer

The random variables X1 , X2 , . . . , Xn are a random sample if they are independent and
identically distributed (iid).

The distribution resulting from the random variables in a random sample is called the
empirical probability distribution.

This function will differ each time a new sample is taken. When we simulate a random
discrete variable, we are collecting a random sample of values from that distribution. As
the size of the sample increases the empirical probability distribution will converge to
the theoretical distribution. As n → ∞, empirical probability distribution → theoretical
distribution.

4.9.1 Simulating a Discrete Distribution


1. Use an in-built distribution, e.g. rbinom or sample

set.seed(12345)
dbinom(0:5, 5, 0.3)

## [1] 0.16807 0.36015 0.30870 0.13230 0.02835 0.00243

table(rbinom(10000, 5, 0.3))

##
## 0 1 2 3 4 5
## 1653 3624 3135 1290 276 22

2. Write your own simulation using uniform random numbers between 0 and 1.

• The unit interval is the range of values from 0 to 1.

• A unit random number is a number selected at random from this interval.

• To simulate a discrete random variable X, the values of X are assigned to intervals


of the unit interval, proportional to their probability.

• In R runif — Uniform random numbers

• Use this method when an in-built distribution is not available.

Example 4.23 Let X be a random variable with the following probability mass function:

x 10 20 30 40 50
p(x) 0.2 0.3 0.1 0.3 0.1
F (x) 0.2 0.5 0.6 0.9 1

To generate a random observation from this distribution:

1. Generate unit random number, u.

2. If 0 ≤ u < 0.2, then X = 10

3. Else if 0.2 ≤ u < 0.5, then X = 20 etc.

R code for simulation

set.seed(9293)
x <- seq(10, 50, 10)
px <- c(0.2, 0.3, 0.1, 0.3, 0.1)
Fx <- cumsum(px)
Fx

## [1] 0.2 0.5 0.6 0.9 1.0

numSims <- 10000


u.all <- runif(numSims)
results <- data.frame(u=u.all, x1=NA, x2=NA)

# Method 1
for(i in 1:numSims){
results[i, "x1"] <- x[min(which(u.all[i] < Fx))]
}
table(results[,"x1"])

##
## 10 20 30 40 50
## 2042 2951 995 3020 992

# Method 2
results$x2 <- sapply(u.all, function(u)x[min(which(u < Fx))])
table(results[,"x2"])

##
## 10 20 30 40 50
## 2042 2951 995 3020 992

head(results)

## u x1 x2
## 1 0.4068846 20 20
## 2 0.3058260 20 20
## 3 0.7390002 40 40
## 4 0.8640917 40 40
## 5 0.6610237 40 40
## 6 0.1691787 10 10

4.10 Activity: Monty Hall
Suppose you are a contestant on a game show and the host (Monty) shows you 3 doors.
Behind 2 of the doors are goats and behind 1 of the doors is a car. You do not know the
location of the car, but Monty does.
At the start of the game, you choose a door. Monty then opens one of the other two
doors to reveal a goat. He offers you the choice of sticking with your original door choice
or switching to the other closed door.

Should you stick with your original decision or switch to the other door?
http://www.shodor.org/interactivate/activities/SimpleMontyHall/
https://www.youtube.com/watch?v=mhlc7peGlGg
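One possible R simulation of the game (the helper sim_once and the number of runs are illustrative choices, not part of the original activity):

```r
set.seed(1)
sim_once <- function() {
  car  <- sample(1:3, 1)                    # door hiding the car
  pick <- sample(1:3, 1)                    # contestant's first choice
  goats <- setdiff(1:3, c(pick, car))       # doors Monty is allowed to open
  open <- goats[sample(length(goats), 1)]   # index in, to avoid sample()'s length-1 trap
  switch_pick <- setdiff(1:3, c(pick, open))
  c(stick = pick == car, switch = switch_pick == car)
}
res <- rowMeans(replicate(1e4, sim_once()))
res   # stick ~ 1/3, switch ~ 2/3: switching is the better strategy
```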

Chapter 5

Continuous Random Variables

References:
[Ros13, Chapter 5]
[SY10, Chapter 5]

5.1 Continuous Random Variables
A random variable X may be continuous or discrete (or a mixture of the two).

• Discrete random variable — X has a finite or countably infinite range of values.

• Continuous random variable — X is continuous and takes values over a real interval.

Example 5.1 Applications of continuous random variables

• Lifetime of a washing machine

• Waiting time for a bus

• Price of shares on the stock market

• Height of STAT600 students

[Figure: example continuous density functions, plotted over x_seq: dnorm(x_seq, 0, 3), dexp(x_seq, 0.5), dgamma(x_seq, 2, 1) and dunif(x_seq, 2, 8).]
5.1.1 Relative Frequency
Example 5.2 Suppose the lifetimes of 50 batteries were recorded (in hundreds of hours).
What can we say about the lifetime of a battery?

lifetime <- c(0.406, 0.685, 4.778, 1.725, 8.223, 2.343, 1.401,


2.23, 0.538, 0.234, 4.025, 3.323, 2.92, 5.088, 1.458, 1.064,
0.774, 0.761, 5.587, 0.517, 3.246, 2.33, 1.064, 2.563, 0.511,
3.246, 2.33, 1.064, 0.023, 0.225, 1.514, 3.214, 3.81, 3.334,
2.325, 0.333, 7.514, 0.968, 3.491, 2.921, 1.624, 0.334, 4.49,
1.267, 1.702, 2.634, 1.849, 0.186, 1.507, 0.294)

h <- hist(lifetime, xlim=c(0, 10), xlab="hours (hundreds)",


main="Battery Lifetimes", col = "gray")

[Figure: “Battery Lifetimes”, frequency histogram of the 50 lifetimes in hundreds of hours.]

Complete the frequency table:


x [0,1) [1,2) [2,3) [3,4) [4,5) [5,6) [6,7) [7,8) [8,9)
Frequency

Relative Frequency

Note: the interval [a, b) includes point a but not b.

h <- hist(lifetime, xlim=c(0, 9), prob=TRUE,


xlab="hours (hundreds)", main="Battery Lifetimes", col = "gray")
x <- seq(0, 9, 0.01)
lines(x, 0.5*exp(-x/2), col="red", lwd=2)

[Figure: “Battery Lifetimes”, density-scaled histogram with the curve 0.5 exp(−x/2) overlaid.]

5.1.2 Probability Density Function

 A random variable X is said to be continuous if there is a function f (x),


called the probability density function, such that:

1. f (x) ≥ 0, for all x

2. ∫_{−∞}^{∞} f (x) dx = 1

3. P (a ≤ X ≤ b) = ∫_a^b f (x) dx

[SY10, p 194]

The density function is a model of the relative frequency of X.

Example 5.3 Suppose we are going to model the battery lifetimes (in hundreds of hours)
using the following probability density function.
f (x) = (1/2) e^(−x/2), x > 0
f (x) = 0, otherwise

1. What is the probability that a battery lifetime is less than 200 hours?
P (X ≤ 2) = ∫_0^2 (1/2) e^(−x/2) dx
          = [−e^(−x/2)]_0^2
          = −e^(−1) + e^0
          = 0.6321206

2. What is the probability that a battery lifetime is greater than 400 hours?

3. What is the probability that a battery lifetime is less than 200 hours or greater
than 400 hours?

4. What is the probability that a battery lasts more than 300 hours, given that it has
already been in use for more than 200 hours?
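These battery-lifetime probabilities can be checked against R's built-in exponential functions, since f here is the Exp(λ = 1/2) density (pexp is used as a check, not as part of the working the example asks for):

```r
rate <- 1/2

pexp(2, rate)                         # 1. P(X < 2) = 1 - e^-1 ~ 0.6321
pexp(4, rate, lower.tail = FALSE)     # 2. P(X > 4) = e^-2 ~ 0.1353
pexp(2, rate) + pexp(4, rate, lower.tail = FALSE)   # 3. ~ 0.7675

# 4. P(X > 3 | X > 2) = P(X > 3)/P(X > 2) = e^(-1/2) (memorylessness)
pexp(3, rate, lower.tail = FALSE) / pexp(2, rate, lower.tail = FALSE)
```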

5. Show that the function f (x) = (1/2) e^(−x/2), x > 0, is a probability density function.

Recall that: ∫ f′(x) e^(f (x)) dx = e^(f (x)) + C.

5.1.3 Distribution Function

 The (cumulative) distribution function (cdf) of a continuous random


variable X is:

F (x) = P (X ≤ x) = ∫_{−∞}^x f (v) dv

Notice that F′(x) = f (x)

A distribution function F (x) has the following properties:

1. lim_{x→−∞} F (x) = 0

2. lim_{x→∞} F (x) = 1

3. The distribution function is a nondecreasing function; that is, if a < b then F (a) ≤
F (b).

4. The distribution function is right-hand continuous; that is, lim_{h→0⁺} F (x + h) = F (x)



5.1.4 Properties of Continuous Random Variables
The definitions of the density function and distribution function lead to the following
properties of a continuous random variable X:

• P (X > x) = 1 − F (x), x ∈ R

• P (X = b) = P (b ≤ X ≤ b) = ∫_b^b f (x) dx = 0

• ∫_{−∞}^{∞} f (x) dx = 1

P (a ≤ X ≤ b) = P (a < X ≤ b)
             = P (a ≤ X < b)
             = P (a < X < b)
             = ∫_a^b f (x) dx
             = F (b) − F (a), a ≤ b

The cumulative distribution function (cdf) of X is the area under the density function for
values less than or equal to x. The definition stated above gives the relationship:

f (x) = dF (x)/dx = F′(x)

Note that under the definition of the probability density function above, the probability
that a continuous random variable X equals a given value b exactly is 0, i.e.

P (X = b) = P (b ≤ X ≤ b) = ∫_b^b f (x) dx = 0

Therefore the probabilities of a continuous random variable do not change if strict (<, >)
and nonstrict (≤, ≥) inequalities are interchanged. Note: this is not the case for discrete
random variables.

Example 5.4 Let X be the lifetime (in hours) of a lightbulb and the pdf of X be:
f (x) = 0 if x < 0
f (x) = 0.001 e^(−0.001x) if x ≥ 0

(Olofsson and Anderson, page 87)



1. Show that f (x) is a pdf

2. What is the probability that a randomly selected light bulb lasts less than 1000
hours?

3. What is the probability that a randomly selected light bulb lasts less than 100



hours?

4. What is the probability that a randomly selected light bulb lasts between 100 and
1000 hours?

5. Find a number x such that a lightbulb survives the age x with probability 0.5.
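A hedged R check of parts 2 to 5, treating X as Exp(λ = 0.001) and using the built-in exponential functions:

```r
rate <- 0.001

pexp(1000, rate)                     # 2. P(X < 1000) = 1 - e^-1 ~ 0.6321
pexp(100, rate)                      # 3. P(X < 100)  = 1 - e^-0.1 ~ 0.0952
pexp(1000, rate) - pexp(100, rate)   # 4. P(100 < X < 1000) ~ 0.5370
qexp(0.5, rate)                      # 5. median lifetime = ln(2)/0.001 ~ 693.1
```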



5.2 Expected Value and Variance
5.2.1 Expected Value

 The expected value of a continuous random variable X that has a


probability density function f (x) is given by:
E[X] = ∫_{−∞}^{∞} x f (x) dx

Special case:
If X is a non-negative continuous random variable then

E[X] = ∫_0^∞ P (X > x) dx

[SY10, p 202]

5.2.2 Variance

 For a random variable X with probability density function f (x), the variance
of X is given by:

VAR[X] = E[(X − µ)²]
       = ∫_{−∞}^{∞} (x − µ)² f (x) dx
       = E[X²] − µ²

where µ = E[X].

[SY10, p 202]

Observe that the results for the expected value and variance of a continuous random
variable follow from the discrete case, whereby p(x) is replaced by f (x) dx and Σ is
replaced by ∫.

The expected value of X can be thought of as the average of the random variable and is
often called the mean of X and is denoted by µ. The variance is often denoted by σ 2 .



5.2.3 Moments

 Let X be a continuous random variable with probability density function


f (x).

• 1st moment: E[X] = ∫_{−∞}^{∞} x f (x) dx (i.e. the mean of X).

• 2nd moment: E[X²] = ∫_{−∞}^{∞} x² f (x) dx

• kth moment: E[X^k] = ∫_{−∞}^{∞} x^k f (x) dx

5.2.4 Expected Value of a Function

 If X is a continuous random variable with probability density function f (x)
and g(X) is a real-valued function of X, then:

E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx

The theorems relating to the expectation of discrete random variables apply to the
continuous case as well:

• If Y = a + bX, then E[Y ] = a + bE[X] and

• If Y = a + bX, then VAR[Y ] = b2 VAR[X].



Example 5.5 Recall the lightbulb has lifetime X where

f (x) = 0.001 e^(−0.001x), x ≥ 0

and

F (x) = 1 − e^(−0.001x)
The expected lifetime of the lightbulb is:
E[X] = ∫_{−∞}^{∞} x f (x) dx = ∫_0^∞ P (X > x) dx
     = ∫_0^∞ e^(−0.001x) dx
     = [−e^(−0.001x)/0.001]_0^∞
     = 0 − (−1/0.001) = 1000

Example 5.6 Suppose that a random variable X has a probability density function given
by

f (x) = x²/3, −1 < x < 2
f (x) = 0, otherwise
[SY10, ex 5.3, p199]

1. Find the distribution function of X.



2. Sketch the distribution function of X by hand and using R.

[Figure: sketch of the distribution function F (x) over −2 ≤ x ≤ 3.]
In R:

x <- seq(-2, 3, 0.1)


Fx <- numeric(length(x))

x_range1 <- x < -1


x_range2 <- x >= -1 & x <= 2
x_range3 <- x > 2

Fx[x_range1] <- 0
Fx[x_range2] <- (x[x_range2]^3 + 1)/9
Fx[x_range3] <- 1
plot(x, Fx, type="l")



5. Find the probability that −1 < X < 1.

6. Find the probability that 1 < X < 3.

7. Find the probability that X ≤ 1 given X ≤ 1.5.
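Parts 5 to 7 can be computed from the distribution function F(x) = (x³ + 1)/9 on (−1, 2); a sketch in R (the helper Fx is an illustrative name):

```r
# distribution function of X: 0 below -1, (x^3 + 1)/9 on [-1, 2], 1 above 2
Fx <- function(x) ifelse(x < -1, 0, ifelse(x > 2, 1, (x^3 + 1)/9))

Fx(1) - Fx(-1)   # 5. P(-1 < X < 1) = 2/9
Fx(3) - Fx(1)    # 6. P(1 < X < 3) = 1 - 2/9 = 7/9
Fx(1) / Fx(1.5)  # 7. P(X <= 1 | X <= 1.5) = (2/9)/(4.375/9) ~ 0.457
```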



Example 5.7 Recall the battery lifetime X has f (x) = (1/2) e^(−x/2), x > 0. Find E[X]
by integrating E[X] = ∫_{−∞}^{∞} x f (x) dx.

Hint: you will need to use integration by parts.

Integration by parts: ∫ u dv = uv − ∫ v du.
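A sketch of the integration, with u = x and dv = (1/2)e^{−x/2} dx so that v = −e^{−x/2}:

```latex
\begin{aligned}
E[X] &= \int_0^\infty x \cdot \tfrac{1}{2} e^{-x/2}\,dx
      = \Big[-x e^{-x/2}\Big]_0^\infty + \int_0^\infty e^{-x/2}\,dx \\
     &= 0 + \Big[-2 e^{-x/2}\Big]_0^\infty
      = 2
\end{aligned}
```

i.e. the mean battery lifetime is 2 hundred hours.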

Chapter 6

Continuous Distributions

References:
[Ros13, Chapter 5]
[SY10, Chapter 5]

6.1 Uniform Distribution
6.1.1 Definition

 The random variable X has a uniform distribution on the interval (a, b)


if its density function is given by:
f (x) = 1/(b − a), a ≤ x ≤ b
f (x) = 0, otherwise.

The distribution function of X is:

F (x) = 0 for x < a
F (x) = (x − a)/(b − a) for a ≤ x ≤ b
F (x) = 1 for x > b

Mean: E[X] = (a + b)/2
Variance: VAR[X] = (b − a)²/12

[Figure: density function and distribution function of a uniform random variable.]
6.1.2 Uniform Distribution in R
dunif(x, min, max)
punif(x, min, max)
qunif(Fx, min, max)
runif(numSims, min, max)

Example 6.1 Let X ∼ U (0, 2). Find P (X ≤ 0.5).

(0.5-0)/(2-0)

## [1] 0.25

punif(0.5, 0, 2)

## [1] 0.25

6.1.3 Examples
Example 6.2 A farmer living in western Nebraska has an irrigation system to provide
water for crops, primarily corn, on a large farm. Although he has thought about buying
a backup pump, he has not done so. If the pump fails, delivery time X for a new pump
to arrive is uniformly distributed over the interval from 1 to 4 days. The pump fails. It is
a critical time in the growing season in that the yield will be greatly reduced if the crop
is not watered within the next 3 days. [SY10, p. 213, ex 5.7]

1. Assuming that the pump is ordered immediately and that installation time is
negligible, what is the probability that the farmer will suffer major yield loss?

2. Sketch the density function and distribution function of X
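A sketch of the R check for part 1: the farmer suffers major yield loss if delivery takes longer than 3 days, and X ∼ Uniform(1, 4):

```r
# P(X > 3) for delivery time X ~ Uniform(1, 4)
1 - punif(3, 1, 4)   # (4 - 3)/(4 - 1) = 1/3
```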



Example 6.3 Derive the distribution function of the uniform random variable.

6.1.4 Properties of the Uniform Distribution


For a subinterval (c, d) of (a, b):

P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx
             = [x/(b − a)]_c^d
             = (d − c)/(b − a)

Special case: uniform [0, 1] random variable.

f(x) = 1,  0 < x < 1
f(x) = 0,  otherwise
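The subinterval formula can be checked against punif(); the values a = 2, b = 10, c = 3, d = 7 below are arbitrary illustrations:

```r
a <- 2; b <- 10    # interval (a, b)
d1 <- 3; d2 <- 7   # subinterval (c, d)

punif(d2, a, b) - punif(d1, a, b)  # P(c <= X <= d) = 0.5
(d2 - d1)/(b - a)                  # closed form (d - c)/(b - a), also 0.5
```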



6.2 Exponential distribution
6.2.1 Definition

 A random variable X has an exponential distribution, X ∼ Exp(λ), if


and only if it has density function:
f(x) = λe^{−λx},  x ≥ 0, λ > 0
f(x) = 0,         otherwise

The distribution function of X is:

F(x) = 0,                                 x < 0
F(x) = ∫_0^x λe^{−λu} du = 1 − e^{−λx},   x ≥ 0

Mean: E[X] = 1/λ
Standard Deviation: STD(X) = 1/λ
Variance: VAR[X] = 1/λ²

Example 6.4 Applications of Exponential Random Variables

• Waiting time between the arrival of successive customers

• Time to failure of an electronic component.

• Time between successive emissions of particles from a radioactive substances

[Figure: density function f(x) and distribution function F(x) of an exponential distribution]


6.2.2 Exponential Distribution in R
dexp(x, lambda)
pexp(x, lambda)
qexp(Fx, lambda)
rexp(numSims, lambda)

Example 6.5 Consider an exponential random variable X with mean 2. Find P (X > 3).

lambda <- 1/2


1 - (1 - exp(-lambda*3))

## [1] 0.22313

1 - pexp(3, lambda)

## [1] 0.22313

6.2.3 Examples
Example 6.6 A sugar refinery has three processing plants, all of which receive raw sugar
in bulk. The amount of sugar that one plant can process in one day can be modeled as
having an exponential distribution with a mean of 4 tons for each of the three plants.
[SY10, p. 220, ex 5.9]

1. What is the probability that a plant will process more than 4 tons on a given day?

2. If the plants operate independently, find the probability that exactly two of the
three plants will process more than 4 tons on a given day.

3. If the plants operate independently, find the probability that an odd number of
plants will process more than 4 tons on a given day.

4. Which of the following pdfs corresponds to the amount of sugar processed by one
plant?



[Figure: two candidate density functions, dexp(x, 0.25) and dexp(x, 4)]


6.2.4 Properties
Alternative Parameterization
The exponential distribution is sometimes described using the mean rather than the rate.
A random variable X has an exponential distribution with mean θ if it has density
function:

f(x) = (1/θ)e^{−x/θ},  x ≥ 0, θ > 0
f(x) = 0,              otherwise

The distribution function of X is:

F(x) = 0,             x < 0
F(x) = 1 − e^{−x/θ},  x ≥ 0

Mean: E[X] = θ
Standard Deviation: STD(X) = θ
Variance: VAR[X] = θ²

Memoryless Property

 If a random variable X has an exponential distribution, then for all t > 0


and s > 0
P (X > s + t|X > s) = P (X > t)
This property is called the memoryless property.

Example 6.7 Suppose a customer has been waiting in line to be served for x time units
and would like to know the probability that he or she will be required to wait a further t
units of time. The calculation of this probability does not depend on the length of time
already spent waiting, i.e. the distribution does not “remember” how long the customer
has been waiting.

Example 6.8 Show that an exponential random variable with cdf F(x) = 1 − e^{−λx}, x ≥ 0,
has the memoryless property.
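Before proving it algebraically, the property can be illustrated numerically with pexp(); λ = 0.5, s = 2 and t = 3 below are arbitrary illustrative values:

```r
lambda <- 0.5; s <- 2; t <- 3   # illustrative values

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
lhs <- (1 - pexp(s + t, lambda)) / (1 - pexp(s, lambda))
rhs <- 1 - pexp(t, lambda)      # P(X > t)
c(lhs, rhs)                     # both equal exp(-0.5*3) = 0.22313
```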



6.3 Normal distribution
6.3.1 Definition

 A random variable X is Normally distributed, X ∼ N (µ, σ 2 ), if and only


if it has probability density function:
fX(x) = (1/√(2πσ²)) e^{−(1/2)((x−µ)/σ)²},  x ∈ ℝ, σ > 0
Mean: µ

Variance: σ 2

Special case: The standard normal distribution, denoted Z, has a mean of


µ = 0 and a standard deviation of σ = 1,

Z ∼ N (0, 1)

[Figure: density functions f(x) and distribution functions F(x) of Normal(0, 1), Normal(0, 9), Normal(4, 25) and Normal(10, 25)]

Example 6.9 Applications of the Normal distribution

• Experimental errors in scientific measurements

• Test scores on achievement tests

• Heights of individuals selected at random from a population

• Approximation to other distributions



6.3.2 Normal Distribution in R
dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(Fx, mean, sd)
rnorm(numSims, mean, sd)

Example 6.10 Consider a normal random variable X with mean 3 and variance 0.2.
Find P (X < 2.8).

pnorm(2.8, 3, sqrt(0.2))

## [1] 0.32736

6.3.3 Relationship between Normal and Standard Normal


Suppose X ∼ N(µ, σ²) and Z ∼ N(0, 1).
The random variable X can be transformed to a standard Normal random variable as
follows:

Z = (X − µ)/σ

Then

P(X ≤ x) = P(Z ≤ (x − µ)/σ) = P(Z ≤ z)

P(µ − 1σ ≤ X ≤ µ + 1σ) ≈ 0.68      1 standard deviation from mean
P(µ − 2σ ≤ X ≤ µ + 2σ) ≈ 0.95      2 standard deviations from mean
P(µ − 3σ ≤ X ≤ µ + 3σ) ≈ 0.997     3 standard deviations from mean
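Both results can be confirmed in R; the sketch below uses the illustrative values µ = 3, σ = 2, x = 4.5:

```r
mu <- 3; sigma <- 2; x <- 4.5   # illustrative values

# Standardisation: P(X <= x) = P(Z <= (x - mu)/sigma)
pnorm(x, mu, sigma)
pnorm((x - mu)/sigma)           # same value

# Probability within 1, 2, 3 standard deviations of the mean
pnorm(1:3) - pnorm(-(1:3))      # approx 0.683 0.954 0.997
```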



R code to create plot

mu <- 0
sigma <- 1
x <- seq(-4, 4, 0.001)

cols <- c(gray.colors(4, start = 0.1, end = 0.9, gamma = 2.2, alpha = NULL),
gray.colors(4, start = 0.9, end = 0.1, gamma = 2.2, alpha = NULL))
colText <- rep("black", 8)

#Create plot
plot(x, dnorm(x, mu, sigma), type="l", lwd=3, ylab="f(x)", yaxt="n", xaxt="n")

#Add axis labels

axis(side=1, at=-3:3, labels=c(
expression(mu ~-3*sigma), expression(mu ~-2 *sigma), expression(mu~-1 *sigma),
expression(mu),
expression(mu ~+1*sigma), expression(mu ~+2 *sigma), expression(mu ~+3 *sigma)))

#Compute probabilities
Fx <- pnorm(-4:4, mu, sigma)

#For the 8 intervals between 4 standard deviations either side of the mean

for(i in -4:3){
j <- i+1
k <- 5
ijvals <- seq(i, j, 0.001)

#Shade areas
polygon(x = c(i,ijvals , j),
y = c(0, dnorm(ijvals, 0, 1), 0),
col=cols[i+k], border="white")

#Add probability of each area as text


text((i+0.5), 0.05, labels=round(diff(Fx)[i+k],digits=3), col=colText[i+k])
}
[Figure: standard normal density with the area in each 1σ band shaded; the areas from µ − 4σ to µ + 4σ are 0.001, 0.021, 0.136, 0.341, 0.341, 0.136, 0.021, 0.001]


6.3.4 Properties of the Normal Distribution
The normal distribution is a commonly used distribution. It is referred to as being
“bell”-shaped. It is symmetric about the mean µ, with half the probability falling on
either side of µ. The form of the probability density function means that unlike some
other distributions, the probabilities cannot be expressed with a closed-form function.
Numerical integration techniques have been used to evaluate the cumulative distribution
function for the normal distribution. The resulting probabilities for the standard normal
distribution are provided in tables.

The cumulative distribution function is not available in closed form, but can be
evaluated by:

• converting X ∼ N(µ, σ²) to a standard normal random variable Z ∼ N(0, 1) and
using tables, or

• using software such as R. In R, the distribution function can be calculated using
the command: pnorm(x, mu, sigma)

Properties

• Area under the pdf f (x) is equal to 1


• Symmetric: area to left of mean µ = 0.5 = area to right of mean µ
• To compute P (X ≤ x) find the area to the left of x.
• Because X is a continuous random variable, P (X = x) = 0.

6.3.5 Examples
Example 6.11 A machining operation produces steel shafts where diameters have a
normal distribution with a mean of 1.005 inches and a standard deviation of 0.01 inch.
(Adapted from [SY10, p. 351, ex 5.87])

1. What is the probability that a randomly selected steel shaft will be less than 1.005
inches in diameter?



2. What is the probability that a randomly selected steel shaft will be less than 1.02
inches in diameter? a) estimate the probability using the properties of the normal
distribution, then b) use the relevant R output below to determine the probability.

pnorm(seq(0.95, 1.05, 0.01), 1.005, 0.01)

## [1] 0.00000001899 0.00000339767 0.00023262908 0.00620966533


## [5] 0.06680720127 0.30853753873 0.69146246127 0.93319279873
## [9] 0.99379033467 0.99976737092 0.99999660233

dnorm(seq(0.95, 1.05, 0.01), 1.005, 0.01)

## [1] 0.00001077 0.00159837 0.08726827 1.75283005


## [5] 12.95175957 35.20653268 35.20653268 12.95175957
## [9] 1.75283005 0.08726827 0.00159837

pnorm(seq(0.95, 1.05, 0.01), 0.01, 1.005)

## [1] 0.82519 0.82774 0.83027 0.83277 0.83525 0.83771 0.84014


## [8] 0.84255 0.84493 0.84729 0.84962

3. Specifications call for diameters to fall within the interval 1.00 ± 0.02 inches. What
percentage of the output of this operation will fail to meet specifications?



4. What should be the mean diameter of the shafts produced to minimize the fraction
that fail to meet specifications?

5. Numerically verify your answers to Q1 – Q3.



6.4 Gamma distribution
6.4.1 Gamma Function

 The Gamma function is defined as follows:


Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy

The Gamma function can be shown to have the following properties:

(i) Γ(1) = 1

(ii) Γ(α) = (α − 1)Γ(α − 1) for α > 1

(iii) Γ(0.5) = √π

(iv) Γ(n) = (n − 1)! for any positive integer n

A related function is the incomplete Gamma function:

γ(α, y) = ∫_{t=0}^{y} t^{α−1} e^{−t} / Γ(α) dt

In R, the gamma function can be computed as follows


gamma(1)

## [1] 1

gamma(10)

## [1] 362880

#Inspect properties
factorial(9)

## [1] 362880

9*gamma(9)

## [1] 362880



6.4.2 Definition - Gamma Distribution

 A random variable X has a Gamma distribution, X ∼ Gamma(α, λ), with


shape parameter α > 0 and rate parameter λ > 0 if it has density function:

f(x) = x^{α−1} λ^{α} e^{−λx} / Γ(α),  x ≥ 0
f(x) = 0,                             x < 0

Mean: E[X] = α/λ
Standard deviation: STD(X) = √α/λ
Variance: VAR[X] = α/λ²

[Figure: density functions f(x) and distribution functions F(x) of Gamma(0.5, 1), Gamma(1, 1), Gamma(2, 1) and Gamma(4, 1)]

Example 6.12 Applications of the Gamma Distribution

• Component lifetimes (a few fail early, many have an “average” lifetime, and a few
last a very long time)

• Rainfall over a given period of time

• Survival times

• Fish lengths

• Time until failure of the αth independent exponentially distributed component



6.4.3 Gamma Distribution in R
dgamma(x, alpha, lambda)
pgamma(x, alpha, lambda)
qgamma(Fx, alpha, lambda)
rgamma(numSims, alpha, lambda)

Example 6.13 Consider a gamma random variable X with shape parameter 2 and rate
parameter 3. Find P (X < 2.5).

pgamma(2.5, 2, 3)

## [1] 0.9953

6.4.4 Properties of the Gamma Distribution


Distribution Function
The cumulative distribution function is not available in closed form, but can be
evaluated numerically:

F(x; α, λ) = ∫_0^x t^{α−1} λ^{α} e^{−λt} / Γ(α) dt
In R, the distribution function can be calculated using the command:
pgamma(x, shape = alpha, rate = lambda) (default)
or
pgamma(x, shape = alpha, scale = 1/lambda)

Example 6.14 The weekly downtime X (in hours) for a certain industrial machine has
approximately a gamma distribution with α = 3.5 and λ = 2/3. (Adapted from [SY10,
p. 232, ex 5.67])

1. Find the expected value and variance of X

2. What is the probability that the weekly downtime will exceed 5 hours? Use the
relevant R code below to compute your answer.



pgamma(seq(0, 6, 0.5), 3.5, 2/3)

## [1] 0.000000 0.001421 0.012474 0.040160 0.085967 0.147449


## [7] 0.220223 0.299434 0.380644 0.460251 0.535609 0.604982
## [13] 0.667406

pgamma(seq(0, 6, 0.5), 3.5, scale=1.5)

## [1] 0.000000 0.001421 0.012474 0.040160 0.085967 0.147449


## [7] 0.220223 0.299434 0.380644 0.460251 0.535609 0.604982
## [13] 0.667406

dgamma(seq(0, 6, 0.5), 3.5, 2/3)

## [1] 0.0000000 0.0092207 0.0373744 0.0737969 0.1085476


## [6] 0.1358721 0.1535743 0.1617785 0.1618586 0.1556870
## [11] 0.1451714 0.1320073 0.1175722

3. Suppose the loss L (in dollars) to the industrial operation as a result of this
downtime is given by
L = 30X + 2X 2
Find the expected value of L.
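Part 3 can be approached with E[L] = 30E[X] + 2E[X²], where E[X²] = VAR[X] + (E[X])²; the sketch below checks the exact value against simulation:

```r
alpha <- 3.5; lambda <- 2/3

# Exact: E[L] = 30 E[X] + 2 E[X^2], with E[X^2] = VAR[X] + E[X]^2
EX  <- alpha/lambda             # 5.25
EX2 <- alpha/lambda^2 + EX^2    # 7.875 + 27.5625
30*EX + 2*EX2                   # 228.375

# Simulation check
set.seed(1)
x <- rgamma(1e6, alpha, lambda)
mean(30*x + 2*x^2)              # approx 228.4
```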



Relationship to Other Distributions
The Gamma distribution is closely related to a number of other distributions.
• If α = 1, then this gives the exponential distribution with parameter λ.
• If α is a positive integer, then this gives the Erlang distribution.
• If α = n/2 where n is a positive integer and λ = 1/2, then this gives a chi-square
distribution.

Sums of Gamma Random Variables

 The sum of n independent gamma random variables also has a gamma


distribution. If Xi are independent gamma random variables with parameters
αi and λ, and Y = Σ_{i=1}^{n} Xi, then

Y ∼ Gamma(Σ_{i=1}^{n} αi, λ)

E[Y] = (Σ_{i=1}^{n} αi)/λ        VAR[Y] = (Σ_{i=1}^{n} αi)/λ²

Example 6.15 A certain electronic system has a life length of X1 , which has an expo-
nential distribution with a mean of 450 hours. The system is supported by an identical
backup system that has a life length of X2 . The backup system takes over immediately
when the system fails. Assume that the systems operate independently. [SY10, ex 5.11,
p229]

1. Find the probability distribution and expected value for the total life length of the
primary and backup systems.

2. Verify using simulation



require(graphics)

EX = 450
lambda <- 1/EX
alpha <- 2

#Simulate sum of 2 exponential random variables


numSims <- 1e5
y_sim <- replicate(numSims,sum(rexp(2, lambda)))
hist(y_sim, prob=TRUE, xlab="y (total length)", ylim=c(0, 8e-4),
main= bquote(atop("Comparison of Simulated and Theoretical Total Life Length",
"Y="~X[1]+X[2]~","~X[i]~"~Exp(1/450)")))
mean(y_sim)

## [1] 900.86

var(y_sim)

## [1] 408897

#Compare to Gamma(2, lambda)


y <- seq(0, 6000, 0.1)
lines(y, dgamma(y, alpha, lambda), col="red")
legend("topright", legend=bquote("Y~Gamma(2,"~.(round(lambda,4))~")"), lty=1, col="red")

alpha/lambda

## [1] 900

alpha/lambda^2

## [1] 405000

[Figure: histogram of the simulated total life lengths y with the Gamma(2, 0.0022) density overlaid]


Example 6.16 Use simulation to verify that the sum of 4 independent and identically
distributed gamma random variables, Xi ∼ Gamma(3, 0.01) has a gamma distribution
with parameters α = 3 × 4 = 12 and λ = 0.01.

lambda <- 0.01


alpha <- 3
n <- 4

#Simulate sum of n gamma random variables


numSims <- 1e5
y_sim <- replicate(numSims,sum(rgamma(n, alpha, lambda)))
hist(y_sim, prob=TRUE, xlab="y (total length)", ylim=c(0, 1.5e-3),
main= bquote(atop("Simulated and Theoretical Gamma Distribution",
"Y="~sum(X[i], i==1, n)~","~ n==.(n)~","~
X[i]~"~Gamma("~.(alpha)~","~.(round(lambda,4))~")")))

mean(y_sim); var(y_sim)

## [1] 1199.3
## [1] 119983

#Compare to Gamma(alpha*n, lambda)


y <- seq(0, 10000, 0.1)
lines(y, dgamma(y, alpha*n, lambda), col="red")
legend("topright", legend=bquote("Y~Gamma("~.(n*alpha)~","~.(round(lambda,4))~")"),
lty=1, col="red")
n*alpha/lambda; n*alpha/lambda^2

## [1] 1200
## [1] 120000

[Figure: histogram of the simulated sums with the Gamma(12, 0.01) density overlaid]


Example 6.17 Show that variable X ∼ Gamma(1, λ) is equivalent to an exponential
random variable with rate λ > 0.

6.4.5 Alternative Parameterization


The gamma distribution is sometimes described using a scale parameter, which is defined
as the reciprocal of the rate parameter.
A random variable X has a Gamma distribution, X ∼ Gamma(α, β), with α > 0, β > 0
if it has density function:

f(x) = x^{α−1} e^{−x/β} / (β^{α} Γ(α)),  x ≥ 0
f(x) = 0,                                x < 0

Mean: E[X] = αβ
Standard deviation: STD(X) = √α β
Variance: VAR[X] = αβ²

If the gamma distribution is parametrised in this way, then in R the following command
can be used:
pgamma(x, shape = alpha, scale = beta)



Verify with R

alpha <- 2
lambda <- 4
beta <- 1/lambda

#Check the density is equal


x <- seq(0, 5, 0.1)
all(dgamma(x, alpha, lambda) == dgamma(x, alpha, scale=beta))

## [1] TRUE

#Check using simulation


x1 <- rgamma(1e4, alpha, lambda)
x2 <- rgamma(1e4, alpha, scale=beta)
mean(x1); mean(x2);

## [1] 0.50308
## [1] 0.49513

alpha/lambda; alpha*beta

## [1] 0.5
## [1] 0.5

#WRONG! Notice the means are not similar


x3 <- rgamma(1e4, alpha, beta)
mean(x3); alpha*beta

## [1] 7.9115
## [1] 0.5



6.5 Weibull distribution
6.5.1 Definition

 A random variable X has a Weibull distribution, X ∼ Weibull(α, θ) with


shape parameter α > 0 and scale parameter θ > 0, if it has density function:

f(x) = (α/θ)(x/θ)^{α−1} e^{−(x/θ)^α},  x ≥ 0
f(x) = 0,                              x < 0

and a cumulative distribution function:

F(x; α, θ) = 1 − e^{−(x/θ)^α},  x ≥ 0
F(x; α, θ) = 0,                 x < 0

Mean: E[X] = θΓ(1 + 1/α)
Variance: VAR[X] = θ²[Γ(1 + 2/α) − (Γ(1 + 1/α))²]
 

[Figure: density functions f(x) and distribution functions F(x) of Weibull(0.5, 1), Weibull(1, 1), Weibull(2, 1) and Weibull(4, 1)]


Example 6.18 Applications of the Weibull distribution

• Reliability

• Lifetime of a system

History. Waloddi Weibull

Waloddi Weibull (1887 – 1979), Swedish engineer,
scientist and mathematician. This distribution was
named after him following publication of some significant
papers on the subject.
http://www.barringer1.com/weibull_bio.htm

6.5.2 Weibull Distribution in R


dweibull(x, shape = alpha, scale = theta)
pweibull(x, shape = alpha, scale = theta)
qweibull(Fx, shape = alpha, scale = theta)
rweibull(numSims, shape = alpha, scale = theta)

Example 6.19 Consider a Weibull random variable X with shape parameter α = 4 and
scale parameter θ = 3. Find P (X < 2.5).

1 - exp(-(2.5/3)^4)

## [1] 0.38261

pweibull(2.5, 4, 3)

## [1] 0.38261



6.5.3 Examples
Example 6.20 Fatigue life (in hundreds of hours) for a certain type of bearing has
approximately a Weibull distribution with α = 2 and θ = 3. (Adapted from [SY10, p.
266, ex 5.119])

1. Find the probability that a randomly selected bearing of this type will fail in less
than 100 hours.

2. Find the expected value of the fatigue life for these bearings.

3. Verify your answers using simulation.
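One way to sketch the simulation check for parts 1 and 2 (fatigue life is in hundreds of hours, so "less than 100 hours" means X < 1):

```r
alpha <- 2; theta <- 3   # shape and scale
set.seed(1)
x <- rweibull(1e6, shape = alpha, scale = theta)

# 1. P(X < 1) -- failure in less than 100 hours
mean(x < 1)                 # simulated
pweibull(1, alpha, theta)   # exact: 1 - exp(-(1/3)^2) = 0.1052

# 2. E[X] = theta * gamma(1 + 1/alpha)
mean(x)                     # simulated
theta * gamma(1 + 1/alpha)  # exact: 2.6587 (hundreds of hours)
```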



6.5.4 Properties of the Weibull Distribution
• If α = 1, then this gives the exponential distribution with rate parameter 1/θ.

• If Y ∼ Exp(λ) and X = Y^{1/α}, then X ∼ Weibull(α, (1/λ)^{1/α}).

Example 6.21 Suppose that Y ∼ Exp(1/3) and X = Y^{1/4}. Generate 10000 random
variables from the exponential distribution, compute X, and show graphically that the
density of X is consistent with a Weibull distribution with α = 4 and θ = 3^{1/4}, i.e.
X ∼ Weibull(4, 3^{1/4}).

lambda <- 1/3


alpha <- 4
theta <- (1/lambda)^(1/alpha)

#Compare using simulation


y <- rexp(1e4, lambda)
hist(y, prob=TRUE, main=paste("Y~Exp(",round(lambda,3),")",sep=""))

x <- y^(1/alpha)
hist(x, prob=TRUE, main =
bquote("Y~Exp("~.(round(lambda, 4))~"),"~X == Y^{ .(1/alpha)}))

x_seq <- seq(0, 6, 0.01)


lines(x_seq, dweibull(x_seq, alpha, theta))

[Figure: histogram of y with Y ∼ Exp(0.333); histogram of x = y^{0.25} with the Weibull(4, 3^{1/4}) density curve overlaid]


Example 6.22 Consider the following Weibull distributions: X ∼ Weibull(2, 3) and
Y ∼ Weibull(2, 1). Use R to investigate what happens if Y = X/3.

alpha <- 2
theta <- 3
pweibull(seq(0, 10, 2), shape = alpha, scale = theta)

## [1] 0.00000 0.35882 0.83099 0.98168 0.99918 0.99999

pweibull(seq(0, 10, 2)/theta, shape = alpha, scale = 1)

## [1] 0.00000 0.35882 0.83099 0.98168 0.99918 0.99999

curve(pweibull(x, shape = alpha, scale = theta), from = 0, to = 10)


curve(pweibull(x/theta, shape = alpha, scale = 1), from = 0, to = 10)
[Figure: the two curves pweibull(x, shape = alpha, scale = theta) and pweibull(x/theta, shape = alpha, scale = 1), plotted for 0 ≤ x ≤ 10, coincide]

This example demonstrates the behaviour of the scale parameter; in particular, that
F(x; α, θ) = F(x/θ; α, 1).



6.6 Beta distribution
6.6.1 Definition

 A random variable X has a Beta distribution, X ∼ Beta(α, β) with α > 0


and β > 0, if it has density function:

f(x) = Γ(α + β)/(Γ(α)Γ(β)) · x^{α−1}(1 − x)^{β−1},  0 ≤ x ≤ 1
f(x) = 0,                                           elsewhere

Mean: E[X] = α/(α + β)

Variance: VAR[X] = αβ/((α + β)²(α + β + 1))

Special case: The uniform distribution is a special case of the beta distribution
with α = 1 and β = 1.

[Figure: density functions f(x) and distribution functions F(x) of Beta(2, 2), Beta(3, 3), Beta(5, 3) and Beta(1, 1)]

Example 6.23 Applications of the beta distribution

• Proportion of solid mass in sintered Linde copper

• Values which can occur over a finite interval



6.6.2 Beta Distribution in R
dbeta(x,alpha, beta)
pbeta(x,alpha, beta)
qbeta(Fx,alpha, beta)
rbeta(numSims, alpha, beta)

Example 6.24 Consider a Beta random variable X with parameters α = 2 and β = 3.
Find P(X < 0.5).

pbeta(0.5, 2, 3)

## [1] 0.6875

6.6.3 Properties of the Beta distribution


The cumulative distribution function can be found by integrating:

F(x; α, β) = 0,                                                    x < 0
F(x; α, β) = ∫_0^x Γ(α + β)/(Γ(α)Γ(β)) · t^{α−1}(1 − t)^{β−1} dt,  0 ≤ x ≤ 1
F(x; α, β) = 1,                                                    x > 1

As α and β increase, the integration becomes more difficult.

In R, the distribution function can be calculated using the command:


pbeta(x, alpha, beta)

6.6.4 Examples
Example 6.25 A gasoline wholesale distributor uses bulk storage tanks to hold a fixed
supply. The tanks are filled every Monday. Of interest to the wholesaler is the proportion
of the supply sold during the week. Over many weeks, this proportion has been observed
to match fairly well a beta distribution with α = 4 and β = 2. (Adapted from [SY10, p.
256, ex 5.19])

1. Find the expected value of the proportion of the supply sold each week. Verify your
answer using simulation.
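A simulation sketch for part 1; the exact mean is α/(α + β) = 4/6:

```r
a <- 4; b <- 2          # alpha and beta
set.seed(1)
x <- rbeta(1e6, a, b)   # simulated weekly proportions sold

mean(x)                 # approx 0.667
a/(a + b)               # exact: 2/3
```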



2. Is it highly likely that the wholesaler will sell at least 90% of the stock in a given
week?
X ∼ Beta(4, 2)

P(X > x) = 1 − ∫_0^x Γ(α + β)/(Γ(α)Γ(β)) · t^{α−1}(1 − t)^{β−1} dt

P(X > 0.9) = ∫_{0.9}^{1} Γ(4 + 2)/(Γ(4)Γ(2)) · t^{4−1}(1 − t)^{2−1} dt
           = ∫_{0.9}^{1} (120/(6 × 1)) t³(1 − t) dt
           = 20 ∫_{0.9}^{1} (t³ − t⁴) dt
           = 20 [t⁴/4 − t⁵/5]_{0.9}^{1}
           = 20 [(1/4 − 1/5) − (0.9⁴/4 − 0.9⁵/5)]
           = 0.08146

Check computation in R

1-pbeta(0.9, 4, 2)

## [1] 0.08146

Answer: No, it is unlikely that 90% of the stock will be sold, since P (X > 0.9) =
0.08146 is small.



3. What is the probability that no more than 50% of the stock is sold in a given week?
Use the relevant R code below to compute the required probability.

pbeta(seq(0, 1, 0.1), 4, 2)

## [1] 0.00000 0.00046 0.00672 0.03078 0.08704 0.18750 0.33696


## [8] 0.52822 0.73728 0.91854 1.00000

dbeta(seq(0, 1, 0.1), 4, 2)

## [1] 0.000 0.018 0.128 0.378 0.768 1.250 1.728 2.058 2.048
## [10] 1.458 0.000

qbeta(seq(0, 1, 0.1), 4, 2)

## [1] 0.00000 0.41611 0.50981 0.57799 0.63501 0.68619 0.73443


## [8] 0.78197 0.83139 0.88777 1.00000

6.7 Distributions in R: Summary


d = density/mass function
p = cumulative distribution function (cdf)
q = quantile function (also called inverse cdf)
r = random number

Key Continuous Distributions


punif — Uniform distribution
pexp — Exponential distribution
pgamma — Gamma distribution
pnorm — Normal distribution
pweibull — Weibull distribution

pbeta — Beta distribution

Summary of distributions in R and the required parameters:


http://cran.r-project.org/doc/manuals/R-intro.html#Probability-distributions



6.8 Simulating Continuous Distributions
6.8.1 Inverse transformation method

 Let X be a continuous random variable with distribution function F (x). To


simulate a value x from this distribution:

1. Generate a random number u from U ∼ Uniform(0, 1)

2. Set u = F (x)

3. Solve for x

This method is only appropriate when we know the functional form of F (x), as we need
it to be able to solve u = F (x) for x.

Graphical explanation: To sample from X ∼ Exp(1), generate random number


U ∼ Unif(0, 1), read across to F (x) and then down to x. The value of x is the sample
from the exponential distribution.

[Figure: distribution function F(x) of X ∼ Exponential(λ); a value u on the vertical axis maps across to the curve and down to the sampled value x]



Uniform:
For X ∼ Unif(a, b) with F(x) = (x − a)/(b − a), a ≤ x ≤ b:

u = (x − a)/(b − a)
u(b − a) = x − a
x = u(b − a) + a,  0 ≤ u < 1

Exponential:
For X ∼ Exp(λ) with F(x) = 1 − e^{−λx}, x ≥ 0:

u = 1 − e^{−λx}
1 − u = e^{−λx}
ln(1 − u) = −λx
x = −ln(1 − u)/λ,  0 ≤ u < 1

Note: qexp() implements this inverse function (the quantile function).
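The hand-derived inverse can be checked against qexp() directly (λ = 2 is an arbitrary illustrative rate):

```r
lambda <- 2
u <- c(0.1, 0.5, 0.9)

-log(1 - u)/lambda   # inverse transform derived above
qexp(u, lambda)      # R's quantile function: same values
```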


Example 6.26 Let X be a random variable with

F(x) = 0,     x < 0
F(x) = x²/4,  0 ≤ x ≤ 2
F(x) = 1,     x > 2

Let u be a unit random number and set u = F(x). Solve for x and use the result to
determine the value of x if u = 0.51679.
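Solving u = x²/4 on 0 ≤ x ≤ 2 gives x = 2√u; a quick R check of the requested value:

```r
u <- 0.51679
x <- 2*sqrt(u)   # solution of u = x^2/4 on [0, 2]
x                # 1.4378
x^2/4            # recovers u = 0.51679
```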

6.8.2 Rejection Method


For some distributions, such as the normal distribution, the functional form of the cdf is
not known, so the inverse transformation method cannot be used. The rejection method
can be used in this situation, as it requires only the pdf, not the cdf.
We will not cover the rejection method in this course.

Chapter 7

Reliability
An Application of Continuous
Distributions

References:
[HKM, Chapter 10]

7.1 Introduction to Reliability
Recent reliability failures

• Airbags – recall

• Samsung Galaxy Note 7

The probability that an item does not fail during some period of time t is often used as
a measure of reliability.

7.1.1 Reliability Function

 If T denotes the time to failure of a system, then the reliability at time t


is denoted R(t).

R(t) = P (T > t)
The reliability at time t, R(t) is called the reliability function and can be
expressed in terms of the distribution function of T , i.e.

R(t) = 1 − P (T ≤ t) = 1 − F (t)
[HKM, pp. 373 – 374]

Example 7.1 If R(t) = 0.999 then on average only 1 item in every 1000 will fail during
t time units.

7.2 Mean Time to Failure
7.2.1 Definition

 Let T be the time to failure of a system with lifetime distribution F (t),


density f(t), and reliability function R(t) = 1 − F(t), t ≥ 0.
The mean time to failure is

µ = ∫_0^∞ t f(t) dt = ∫_0^∞ R(t) dt

7.2.2 Integration: Recap


Example 7.2 Find ∫ e^{f(x)} dx, where f(x) is a linear function of x.

∫ e^{f(x)} dx = e^{f(x)} / f′(x) + C

(This shortcut is valid only when f′(x) is constant, i.e. when f(x) is linear.)

Example 7.3 Find ∫_0^∞ e^{−λx} dx, λ > 0.

∫_0^∞ e^{−λx} dx = [e^{−λx}/(−λ)]_0^∞
                 = 0 − (1/(−λ)) = 1/λ



Example 7.4 Let T denote the lifetime of a lightbulb in hours, and let T ∼ Exp(0.001).
Find R(t) and the MTTF.
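A numerical sketch of the check: for T ∼ Exp(0.001), R(t) = 1 − F(t) = e^{−0.001t} and MTTF = 1/λ = 1000 hours:

```r
lambda <- 0.001

# Reliability at, say, t = 500 hours
1 - pexp(500, lambda)     # R(500) = exp(-0.5) = 0.6065

# MTTF by simulation (exact value 1/lambda = 1000)
set.seed(1)
mean(rexp(1e6, lambda))   # approx 1000
```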

7.2.3 Mean Time to Failure for Common Distributions


Distribution    F(t)                          R(t) = 1 − F(t)               E(T) = MTTF

Exponential     1 − e^{−λt}                   e^{−λt}                       1/λ

Uniform         (t − a)/(b − a), a ≤ t ≤ b    (b − t)/(b − a), a ≤ t ≤ b    (a + b)/2

Normal          pnorm(t, mu, sigma)           1 - pnorm(t, mu, sigma)       µ

Weibull         1 − e^{−(t/α)^β}              e^{−(t/α)^β}                  αΓ(1 + 1/β)

(Note: in this table the Weibull is parameterised with scale α and shape β.)

Example 7.5 Compute Γ(5).


gamma(5)

## [1] 24

factorial(4)

## [1] 24

4*3*2*1

## [1] 24



Example 7.6 The lifetime of a car tyre (in days) has a Weibull distribution with
distribution function F(t) = 1 − e^{−(t/1000)^{0.5}}. Find R(t) and the MTTF.
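A sketch of the check in R; in R's (shape, scale) parameterisation the tyre lifetime has shape 0.5 and scale 1000:

```r
shape <- 0.5; scale <- 1000   # Weibull parameters of the tyre lifetime

# R(t) at t = 500 days: exp(-(500/1000)^0.5)
1 - pweibull(500, shape, scale)     # 0.4931

# MTTF = scale * gamma(1 + 1/shape) = 1000 * gamma(3) = 2000 days
scale * gamma(1 + 1/shape)

# Simulation check
set.seed(1)
mean(rweibull(1e6, shape, scale))   # approx 2000
```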

7.2.4 Repairable vs Non Repairable Systems


• Non Repairable Systems: Mean time to failure (MTTF)

• Repairable Systems: Mean time between failures (MTBF)



7.3 Modelling Reliability of Systems
Systems consist of a number of components, which may be in series or in parallel.

To assess the reliability of the system, typically the reliability of each component is
measured and then used to compute the reliability of the system.

Image from [HKM, p. 375]

Discussion Name some examples of systems with components in series and in


parallel.



7.3.1 Series Systems

Image from [HKM, p. 375]

Discussion If component 1 fails, will this system fail?

 Let a system consist of n components with reliability functions


R1(t), R2(t), . . . , Rn(t). Assume that components fail independently of one
another. The reliability function of the series system is:

RS(t) = ∏_{i=1}^{n} Ri(t)

[HKM, p. 375]



Example 7.7 The times to failure of three components are exponentially distributed with
means in thousands of hours given by 2, 2.5 and 4 respectively. Assume the components
are arranged in series and fail independently of each other.

1. Draw the system

2. Find the reliability functions of each component, Ri (t), i = 1, 2, 3.

3. Find the reliability function of the system, RS (t).

4. Find the probability that the system lasts more than 1000 hours.

5. Find the mean time to failure for this system.

6. Check your answers using simulation



Check using simulation

set.seed(12354)

# Define parameters
lambda <- c(0.5, 0.4,0.25)
nSims <- 100000

# Generate random variables representing lifetime of each component


sim_components <- sapply(lambda, function(x)rexp(nSims, x))

head(sim_components)

## [,1] [,2] [,3]


## [1,] 2.55811 0.22013 3.6589
## [2,] 7.02441 1.80964 3.2462
## [3,] 0.54515 2.22227 4.8965
## [4,] 4.95680 1.20275 3.4263
## [5,] 8.36425 13.16991 1.7524
## [6,] 1.98118 0.37026 10.7051

colMeans(sim_components)

## [1] 1.9982 2.5144 4.0080

# Compute minimum lifetime for each simulation


sim_systemlife <- apply(sim_components, 1, min)
head(sim_systemlife)

## [1] 0.22013 1.80964 0.54515 1.20275 1.75244 0.37026

# Estimate P(lasts more than 1000 hours)


table(sim_systemlife > 1)/nSims

##
## FALSE TRUE
## 0.68351 0.31649

#Theoretical value
exp(-1.15*1)

## [1] 0.31664

# Estimate MTTF
mean(sim_systemlife)

## [1] 0.87046

#Theoretical value
1/1.15

## [1] 0.86957



7.3.2 Parallel Systems

Image from [HKM, p. 375]

Discussion If component 1 fails, will this system fail?

 Let a system consist of n components with reliability functions


R1(t), R2(t), . . . , Rn(t). Assume that components fail independently of one
another. The reliability function of the parallel system is:

RP(t) = 1 − ∏_{i=1}^{n} (1 − Ri(t))

[HKM, p. 375]

Example 7.8 The times to failure of three components are exponentially distributed with
means in thousands of hours given by 2, 2.5 and 4 respectively. Assume the components
are arranged in parallel and fail independently of each other.

1. Draw the system

2. Find the reliability functions of each component, Ri (t), i = 1, 2, 3.

3. Find the reliability function of the system RP (t).

4. Find the probability that the system lasts more than 1000 hours.

5. Find the mean time to failure for this system.

6. Check your answers using simulation



$$\begin{aligned}
R_1(t) &= e^{-0.5t}\\
R_2(t) &= e^{-0.4t}\\
R_3(t) &= e^{-0.25t}\\
R_P(t) &= 1 - (1 - R_1(t))(1 - R_2(t))(1 - R_3(t))\\
&= 1 - (1 - e^{-0.5t})(1 - e^{-0.4t})(1 - e^{-0.25t})\\
&= 1 - (1 - e^{-0.5t})(1 - e^{-0.25t} - e^{-0.4t} + e^{-0.65t})\\
&= 1 - (1 - e^{-0.25t} - e^{-0.4t} + e^{-0.65t} - e^{-0.5t} + e^{-0.75t} + e^{-0.9t} - e^{-1.15t})\\
&= e^{-0.25t} + e^{-0.4t} - e^{-0.65t} + e^{-0.5t} - e^{-0.75t} - e^{-0.9t} + e^{-1.15t}\\
R_P(1) &= 0.97131
\end{aligned}$$

$$\begin{aligned}
\text{MTTF} &= \int_0^{\infty} R_P(t)\,dt\\
&= \int_0^{\infty} \bigl(e^{-0.25t} + e^{-0.4t} - e^{-0.65t} + e^{-0.5t} - e^{-0.75t} - e^{-0.9t} + e^{-1.15t}\bigr)\,dt\\
&= \left[\frac{e^{-0.25t}}{-0.25} + \frac{e^{-0.4t}}{-0.4} - \frac{e^{-0.65t}}{-0.65} + \frac{e^{-0.5t}}{-0.5} - \frac{e^{-0.75t}}{-0.75} - \frac{e^{-0.9t}}{-0.9} + \frac{e^{-1.15t}}{-1.15}\right]_0^{\infty}\\
&= 0 - \left(\frac{1}{-0.25} + \frac{1}{-0.4} - \frac{1}{-0.65} + \frac{1}{-0.5} - \frac{1}{-0.75} - \frac{1}{-0.9} + \frac{1}{-1.15}\right)\\
&= 5.38666
\end{aligned}$$

$R_P(1) = 0.97131$, $\text{MTTF} = 5.38666$
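These values can also be checked numerically in R with `integrate()`; a sketch (output quoted to the digits computed above):

```r
# System reliability function for Example 7.8
RP <- function(t) {
  exp(-0.25*t) + exp(-0.4*t) + exp(-0.5*t) -
    exp(-0.65*t) - exp(-0.75*t) - exp(-0.9*t) + exp(-1.15*t)
}
RP(1)                  # approx 0.97131
integrate(RP, 0, Inf)  # approx 5.38666 = MTTF
```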



Check using simulation

set.seed(12354)

lambda <- c(0.5, 0.4,0.25)

nSims <- 100000

sim_components <- sapply(lambda, function(x)rexp(nSims, x))

head(sim_components)

## [,1] [,2] [,3]


## [1,] 2.55811 0.22013 3.6589
## [2,] 7.02441 1.80964 3.2462
## [3,] 0.54515 2.22227 4.8965
## [4,] 4.95680 1.20275 3.4263
## [5,] 8.36425 13.16991 1.7524
## [6,] 1.98118 0.37026 10.7051

colMeans(sim_components)

## [1] 1.9982 2.5144 4.0080

sim_systemlife <- apply(sim_components, 1, function(x) max(x[1], x[2], x[3]))


head(sim_systemlife)

## [1] 3.6589 7.0244 4.8965 4.9568 13.1699 10.7051

# Estimate P(lasts more than 1000 hours)


table(sim_systemlife > 1)/nSims

##
## FALSE TRUE
## 0.02896 0.97104

#Theoretical value
1 - (1-exp(-0.5*1))*(1- exp(-0.4*1))*(1- exp(-0.25*1))

## [1] 0.97131

# Estimate MTTF
mean(sim_systemlife)

## [1] 5.4024

#Theoretical value
0 - (1/(-0.25) + 1/(-0.4) - 1/(-0.65) + 1/(-0.5) - 1/(-0.75) - 1/(-0.9) + 1/(-1.15))

## [1] 5.3867



7.3.3 Complex Systems
Systems often comprise subsystems, which may consist of components in series
and in parallel. To determine the reliability of a complex system, compute the reliability
of the subsystems and then combine them.

Image from [HKM, p. 375]

Discussion If component 1 fails, will this system fail? If component 3 fails, will
this system fail?

Example 7.9 The times to failure of three components are exponentially distributed with
means in thousands of hours given by 2, 2.5 and 4 respectively. Assume the components
are arranged as shown in the following diagram and fail independently of each other.

Image from [HKM, p. 377]

1. Find the reliability function for the system, R(t).

2. Find the probability that the system lasts more than 1000 hours.

3. Find the mean time to failure for this system.

4. Check your answers using simulation



$$\begin{aligned}
R_1(t) &= e^{-0.5t}\\
R_2(t) &= e^{-0.4t}\\
R_3(t) &= e^{-0.25t}\\
R_{12}(t) &= 1 - (1 - R_1(t))(1 - R_2(t))\\
&= 1 - (1 - e^{-0.5t})(1 - e^{-0.4t})\\
R(t) &= R_{12}(t)\,R_3(t)\\
&= \bigl(1 - (1 - e^{-0.5t})(1 - e^{-0.4t})\bigr)e^{-0.25t}\\
&= e^{-0.25t} - e^{-0.25t}(1 - e^{-0.5t})(1 - e^{-0.4t})\\
&= e^{-0.25t} - e^{-0.25t}(1 - e^{-0.5t} - e^{-0.4t} + e^{-0.9t})\\
&= e^{-0.75t} + e^{-0.65t} - e^{-1.15t}\\
R(1) &= 0.67778
\end{aligned}$$

$$\begin{aligned}
\text{MTTF} &= \int_0^{\infty} R(t)\,dt = \int_0^{\infty}\bigl(e^{-0.75t} + e^{-0.65t} - e^{-1.15t}\bigr)\,dt\\
&= \left[\frac{e^{-0.75t}}{-0.75} + \frac{e^{-0.65t}}{-0.65} - \frac{e^{-1.15t}}{-1.15}\right]_0^{\infty}\\
&= 0 - \left(\frac{1}{-0.75} + \frac{1}{-0.65} - \frac{1}{-1.15}\right)\\
&= 2.00223
\end{aligned}$$



Check using simulation

set.seed(12354)

lambda <- c(0.5, 0.4,0.25)

nSims <- 100000

sim_components <- sapply(lambda, function(x)rexp(nSims, x))

head(sim_components)

## [,1] [,2] [,3]


## [1,] 2.55811 0.22013 3.6589
## [2,] 7.02441 1.80964 3.2462
## [3,] 0.54515 2.22227 4.8965
## [4,] 4.95680 1.20275 3.4263
## [5,] 8.36425 13.16991 1.7524
## [6,] 1.98118 0.37026 10.7051

colMeans(sim_components)

## [1] 1.9982 2.5144 4.0080

sim_systemlife <- apply(sim_components, 1, function(x) min(max(x[1], x[2]), x[3]))


head(sim_systemlife)

## [1] 2.5581 3.2462 2.2223 3.4263 1.7524 1.9812

# Estimate P(lasts more than 1000 hours)


table(sim_systemlife > 1)/nSims

##
## FALSE TRUE
## 0.32192 0.67808

#Theoretical value
exp(-0.75*1) + exp(-0.65*1) - exp(-1.15*1)

## [1] 0.67778

# Estimate MTTF
mean(sim_systemlife)

## [1] 2.0036

#Theoretical value
- (1/(-0.75) + 1/(-0.65) - 1/(-1.15))

## [1] 2.0022



Example 7.10 Independent components are configured as shown in the following figure.
The times to failure are measured in thousands of hours and have exponential
distributions with means µi.

Image from [HKM, p. 377]

1. Find the reliability function for this system.

2. Find the probability that the system lasts more than 2000 hours.

3. Find the mean time to failure for this system.

4. Check your answers using simulation

Check using simulation
set.seed(12354)

lambda <- c(1/2, 1/2, 1/3, 1/3, 1/4, 1/4)

nSims <- 100000

sim_components <- sapply(lambda, function(x)rexp(nSims, x))

head(sim_components)

## [,1] [,2] [,3] [,4] [,5] [,6]


## [1,] 2.55811 0.17611 2.7442 1.9539 7.88656 18.83996
## [2,] 7.02441 1.44771 2.4347 2.8196 3.04876 5.22867
## [3,] 0.54515 1.77782 3.6723 3.6739 6.99595 4.91924
## [4,] 4.95680 0.96220 2.5697 1.1062 2.84568 4.83711
## [5,] 8.36425 10.53593 1.3143 3.0352 0.89255 4.32901
## [6,] 1.98118 0.29621 8.0288 1.1371 3.82637 0.74345

colMeans(sim_components)

## [1] 1.9982 2.0115 3.0060 3.0118 3.9924 4.0037

sim_systemlife <- apply(sim_components, 1, function(x)


min(max(min(x[1], x[2]), min(x[3], x[4])), x[5], x[6]))

head(sim_systemlife)

## [1] 1.95389 2.43467 3.67234 1.10619 0.89255 0.74345

# Estimate P(lasts more than 2000 hours)


table(sim_systemlife > 2)/nSims

##
## FALSE TRUE
## 0.86557 0.13443

#Theoretical value
(1-(1-exp(-2))*(1-exp(-2*2/3)))*exp(-0.5*2)

## [1] 0.13364

exp(-1.5*2) + exp(-7*2/6) - exp(-13*2/6)

## [1] 0.13364

# Estimate MTTF
mean(sim_systemlife)

## [1] 1.0632

#Theoretical value
-(1/(-1.5)+ 1/(-7/6) - 1/(-13/6))

## [1] 1.0623

Chapter 8

Introduction to Markov Chains

References:
[Win04, Chapter 17] (Available on the STAT600 Blackboard page
under Course Resources or in the Library)

8.1 A Simple Game
8.1.1 Game Introduction
Consider the following game. The player is presented with a 1 × 5 grid. Starting at square
1, the player must reach square 5 as quickly as possible. The player rolls a fair 6-sided die
to determine which move to make, so each roll is independent of the others. The
player may move forward by either 1 or 2, stay in the same place, or move backward by
1, 2 or 3. The player cannot move below square 1, and the game ends when the player
reaches square 5.

1 2 3 4 5
Start End
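Before formalising this as a Markov chain, it can help to simulate a few plays. The sketch below assumes one particular die allocation (1 → forward 2, 2 → forward 1, 3 → stay, 4 → back 1, 5 → back 2, 6 → back 3) and clips every move to the squares 1-5; both choices are ours, not fixed by the game description:

```r
# Simulate one play of the grid game; returns the number of rolls needed
play_game <- function() {
  moves <- c(2, 1, 0, -1, -2, -3)  # assumed die-face -> move allocation
  pos <- 1; rolls <- 0
  while (pos < 5) {
    pos <- pos + moves[sample(6, 1)]
    pos <- min(max(pos, 1), 5)     # cannot go below 1; game ends at 5
    rolls <- rolls + 1
  }
  rolls
}
set.seed(1)
mean(replicate(10000, play_game()))  # estimated mean number of rolls
```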

8.1.2 Game Set-up


Task: Complete the following table, assigning the numbers 1, 2, 3, 4, 5 and 6 to each
row. Choose any allocation you like, but use each number exactly once.

Move Number rolled


Move forward by 2
Move forward by 1
Stay in the same place
Move backward by 1
Move backward by 2
Move backward by 3

8.1.3 Playing the Game
Game time! Roll the die and determine your next move. Complete the table below
after each roll:

Move Number rolled Position


1
2
3
4
5
6
7
8
9
10

Move Number rolled Position


1
2
3
4
5
6
7
8
9
10



8.1.4 Probability of Outcomes
Suppose you are in square i, compute the probability of being in square j after 1 roll of
the die.
Position after 1 roll
1 2 3 4 5
Starting position 1
2
3
4
5

Suppose you are in square 1, what is the probability of being in square 5 (i.e. winning
the game) after:

• 1 roll

• 2 rolls



8.2 Stochastic Processes
8.2.1 Introduction to Stochastic Processes

 A stochastic process is a collection of random variables {X(t), t ∈ T }. For


each t in T , X(t) is a random variable. The value t is often interpreted as time.

8.2.2 Classification of Stochastic Processes


Stochastic processes can be classified in four broad categories.

Time

• If the set T is a countable set, e.g. T = N = {0, 1, 2, . . .} then {X(t), t ∈ T } is


a discrete time stochastic process. Discrete time stochastic processes are often
denoted as {X(n), n ∈ N} or {Xn }.

• If the set T is not countable, e.g. T is a continuum of values, then {X(t), t ∈ T }
is a continuous time stochastic process.

State
The set S of all possible values of X(t) is called the state space. The state space can
be:

• Discrete (either countable, e.g. S = N = {0, 1, 2, . . .}, or finite, e.g. S = {0, 1, 2, . . . , M }).

• Continuous

In this course we consider discrete time, discrete space stochastic processes.


Therefore if Xn = i then the process is said to be in state i at time n. Time n is
often referred to as the nth trial.



8.3 Matrices
8.3.1 Definition

 A matrix is any rectangular array of numbers. [Win04, p 11]

Properties:

• If a matrix A has m rows and n columns it is called an m × n matrix.

• The number in the ith row and jth column of a matrix A is called the ijth
element of A and is written aij .

• Two matrices A and B are equal if and only if aij = bij for all i and j

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

$$B = \begin{pmatrix} 2 & -4 & 5\\ 5 & 1 & 0 \end{pmatrix} \qquad C = \begin{pmatrix} 0.2 & 1.4\\ 5.5 & 1.0 \end{pmatrix}$$

8.3.2 Matrix Operations


• Scalar multiplication

  If $A = \begin{pmatrix} -2 & 4\\ 5 & 1 \end{pmatrix}$, then $3A = \begin{pmatrix} -6 & 12\\ 15 & 3 \end{pmatrix}$

• Addition of two matrices (only if the matrices are the same size)

  If $A = \begin{pmatrix} -2 & 4\\ 5 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2\\ 4 & -2 \end{pmatrix}$, then $A + B = \begin{pmatrix} -1 & 6\\ 9 & -1 \end{pmatrix}$

• Transpose of a matrix

  For any m × n matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$



the transpose of A (written $A^T$) is the n × m matrix

$$A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1}\\ a_{12} & a_{22} & \cdots & a_{m2}\\ \vdots & & & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$$

Example:

$$A = \begin{pmatrix} -2 & 4\\ 5 & 1\\ 3 & 0 \end{pmatrix} \qquad A^T = \begin{pmatrix} -2 & 5 & 3\\ 4 & 1 & 0 \end{pmatrix}$$
• Matrix Multiplication
Two matrices A and B can be multiplied together if:
Number of columns in A = Number of Rows in B

Suppose that m, n and r are positive integers, and suppose A is an m × r matrix and B is
an r × n matrix. Then the product C = AB is an m × n matrix.

$$\begin{pmatrix} a_{11} & \cdots & a_{1r}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mr} \end{pmatrix}
\begin{pmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{r1} & \cdots & b_{rn} \end{pmatrix}
= \begin{pmatrix} c_{11} & \cdots & c_{1n}\\ \vdots & & \vdots\\ c_{m1} & \cdots & c_{mn} \end{pmatrix}$$

The ijth element of C = AB is the scalar product of row i of A and column j of B:

$$C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \ldots + A_{ir}B_{rj}$$


Example 8.1 Let

$$A = \begin{pmatrix} 1 & 1 & 2\\ 2 & 1 & 3 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 1\\ 2 & 3\\ 1 & 2 \end{pmatrix}$$

Show that the matrix product C = AB is:

$$C = AB = \begin{pmatrix} 5 & 8\\ 7 & 11 \end{pmatrix}$$



Example 8.2 Compute the matrix product BA.

Notice that AB ≠ BA.

# Define matrix A
(A <- matrix(c(1, 1, 2,
2, 1, 3), nrow=2, byrow=TRUE))

## [,1] [,2] [,3]


## [1,] 1 1 2
## [2,] 2 1 3

# Define matrix B
(B <- matrix(c(1, 1,
2, 3,
1, 2), nrow=3, byrow=TRUE))

## [,1] [,2]
## [1,] 1 1
## [2,] 2 3
## [3,] 1 2

Multiply matrices

A %*% B

## [,1] [,2]
## [1,] 5 8
## [2,] 7 11

B %*% A

## [,1] [,2] [,3]


## [1,] 3 2 5
## [2,] 8 5 13
## [3,] 5 3 8



Example 8.3 Let

$$A = \begin{pmatrix} 2 & 4\\ 3 & 1 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 2\\ 2 & 3 \end{pmatrix}$$

i Compute the matrix product AB

ii Compute the matrix product BA

# Define matrix A
(A <- matrix(c(2, 4,
3, 1), nrow=2, byrow=TRUE))

## [,1] [,2]
## [1,] 2 4
## [2,] 3 1

# Define matrix B
(B <- matrix(c(1, 2,
2, 3), nrow=2, byrow=TRUE))

## [,1] [,2]
## [1,] 1 2
## [2,] 2 3

# Multiply matrices
A %*% B

## [,1] [,2]
## [1,] 10 16
## [2,] 5 9

B %*% A

## [,1] [,2]
## [1,] 8 6
## [2,] 13 11



8.3.3 Matrix Inverse
Example: Consider the equation

$$4x = 3$$

To solve for x, we multiply both sides by $4^{-1} = \frac{1}{4}$:

$$(4^{-1})(4x) = (4^{-1})\,3$$
$$x = \frac{3}{4}$$
This works because 4−1 is the multiplicative inverse of 4.

Example: Consider a square linear system of equations

Ax = b

To solve the system, we need A−1 , the inverse matrix of A

A−1 Ax = A−1 b
x = A−1 b

 A square matrix is any matrix that has an equal number of rows and
columns.
The diagonal elements of a square matrix are those elements aij such that
i = j.
A square matrix for which all diagonal elements are equal to 1 and all
nondiagonal elements are equal to 0 is called an identity matrix. [Win04,
p36]

Example:

$$I_2 = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \quad
I_3 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \quad
I_5 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

• If A is an m × m matrix, then

Im A = AIm = A



Example: Consider a matrix

$$A = \begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix}$$

Notice that:

$$I_2 A = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix}
= \begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix} = A I_2 = A$$
In R:
A = matrix(c(2, 5,
1, 3), nrow = 2, byrow = TRUE)
A %*% diag(2)

## [,1] [,2]
## [1,] 2 5
## [2,] 1 3

 For a given m × m matrix A, the m × m matrix B is the inverse of A if

BA = AB = Im

[Win04, p37]

Note: Some square matrices do not have inverses.

If there is a matrix B that satisfies BA = AB = $I_m$, then we say B = $A^{-1}$ and call $A^{-1}$
the inverse of A.

Example: Consider the matrices A and B. Verify that B = $A^{-1}$.

$$A = \begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & -5\\ -1 & 2 \end{pmatrix}$$

If B = $A^{-1}$, then AB = $I_2$:

$$AB = \begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix}\begin{pmatrix} 3 & -5\\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}$$

Methods for finding an inverse:

• Gauss-Jordan method (beyond the scope of this course - see [Win04, p39] for an
example)

• Using R

Example: Consider the matrix $A = \begin{pmatrix} 2 & 5\\ 1 & 3 \end{pmatrix}$. Use R to find $A^{-1}$.
A = matrix(c(2, 5, 1, 3), nrow = 2, byrow = TRUE)
solve(A)

## [,1] [,2]
## [1,] 3 -5
## [2,] -1 2



Check AA−1 = I2 and A−1 A = I2

A %*% solve(A)

## [,1] [,2]
## [1,] 1 0
## [2,] 0 1

solve(A) %*% A

## [,1] [,2]
## [1,] 1 0
## [2,] 0 1



8.4 Markov Chains
8.4.1 Markov Property

 A stochastic process {Xn }, n ≥ 0 is said to have the Markov property if


for n = 0, 1, 2, . . .

P (Xn+1 = j|Xn = i, Xn−1 = in−1 , . . . , X0 = i0 ) = P (Xn+1 = j|Xn = i)

The future of the process {Xn+1 = j} depends only on the present {Xn = i}, not on the past
{Xn−1 = in−1 , . . . , X0 = i0 }.
That is, given the present state of the process, the future state is independent of the past
states.

History.

Andrey Markov (1856–1922) was a Russian mathematician, born in 1856 in Russia. He studied
mathematics at St Petersburg University and became well-known for his work in the area of
stochastic processes. He came from a family of mathematicians, with his brother and son also
making notable contributions to the field.
(Image caption: Andrey Markov, Russian mathematician, 1856–1922.
https://en.wikipedia.org/wiki/Andrey_Markov)

8.4.2 Markov Chains: Definition

 Consider a discrete time, discrete space stochastic process {Xn }. Let Pi,j
denote the probability that the process will move to state j given that it is
currently in state i, such that:

Pi,j = P (Xn+1 = j|Xn = i, Xn−1 = in−1 , . . . , X0 = i0 )


= P (Xn+1 = j|Xn = i)

Then, the stochastic process {Xn } is said to be a Markov Chain.



8.4.3 Applications of Markov Chains
Markov chains have been applied to a wide variety of problems:

• Stock prices

• Inventory levels

• Google’s PageRank algorithm

• Population growth

• Queues

• Speech recognition

• Bioinformatics

For further applications see https://en.wikipedia.org/wiki/Markov_chain#Applications.

8.4.4 Transition Probabilities

 The probability $P_{i,j}$ is said to be the one-step transition probability from
state i to state j.

$$P_{i,j} \ge 0 \quad\text{and}\quad \sum_j P_{i,j} = 1$$

For all states i, j ∈ S, the transition probabilities can be displayed in the (one-step)
transition matrix, P.
$$P = \begin{pmatrix}
p_{0,0} & p_{0,1} & p_{0,2} & \cdots & p_{0,j} & \cdots\\
p_{1,0} & p_{1,1} & p_{1,2} & \cdots & p_{1,j} & \cdots\\
\vdots & \vdots & \vdots & & \vdots & \\
p_{i,0} & p_{i,1} & p_{i,2} & \cdots & p_{i,j} & \cdots\\
\vdots & \vdots & \vdots & & \vdots &
\end{pmatrix}$$



Example 8.4 Ball Throwing Three children A, B and C are throwing a ball to each
other. A always throws the ball to B and B always to C; but C is just as likely to throw
the ball to B as to A.

Let Xn denote the nth person to throw the ball. The state space for this problem is:
S = {A, B, C}.

This is a Markov chain since the person throwing the ball is not directly influenced by
those who previously had the ball.

Exercise: State the transition matrix for this problem.



Example 8.5 Signal Processing: Suppose a binary signal has to pass through a number
of stages where it may be unchanged at each stage with a probability p (and therefore
changed with a probability of 1 − p).

Represent this system in the table below:


Next State
0 1
Current State 0
1

Transition Matrix
The probabilities can also be represented as a matrix called the transition matrix P:
$$P = \begin{pmatrix} p & 1-p\\ 1-p & p \end{pmatrix}$$
The probability $P_{ij}$ is the probability of moving from state i to state j. For example:

• the probability of being in state 0 at the next stage, given that the signal is currently
in state 0 is P00 = p

• the probability of being in state 1 at the next stage, given that the signal is currently
in state 0 is P01 = 1 − p

At each stage, the signal has to either stay the same, or change, so the rows of the
transition matrix must sum to 1, i.e. P00 + P01 = 1.
Let Xn denote the state of the system at stage n = 0, 1, 2, . . ..

Starting in state 0 means X0 = 0.


Determine the following probabilities:

1. P (X1 = 0|X0 = 0) = p

2. P (X1 = 1|X0 = 0) = 1 − p

3. P (X1 = 0|X0 = 1) = 1 − p

4. P (X1 = 1|X0 = 1) = p



Two-Step Transition Probabilities
Starting from state 0, what is the probability of being in each state after two stages?
State at stage 0   State at stage 1   Probability   State at stage 2   Probability   Joint Probability
       0                  0                                0
       0                  0                                1
       0                  1                                0
       0                  1                                1

• P (X2 = 0|X0 = 0) =

• P (X2 = 1|X0 = 0) =

State at stage n = 2
0 1
State at stage n = 0 0
1

The probability of moving from state i to state j after 2 stages can be represented in a
matrix:

$$P^2 = \begin{pmatrix} p & 1-p\\ 1-p & p \end{pmatrix}\begin{pmatrix} p & 1-p\\ 1-p & p \end{pmatrix}
= \begin{pmatrix} p^2 + (1-p)^2 & 2p(1-p)\\ 2p(1-p) & p^2 + (1-p)^2 \end{pmatrix}$$

Two-Step Transition Probabilities - Example


Suppose p = 0.2. Then

$$P = \begin{pmatrix} 0.2 & 0.8\\ 0.8 & 0.2 \end{pmatrix}$$

The probability of being in state 0 after two stages, given that it starts in state 0, is
$P^2_{00} = 0.2^2 + 0.8^2 = 0.68$.

The other probabilities are shown in the 2-step transition matrix:

$$P^2 = \begin{pmatrix} 0.68 & 0.32\\ 0.32 & 0.68 \end{pmatrix}$$



8.4.5 n-Step Transition Probabilities

 The n-step transition probability $P^n_{i,j}$ of the Markov chain is the
probability that the chain will be in state j after n transitions, given that it
is currently in state i:

$$P^n_{i,j} = P(X_{n+m} = j \mid X_m = i), \qquad n \ge 0,\; i, j \ge 0$$

The probability of moving from state i to state j after n transitions is given by the
n-step transition matrix Pn , where Pn is the transition matrix P multiplied by itself
n times.



8.4.6 Chapman-Kolmogorov Equations

 The Chapman-Kolmogorov equations can be used to obtain the n-step
transition probabilities:

$$P^{n+m}_{i,j} = \sum_{k=0}^{\infty} P^n_{i,k} P^m_{k,j}$$

These equations are derived as follows:

$$\begin{aligned}
P^{n+m}_{i,j} &= P(X_{n+m} = j \mid X_0 = i)\\
&= \sum_{k=0}^{\infty} P(X_{n+m} = j, X_n = k \mid X_0 = i)\\
&= \sum_{k=0}^{\infty} P(X_{n+m} = j \mid X_n = k, X_0 = i)\,P(X_n = k \mid X_0 = i)\\
&= \sum_{k=0}^{\infty} P^n_{i,k} P^m_{k,j}
\end{aligned}$$

Note: this uses the result that for events A, B, and C,

P (A, B|C) = P (A|B, C)P (B|C)

Let $P^{(n)}$ denote the matrix of n-step transition probabilities $P^n_{i,j}$. The n-step transition
probability matrix $P^{(n)}$ can be obtained by multiplying the matrix P by itself n times. Since

$$P^{(n+m)} = P^{(n)} \cdot P^{(m)}$$

where · represents matrix multiplication,

$$P^{(2)} = P^{(1+1)} = P \cdot P = P^2$$

Therefore, by induction it can be shown that:

$$P^{(n)} = P^n$$
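A quick numerical check of the Chapman-Kolmogorov relation in R, using the signal-processing chain with p = 0.2 (our choice of example values):

```r
P <- matrix(c(0.2, 0.8,
              0.8, 0.2), nrow = 2, byrow = TRUE)
P2 <- P %*% P
P3 <- P %*% P %*% P
# Entry (1,1) of P^(2+1) as the Chapman-Kolmogorov sum over intermediate states k
sum(P2[1, ] * P[, 1])  # 0.392
P3[1, 1]               # 0.392, as expected
```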



Example 8.6 Suppose that whether it rains today depends on the previous weather
conditions from the last two days. Specifically, suppose that:

• if it has rained for the past two days, then it will rain tomorrow with probability
0.7

• if it rained today but not yesterday, then it will rain tomorrow with probability 0.5

• if it rained yesterday but not today, then it will rain tomorrow with probability 0.4

• if it has not rained in the past two days, then it will rain tomorrow with probability
0.2

[Ros02, example 4.1b]

Let the state at time n be determined by the weather conditions during that day and the
previous day. For instance the process is in:

• state 0 if it rained both today and yesterday

• state 1 if it rained today but not yesterday

• state 2 if it rained yesterday but not today

• state 3 if it rained neither today nor yesterday

This process is a four-state Markov chain.

a. State the transition probability matrix for this Markov chain.



b. Suppose that it rained on Monday and Tuesday. What is the probability that it will
rain on Thursday?

The transition matrix is:

$$P = \begin{pmatrix} 0.7 & 0 & 0.3 & 0\\ 0.5 & 0 & 0.5 & 0\\ 0 & 0.4 & 0 & 0.6\\ 0 & 0.2 & 0 & 0.8 \end{pmatrix}$$

Therefore the two-step transition matrix is:

$$P^2 = \begin{pmatrix} 0.49 & 0.12 & 0.21 & 0.18\\ 0.35 & 0.20 & 0.15 & 0.30\\ 0.20 & 0.12 & 0.20 & 0.48\\ 0.10 & 0.16 & 0.10 & 0.64 \end{pmatrix}$$



On Tuesday the process is in state 0. If it rains on Thursday, the process will be in
state 0 or 1 on Thursday. Therefore, given that the process is in state 0 on Tuesday, the probability
that it is raining on Thursday is given by:

$$P(X_2 = 0 \mid X_0 = 0) + P(X_2 = 1 \mid X_0 = 0) = P^2_{0,0} + P^2_{0,1} = 0.49 + 0.12 = 0.61$$
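The same answer can be obtained in R (a check, mirroring the matrix calculation for this example):

```r
# Four-state weather chain; state 0 corresponds to row/column 1 in R
P <- matrix(c(0.7, 0,   0.3, 0,
              0.5, 0,   0.5, 0,
              0,   0.4, 0,   0.6,
              0,   0.2, 0,   0.8), nrow = 4, byrow = TRUE)
P2 <- P %*% P
P2[1, 1] + P2[1, 2]  # probability of rain on Thursday: 0.61
```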

c. Suppose that it rained on Monday and Tuesday. What is the probability that it will
rain on Friday? (See week 8 lab for further details)



The three-step transition matrix is:

$$P^3 = \begin{pmatrix} 403/1000 & 3/25 & 207/1000 & 27/100\\ 69/200 & 3/25 & 41/200 & 33/100\\ 1/5 & 22/125 & 3/25 & 63/125\\ 3/20 & 21/125 & 11/100 & 143/250 \end{pmatrix}
= \begin{pmatrix} 0.403 & 0.120 & 0.207 & 0.270\\ 0.345 & 0.120 & 0.205 & 0.330\\ 0.200 & 0.176 & 0.120 & 0.504\\ 0.150 & 0.168 & 0.110 & 0.572 \end{pmatrix}$$

Therefore the probability it will rain on Friday, given it has rained on Monday and
Tuesday, is:

$$P^3_{0,0} + P^3_{0,1} = 0.403 + 0.120 = 0.523$$
Chapter 9

Classification of Markov Chains

References:
[Win04, Chapter 17]

9.1 State Transition Diagram
The transition probability matrix of a Markov chain can be represented graphically using
a directed graph.

• Nodes: Each state is represented by a node

• Arcs: The transition probability Pi,j is represented by the arc (i, j).

Example: Ball Throwing


Example 9.1 Recall the ball throwing example from last week:
$$P = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1/2 & 1/2 & 0 \end{pmatrix}$$

Represent this Markov chain using a state transition diagram.

[State transition diagram: A → B with probability 1; B → C with probability 1;
C → A with probability 0.5; C → B with probability 0.5]

9.2 Computing Probabilities for Markov Chains
Recall that:
P (A ∩ B) = P (A)P (B|A)
and

P (A ∩ B|C) = P (A|C)P (B|A ∩ C)


Let {Xn } be a Markov Chain with states {s0 , s1 , s2 , . . . , sN }.

Therefore if we define events A, B, C, such that:


• A = {X1 = s1 }
• B = {X2 = s2 }
• C = {X0 = s0 }
Then

$$\begin{aligned}
P(X_2 = s_2, X_1 = s_1 \mid X_0 = s_0) &= P(A \cap B \mid C)\\
&= P(A \mid C)\,P(B \mid A \cap C)\\
&= P(X_1 = s_1 \mid X_0 = s_0)\,P(X_2 = s_2 \mid X_1 = s_1, X_0 = s_0)\\
&= P(X_1 = s_1 \mid X_0 = s_0)\,P(X_2 = s_2 \mid X_1 = s_1) && \text{(by the Markov property)}\\
&= P_{s_0,s_1} P_{s_1,s_2}
\end{aligned}$$

It can be shown by induction that this result holds for more than n = 2 transitions.

9.2.1 Initial Distribution

 The probability distribution $P^{(0)}_j = P(X_0 = j)$, $j \in S$, is called the
initial distribution of the chain.

 For any Markov chain:

$$P(X_n = s_n, \ldots, X_2 = s_2, X_1 = s_1, X_0 = s_0) = P^{(0)}_{s_0}\,P_{s_0,s_1}\,P_{s_1,s_2} \cdots P_{s_{n-1},s_n}$$
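As an illustration of this product formula in R, using the signal chain (p = 0.2) and a hypothetical initial distribution (0.5, 0.5) of our own choosing:

```r
P  <- matrix(c(0.2, 0.8,
               0.8, 0.2), nrow = 2, byrow = TRUE)
p0 <- c(0.5, 0.5)          # assumed initial distribution over states 0 and 1
# P(X0 = 0, X1 = 0, X2 = 1); note states 0, 1 are rows/columns 1, 2 in R
p0[1] * P[1, 1] * P[1, 2]  # 0.5 * 0.2 * 0.8 = 0.08
```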



Example 9.2 Suppose the entire cola industry produces only two colas. Given that a
person last purchased cola 1, there is a 90% chance that her next purchase will be cola 1.
Given that a person last purchased cola 2, there is an 80% chance that her next purchase
will be cola 2. [Win04, p 929]

Questions of interest:

a. If a person is currently a cola 2 purchaser, what is the probability that she will purchase
cola 1 two purchases from now?

b. If a person is currently a cola 1 purchaser, what is the probability that she will purchase
cola 1 three purchases from now?

Modelling with a Markov Chain:


We have a two-state Markov chain, S = {1, 2}, in which:

State 1 if person last purchased cola 1


State 2 if person last purchased cola 2

• Suppose a person last purchased cola i, then X0 = i.

• If a person next purchases cola j, then X1 = j.

• If a person purchases cola j in n purchases from now, then Xn = j.

The sequence of random variables X0 , X1 , X2 , . . . , can be described as a Markov Chain


with transition matrix:

State transition diagram:



Plotting a transition diagram/graph in R.

# Define transition matrix


P <- matrix(c(0.9, 0.1,
0.2, 0.8), nrow=2, ncol=2, byrow=TRUE)

library(shape)
library(diagram)

# Default plot
plotmat(t(P))
# Customise plot
plotmat(t(P), curve=0.3, pos=c(2), box.size=0.1,
self.shifty = c(0.1, 0.1),
self.shiftx = c(-0.1, 0.1),
self.lwd = 2,
self.arrpos = c(1, 1),
shadow.size = 0,
cex = 1,
box.cex = 1.5)



Questions of interest:

a. If a person is currently a cola 2 purchaser, what is the probability that she will purchase
cola 1 two purchases from now?

b. If a person is currently a cola 1 purchaser, what is the probability that she will
purchase cola 1 three purchases from now?

$$P^3 = P \cdot P^2 = \begin{pmatrix} 0.9 & 0.1\\ 0.2 & 0.8 \end{pmatrix}\begin{pmatrix} 0.83 & 0.17\\ 0.34 & 0.66 \end{pmatrix} = \begin{pmatrix} 0.781 & 0.219\\ 0.438 & 0.562 \end{pmatrix}$$

We seek $P^3_{1,1} = 0.781$.



R Demo:

# Define matrix P
P <- matrix(data=c(0.9, 0.1,
0.2, 0.8),
nrow=2, ncol=2, byrow=TRUE)

# Compute the 2-step transition matrix


P2 <- P %*% P

# Compute the 3-step transition matrix


P3 <- P2 %*% P

P; P2; P3

## [,1] [,2]
## [1,] 0.9 0.1
## [2,] 0.2 0.8
## [,1] [,2]
## [1,] 0.83 0.17
## [2,] 0.34 0.66
## [,1] [,2]
## [1,] 0.781 0.219
## [2,] 0.438 0.562

# Compute the n-step transition matrix


n=50
pi_est=P
for(i in 2:n){
pi_est = pi_est%*%P
if(i %% 5 ==0 & i < 25){
print(i)
print(pi_est)
}
}

## [1] 5
## [,1] [,2]
## [1,] 0.72269 0.27731
## [2,] 0.55462 0.44538
## [1] 10
## [,1] [,2]
## [1,] 0.67608 0.32392
## [2,] 0.64783 0.35217
## [1] 15
## [,1] [,2]
## [1,] 0.66825 0.33175
## [2,] 0.66350 0.33650
## [1] 20
## [,1] [,2]
## [1,] 0.66693 0.33307
## [2,] 0.66613 0.33387

pi_est



## [,1] [,2]
## [1,] 0.66667 0.33333
## [2,] 0.66667 0.33333

library(expm)
#to use matrix power %^%
p50= P %^% 50
p50

## [,1] [,2]
## [1,] 0.66667 0.33333
## [2,] 0.66667 0.33333



9.3 Classification of States
9.3.1 Accessibility, Communication and Irreducibility

 Given two states i and j, a path from i to j is a sequence of transitions that
begins in i and ends in j, such that each transition in the sequence has a positive
probability of occurring.

State j is said to be accessible or reachable from state i if $P^n_{i,j} > 0$ for some
n ≥ 0. Note that state i is accessible from itself since $P^0_{i,i} = P(X_0 = i \mid X_0 = i) = 1$.

If state j is accessible from i and state i is accessible from j, then states i and
j communicate. Symbolically, communication of states i and j is represented:

i↔j

The communication relation satisfies three properties:

1. i ↔ j

2. if i ↔ j then j ↔ i

3. if i ↔ j and j ↔ k, then i ↔ k

The state space can be divided up into classes depending on whether or not states
communicate.

 Two states which communicate are said to be in the same class.

• As a result of the properties above, any two classes of states are either
identical or disjoint.

• If a Markov chain has only one class, then it is said to be irreducible.

• A class is a closed set; once the chain enters a closed set, it can never
leave.



Example 9.3 – [Ros02, Example 4.3a]
Consider a Markov chain with three states and having transition probability matrix:

$$P = \begin{pmatrix} 1/2 & 1/2 & 0\\ 1/2 & 1/4 & 1/4\\ 0 & 1/3 & 2/3 \end{pmatrix}$$

• Is state 3 accessible from state 1?

Yes. Even though $P_{1,3} = 0$, we can access state 3 by first going via state 2. Since
$P^n_{1,3} > 0$ (for some n), state 3 is accessible from state 1. In this case we can see
that $P^2_{1,3} > 0$.

• Do all states communicate?

Yes. All states i ∈ S are accessible from all other states since $P^n_{i,j} > 0$ for some
n ≥ 0. Therefore all states i ∈ S communicate.

• How many classes does this Markov chain have?


All states communicate, so there is only one class {1, 2, 3}.

• Is this Markov Chain irreducible?


Since all states communicate, the Markov chain has only one class and is thus
irreducible.

Example 9.4 – [Ros02, Example 4.3b]


Consider a Markov chain with four states and having transition probability matrix:
1 1 
2 2
0 0
 1 1 0 0
P = 2 2 
1 1 1 1
4 4 4 4
0 0 0 1

• Do all states communicate?


No. For example, it is not possible to get from state 4 to any other state, since
P4,4 = 1.

• How many classes does this Markov chain have?


Three. The classes are {1, 2}, {3}, {4}.
While state 1 is accessible from state 3, the reverse is not true so state 1 and 3 do
not communicate. This means they must be in separate classes.

• Is this Markov Chain irreducible?


No, it has more than one class.
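These classifications can be checked with the CRAN `markovchain` package, if it is installed (the package and its function names are an assumption of this sketch, not part of the notes):

```r
library(markovchain)
P <- matrix(c(1/2, 1/2, 0,   0,
              1/2, 1/2, 0,   0,
              1/4, 1/4, 1/4, 1/4,
              0,   0,   0,   1), nrow = 4, byrow = TRUE)
mc <- new("markovchain", states = c("1", "2", "3", "4"), transitionMatrix = P)
communicatingClasses(mc)  # expect {1,2}, {3}, {4}
is.irreducible(mc)        # expect FALSE
```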



9.3.2 Absorbing, Transient, Recurrent States

 If Pi,i = 1, then state i is an absorbing state.

 A state i is a transient state if there exists a state j that is reachable from


i, but the state i is not reachable from state j.

A state that is not transient is called a recurrent state.

Example 9.5 Consider a Markov chain with four states and having transition probability
matrix:

$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0\\ 1/4 & 1/4 & 1/2 & 0\\ 0 & 1/4 & 1/4 & 1/2\\ 0 & 0 & 0 & 1 \end{pmatrix}$$

• Does this Markov chain have any absorbing states?


Yes, state 4 is an absorbing state since P4,4 = 1

• Is state 4 accessible from state 1?

Yes, there is a path from 1 to 4, e.g. 1 → 2 → 3 → 4. Therefore $P^n_{1,4} > 0$ for some n.

• Is state 1 accessible from state 4?

No, there is no path from 4 to 1. There is no n such that $P^n_{4,1} > 0$.

• Which states communicate?


State 1 communicates with states 1, 2 and 3.
State 2 communicates with states 1, 2 and 3.
State 3 communicates with states 1, 2 and 3.
State 4 communicates with itself only.

• How many classes does this Markov chain have?


Two. The classes are {1, 2, 3}, {4}.

• Which states are transient and which are recurrent?

– State 1 is transient because state 4 is reachable from state 1, but state 1 is not
reachable from state 4.
– State 2 and 3 are also transient for the same reason as state 1.
– State 4 is not transient because there is no other state j such that j is reachable
from 4 and 4 is not reachable from j, i.e. there is no j such that $P^n_{4,j} > 0$ and
$P^n_{j,4} = 0$. Therefore, state 4 is recurrent.



Example 9.6 Gambler’s Ruin: A gambler plays a series of independent games. At
each game he either wins $1 with probability p or loses $1 with probability 1 − p. The
gambler stops playing either if he goes broke or as soon as he amasses $N , whichever
happens first.

Let Xn denotes his total stake (winnings plus initial capital) after n games. {Xn } is a
Markov chain with state space S = {0, 1, ..., N }.

Transition probabilities:

$$\begin{aligned}
p_{i,i+1} &= p && \text{for } i = 1, \ldots, N-1\\
p_{i,i-1} &= 1 - p && \text{for } i = 1, \ldots, N-1\\
p_{0,0} &= p_{N,N} = 1\\
p_{i,j} &= 0 && \text{for all other } i \text{ and } j
\end{aligned}$$

Suppose N = 4. State the transition matrix for this process.

In this example, state 0 and 4 are absorbing states and are also recurrent states. States
1, 2, 3 are transient states.

Typical questions of interest:

1. If I start with $1 what is the probability that I will have $4 in, say, 6 steps?

2. If I start with $2 what is the probability I will lose it all?

3. On the average how long will it take for me to be ruined or to win $N?
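Questions like these can be explored numerically once the transition matrix is written down. A sketch for general N and p (the function name `ruin_matrix` is ours):

```r
# Build the (N+1) x (N+1) gambler's ruin transition matrix over states 0..N
ruin_matrix <- function(N, p) {
  P <- matrix(0, N + 1, N + 1)
  P[1, 1] <- 1          # state 0 (broke) is absorbing
  P[N + 1, N + 1] <- 1  # state N (target) is absorbing
  for (i in 2:N) {      # rows for states 1..N-1
    P[i, i + 1] <- p
    P[i, i - 1] <- 1 - p
  }
  P
}
P <- ruin_matrix(4, 0.5)
library(expm)
(P %^% 6)[2, 5]  # question 1: P(stake of $4 within 6 games | start with $1)
```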



9.3.3 Periodic States and Ergodic Chains

 A state i is periodic with period k > 1, if k is the smallest number such


that all paths leading from state i back to state i have length that is a multiple
of k.

If a recurrent state is not periodic, it is referred to as aperiodic. If a state j


has pj,j > 0 then the state j is aperiodic.

Example 9.7 Consider a chain with transition matrix:


$$P = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{pmatrix}$$

Each state has period 3. Starting in state 1, the only way to return is to take the path
1 → 2 → 3 → 1 some number of times m. It will take 3m transitions to return to state
1, so state 1 has period 3.

Example 9.8 Consider a chain with transition matrix:


$$P = \begin{pmatrix} 0 & 1/2 & 1/2\\ 1/2 & 0 & 1/2\\ 0 & 0 & 1 \end{pmatrix}$$

Consider the paths:

• 1→2→1

• 1→2→1→2→1

• 1→2→1→2→1→2→1

States 1 and 2 are periodic with period 2. State 3 is aperiodic. Starting in state 1, the
only way to return is to take the path 1 → 2 → 1 some number of times m. It will take
2m transitions to return to state 1, so state 1 has period 2. Similarly for state 2.

 If all states in a chain are recurrent, aperiodic and communicate with each
other, then the chain is said to be ergodic.



Example 9.9 Give an example to demonstrate why the gambler’s ruin problem is not
ergodic.

9.4 Steady State Probability


9.4.1 Definition

 For an irreducible, ergodic Markov chain, $\lim_{n\to\infty} P^n_{i,j}$ exists and is independent
of i. Furthermore, letting

$$\pi_j = \lim_{n\to\infty} P^n_{i,j}$$

then $\pi_j$, $j \ge 0$, is the unique, nonnegative solution of:

$$\pi_j = \sum_i \pi_i P_{i,j} \tag{9.1}$$
$$\sum_j \pi_j = 1 \tag{9.2}$$

[Ros14, Theorem 4.1]

 The vector $\pi = [\pi_1\ \pi_2\ \ldots\ \pi_s]$ is called the steady state distribution or
equilibrium distribution for the Markov chain.

9.4.2 Finding the Steady State Probabilities


In matrix form, equation (9.1) can be written as:

π = πP

For a two state chain:


[\pi_1\ \pi_2] = [\pi_1\ \pi_2] \begin{pmatrix} p_{1,1} & p_{1,2} \\ p_{2,1} & p_{2,2} \end{pmatrix}

The steady state probability πj can also be interpreted as the long-run proportion of
time that the chain is in state j.



Example 9.10 Cola example - continued: Finding and applying steady state
probabilities

Finding the steady state probabilities


Recall that:

P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}

To find the steady state probabilities we need to solve:


[\pi_1\ \pi_2] = [\pi_1\ \pi_2] \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}

1 = \sum_{i=1}^{2} \pi_i

Expanding these equations yields:

π1 = 0.9π1 + 0.2π2
π2 = 0.1π1 + 0.8π2
1 = π1 + π2

This chain has s = 2 states, so we use the first (s − 1) equations and the last equation:

π1 = 0.9π1 + 0.2π2
1 = π1 + π2

Solving for π1 and π2 gives:

\pi_1 = \frac{2}{3} \qquad \pi_2 = \frac{1}{3}

After a long time, the probability that a person will purchase cola 1 is 2/3 and the
probability that they will purchase cola 2 is 1/3.



9.4.3 Application of state probabilities
Suppose that each customer makes one purchase of cola during any week (52 weeks=1
year). Suppose there are 1 million cola customers. One selling unit of cola costs company
1 $1 and is sold for $2. For $5 million per year an advertising agency guarantees to
decrease from 10% to 5% the fraction of cola 1 customers who switch to cola 2 after a
purchase.

Should the company that makes cola 1 hire the advertising firm?
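One way to attack this question is to compare the long-run cola 1 market share with and without the advertising. The R sketch below is one possible working, not the official answer; it uses the standard closed form π_1 = p_{2,1}/(p_{1,2} + p_{2,1}) for a two-state chain, and the figures from the problem statement ($1 profit per unit, 52 purchases per customer per year, 1 million customers):

```r
# Long-run fraction of purchases that are cola 1: pi_1 = p21 / (p12 + p21)
pi1_now <- 0.2 / (0.1 + 0.2)     # current chain: p12 = 0.1, p21 = 0.2
pi1_ad  <- 0.2 / (0.05 + 0.2)    # with advertising: p12 reduced to 0.05

purchases_per_year <- 52 * 1e6   # 1 million customers, one purchase per week
profit_per_unit <- 2 - 1         # sold for $2, costs $1

extra_profit <- purchases_per_year * (pi1_ad - pi1_now) * profit_per_unit
extra_profit                     # approximately $6.93 million per year
extra_profit > 5e6               # TRUE: the gain exceeds the $5 million fee
```

On this working the extra annual profit exceeds the $5 million fee, which suggests the advertising firm is worth hiring; check this against your own calculation.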



9.4.4 Using linear algebra to find steady state probabilities
Example 9.11

Notice that the equations:

0.9π1 + 0.2π2 = π1
π1 + π 2 = 1

can be rewritten as:

−0.1π1 + 0.2π2 = 0
π1 + π 2 = 1

Or equivalently

\begin{pmatrix} -0.1 & 0.2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \pi_1 \\ \pi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}

This has the form A\pi = b, so \pi = A^{-1}b. It can be shown that:

A^{-1} = \begin{pmatrix} -10/3 & 2/3 \\ 10/3 & 1/3 \end{pmatrix}

Therefore

\pi = A^{-1}b = \begin{pmatrix} -10/3 & 2/3 \\ 10/3 & 1/3 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix}

9.4.5 Using R to find steady state probabilities


R can be used to solve the system of linear equations using the function solve.
Using R to find the inverse of A
#Define matrix a and vector b
A = matrix(data=c(-0.1, 0.2,
1, 1),nrow=2, byrow=TRUE)
b = c(0,1)

solve(A) %*% b

## [,1]
## [1,] 0.66667
## [2,] 0.33333



Using R to solve the system of equations directly.
The R function solve(A,b) gives the solution (i.e. the values of x) for the equation Ax = b,
where A is a matrix and b is a vector.
solve(A,b)

## [1] 0.66667 0.33333

Using R to find steady state probabilities directly from P


#Enter matrix
P <- matrix(c(0.9, 0.1,
0.2, 0.8), nrow=2, ncol=2, byrow=TRUE)
nStates <- dim(P)[1]

# Method 1
# Compute P - I(n) and select first n-1 rows
A = t(P - diag(nStates))[1:(nStates-1),]
# Add "sum to 1" constraints
A = rbind(A, rep(1, nStates))
# Define RHS of system of equations
b <- c(rep(0, nStates-1), 1)
# Solve
pi_theoretical <- solve(A,b)
round(pi_theoretical,4)

## [1] 0.6667 0.3333

MASS::fractions(pi_theoretical)

## [1] 2/3 1/3

# Method 2
# Compute P - I(n)
a = t(P - diag(nStates))
# Add "sum to 1" constraints
a = rbind(a, rep(1, nStates))
# Define RHS of system of equations
d <- c(rep(0, nStates), 1)
qr.solve(a, d)

## [1] 0.66667 0.33333

# Method 3
eigenP <- eigen(t(P))
ev <- eigenP$vectors[,1] / sum(eigenP$vectors[,1])
ev

## [1] 0.66667 0.33333

MASS::fractions(ev)

## [1] 2/3 1/3



9.5 Simulating a Markov Chain
9.5.1 Simulating a Markov Chain By Hand
Example 9.12 Recall the cola example. This problem can be modelled by a Markov
chain with the transition matrix P .
P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}

Simulate this Markov Chain “by hand” using the probabilities below.

For example, if the chain is in state 1 and the random number is in (0, 0.9), the next
state is state 1; if the random number is in (0.9, 1), the next state is state 2.
Stage State Probability Next State
0 1 0.0702
1 0.233
2 0.0479
3 0.9999
4 0.7361
5 0.8035
6 0.0249
7 0.8194
8 0.6268
9 0.3876
10 0.0501



Number of times in state 1 and in state 2 during stages 1 – 10:



9.5.2 Simulating a Markov Chain Using R
Example 9.13 Recall the cola example. This problem can be modelled by a Markov
chain with the transition matrix P .
Simulate this Markov chain using R.

set.seed(1234567)

#Specify states
S <- c(1,2)
M <- length(S)

#Define transition matrix P


P <- matrix(data=c(0.9, 0.1,
0.2, 0.8),
nrow=2, ncol=2, byrow=TRUE)
dimnames(P) <- list(S, S)
P

## 1 2
## 1 0.9 0.1
## 2 0.2 0.8

#Determine number of simulations


numStages <- 1e5

#Initialize vector x
x <- rep(NA, numStages+1)
names(x) <- 0:numStages

#Specify starting value x0 and assign to first element of x


x0 <- 1
x[1] <- x0

#For each simulation run


for(i in 1:numStages){
#Select row
rowIndex <- which(x[i]==S)

#Sample using probabilities from selected row and assign to x[i+1]


x[i+1] <- sample(S, 1, prob=P[rowIndex,], replace=TRUE)
}

Inspect output

head(x, 10)

## 0 1 2 3 4 5 6 7 8 9
## 1 1 1 2 2 2 2 2 1 1

#Print matrix of stage number and state


summaryTable <- cbind(Stage=0:numStages, State=x)
head(summaryTable)



## Stage State
## 0 0 1
## 1 1 1
## 2 2 1
## 3 3 2
## 4 4 2
## 5 5 2

#Compute number of times in each state


table(x)

## x
## 1 2
## 67080 32921

Plot Simulated Trajectory

#Plot trajectory
plotLen <- 50
plot(names(x[1:plotLen]), x[1:plotLen], type='s',
xlab="Time", ylab='State' , yaxt='n')
axis(2, at = c(1,2), labels = S)
head(x, 10)

## 0 1 2 3 4 5 6 7 8 9
## 1 1 1 2 2 2 2 2 1 1
[Figure: step plot of the simulated trajectory, showing State (1 or 2) against Time for the first 50 stages.]



Plot Convergence of Steady State Probabilities

# Create an empty matrix


mat <- matrix(0, nrow = M, ncol = numStages)

# If MC is in state at i at time n, record a 1 in cell (i,n)


mat[cbind(x[2:(numStages+1)], seq_len(numStages))] <- 1
# Compute cumulative sum of number of times in each state
mat_sum <- t(apply(mat,1,cumsum))
# Compute proportion of times in each state
pi_cumsum <- t(sweep(mat_sum,2,colSums(mat_sum),"/"))

#head(mat_sum)

mycols <- c("hotpink", "blue")


numStagesPlot <- min(1000, numStages)

plot(1:numStagesPlot, pi_cumsum [1:numStagesPlot,1],


ylim=c(0,1),
col=mycols[1],
ylab=expression(pi[i]), type="l",
xlab="Stage", lty=2,
cex.axis = 1.5,
cex.lab = 1.5
)

# Plot theoretical probability, simulated probability and label


for(i in 1:M){

if(i > 1)lines(pi_cumsum[1:numStagesPlot,i], col=mycols[i],


type="l", lty=2)

abline(h=pi_theoretical[i], col=mycols[i])

text(numStagesPlot*(1-0.03*i), pi_theoretical[i],
parse(text =paste("pi[", i, "]")),
col=mycols[i], cex = 1.5)
}

# Add legend
legend("topright",
leg=c(paste0("State ", S), "Theory", "Simulation"),
col=c(mycols, 1, 1),
lty=c(rep(NA, M), 1, 2),
pch=c(rep(16, M), NA, NA),
cex = 1)



[Figure: cumulative proportion of time spent in each state over the first 1000 stages; the simulated proportions (dashed lines) converge to the theoretical steady state probabilities π1 = 2/3 and π2 = 1/3 (solid lines).]

In this example we have simulated the Markov chain once.

Discussion Would we get the same results if we run the simulation again?



9.5.3 Estimating a transition matrix
Example 9.14 We can estimate the transition matrix from our simulation of the cola
example.

head(x, 10)

## 0 1 2 3 4 5 6 7 8 9
## 1 1 1 2 2 2 2 2 1 1

table(x[1:9],
x[2:10])

##
## 1 2
## 1 3 1
## 2 1 4

(trans_freq <- table(x[1:(length(x)-1)],


x[2:length(x)]))

##
## 1 2
## 1 60476 6604
## 2 6603 26317

Convert transitions to probabilities

(P_sim <- trans_freq/rowSums(trans_freq))

##
## 1 2
## 1 0.90155 0.09845
## 2 0.20058 0.79942

MASS::fractions(P_sim)

##
## 1 2
## 1 1163/1290 127/1290
## 2 6603/32920 26317/32920

Compare simulation vs actual transition matrix

P - P_sim

## 1 2
## 1 -0.00155039 0.00155039
## 2 -0.00057716 0.00057716



Example 9.15 Each day my train is early, on-time, or late. This scenario can be modelled
as a Markov chain with three states. Suppose that over a period of 20 days I collected
data about the arrival time of my train and I want to estimate the transition matrix.

To demonstrate the process of estimating the transition matrix we will simulate some
data. We will assume that the difference between the arrival time and the scheduled time
(minutes) can be modelled by a normally distributed random variable with mean 0 and
standard deviation of 5.
Simulate data
set.seed(123345456)

# Generate data
n <- 20
df <- data.frame(x = rnorm(n, 0 , 5))

# Create categorical variables using the data


# (-infty, -1): more than 1 minute before scheduled time = early
# (-1, 1): within 1 minute of scheduled time = ontime
# (1, infty): more than 1 minute after scheduled time = late

library(tidyverse)
df <- mutate(df,
cat_x = cut(x, c(-Inf, -1, 1, Inf),
ordered_result=TRUE,
labels=c("early", "on-time", "late")) )

Inspect the data


df

## x cat_x
## 1 0.802981 on-time
## 2 2.305315 late
## 3 -9.012655 early
## 4 4.814735 late
## 5 2.292868 late
## 6 4.412924 late
## 7 5.541629 late
## 8 3.866881 late
## 9 -9.069756 early
## 10 6.327343 late
## 11 5.345333 late
## 12 -2.140519 early
## 13 -0.063766 on-time
## 14 4.289332 late
## 15 -5.666453 early
## 16 4.628747 late
## 17 4.771764 late
## 18 1.774790 late
## 19 -2.661365 early
## 20 2.211989 late



Explore the data

# Summarise the categorical variables


table(df$cat_x)

##
## early on-time late
## 5 2 13

Explore the transitions by hand.


Compute the frequency with which each transition occurs.
            Early   On-Time   Late
Early
On-Time
Late

Explore the transitions using R

# Explore transitions
trans_freq <- table(df$cat_x[1:(nrow(df)-1)],
df$cat_x[2:nrow(df)])
trans_freq

##
## early on-time late
## early 0 1 4
## on-time 0 0 2
## late 5 0 7

Convert transitions to probabilities

P <- trans_freq/rowSums(trans_freq)
P

##
## early on-time late
## early 0.00000 0.20000 0.80000
## on-time 0.00000 0.00000 1.00000
## late 0.41667 0.00000 0.58333

MASS::fractions(P)

##
## early on-time late
## early 0 1/5 4/5
## on-time 0 0 1
## late 5/12 0 7/12



Therefore the transition matrix can be estimated as follows:

P = \begin{pmatrix} 0 & 1/5 & 4/5 \\ 0 & 0 & 1 \\ 5/12 & 0 & 7/12 \end{pmatrix}

Discussion What would make the estimation of this transition matrix more
accurate?

Chapter 10

Properties and Applications of Markov Chains

References:
[Win04, Chapter 17]
10.1 Example: Stock Prices
Example 10.1 Consider two stocks. Stock 1 always sells for $10 or $20. If stock 1 is
selling for $10 today, there is a .70 chance that it will sell for $10 tomorrow. If it
is selling for $20 today, there is a .75 chance that it will sell for $20 tomorrow. Stock
2 always sells for $10 or $25. If stock 2 sells today for $10 there is a .90 chance that it
will sell tomorrow for $10. If it sells today for $25, there is a .85 chance that it will sell
tomorrow for $25.
The price of the stocks over time can be modelled by two Markov chains.

a. Construct the transition probability matrix for stock 1.


S = {1, 2} where state 1 = $10 and state 2 = $20.
P = \begin{pmatrix} 0.7 & 0.3 \\ 0.25 & 0.75 \end{pmatrix}

b. Construct the transition probability matrix for stock 2.

c. Consider the stock price 1 Markov chain.

• Does the steady state distribution exist for this Markov chain?
• Find the steady state probabilities.
• In the long-run, what is the average price of the stock?
• If the stock is currently selling for $10, on average, how long until it will next sell
for $10?
• If the stock is currently selling for $10, on average, how long until it will next sell
for $20?

Does the steady state distribution exist for this Markov chain?
• Recurrent: There are no transient states (there is only one class) so all states are
recurrent.
• Aperiodic: No states are periodic (no k > 1 such that all paths back to a state
are a multiple of k), so the Markov chain is aperiodic.
• Communicate: All states communicate with each other, so there is one class and
thus the Markov chain is irreducible.
• Ergodic: All states communicate with each other, are recurrent, and are aperiodic
so the Markov chain is ergodic, therefore the steady state distribution exists.
Find the steady state probabilities.

To find the steady state probabilities we need to solve:


[\pi_1\ \pi_2] = [\pi_1\ \pi_2] \begin{pmatrix} 0.7 & 0.3 \\ 0.25 & 0.75 \end{pmatrix}

1 = \sum_{i=1}^{2} \pi_i

Expanding these equations yields:


π1 = 0.7π1 + 0.25π2
π2 = 0.3π1 + 0.75π2
1 = π1 + π2

This chain has s = 2 states, so we use the first (s − 1) equations and the last equation:
π1 = 0.7π1 + 0.25π2
1 = π1 + π 2
Solving for π1 and π2 gives:

\pi_1 = \frac{5}{11} = 0.4545455 \qquad \pi_2 = \frac{6}{11} = 0.5454545

In the long-run, what is the average price of the stock?

In the long run the price will be $10 about 45% of the time and $20 about 55% of the
time. Therefore the average stock price is:

10 × 0.4545... + 20 × 0.5454... = $15.45
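These figures can be reproduced in R with the same solve() approach used for the cola example; the long-run average price is the steady state distribution weighted by the two prices:

```r
# Steady state distribution and long-run average price for stock 1
P <- matrix(c(0.70, 0.30,
              0.25, 0.75), nrow = 2, byrow = TRUE)

# First (s-1) balance equations plus the "sum to 1" constraint
A <- rbind(t(P - diag(2))[1, ], rep(1, 2))
b <- c(0, 1)
(pi_ss <- solve(A, b))
## [1] 0.4545455 0.5454545

sum(pi_ss * c(10, 20))   # long-run average price
## [1] 15.45455
```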



10.2 Mean First Passage Times

For an ergodic Markov chain, let the mean first passage time m_{i,j} be
the expected number of transitions before we reach state j, given that we are
currently in state i. [Win04, p939]

The mean first passage times can be found by solving the following set of
equations:

m_{i,j} = 1 + \sum_{k \ne j} p_{i,k} m_{k,j}   (10.1)

The mean first passage time to state i, given that the system is currently in
state i, is:

m_{i,i} = \frac{1}{\pi_i}   (10.2)

Example 10.2 Stock Prices - continued

If the stock is currently selling for $10, on average, how long until it will next
sell for $10?
m_{i,i} = \frac{1}{\pi_i}

m_{1,1} = \frac{1}{0.4545455} = 2.2 \text{ days}

If the stock is currently selling for $10, on average, how long until it will next
sell for $20?

m_{i,j} = 1 + \sum_{k \ne j} p_{i,k} m_{k,j}

m_{1,2} = 1 + \sum_{k \ne 2} p_{1,k} m_{k,2} = 1 + p_{1,1} m_{1,2} = 1 + 0.7 m_{1,2}

m_{1,2}(1 - 0.7) = 1

m_{1,2} = \frac{1}{0.3} = 3.33 \text{ days}



The other mean first passage times are:

m_{2,2} = \frac{1}{0.5454545} = 1.833 \text{ days}

m_{2,1} = 1 + \sum_{k \ne 1} p_{2,k} m_{k,1} = 1 + p_{2,2} m_{2,1} = 1 + 0.75 m_{2,1}

m_{2,1}(1 - 0.75) = 1

m_{2,1} = \frac{1}{0.25} = 4 \text{ days}
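Equations (10.1) and (10.2) can also be solved numerically: for each target state j, equation (10.1) is a small linear system in the unknowns m_{i,j}, i ≠ j. A sketch in R for the stock 1 chain, using the steady state probabilities found above:

```r
# Mean first passage times for the stock 1 chain
P <- matrix(c(0.70, 0.30,
              0.25, 0.75), nrow = 2, byrow = TRUE)
s <- nrow(P)
m <- matrix(NA, s, s)
for (j in 1:s) {
  keep <- setdiff(1:s, j)
  # Equation (10.1): m[keep, j] = 1 + P[keep, keep] %*% m[keep, j]
  m[keep, j] <- solve(diag(length(keep)) - P[keep, keep, drop = FALSE],
                      rep(1, length(keep)))
}
diag(m) <- 1 / c(5/11, 6/11)  # equation (10.2): m[i,i] = 1/pi_i
m
##          [,1]     [,2]
## [1,] 2.200000 3.333333
## [2,] 4.000000 1.833333
```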



10.3 Absorbing Chains
10.3.1 Introduction

A Markov chain in which one or more states is an absorbing state is an
absorbing Markov chain.

Absorbing Markov chains have a number of interesting applications.

Example 10.3 A gambler plays a series of independent games. At each game he either
wins $1 with probability p or loses $1 with probability 1 − p. The gambler stops playing
either if he goes broke or as soon as he amasses N = $4, whichever happens first.
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1-p & 0 & p & 0 & 0 \\
0 & 1-p & 0 & p & 0 \\
0 & 0 & 1-p & 0 & p \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}

10.3.2 Classification of States


Let {Xn } be a Markov chain with states S = {1, 2, . . . , s}.

If {Xn } is an absorbing Markov chain, then we can classify the states as follows:

• s − m transient states, labelled t1 , t2 , . . . , ts−m

• m absorbing states, labelled a1 , a2 , . . . , am

If we list all of the states S, first by the transient states and then by the absorbing states,
we can rewrite the transition matrix as follows:

P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}

(the first s − m rows and columns correspond to the transient states, the last m rows
and columns to the absorbing states)
where

• Q is an (s − m) × (s − m) matrix representing transitions between transient states

• R is an (s−m)×m matrix representing transitions from transient states to absorbing


states

• I is an m × m identity matrix

• 0 is an m × (s − m) matrix consisting entirely of zeros



Example 10.4 A gambler plays a series of independent games. At each game he either
wins $1 with probability p or loses $1 with probability 1 − p. The gambler stops playing
either if he goes broke or as soon as he amasses N = $4, whichever happens first.
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1-p & 0 & p & 0 & 0 \\
0 & 1-p & 0 & p & 0 \\
0 & 0 & 1-p & 0 & p \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}

Transient states (s − m = 3): t_1 = 1, t_2 = 2, t_3 = 3
Absorbing states (m = 2): a_1 = 0, a_2 = 4

Reordering the states as (1, 2, 3, 0, 4):

P = \begin{pmatrix}
0 & p & 0 & 1-p & 0 \\
1-p & 0 & p & 0 & 0 \\
0 & 1-p & 0 & 0 & p \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}

Q = \begin{pmatrix} 0 & p & 0 \\ 1-p & 0 & p \\ 0 & 1-p & 0 \end{pmatrix}
\qquad
R = \begin{pmatrix} 1-p & 0 \\ 0 & 0 \\ 0 & p \end{pmatrix}
\qquad
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}



10.3.3 Properties of Absorbing Chains
• How many periods do we expect to spend in a given transient state
before absorption takes place?

If the chain begins in a given transient state ti , and before we reach an absorbing
state, what is the expected number of times that each state tj will be entered?

If we are in transient state ti , the expected number of periods that will be spent in
transient state tj before absorption is the ijth element of the matrix (I − Q)−1 .

• If a chain begins in a given transient state, what is the probability that


we end up in each absorbing state?

If we are in transient state ti , the probability that we will eventually be absorbed in


absorbing state aj is the ijth element of the matrix (I − Q)−1 R.

Note: the matrix (I − Q)^{-1} is often called the Markov chain's fundamental matrix.
Here I is an (s − m) × (s − m) identity matrix.
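As a sketch of how these two results are used, consider the gambler's ruin chain of Example 10.4 with a fair game (p = 0.5, assumed here purely for illustration). Computing the fundamental matrix in R answers the gambler's ruin questions posed in Section 9.3.2:

```r
# Gambler's ruin (Example 10.4) with p = 0.5; transient states $1, $2, $3
p <- 0.5
Q <- matrix(c(0,   p,   0,
              1-p, 0,   p,
              0,   1-p, 0), nrow = 3, byrow = TRUE)
R <- matrix(c(1-p, 0,     # columns: absorbed at $0 (broke), absorbed at $4
              0,   0,
              0,   p), nrow = 3, byrow = TRUE)

N <- solve(diag(3) - Q)   # fundamental matrix (I - Q)^{-1}
rowSums(N)                # expected number of games until ruin or $4
## [1] 3 4 3
N %*% R                   # absorption probabilities from $1, $2, $3
##      [,1] [,2]
## [1,] 0.75 0.25
## [2,] 0.50 0.50
## [3,] 0.25 0.75
```

So under the fair-game assumption, a gambler starting with $2 is equally likely to go broke or to reach $4, and the game lasts 4 plays on average.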

Example 10.5 The law firm of Mason and Burger employs three types of lawyers: junior
lawyers, senior lawyers, and partners. During a given year, there is a .15 probability that
a junior lawyer will be promoted to senior lawyer and a .05 probability that he or she will
leave the firm. Also, there is a .20 probability that a senior lawyer will be promoted to
partner and a .10 probability that he or she will leave the firm. There is a .05 probability
that a partner will leave the firm. The firm never demotes a lawyer. [Win04, p 943]
1. What is the average length of time that a newly hired junior lawyer spends working
for the firm?
2. What is the probability that a junior lawyer will be promoted to partner before
leaving the firm?
3. What is the average length of time that a partner spends with the firm (as a
partner)?
Transition Matrix

Transition Diagram



#Define transition matrix
S = c("J", "S", "P", "LeaveNP", "LeaveP")
M=length(S)
P = matrix(c(
.80, .15 , 0 , .05 , 0,
0 , .70 , .20 , .10 , 0,
0 , 0 , .95 , 0 , .05,
0 , 0 , 0 , 1 , 0,
0 , 0 , 0 , 0 , 1),
ncol = M, nrow = M, byrow = TRUE
)
dimnames(P) = list(S, S)
library(shape)
library(diagram)
plotmat(t(P), curve=0.1, pos=c(3,2), box.size=0.08)

[Transition diagram: J, S and P have self-loops with probabilities 0.8, 0.7 and 0.95; J → S (0.15), S → P (0.2), J → LeaveNP (0.05), S → LeaveNP (0.1), P → LeaveP (0.05); LeaveNP and LeaveP are absorbing states (self-loop probability 1).]

Transient states:

Absorbing states:

Q= R= I=



Compute (I − Q):

I - Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
      - \begin{pmatrix} 0.8 & 0.15 & 0 \\ 0 & 0.7 & 0.2 \\ 0 & 0 & 0.95 \end{pmatrix}
      = \begin{pmatrix} 0.2 & -0.15 & 0 \\ 0 & 0.3 & -0.2 \\ 0 & 0 & 0.05 \end{pmatrix}

Find the inverse, (I − Q)^{-1} (using Gauss-Jordan or solve() in R):

(I - Q)^{-1} = \begin{pmatrix} 5 & 2.5 & 10 \\ 0 & 10/3 & 40/3 \\ 0 & 0 & 20 \end{pmatrix}

1. What is the average length of time that a newly hired junior lawyer spends working
for the firm?
Expected time a junior lawyer spends with the firm
  = expected time as a junior + expected time as a senior + expected time as a partner
  = (I - Q)^{-1}_{1,1} + (I - Q)^{-1}_{1,2} + (I - Q)^{-1}_{1,3}
  = 5 + 2.5 + 10
  = 17.5 years

2. What is the probability that a junior lawyer will be promoted to partner before
leaving the firm?
(I - Q)^{-1} R = \begin{pmatrix} 5 & 2.5 & 10 \\ 0 & 10/3 & 40/3 \\ 0 & 0 & 20 \end{pmatrix}
\begin{pmatrix} 0.05 & 0 \\ 0.1 & 0 \\ 0 & 0.05 \end{pmatrix}
= \begin{pmatrix} 0.5 & 0.5 \\ 1/3 & 2/3 \\ 0 & 1 \end{pmatrix}

(rows: Junior, Senior, Partner; columns: Leave as NP, Leave as P)

3. What is the average length of time that a partner spends with the firm (as a partner)?



Finding the Markov chain’s fundamental matrix with R.

#Define transition matrix


S = c("J", "S", "P", "LeaveNP", "LeaveP")
M=length(S)
P = matrix(c(
.80, .15 , 0 , .05 , 0,
0 , .70 , .20 , .10 , 0,
0 , 0 , .95 , 0 , .05,
0 , 0 , 0 , 1 , 0,
0 , 0 , 0 , 0 , 1),
ncol = M, nrow = M, byrow = TRUE
)
dimnames(P) = list(S, S)
P

## J S P LeaveNP LeaveP
## J 0.8 0.15 0.00 0.05 0.00
## S 0.0 0.70 0.20 0.10 0.00
## P 0.0 0.00 0.95 0.00 0.05
## LeaveNP 0.0 0.00 0.00 1.00 0.00
## LeaveP 0.0 0.00 0.00 0.00 1.00

#Check rows sum to 1


abs(rowSums(P)-1) < 1e-10

## J S P LeaveNP LeaveP
## TRUE TRUE TRUE TRUE TRUE

#Alternative method
all.equal(rowSums(P), rep(1,5), check.names = FALSE)

## [1] TRUE

#Define matrices
Q = P[1:3, 1:3]
R = P[1:3, 4:5]
I = diag(dim(Q)[1])

#To get inverse


N = solve(I-Q)
N

## J S P
## J 5 2.5000 10.000
## S 0 3.3333 13.333
## P 0 0.0000 20.000

MASS::fractions(N)

## J S P
## J 5 5/2 10
## S 0 10/3 40/3
## P 0 0 20



# Multiply inverse by R
N %*% R

## LeaveNP LeaveP
## J 0.50000 0.50000
## S 0.33333 0.66667
## P 0.00000 1.00000

R demo

#Check that N really is the inverse of (I-Q)


#a matrix A multipled by its inverse B equals the identity matrix I
#AB=BA=I, or in this case (I-Q)N=I
(I-Q)%*%N

## J S P
## J 1 5.5511e-17 0
## S 0 1.0000e+00 0
## P 0 0.0000e+00 1

# Check max difference between (I-Q)N and I


max(abs((I-Q)%*%N -diag(3)))

## [1] 5.5511e-17

max(abs(N%*%(I-Q) -diag(3)))

## [1] 1.1102e-16

#Alternative method
all.equal((I-Q)%*%N , diag(3), check.attributes = FALSE)

## [1] TRUE

all.equal(N%*%(I-Q) , diag(3), check.attributes = FALSE)

## [1] TRUE



Alternative formulation of the lawyer problem

S = c("J", "S", "P", "LeaveJ", "LeaveS", "LeaveP")


M = length(S)
P = matrix(c(
.80, .15 , 0 , .05 , 0 , 0,
0 , .70 , .20 , 0 , .10 , 0,
0 , 0 , .95 , 0 , 0 , .05,
0 , 0 , 0 , 1 , 0 , 0,
0 , 0 , 0 , 0 , 1 , 0,
0 , 0 , 0 , 0 , 0 , 1), ncol = M, nrow = M, byrow = TRUE
)
dimnames(P) = list(S, S)
P

## J S P LeaveJ LeaveS LeaveP


## J 0.8 0.15 0.00 0.05 0.0 0.00
## S 0.0 0.70 0.20 0.00 0.1 0.00
## P 0.0 0.00 0.95 0.00 0.0 0.05
## LeaveJ 0.0 0.00 0.00 1.00 0.0 0.00
## LeaveS 0.0 0.00 0.00 0.00 1.0 0.00
## LeaveP 0.0 0.00 0.00 0.00 0.0 1.00

Alternative formulation of the lawyer problem

#Define matrices
Q = P[1:3, 1:3]
R = P[1:3, 4:6]
I = diag(dim(Q)[1])

#To get inverse


(N = solve(I-Q))

## J S P
## J 5 2.5000 10.000
## S 0 3.3333 13.333
## P 0 0.0000 20.000

# Multiply inverse by R
N %*% R

## LeaveJ LeaveS LeaveP


## J 0.25 0.25000 0.50000
## S 0.00 0.33333 0.66667
## P 0.00 0.00000 1.00000



10.3.4 Simulating an Absorbing Markov Chain Using R
Drunkard’s Walk Problem

Example 10.6 Consider an application of the gambler’s ruin problem in section 9.3.2
called the “drunkard's walk”. A patron of a local drinking establishment is walking home
and is stumbling along. He is equally likely to stumble left or right. To get home he must
cross a bridge. The bridge is 10 steps long. Unfortunately, if he stumbles 5 steps right he
will fall off the bridge into the water, if he stumbles 5 steps left he will hit a fence which
will propel him back towards the middle of the bridge in his next step. Assuming that he
starts in the middle of the bridge, what is the probability that he will fall off the bridge
before reaching the other side?

Simulate this Markov chain using R. Repeat the simulation a large number of times.

#Specify states
S <- -5:5
M <- length(S)

xWater <- min(S)


xFence <- max(S)

#Specify transition matrix


p <- 0.5
P <- matrix(0, nrow=M, ncol=M)
dimnames(P) = list(S, S)
P[1,1] <- 1
P[M,M-1] <- 1
for(i in 2:(M-1)){
P[i, i+1] <- p
P[i, i-1] <- 1-p
}

print(round(P, 4))

## -5 -4 -3 -2 -1 0 1 2 3 4 5
## -5 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -4 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -3 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## -2 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## -1 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.0
## 0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
## 1 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0
## 2 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 0.0
## 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0
## 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5
## 5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0

set.seed(123465)

#Determine number of simulations


numSimulations = 1000



maxNumSteps = 10

#Initialize vector
fallInWater <- rep(NA, numSimulations)

x0 <- 0

for(j in 1:numSimulations){
#Specify starting value x0 and assign to first element of x
x <- rep(NA, maxNumSteps +1)
names(x) <- 0:maxNumSteps
x[1] <- x0

#For each simulation run


for(i in 1:maxNumSteps){

#Select row
rowIndex <- which(x[i]==S)

#Sample using probabilities from selected row and assign to x[i+1]


x[i + 1] <- sample(S, 1, prob=P[rowIndex,], replace=TRUE)

if(x[i+1]==xWater){
fallInWater[j] <- TRUE
#break; #uncomment to stop simulation once in water
}
}
}

#Compute number of times the patron ended up in the water


table(fallInWater)

## fallInWater
## TRUE
## 106

#Inspect the trajectory from the final simulation run
x

## 0 1 2 3 4 5 6 7 8 9 10
## 0 -1 -2 -3 -4 -5 -5 -5 -5 -5 -5

#Plot trajectory
plot(names(x), x, type='s', xlab="Time",
ylab='State', ylim=c(xWater, xFence))
abline(h=xWater, col="blue", lwd=6)
abline(h=xFence, col="gray", lwd=6)



[Figure: step plot of the final simulated trajectory, State (−5 to 5) against Time (0 to 10); the water boundary at −5 is marked in blue and the fence at 5 in gray.]



10.4 Applications of Markov Chains
There are many applications of Markov chains. Some common examples include:

• Gambler’s Ruin

• Work force planning

• Random Walk (can be used for stock prices)

• Inventory planning

• and many more!

We will now examine two particular applications.

10.4.1 Simple Random Walk


A Simple Random Walk is a Markov chain whose state space is the set of all integers
and whose transition probabilities are given by:

P_{i,i+1} = p = 1 - P_{i,i-1}, \quad i = 0, \pm 1, \pm 2, \ldots

for some 0 < p < 1.

At each stage the process increases or decreases by 1.

The probability that the chain will be back in its initial state after 2n transitions is given
by [Ros02, p 111]:
P^{2n}_{0,0} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{n!\,n!} [p(1-p)]^n

Using this result it can be shown:

• If p ≠ 0.5 then the chain is transient.

• If p = 0.5 then the chain is recurrent and the chain is called a symmetric random walk.

Example 10.7 Consider a simple random walk, with p = 0.4. Suppose this Markov
chain is in state 0 at time 0.

1. What is the probability that the Markov chain will be in state 0 at time 1?
P_{0,0} = 0

2. What is the probability that the Markov chain will be in state 1 at time 1?
P_{0,1} = p = 0.4



3. What is the probability that the Markov chain will be in state 0 at time 10?

P^{2n}_{0,0} = \binom{2n}{n} p^n (1-p)^n

P^{10}_{0,0} = P^{2 \times 5}_{0,0} = \binom{10}{5}\, 0.4^5 (1 - 0.4)^5 = 0.20066

Answer: P^{10}_{0,0} = 0.20066

In R:

choose(10, 5)*(0.4^5)*(0.6^5)

## [1] 0.20066

dbinom(5, 10, 0.4)

## [1] 0.20066

4. What is the probability that the Markov chain will be in state 0 at an odd-numbered
time?

P^{2n-1}_{0,0} = 0, \quad n = 1, 2, \ldots



10.4.2 Inventory Models
Example 10.8 Consider an inventory system in which the sequence of events during
each period is as follows:

• We observe an inventory level i at the beginning of the period.

• If i ≤ 1, then 4 − i units are ordered. If i ≥ 2, then 0 units are ordered. Delivery
of all orders is immediate.

• With probability 1/3, 0 units are demanded during the period; with probability 1/3,
1 unit is demanded during the period; with probability 1/3, 2 units are demanded
during the period.

• We observe the inventory level at the beginning of the next period.

Let Xn be the inventory level at the start of period n.

Exercise: State the transition matrix for this Markov chain.

Chapter 11

Further Properties of Random Variables

References:
[Ros13, §8.3]
[SY10, §8.4]
11.1 Examples
11.1.1 Normal Distribution
Example 11.1 A coffee machine has two mechanisms: one which provides milk and one
which provides coffee. Under the “flat white” setting, the amount of coffee dispensed is
normally distributed with mean 40 mls and standard deviation of 5 mls; and the amount
of milk dispensed is normally distributed with mean 100 mls and standard deviation of 8
mls. The amounts of coffee and milk dispensed are independent. George places a 150ml
cup under this machine and requests a flat white. What is the probability that his cup
will overflow?
Explore using simulation

E_W <- 40 + 100


STD_W <- sqrt(5^2+8^2)
E_W; STD_W

## [1] 140
## [1] 9.434

x <- rnorm(1e5, 40, 5)


y <- rnorm(1e5, 100, 8)
w <- x + y

hist(x)
hist(y)
hist(w)

#P(W > 150) actual


1-pnorm(150, E_W, STD_W)

## [1] 0.14457

#P(W > 150) simulation


table(w > 150)

##
## FALSE TRUE
## 85615 14385

[Figure: histograms of the simulated values of x, y, and w = x + y.]

Theoretical answers

X ∼ N (40, 25) and Y ∼ N (100, 64)

Let W be the total amount of liquid dispensed in a “flat white”, therefore W = X + Y .

W ∼ N (140, 89) (the sum of independent normal random variables is normal)

P (W > 150) = 1 − P (W ≤ 150)


= 1 − pnorm(150, 140, sqrt(89))
= 0.14457

11.1.2 Exponential Distribution


Example 11.2 Sampling from the Exponential Distribution.

Suppose the number of vehicles that are owned by university students has an exponential
distribution with λ = 2.

• What is the expected number of vehicles owned by a randomly selected student?
What is the variance?

• A particular university paper has 36 students. What is the expected number of
vehicles owned by a student in this class? What is the standard deviation?

• A particular university paper has 36 students. What is the expected number of
vehicles owned by all students in this class? What is the variance?

Explore with simulation

# Set parameters for exponential distribution


lam <- 2
n <- 36
numSims <- 1e4
x <- replicate(numSims, rexp(n, lam))
cat("lambda =", lam)

## lambda = 2

cat("n =", n)

## n = 36

# Plot x
cat("E(X) =", mean(x))

## E(X) = 0.50167

cat("SD(X) =", sd(x))

## SD(X) = 0.50243



hist(x, prob=TRUE)
curve(dexp(x, lam), col="red", add=TRUE)

# Plot sample means


hist(colMeans(x), prob=TRUE, ylim=c(0, 6))

#Add Normal Distribution


EXbar <- 1/lam
SDXbar <- (1/lam)/sqrt(n)
curve(dnorm(x, EXbar, SDXbar), col="red", add=TRUE)
cat("E(Xbar) =", EXbar, ", SD(Xbar)=", SDXbar)

## E(Xbar) = 0.5 , SD(Xbar)= 0.083333

# Plot sample sums


hist(colSums(x), prob=TRUE, ylim=c(0, .16))

#Add Normal Distribution


EY <- (1/lam)*n
SDY <- (1/lam)*sqrt(n)
cat("E(Y) =", EY, ", SD(Y)=", SDY)

## E(Y) = 18 , SD(Y)= 3

curve(dnorm(x, EY, SDY), col="red", add=TRUE)

[Figure: histogram of x overlaid with the Exp(2) density; histogram of colMeans(x) overlaid with the approximating normal density; histogram of colSums(x) overlaid with the approximating normal density.]

Theoretical answers

X ∼ Exp(2)

• What is the expected number of vehicles owned by a randomly selected student?
What is the standard deviation?

E[X] = µ = 1/λ = 0.5 vehicles
STD[X] = σ = 1/λ = 0.5 vehicles
VAR[X] = σ² = 1/λ² = 0.25 vehicles²

• A particular university paper has 36 students. What is the expected number of
vehicles owned by students in this class? What is the variance?



E[X̄] = E[X] = µ = 0.5 vehicles

STD[X̄] = σ/√n = 0.5/√36 = 0.5/6 = 1/12 = 0.0833 vehicles

• A particular university paper has 36 students. What is the expected number of
vehicles owned by all students in this class? What is the standard deviation?

Let Y = \sum_{i=1}^{n} X_i denote the total number of vehicles owned by the students.

E[Y] = nµ = 36 × 0.5 = 18 vehicles

STD[Y] = σ√n = 0.5√36 = 0.5 × 6 = 3 vehicles



11.2 Sum of Random Variables
11.2.1 Sum of Independent Random Variables

Let X_i, i = 1, . . . , n, be independent random variables and define:

Y = a_1 X_1 + a_2 X_2 + · · · + a_n X_n

Then:

E[Y] = a_1 E[X_1] + a_2 E[X_2] + · · · + a_n E[X_n]

VAR[Y] = a_1^2 VAR[X_1] + a_2^2 VAR[X_2] + · · · + a_n^2 VAR[X_n]
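These formulas can be checked by simulation. The R sketch below uses two illustrative distributions (a Poisson and an exponential, chosen here only for demonstration) with weights a_1 = 2 and a_2 = −1:

```r
# Simulation check of E[Y] and VAR[Y] for Y = 2*X1 + (-1)*X2
set.seed(1)
n  <- 1e5
x1 <- rpois(n, 3)    # X1 with E[X1] = 3, VAR[X1] = 3
x2 <- rexp(n, 2)     # X2 with E[X2] = 1/2, VAR[X2] = 1/4
y  <- 2 * x1 - x2

c(theory = 2 * 3 + (-1) * (1/2),       simulated = mean(y))   # E[Y] = 5.5
c(theory = 2^2 * 3 + (-1)^2 * (1/4),   simulated = var(y))    # VAR[Y] = 12.25
```

The simulated mean and variance should agree with the theoretical values 5.5 and 12.25 to within simulation error.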

Example 11.3 Let W denote the number of faults observed each week on a particular
printer. The probability mass function of W is:
w      0     1     2
p(w)   0.8   0.15  0.05

1. Is W a discrete or continuous random variable?

2. Find E[W ].

3. Find VAR[W ].



4. Let the number of faults that occur in 1 year be defined as Y = \sum_{i=1}^{52} W_i, where
the W_i are independent and identically distributed (i.i.d.) random variables with mean
E[W] and variance VAR[W]. Find E[Y] and STD[Y].

11.2.2 Sum of Independent Normal Random Variables

The sum of independent normal random variables is also normally distributed.

In particular:
Let X_i ∼ N(µ_i, σ_i²), i = 1, 2, . . . , n, be independent random variables and define

Y = X_1 + X_2 + · · · + X_n

It can be shown that:

Y ∼ N(µ_1 + µ_2 + · · · + µ_n, σ_1² + σ_2² + · · · + σ_n²)

E[Y] = \sum_{i=1}^{n} µ_i \qquad VAR[Y] = \sum_{i=1}^{n} σ_i²



Example 11.4 Suppose that the height of 10 year old children is normally distributed
with a mean of 140cm and a standard deviation of 5cm. A rugby field is 100m long. If
72 randomly selected 10-year olds were to lie end-to-end along a rugby field, what is the
probability that their combined height will be more than 100 metres?

Verify with simulation

# Define X
E_X <- 1.40
STD_X <- .05

# Define Y
n <- 72
E_Y <- n*E_X
VAR_Y <- n*STD_X^2
STD_Y <- sqrt(VAR_Y)
E_Y; VAR_Y; STD_Y

## [1] 100.8
## [1] 0.18
## [1] 0.42426

# Simulate X and Y
x <- replicate(1e4, rnorm(n, E_X, STD_X))
y <- colSums(x)

# Inspect simulated values


hist(x, prob=TRUE)
curve(dnorm(x, E_X, STD_X), col="red", add=TRUE)

hist(y, prob=TRUE)
curve(dnorm(x, E_Y, STD_Y), col="red", add=TRUE)



[Figures: histograms of the simulated x values and y values, each with the corresponding normal density curve (red) overlaid]

# P(Y > 100)


# Compute actual probability
1-pnorm(100, E_Y, STD_Y)

## [1] 0.97033

# Estimate probability using simulation


prop.table(table(y > 100))

##
## FALSE TRUE
## 0.0276 0.9724



11.3 Central Limit Theorem
11.3.1 Central Limit Theorem for Means and Sums

 Let X₁, X₂, . . . , Xₙ be a random sample with E(Xᵢ) = µ and STD(Xᵢ) = σ
for all i.

The central limit theorem states that for sufficiently large n,

(a) the sum X₁ + X₂ + · · · + Xₙ has an approximate normal distribution
    with mean nµ and standard deviation σ√n.

(b) the sample mean X̄ = (Σᵢ₌₁ⁿ Xᵢ)/n has an approximate normal distribution
    with mean µ and standard deviation σ/√n.

This property was demonstrated using the student vehicles example.
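A quick additional check: the theorem holds for any distribution with finite variance. The sketch below uses the (arbitrarily chosen, skewed) Exp(1) distribution, for which µ = 1 and σ = 1:

```r
# CLT sketch: sample means of n = 50 draws from a skewed Exp(1) distribution
set.seed(2)
n <- 50
xbar <- replicate(1e4, mean(rexp(n, rate = 1)))
mean(xbar)   # close to mu = 1
sd(xbar)     # close to sigma/sqrt(n) = 1/sqrt(50)
hist(xbar, prob = TRUE)   # approximately normal despite the skewed parent
```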



11.3.2 Normal Approximation of Binomial Distribution

 Let X ∼ Bin(n, p), and Y ∼ N (np, np(1 − p)), then for suitable n and p:

P (a ≤ X ≤ b) ≈ P (a − 0.5 < Y < b + 0.5)

• For large values of n and values of p that are not extreme, X will have an
  approximate normal distribution with mean µ = np and standard deviation
  σ = √(np(1 − p)).

• The approximation is improved by using a continuity correction, to account for
  the fact that we are approximating a discrete distribution by a continuous one.

• Various rules are suggested for the required values of n and p. A good rule of thumb
  is np > 5 and n(1 − p) > 5.
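The effect of the continuity correction can be seen with a small hypothetical case, X ∼ Bin(20, 0.4):

```r
# Continuity correction sketch for P(6 <= X <= 10), X ~ Bin(20, 0.4)
n <- 20; p <- 0.4
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
exact <- sum(dbinom(6:10, n, p))                          # exact binomial probability
with_cc <- pnorm(10.5, mu, sigma) - pnorm(5.5, mu, sigma) # normal approx., with correction
without_cc <- pnorm(10, mu, sigma) - pnorm(6, mu, sigma)  # normal approx., without
exact; with_cc; without_cc
```

The corrected approximation lands much closer to the exact probability than the uncorrected one.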

CLT and the Normal Approximation to the Binomial


The approximation of W ∼ Bin(n, p) by a normal distribution with µ = np and
σ = √(np(1 − p)) is an application of the central limit theorem.

Recall that the Binomial distribution can be defined as follows:

• Let X ∈ {0, 1} represent the outcome of a Bernoulli trial.

• X has mean p and standard deviation √(p(1 − p)).

Define W = Σᵢ₌₁ⁿ Xᵢ, where the Xᵢ are i.i.d. Bernoulli random variables. Then
W ∼ Bin(n, p). Since W is the sum X₁ + X₂ + · · · + Xₙ, W has an approximate
normal distribution with mean np and standard deviation √(np(1 − p)).



Example 11.5 Let X ∼ Bin(1000, 0.05), and Y ∼ N (50, 47.5).
Use the normal approximation to estimate P (40 ≤ X ≤ 60).

P(40 ≤ X ≤ 60) ≈ P(40 − 0.5 < Y < 60 + 0.5)
              = P(39.5 < Y < 60.5)
              = P(Y < 60.5) − P(Y < 39.5)
              = P(Z < (60.5 − 50)/√47.5) − P(Z < (39.5 − 50)/√47.5)
              = P(Z < 1.5235) − P(Z < −1.5235)
              = 0.872

n <- 1000; p <- 0.05


mu <- n*p
sigma <- sqrt(n*p*(1-p))
mu; sigma

## [1] 50
## [1] 6.892

# Compute probability using pbinom (exact probability)


pbinom(60, n, p)-pbinom(39, n, p)

## [1] 0.87312

# Compute probability using dbinom (exact probability)


sum(dbinom(40:60, n, p))

## [1] 0.87312

# Compute probability using pnorm (approximate probability)


pnorm(60.5, mu, sigma) - pnorm(39.5, mu, sigma)

## [1] 0.87237

# Compute probability using pnorm (approximate probability)


# Without continuity correction
pnorm(60, mu, sigma) - pnorm(40, mu, sigma)

## [1] 0.85321



11.4 CLT and Markov Chains
Example 11.6 Recall the law firm example from chapter 10.

                  Junior  Senior  Partner  Leave as NP  Leave as P
    Junior      [  .80     .15     0         .05           0   ]
    Senior      [  0       .70     .20       .10           0   ]
P = Partner     [  0       0       .95       0             .05 ]
    Leave as NP [  0       0       0         1             0   ]
    Leave as P  [  0       0       0         0             1   ]

        [ 1 0 0 ]   [ 0.8  0.15  0    ]   [ 0.2  −0.15   0    ]
I − Q = [ 0 1 0 ] − [ 0    0.7   0.2  ] = [ 0     0.3   −0.2  ]
        [ 0 0 1 ]   [ 0    0     0.95 ]   [ 0     0      0.05 ]

Find the inverse, (I − Q)⁻¹ (using Gauss-Jordan or solve() in R).

                 [ 5  2.5   10   ]
N = (I − Q)⁻¹ =  [ 0  10/3  40/3 ]
                 [ 0  0     20   ]
See R code in notes for details.
P <- matrix(c(
0.8, 0.15, 0, 0.05, 0,
0, 0.7, 0.2, 0.1, 0,
0, 0, 0.95, 0, 0.05,
0, 0, 0, 1, 0,
0, 0, 0, 0, 1),
nrow=5, ncol=5, byrow=TRUE)

#Check rows sum to 1


abs(rowSums(P)-1) < 1e-10

## [1] TRUE TRUE TRUE TRUE TRUE

S <- c("Junior", "Senior", "Partner", "LeaveNP", "LeaveP")


dimnames(P) <- list(S, S)
P

## Junior Senior Partner LeaveNP LeaveP


## Junior 0.8 0.15 0.00 0.05 0.00
## Senior 0.0 0.70 0.20 0.10 0.00
## Partner 0.0 0.00 0.95 0.00 0.05
## LeaveNP 0.0 0.00 0.00 1.00 0.00
## LeaveP 0.0 0.00 0.00 0.00 1.00

absorbStates <- S[4:5]


transientStates <- S[1:3]
numTransStates <- length(transientStates)

Q <- P[transientStates, transientStates]


R <- P[transientStates, absorbStates]



I <- diag(numTransStates)

N <- solve(I-Q)
N

## Junior Senior Partner


## Junior 5 2.5000 10.000
## Senior 0 3.3333 13.333
## Partner 0 0.0000 20.000

expTimeTillAbsorb <- N%*%rep(1, dim(N)[1])


varTimeTillAbsorb <- (2*N - I)%*%expTimeTillAbsorb - expTimeTillAbsorb^2

(Note: computation of the variance of the time till absorption is not assessed, but is
included here for completeness.)

Time Until Absorption


Average Time Until Absorption
What is the average length of time that a newly hired junior lawyer spends
working for the firm?

Let T denote the time that a junior lawyer spends working for the firm. T is a discrete
random variable. We seek E[T ].

E[T] = N₁,₁ + N₁,₂ + N₁,₃

where Nᵢ,ⱼ is the average amount of time that a lawyer in position i will spend in position
j before leaving the firm.

Expected time junior lawyer spends with firm = N₁,₁ + N₁,₂ + N₁,₃
                                             = 5 + 2.5 + 10
                                             = 17.5 years
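This row sum can be checked directly in R; the snippet rebuilds Q and N (matching the matrices defined earlier) so it runs on its own:

```r
# Check E[T] = N[1,1] + N[1,2] + N[1,3] for a newly hired junior.
# Q is the transient part of the law firm transition matrix.
Q <- matrix(c(0.80, 0.15, 0.00,
              0.00, 0.70, 0.20,
              0.00, 0.00, 0.95), nrow = 3, byrow = TRUE)
N <- solve(diag(3) - Q)   # fundamental matrix
sum(N[1, ])               # expected total time at the firm, in years
```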

Distribution of Time Until Absorption


We have computed the expected time E[T ] a junior lawyer spends with the firm. What
is the distribution of T ?

One way to investigate this is by simulating this Markov chain repeatedly and inspecting
the results.
Define values and compute theoretical answers



## Law firm simulation
P <- matrix(c(
0.8, 0.15, 0, 0.05, 0,
0, 0.7, 0.2, 0.1, 0,
0, 0, 0.95, 0, 0.05,
0, 0, 0, 1, 0,
0, 0, 0, 0, 1),
nrow=5, ncol=5, byrow=TRUE)

S <- c("Junior", "Senior", "Partner", "LeaveNP", "LeaveP")


dimnames(P) <- list(S, S)
P

#Check rows sum to 1


rowSums(P)
all.equal(rowSums(P) , rep(1, nrow(P)), check.names=FALSE, check.attributes=FALSE)

# Define states
transientStates <- S[1:3]
absorbStates <- S[4:5]
Q <- P[1:3, 1:3]
R <- P[1:3, 4:5]
I <- diag(3)

# Compute theoretical results


N <- solve(I-Q)
N
expTimeTillAbsorb <- N%*%rep(1, dim(N)[1])
varTimeTillAbsorb <- (2*N - I)%*%expTimeTillAbsorb - expTimeTillAbsorb^2

Define function for simulation

lawFirmSim <- function(numLawyers, maxYears, x0, S, P,


transientStates, absorbStates,
verbose = TRUE){

# Define dataframe to store info


law <- data.frame(lawyerNum=1:numLawyers,
positionLeftFirm=NA,
timeAtFirm=NA,
timeBecamePartner=NA,
timeAsPartner=NA
)

for(j in 1:numLawyers){
#Print i after 10% of lawyers
if(j %% (numLawyers/10)==0 & verbose) cat("Sim", j)

#Define state vector


x <- numeric(maxYears+1)*NA
names(x) <- 0:maxYears

# Set initial state



x["0"] <- x0
i <- 1

# Simulate the career of lawyer j


while(x[i] %in% transientStates){
currentState <- x[i]

#Identify row corresponding to current state


rowIndy <- which(currentState == S)

#Simulate next state


nextState <- sample(S, 1, prob= P[rowIndy,])
x[i+1] <- nextState

#Check if chain is in an absorbing states


if(nextState =="Partner" && currentState !="Partner"){
law[j, "timeBecamePartner"] <- i
}
else if(nextState %in% absorbStates){

#Record time and state when lawyer j leaves


law[j, "timeAtFirm"] <- i
law[j,"positionLeftFirm"] <- nextState

break;
}#END ELSE
i <- i+1

}#END WHILE

}#END j

# Compute time as partner


law[,"timeAsPartner"] <- law[,"timeAtFirm"] - law[,"timeBecamePartner"]

return(list(law=law, x = x))
}# END lawFirmSim

Run simulation
#Run Simulation
set.seed(123456789)
num_lawyers <-1000
max_years <- 500

# Define initial position, e.g. lawyer starts as a Junior


x_0 <- "Junior"
law_data <- lawFirmSim(numLawyers = num_lawyers, maxYears = max_years,
x0 = x_0,
P = P,
S = S,
transientStates = transientStates,
absorbStates = absorbStates,



verbose = TRUE)

# Inspect careers for first few lawyers


head(law_data$law)
tail(law_data$law)

Inspect data
# Inspect career of lawyer j=numLawyers
lengthOfCareer_finalLawyer <- min(which(is.na(law_data$x)))-2
law_data$x[lengthOfCareer_finalLawyer+2] <- law_data$x[lengthOfCareer_finalLawyer+1] #for nicer plot
law_data$x <- na.omit(law_data$x)
plot(names(law_data$x), factor(law_data$x, levels=S),
type="s", xlab="Years", ylab="State",
yaxt="n", ylim=c(1, length(S)))
axis(2, labels=S, at=1:length(S))
[Figure: simulated career of one lawyer — state (Junior, Senior, Partner, LeaveNP, LeaveP) plotted against years]

# Compare theoretical results to simulated results for sample of size, numLawyers -----
#################################
## How long does a junior spend at the firm? ----
#Theoretical Answer
expTimeTillAbsorb["Junior",]
varTimeTillAbsorb["Junior",]
#Simulated Answer
mean(law_data$law$timeAtFirm)
var(law_data$law$timeAtFirm)

#Inspect distribution of time a junior spends at the firm


hist(law_data$law$timeAtFirm, xlab="Years",
main="Time spent at firm by Junior")

#################################
## What is the probability that a junior lawyer will be promoted to partner before leaving? ----
#Theoretical Answer
(N%*%R)["Junior", "LeaveP"]



#Simulated Answer
prop.table(table(law_data$law$positionLeftFirm))['LeaveP']

#################################
## Average length of time a partner spends with the firm. -----

#Theoretical Answer
N["Partner", "Partner"]
expTimeTillAbsorb["Partner",]
varTimeTillAbsorb["Partner",]

#Simulated Answer
mean(law_data$law$timeAsPartner, na.rm=TRUE)
var(law_data$law$timeAsPartner, na.rm=TRUE)

#Inspect distribution of time a partner spends at the firm


hist(law_data$law$timeAsPartner, xlab="Years",
main="Time spent at firm by Partner")
summary(law_data$law$timeAsPartner)

#################################

[Figures: histograms of "Time spent at firm by Junior" and "Time spent at firm by Partner", in years]



Distribution of Average Time Until Absorption
Simulation allows us to estimate E[T]. Each time we repeat the simulation we obtain a
different estimate, because the sample mean T̄ is itself a random variable.

R demo

Discussion Is T̄ a discrete or continuous random variable?

Discussion How can we explore the distribution of T̄?



#Run Simulation
sd <- 29038423
set.seed(sd)
num_lawyers <- 100 # increase this number
max_years <- 500
num_reps <- 30 # increase this number

# Define dataframe
meanTime <- data.frame(firmNum=1:num_reps,
meanTimeAtFirm=NA,
meanTimeAsPartner=NA)

for(k in 1:num_reps){
#Print k after 10% of cohorts
if(k %% (num_reps/10)==0) cat("Firm", k, "\n")
law_data <- lawFirmSim(numLawyers = num_lawyers, maxYears = max_years,
x0 = x_0,
P = P,
S = S,
transientStates = transientStates,
absorbStates = absorbStates, verbose = FALSE)
# Compute mean time at firm and as partner for cohort k
meanTime$meanTimeAtFirm[k] <- mean(law_data$law[,"timeAtFirm"], na.rm=TRUE)
meanTime$meanTimeAsPartner[k] <- mean(law_data$law[,"timeAsPartner"], na.rm=TRUE)
}#END k

save(P, S, num_reps, num_lawyers, meanTime, sd, law_data,


file = paste0("STAT600_week11_lawFirm_CLT_sim_", num_lawyers, "_", num_reps, ".Rdata"))

Analyse results

##################
# How long does a junior spend at the firm? -----

# T = time junior spends at firm


# E(T) and Var(T)
expTimeTillAbsorb["Junior",]; varTimeTillAbsorb["Junior",]

# Distribution of "Tbar"
hist(meanTime$meanTimeAtFirm, prob=TRUE,
main=substitute(
paste(bar(T), " ~ N(",
mu, "=", m,
", ", sigma^2, "=", s2, ") ",
", n =", n,
" lawyers, n_reps = ", nc),
list(m = expTimeTillAbsorb["Junior",],
s2 = round(varTimeTillAbsorb["Junior",]/num_lawyers, 3),
n = num_lawyers,
nc = num_reps)),
xlab="Years")



# By CLT, Tbar ~ N(E(T), Var(T)/n)
curve(dnorm(x, expTimeTillAbsorb["Junior",],
sqrt(varTimeTillAbsorb["Junior",]/num_lawyers)),
col="red", add=TRUE)
##################

[Figure: histogram of the simulated sample means with fitted curve T̄ ∼ N(µ = 17.5, σ² = 3.346); n = 100 lawyers, n_reps = 30]

This example has a small number of lawyers and a small number of firms/repetitions.
We will now explore what happens as both of these values increase.

load("STAT600_week11_lawFirm_CLT_sim_100_1000.Rdata")
head(meanTime$meanTimeAtFirm)

## [1] 18.92 17.91 15.06 17.92 16.88 19.35

hist(meanTime$meanTimeAtFirm, main=paste("Mean time at firm over n =",


num_lawyers, "lawyers, \n (number of repetitions = ",
num_reps, ")"), xlim=c(10, 25))



[Figure: histogram of mean time at firm over n = 100 lawyers (number of repetitions = 1000)]

By the central limit theorem the sample mean T̄ is approximately normally distributed.

hist(meanTime$meanTimeAtFirm, prob=TRUE,
main=paste("Mean time at firm over n =",
num_lawyers, "lawyers, \n (number of repetitions = ",
num_reps, ")"),
xlab="Years", ylim=c(0,0.4), xlim=c(10, 25))

# Plot normal curve


curve(dnorm(x, expTimeTillAbsorb[1],
sqrt(varTimeTillAbsorb[1]/num_lawyers)),
from = 10, to = 25, col="red", add = TRUE)

[Figure: histogram of mean time at firm over n = 100 lawyers (number of repetitions = 1000), with the normal density curve overlaid]

hist(meanTime$meanTimeAtFirm, prob=TRUE,
main=paste("Mean time at firm over n =",
num_lawyers, "lawyers, \n (number of repetitions = ",



num_reps, ")"),
xlab="Years", ylim=c(0,0.8), xlim=c(10, 25))

# Plot normal curve


curve(dnorm(x, expTimeTillAbsorb[1],
sqrt(varTimeTillAbsorb[1]/num_lawyers)),
from = 10, to = 25, col="red", add = TRUE)

[Figure: histogram of mean time at firm over n = 1000 lawyers (number of repetitions = 1000), with the normal density curve overlaid]

Example 11.7 Law Firm Example

Let Ti denote the time a newly appointed junior spends with the firm.

Let T₁, T₂, T₃, . . . , Tₙ be independent and identically distributed random variables
with mean E[Tᵢ] = µ and VAR[Tᵢ] = σ².

Define T̄ₙ = (1/n) Σᵢ₌₁ⁿ Tᵢ as the mean time that a group of n newly appointed juniors
spend with the firm.

By the central limit theorem, for large n, T̄ₙ is approximately N(µ, σ²/n).


In this example µ = 17.5 and σ² = 334.583.

• For a sample of size n = 100 employees, the sample mean time at the firm is
approximately normally distributed with:

– mean E[T̄ ] = µ = 17.5,


– variance VAR[T̄] = σ²/100 = 3.3458 and
– standard deviation STD[T̄] = 1.8292.

• For a sample of size n = 1000 employees, the sample mean time at the firm is
approximately normally distributed with:

– mean E[T̄ ] = µ = 17.5,



– variance VAR[T̄] = σ²/1000 = 0.3346 and
– standard deviation STD[T̄] = 0.5784.

As n increases, VAR[T̄] decreases towards zero.

load("STAT600_week11_lawFirm_CLT_sim_100_1000.Rdata")

[Figure: histogram of mean time at firm over n = 100 lawyers (number of repetitions = 1000), with the fitted normal density curve]



11.5 Bivariate distributions
11.5.1 Discrete random variables

 Let X and Y be two discrete random variables. Then, the joint probability
mass function of X and Y is:

p(x, y) = P (X = x, Y = y)

where (X = x, Y = y) denotes the intersection of events X = x and Y = y.

 The marginal probability mass functions of X and Y are:


pX(x) = P(X = x) = Σ_y P(X = x, Y = y)

pY(y) = P(Y = y) = Σ_x P(X = x, Y = y)

 The conditional probability mass function of X given Y = y is:

P(X = x|Y = y) = p(x, y)/pY(y),   for all y such that pY(y) > 0

and similarly, of Y given X = x is:

P(Y = y|X = x) = p(x, y)/pX(x),   for all x such that pX(x) > 0

 Discrete random variables, X and Y , are independent if and only if:

p(x, y) = pX (x)pY (y) for all x, y (11.1)



 Let X and Y be two random variables. Then, the joint distribution
function of X and Y is:

F (x, y) = P ((X ≤ x) ∧ (Y ≤ y))

Example 11.8
A manufacturing company is concerned about the number of defects on its production
lines. Let:
• X denote the number of defects per hour on production line 1
• Y denote the number of defects per hour on production line 2
The two production lines operate independently, so it can be assumed that the random
variables X and Y are independent.
x 0 1 2
p(x) 0.5 0.3 0.2

F (x) = P (X ≤ x) 0.5 0.8 1


y 0 1
p(y) 0.9 0.1

F (y) = P (Y ≤ y) 0.9 1

1. What is the probability that there are no defects in 1 hour?


p(0, 0) = pX (0)pY (0)
= (0.5)(0.9)
= 0.45

The joint probability mass function can be represented in a table:

p(x, y)   y = 0   y = 1
x = 0     0.45    0.05
x = 1     0.27    0.03
x = 2     0.18    0.02
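The joint pmf table above can be built in R from the marginals, using independence:

```r
# Joint pmf for Example 11.8: p(x, y) = pX(x) * pY(y) by independence
p_x <- c(0.5, 0.3, 0.2)   # marginal pmf of X (x = 0, 1, 2)
p_y <- c(0.9, 0.1)        # marginal pmf of Y (y = 0, 1)
joint <- outer(p_x, p_y)  # all products pX(x)*pY(y)
dimnames(joint) <- list(x = 0:2, y = 0:1)
joint
joint["0", "0"]           # P(X = 0, Y = 0) = 0.45
```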

2. What is the probability that both production lines produce exactly 1 defect each?



3. What is the probability that there are 2 or more defects in 1 hour across both
machines?

4. What is the probability that there are 2 defects on line 1 given that there is 1 defect
on line 2?

5. Find the marginal probability mass function of X.

6. Find the marginal probability mass function of Y .



7. Specify the joint distribution function of X and Y



11.5.2 Continuous random variables

 A function f(x, y) is the joint probability density function for the
continuous variables X and Y if:

• f (x, y) ≥ 0 for all x, y ∈ R


• ∫₋∞^∞ ∫₋∞^∞ f(x, y) dx dy = 1

• For any region B,

  P((X, Y) ∈ B) = ∫∫_{(x,y)∈B} f(x, y) dx dy


Let X and Y be two random variables. Then, the joint distribution function
of X and Y is:
P ((X ≤ x) ∧ (Y ≤ y)) = F (x, y)

 Let X and Y be two jointly continuous random variables with joint cdf F and
joint pdf f; then X and Y are continuous random variables with marginal pdfs:

fX(x) = ∫₋∞^∞ f(x, y) dy,   x ∈ R

fY(y) = ∫₋∞^∞ f(x, y) dx,   y ∈ R
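These marginal integrals can also be evaluated numerically. A sketch using a hypothetical joint pdf f(x, y) = x + y on the unit square, for which the marginal is fX(x) = x + 1/2:

```r
# Numerical marginal pdf sketch (assumed joint pdf: f(x, y) = x + y on [0,1]^2)
f <- function(x, y) x + y
fX <- function(x) integrate(function(y) f(x, y), lower = 0, upper = 1)$value
fX(0.3)                               # analytic answer: 0.3 + 0.5 = 0.8
integrate(Vectorize(fX), 0, 1)$value  # total probability, should be 1
```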

Appendix A

Useful Formulae

Mathematical

Binomial Theorem:           Σₓ₌₀ⁿ C(n,x) aˣ bⁿ⁻ˣ = (a + b)ⁿ,  n ∈ Z⁺, where C(n,x) = n!/(x!(n − x)!)

                            Σₓ₌₀ⁿ C(n,x) aˣ = (1 + a)ⁿ,  n ∈ Z⁺

Exponential Series:         Σₙ₌₀^∞ xⁿ/n! = eˣ

Geometric Series:           Σₓ₌₀^(n−1) arˣ = a(1 − rⁿ)/(1 − r),  for r ≠ 1

Infinite Geometric Series:  Σₓ₌₀^∞ arˣ = a/(1 − r),  for |r| < 1

Arithmetic Series:          Σₓ₌₁ⁿ [a + (x − 1)b] = n[2a + (n − 1)b]/2

Logarithmic Series:         Σₙ₌₁^∞ (−1)ⁿ⁺¹ xⁿ/n = ln(1 + x),  −1 < x ≤ 1

Taylor Series:              f(x) = f(a) + (x − a)f′(a) + (x − a)² f″(a)/2! + · · ·
                            for small |x − a|

Gamma function:             Γ(α) = ∫₀^∞ y^(α−1) e^(−y) dy
                            Γ(1) = 1,  Γ(α) = (α − 1)Γ(α − 1) for α > 1
                            Γ(n) = (n − 1)! for any positive integer n
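R's gamma() and factorial() can be used to check the Gamma function identities numerically:

```r
# Gamma function identities, checked numerically in R
gamma(1)                      # Gamma(1) = 1
c(gamma(5), factorial(4))     # Gamma(n) = (n-1)!, both give 24
a <- 3.7
gamma(a) / gamma(a - 1)       # recursion: Gamma(a)/Gamma(a-1) = a - 1
```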



Differentiation

d/dx (xⁿ) = n xⁿ⁻¹
d/dx (f(x) · g(x)) = f′(x)g(x) + f(x)g′(x)
d/dx (f(x)/g(x)) = [f′(x)g(x) − f(x)g′(x)] / g(x)²
d/dx f(g(x)) = f′(g(x)) g′(x)
d/dx (ln x) = 1/x
d/dx ln(g(x)) = g′(x)/g(x)
d/dx (eˣ) = eˣ
d/dx e^(f(x)) = f′(x) e^(f(x))

Integration

∫ a dx = ax + C
∫ a xⁿ dx = a xⁿ⁺¹/(n + 1) + C,  n ≠ −1
∫ (1/x) dx = ln|x| + C
∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx + C
∫ (f(x) − g(x)) dx = ∫ f(x) dx − ∫ g(x) dx + C
∫ eˣ dx = eˣ + C
∫ e^(f(x)) dx = e^(f(x))/f′(x) + C  (valid only when f(x) is linear)
∫ u dv = uv − ∫ v du + C
∫ u v dx = u ∫ v dx − ∫ (u′ ∫ v dx) dx + C



A.1 Probability

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A|B) = P(A ∩ B)/P(B)

P(A′) = 1 − P(A)

P(A) = P(A|B)P(B) + P(A|B′)P(B′)

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)]

P(A ∩ B|C) = P(A|C)P(B|C)  (if A and B are conditionally independent given C)

Related results:

• If Y is a function of a discrete random variable X, whereby Y = Q(X), then:

  E(Y) = E(Q(X)) = Σₓ Q(x)p(x)

• If Y = a + bX, then E(Y) = a + bE[X].

• If Y = a + bX, then VAR(Y) = b²VAR[X].

• If X is a random variable and µ = E[X], then

  VAR[X] = E[X²] − µ²

• If Xᵢ, i = 1, . . . , n are independent random variables and W = Σᵢ₌₁ⁿ aᵢXᵢ, then:

  E[W] = Σᵢ₌₁ⁿ aᵢE[Xᵢ]   and   VAR[W] = Σᵢ₌₁ⁿ aᵢ²VAR[Xᵢ]
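The last result can be checked by simulation, with hypothetical choices X₁ ∼ N(1, 2²), X₂ ∼ N(−3, 1) and weights a₁ = 2, a₂ = −1:

```r
# Simulation check of E[W] and VAR[W] for W = a1*X1 + a2*X2, independent Xi
set.seed(3)
a1 <- 2; a2 <- -1
x1 <- rnorm(1e5, mean = 1, sd = 2)
x2 <- rnorm(1e5, mean = -3, sd = 1)
w <- a1 * x1 + a2 * x2
mean(w)   # close to a1*1 + a2*(-3) = 5
var(w)    # close to a1^2*4 + a2^2*1 = 17
```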



Discrete distributions

Any discrete random variable X
  pmf: p(x) = P(X = x)
  cdf: F(x) = P(X ≤ x) = Σ_{k ≤ x} P(X = k)
  E[X] = Σₓ x p(x)
  VAR[X] = E[(X − E[X])²] = E[X²] − (E[X])², where E[Xⁿ] = Σₓ xⁿ p(x)

Binomial, X ∼ Bin(n, p); n = 1, 2, . . ., 0 ≤ p ≤ 1
  P(X = x) = C(n,x) pˣ(1 − p)ⁿ⁻ˣ, x = 0, 1, . . . , n, where C(n,x) = n!/(x!(n − x)!)
  E[X] = np;  VAR[X] = np(1 − p)

Geometric, X ∼ Geometric(p); 0 ≤ p ≤ 1
  P(X = x) = p(1 − p)ˣ, x = 0, 1, 2, . . .
  F(x) = P(X ≤ x) = 1 − (1 − p)ˣ⁺¹, x = 0, 1, 2, . . .
  E[X] = (1 − p)/p;  VAR[X] = (1 − p)/p²

Negative Binomial (number of failures), X ∼ Neg Bin(r, p); 0 ≤ p ≤ 1, r > 0
  P(X = x) = C(x + r − 1, r − 1) pʳ(1 − p)ˣ for x = 0, 1, . . .; 0 otherwise
  E[X] = r(1 − p)/p;  VAR[X] = r(1 − p)/p²

Negative Binomial (number of trials), Y ∼ Neg Bin(r, p); 0 ≤ p ≤ 1, r > 0
  P(Y = y) = C(y − 1, r − 1) pʳ(1 − p)ʸ⁻ʳ for y = r, r + 1, . . .; 0 otherwise
  E[Y] = r/p;  VAR[Y] = r(1 − p)/p²

Hypergeometric, X ∼ Hypergeometric; N = 0, 1, . . .; k = 0, 1, . . . , N; n = 0, 1, . . . , N
  P(X = x) = C(k, x) C(N − k, n − x) / C(N, n) for x = 0, 1, . . . , k,
  with C(a, b) = 0 when b > a; 0 otherwise
  E[X] = n(k/N);  VAR[X] = n (k/N)(1 − k/N)(N − n)/(N − 1)

Poisson, X ∼ Pois(λ)
  P(X = k) = λᵏ e^(−λ)/k!, k = 0, 1, 2, . . .
  E[X] = λ;  VAR[X] = λ
Continuous distributions

Any continuous random variable X
  pdf: f(x)
  cdf: F(x) = P(X ≤ x) = ∫₋∞ˣ f(t) dt
  E[X] = ∫₋∞^∞ x f(x) dx;  E[Xⁿ] = ∫₋∞^∞ xⁿ f(x) dx
  VAR[X] = E[(X − E[X])²] = E[X²] − (E[X])²
  Special case (non-negative RV): E[X] = ∫₀^∞ P(X > x) dx

Beta, X ∼ Beta(α, β); α > 0, β > 0
  f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1)(1 − x)^(β−1) for 0 ≤ x ≤ 1; 0 elsewhere
  E[X] = α/(α + β);  VAR[X] = αβ/[(α + β)²(α + β + 1)]

Exponential, X ∼ Exp(λ); λ > 0
  f(x) = λe^(−λx), x > 0
  F(x) = 1 − e^(−λx), x ≥ 0
  E[X] = 1/λ;  VAR[X] = 1/λ²

Gamma, X ∼ Gamma(α, λ); α, λ > 0
  f(x) = x^(α−1) λ^α e^(−λx)/Γ(α), x ≥ 0
  E[X] = α/λ;  VAR[X] = α/λ²

Normal, X ∼ N(µ, σ²); σ > 0
  f(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)), x ∈ R
  E[X] = µ;  VAR[X] = σ²

Uniform, X ∼ Unif(c, d)
  f(x) = 1/(d − c) for c ≤ x ≤ d; 0 otherwise
  F(x) = (x − c)/(d − c), c ≤ x ≤ d
  E[X] = (c + d)/2;  VAR[X] = (d − c)²/12

Weibull, X ∼ Weibull(α, θ); α > 0, θ > 0
  f(x) = (α/θ)(x/θ)^(α−1) e^(−(x/θ)^α) for x ≥ 0; 0 for x < 0
  F(x; θ, α) = 1 − e^(−(x/θ)^α) for x ≥ 0; 0 for x < 0
  E[X] = θΓ(1 + 1/α);  VAR[X] = θ²[Γ(1 + 2/α) − Γ(1 + 1/α)²]

Erlang, X ∼ Erlang(k, λ); x ≥ 0, k = 1, 2, . . ., λ > 0
  f(x) = λe^(−λx) (λx)^(k−1)/(k − 1)!
  F(x) = P(X ≤ x) = 1 − e^(−λx) Σᵢ₌₀^(k−1) (λx)ⁱ/i!
  E[X] = k/λ;  VAR[X] = k/λ²
Appendix B

Useful R commands

This document provides an overview of useful R commands.

B.1 Opening R on SECMS computers


• Start Menu > Type “R Studio”

B.2 Using R as a Calculator


+ Addition
- Subtraction
* Multiplication
/ Division
^ Powers
%*% matrix multiplication.

Examples

2+2

## [1] 4

(1+4)*(3-2)

## [1] 5

10^6

## [1] 1000000

1/10000

## [1] 0.0001

B.3 Assignments
y <- 2 y becomes 2
2 -> y 2 goes to y
y = 2 y becomes 2

B.4 Sequences

1:10

## [1] 1 2 3 4 5 6 7 8 9 10

700:690

## [1] 700 699 698 697 696 695 694 693 692 691 690

seq(1, 7, by=0.5)

## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

B.5 Logical Tests


x == y Equality (Note the ‘double’ equals!)
x >= y Greater than or equal to (similar for less than or equal ≤)
x > y Strictly greater than (similar for less than <)
which(x > y) Indices of elements meeting logical statement
x %in% y Assesses if elements of x are in y

Examples

2>5

## [1] FALSE

(1:10) > 4.4

## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [10] TRUE

x <- 1:10
table(x < 5)

##
## FALSE TRUE
## 6 4

which(x > 5)



## [1] 6 7 8 9 10

"a" %in% c("a", "b", "c")

## [1] TRUE

c(1:4, 100) %in% 1:10

## [1] TRUE TRUE TRUE TRUE FALSE

B.6 In-built datasets


There are a large number of in-built datasets. To view a list of them:
data()
library(help = "datasets")

Examples:

cars
iris

B.7 Creating data


To place a group of elements into a vector x, use the concatenate function c(), e.g.
x <- c(-1, 0.33, 104)

For a vector x:
x[n] returns the nth element.
x[-n] returns a vector with the nth element removed.
x[n:m] returns a vector consisting of elements in the range n to m.

To place a group of elements into a matrix x, use the matrix function to create an n x m
matrix:
matrix(x, nr = n, nc = m, byrow=FALSE)

For a matrix x:



x[i,j] returns the element in the ith row and jth column.
x[i,] returns a vector which consists of the elements from the ith
row.
x[,j] returns a vector which consists of the elements from the jth
column.
x[-i,] returns a matrix with the ith row removed.
x[,-j] returns a matrix with the jth column removed.
rowSums(x) returns row sums of x
colSums(x) returns column sums of x
sum(x) returns sum of all elements in x

To create a dataframe:
mydf = data.frame(name1=x, name2=y, name3=z)
                 Creates a dataframe from vectors x, y, z
mydf$name1       returns the column name1 as a vector.
mydf[,"name1"]   returns the column name1 as a vector.

Examples

x <- c(-1, 0.33, 104)


x

## [1] -1.00 0.33 104.00

x[2]

## [1] 0.33

indices =c(3,2,2)
x[indices]

## [1] 104.00 0.33 0.33



B.8 Statistical Functions
sqrt(x) Square root of x
sum(x) Sums all elements in a vector x
length(x) Length of a vector x
mean(x) Mean of the elements in the vector x
var(x) Variance of the elements in the vector x
sd(x) Standard deviation of the elements in the vector x
median(x) Median of the elements in the vector x
quantile(x) Quantiles of the elements in the vector x
summary(x) Provides a summary of most R objects
min(x) Minimum of the elements in the vector x
max(x) Maximum of the elements in the vector x
IQR(x) Interquartile range

Examples

sqrt(3)

## [1] 1.7321

sqrt(1:10)

## [1] 1.0000 1.4142 1.7321 2.0000 2.2361 2.4495 2.6458 2.8284


## [9] 3.0000 3.1623

B.9 Combinatorics
factorial(x)                  x!
sample(x, k, replace=FALSE)   Takes a sample of size k from set x without replacement
choose(n,k)                   Binomial coefficient C(n,k) = n!/(k!(n−k)!)
replicate(n, x)               Repeats expression x n times
combn(x, m)                   Returns the combinations of the elements of x of size m

Examples

vowels <- c("a", "e", "i", "o", "u")


n <- length(vowels)
r <- 2

# Choose 2 elements from 5 without replacements


combn(x=1:n, m=r)

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 1 1 1 2 2 2 3 3 4
## [2,] 2 3 4 5 3 4 5 4 5 5



# Number of ways
choose(n, r)

## [1] 10

# Simulate
sample(vowels, size=r)

## [1] "o" "u"

sample(vowels, size=r)

## [1] "o" "i"

replicate(n=4, sample(vowels, size=r))

## [,1] [,2] [,3] [,4]


## [1,] "u" "i" "i" "i"
## [2,] "e" "e" "u" "e"

#install if necessary
#install.packages('gtools')

#load library
library(gtools)
combinations(n, r, v=vowels, set=TRUE, repeats.allowed=FALSE)

## [,1] [,2]
## [1,] "a" "e"
## [2,] "a" "i"
## [3,] "a" "o"
## [4,] "a" "u"
## [5,] "e" "i"
## [6,] "e" "o"
## [7,] "e" "u"
## [8,] "i" "o"
## [9,] "i" "u"
## [10,] "o" "u"

permutations(n, r, v=vowels, set=TRUE, repeats.allowed=FALSE)

## [,1] [,2]
## [1,] "a" "e"
## [2,] "a" "i"
## [3,] "a" "o"
## [4,] "a" "u"
## [5,] "e" "a"
## [6,] "e" "i"
## [7,] "e" "o"
## [8,] "e" "u"
## [9,] "i" "a"
## [10,] "i" "e"
## [11,] "i" "o"



## [12,] "i" "u"
## [13,] "o" "a"
## [14,] "o" "e"
## [15,] "o" "i"
## [16,] "o" "u"
## [17,] "u" "a"
## [18,] "u" "e"
## [19,] "u" "i"
## [20,] "u" "o"

# Number of permutations
factorial(n)/factorial(n-r)

## [1] 20

B.10 Probability Distributions


rnorm r random number
dnorm d density/mass function
pnorm p cumulative distribution function P (X <= x)
qnorm q quantile function

Random numbers from other distributions


runif(), rnorm(), rbinom(), rpois(), rgamma(), rweibull(), sample()
Use ?rnorm to find out more.

Examples
rnorm(10) sample of 10 from the standard normal distribution with
mean=0, std deviation=1
rnorm(10,5,2) sample of 10 from the normal distribution with mean=5,
standard deviation=2
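The d/p/q functions fit together as density, cumulative distribution function, and inverse cdf, for example:

```r
# pnorm is the N(0,1) cdf; qnorm is its inverse (the quantile function)
pnorm(1.96)                        # P(Z <= 1.96), about 0.975
qnorm(0.975)                       # about 1.96
pnorm(0, mean = 5, sd = 2)         # P(X <= 0) for X ~ N(5, 2^2)
dbinom(2, size = 10, prob = 0.5)   # P(X = 2) for X ~ Bin(10, 0.5)
```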



B.11 Creating plots
plot(x,y) Plots vectors x and y
plot(x,y,pch=16, cex=3) Point type and Point size
plot(x,y,col="red") Point/Line colour
plot(x,y,type="l") Type of plot: l=line, p=points, b=both
plot(x, y, xlim=c(0,10)) x axis limits (similarly ylim)
plot(x, y, lty=3, lwd=3) Line type and line width
plot(x,y,xlab="x axis title",ylab="y axis title", main="main title")
Add axis labels and titles
hist(x) Histogram
stem(x) Stem and Leaf
pie(x) Pie chart
boxplot(x) Boxplot of x
boxplot(x~grp) Boxplot of x, split by group
barplot(x) Bar chart
abline(a,b) adds the straight line ‘y = a + bx’ to a plot.
abline(v=2) Adds a vertical line at x=2 (similarly for h=2)
points(x,y) Adds points at (x, y)
lines(x,y) Adds lines at (x,y)
stripchart(x) Strip chart
par(mfrow=c(nrows, ncols)) Create a plot with nrows x ncols subplots.

Examples

plot(cars)
plot(cars[,1], cars[,2])

x = seq(-8, 8, by=0.1)
plot(x, dnorm(x, mean=0, sd=2))
[Figures: plot(cars) scatterplot of speed vs dist; the equivalent plot of cars[,1] vs cars[,2]; and the curve of dnorm(x, mean=0, sd=2) against x]



B.12 Control Statements & Functions
If Statements
if(logical-statement){
#do something
}

if(logical-statement){
#do something
}else{
#do something else
}

Loops
for(i in 1:n){
#do something
}

while(logical-statement){
#do something
}

Examples
x <- 2
y <- 10
if(x > y){
z <- x + y
}else{
z <- x*y
}
z

## [1] 20

for(i in 1:10){
print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10



for(i in 1:10){
print(1:i)
}

## [1] 1
## [1] 1 2
## [1] 1 2 3
## [1] 1 2 3 4
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5 6
## [1] 1 2 3 4 5 6 7
## [1] 1 2 3 4 5 6 7 8
## [1] 1 2 3 4 5 6 7 8 9
## [1] 1 2 3 4 5 6 7 8 9 10

# Combining for and if


myword <- "probability"
x <- strsplit(myword, "")[[1]]
for(i in x) {
if(i %in% c("a", "e", "i", "o", "u")) print(i)
}

## [1] "o"
## [1] "a"
## [1] "i"
## [1] "i"

Custom functions
#Define function
fun <- function(x){
# do something
}

#Call function
fun(x)

Examples

getBMI <- function(height, weight){


# height in metres
# weight in kg
bmi <- weight/(height^2)
return(bmi)
}

getBMI(2.05, 80)

## [1] 19.036



my.mean <- function(x){
xbar <- sum(x)/length(x)
return(xbar)
}

my.mean(c(1,2,3))

## [1] 2

B.13 Reading and Writing Data


In R studio, the “Import Dataset” button is particularly useful. You can also use the
following commands:
read.delim(file.choose(), header=TRUE) tab delimited
read.csv(file.choose(), header=TRUE) csv file
read.csv("C:/dataset/globaldata.txt") csv file with known file
path
write.table(x, file="C:/mydata/test.txt") Write data
xx=read.table(file="C:/mydata/test.txt") Read data



Appendix C

Solutions to Selected Examples

1.7:
D: Outcomes with a head first D = {(H, H), (H, T )}
E: Outcomes with exactly one tail, E = {(H, T ), (T, H)}
1.8:
G: Outcomes which include an even number
G = {(2, H), (2, T ), (4, H), (4, T ), (6, H), (6, T )}
H : Outcomes which include exactly one tail,
H = {(1, T ), (2, T ), (3, T ), (4, T ), (5, T ), (6, T )}
1.9: G ∪ H = {(2, H), (2, T ), (4, H), (4, T ), (6, H), (6, T ), (1, T ), (3, T ), (5, T )}
1.10: Gc = {(1, H), (1, T ), (3, H), (3, T ), (5, H), (5, T )}
1.11: G ∩ H = {(2, T ), (4, T ), (6, T )}
1.14:
A : primary system is operational, A = {ooo, oon, ono, onn}
B : first generator is operational, B = {ooo, oon, noo, non}
C : second generator is operational, C = {ooo, ono, noo, nno}
1.14:

• Primary or first generator are operational, A ∪ B = {ooo, oon, ono, onn, noo, non}

• Primary and first generator are operational, A ∩ B = {ooo, oon}

• Primary or first generator are operational, but second is not, (A ∪ B) ∩ Cᶜ = {oon, onn, non}

• At least one of the systems is operational, A∪B∪C = {ooo, oon, ono, onn, noo, non, nno}

1.16: P (Ei ) = 2/3 for i = 1, 2, 3; P (A) = 2/3; P (B) = 2/3


1.18: 0.2
1.20: 120
1.21: 6
1.22: 20
1.23: 10
1.24: 1365
1.25: a. 125; b. 35

1.27: 60
1.28: 4200
1.28: 1/120
1.29: 0.399123
2.3: a. 0.624615
2.3: b. 0.172549
2.4: 2/3
2.5: b. 0.005; c. 0.02485
2.7:

• Sensitivity: 0.99

• Specificity: 0.98

• Prevalence: 0.5%

• Predictive Value: 0.199195

2.8: 1. 0.9; 0.9; 0.5; 0.9


2.8: 2. 0.9; 0.9; 0.0909; 0.4737
2.8: 3. 0.9; 0.9; 0.0099; 0.0826
2.11: No
2.12: P(A) = 0.25; P(B) = 0.5; P(AB) = 0.25; P(O) = 0
2.13: b. 0.0465; c. 0.376344
3.1: 0.25, 0.5, 1
3.2: 6/11, 9/22, 1/22
3.6: 0.5
3.7: a. 0.75, not fair; b. 0, fair
3.9: VAR[X] = 0.341, STD[X] = 0.584
3.11: 0.75 fall in this interval; (88.4, 151.6)
3.12: E[X] = 0.7; VAR[X] = 0.61; E[Y ] = 0.1; VAR[Y ] = 0.09
3.12: 5. E[C(X)] = 3.5; VAR[C(X)] = 15.25
4.3: a i. 0.201327
4.3: a ii. 0.879126
4.3: a iii. 0.3222
4.3: b. 2, 1.6
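The quoted mean 2 and variance 1.6 are consistent with X ~ Binomial(n = 10, p = 0.2) (an assumption, since the question is not restated here); on that reading, the three probabilities in part a can be reproduced with dbinom and pbinom:

```r
# Assuming X ~ Binomial(10, 0.2), consistent with E[X] = 2 and VAR[X] = 1.6
n <- 10; p <- 0.2
dbinom(3, n, p)        # P(X = 3)

## [1] 0.2013266

pbinom(3, n, p)        # P(X <= 3)

## [1] 0.8791261

1 - pbinom(2, n, p)    # P(X >= 3)

## [1] 0.3222005
```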
4.6: 0.065536
4.6: 625, 559.02
4.6: 1. $0 and $1743
4.6: 2. 0.32768
4.6: 3. 0.4096
4.6: 4. 0.4096
4.8: 2/3
4.11: 0.01024
4.15: 1. 0.007
4.15: 2. 0.00005
4.15: 3. 0.0076


4.15: 4. 0.1246
4.21: 10/21
4.22: 1. 0.969697
4.22: 2. 0.99798
5.3: 0.135335
5.3: 0.767456
5.3: 0.606531
5.4: 0.632
5.4: 0.095
5.4: 0.537
5.4: 693.147181
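The answers to 5.4 are consistent with an exponential lifetime of mean 1000, i.e. rate 1/1000 (an assumption: 0.632 = 1 − e^(−1) and 693.147 = 1000 ln 2). On that reading, pexp and qexp reproduce two of them:

```r
# Assuming T ~ Exponential(rate = 1/1000), i.e. mean lifetime 1000
rate <- 1/1000
pexp(1000, rate)    # P(T <= 1000) = 1 - exp(-1)

## [1] 0.6321206

qexp(0.5, rate)     # median lifetime, 1000 * log(2)

## [1] 693.1472
```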
6.6: 0.367879
6.6: 0.258741
6.6: 0.491212
6.6: Left
6.11: 0.5
6.11: 0.933193
6.11: 0.07302
6.11: 1 inch
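Several of the values in 6.11 are standard normal probabilities (for example, 0.933193 = Φ(1.5)), and such values are obtained in R with pnorm:

```r
pnorm(0)      # P(Z <= 0)

## [1] 0.5

pnorm(1.5)    # P(Z <= 1.5)

## [1] 0.9331928
```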
6.14: E[X] = 5.25, VAR[X] = 7.875
6.14: 0.46439
6.14: $228.38
6.20: 0.10516
6.20: 2.65868
6.25: 0.66667
6.25: 0.1875
6.26: 1.44
7.4: MTTF=1000
7.6: 2000 days
7.7: 0.31664, 0.86957
7.10: 0.13364, 1.06227
8.4:

        [  0    1    0 ]
    P = [  0    0    1 ]
        [ 1/2  1/2   0 ]

8.5: P (X2 = 0|X0 = 0) = p2 + (1 − p)2 , P (X2 = 1|X0 = 0) = 2p(1 − p)
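The answer to 8.5 is consistent with the two-state chain in which the state is kept with probability p and switched with probability 1 − p (an assumption, since the question is not restated here). Squaring that transition matrix for any particular p, say p = 0.3, confirms the formulas:

```r
# Assumed two-state chain for Example 8.5: stay with prob p, switch with prob 1 - p
p <- 0.3
P <- matrix(c(p,     1 - p,
              1 - p, p),
            nrow = 2, byrow = TRUE)
P2 <- P %*% P
P2[1, 1]    # P(X2 = 0 | X0 = 0) = p^2 + (1 - p)^2

## [1] 0.58

P2[1, 2]    # P(X2 = 1 | X0 = 0) = 2 p (1 - p)

## [1] 0.42
```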


8.6:

        [ 0.7   0   0.3   0  ]
    P = [ 0.5   0   0.5   0  ]
        [  0   0.4   0   0.6 ]
        [  0   0.2   0   0.8 ]

8.6: c. 0.523
9.2:

    P = [ 0.9  0.1 ]
        [ 0.2  0.8 ]


9.2: (P^2)_{2,1} = 0.34
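This entry of the two-step transition matrix is easily checked numerically with matrix multiplication in R:

```r
# Transition matrix for Example 9.2; %*% is R's matrix product
P <- matrix(c(0.9, 0.1,
              0.2, 0.8),
            nrow = 2, byrow = TRUE)
P2 <- P %*% P
P2[2, 1]    # (P^2)_{2,1}

## [1] 0.34
```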
9.6:

        [  1     0     0     0    0 ]
        [ 1−p    0     p     0    0 ]
    P = [  0    1−p    0     p    0 ]
        [  0     0    1−p    0    p ]
        [  0     0     0     0    1 ]

9.10: Company 1 should hire the advertising agency, as the profit with the agency is $36,600,000 compared with $34,666,667 without the agency.
10.1: S = {1, 2} where state 1 = $10 and state 2 = $25.

    P = [ 0.9   0.1  ]
        [ 0.15  0.85 ]

10.5:

                     Junior  Senior  Partner  Leave as NP  Leave as P
        Junior      [  .80     .15      0         .05          0     ]
        Senior      [   0      .70     .20        .10          0     ]
    P = Partner     [   0       0      .95         0          .05    ]
        Leave as NP [   0       0       0          1           0     ]
        Leave as P  [   0       0       0          0           1     ]
10.4.1. ?.b: (P^10)_{0,0} = 0.20066
10.8:

              0     1     2     3     4
        0 [   0     0    1/3   1/3   1/3 ]
        1 [   0     0    1/3   1/3   1/3 ]
    P = 2 [  1/3   1/3   1/3    0     0  ]
        3 [   0    1/3   1/3   1/3    0  ]
        4 [   0     0    1/3   1/3   1/3 ]

11.3: 2. 0.25
11.3: 3. 0.2875
11.3: 4. 13, 3.866523
11.4: 0.97062
11.8: 0.03
11.8: 0.23
11.8: 0.2


Bibliography

[HKM] J. J. Higgins and S. Keller-McNulty. Concepts in Probability and Stochastic
Modelling.

[Ros02] S. Ross. Probability Models for Computer Science. Harcourt Academic Press,
San Diego, CA, 2002.

[Ros13] S. Ross. A First Course in Probability. Pearson Higher Education, USA, 9th
edition, 2013.

[Ros14] S. Ross. Introduction to Probability Models. Academic Press, Boston, MA, 11th
edition, 2014.

[SY10] R.L. Scheaffer and L.J. Young. Introduction to Probability and its Applications.
Brooks/Cole, Boston, MA, 3rd edition, 2010.

[Win04] W. Winston. Operations Research: Applications and Algorithms. Brooks/Cole,
Belmont, CA, 2004.
