
Mathematics 230: Probability

2021 Version


Course Pack
Shaheena Bashir & Isaac Mulolani

This document was typeset on Tuesday 7th September, 2021.


§§ Legal stuff
• Copyright © 2021 Shaheena Bashir.

• LaTeX macro files are based on the CLP Calculus text by Joel Feldman, Andrew Rechnitzer and Elyse Yeager.

• This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike


4.0 International License. You can view a copy of the license at
http://creativecommons.org/licenses/by-nc-sa/4.0/.

• Links to the source files can be found at the text webpage.

Contents

1 Course Outline
  1.1 Course Description
  1.2 Units of Instruction
  1.3 Reference Materials
  1.4 Evaluation
  1.5 Prerequisite Content
  1.6 Detailed Syllabus

2 Combinatorics
  2.1 Randomness & Probability
  2.2 Counting Rules
    2.2.1 Why Counting Rules in Probability: Classical Definition of Probability
    2.2.2 Basic Principles of Counting
    2.2.3 Permutation
    2.2.4 Multinomial Coefficients: Permutations with Indistinguishable Objects
    2.2.5 Circular Permutation
    2.2.6 Combinations
  2.3 Home Work

3 Basic Concepts & Laws of Probability
  3.1 Some Definitions
    3.1.1 Types of Events
    3.1.2 Axioms of Probability
    3.1.3 Inclusion-Exclusion Principle
  3.2 Conditional Probability
  3.3 Multiplication Rule
  3.4 Law of Total Probability
  3.5 Bayes' Theorem
  3.6 Home Work

4 Discrete Distributions
  4.1 Random Variables
    4.1.1 Random Variable Types
    4.1.2 Discrete Probability Distribution
    4.1.3 Cumulative Distribution Function (cdf)
  4.2 Expectation of a Random Variable
    4.2.1 Expected Values of Sums of Random Variables: Some Properties
  4.3 Variance
    4.3.1 Standard Deviation
    4.3.2 Variance: Properties
  4.4 Bernoulli Distribution
    4.4.1 Conditions for a Bernoulli Variable
    4.4.2 Probability Mass Function (pmf)
    4.4.3 Bernoulli Distribution: Expectation & Variance
  4.5 Binomial Distribution
    4.5.1 Background Example
    4.5.2 Conditions for the Binomial Distribution
    4.5.3 Probability Mass Function (pmf)
    4.5.4 Shape of the Binomial Distribution
    4.5.5 Binomial Distribution: Expectation & Variance
  4.6 Poisson Distribution
    4.6.1 Conditions for a Poisson Variable
    4.6.2 Probability Mass Function (pmf)
    4.6.3 Poisson Distribution: Expectation & Variance
    4.6.4 Poisson Approximation to the Binomial Distribution
    4.6.5 Comparison of Binomial & Poisson Distributions
  4.7 Geometric Distribution
    4.7.1 Geometric Distribution Conditions
    4.7.2 Probability Mass Function (pmf)
    4.7.3 Geometric Distribution: Cumulative Distribution Function (cdf)
    4.7.4 Geometric Distribution: Expectation & Variance
  4.8 Negative Binomial Distribution
    4.8.1 Probability Mass Function (pmf)
    4.8.2 Negative Binomial Distribution: Expected Value & Variance
    4.8.3 Comparison of Binomial & Negative Binomial Models
  4.9 Hypergeometric Distribution
    4.9.1 Probability Mass Function (pmf)
    4.9.2 Hypergeometric Distribution: Expected Value & Variance
    4.9.3 Binomial Approximation to the Hypergeometric Distribution
  4.10 Home Work

5 Continuous Distributions
  5.1 Continuous Random Variable
  5.2 Continuous Probability Distribution
    5.2.1 Background
    5.2.2 Cumulative Distribution Function (cdf)
    5.2.3 Expectation
    5.2.4 Variance
  5.3 Piecewise Distributions
  5.4 Continuous Uniform Distribution
    5.4.1 Probability Density Function
    5.4.2 Cumulative Distribution Function (cdf)
    5.4.3 Uniform Distribution: Expectation & Variance
  5.5 Normal Distribution
    5.5.1 Probability Density Function (pdf)
    5.5.2 Effect of Mean & Variance
    5.5.3 Properties of the Normal Distribution
    5.5.4 Standard Normal Distribution
    5.5.5 Finding Probabilities Using the Table
    5.5.6 Finding Probabilities & Percentiles
    5.5.7 Empirical Rule
    5.5.8 Normal Distribution: Moment Generating Function
    5.5.9 Sums of Independent Normal Random Variables
  5.6 Exponential Distribution
    5.6.1 Link between the Poisson & Exponential Distributions
    5.6.2 Exponential Distribution: cdf
    5.6.3 Exponential Distribution: pdf
    5.6.4 Exponential Distribution: Expectation & Variance
  5.7 Home Work

6 Limit Theorems
  6.1 Limit Theorems
    6.1.1 Chebyshev Inequality
  6.2 Central Limit Theorem (CLT)
    6.2.1 Sample Total (CLT)
    6.2.2 Sample Mean (CLT)
    6.2.3 Simulations
    6.2.4 Normal Approximation to the Binomial
  6.3 Home Work

7 Joint Distributions
  7.1 Bivariate Distributions
  7.2 Joint Distributions: Discrete Case
    7.2.1 Joint Cumulative Distribution Function (cdf)
    7.2.2 Independent Random Variables
  7.3 Joint Distributions: Continuous Case
    7.3.1 Joint Cumulative Distribution Function (cdf)
    7.3.2 Independent Random Variables
  7.4 Convolution
    7.4.1 Convolution: Discrete Case
    7.4.2 Convolution: Continuous Case
  7.5 Home Work

8 Properties of Expectation
  8.1 Jointly Distributed Variables: Expectation for the Discrete Case
  8.2 Jointly Distributed Variables: Expectation for the Continuous Case
  8.3 Some Functions of Jointly Distributed Random Variables
    8.3.1 Expectation of Sums of Jointly Distributed Random Variables
    8.3.2 Expectation of Sums of Functions of Jointly Distributed Random Variables
  8.4 Conditional Distribution
    8.4.1 Conditional Distributions: Discrete Case
    8.4.2 Conditional Expectation: Discrete Case
    8.4.3 Conditional Distribution: Continuous Case
    8.4.4 Conditional Expectation: Continuous Case
    8.4.5 Properties of Conditional Expectation
  8.5 Covariance
    8.5.1 Properties
  8.6 Correlation
  8.7 Home Work

Index
Chapter 1

Course Outline

Since this course is on campus, you should aim to spend time each day going through the course content for the unit covered in class, and then spend some time working on the assigned questions for that content.

1.1 Course Description


This course begins with a review of combinatorics and its uses in probability; a review of set theory is in order for this unit. This is followed by an introduction to the basic concepts of probability, the laws of probability, Bayes' theorem, etc., including a case study on the use of Bayes' theorem in practical life. The third unit covers discrete distributions. The fourth unit goes through continuous random variables with some simple applications. The fifth unit covers joint distributions, covariance & correlation. The last unit in the course covers some properties of expectations.

1.2 Units of Instruction


The following units will be covered in this course:

a. Combinatorics

b. Basic Concepts of Probability & Laws

c. Discrete Random Variables & Discrete Distributions

d. Continuous Random Variables & Continuous Distributions

e. Properties of Expectations

f. Joint Distributions

The following are the divisions of the course for this delivery:


Course Instructor Assignments


Instructor    Week         Topic
Shaheena      Weeks 1-2    Combinatorics
Shaheena      Week 3       Axioms of Probability
Shaheena      Week 4       Conditional Probability
Shaheena      Weeks 5-8    Discrete Random Variables
Shaheena      Weeks 9-11   Continuous Random Variables
Shaheena      Week 12      Limit Theorems
Shaheena      Week 13      Jointly Distributed Random Variables
Shaheena      Week 14      Properties of Expectation

1.3 Reference Materials


For this course, a number of reference materials will be used. The following is a list of
references for this course. Note that there are also books in the library in these areas that
you can use as additional resources.

1. Sheldon Ross, A First Course in Probability, 9th Edition, 2012.

2. Sheldon Ross, Introduction to Probability Models, 10th Edition, 2010.

3. Freund, John E., Modern Elementary Statistics, Twelfth Edition, Prentice-Hall, 2007.

4. TeXstudio 2.10.8 website: http://texstudio.sourceforge.net/.

1.4 Evaluation
The evaluation for this course will be based on the following:

Best 5 out of 7 Quizzes    35%
Mid-Term Exam              35%
Final Exam                 30%

The quizzes serve as formative assessments, while the two exams are summative assessments. It is imperative that you keep up with the lecture content. Reviewing the notes and solving assigned problems will help solidify your conceptual understanding. The quizzes & exams in this course will test not just your ability to perform procedural calculations but also your grasp of the concepts.


1.5 Prerequisite Content

The prerequisite for this course is MATH 101 (Calculus 1), and the mathematics in this course makes use of the tools discussed there. You are reminded that it is your responsibility to review any material upon which this course builds.

The following is a list of topics you may need to review:

Prerequisite Content for Review

1. Set Theory; DeMorgan’s Laws; Venn Diagram

2. A clear understanding of the cards in a standard deck of 52 playing cards

3. Basic understanding of differentiation & integration

1.6 Detailed Syllabus

The dates/times listed below are approximate and are subject to change; adjustments will be made as the semester progresses. Your schedule indicates that you have a 'Probability' class every week for the duration of the semester: two lectures, each of 75 minutes, on Monday & Wednesday at 8:00 AM, plus tutorials at times to be announced by the TA. The tutorials will most likely be used to work on the assessments.


Navigation of content during the Semester


WEEK  DATES         SECTION
1     September 6   Set Theory, Notes, Combinatorics
2     September 13  Combinatorics cont'd: examples, problems, exercises
3     September 20  Basic Concepts of Probability: examples, problems, exercises
4     September 27  Conditional Probability & Bayes' Theorem: examples, problems, exercises
5     October 4     Random Variables; Discrete Random Variables; Expectation & Variance: examples, problems, exercises
6     October 11    Discrete Distributions; Bernoulli & Binomial Distributions
7     October 18    Poisson Distribution
8     October 25    Geometric, Negative Binomial & Hypergeometric Distributions
9     November 01   Continuous Random Variables; Probability Density Function (pdf); Cumulative Distribution Function (cdf)
10    November 08   Uniform & Normal Random Variables
11    November 15   Exponential Random Variables
12    November 22   Central Limit Theorem and applications
13    November 29   Jointly Distributed Random Variables
14    December 06   Properties of Expectation


Reading/Studying Schedule

Week          Topics                                                           Approximate Length
September 6   Introduction to Randomness; Combinatorics                        2 weeks
September 20  Axioms of Probability                                            1 week
September 27  Conditional Probability & Independence                           1 week
October 4     Discrete Random Variables: pmf, cdf, expectation, variance       1 week
October 11    Discrete Distributions: examples, problems, exercises            3 weeks
November 01   Continuous Random Variables: pdf, cdf, expectation, variance     1 week
November 08   Uniform, Normal & Exponential Distributions: examples, problems  2 weeks
November 22   Limit Theorems: examples, problems, exercises                    1 week
November 29   Jointly Distributed Random Variables                             1 week
December 06   Properties of Expectations, Covariance & Correlations            1 week

Chapter 2

Combinatorics

AS YOU READ . . .

1. What is randomness?
2. What is probability?
3. Why review set theory?
4. What is the link between combinatorics and probability?

2.1 Randomness & Probability


§§ Randomness
Randomness is the erratic behaviour of objects, physical phenomena, etc. A few definitions of randomness are given below.

Definition 2.1.1 (Randomness).

• Lack of pattern in events, e.g., many technological systems suffer from a number of significant uncertainties which may appear at all stages of design, execution and use.

• Statistical uncertainties due to limited availability of data, e.g., a random sample of 50 electrical components taken instead of the whole batch produced during a certain time period.

• Lack of knowledge of the behaviour of elements in real conditions

• The Cambridge Dictionary of Statistics defines random as governed by chance;


not completely determined by other factors.


Some real life examples of Randomness are


1. Everyday chance behavior of markets
2. Daily weather forecast
3. Results of Exit Polls
4. Real life events like accidents

§§ Why Study Probability: Background

• Classical mathematical theory successfully describes the world as a series of fixed and real observable events, e.g., Ohm's Law and Newton's 2nd Law, etc.

• Before the 17th century, classical mathematical theory failed in processes or experiments that involved uncertain or random outcomes.

• Calculation of odds in gambling games was an initial motivation for mathematicians, and later the scientific analysis of mortality tables within the medical profession led to the development of the subject area of Probability.

Probability is one of the most important modern scientific tools that treats those aspects of systems that have uncertainty, chance or haphazard features. Probability is a mathematical term used for the likelihood that something will occur.

§§ Why Study Probability: Real Life


• Reliability: Many consumer products like consumer electronics, automobiles, etc use
reliability theory in product design to reduce the risk of failure
• What is the chance of COVID-19 at LUMS?
• What are the odds that Twitter’s Stock will plunge tomorrow?
• 27% of U.S. workers worry about being laid off, up from 15% in 2019
https://sciencing.com/examples-of-real-life-probability-12746354.html

§§ Why Study Probability: Sciences


• Engineering: used in areas from quality control and quality assurance to communication theory in electrical engineering.

• What is the risk of death of a COVID-19 patient? etc.

• Actuarial Science is based on the risk of some event: it deals with the lifetimes of humans, predicting how long any given person is expected to live based on other variables describing the particulars of his/her life. Though this expected life span is a poor prediction when applied to any given person, it works rather well when applied to many persons. It can help to decide the rates insurance companies should charge for covering any given person.


• To answer a research question (one that uses a sample to draw a conclusion about some larger population), e.g., 'how likely is it that …?' or 'what are the chances of …?', we need to understand probability, its rules, and probabilistic models.

§§ Link Between Set Theory & Probability Terminology

Math Word        Stats Word     Description                           Stats Notation
Element          outcome        one possible thing amongst many       E1, E2, ...
Universal set    sample space   everything under consideration        S or Ω
Subset           event          collection of elements or outcomes    A, B, A ⊂ B
Set notation                                                          { }
Empty set                       set with nothing in it                ∅
Complement                      not in set A, but in universal set    Aᶜ, Ā, A′
Union            or             either in A or B, or both             A ∪ B
Intersection     and            in A and B                            A ∩ B
Set subtraction                 in A but not in B                     A − B or A∖B = A ∩ Bᶜ
Condition        given          restriction to sample space           A | B

§§ Some Definitions
Definition 2.1.2 (Random Experiment).

In the classical approach to the theory of probability it is often assumed that the
experiments can be repeated arbitrarily (e.g. tossing a coin, testing a concrete cube)
and the result of each experiment can be unambiguously used to declare whether a
certain event occurred or did not occur (e.g., when tossing a coin, observing a ‘T’).
Such an experiment is called a Random Experiment.

Definition 2.1.3 (sample space S).

The sample space S of a random experiment denotes all events which can be outcomes of the experiment under consideration. The sample space can be finite, e.g., for tossing a coin 4 times the sample space consists of all 2^4 4-tuples of H and T:

    S = {HHHH, HHHT, . . . , TTTT}

or it can be infinite, e.g., record the duration (in whole seconds) of the next telephone call; then the sample space is

    S = {0, 1, 2, . . .}


Definition 2.1.4 (Tree Diagram).

A tree diagram is a graphical representation of a sample space. It is a picture that branches for each option in a choice. Tree diagrams are made up of nodes that represent events, and branches that connect nodes to outcomes. Consider 3 tosses of a fair coin with 2 possible outcomes for each toss, as in Figure 2.1.1.

Figure 2.1.1.

Tree Diagram for 3 Flips of a Fair Coin

Definition 2.1.5 (Events).

Each possible outcome of a sample space is called a sample point, and an event is
generally referred to as a subset of the sample space having one (simple) or more
(compound) sample points as its elements.

Random Experiment       Sample Space                  Event: Example
4 coin tosses           all 2^4 4-tuples of H and T   the 4-tuple contains no consecutive H's, e.g., HTTH
Length of phone calls   S = {0, 1, 2, . . .}          a call between 20 and 30 minutes

§§ Poker Hands
Standard Deck of 52 Playing Cards is displayed in Figure 2.1.2.


Figure 2.1.2.

Standard Deck of 52 Playing Cards

Definition 2.1.6 (Poker Hands).

A Poker Hand (see Figure 2.1.3) is a set of 5 cards chosen without replacement from
a pack of 52 cards.

Figure 2.1.3.

Poker Hand Rankings

Poker hand rankings from strongest to weakest are given below:


(1) Royal Flush: A royal flush is a hand consisting of a 10, J, Q, K, and an A, all of the same suit. Since the values are fixed, we only need to choose the suit, and there are \binom{4}{1} = 4 possible suits. After that, the other four cards are completely determined. Thus, there are 4 possible royal flushes:

    # royal flushes = \binom{4}{1} = 4

(2) Straight Flush (excluding royal flush): A straight flush consists of five cards with values in a row, all of the same suit. Ace may be considered as high or low, but not both (e.g., A, 2, 3, 4, 5 is a straight, but Q, K, A, 2, 3 is not). The lowest value in the straight may be A, 2, 3, 4, 5, 6, 7, 8, or 9. (Note that a straight flush beginning with 10 is a royal flush, and we don't want to count those.) So there are 9 choices for the card values and then \binom{4}{1} = 4 choices for the suit, giving a total of

    9 × 4 = 36

(3) Four-of-a-kind: A four-of-a-kind is four cards showing the same number plus any other card.

    # 4-of-a-kind = \binom{13}{1} × \binom{4}{4} × \binom{12}{1} × \binom{4}{1} = 13 × 1 × 48 = 624

(4) Full House: A full house is three cards showing the same number plus a pair.

    # full houses = \binom{13}{1} × \binom{4}{3} × \binom{12}{1} × \binom{4}{2} = 13 × 4 × 12 × 6 = 3,744

(5) Flush: A flush consists of five cards, all of the same suit. There are \binom{4}{1} = 4 ways to choose the suit; then, given that there are 13 cards of that suit, there are \binom{13}{5} = 1287 ways to choose the hand, giving a total of \binom{4}{1} × \binom{13}{5} = 5,148 flushes. But note that this includes the straight and royal flushes, which we don't want to include. Subtracting these (36 + 4 = 40), we get a grand total of 5,148 − 40 = 5,108.

(6) Straight (excluding straight flush): A straight consists of five values in a row, not all of the same suit. The lowest value in the straight could be A, 2, 3, 4, 5, 6, 7, 8, 9 or 10, giving 10 choices for the card values. Then there are 4 × 4 × 4 × 4 × 4 = 4^5 ways to choose the suits of the five cards, for a total of 10 × 4^5 = 10,240 choices. But this value also includes the straight flushes and royal flushes, which we do not want to include. Subtracting the 40 straight and royal flushes, we get 10,240 − 40 = 10,200.

(7) Three-of-a-kind: A three-of-a-kind is three cards showing the same number plus two cards that do not form a pair or create a four-of-a-kind.

    # 3-of-a-kind = \binom{13}{1} × \binom{4}{3} × \binom{12}{2} × \binom{4}{1} × \binom{4}{1} = 54,912

(8) Two-pair: Two-pair is two cards showing the same number, another two cards showing the same number (but not all four numbers the same), plus one extra card (not the same as any of the other numbers).

    # 2-pair = \binom{13}{2} × \binom{4}{2} × \binom{4}{2} × \binom{11}{1} × \binom{4}{1} = 123,552

(9) One-pair: One-pair is two cards showing the same number and another three cards all showing different numbers.

    # 1-pair = \binom{13}{1} × \binom{4}{2} × \binom{12}{3} × \binom{4}{1}^3 = 1,098,240

(10) High card: High card (also known as 'no pair' or simply 'nothing') is a hand that does not fall into any other category, so we must avoid all higher-ranking hands; every higher-ranked hand includes a pair, a straight, or a flush. Because the numbers showing on the cards must be five different numbers, we have \binom{13}{5} choices for the five numbers showing on the cards. Each of the cards may have any of four suits, i.e., \binom{4}{1} × \binom{4}{1} × \binom{4}{1} × \binom{4}{1} × \binom{4}{1} = \binom{4}{1}^5 choices of suits. We then subtract the number of straights, flushes, straight flushes, and royal flushes. (Note that we avoided having any pairs or more of a kind.)

    # high cards = \binom{13}{5} × \binom{4}{1}^5 − 10,200 − 5,108 − 36 − 4 = 1,302,540

2.2 Counting Rules

2.2.1 §§ Why Counting Rules in Probability: Classical Definition of Probability

Consider rolling 2 fair dice. Let A be the event that a sum of 7 occurs. How likely is it that event A will occur?


Figure 2.2.1.

Sample Space for rolls of 2 Fair Dice.

    P(A) = (Number of ways event A can occur) / (Total number of possible outcomes) = m/n = 6/36

We count the number of ways event A can occur and the total number of possible outcomes in the sample space S in Figure 2.2.1.
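As a quick numerical check of this classical calculation, the following minimal Python sketch (standard library only; variable names are illustrative) enumerates the 36 equally likely outcomes for two fair dice and counts those whose sum is 7:

    from itertools import product

    # Sample space for two fair dice: all 36 ordered pairs (first die, second die).
    outcomes = list(product(range(1, 7), repeat=2))

    # Event A: the sum of the two dice is 7.
    favourable = [pair for pair in outcomes if sum(pair) == 7]

    print(len(favourable), len(outcomes))      # 6 36
    print(len(favourable) / len(outcomes))     # 0.1666... , i.e., 6/36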
Definition 2.2.1 (Combinatorics: Objectives).

Many problems in probability theory require that we count the number of ways that a particular event can occur. Systematic methods for counting the number of favorable outcomes of an experiment fall under the subject area called Combinatorics. We will study several combinatorial techniques for counting large finite sets without actually listing their elements. Combinatorial techniques are helpful for counting the sizes of events, which is an important concept in probability theory. When selecting elements of a set, the number of possible outcomes depends on the conditions under which the selection takes place.

2.2.2 §§ Basic Principles of Counting


There are different rules to count the number of possible outcomes.


Definition 2.2.2 (Generalized Multiplication Rule).

If r experiments that are to be performed are such that the first one may result in any of n1 possible outcomes; and if there are n2 possible outcomes of the second experiment; and if, for each of the possible outcomes of the first two experiments, there are n3 possible outcomes of the third experiment; and so on, then there is a total of n1 × n2 × ⋯ × nr possible outcomes of the r experiments. Consider the 3 tosses of a fair coin with 2 possible outcomes for each toss; see Figure 2.1.1. There are a total of 2 × 2 × 2 = 8 possible outcomes.

Example 2.2.3

1. How many numbers between 99 and 1000 have no repeated digits?

2. How many numbers are there between 99 and 1000 having at least one of their digits
7?

3. In Figure 2.2.2 there are four bus routes between A and B and three bus routes between B and C. A man wants to travel round trip by bus from A to C via B. If he does not want to use a bus route more than once, in how many ways can he make the round trip?

Figure 2.2.2.

Bus Routes

Solution:

1. Numbers between 99 and 1000 means numbers from 100-999. Forming such a number can be considered a 3-step process, where every step can be done in a number of ways that does not depend on the previous choices:

   (a) Choose the first digit; the 1st digit can be any choice from 1-9, therefore 9 choices for this stage.

   (b) Choose the second digit; the 2nd digit can be chosen from 0-9 excluding the choice in the previous stage, therefore 9 choices.

   (c) Choose the third digit; the 3rd digit can be any choice from 0-9 excluding the digits in the first 2 stages, i.e., 8 choices.

   So there are 9 × 9 × 8 = 648 possible numbers between 99 and 1000 with no repeated digits.

2. The sample space of the numbers between 99 and 1000 without any restriction consists of a total of 9 × 10 × 10 = 900 possible numbers. The condition of having at least one digit equal to 7 can be handled by splitting the numbers into 2 parts that are complements of each other:

   (a) numbers between 99 and 1000 with none of the digits equal to 7, i.e., 8 × 9 × 9 = 648;

   (b) numbers between 99 and 1000 with at least one digit equal to 7, therefore 900 − 648 = 252.

3. The condition is that he does not want to use a bus route more than once:

   (a) for the trip from A to C there are 4 × 3 = 12 choices;

   (b) for the trip from C back to A there are 2 × 3 = 6 choices (excluding the routes chosen on the trip from A to C).

   Therefore, there are 12 × 6 = 72 possible ways to make the round trip without using any route more than once.

Example 2.2.3
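These counts can also be confirmed by brute force; the following minimal Python sketch (standard library only) checks parts 1 and 2 by direct enumeration:

    # Part 1: three-digit numbers (100-999) with no repeated digits.
    no_repeats = [n for n in range(100, 1000) if len(set(str(n))) == 3]
    print(len(no_repeats))                 # 648

    # Part 2: three-digit numbers with at least one digit equal to 7.
    with_seven = [n for n in range(100, 1000) if '7' in str(n)]
    print(len(with_seven))                 # 252, which equals 900 - 648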


2.2.3 §§ Permutation
Definition 2.2.4 (Permutation Rule).

Permutations are ordered arrangements of all or part of a set of objects. Permutations are fundamentally of 2 types:

1. Permutation with Repetition:

   a. All Objects Arrangement: If all n objects are to be arranged from a set consisting of n objects, and repetition is allowed, then there are n possibilities for the first choice, n possibilities for the second choice, and so on. Therefore, there are n × n × ⋯ × n = n^n possible ways. How many five-digit numbers can be formed with the digits 1, 2, 3, 4, 5?
      Solution: 5^5 five-digit numbers can be formed.

   b. Part of a Set of Objects Arrangement: The number of permutations of n objects, taken r at a time, when repetition of objects is allowed, is n^r. How many three-digit numbers can be formed with the digits 1, 2, 3, 4, 5?
      Solution: 5^3 three-digit numbers can be formed.

2. Permutation without Repetition:

   a. All Objects Arrangement: The total number of permutations of a set A of all n elements is given by

          n! = n(n − 1)(n − 2) ⋯ 1,    with 0! ≡ 1

      https://www.youtube.com/watch?v=RbugCeR-njk

   b. Part of a Set of Objects Arrangement: The total number of permutations of a set A of n elements taken k at a time is an ordered listing of a subset of A of size k without replacement & is given by

          nPk = n!/(n − k)! = n(n − 1)(n − 2) ⋯ (n − k + 1).

      Hint: there are n! permutations of all the objects, but the (n − k) elements not picked up can be ordered in any way, ∴ divide by (n − k)! to avoid duplication.
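If you want to see these formulas in action, the following minimal Python sketch (standard library only; the numbers come from the examples above) compares the formulas with direct enumeration:

    from itertools import permutations
    from math import factorial, perm

    # Permutations without repetition of all n objects: n!
    print(len(list(permutations("ABCD"))), factorial(4))        # 24 24

    # Permutations of n = 5 objects taken k = 3 at a time: n!/(n-k)!
    print(len(list(permutations(range(5), 3))), perm(5, 3))     # 60 60

    # Permutations WITH repetition, k = 3 digits chosen from n = 5: n**k
    print(5 ** 3)                                                # 125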

Example 2.2.5 (Seating Arrangement)


If Ali and Sara (a couple), Babar and Hina (another couple), and Soban and Muzna (another couple) sit in a row of chairs as in Figure 2.2.3,


1. How many different seating arrangements are there?


2. How many ways they can be seated so that each of the 3 couples sit together?
3. Find also the number of ways of their seating if all the ladies sit together.
4. In how many different ways can the 3 women be seated together on the left, and then
the 3 men together on the right?

Figure 2.2.3.

Seating Plan for 6 people

Solution:
The solutions are as follows:

1. 6! ways for seating 6 persons (no restriction).

2. 3! × 2 × 2 × 2 = 48 ways. Treating each couple as a single object gives 3 objects to arrange (3! ways), and each couple can sit together in 2 possible orders.

3. 3! × 4! = 144 ways in which the 3 ladies sit together (treat the 3 ladies as one block, giving 4 objects to arrange, times 3! internal orders).

4. 3! × 3! = 36 ways in which the 3 women are seated together on the left and the 3 men together on the right.

Example 2.2.5
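The counts in parts 1 and 2 can be verified by listing all 6! seating orders and filtering; a minimal Python sketch (the adjacency test below is one simple way to encode 'sitting together'):

    from itertools import permutations

    people = ["Ali", "Sara", "Babar", "Hina", "Soban", "Muzna"]
    couples = [("Ali", "Sara"), ("Babar", "Hina"), ("Soban", "Muzna")]

    def couples_adjacent(row):
        # A couple "sits together" when their seats differ by exactly one position.
        return all(abs(row.index(a) - row.index(b)) == 1 for a, b in couples)

    rows = list(permutations(people))
    print(len(rows))                               # 720 = 6!
    print(sum(couples_adjacent(r) for r in rows))  # 48 = 3! * 2 * 2 * 2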

2.2.4 §§ Multinomial Coefficients: Permutations with Indistinguishable Objects


How many 8-letter sequences are possible with 3 A's, 2 B's, and 3 C's?

• Certain items are distinct; others are not.

• The number of distinguishable permutations of n objects, of which n1 objects are identical, another n2 objects are identical, another n3 objects are identical, and so on up to nk identical objects, is

    \binom{n}{n1, n2, . . . , nk} = n!/(n1! n2! n3! ⋯ nk!),   with n1 + n2 + n3 + ⋯ + nk = n

\binom{n}{n1, n2, . . . , nk} is called the multinomial coefficient.


Example 2.2.6
A bridge deal (4 hands of 13 cards each) is dealt from a standard 52-card deck. How many different bridge deals are there?
Solution:

    n!/(n1! n2! n3! n4!) = 52!/(13! 13! 13! 13!)

Example 2.2.6
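A small Python sketch that checks the multinomial coefficient two ways for the 8-letter question above, and evaluates the bridge-deal count:

    from math import factorial
    from itertools import permutations

    # Multinomial coefficient for 8 letters: 3 A's, 2 B's, 3 C's.
    print(factorial(8) // (factorial(3) * factorial(2) * factorial(3)))   # 560

    # Direct enumeration: distinct orderings of the multiset AAABBCCC.
    print(len(set(permutations("AAABBCCC"))))                             # 560

    # Number of bridge deals: 52! / (13! 13! 13! 13!), roughly 5.36e28.
    print(factorial(52) // factorial(13) ** 4)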

2.2.5 §§ Circular Permutation


• Permutation in a circle is called circular permutation

• How can you arrange seating 3 friends A, B & C around a round table?

Figure 2.2.4.

Circular Permutation.

If we arrange these 3 persons around a round table as shown in Circular Arrangement 1 in Figure 2.2.4, we notice that the different-looking arrangements are not actually different; they are all the same. The same is true for Circular Arrangement 2. If you move clockwise around the table in Figure 2.2.4, starting with A, you will always get A-B-C. Important points to ponder are:

• If the clockwise and counter-clockwise orders CAN be distinguished, then the total number of circular permutations of n elements taken all together is (n − 1)!. The number is (n − 1)! instead of the usual n! since all cyclic rotations of an arrangement are equivalent (the circle can be rotated). The point is that in a circular permutation one element is fixed and the remaining elements are arranged relative to it.

• If the clockwise and counter-clockwise orders CANNOT be distinguished, then the total number of circular permutations of n elements taken all together is (n − 1)!/2.

Example 2.2.7 (Seating Arrangement)


How many ways Ali and Sara (a couple) and Babar and Hina (another couple) and Soban
and Muzna (another couple) can be seated around a circular table as in Figure 2.2.5 so that

1. couples sit together?

2. If Ali & Soban insist on sitting besides each other, how many arrangements are possible
now to seat them around the table?

Figure 2.2.5.

Round Table with 6 spots.

Solution:
The solutions are as follows

1. (3 − 1)! × 2 × 2 × 2 = 16 ways for seating the 3 couples so that each couple sits together: arrange the 3 couples around the table in (3 − 1)! ways, and each couple can then sit in 2 possible orders.

2. (5 − 1)! × 2 = 48 ways. Treating Ali & Soban as a single object gives a total of 5 objects to arrange around the table, and there are 2 possible ways for Ali & Soban to sit beside each other.

Example 2.2.7
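Part 2 can be checked by fixing Ali's seat to remove rotations and enumerating the remaining seats; a minimal Python sketch under that convention:

    from itertools import permutations

    others = ["Sara", "Babar", "Hina", "Soban", "Muzna"]

    # Fix Ali at one seat of the round table; the remaining 5 seats, read
    # clockwise, form a line, so rotations are no longer double-counted.
    count = 0
    for arrangement in permutations(others):
        # Soban sits beside Ali when he takes the seat immediately clockwise
        # or counter-clockwise of Ali, i.e., the first or last remaining seat.
        if arrangement[0] == "Soban" or arrangement[-1] == "Soban":
            count += 1
    print(count)   # 48 = (5 - 1)! * 2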


2.2.6 §§ Combinations

Definition 2.2.8 (Combinations).

Combinations are unordered selections of all or part of a set of objects. Combinations are fundamentally of 2 types:

1. Combination without Repetition:

   • A sample of k elements is to be chosen without replacement from a set of n elements. The number of different samples of size k that can be selected from the n elements is equal to

         \binom{n}{k} = n!/(k!(n − k)!),   where k ≤ n,

     and \binom{n}{k} is called the binomial coefficient.

   • Nice & symmetrical formula:

         \binom{n}{n − k} = n!/(k!(n − k)!) = \binom{n}{k}.

2. Combination with Repetition: If I roll 3 identical dice one time each, how many possible unique results can I get? We have 6 options on each die (n = 6), and there are 3 dice (r = 3), but since the dice are identical, a result of '1, 2, 3' would be the same as '2, 3, 1' or '3, 1, 2'. The number of unique results will be

         \binom{n + r − 1}{r} = \binom{6 + 3 − 1}{3} = 56.
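Both flavours of combination can be checked with Python's standard library; a short sketch (small numbers chosen so the enumeration stays tiny):

    from math import comb
    from itertools import combinations, combinations_with_replacement

    # Without repetition: C(10, 4) counted two ways.
    print(comb(10, 4), len(list(combinations(range(10), 4))))    # 210 210

    # Symmetry: C(n, k) = C(n, n-k).
    print(comb(10, 4) == comb(10, 6))                            # True

    # With repetition: 3 identical dice, 6 faces each -> C(6+3-1, 3) = 56 results.
    print(comb(6 + 3 - 1, 3),
          len(list(combinations_with_replacement(range(1, 7), 3))))   # 56 56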

§§ Pascal’s Triangle

There is a connection between the total number of subsets of a set of n elements & the binomial coefficients:

    \sum_{k=0}^{n} \binom{n}{k} = 2^n.

In Figure 2.2.6, the sum of the binomial coefficients in each row is equal to 2^n, the cardinality of the power set.
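A one-loop check of this row-sum identity:

    from math import comb

    for n in range(8):
        row_sum = sum(comb(n, k) for k in range(n + 1))
        # Each row of Pascal's triangle sums to 2^n, the number of subsets
        # of an n-element set.
        print(n, row_sum, 2 ** n, row_sum == 2 ** n)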


Figure 2.2.6.

Pascal’s Triangle.

§§ Relationship Between Permutations & Combinations

Figure 2.2.7.

Relationship between Permutation & Combination.

Figure 2.2.7 shows the connection between Permutations & Combinations.

    3P2 = 3!/(3 − 2)! = 6;    3C2 = 3!/(2!(3 − 2)!) = 3


Example 2.2.9

1. A store has to hire two cashiers. Five people are interviewed for the jobs. How many
different ways can the hiring decisions be made?

2. Suppose there were 15 business people at a meeting. At the end of the meeting, each
person at the meeting shook hands with every other person. How many handshakes
were there?

3. A poker hand is a set of 5 cards chosen without replacement from a deck of 52 playing
cards. In how many ways can you get a hand with 3 red cards and 2 black cards?

4. There are 3 copies of Harry Potter and the Philosopher’s Stone, 4 copies of The Lost
Symbol, 5 copies of The Secret of the Unicorn. In how many ways can you arrange
these books on a shelf?

Solution:

1. \binom{5}{2} = 10

2. Each person at the meeting shook hands with every other person, and the order of the two people within a handshake does not matter, so there was a total of \binom{15}{2} = 105 handshakes.

3. \binom{26}{3} × \binom{26}{2}

4. There are a total of 12 books, and therefore 12! ways to arrange them if they were all distinct. These 12 books can be categorized into 3 distinct sets:

   (a) 3 copies of Harry Potter

   (b) 4 copies of The Lost Symbol

   (c) 5 copies of The Secret of the Unicorn

   However, the 3 copies of Harry Potter are not distinct, the 4 copies of The Lost Symbol are not distinct, and likewise the 5 copies of The Secret of the Unicorn are not distinct. Therefore a multinomial coefficient is used to find the number of arrangements here:

       12!/(3! × 4! × 5!) = 27,720 ways

Example 2.2.9
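The four answers can be cross-checked numerically with a short sketch:

    from math import comb, factorial

    print(comb(5, 2))                    # 1. ways to hire 2 of the 5 applicants: 10
    print(comb(15, 2))                   # 2. handshakes among 15 people: 105
    print(comb(26, 3) * comb(26, 2))     # 3. poker hands with 3 red and 2 black cards
    print(factorial(12) // (factorial(3) * factorial(4) * factorial(5)))   # 4. 27720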


2.3 Home Work


1. Find the number of possible 10 character passwords when:

(a) all the 10 characters have to be letters


(b) all letters must be distinct
(c) letters and digits must alternate and be distinct

2. How many ways are there to seat 10 people, consisting of 5 couples, in a row of seats
if:

(a) seats are assigned at random


(b) all couples are to get adjacent seats?

3. A box contains 30 balls, of which 10 are red and the other 20 blue. Suppose you take
out 8 balls from this box without replacement. How many possible ways are there to
have 3 red & 5 blue balls in this sample?

4. How many ways can eight people (including Mandy and Cindy) line up for a bus, if
Mandy and Cindy refuse to stand together?

5. How many integers, greater than 999 but not greater than 4000, can be formed with
the digits 0, 1, 2, 3 and 4, if repetition of digits is allowed?

§§ Answers:
1. (a) 26^10
   (b) 26 × 25 × ⋯ × 17
   (c) 2 × 26 × 10 × 25 × 9 × 24 × 8 × 23 × 7 × 22 × 6

2. (a) 10!
   (b) 10 × 1 × 8 × 1 × 6 × 1 × 4 × 1 × 2 × 1

3. \binom{10}{3} × \binom{20}{5}

4. 30240

5. 376

Chapter 3

Basic Concepts & Laws of Probability

AS YOU READ . . .

1. What are the basic concepts of probability?

2. What is inclusion-exclusion principle?

3. What are independent & dependent events?

4. What is conditional probability?

5. What is Bayes' theorem & how is it useful for obtaining an updated, data-based probability?

3.1 Some Definitions

Definition 3.1.1 (Sample Space & Events).

For each random experiment there is an associated random variable, which represents the outcome of any particular experiment. A sample space is any set that lists all possible outcomes (or responses) of some unknown experiment or situation; a sample space is generally denoted by the capital letter S. When predicting tomorrow's weather, the sample space is S = {Rain, Cloudy, Sunny}. Each subset of a sample space is defined to be an event. When some experiment is performed, an event either will or will not occur; e.g., for the weather forecast example, the subsets {rain}, {cloudy}, {rain, cloudy}, {rain, sunny}, {rain, cloudy, sunny}, . . ., and even the empty set ∅ = {}, are all examples of subsets of S that could be events.


3.1.1 §§ Types of Events

Definition 3.1.2 (Null Event).

A null or empty event is one that cannot happen, denoted by ∅, such as getting a sum of 14 on 2 rolls of a fair die.

Definition 3.1.3 (Mutually Exclusive Events).

If for 2 events A & B we have A ∩ B = ∅, then A and B are said to be mutually exclusive or disjoint. E.g., in rolling a die, let A be the event that an even number appears and B the event that an odd number appears; then A and B are mutually exclusive events.

Definition 3.1.4 (Equally Likely Events).

All outcomes in the sample space have an equal chance to occur; e.g., the coin toss outcomes 'H & T' are equally likely events. In rolling a balanced die, each of the outcomes {1, 2, . . . , 6} is equally likely.

Definition 3.1.5 (At Least & At Most Types of Events).

Take the example of tossing 3 coins, with the sample space S also visualized in Figure 2.1.1:

    S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Let X denote the number of heads in this example, let A be the event that at least 2 heads appear & B the event that at most 2 heads appear. Write down A & B.

1. At least 2 heads appear, i.e., the number of heads is 2 or more:

       A = {X ≥ 2} = {THH, HTH, HHT, HHH}

2. At most 2 heads appear, i.e., the number of heads is 2 or less:

       B = {X ≤ 2} = {TTT, TTH, THT, THH, HTT, HTH, HHT}


3.1.2 §§ Axioms of Probability

Definition 3.1.6 (Axioms of Probability).

For a random experiment with sample space S, the probability of an event A is denoted

    P(A) = (Number of ways event A can occur) / (Total number of possible outcomes)

Certain rules govern the assignment of numeric values (probabilities) to chance events & outcomes. P(A) must satisfy the following conditions:

• For every event A in the sample space, 0 ≤ P(A) ≤ 1

• P(S) = 1

• P(∅) = 0

• If A1, A2, . . . is a collection of disjoint events, then

      P(A1 ∪ A2 ∪ ⋯) = \sum_{i=1}^{∞} P(Ai)

Example 3.1.7 (Roll of 2 Dice Cont’d)


Consider the sample space S from 2 rolls of a balanced die, as in Figure 2.2.1.
Let A: the sum is even. Then the probability of the event A is

    P(A) = P(sum is even) = 18/36

Let B: the sum is greater than 6. Then the probability of the event B is

    P(B) = P(sum > 6) = 21/36

Example 3.1.7
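These two probabilities can be reproduced by enumerating the sample space of Figure 2.2.1; a minimal Python sketch:

    from itertools import product
    from fractions import Fraction

    outcomes = list(product(range(1, 7), repeat=2))     # the 36 equally likely rolls
    p_even = Fraction(sum((a + b) % 2 == 0 for a, b in outcomes), len(outcomes))
    p_gt6  = Fraction(sum(a + b > 6 for a, b in outcomes), len(outcomes))
    print(p_even, p_gt6)       # 1/2 7/12, i.e., 18/36 and 21/36 in lowest terms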

Definition 3.1.8 (Complement Rule).

Let A: the sum is even when 2 fair dice are rolled. Then you might have to find the probability that A does not occur, i.e., P(Aᶜ):

    P(Aᶜ) = 1 − P(A) = 1 − 18/36 = 18/36


Figure 3.1.1.

Complement of an Event, Aᶜ.

Example 3.1.9 (Example: 3 Coin Toss Cont’d)


Take the example of tossing 3 coins with sample space S as shown in Figure 2.1.1:

    S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Find the probability that:

1. at least 2 heads appear,

2. at most 2 heads appear.

Solution:
In Definition 3.1.5, the events corresponding to at least & at most 2 heads were specified. Let X be the number of heads that appear in 3 tosses; the possible values of X are {0, 1, 2, 3}.

1. At least 2 heads appear:

       {X ≥ 2} = {THH, HTH, HHT, HHH}

       ∴ P(X ≥ 2) = 4/8 = 1/2

   We can also use the complement rule to find the required probability:

       P(X ≥ 2) = 1 − P(X < 2) = 1 − 4/8 = 1/2

2. At most 2 heads appear:

       {X ≤ 2} = {TTT, TTH, THT, THH, HTT, HTH, HHT}

       P(X ≤ 2) = 7/8

   Alternatively, we can also use the complement rule:

       P(X ≤ 2) = 1 − P(X > 2) = 1 − 1/8 = 7/8

Remember that when using the complement rule of probability, you partition the sample space into mutually exclusive events.
Example 3.1.9

3.1.3 §§ Inclusion-exclusion principle


Definition 3.1.10 (Addition Law of Probability).

Also called the inclusion-exclusion principle.

1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

   • Roll two dice; you might be interested in the probability that the sum is either even or greater than 6. Let A be the event that the sum is even and B the event that the sum is greater than 6. Then P(A ∪ B) = 18/36 + 21/36 − 9/36 = 30/36 = 5/6.

2. Extension: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
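A direct numerical check of the two-event formula on this dice example:

    from itertools import product
    from fractions import Fraction

    rolls = list(product(range(1, 7), repeat=2))
    A = {r for r in rolls if sum(r) % 2 == 0}     # sum is even
    B = {r for r in rolls if sum(r) > 6}          # sum is greater than 6

    lhs = Fraction(len(A | B), 36)
    rhs = Fraction(len(A), 36) + Fraction(len(B), 36) - Fraction(len(A & B), 36)
    print(lhs, rhs, lhs == rhs)                   # 5/6 5/6 True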


Figure 3.1.2.

Union of 2 Events (Venn diagram).


Figure 3.1.3.

Union of 3 Events (Venn diagram).

Definition 3.1.11 (Some Simple Propositions).

If A ⊂ B, then

    P(B) = P(A) + P(Aᶜ ∩ B)

and P(A) ≤ P(B), which is called the monotonicity of probability.

Example 3.1.12 (Roll of 2 Dice Cont’d)


Let A: the sum is even and greater than 6, and let A ⊂ B. You are also given that Aᶜ ∩ B is the event that the sum is even but less than 7. Find B from the given information.
Solution:

    A = {8, 8, 8, 8, 8, 10, 10, 10, 12},    P(A) = 9/36

    Aᶜ ∩ B = {2, 4, 4, 4, 6, 6, 6, 6, 6},    P(Aᶜ ∩ B) = 9/36

    P(B) = P(A) + P(Aᶜ ∩ B) = 9/36 + 9/36 = 18/36

As A ⊂ B,

    B = A ∪ (Aᶜ ∩ B)
      = {8, 8, 8, 8, 8, 10, 10, 10, 12} ∪ {2, 4, 4, 4, 6, 6, 6, 6, 6}
      = {2, 4, 4, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10, 12}

(here each listed element is the sum shown by one outcome, written with multiplicity). B is thus the event that an even sum appears when 2 dice are rolled.
Example 3.1.12

Figure 3.1.4.

Subset


Definition 3.1.13 (Probability of Equally likely Events).

• If there are N outcomes in the sample space and each outcome is equally likely, then the probability of each outcome is 1/N; e.g., the probability of getting a Red with the spinner in Figure 3.1.5 is 1/8, as each of the N = 8 outcomes on the spinner is equally likely.

• If there are N outcomes in the sample space, each outcome is equally likely, and A is an event with n outcomes, then P(A) = n/N; e.g., the probability of getting a Yellow with the spinner in Figure 3.1.5 is 3/8.

Figure 3.1.5.

A Spinner.

Example 3.1.14


Ellie will take 2 books on vacation. She will like the first with probability 1/2 and the second with probability 2/5. She will like both books with probability 3/10. What is the probability that she likes at least one of them? Find the probability that she dislikes both.
Solution:

    P(1st) = 1/2;    P(2nd) = 2/5;    P(both) = 3/10

    P(likes at least one of them) = P(1st) + P(2nd) − P(both) = 1/2 + 2/5 − 3/10 = 6/10

Disliking both books is the complement of liking at least one of them, so

    P(dislikes both) = 1 − P(likes at least one) = 1 − 6/10 = 4/10
Example 3.1.14

§§ Odds
Definition 3.1.15 (Odds).

Odds represent the likelihood that an event will occur. The odds in favor are the ratio of the number of ways that an outcome can occur to the number of ways it cannot occur, i.e.,

    odds in favor = number of successes (r) : number of failures (s)

    P(A) = (r/s) / ((r/s) + 1) = r/(r + s)

E.g., when you roll a fair die the odds of getting a '6' are 1 to 5.

• Convert from odds to probability: ∴ P(6) = 1/(1 + 5)

• Convert from a probability to odds: e.g., if the probability is 1/6, then the odds are '1 : 5'.

https://www.theweek.co.uk/99357/us-election-2020-polls-who-will-win-trump-biden

Example 3.1.16

1. A study was designed to compare two energy drink commercials. Each participant was
shown the commercials, A and B, in random order and asked to select the better one.
There were 100 women and 140 men who participated in the study. Commercial A was
selected by 45 women and by 80 men. Find the odds of selecting Commercial A for the
men. Do the same for the women.


2. People with type O negative blood are universal donors. That is, any patient can
receive a transfusion of O negative blood. Only 7% of the American population have
O negative blood. If 10 people appear at random to give blood, what is the probability
that at least 1 of them is a universal donor?

3. Birthday Paradox: Two people enter a room and their birthdays (ignoring years) are
recorded.

(a.) What is the probability that the two people have a specific pair of birthdates?
(b.) What is the probability that the two people have different birthdates?

Solution:

1. Odds for Commercial (Women)= 45:55 = 9:11;


Odds for Commercial (Men)= 80:60 = 4:3

2. Let X be the number of people with O negative blood in a group of 10 people. We want the probability that at least 1 of them has O negative blood, i.e., P(X ≥ 1). The probability that a single randomly selected person has O negative blood is 0.07; using the complement rule, 1 − 0.07 = 0.93 is the probability of not having O negative blood.

       P(X ≥ 1) = 1 − P(X < 1)
                = 1 − P(X = 0)
                = 1 − (0.93)^10        ∵ donors are independent
                = 0.516

3. (a.) P(two people have a specific pair of birthdates) = 1/365 = 0.0027

   (b.) P(two people have different birthdates) = 1 − 1/365 = 364/365 = 0.9972

Example 3.1.16

Example 3.1.17 (Birthday Problem Cont’d)


Three people enter a room and their birthdays (ignoring years) are recorded. What is the
probability that there is a pair of people who have the same birthday?
Solution:
The probability that at least two of the three people share a birthday can be calculated using the complement rule:

    P(at least two people have the same birthdate) = 1 − P(no two have the same birthdate)
        = 1 − (365 × 364 × 363)/365³
        = 1 − (365 × 364 × (365 − 3 + 1))/365³
        = 0.0082

The birthday problem is also shown in Figure 3.1.6. A single pair of people has a fixed probability of 0.0027 of sharing a birthday, which is low for just one pair. However, as the number of people increases, the number of pairs grows rapidly, and so does the probability of a match.
Example 3.1.17
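The curve in Figure 3.1.6 comes from applying the same complement-rule calculation to n people; a minimal Python sketch of that calculation:

    def p_shared_birthday(n, days=365):
        """Probability that at least two of n people share a birthday."""
        p_all_different = 1.0
        for i in range(n):
            p_all_different *= (days - i) / days
        return 1 - p_all_different

    print(round(p_shared_birthday(2), 4))    # 0.0027
    print(round(p_shared_birthday(3), 4))    # 0.0082
    print(round(p_shared_birthday(23), 4))   # about 0.5073, the classic 50% threshold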

Figure 3.1.6.

Probability of Shared Birthday: probability (0 to 1) of at least one shared birthday versus the number of people (0 to 100).

3.2 Conditional Probability


§§ Background
The relationship between multiple events is important; e.g., draw 2 cards from a deck of 52 playing cards & find the probability of getting 2 aces. The key question to think about is: 'Does the first event influence the outcome of the next event?' When we know (or assume) something about a random phenomenon in advance, it allows us to essentially shrink the sample space to a smaller set of possible outcomes, also called a reduced sample space. This fundamentally alters the probabilities.


Definition 3.2.1 (Conditional Probability).

Flip a coin 3 times (see Figure 2.1.1). What is the probability that the first coin comes up heads? Suppose we have the additional information that exactly two of the three coins came up heads.

    S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

1. P(first coin heads | two coins heads) = ⋯

2. How do probabilities change when we know that some event B has occurred?

3. The additional information has changed what we know, so probabilities should also change.

4. The conditional probability P(A|B) asks: out of all outcomes in B, what proportion of them are also in A? See Figure 3.2.1.

    P(A|B) ≡ P(A ∩ B)/P(B), given that P(B) ≠ 0

Conditional probabilities satisfy all three axioms of probability:

• For every event A in the sample space, 0 ≤ P(A|B) ≤ 1

• P(S|B) = 1

• When A1 & A2 are mutually exclusive, P(A1 ∪ A2 | B) = P(A1|B) + P(A2|B)

(given that P(B) ≠ 0)

Figure 3.2.1.

Conditional Probability.


§§ Contingency Table: Conditional Probability as Relative Frequency

Example 3.2.2
A recent survey in the US asked 100 people if they thought women in the armed forces should be permitted to participate in combat. The results of the survey, with males & females cross-classified by their responses, are given in the table below:
Male (M) Female (F) Total
Yes 32 8 40
No 18 42 60
Total 50 50 100
Find the probability that a randomly selected respondent
1. was a female, given that they answered 'yes';

2. was a male, given that they said 'no'.
Solution:
1. P(F|Yes) = (8/100)/(40/100) = 8/40

2. P(M|No) = (18/100)/(60/100) = 18/60 = 3/10
60/100
Example 3.2.2
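The same conditional probabilities can be computed directly from the survey counts; a small sketch with the table hard-coded:

    # Survey counts: rows are responses, columns are Male (M) / Female (F).
    counts = {"Yes": {"M": 32, "F": 8},
              "No":  {"M": 18, "F": 42}}

    total_yes = sum(counts["Yes"].values())     # 40
    total_no  = sum(counts["No"].values())      # 60

    p_f_given_yes = counts["Yes"]["F"] / total_yes
    p_m_given_no  = counts["No"]["M"] / total_no
    print(p_f_given_yes, p_m_given_no)          # 0.2 0.3, i.e., 8/40 and 18/60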

3.3 Multiplication Rule

Example 3.3.1
When a company receives an order, there is a probability of 0.42 that its value is over $1000.
If an order is valued at over $1000, then there is a probability of 0.63 that the customer will
pay with a credit card. What is the probability that the next order will be valued at over
$1000 but will not be paid with a credit card?
Solution:
    P(Over 1k) = 0.42
    P(C | Over 1k) = 0.63
    P(C′ ∩ Over 1k) = ?

    P(C′ | Over 1k) = P(C′ ∩ Over 1k) / P(Over 1k)

    1 − 0.63 = P(C′ ∩ Over 1k) / 0.42

    P(C′ ∩ Over 1k) = 0.42 × (1 − 0.63) = 0.1554


Example 3.3.1

Definition 3.3.2 (Multiplication Rule for Dependent Events).

How do we compute the joint probability of A and B when we are given the probability of A and the conditional probability of B given A, etc.?

    P(A ∩ B) = P(B|A) · P(A)
    P(A ∩ B) = P(A|B) · P(B)

When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed, the events are said to be dependent.

§§ Independence

Definition 3.3.3 (Independent Events).

Roll one fair die and flip one fair coin.

    S = {1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6T}

1. P(die shows 5) = 2/12 = 1/6

2. What is the probability that the die comes up 5, conditional on knowing that the coin came up tails? P(die shows 5 | tail) = 1/6

In this example P(die shows 5 | tail) = P(die shows 5) = 1/6; such events are independent.

Definition 3.3.4 (Multiplication Rule for Independent Events).

Does the occurrence of one event affect the probability of the occurrence of the other? Two events A and B are independent if the fact that A occurs does not affect the probability of B occurring. By definition,

    P(A|B) = P(A)
    P(B|A) = P(B)
    P(A ∩ B) = P(A) · P(B)


Example 3.3.5

A Harris poll found that 46% of Americans say they suffer great stress at least once a week. If three people are selected at random, find the probability that all three will say that they suffer great stress at least once a week.
Solution:
P(stress at least once a week) = 0.46. As the 3 selected people are independent,

    ∴ P(all three will suffer stress at least once a week) = 0.46 × 0.46 × 0.46 = 0.097

Example 3.3.5

Definition 3.3.6 (Independent & Disjoint Events).

Don't confuse independence with mutual exclusivity.

• Independence means that the probability of one event does not affect the probability of the other, i.e., P(A|B) = P(A).

• Mutually exclusive events can't occur together, i.e., P(A ∩ B) = 0, which implies P(A|B) = 0 or P(B) = 0.

Definition 3.3.7 (Dependent Events).

When the outcome or occurrence of the first event affects the outcome or occurrence
of the second event in such a way that the probability is changed, the events are
said to be dependent.

Example 3.3.8
Four of the light bulbs in a box of ten bulbs are burnt out or otherwise defective. If two
bulbs are selected at random without replacement; (see Figure 3.3.1) and tested, what is the
probability that

1. exactly two defective bulbs are found?

2. exactly one defective bulb is found?

Solution:
As the bulbs are selected without replacement, the draws are dependent events & the Multiplication Law for dependent events is used here.

1. P(exactly two defective bulbs) = P( D1 ) ˆ P( D2 |D1 ) = 4/10 ˆ 3/9 = 12/90


2. P(exactly one defective bulb) = P( D1 ) ˆ P( G2 |D1 ) + P( G1 ) ˆ P( D2 |G1 ) = 4/10 ˆ


6/9 + 6/10 ˆ 4/9 = 48/90

Example 3.3.8

Figure 3.3.1.

Tree Diagram for Bulbs Selection.

Generally there are two rules with Tree diagrams that you should keep in mind while
computing probabilities

1. When you are traveling along a branch you multiply the probabilities, i.e., use Multi-
plication Law of Probability.

2. When you go from branch to branch you add, i.e., either of the branches, so use Addition
Law of Probability.
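As a quick numerical check of Example 3.3.8 (an illustration added here, not part of the original solution), the two tree-diagram rules can be applied in a short Python sketch that multiplies along branches and adds across branches:

from fractions import Fraction

# 4 defective (D) and 6 good (G) bulbs; two draws without replacement.
p_D1 = Fraction(4, 10)           # first draw defective
p_G1 = Fraction(6, 10)           # first draw good
p_D2_given_D1 = Fraction(3, 9)   # second defective, given first was defective
p_G2_given_D1 = Fraction(6, 9)
p_D2_given_G1 = Fraction(4, 9)

# Rule 1: multiply along a branch; Rule 2: add across branches.
p_two_defective = p_D1 * p_D2_given_D1
p_one_defective = p_D1 * p_G2_given_D1 + p_G1 * p_D2_given_G1
print(p_two_defective)   # 2/15  (= 12/90)
print(p_one_defective)   # 8/15  (= 48/90)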


Definition 3.3.9 (Multiplication Rule: Dependent Events).

P(A ∩ B) = P(A|B) · P(B)
P(A ∩ B) = P(B|A) · P(A)
P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) · · · P(An|A1 ∩ A2 ∩ · · · ∩ An−1)

Students find it difficult to decide which Probability law to use for a certain scenario. Use
of Figure 3.3.2 while solving each problem will be helpful in making the correct choice.

Figure 3.3.2.

Some Tips for Selection of Probability Laws: A Flowchart


3.4 IJ Law of Total Probability


Definition 3.4.1 (Law of Total Probability).

Sometimes a problem gives several different conditional probabilities and or inter-


section probabilities of an event A, but never gives the probability of the event A
itself. Let B1 , B2 , . . . , Bk be a partition of the sample space S so that Bi are disjoint
events with S = B1 Y B2 Y ¨ ¨ ¨ Y Bk as in Figure 3.4.1. Events resulting from such
a partition of the sample space into disjoint events are called mutually exclusive
& collectively exhaustive events. Such a partition divides any set A into disjoint
pieces as

A = (A ∩ B1) ∪ (A ∩ B2) ∪ · · · ∪ (A ∩ Bk)

As event A is the union of mutually exclusive events A X Bi then using Addition


Law of Probability for disjoint events
P(A) = Σ_{i=1}^{k} P(A ∩ Bi)
     = Σ_{i=1}^{k} P(A|Bi) · P(Bi)

Figure 3.4.1.

Partitioning of S & Law of Total Probability.

Example 3.4.2 (Binary Signal)


A simple binary communication channel carries messages by using only two signals, say 0


and 1. We assume that, for a given binary channel, 40% of the time a 1 is transmitted;
the probability that a transmitted 0 is correctly received is 0.90, and the probability that a
transmitted 1 is correctly received is 0.95. Determine the probability of a 1 being received.
Solution:
Use a Tree diagram as in Figure 3.4.2. Here we are given different simple probabilities
P(0) = 0.6; P(1) = 0.4
& some conditional probabilities
P(0|0) = 0.90; P(1|1) = 0.95
We need to find the probability of 1 being received.

P(one being received) = P(0 X 1) + P(1 X 1)


= 0.6 ˆ (1 ´ 0.90) + 0.4 ˆ 0.95
= 0.44

Example 3.4.2
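As an illustrative cross-check (an addition to the text, not part of the original solution), the law of total probability for this channel can be evaluated directly in Python:

# Law of total probability for the binary channel of Example 3.4.2.
p_send0, p_send1 = 0.6, 0.4
p_recv1_given_send0 = 1 - 0.90   # a transmitted 0 is received incorrectly
p_recv1_given_send1 = 0.95       # a transmitted 1 is received correctly

p_recv1 = p_send0 * p_recv1_given_send0 + p_send1 * p_recv1_given_send1
print(round(p_recv1, 2))   # 0.44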

Figure 3.4.2.

[Tree diagram: the first branch is the transmitted signal, with P(0) = 0.60 and P(1) = 0.4; the second branch is the received signal, with P(0|0) = 0.90, P(1|0) = 1 − 0.90, P(1|1) = 0.95 and P(0|1) = 1 − 0.95; the leaves give the joint probabilities P(0 ∩ 0), P(0 ∩ 1), P(1 ∩ 0), P(1 ∩ 1).]

Binary Signal


3.5 IJ Bayes’ Theorem


Definition 3.5.1 (Bayes’ Theorem).

• When an event Bi occurs, it is natural to investigate which of the events A


caused Bi ,

• With known P( A|Bi ), move in the ’reverse’ direction in the tree diagram &
use P( A|Bi ) to find P( Bi |A) called ’Posterior Probability’

P(Bi|A) = P(Bi) · P(A|Bi) / P(A)
        = P(Bi) · P(A|Bi) / Σ_{i=1}^{k} P(A|Bi) · P(Bi)

The prior (marginal) probability of an event Bi , i.e., P( Bi ) is revised after event A


has been considered to yield a posterior probability P( Bi |A). (See Figure 3.5.1.)

Figure 3.5.1.

[Tree diagram for Bayes' Theorem: the first branch splits the sample space S into B1 and B2 with probabilities P(B1) and P(B2); the second branch gives P(A|Bi) and P(Ā|Bi); the leaves are the joint probabilities P(A ∩ B1), P(Ā ∩ B1), P(A ∩ B2), P(Ā ∩ B2).]

Example 3.5.2 (Binary Signal Cont’d)


In the Example 3.4.2 given a 1 is received, find the probability that 1 was transmitted.


Solution:
P(one was transmitted|one being received) is asking to move in the reverse direction in the
Figure 3.4.2.
P(1 transmitted | 1 received) = P(1 received | 1 transmitted) · P(1 transmitted) / P(1 received)
                              = P(1 ∩ 1) / [P(0 ∩ 1) + P(1 ∩ 1)]
                              = (0.4 × 0.95) / (0.6 × (1 − 0.90) + 0.4 × 0.95)
                              = 0.38 / 0.44
                              = 0.863
There is 86.3% chance that signal one was transmitted when signal one was received.
Example 3.5.2

Example 3.5.3 (Bayes’ Theorem: Detection of Rare Events)


The reliability of a particular skin test for tuberculosis (TB) is as follows:
If the subject has TB, the test comes back positive 98% of the time. If the subject does not
have TB, the test comes back negative 99% of the time. In a large population 2 in every
10,000 people have TB. A person, who was randomly selected from this large population, has
a test that comes back positive. What is the probability they actually have TB?
Solution:
P(TB|+ test Result) is asking to get an updated probability of having TB.
P(+test Result|TB).P( TB)
P(TB|+ test Result) =
P(+test Result)
P( TB X +test Result)
=
P( TB X +test Result) + P( TBc X +test Result)
0.0002 ˆ 0.98
=
0.0002 ˆ 0.98 + 0.9998 ˆ 0.01
0.000196
=
0.010194
= 0.019227
There is 0.019 probability of having TB when the test result was positive, means that most
likely the test result is false positive.
Example 3.5.3
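A short Python sketch (added here for illustration; the variable names are my own) reproduces this rare-event calculation and makes it easy to see how the false positives dominate:

# Bayes' theorem for the TB skin test of Example 3.5.3.
p_tb = 0.0002              # prevalence: 2 in 10,000
p_pos_given_tb = 0.98      # test positive when the subject has TB
p_pos_given_no_tb = 0.01   # false positive rate

p_pos = p_tb * p_pos_given_tb + (1 - p_tb) * p_pos_given_no_tb
p_tb_given_pos = p_tb * p_pos_given_tb / p_pos
print(round(p_tb_given_pos, 6))   # about 0.019227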

A Case Study: Modern British Legal History


• Sally Clark arrested for the murder of her two infant sons in 1998.
• The prosecution case relied on flawed statistical evidence presented by paediatrician
Professor Roy Meadow, who testified that the chance of two children from an affluent
family suffering SIDS was 1/8500 × 1/8500, i.e., about 1 in 73 million.


• The Royal Statistical Society later issued a statement expressing concern at the 'misuse
of statistics in the courts', arguing that there was no statistical basis for Meadow's
claim.

• Clark was wrongly convicted in November 1999. The convictions were upheld on appeal
in October 2000, but overturned in a second appeal in January 2003, after it emerged
that the prosecution forensic pathologist who examined both babies, had failed to
disclose microbiological reports that suggested the 2nd of her sons had died of natural
causes.

• Clark’s experience caused her to develop serious psychiatric problems and she died in
her home in March 2007 from alcohol poisoning.

§§ Applications
Standard applications of the multiplication formula, the law of total probabilities, and Bayes’
theorem occur with two-stage systems. The response for such systems can be thought of as
occurring in two steps or stages.

• Typically, we are given the probabilities for the first stage and the conditional proba-
bilities for the second stage.

• The multiplication formula is then used to calculate joint probabilities for what happens
at both stages;

• Law of Total Probability: used to compute the probabilities for what happens at the
second stage;

• Bayes’ Theorem: used to calculate the conditional probabilities for the first stage, given
what has occurred at the second stage

3.6 IJ Home Work


Identify the Probability law used from the scenario and calculate the required probability.

1. The WW Insurance Company found that 53% of the residents of a city had homeowner’s
insurance with its company. Of these clients, 27% also had automobile insurance with
the company. If a resident is selected at random, find the probability that the resident
has both homeowner’s and automobile insurance.

2. If there are 25 people in a room, what is the probability that at least two of them share
the same birthday?

3. You have a blood test for a rare disease that occurs by chance in 1 person in 100,000.
If you have the disease, the test will report that you do with probability 0.95 (and that
you do not with probability 0.05). If you do not have the disease, the test will report
a false positive with probability 0.001. If the test says you do have the disease, what
is the probability that you actually have the disease? Interpret the results.


4. You go to see the doctor about an ingrown toenail. The doctor selects you at random
to have a blood test for swine flu, which is currently suspected to affect 1 in 10,000
people in Australia. The test is 99% accurate, in the sense that the probability of a
false positive is 1%. The probability of a false negative is zero. You test positive. What
is the new probability that you have swine flu? Interpret the results

§§ Answers
1. 0.1431. Multiplication Law of Probability for Dependent Events.
2. P(None) = (365 × 364 × · · · × 341)/365^25 = 0.4313;  P(At least 2 share) = 1 − 0.4313 = 0.5687
3. 0.0094. There is only 0.94% chance that you do have the disease, or in other words the
test result is most likely false positive.

4. 0.0099. There is only a 0.99% chance that you do have swine flu, or in other words
the test result is most likely a false positive.

Chapter 4

Discrete Distributions

AS YOU READ . . .

1. What is a Random Variable & what are different types of Random Variable?
2. What is a Discrete Random Variable?
3. What is Probability Mass Function (pmf), Cumulative Distribution Function (cdf)?
4. What is Expected Value & Variance?
5. What are the different Discrete Probability Models, i.e., Bernoulli, Binomial, Poisson, Geo-
metric, Negative Binomial & Hypergeometric Distributions?
6. How are these models used to quantify uncertainty in practical life?

4.1 IJ Random Variables


Many random processes produce numbers. These numbers are called random variables. The
idea is to summarize the outcome from a random experiment by a simple number, e.g., closing
price of Twitter Stock on NYSE. However, the particular outcome of the random experiment
is not known in advance, e.g., Twitter Stock plunged by 7% the day after banning Trump.
The resulting value of the variable that results from such a random phenomenon is not known
in advance.
Definition 4.1.1 (Random Variable).

1. A random variable is a function from the sample space S to the real numbers,
i.e., X is a rule which assigns a number X(s) to each outcome s ∈ S.

2. A random variable is denoted by an uppercase letter such as X, Y, Z. After


an experiment is conducted, the measured value of the random variable is
denoted by a lowercase letter such as x, y, z.


Example 4.1.2

1. Toss a coin. The sample space is S = {T, H}. Let X be the number of heads resulting
from the coin toss; then X = {0, 1}, and

   P(X = x) = 1/2 for x = 0, 1;  0 otherwise.

2. Let X denote the sum of the numbers on the upper faces that might appear when 2
fair dice are rolled, (see Figure 2.2.1). Then X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

3. If we roll three dice, there are 6³ = 216 possible outcomes. Let the random variable X
be the sum of the three dice. In this case, X = {3, 4, . . . , 17, 18}.

Example 4.1.2

4.1.1 §§ Random Variable Types

Consider the difference in how we measure the outcome of 3 tosses of a fair coin, e.g., our
interest might be in the number of heads (see Figure 4.1.1) versus the heights of the students
in cm, (see Figure 4.1.2).

1. Discrete Random Variable: possible values of the variable can be listed in either a finite
or an infinite list, e.g., number of defective parts produced in manufacturing, number
of people getting flu in winter, etc.


Figure 4.1.1.

[Number line showing the possible values 0, 1, 2, 3.]

Number of Heads in 3 Tosses of a Fair Coin

2. Continuous Random Variable: assumes any values in an interval (either finite or infi-
nite) of real numbers for its range, e.g., height of students, weight of new born babies,
the temperature range during a day, life length of some electric device, etc.


Figure 4.1.2.

[Continuous scale from 150 to 190 cm.]

Heights of Students in cm

4.1.2 §§ Discrete Probability Distribution


Definition 4.1.3 (Probability Mass function (pm f )).

The probability distribution of a random variable X is a description of the prob-


abilities associated with the possible values of X. For a discrete random variable,
the distribution is often specified by just a list of the possible values along with the
probability of each, or in some cases, it is convenient to express the probability in
terms of a formula. Probability Mass Function or pmf is written as:

P( X = x ) = p( x )

§§ Probability Mass Function (pm f ): Properties


1. 0 ≤ P(X = x) ≤ 1, for all x ∈ S.

2. Σ_x P(X = x) = 1
For a discrete random variable, its pm f or probability mass function defines all that we need
to know about the random variable. A pm f for a discrete random variable is defined (with


positive probabilities) only for a finite or countably infinite set of possible values - typically
integers. Toss a fair coin three times and let X denote the number of heads observed. The
probability distribution of X is shown graphically in Figure 4.1.3.

Figure 4.1.3.
[Bar plot of P(X = x) against x = 0, 1, 2, 3, with heights 1/8, 3/8, 3/8, 1/8. Horizontal axis: Number of Heads in 3 tosses of a fair coin.]

A Probability Mass Function.

If we roll 2 dice, let X be the sum that appears on the upper faces. The probability
distribution of X is shown graphically in Figure 4.1.4.


Figure 4.1.4.

Example 4.1.4
A company has five warehouses, only two of which have a particular product in stock. A
salesperson calls the five warehouses in random order until a warehouse with the product is
reached. Let the random variable Y be the number of calls made by the salesperson. Cal-
culate the probability mass function and cumulative distribution function of the number of
calls made by the salesperson.
Solution:
Let X be the event that a called warehouse has the particular product in stock; then P(X) = p = 2/5. Let Y be the
number of calls made by the salesperson needed to find a warehouse with the product. He
calls the warehouses one by one until he finds a warehouse with the required product.
Y P (Y ) F (Y )
1 2/5 2/5
2 3/5 ˆ 2/4 = 3/10 2/5 + 3/10 = 7/10
3 3/5 ˆ 1/2 ˆ 2/3 = 1/5 2/5 + 3/10 + 1/5 = 9/10
4 3/5 ˆ 1/2 ˆ 1/3 = 1/10 2/5 + 3/10 + 1/5 + 1/10 = 1
F(y) = 0     for y < 1;
       2/5   for 1 ≤ y < 2;
       7/10  for 2 ≤ y < 3;
       9/10  for 3 ≤ y < 4;
       1     for y ≥ 4.

Example 4.1.4
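The pmf above can be checked with a small Python enumeration (an added illustration; the variable names are my own):

from fractions import Fraction
from itertools import permutations

# 5 warehouses, 2 of which ('S') stock the product; calls are made in random order.
warehouses = ['S', 'S', 'F', 'F', 'F']
orders = list(permutations(range(5)))
counts = {}
for order in orders:
    # number of calls until the first stocked warehouse is reached
    calls = next(i + 1 for i, w in enumerate(order) if warehouses[w] == 'S')
    counts[calls] = counts.get(calls, 0) + 1

pmf = {y: Fraction(c, len(orders)) for y, c in sorted(counts.items())}
print(pmf)   # {1: 2/5, 2: 3/10, 3: 1/5, 4: 1/10}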

4.1.3 §§ Cumulative Distribution Function (cd f )


§§ Background
We calculated the probability that at most 2 heads turn up when we toss a coin 3 times, i.e.,
P(X ≤ 2). The probability that the random variable X does not exceed x (i.e., will be less


than or equal to x) is denoted as:

F(x) = P(X ≤ x) = Σ_{t ≤ x} P(X = t)

Definition 4.1.5 (Cumulative Distribution Function (cd f )).

F ( x ) = P( X ď x ) is called the Cumulative Distribution Function cd f . For discrete


random variables the cumulative distribution function will always be a step function
with jumps at each value of x that has probability greater than 0.

Example 4.1.6
Toss a fair coin three times and let x denote the number of heads observed. Find the corre-
sponding cumulative distribution function cd f .

P(X = x) = 1/8  for x = 0;
           3/8  for x = 1;
           3/8  for x = 2;
           1/8  for x = 3;
           0    otherwise.

Solution:

F(x) = 0    for x < 0;
       1/8  for 0 ≤ x < 1;
       4/8  for 1 ≤ x < 2;
       7/8  for 2 ≤ x < 3;
       1    for x ≥ 3.

The plot for the cd f is given in Figure 4.1.5.


Figure 4.1.5.

[Step plot of F(x), jumping to 1/8, 4/8, 7/8 and 1 at x = 0, 1, 2, 3. Horizontal axis: Number of Heads in 3 tosses of a fair coin.]

A Cumulative Distribution Function corresponding to Figure 4.1.3.

Pay attention to the jump sizes in the step function. What do you conclude?
Example 4.1.6

§§ Getting pm f from cd f
If the range of a discrete random variable X consists of the values x1 ă x2 ă ¨ ¨ ¨ ă xn then
p ( x1 ) = F ( x1 )
p( xi ) = F ( xi ) ´ F ( xi´1 ); i = 2, 3, . . . n

Example 4.1.7
X is a discrete random variable with cd f as:
F(x) = 0.00  for x < −3;
       0.03  for −3 ≤ x < 1;
       0.20  for 1 ≤ x < 2.5;
       0.76  for 2.5 ≤ x < 7;
       1.00  for x ≥ 7.


Write down the pm f from the above cd f in appropriate form. Given that X is positive, what
is the probability that it will be at least 2?
Solution:
P(X = x) = 0.03  for x = −3;
           0.17  for x = 1;
           0.56  for x = 2.5;
           0.24  for x = 7;
           0     otherwise.

Given that X is positive, what is the probability that it will be at least 2?

P(X ≥ 2 | X > 0) = P(X ≥ 2 ∩ X > 0) / P(X > 0)
                 = 0.80 / (1 − 0.03)
                 = 0.8247

Example 4.1.7
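Differencing the cdf at its jump points to recover the pmf is easy to script; the following Python sketch (an illustrative addition) reproduces the numbers in Example 4.1.7:

# Recover the pmf from the cdf by differencing F at its jump points.
support = [-3, 1, 2.5, 7]
F = {-3: 0.03, 1: 0.20, 2.5: 0.76, 7: 1.00}   # cdf value at each jump

pmf, previous = {}, 0.0
for x in support:
    pmf[x] = round(F[x] - previous, 2)
    previous = F[x]
print(pmf)   # {-3: 0.03, 1: 0.17, 2.5: 0.56, 7: 0.24}

# Conditional probability P(X >= 2 | X > 0)
p_ge2 = pmf[2.5] + pmf[7]
p_gt0 = pmf[1] + pmf[2.5] + pmf[7]
print(round(p_ge2 / p_gt0, 4))   # 0.8247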

§§ Cumulative Distribution Function cdf: Properties

1. lim_{x → −∞} F(x) = 0

2. lim_{x → +∞} F(x) = 1

3. For a ≤ b, F(a) ≤ F(b), i.e., F is a non-decreasing function

4. F(x) is a right-continuous function (lim_{x_n ↓ x} F(x_n) = F(x) for all x ∈ R).

4.2 IJ Expectation of a Random Variable

Toss a coin 50 times and count the number of times heads might turn up. How many heads
do you expect?


Figure 4.2.1.

The probability mass function provides complete information about the probabilistic
properties of a random variable. One of the most basic summary measures is the expec-
tation or mean of a random variable, i.e., E( X ). It is the average value of random variable X
and certainly reveals one of the most important characteristics of its distribution, i.e., center.

Definition 4.2.1 (Expected value of a Discrete Random Variable).

If X is a discrete random variable that assumes values x1, x2, . . . , xn along with
corresponding probabilities P(x1), P(x2), . . . , P(xn), then the expected value of X is
defined as:

E(X) = µ = Σ_{j=1}^{n} x_j P(X = x_j)

1. E( X ) is also called the 1st moment of the random variable X about zero. The
first moment of X is synonymously called the mean, expectation, or average
value of X.

2. Let g(X) be a real-valued function of the random variable X; then:

   E[g(X)] = Σ_{j=1}^{n} g(x_j) P(X = x_j)

   E[g(X)] is also called a generalized moment. Let g(X) = X^n, n = 1, 2, . . .; the
   expectation E(X^n), when it exists, is called the nth moment of X.

If X takes on a countably infinite number of values x1, x2, . . ., then

E(X) = Σ_{j=1}^{∞} x_j P(X = x_j)

Example 4.2.2


1. Toss three fair coins and let X denote the number of heads observed. Find the expected
number of heads.

   P(X = x) = 1/8  for x = 0;
              3/8  for x = 1;
              3/8  for x = 2;
              1/8  for x = 3;
              0    otherwise.

2. Consider a game that costs $1 to play. The probability of losing is 0.7. The probability
of winning $50 is 0.1, and the probability of winning $35 is 0.2. Would you expect to
win or lose if you play this game?

Solution:

1.

E( X ) = 0 ˆ 1/8 + 1 ˆ 3/8 + 2 ˆ 3/8 + 3 ˆ 1/8


= 12/8
= 1.5

2. Let X be the gain when you play the game

Gain ( X ) P( X )
-1 0.7
(50-1) 0.1
(35-1) 0.2

E( X ) = ´1 ˆ 0.7 + (50 ´ 1) ˆ 0.1 + (35 ´ 1) ˆ 0.2


= 11

In the long run, you are expected to gain $11 per play of this game.

Example 4.2.2
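As a quick illustration (not part of the original solution), both expectations can be computed with a couple of lines of Python:

# Expected number of heads in 3 tosses, and expected gain in the game.
pmf_heads = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(sum(x * p for x, p in pmf_heads.items()))   # 1.5

pmf_gain = {-1: 0.7, 50 - 1: 0.1, 35 - 1: 0.2}
print(sum(x * p for x, p in pmf_gain.items()))    # 11.0 (up to floating-point rounding)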

4.2.1 §§ Expected Values of Sums of Random Variable: Some Properties


A fundamental property of the expectation operator is that it is linear.

• If X1 , X2 , . . . , Xn are discrete random variables with finite expected values, and a1 , a2 , . . . , an


are constant numbers, then

E ( a 1 X1 + a 2 X2 + ¨ ¨ ¨ + a n X n ) = a 1 E ( X1 ) + a 2 E ( X2 ) + ¨ ¨ ¨ + a n E ( X n )

• If X1 , X2 , . . . , Xn are identically distributed discrete random variables, then

E ( X1 + X2 + ¨ ¨ ¨ + X n ) = E ( X1 ) + E ( X2 ) + ¨ ¨ ¨ + E ( X n ) (4.2.1)


• For any random variable X and constants a & b, if Y = aX + b, then

E( aX + b) = aE( X ) + b
Definition 4.2.3 (Median of a Random Variables).

A median of X is any point that divides the mass of the distribution into two equal
parts; that is, x0 is a median of X if
P(X ≤ x0) = 1/2
The mean of X may not exist, but there exists at least one median.

Definition 4.2.4 (Indicator Random Variables: Bernoulli Variable).

Random variables that are 1 when an event occurs or 0 when the event does not
occur are called indicator random variables. In other words, I A maps all outcomes
in the set A to 1 and all outcomes outside A to 0. Roll a die. Let A be the event
that a 6 appears. Then

I_A(x) = 1 if x ∈ A;  0 otherwise.
If X is an indicator random variable for event A (6 appears), then E( X ) = P( A).

Example 4.2.5
Four students order noodles at a certain local restaurant. Their orders are placed indepen-
dently. Each student is known to prefer Japanese pan noodles 40% of the time. How many
of them do we expect to order Japanese pan noodles?
Solution:
Let X denote the number of students that order Japanese pan noodles altogether. Let
X1 , X2 , X3 , X4 be the indicator random variables representing the 4 students if they make a
choice of Japanese pan noodles or not:

P(Xi = 0) = 0.6;  P(Xi = 1) = 0.4

E(Xi) = 1 × 0.4 + 0 × 0.6
      = 0.4
Then the number of students that order Japanese pan noodles X = X1 + X2 + X3 + X4 . As
X1 , X2 , . . . , X4 are identically distributed discrete random variables, then expected value of
their sums is equal to the sum of the respective expectations, see Equation 4.2.1. Because
each has the same expected value, E( x ) = 4(0.4) = 1.6.
Example 4.2.5


4.3 IJ Variance

Figure 4.3.1.

Definition 4.3.1 (Variance).

The variance of a random variable X is a measure of how spread out its possible
values are. The variance of X is the 2nd central moment, commonly denoted by σ2
or Var ( X ). It is the most commonly used measure of dispersion of a distribution
about its mean. Large values of σ2 imply a large spread in the distribution of X
about its mean. Conversely, small values imply a sharp concentration of the mass of
distribution in the neighborhood of the mean as shown in Figure 5.2.4. For discrete
random variable

Var ( X ) = E( X ´ µ)2
= E( X 2 ) ´ [ E( X )]2 ,

where E( X 2 ) is the 2nd moment of the random variable X about zero. Variance is
the average value of the squared deviation of X from its mean µ. If X has units of
meters, e.g., the variance has units of meters squared.
For any random variable X, the variance of X is nonnegative, i.e.,

Var ( X ) = E( X ´ µ)2 ě 0


4.3.1 §§ Standard Deviation

Definition 4.3.2 (Standard Deviation).

The standard deviation σ of a random variable X is the square root of its variance.
It can be interpreted as a typical distance of the values of X from the mean.

σ = √Var(X)

The standard deviation has the same units as X.

Example 4.3.3

1. Ali and his brother both like chocolate chip cookies best. They have a jar of cookies
with 5 chocolate chip cookies, 3 oatmeal cookies, and 4 peanut butter cookies. They
are each allowed to have 3 cookies. To be fair, they agree to randomly select their
cookies without peeking, and they each must keep the cookies that they select. What
is the variance of the number of chocolate chip cookies that Ali gets?

2. A student was at work at the county amphitheater, and was given the task of cleaning
1500 seats. To make the job more interesting, his boss hid a golden ticket somewhere
in the seats. The ticket is equally likely to be in any of the seats. Let X be the number
of seats cleaned until the ticket is found. Calculate the variance of X.

Solution:

1. Let X denote the number of chocolate chip cookies that Ali selects. As he is allowed
to have 3 cookies, therefore X = 0, 1, 2, 3

X      P(X)                              X·P(X)   X²·P(X)
0      C(5,0)C(7,3)/C(12,3) = 7/44       0        0
1      C(5,1)C(7,2)/C(12,3) = 21/44      21/44    21/44
2      C(5,2)C(7,1)/C(12,3) = 7/22       14/22    28/22
3      C(5,3)C(7,0)/C(12,3) = 1/22       3/22     9/22
Total                                    5/4      95/44


E(X) = Σ_{j} x_j P(X = x_j)
     = 5/4

E(X²) = Σ_{j} x_j² P(X = x_j)
      = 95/44

Var(X) = E(X²) − [E(X)]²
       = 95/44 − (5/4)²
       = 0.6

2. Let X be the number of seats cleaned until the ticket is found. As there are 1500
seats, so the probability of finding the ticket is p = 1/1500. The student will start
cleaning the seats & move to clean the next seat only if he does not find the ticket in
the previous seat.

X       P(X)
1       1/1500
2       (1 − 1/1500) × 1/1499 = 1/1500
3       (1 − 1/1500) × (1 − 1/1499) × 1/1498 = 1/1500
...     ...
1500    1/1500

E(X) = (1/1500) × (1 + 2 + · · · + 1500)
     = (1/1500) × 1500(1500 + 1)/2
     = 750.5

E(X²) = (1/1500) × (1² + 2² + · · · + 1500²)
      = (1/1500) × 1500(1500 + 1)(2 × 1500 + 1)/6
      = 750750.1667

Var(X) = E(X²) − [E(X)]²
       = 750750.1667 − 750.5²
       = 187499.92

Example 4.3.3
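Both variances can be verified with a short Python sketch (an added illustration using exact fractions; math.comb needs Python 3.8+):

from fractions import Fraction
from math import comb

# 1. Chocolate chip cookies: 3 draws without replacement from 5 CC + 7 other cookies.
pmf = {x: Fraction(comb(5, x) * comb(7, 3 - x), comb(12, 3)) for x in range(4)}
mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2
print(mean, var, float(var))   # 5/4 105/176 0.5966 (about 0.6)

# 2. Golden ticket: X is uniform on 1..1500.
n = 1500
mean2 = (n + 1) / 2
var2 = sum(k**2 for k in range(1, n + 1)) / n - mean2**2
print(mean2, round(var2, 2))   # 750.5 187499.92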


4.3.2 §§ Variance: Properties


1. If a is a constant, then Var ( a) = 0
2. Var ( X ) ě 0
3. If X1 , . . . , Xn are independent random variables, and a1 , . . . , an are constants, then
Var ( a1 X1 + ¨ ¨ ¨ + an Xn ) = a21 Var ( X1 ) + ¨ ¨ ¨ + a2n Var ( Xn )

4. The variance operator is not linear, but it is straightforward to determine the variance
of a linear function of a random variable. For any random variable X and any constants
a and b, let Y = aX + b, then Y is also a random variable and
Var(Y) = Var(aX + b) = a² Var(X)

SD(Y) = SD(aX + b) = √Var(aX + b) = √(a² Var(X))

Example 4.3.4 (Examples of Applications of Variance)

1. Uncertain effect of a change in the State Bank’s monetary policy on economy,


2. Variation in ocean temperatures at a single location indicates something about how
heat moves from place to place in the ocean.
3. Why different patches of the same forest have different plants on them.

Example 4.3.4

Example 4.3.5
Four cards are labeled $1, $2, $3, and $6. A player pays $4, selects two cards without re-
placement at random, and then receives the sum of the winnings indicated on the two cards.
Will the player win or lose money in the long run? What is the variance of the winning?
Solution:
Let X be the sum of the 2 cards that he selects without replacement. The probability of any
particular pair is P(X = x) = 1/C(4,2) = 1/6.
As the expected winning is E(Y) = Σ Y·P(X) = $2 (see Table 4.1), the player will win
money ($2 on average) in the long run.
Expected winning could also be calculated using expected value of the linear combination,
Y = X ´ 4.

E (Y ) = E ( X ´ 4 )
= E( X ) ´ 4
= 36/6 ´ 4
=2


X=Sum Y = X´4 P( X ) Y.P( X )


(1,2)=3 3 ´ 4 = ´1 1/6 ´1/6
(1,3)=4 4´4 = 0 1/6 0
(1,6)=7 7´4 = 3 1/6 3/6
(2,3)=5 5´4 = 1 1/6 1/6
(2,6)=8 8´4 = 4 1/6 4/6
(3,6)=9 9´4 = 5 1/6 5/6
Total 1 12/6 = 2

Table 4.1

Variance of the winning can be calculated using property of variance as given below:

Var(X − 4) = Var(X)
Var(X) = E(X²) − [E(X)]²
       = 122/3 − 6²
       = 4.67

So the variance of the winnings is 4.67 (squared dollars).


Example 4.3.5

IJ Probability Distribution
A probability distribution is a representation of random variables and the associated prob-
abilities of different outcomes. A probability distribution is characterized by a probability
mass function pm f for discrete, or by probability density function pd f for continuous random
variables respectively.

4.4 IJ Bernoulli Distribution


4.4.1 §§ Conditions for Bernoulli Variable
1. A Bernoulli random variable models a random experiment that has two possible out-
comes, sometimes referred to as 'success' and 'failure'. A Bernoulli random variable is
also called an indicator variable.

2. This random variable can only take two possible values, usually 0 and 1.

(a) Toss a coin. The outcome is ether heads ( X = 1) or tails ( X = 0).


(b) Taking a pass-fail exam; either pass ( X = 1) or fail ( X = 0).
(c) A newborn child is either male ( X = 1) or female ( X = 0).

3. The probabilities of success p and of failure q = 1 ´ p are positive such that p + q = 1


4.4.2 §§ Probability Mass Function (pm f )


Definition 4.4.1 (Bernoulli Distribution (pm f )).

A random variable X ∼ Bernoulli(p), where p, the probability of success, is the
only parameter of this distribution. The pmf of the Bernoulli distribution is:

P(X = x) = p^x (1 − p)^(1−x),  x = 0, 1

Also written as

P(X = x) = 1 − p for x = 0;  p for x = 1.

4.4.3 §§ Bernoulli Distribution: Expectation & Variance

E(X) = Σ_{x=0}^{1} x p^x (1 − p)^(1−x)
     = p

Var(X) = E(X²) − [E(X)]²
       = p(1 − p)

Example 4.4.2
Thirty-eight percent of the songs on a student’s music player are rock songs. A student
chooses a song at random, with all songs equally likely to be chosen. Let X indicate whether
the selected song is a rock song. Find the expected number and variance of X.
Solution:
Let X be the indicator random variable with X = 1 if the selected song is a rock song, X = 0
otherwise. The probability of a rock song is p = 0.38.

P(X = x) = 1 − 0.38 for x = 0;  0.38 for x = 1.

E( X ) = p
= 0.38
Var ( X ) = p(1 ´ p)
= 0.38 ˆ (1 ´ 0.38)
= 0.2356

Example 4.4.2


4.5 IJ Binomial Distribution


4.5.1 §§ Background Example
The possible outcomes in 3 tosses of a fair coin are:
S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}
We want to calculate the probability of getting 1 head in 3 tosses, e.g.,

P(TTH) = (1/2)² × (1/2) = 1/8
P(1 H in 3 tosses) = 1/8 + 1/8 + 1/8 = 3/8
What will be the probability of obtaining 1 H in 50 tosses? Do we write down the sample
space for the experiment for 50 tosses? It will be very tedious.
1. Count the number of successes X that occur in n independent bernoulli trials, then X
is said to be a binomial random variable with parameters (n, p)
2. Bernoulli random variable is just a binomial random variable with parameters (1, p)
3. If X1 , X2 , . . . , Xn are chosen independently and each has the Bernoulli( p) distribution,
and Y = X1 + ¨ ¨ ¨ + Xn , then Y will have the Binomial (n, p) distribution
Practical Life Examples of Binomial Distribution
1. Each item from manufacturing production line can be either defective or non-defective.
2. Chances of profit and loss in Stock Market.
3. Proportion of students who pass the exam.
4. Proportion of people who have recovered from COVID-19.
5. Proportion of voters who favored Biden in the election.

4.5.2 §§ Conditions for Binomial Distribution


Definition 4.5.1.

Binomial Experiment

1. A random experiment with fixed number of trials, i.e., n.

2. Trials are independent & identical.

3. Each trial results in one of 2 possible outcomes, i.e., ”success”, or a ”failure”.

4. The probabilities of success p and of failure (q = 1 ´ p) are constant across


trials.


4.5.3 §§ Probability Mass Function (pm f )

Definition 4.5.2 (Binomial Distribution (pm f )).

A random variable X „ Bin (n, p), where n & p are the parameters of the Binomial
distribution. The pm f for Binomial distribution is:
 
P(X = x) = C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, . . . , n

1. n is the fixed number of trials.

2. x is the number of successes in n trials, i.e., x = 0, 1, . . . , n.

3. p is the probability of success on any given trial.


 
4. C(n, x) is the binomial coefficient.
5. n & p are the parameters of the binomial distribution.

Example 4.5.3
A particular concentration of a chemical found in polluted water has been found to be lethal
to 20% of the fish that are exposed to the concentration for 24 hours. Ten fish are placed in
a tank containing this concentration of the chemical in water.

(a). Find the probability that at least 8 survive.

(b). Find the probability that at most 6 survive.

Solution:
n = 10. The chemical is lethal to 20% of exposed fish, so each fish survives with probability
p = 0.80. Let X be the number of fish that survive.

(a).

P(X ≥ 8) = P(X = 8) + P(X = 9) + P(X = 10)
         = C(10,8) 0.8^8 (1 − 0.8)^2 + C(10,9) 0.8^9 (1 − 0.8)^1 + C(10,10) 0.8^10 (1 − 0.8)^0
         = 0.3020 + 0.2684 + 0.1074
         = 0.678


(b).

P(X ≤ 6) = 1 − P(X > 6)
         = 1 − [P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)]
         = 1 − [C(10,7) 0.8^7 (1 − 0.8)^3 + 0.678]
         = 1 − [0.201 + 0.678]
         = 0.121

Example 4.5.3
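These binomial tail probabilities are easy to double-check in Python with only the standard library (an illustrative addition; math.comb needs Python 3.8+):

from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for X ~ Binomial(n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.80                      # probability a fish survives
at_least_8 = sum(binom_pmf(x, n, p) for x in range(8, 11))
at_most_6 = sum(binom_pmf(x, n, p) for x in range(0, 7))
print(round(at_least_8, 3), round(at_most_6, 3))   # 0.678 0.121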

Example 4.5.4
An airline estimates that 5% of the people making reservations on a certain flight will not
show up. Consequently, their policy is to sell 84 tickets for a flight that can only hold 80
passengers. What is the probability that there will be a seat available for every passenger
that shows up?
Solution:
P(No show) = 0.05; 6 P(show) = 0.95; n = 84

P( X ď 80) = 1 ´ P( X ą 80)
 
= 1 ´ P( X = 81) + P( X = 82) + P( X = 83) + P( X = 84)

= 0.6103

There will be 61.03% chance of a seat being available for everyone.


Example 4.5.4
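A direct computation of this overbooking probability (added as an illustration, not part of the original solution) looks like this in Python:

from math import comb

n, p = 84, 0.95   # 84 tickets sold; each passenger shows up with probability 0.95
# P(X <= 80) = 1 - [P(X = 81) + P(X = 82) + P(X = 83) + P(X = 84)]
p_too_many = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(81, 85))
print(round(1 - p_too_many, 4))   # about 0.61, as in the text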

Example 4.5.5
A hospital receives 1/5 of its COVID-19 vaccine shipments from Moderna and the remainder
of its shipments from Pfizer. Each shipment contains a very large number of vaccine vials.
For Moderna shipments, 10% of the vials are ineffective, while for Pfizer, 2% of the vials are
ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one
vial is ineffective. What is the probability that this shipment came from Moderna?
Solution:
Let M be the event that the shipment is from Moderna, P the event that it is from
Pfizer, and I the event that a tested vial is ineffective.
We are given P(M) = 1/5; P(P) = 1 − 1/5 = 4/5; n = 30. Let X be the number of
ineffective vials in the sample of size 30.
 
1. P(I|M) = 0.10;  P(X = 1|M) = C(30,1) (0.10)^1 (1 − 0.10)^(30−1) = 0.141

 
2. P(I|P) = 0.02;  P(X = 1|P) = C(30,1) (0.02)^1 (1 − 0.02)^(30−1) = 0.334

P(M | one ineffective vial) asks for an updated probability that the shipment came from
Moderna.

P(M|X = 1) = P(X = 1|M) P(M) / P(X = 1)
           = P(X = 1 ∩ M) / [P(X = 1 ∩ M) + P(X = 1 ∩ P)]
           = (1/5 × 0.141) / (1/5 × 0.141 + 4/5 × 0.334)
           = 0.0954

There is a 9.54% chance that the shipment with the ineffective vial came from Moderna.


Example 4.5.5

Example 4.5.6
The probability of a student passing an exam is 0.2. How many students must take the exam
so that the probability that at least one student passes is 0.99?
Solution:
p = 0.2; n = ?. Let X be the number of students who pass. We need P(X ≥ 1) = 0.99,
i.e., at least 1 student passes the exam.

P(X ≥ 1) = 1 − P(X < 1)
0.99 = 1 − C(n,0) (0.2)^0 (1 − 0.2)^(n−0)
0.99 = 1 − 0.8^n
0.8^n = 1 − 0.99
n = log(0.01)/log(0.8)
  = 20.6377

Therefore n « 21. So 21 students must take an exam so that probability of any passing the
exam is 0.99.
Example 4.5.6

4.5.4 §§ Shape of Binomial Distribution


Binomial distribution is unimodal.

1. If p ă 0.5 the distribution will exhibit POSITIVE SKEWNESS, as shown in Figure


4.5.1

2. for p ą 0.5 the distribution will exhibit NEGATIVE SKEWNESS, (see, Figure 4.5.2).


3. if p = 0.5 the distribution will be SYMMETRIC, (see, Figure 4.5.3).

4. as n → ∞ the binomial distribution becomes symmetric & bell-shaped, (see Figure 4.5.4).
This is an important result that is related to the Central Limit Theorem.

Figure 4.5.1.

[Bar plot of the Binomial(n = 15, p = 0.2) pmf: positively skewed.]


Figure 4.5.2.

[Bar plot of the Binomial(n = 15, p = 0.8) pmf: negatively skewed.]

Figure 4.5.3.

[Bar plot of the Binomial(n = 15, p = 0.5) pmf: symmetric.]


Figure 4.5.4.

[Bar plot of the Binomial(n = 40, p = 0.2) pmf: approximately symmetric and bell-shaped.]

4.5.5 §§ Binomial Distribution: Expectation & Variance

E(X) = Σ_{x=0}^{n} x C(n, x) p^x (1 − p)^(n−x)
     = np

Var(X) = E(X²) − [E(X)]²
       = np(1 − p)

Example 4.5.7
A company is considering drilling four oil wells. The probability of success for each well is
0.40, independent of the results for any other well. The cost of each well is $200,000. Each
well that is successful will be worth $600,000. What is the expected gain?
Solution:
Let X be the number of successful wells, i.e., X = 0, 1, . . . , 4. n = 4; p = 0.4. X is a binomial
random variable. The cost is a fixed constant of $200,000. So the total cost of 4 wells is a
fixed constant, i.e., b = $800, 000. The worth of each successful well is a fixed constant of
a = $600, 000. Let Y be the gain from 4 wells. Then Y = aX ´ b


E( X ) = np
= 4 ˆ 0.4
= 1.6
E(Y ) = E( aX ´ b)
= aE( X ) ´ b
= 600, 000 ˆ 1.6 ´ 800, 000
= 160000

The expected gain from drilling 4 oil wells is $160,000.


Example 4.5.7

4.6 IJ Poisson Distribution


Many experimental situations occur in which we observe the counts of events within a set
unit of time, area, volume, length, etc.

§§ Background Example
The number of typing errors made by a typist has a Poisson distribution with an average of
four errors per page. If more than four errors appear on a given page, the typist must retype
the whole page. What is the probability that a randomly selected page needs to be retyped?
The Poisson distribution applies particularly to rare events, i.e., events which occur in-
frequently in time, space, volume or any other dimension.

1. The number of airplanes that come into an airport in 2 hours.

2. The number of phone calls received by a telephone operator in a 10-minute period.

3. The number of typos per page made by a secretary.

4. The number of customers arriving in 1 hour at a shop.

5. The number of defects in a certain sized carpet.

4.6.1 §§ Conditions for Poisson Variable


A random variable X has a Poisson distribution if the following conditions hold

1. X counts the number of events within a specified time or space, etc.

2. The events occur independently of each other.

3. Any 2 events can not happen exactly at the same time.


4. A Poisson random variable can take on any positive integer value, i.e., X = 0, 1, 2, . . ..
In contrast, the Binomial distribution always has a finite upper limit, i.e., X =
0, 1, 2, . . . , n.

Figure 4.6.1.


4.6.2 §§ Probability Mass Function (pm f )


Definition 4.6.1 (Poisson Distribution (pm f )).

A random variable X ∼ Poisson(λ), where λ is the only parameter of the Poisson
distribution. The pmf of the Poisson distribution is:

P(X = x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, . . .

• Here X is the number of events that occur during the specified unit of time.

• λ, the average rate of events that occur during the specified unit of time,
space, volume, etc., is the parameter of the Poisson distribution.

The pm f for Poisson distribution for various values of λ is shown in Figure 4.6.1.

4.6.3 §§ Poisson Distribution: Expectation & Variance

E(X) = Σ_{x=0}^{∞} x e^(−λ) λ^x / x! = λ

Var(X) = E(X²) − [E(X)]² = λ

The Poisson distribution has the property that its mean & variance are equal.

§§ Units in Poisson Probability


It is important to use consistent units in the calculation of probabilities, means, and variances
involving Poisson random variables, e.g., if there are 25 imperfections on average in 100 meters
of optical cable, then the
1. average number of imperfections in 10 meters of optical cable is 2.5, and the

2. average number of imperfections in 1000 meters of optical cable is 250.

Example 4.6.2

1. The number of typing errors made by a typist has a Poisson distribution with an average
of four errors per page. If more than four errors appear on a given page, the typist
must retype the whole page. What is the probability that a randomly selected page
needs to be retyped?


2. The number of meteors found by a radar system in any 30-second interval under speci-
fied conditions averages 1.81. Assume the meteors appear randomly and independently.
What is the probability that at least one meteor is found in a one-minute interval?

Solution:

1. λ = 4/page. Let X be the number of typing errors made. We need to calculate the
probability of retyping a randomly selected page, i.e., P( X ą 4)

P(X > 4) = 1 − P(X ≤ 4)
         = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)]
         = 1 − [e^(−4) 4^0/0! + e^(−4) 4^1/1! + e^(−4) 4^2/2! + e^(−4) 4^3/3! + e^(−4) 4^4/4!]
         = 0.3711

2. λ = 1.81/30-Second. Let X be the number of meteors. We need to calculate probability


of at least 1 meteor in one minute, i.e., P( X ě 1). Here time specified is one minute,
while unit of λ is per 30-second. We need to make sure that both are in the same units.

P(X ≥ 1) = 1 − P(X < 1)
         = 1 − P(X = 0)
         = 1 − e^(−1.81×2) (1.81 × 2)^0 / 0!
         = 0.9732

Remember that for calculating probabilities of 'at least' or 'greater than' type events
for the Poisson distribution, you will always have to use the Complement Rule of Probability.
Example 4.6.2
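Both Poisson calculations can be reproduced with a few lines of Python (an added sketch; note how the meteor rate is rescaled to the one-minute window):

from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# 1. Typing errors, lambda = 4 per page: P(X > 4)
print(round(1 - sum(poisson_pmf(x, 4) for x in range(5)), 3))   # about 0.371

# 2. Meteors, 1.81 per 30 seconds, so 3.62 per minute: P(X >= 1)
print(round(1 - poisson_pmf(0, 1.81 * 2), 4))                   # 0.9732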

4.6.4 §§ Poisson Approximation to the Binomial Distribution


The Poisson distribution is used as an approximation to the binomial distribution when the
parameter n is large, (i.e., n Ñ 8) while p is small (p Ñ 0);
A rule of thumb: when np < 7, then λ ≈ np and we can use the Poisson approximation to the
binomial distribution to find approximate probabilities.
Example 4.6.3
5% of the tools produced by a certain process are defective. Find the probability that in a
sample of 40 tools chosen at random, exactly three will be defective. Calculate a) using the
binomial distribution, and b) using the Poisson distribution as an approximation.
Solution:
p = 0.05; n = 40; np = 2; λ « 2


(a) Binomial Distribution:
    P(X = 3) = C(40,3) (0.05)^3 (1 − 0.05)^(40−3) = 0.1851

(b) Poisson Distribution:
    P(X = 3) = e^(−2) 2^3 / 3! = 0.1804

As np < 7, the Poisson approximation to the Binomial is quite accurate.
Example 4.6.3
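The quality of the approximation is easy to see side by side in Python (an illustrative addition):

from math import comb, exp, factorial

n, p, x = 40, 0.05, 3
lam = n * p
binomial = comb(n, x) * p**x * (1 - p)**(n - x)
poisson = exp(-lam) * lam**x / factorial(x)
print(round(binomial, 4), round(poisson, 4))   # 0.1851 0.1804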

4.6.5 §§ Comparison of Binomial & Poisson Distribution


Binomial Poisson
Fixed number of trials (n) A large number of trials
Fixed probability of success ( p) Very small probability of success ( p)
Random variable: X Random variable: X
= Number of successes. = Number of success within
a specified time, space, etc
Possible values: 0 ď X ď n Possible values: X ě 0
Parameters: n, p Parameter: λ

4.7 IJ Geometric Distribution


§§ Background: A Case Study
How can we use probability to solve problems involving the expected number of trials before
we get the 1st success?
At a ’busy time,’ a telephone exchange is very near capacity, so callers have difficulty
placing their calls. It may be of interest to know the number of attempts necessary in order
to make a connection. Let p = 0.05 be the probability of a connection during a busy time.
1. What is the probability you will have to make 5 attempts to make a successful call?
2. How many attempts do you expect to make for a successful call?

4.7.1 §§ Geometric Distribution Conditions


1. Instead of a pre-planned number of trials, we keep conducting Bernoulli trials, until we
finally get 1st success.
2. Other than that, the trials are independent and
3. Each trial can result in either a success (S) or a failure (F)
4. The probability of success p is constant for each trial.

Example 4.7.1
Toss a coin until you get a ’H’


1. P(H on 1st toss) = 1/2

2. P(T on 1st, H on 2nd toss) = (1/2) · (1/2)

3. P(T on 1st 2 tosses, H on 3rd toss) = (1/2)² · (1/2)

and so on until we get the 1st 'H'.

The probabilities of the number of tosses until 1st ’H’ are displayed in Figure 4.7.1.
Example 4.7.1

Figure 4.7.1.
[Decaying spike plot of P(x) against x = 1, 2, 3, . . ., starting at 1/2 and halving at each step. Horizontal axis: Number of Tosses up to and including 1st Head.]

Geometric Distribution: pmf for coin toss until 1st head.


4.7.2 §§ Probability Mass Function (pm f )

Definition 4.7.2 (Geometric Distribution (pm f )).

Let X be the number of trials up to and including the 1st success, i.e., x = 1, 2, 3, . . ..
Then

No. of Trials (X)    P(X)
1                    p
2                    (1 − p) p
3                    (1 − p)² p
...                  ...

The terms in this pmf form a geometric sequence, as in Figure 4.7.1, which is why
the distribution is called the Geometric Distribution. In general,

P(X = n) = (1 − p)^(n−1) p

Some references define the Geometric distribution as the number of failures before
the 1st success, i.e., number of failures = (number of trials) − 1.

Example 4.7.3
A driver is eagerly eyeing a precious parking space some distance down the street. There are
five cars in front of the driver, each of which has a probability 0.2 of taking the space.
What is the probability that the car immediately ahead will enter the parking space?
Solution:
p = 0.2. Five cars in front & the probability that the car immediately ahead will enter the
parking space is P( X = 5)

P(X = 5) = (1 − 0.2)^4 × 0.2 = 0.082

Example 4.7.3


Figure 4.7.2.

[Step plot of F(x) rising from 0.5 toward 1 as x increases. Horizontal axis: Number of Tosses up to and including 1st Head.]

Geometric Distribution: cdf for coin toss until 1st head.

4.7.3 §§ Geometric Distribution: Cumulative Distribution Function cd f


There is a useful closed-form formula for the cumulative distribution function (cdf), i.e.,
P( X ď k ). The sample space S can be decomposed into 2 events
1. The event X ď k denotes that the first success occurs within k attempts.

2. Its complement is the event that there are no successes in any of the first k attempts,
which has probability q^k. You will only need more than k attempts if the 1st k
attempts all resulted in failure.

∴ P(X ≤ k) = 1 − P(X > k) = 1 − q^k,  where q = 1 − p.
The cd f for coin toss until 1st H is shown in Figure 4.7.2. The cd f is a step function
which follows the properties of cd f .
Example 4.7.4
Assume that the probability of a specimen failing during a given experiment is 0.1. What
is the probability that it will take more than three specimens to have one surviving the
experiment?


Solution:
Let X be the number of specimens. Probability of surviving p = 1 ´ 0.1 = 0.9. We are
interested in P( X ą 3) to get one specimen surviving the experiment.
We can calculate the required probability using 2 approaches as explained below:

1. X ą 3 in Geometric means that in the 1st 3 specimens tested, none did survive the
experiment.

2. Let Y be the number of specimen surviving, i.e. success then we can fix n = 3 & find
the probability that none of the specimen did survive in the 1st 3 tested in the binomial
experiment. That is P( X ą 3) = P(Y = 0).

1. Geometric Distribution:
   P(X > 3) = 1 − P(X ≤ 3) = 1 − [1 − (0.1)³] = 0.001

2. Binomial Distribution:
   P(X > 3) = P(Y = 0) = C(3,0) (0.9)^0 (1 − 0.9)^(3−0) = 0.001

Example 4.7.4
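Both routes give the same number, which a short Python sketch (added for illustration) confirms:

from math import comb

p = 0.9          # probability a specimen survives the experiment
q = 1 - p

# Geometric view: more than 3 specimens needed means the first 3 all fail
geometric_tail = q**3

# Binomial view with n = 3: zero survivors among the first 3 specimens
binomial_zero = comb(3, 0) * p**0 * q**3

print(geometric_tail, binomial_zero)   # both about 0.001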

4.7.4 §§ Geometric Distribution: Expectation & Variance

E(X) = Σ_{i=1}^{∞} i (1 − p)^(i−1) p
     = 1/p

Var(X) = E(X²) − [E(X)]²
       = (1 − p)/p²

The expected number of trials required to obtain the 1st success is 1/p.


Definition 4.7.5 (Geometric Distribution: Memoryless Property).

The geometric distribution has the memoryless (forgetfulness) property. The dis-
tribution would be exactly the same regardless of the past, i.e.,

P(X > n + k | X > k) = P(X > n + k ∩ X > k) / P(X > k)
                     = P(X > n + k) / P(X > k)
                     = (1 − p)^(n+k) / (1 − p)^k
                     = (1 − p)^(n+k−k)
                     = (1 − p)^n
                     = P(X > n)

P(X > n) = (1 − p)^n is the probability that it takes more than n trials to get the 1st
success, which means that all of the first n trials resulted in failure. Use of this property
simplifies conditional probability problems!

Example 4.7.6
The Super Breakfast Challenge (SBC) is known to be very difficult to consume. Only 10%
of people are able to eat all of the SBC.

1. How many people are needed, on average, until the first successful customer?

2. What is the variance of the number of people needed?

3. Given that the first 4 are unsuccessful, what is the probability at least 8 are needed?

Solution:
Let X be the number of people required until the first successful customer

1. E( X ) = 1/p = 1/0.1 = 10

2. Var(X) = (1 − p)/p² = (1 − 0.1)/0.1² = 90

3. P(X ≥ 8 | X > 4) = P(X > 7 | X > 4) = P(X > 3) = (1 − 0.1)³ = 0.729. Remember
that we rewrote X ≥ 8 as X > 7 in order to use the memoryless property.

Example 4.7.6


4.8 IJ Negative Binomial Distribution

§§ A Case Study

A coach wants to put together an intramural basketball team, from people living in a large
dorm. She estimates that 12% of people in the dorm like to play basketball. She goes door to
door to ask people if they would be interested in playing on the team. What is the probability
that she needs to

1. interview 20 dorm residents to find 1 willing to play?

2. talk to 20 people, in order to find 5 people who will join the team?

3. How many dorm residents does she expect to interview before finding 5 people to create
the team?

4.8.1 §§ Probability Mass Function (pm f )

Example 4.8.1 (A Coin Toss Scenario)


We fix a positive integer r ą 1, and toss the coin until the rth head appears. Figure 4.8.1
shows the pm f for different values of r.

Example 4.8.1


Figure 4.8.1.

[Spike plots of the negative binomial pmf for r = 2, r = 5 and r = 10. Horizontal axis: Number of Tosses to get r Heads.]

Negative Binomial Distribution pmf for different values of the number of heads r

Definition 4.8.2 (Negative Binomial Distribution (pm f )).

The number of trials X required to obtain r successes has a negative binomial


distribution
 
P(X = x) = C(x − 1, r − 1) p^r (1 − p)^(x−r),   where x = r, r + 1, . . . ,

where r & p are the 2 parameters of the negative binomial distribution. The number
of successes r ≥ 1 & the probability of success p are fixed from trial to trial. The negative
binomial distribution is also known as the Pascal distribution.


1. Negative Binomial distribution is a more general version of Geometric probability dis-


tribution. That is, we conduct iid (independent & identically distributed) Bernoulli trials
until the rth success appears, where r is specified in advance. For r = 1 the negative
binomial distribution reduces to the Geometric Distribution.

2. The experiment consists of X number of repeated trials to produce r successes in such


experiment

3. The Negative Binomial can also be defined in terms of the number of failures until the
rth success.

4.8.2 §§ Negative Binomial Distribution: Expected Value & Variance

E(X) = Σ_{x=r}^{∞} x C(x − 1, r − 1) p^r (1 − p)^(x−r)
     = r/p

Var(X) = E(X²) − [E(X)]²
       = r(1 − p)/p²

i.e., the expected value and variance of the number of trials that it takes to get r successes.
When r = 1, then mean & variance transform to the mean & variance of the Geometric
Distribution.
Example 4.8.3

1. A curbside parking facility has a capacity for 3 cars. Determine the probability that it
will be full within 10 minutes. It is estimated that 6 cars will pass this parking space
within the time span and, on average, 80% of all cars will want to park there.

2. A public relations intern realizes that she forgot to assemble the consumer panel her
boss asked her to do. She panics and decides to randomly ask (independent) people if
they will work on the panel for an hour. Since she is willing to pay them for their work,
she believes she will have a 75% chance of people agreeing to work with her. Find the
probability that she will need to interview at least 10 people to find 5 willing to work
on the panel?

Solution:

1. The desired probability is simply the probability that the number of cars until the third
success (taking the parking space) is less than or equal to 6, i.e., we need to compute

1 independent & identically distributed

86
Discrete Distributions 4.8 Negative Binomial Distribution

the cd f . Let X be the number of cars to the third success, then X has a negative
binomial distribution with r = 3 and p = 0.8.

P(X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
         = C(3−1, 3−1) 0.8³ (1 − 0.8)^(3−3)
         + C(4−1, 3−1) 0.8³ (1 − 0.8)^(4−3)
         + C(5−1, 3−1) 0.8³ (1 − 0.8)^(5−3)
         + C(6−1, 3−1) 0.8³ (1 − 0.8)^(6−3)
         = 0.983

2. The desired probability is simply the probability that the number of people to ask to
get the fifth success is at least 10, i.e., P( X ě 10). If X is this number, it has a negative
binomial distribution with r = 5 and p = 0.75.

P(X ≥ 10) = 1 − P(X < 10)
          = 1 − [P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9)]
          = 1 − [C(4,4) 0.75⁵ (1 − 0.75)^0 + C(5,4) 0.75⁵ (1 − 0.75)^1 + C(6,4) 0.75⁵ (1 − 0.75)^2
                 + C(7,4) 0.75⁵ (1 − 0.75)^3 + C(8,4) 0.75⁵ (1 − 0.75)^4]
          = 0.0489

Alternate Method: This problem can be solved based on the intuitive concept that she
will only need to interview 10 or more people if she fails to get the required number of
people willing to work for her, i.e., less than 5 people are willing to work from the 9
people she would have interviewed. So n = 9. Let Y be the number of people willing
to work.

P(X ≥ 10) = P(Y < 5)
          = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)
          = C(9,0) 0.75^0 (1 − 0.75)^(9−0) + C(9,1) 0.75^1 (1 − 0.75)^(9−1)
          + C(9,2) 0.75^2 (1 − 0.75)^(9−2) + C(9,3) 0.75^3 (1 − 0.75)^(9−3)
          + C(9,4) 0.75^4 (1 − 0.75)^(9−4)
          = 0.0489


Example 4.8.3
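A Python sketch (added for illustration) checks both parts, and also the equivalence with the alternate method for part 2:

from math import comb

def nbinom_pmf(x, r, p):
    # P(X = x) where X is the number of trials needed to reach the r-th success
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# 1. Parking: r = 3, p = 0.8, lot full within 6 passing cars
print(round(sum(nbinom_pmf(x, 3, 0.8) for x in range(3, 7)), 3))    # 0.983

# 2. Consumer panel: r = 5, p = 0.75, at least 10 interviews needed
tail = 1 - sum(nbinom_pmf(x, 5, 0.75) for x in range(5, 10))
alt = sum(comb(9, y) * 0.75**y * 0.25**(9 - y) for y in range(5))   # fewer than 5 successes in 9 trials
print(round(tail, 4), round(alt, 4))                                # 0.0489 0.0489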

4.8.3 §§ Comparison of Binomial, & Negative Binomial Models


Binomial Negative Binomial
Fixed number of trials (n) Fixed number of successes (r )
Fixed probability of success ( p) Fixed probability of success ( p)
Random variable: X Random variable: X
= Number of successes. = Number of trials until the
rth success
Possible values: 0 ď X ď n Possible values: r ď X
1. The negative binomial reverses the roles of the binomial: the number of successes r is fixed in advance and the number of trials X is random.
2. Geometric distribution is a special case of the negative binomial distribution when you
get the 1st success.

4.9 IJ Hypergeometric Distribution


§§ A Case Study: Dependent Bernoulli Trials

Figure 4.9.1.

1. Draw a ball from the urn, note the color of the ball & don’t replace it back in the urn.
P(1st black ball) = 10/20

2. Draw a 2nd ball without replacement from the urn, note the color of the ball,
P(2nd black ball) = 10/20 ˆ 9/19
& so on until n = 10th ball draw,


3. Find the probability that 10 black balls are obtained?

The hypergeometric distribution pm f is shown in Figure 4.9.2.

Figure 4.9.2.
[Spike plot of the hypergeom(20, 10, 10) pmf, peaked near x = 5. Horizontal axis: x = number of Black Balls drawn from a bag with 10 R & 10 B Balls.]

1. Useful in experiments where n elements are picked at random without replacement


from a small finite population of size N

2. The population of interest is dichotomized: Success (S) & Failure (F)

3. It describes the probability of x successes in n draws without replacement from a finite


population of size N containing exactly M successes.


Success Failure Total


Samples Drawn x n´x n
Not drawn M´x ( N ´ M) ´ (n ´ x ) N´n
Total M N´M N

4.9.1 §§ Probability Mass Function (pm f )


Definition 4.9.1 (Hypergeometric Distribution (pm f )).

X ∼ hypergeom(N, M, n)

P(X = x) = C(M, x) · C(N − M, n − x) / C(N, n),   where x = 0, 1, . . . , min(M, n)

Parameters

1. Sample Size: n where (1 ď n ď N )

2. Population Size: N where ( N ě 1)

3. Number of Successes: M where ( M ě 1)

Example 4.9.2
In a group of 25 factory workers, 20 are low-risk and 5 are high-risk. Two of the 25 factory
workers are randomly selected without replacement. Calculate the probability that exactly
one of the two selected factory workers is low-risk.
Solution:
N = 25; M = 5; n = 2

P(X = 1) = C(20, 1) · C(5, 1) / C(25, 2)
         = 0.3333

Example 4.9.2

4.9.2 §§ Hypergeometric Distribution: Expected Value & Variance


 
E(X) = n (M/N)

Var(X) = n (M/N) ((N − M)/N) ((N − n)/(N − 1))


The factor (N − n)/(N − 1) is called the finite population correction. For a fixed sample size n, as N → ∞
it is clear that the correction goes to 1, i.e., for infinite populations the hypergeometric
distribution can be approximated by the Binomial.

Example 4.9.3
A college student is running late for his class. He has 12 folders on his desk, 4 of which in-
clude assignments due today. Without taking time to look, he accidentally grabs just 3 folders
from his stack. When he gets to class, he counts how many of them contain his homework
assignments. What is the probability at least 2 of the 3 folders contain his assignments?
Solution:

N = 12; M = 4; n = 3

P(X ≥ 2) = P(X = 2) + P(X = 3)
         = C(4,2) C(8,1) / C(12,3) + C(4,3) C(8,0) / C(12,3)
         = 0.2364

Example 4.9.3
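A quick numerical check of this example, again with SciPy's hypergeometric distribution (an illustrative sketch, not from the text):

    from scipy.stats import hypergeom

    # Example 4.9.3: N = 12 folders, M = 4 with assignments, n = 3 grabbed
    p = hypergeom.pmf(2, 12, 4, 3) + hypergeom.pmf(3, 12, 4, 3)
    print(round(p, 4))      # ≈ 0.236 (the text truncates to 0.2363)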

4.9.3 §§ Binomial Approximation to Hypergeometric Distribution

Rule of Thumb: For very large population size N, if the sample size n is at most 5% of the
population size and sampling is without replacement, then the experiment may be analyzed
as if it were a binomial experiment. The probability of success p in this case is approximated
as M/N « p.

Example 4.9.4
A nationwide survey of 17,000 college seniors by the University of Michigan revealed that
almost 70% disapprove of daily smoking. If 18 of these seniors are selected at random and
asked their opinion, what is the probability that more than 9 but fewer than 14 disapprove
of smoking daily?
Solution:
N = 17,000; p = 0.70; M ≈ 0.70 × 17,000 = 11,900; n = 18; n/N = 18/17,000 ≈ 0.001. As
n ≤ 0.05N, we can use the binomial approximation to the hypergeometric with p ≈ M/N = 0.70.


P(9 < X < 14) = P(X = 10) + P(X = 11) + P(X = 12) + P(X = 13)
             = \binom{18}{10}0.7^{10}(1-0.7)^{8} + \binom{18}{11}0.7^{11}(1-0.7)^{7}
               + \binom{18}{12}0.7^{12}(1-0.7)^{6} + \binom{18}{13}0.7^{13}(1-0.7)^{5}
             = 0.6077

Example 4.9.4
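The binomial sum above is easy to reproduce numerically; the sketch below assumes SciPy and is not part of the original text.

    from scipy.stats import binom

    p = sum(binom.pmf(k, 18, 0.70) for k in range(10, 14))   # k = 10, 11, 12, 13
    print(round(p, 4))                                        # 0.6077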
In general, it is a bit difficult to decide the appropriate distribution in a particular scenario.
Students should practice problems that will provide them with some skills for making correct
decisions. Figure 4.9.3 might be useful in making a correct choice.

Figure 4.9.3.

4.10 IJ Home Work


In each problem given below, also write down the name & the parameters of a distribution
applicable if any.
1. Ten motors are packaged for sale in a certain warehouse. The motors sell for $100 each,
but a double-your-money-back guarantee is in effect for any defectives the purchaser


may receive. Find the expected net gain for the seller if the probability of any one
motor being defective is .08. (Assume that the quality of any one motor is independent
of the others.)

2. The demand for a particular type of pump at an isolated mine is random and indepen-
dent with an average demand of 2.8 pumps in a week (7 days). Further supplies are
ordered each Tuesday morning and arrive on the weekly plane on Friday morning. Last
Tuesday morning only one pump was in stock, so the storesman ordered six more to
come on Friday morning. Find the probability that stock will be exhausted and there
will be unsatisfied demand for at least one pump by Friday morning.

3. A salesperson has found that the probability of a sale on a single contact is approxi-
mately .03. If the salesperson contacts 100 prospects, what is the approximate proba-
bility of making at least one sale?

4. Used watch batteries are tested one at a time until a good battery is found. Let X
denote the number of batteries that need to be tested in order to find the first good
one. Find the expected value of X, given that P( X ą 3) = 0.5

5. A research study is concerned with the side effects of a new drug. The drug is given
to patients, one at a time, until two patients develop side effects. If the probability of
getting a side effect from the drug is 1/6, what is the probability that eight patients
are needed?

6. When drawing cards with replacement and re-shuffling, you bet someone that you can
draw an Ace within k draws. You want your chance of winning this bet to be at least
52%. What is the minimum value of k needed? What is the probability that you will
need at least ten draws to get 4 Aces?

7. A company is interested in evaluating its current inspection procedure for shipments


of 50 identical items. The procedure is to take a sample of 5 and pass the shipment
if no more than 2 are found to be defective. What proportion of shipments with 20%
defectives will be accepted?

§§ Answers
1. $840

2. 0.3374

3. 0.9524

4. « 5

5. « 0.0651

6. « 10; 0.0236

7. 0.9517

Chapter 5

Continuous Distributions

AS YOU READ . . .

1. What is Continuous Random Variable?

2. What is Continuous Uniform Distribution & its parameters? In which scenario can we
use it to model probabilities?

3. What is Normal Distribution & its parameters? Why is Normal Distribution widely
applicable in practical life?

4. What is the Exponential Distribution & its parameters? How can we use it to model
chances of waiting times?

5.1 IJ Continuous Random Variable


§§ A Case Study
Consider daily rainfall in Karachi in July. Theoretically, using measuring equipment with
perfect accuracy, the amount of rainfall could take on any value e.g., between 0 and 5 inches.
Let X represents the amount of rainfall in inches. We might want to calculate probabilities
such as
1. the amount of rainfall in Karachi in July this year would be less than 5 inches, i.e.,
P( X ă 5) or

2. the amount of rainfall in Karachi in July this year would be between 2-inches to 4-
inches, i.e., P(2 ď X ď 4)
The amount of rainfall X being a continuous random variable, includes all values in an interval
of real numbers. This could be an infinite interval such as (´8, 8). You could usually state
the beginning and end points, but you would have infinitely many possibilities of answers
within that range, e.g., 2 ď X ď 4; (see Figure 5.1.1).


Figure 5.1.1.

[Density curve f(x) of the amount of rainfall over 0 to 6 inches.]

Distribution of Amount of Rainfall.

§§ Continuous Random Variable: Applications


Below are some real life applications of continuous random variables in different fields.

1. Medical Trials: the time until a patient experiences a relapse.

2. Sports: the length of a javelin throw.

3. Ecology: the lifetime of a tree.

4. Manufacturing: the diameter of a ball bearing.

5. Computing: the amount of time a Help Line customer spends on hold.

6. Physics: the time until a uranium atom decays.

7. Oceanography: the temperature of ocean water at a specified latitude, longitude and


depth.


Figure 5.1.2.

[Density curve f(x); the shaded region between x = 0.18 and x = 0.22 represents a probability.]

5.2 IJ Continuous Probability Distribution


5.2.1 §§ Background
For continuous random variables, since there is an infinite number of possible values, we
describe the probability distribution with a smooth curve. Probability density function (pdf)
is an analogue of the probability mass function (pmf) for discrete random variable.
1. How likely is it that X falls between 0.18 and 0.22?
2. We can answer this question by considering the area under the curve between 0.18 and
0.22, as shown by shaded area in Figure 5.1.2
3. The overriding concept here for a continuous random variable is that AREA=PROBABILITY.
4. More specifically, the area under the pdf curve between points a and b is the same as
the probability that the random variable will have a value between a and b.
5. If f(x) is a known function, then we can answer this question by integration, i.e.,

   P(0.18 \le X \le 0.22) = \int_{0.18}^{0.22} f(x)\,dx


The S&P percentage returns displayed in Figure 5.2.1 show a real-life application of a continuous
probability distribution in finance.

Figure 5.2.1.

Definition 5.2.1 (Probability Density Function: pd f ).

1. A pd f for a continuous random variable is defined for all real numbers in the
range of the random variable.

2. More specifically, the area under the pd f curve between points a and b is the
same as the probability that the random variable will have a value between a
and b, (see Figure 5.2.2).

P(a \le X \le b) = \int_{a}^{b} f(x)\,dx


Figure 5.2.2.

§§ Properties of pdf
For a continuous random variable X, a probability density function pd f is a function such
that

1. f(x) ≥ 0 for all x; compare with the discrete case, where 0 ≤ P(X = x) ≤ 1 for all x ∈ S.

2. \int_{-\infty}^{\infty} f(x)\,dx = 1; recap: \sum_i P(x_i) = 1 for a discrete random variable.

3. P(X = a) = P(a \le X \le a) = \int_{a}^{a} f(x)\,dx = 0

4. This has a very useful consequence in the continuous case:

   P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)

Example 5.2.2
Let a continuous random variable X has density function

f(x) = \begin{cases} A(1 - x^2) & -1 < x < 1\\ 0 & \text{elsewhere} \end{cases}

1. Find the value of A for which f ( x ) would be a valid density function.

2. Find the probability that X will be more than 1/2 but less than 3/4.

3. Find the probability that X will be greater than 1/4.


4. Find the cd f , i.e., F ( X ).


Solution:
To find A we require

\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{1} A(1 - x^2)\,dx
  = A\left[x - \frac{x^3}{3}\right]_{-1}^{1}
  = A\left(1 - \frac{1}{3} - \left(-1 + \frac{1}{3}\right)\right)
  = A\left(2 - \frac{2}{3}\right) = 1
\therefore A = \frac{3}{4}

P(1/2 \le X \le 3/4) = \int_{1/2}^{3/4} \frac{3}{4}(1 - x^2)\,dx
  = \frac{3}{4}\left[x - \frac{x^3}{3}\right]_{1/2}^{3/4} = \frac{29}{256}

P(X \ge 1/4) = \int_{1/4}^{1} \frac{3}{4}(1 - x^2)\,dx
  = \frac{3}{4}\left[x - \frac{x^3}{3}\right]_{1/4}^{1} = \frac{81}{256}

An expression for F(x) on -1 < x < 1 is

F(x) = \int_{-1}^{x} \frac{3}{4}(1 - t^2)\,dt = \frac{2 + 3x - x^3}{4},

so that

F(x) = \begin{cases} 0 & x < -1\\ \dfrac{2 + 3x - x^3}{4} & -1 \le x < 1\\ 1 & x \ge 1 \end{cases}


Example 5.2.2
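Each of these integrals can be verified numerically. The short sketch below uses SciPy's quad routine and is illustrative only (not from the text).

    from scipy.integrate import quad

    f = lambda x: 0.75 * (1 - x**2)         # the density with A = 3/4
    print(round(quad(f, -1, 1)[0], 4))      # total area: 1.0
    print(round(quad(f, 0.5, 0.75)[0], 4))  # 29/256 ≈ 0.1133
    print(round(quad(f, 0.25, 1)[0], 4))    # 81/256 ≈ 0.3164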

Example 5.2.3
The probability density function of the time to failure of an electronic component in a copier
(in hours) is
0 x ă 0,
#
f (x) = 1 ´0.5x
e for x ě 0
2
a. Determine the probability that a component fails in the interval from 1 to 2 hours.

b. At what time do we expect 50% of the components to have failed, i.e., median of the
distribution

Solution:

a.
   P(1 \le X \le 2) = \int_{1}^{2} \frac{1}{2}e^{-0.5x}\,dx
                    = \left[-e^{-0.5x}\right]_{1}^{2}
                    = e^{-0.5} - e^{-1} = 0.2386

b. For the median of the distribution, we need the value of x that divides the distribution
   into two halves, i.e., P(0 \le X \le x) = 0.5:

   \int_{0}^{x} \frac{1}{2}e^{-0.5t}\,dt = \left[-e^{-0.5t}\right]_{0}^{x} = 1 - e^{-0.5x} = 0.5

   Solving for x, we get x = 2 ln 2 = 1.3863.

Example 5.2.3
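This failure time is in fact exponential with rate 0.5 per hour, so the same answers come out of SciPy's expon (which is parameterized by scale = 1/λ); a hedged sketch for illustration:

    from scipy.stats import expon

    T = expon(scale=1/0.5)
    print(round(T.cdf(2) - T.cdf(1), 4))   # P(1 <= X <= 2) ≈ 0.2386
    print(round(T.ppf(0.5), 4))            # median ≈ 1.3863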


5.2.2 §§ Cumulative Distribution Function (cd f )


Definition 5.2.4 (Cumulative Distribution Function (cd f )).

We often need to compute the probability that the random variable X is less than or
equal to a, i.e., P(X ≤ a), known as the cdf evaluated at a.

Continuous case:

F(a) = P(X \le a) = \int_{-\infty}^{a} f(x)\,dx

Discrete case:

F(a) = P(X \le a) = \sum_{x \le a} P(x)

§§ pdf from cdf


Discrete case (pmf from cdf): the pmf is the jump size in the step function. The size of the
jump at any x_i can be written as

P_X(x_i) = F(x_i) - F(x_{i-1})

Continuous case (pdf from cdf): the density pdf is the derivative of the cdf,

f(x) = \frac{d}{dx}F(x).

This holds at every x at which the derivative of F(x), denoted F'(x), exists.

§§ Use of cdf to find the probabilities


In the continuous case, it is very useful to use the cdf to find probabilities using the formulas:
P( X ą a) = 1 ´ P( X ď a)
= 1 ´ F ( a)
P( a ď X ď b) = F (b) ´ F ( a)

Example 5.2.5
If X is a continuous random variable with cd f given by
"
0 x ă 0,
F(x) = ´0.5x
1´e for x ě 0


Find the pdf of X.

f(x) = \frac{d}{dx}F(x) = \begin{cases} \dfrac{1}{2}e^{-0.5x} & x \ge 0\\ 0 & x < 0 \end{cases}

Example 5.2.5

5.2.3 §§ Expectation

Definition 5.2.6 (Expectation).

Mathematically, the expected value E(X) is defined as follows.

Continuous case:

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx

Graphically, E(X) is the point where the distribution balances, as shown in Figure 5.2.3.

Discrete case:

E(X) = \sum_j x_j P(x_j), \qquad E[g(X)] = \sum_j g(x_j) P(x_j)


Figure 5.2.3.

5.2.4 §§ Variance


Definition 5.2.7 (Variance σ2 ).

Mathematically, the variance of the random variable X, denoted σ² or Var(X), is defined as follows.

Continuous case:

Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2,
\qquad E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx

Graphically, Var(X) measures the spread of the values of the random variable around its
mean, as shown in Figure 5.2.4.

Discrete case:

Var(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2, \qquad E(X^2) = \sum_j x_j^2 P(x_j)


Figure 5.2.4.

Example 5.2.8
Let a continuous random variable X has density function

f(x) = \begin{cases} A(1 - x^2) & -1 < x < 1\\ 0 & \text{elsewhere} \end{cases}

Find the expected value & the variance of the distribution of X.


Solution:
The value of A was computed as 3/4 in Example 5.2.2.
E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-1}^{1} x\cdot\frac{3}{4}(1 - x^2)\,dx = 0

E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_{-1}^{1} x^2\cdot\frac{3}{4}(1 - x^2)\,dx = 0.2

Var(X) = E(X^2) - [E(X)]^2 = 0.2 - 0^2 = 0.2


Example 5.2.8

Example 5.2.9
The probability density function of the weight of packages delivered by a post office is
f(x) = \begin{cases} \dfrac{70}{69x^2} & 1 \le x \le 70\\ 0 & \text{elsewhere} \end{cases}

1. If the cost is $2.50 per pound, what is the mean shipping cost of a package?
2. Find the Variance of the distribution of the shipping cost.
Solution:
Let X be the weight of the package. Shipping cost per pound is $2.50. The total cost can be
defined as Y = 2.5X.
E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{70} x\cdot\frac{70}{69x^2}\,dx = 4.31

The total cost is Y = 2.5X, so the mean shipping cost is

E(Y) = 2.5 × 4.31 = 10.775

E(X^2) = \int_{1}^{70} x^2\cdot\frac{70}{69x^2}\,dx = 70

Var(X) = E(X^2) - [E(X)]^2 = 70 - (4.31)^2 = 51.42

Var(Y) = Var(2.5X) = 2.5^2\,Var(X) = 321.3994

Example 5.2.9
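The moments above can be confirmed with numerical integration; a minimal sketch assuming SciPy (not from the text):

    from scipy.integrate import quad

    f = lambda x: 70 / (69 * x**2)                        # pdf on [1, 70]
    EX  = quad(lambda x: x * f(x), 1, 70)[0]              # ≈ 4.31
    EX2 = quad(lambda x: x**2 * f(x), 1, 70)[0]           # 70
    var_x = EX2 - EX**2                                   # ≈ 51.42
    print(round(2.5 * EX, 3), round(2.5**2 * var_x, 2))   # mean and variance of the cost Y = 2.5X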


5.3 IJ Piecewise Distributions


Some distributions are not necessarily continuous, but they are continuous over particular
intervals. These types of distributions are known as Piecewise distributions, (see Figure
5.3.1).
Example 5.3.1
The pd f of a random variable X is given below & shown in Figure 5.3.1.
f(x) = \begin{cases} \dfrac{3}{4} & 0 \le x \le 1\\ \dfrac{1}{4} & 3 \le x \le 4\\ 0 & \text{otherwise} \end{cases}

1. Find the cdf

2. Find E( x )

3. Find Var ( x )

Solution:
We integrate to find the cdf in five disjoint regions.

For a < 0:        F(a) = \int_{-\infty}^{a} f(x)\,dx = 0

For 0 ≤ a ≤ 1:    F(a) = \int_{0}^{a} \frac{3}{4}\,dx = \frac{3}{4}a

For 1 ≤ a ≤ 3:    F(a) = \int_{0}^{1} \frac{3}{4}\,dx + \int_{1}^{a} 0\,dx = \frac{3}{4}

For 3 ≤ a ≤ 4:    F(a) = \frac{3}{4} + \int_{3}^{a} \frac{1}{4}\,dx = \frac{3}{4} + \frac{1}{4}(a - 3)

For a ≥ 4:        F(a) = \frac{3}{4} + \frac{1}{4} = 1

Remember cd f by definition is cumulative probability from lower limit.


Thus, the cdf is:

F(x) = \begin{cases}
  0 & x < 0\\
  \frac{3}{4}x & 0 \le x \le 1\\
  \frac{3}{4} & 1 \le x \le 3\\
  \frac{3}{4} + \frac{1}{4}(x - 3) & 3 \le x \le 4\\
  1 & x > 4
\end{cases}

The cd f for this distribution is shown in Figure 5.3.2.

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx
     = \int_{0}^{1} x\cdot\frac{3}{4}\,dx + \int_{1}^{3} x\cdot 0\,dx + \int_{3}^{4} x\cdot\frac{1}{4}\,dx
     = 1.25

E(X^2) = \int_{0}^{1} x^2\cdot\frac{3}{4}\,dx + \int_{1}^{3} x^2\cdot 0\,dx + \int_{3}^{4} x^2\cdot\frac{1}{4}\,dx
       = 3.33

Var(X) = E(X^2) - [E(X)]^2 = 3.33 - (1.25)^2 = 1.77

Example 5.3.1
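A numerical cross-check of these moments, assuming SciPy (illustrative only; the points argument tells quad where the density changes):

    from scipy.integrate import quad

    def f(x):
        # piecewise density from Example 5.3.1
        if 0 <= x <= 1:
            return 0.75
        if 3 <= x <= 4:
            return 0.25
        return 0.0

    EX  = quad(lambda x: x * f(x), 0, 4, points=[1, 3])[0]      # 1.25
    EX2 = quad(lambda x: x**2 * f(x), 0, 4, points=[1, 3])[0]   # ≈ 3.33
    print(EX, round(EX2 - EX**2, 4))                            # variance ≈ 1.77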


Figure 5.3.1.

[Plot of the piecewise density f(x): height 3/4 on [0, 1], height 1/4 on [3, 4], and 0 elsewhere.]

pdf for the Piecewise Distribution


Figure 5.3.2.

[Plot of the corresponding cdf F(x): linear rise to 3/4 on [0, 1], flat on [1, 3], linear rise to 1 on [3, 4].]

cdf for the Piecewise Distribution in Figure 5.3.1.

5.4 IJ Continuous Uniform Distribution

§§ A Case Study

The total time to process a passport application by the state department is between 3 and 7
weeks. The interest might be to find out the expected time for an application processing. If
my passport needs renewal what is the probability that my application will be processed in
5 weeks or less? Let X be the processing time, it is important to note that X is equally likely
to fall anywhere in this interval of 3-7 weeks, i.e., X has a constant density on this interval.


Figure 5.4.1.

5.4.1 §§ Probability Density Function

Definition 5.4.1 (Uniform Distribution (pd f )).

The probability density of X „ U ( a, b) in the interval a ď X ď b is defined as


f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b\\ 0 & \text{otherwise} \end{cases}

Uniform distribution has 2 parameters a & b.

Figure 5.4.2 shows the Uniform density over the interval a & b. The random variable X
uniformly distributed on ( a, b), is equally likely to fall anywhere in this interval.


Figure 5.4.2.

[Uniform density: f(x) = 1/(b − a), constant over the interval from a to b.]

The distribution is also called a Rectangular Distribution. U (0, 1) is the most commonly
used uniform distribution.

5.4.2 §§ Cumulative Distribution Function (cd f )

Definition 5.4.2 (Uniform Distribution: (cd f )).

By definition, the cdf F(x) is the probability that X is at most x, i.e.,

F(x) = P(X \le x) = \int_{a}^{x} \frac{1}{b-a}\,dt
     = \begin{cases} 0 & x \le a\\ \dfrac{x-a}{b-a} & a < x < b\\ 1 & x \ge b \end{cases}

Graphically, the Uniform cd f is displayed in Figure 5.4.3.


Figure 5.4.3.

[Uniform cdf: F(x) = 0 up to a, rising linearly to 1 at b.]

5.4.3 §§ Uniform Distribution: Expectation & Variance


E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{a}^{b} x\,\frac{1}{b-a}\,dx = \frac{a+b}{2}

Var(X) = E(X^2) - [E(X)]^2 = \frac{(b-a)^2}{12}

Example 5.4.3


Suppose a bus always arrives at a particular stop between 8:00 AM and 8:10 AM. The density
is shown in Figure 5.4.4.
• Find the probability that the bus will arrive tomorrow between 8:00 AM and 8:02 AM?
• What is the expected time of the bus arrival?
• Eighty percent of the time, the waiting time of a customer for the bus must fall below
what value?
• If the bus did not arrive in the 1st 5 minutes, what is the probability that it will arrive
in the last 2 minutes?

Figure 5.4.4.

[Uniform density of the waiting time: f(x) = 0.1 over the interval from 0 to 10 minutes.]

Solution:
Let the random variable X be the waiting time in minutes.

• Probability that the bus arrives tomorrow between 8:00 AM and 8:02 AM, i.e., P(X ≤ 2):

  P(X \le 2) = \int_{0}^{2} \frac{1}{b-a}\,dx = \int_{0}^{2} \frac{1}{10-0}\,dx = \frac{2}{10}


There is a 20% chance that the bus will arrive tomorrow between 8:00 AM and 8:02
AM. It is also clear that, owing to uniformity in the distribution, the solution can be
found simply by taking the ratio of the length from 0 to 2 to the total length of the
distribution interval.

• The expected arrival time:

  E(X) = \frac{a+b}{2} = \frac{10}{2} = 5,

  i.e., the bus is expected to arrive at 8:05 AM.

• We need to find the 80th percentile k, i.e.,

  P(X \le k) = \int_{0}^{k} \frac{1}{10-0}\,dx = \frac{k}{10} = 0.80

  Solving for k, we get k = 8 minutes. Therefore, 80% of the time, the waiting time of a
  customer for the bus falls below 8 minutes, i.e., the bus arrives by 8:08 AM.
• Here the condition that the bus did not arrive in the first 5 minutes is given.

  P(X > 8 \mid X > 5) = \frac{P(X > 8 \cap X > 5)}{P(X > 5)} = \frac{P(X > 8)}{P(X > 5)}
                      = \frac{2/10}{5/10} = \frac{2}{5}

Example 5.4.3
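SciPy's uniform distribution (parameterized by loc = a and scale = b − a) reproduces all four answers; the sketch below is illustrative only.

    from scipy.stats import uniform

    X = uniform(loc=0, scale=10)     # waiting time in minutes on (0, 10)
    print(X.cdf(2))                  # P(X <= 2) = 0.2
    print(X.mean())                  # 5.0 -> bus expected at 8:05 AM
    print(X.ppf(0.80))               # 80th percentile = 8 minutes
    print(X.sf(8) / X.sf(5))         # P(X > 8 | X > 5) = 0.4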

5.5 IJ Normal Distribution


§§ A Case Study
Smartphone batteries have an average lifetime of 1 year with a margin of error of 1 month.
You buy a new phone; what is the chance that your phone battery does not last past 1 month?
Or that it lasts at least 11 months? Such a random variable is expected to have a central value
around which most of the observations cluster; its distribution is bell-shaped (approximately
normal), also called a Gaussian distribution after the German mathematician Carl Friedrich
Gauss. Due to the significance of his work, his portrait and the normal pdf along with the
normal curve were displayed on a former German banknote.


Figure 5.5.1.

5.5.1 §§ Probability Density Function (pd f )

Definition 5.5.1 (Normal Distribution (pd f )).

X ~ N(µ, σ²). The probability density function (pdf) of X is

f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,

where µ (the mean) is the location parameter and σ (the standard deviation) is the scale
parameter. The constants are π = 3.141593 and e = 2.71828.

There is no closed-form expression for the cdf

F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt.
´8

5.5.2 §§ Effect of Mean & Variance

Figure 5.5.2 shows the effect of the location parameter µ on the pd f & Figure 5.5.3 shows the
impact of the location parameter µ on the cd f of the simulated normal distributions. The
scale parameter σ = 1 is kept constant in the 3 distributions simulated. It can be observed
that change in the value of µ shifts the location of the curves.


Figure 5.5.2.

[Normal pdfs with σ = 1 and µ = 0, 1, −1: changing µ shifts the curve along the x-axis.]

Figure 5.5.3.

[Normal cdfs with σ = 1 and µ = 0, 1, −1: changing µ shifts the cdf along the x-axis.]

The impact of the scale parameter σ on the pd f is shown in Figure 5.5.4 & the Figure 5.5.5
shows similar impact of σ on the cd f of the normal distributions. The normal distributions
presented in these figures were simulated with the a constant mean µ = 0, but different
variances. The change in σ scales the distribution.


Figure 5.5.4.

[Normal pdfs with µ = 0 and σ = 1, 2, 0.5: larger σ spreads the curve out; smaller σ concentrates it.]

Figure 5.5.5.

[Normal cdfs with µ = 0 and σ = 1, 2, 0.5.]

5.5.3 §§ Properties of Normal Distribution


Figure 5.5.6.

• It is a symmetric, bell shaped distribution with total area under the curve being equal
to 1. This property is useful to solve practical application problems.

• The mean, median & mode are all equal & located at the center of the distribution.

• Maximum occurs at µ, 50% area lies to either side of the mean µ.

• The inflection points are located at µ ´ σ and µ + σ as shown by the red points in the
curve in Figure 5.5.6. (An inflection point is a point on the curve where the sign of the
curvature changes.)

• This curve lies entirely above the horizontal axis, and x-axis is an asymptote in both
horizontal directions

• The area between the curve and the horizontal axis is exactly 1. Note that this is the
area of a region that is infinitely wide, since the curve never actually touches the x-axis.

5.5.4 §§ Standard Normal Distribution

§§ Background
A normal distribution with mean µ and standard deviation σ, i.e., X ~ N(µ, σ), has pdf

f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty

To find the probability that a normal random variable x lies in the interval from a to b, we
need to find the area under the normal curve between the points a and b (see Figure 5.1.1).
However, there are an infinitely large number of normal distributions-one for each different
mean and standard deviation, (e.g., see Figure 5.5.2). A separate table of areas for each
of these curves is obviously impractical. Instead, we use a standardization procedure that
allows us to use the same table for all normal distributions.


Definition 5.5.2 (Standard Normal Distribution).

A normal distribution with mean µ = 0 and standard deviation σ = 1 is called the
Standard Normal Distribution. The standard normal random variable is denoted
Z ~ N(0, 1). The pdf of Z is

f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty

§§ Standardizing a Normal Random Variable

All normally distributed variables X can be transformed into the standard normal variable
Z.

x´µ
z= ñ X = µ + σz
σ

A z´value tells how many standard deviations above or below the mean a certain value of
X is.

§§ Z-Table

1. The table for the cumulative distribution of the standard normal variable is shown in
Figure 5.5.7. The entries inside the table give the area under the standard normal
curve for any value of z from 0 Ñ 3.49 or so.

2. The table gives values for non-negative z. For negative values of z, the area can be
obtained from the symmetry property of the curve.

3. Before using the table, remember to convert the normal random variable X to Z as:

x´µ
z=
σ

4. Convert Z back to X as X = µ + σZ


Figure 5.5.7.

5.5.5 §§ Finding Probabilities Using Table

Example 5.5.3
In each cases, evaluate the required probabilities.
1. Probability to the left of z value
a. P( Z ď 1.96)?


b. P( Z ď ´1.96)?

2. Probability to the right of z value

a. P( Z ě 1.96)?
b. P( Z ě ´1.96)?

3. Probability between 2 z values, i.e., P(´1.96 ď Z ď 1.96)?

4. Probability between 2 x values; X „ N (µ = 10, σ = 4), i.e., P(4 ď X ď 16)?

Solution:

1. a. If z ě 0 and we want P( Z ď z), we just directly look up P( Z ď z) in the table.


For instance,
P( Z ď 1.96) = F (1.96) = 0.9750
P( Z ď 1.96) is the shaded area under the curve in Figure 5.5.8.
b. If z ă 0 and we want P( Z ă ´z), we use 2 properties of the normal curve, i.e.,
total area under the curve is 1, symmetry of the normal curve. For instance

P( Z ď ´1.96) = P( Z ě 1.96),

due to symmetry of the curve.

P( Z ě 1.96) = 1 ´ P( Z ď 1.96),

using the Complement Law of Probability. Using the standard normal distribution
table,

P( Z ď ´1.96) = F (´1.96) = P( Z ě 1.96) = 1 ´ 0.975 = 0.025.

P( Z ď ´1.96) is the shaded area under the curve in Figure 5.5.9.

2. Probability to the right of z value

a.
P( Z ě 1.96) = 1 ´ P( Z ď 1.96),
using the Complement Law of Probability. Using the table P( Z ě 1.96) = 1 ´
0.975 = 0.025. P( Z ě 1.96) is the shaded area under the curve in Figure 5.5.10.
b.
P( Z ě ´1.96) = P( Z ď 1.96),
using the symmetry property. Using the table P( Z ě ´1.96) = 0.975 that is the
shaded area under the curve in Figure 5.5.11.

3. Probability between 2 z values, i.e.,

P(´1.96 ď Z ď 1.96) = F (1.96) ´ F (´1.96) = 0.975 ´ 0.025 = 0.950

is the shaded area under the curve in Figure 5.5.12.


4. Probability between two x values, i.e., P(4 ≤ X ≤ 16) when X ~ N(µ = 10, σ = 4).

   • Convert x into z:

     P\left(\frac{4-10}{4} \le Z \le \frac{16-10}{4}\right) = P(-1.5 \le Z \le 1.5)
       = F(1.5) - F(-1.5) = 0.9332 - (1 - 0.9332) = 0.8664

   • Therefore P(4 ≤ X ≤ 16) = 0.8664.

Example 5.5.3
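The table lookups above can be mirrored with SciPy's standard normal functions; an illustrative sketch, not part of the original text:

    from scipy.stats import norm

    print(norm.cdf(1.96))                     # P(Z <= 1.96)  ≈ 0.9750
    print(norm.cdf(-1.96))                    # P(Z <= -1.96) ≈ 0.0250
    print(norm.cdf(1.96) - norm.cdf(-1.96))   # P(-1.96 <= Z <= 1.96) ≈ 0.95
    mu, sigma = 10, 4
    print(norm.cdf(16, mu, sigma) - norm.cdf(4, mu, sigma))   # P(4 <= X <= 16) ≈ 0.8664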

Figure 5.5.8.

[Standard normal curve; shaded area: P(Z ≤ 1.96).]


Figure 5.5.9.

[Standard normal curve; shaded area: P(Z ≤ −1.96).]

Figure 5.5.10.

[Standard normal curve; shaded area: P(Z ≥ 1.96).]


Figure 5.5.11.

[Standard normal curve; shaded area: P(Z ≥ −1.96).]

Figure 5.5.12.

[Standard normal curve; shaded area: P(−1.96 ≤ Z ≤ 1.96).]

Example 5.5.4
The achievement scores for a college entrance examination are normally distributed with
mean 75 and standard deviation 10. What % of the students will score:
1. above 90?

2. below 70?


3. between 80 and 90?


Solution:
Let X be the achievement score with µ = 75 & σ = 10. Convert x into z & use the Normal
Distribution Table to find out the required probabilities.
1. above 90?

   P(X > 90) = P\left(Z > \frac{90 - 75}{10}\right) = P(Z > 1.5) = 1 - 0.9332 = 0.0668

2. below 70?

   P(X < 70) = P\left(Z < \frac{70 - 75}{10}\right) = P(Z < -0.5) = 0.3085

3. between 80 and 90?

   P(80 < X < 90) = P\left(\frac{80 - 75}{10} < Z < \frac{90 - 75}{10}\right) = P(0.5 < Z < 1.5)
                  = F(1.5) - F(0.5) = 0.9332 - 0.6915 = 0.2417

Example 5.5.4

5.5.6 §§ Finding Probabilities & Percentiles


In the previous problems, we knew the x´value and we wanted to find the probability
P( X ď x ) or P( X ě x ). What if you know the probability or percentile (e.g., ‘top quarter’ or
‘bottom tenth’ or ‘middle 50%’), but you don’t know the cut-off x´value that will give you
this probability? We will call this situation a ‘backward’ Normal problem because you solve
for the x´value in a procedure that is backward from what you did in the previous types of
problems in this chapter.
Example 5.5.5
Manufactured items have a strength that has a normal distribution with a standard deviation
of 4.2. The mean strength can be altered by the operator. At what value should the mean
strength be set so that exactly 95% of the items have a strength less than 100? For a random


sample of ten items, what is the probability that exactly two will have strength more than
100?
Solution:
Let X be the strength, with σ = 4.2.

P(X < 100) = 0.95 \quad\Longrightarrow\quad P\left(Z < \frac{100 - \mu}{4.2}\right) = 0.95

Now we need the z value corresponding to a cumulative probability of 0.95, i.e., the 95th
percentile. Looking inside the table, z = 1.645, i.e., P(Z ≤ 1.645) = 0.95. Hence

\frac{100 - \mu}{4.2} = 1.645 \quad\Longrightarrow\quad \mu = 100 - 1.645 \times 4.2 = 93.091

Therefore the mean strength µ should be set at 93.091.

Let Y be the number of items in a random sample of size 10 that have strength more than 100.
To find the probability that exactly two of the ten items have strength more than 100, we use
the binomial distribution. Since P(X < 100) = 0.95, each item independently has strength
more than 100 with probability p = 0.05, and n = 10. The items are independent in strength.

P(Y = 2) = \binom{10}{2}\,0.05^2\,(1 - 0.05)^{10-2} = 0.075

Example 5.5.5
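This "backward" problem is exactly what the normal quantile function does; a minimal sketch assuming SciPy:

    from scipy.stats import norm, binom

    z95 = norm.ppf(0.95)                     # ≈ 1.645
    print(round(100 - z95 * 4.2, 3))         # required mean ≈ 93.09
    print(round(binom.pmf(2, 10, 0.05), 3))  # P(exactly 2 of 10 exceed 100) ≈ 0.075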

5.5.7 §§ Empirical Rule


§§ The 68-95-99.7 Rule for the Normal Curve
It is helpful to know the probability that X is within one or two or three standard deviations
of its expected value, µ. The Empirical Rule also called as 68-95-99.7 Rule for the Normal
Curve is shown in Figure 5.5.13. Basically the rule states that

1. approximately 68% of observations fall within 1 standard deviation of the mean, i.e.,
µ ˘ σ. The probability that X is within one standard deviation of its mean µ is 0.68


2. approximately 95% of observations fall within 2 standard deviation of the mean, i.e.,
µ ˘ 2σ. The probability that X is within two standard deviation of its mean µ is 0.95

3. approximately 99.7% of observations fall within 3 standard deviation of the mean, i.e.,
µ ˘ 3σ. The probability that X is within three standard deviation of its mean µ is
0.997

Figure 5.5.13.

Example 5.5.6
What’s a normal pulse rate? That depends on a variety of factors. Pulse rates between 60
and 100 beats per minute are considered normal for children over 10 and adults. Suppose that
these pulse rates are approximately normally distributed with a mean of 72 and a standard
deviation of 12.

1. What proportion of adults will have pulse rates between 60 and 84?

2. 16% of the adults have pulse rate below what value?

3. 2.5% of the adults will have their pulse rates exceeding x. Find x?

Solution:
Let X be the pulse rate that has Normal Distribution with µ = 72 & σ = 12. Convert x into
z & use the Normal Distribution Table to find out the required probabilities.

1. P(60 ă X ă 84) = P(72 ´ 12 ă X ă 72 + 12) = P(µ ´ σ ă X ă µ + σ) is 68% due


to empirical rule. Therefore 68% of children over 10 & adults have pulse rates between
60 & 84.

2. P( X ă x ) = 0.16, i.e., we need to find out the 16th Percentile of X. As 68% of


observations fall between µ ˘ σ, means that 32% fall outside µ ˘ σ or 16% fall below
µ ´ σ = 72 ´ 12 = 60. Therefore 16% of children over 10 & adults have pulse rates
below 60.


3. P( X ą x ) = 0.025, i.e., we need to find out the 97.5th Percentile of X. As 95% of


observations fall between µ ˘ 2σ, this means that 5% fall outside µ ˘ 2σ or 2.5% fall
above µ + 2σ = 72 + 2(12) = 96. Therefore 2.5% of children over 10 & adults have
pulse rates above 96.

Example 5.5.6

5.5.8 §§ Normal Distribution: Moment Generating Function

Moment generating functions will be an exciting new tool to make probabilistic computations
very efficient.


Definition 5.5.7 (Moment Generating Function Mg f ).

We call Mx (t) the moment generating function because all of the moments of X can
be obtained by successively differentiating this function Mx (t) and then evaluating
the result at t = 0.

M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx

Let z = \frac{x - \mu}{\sigma}, so that x = \mu + \sigma z and dx = \sigma\,dz:

M_X(t) = \int_{-\infty}^{\infty} e^{t(\mu + \sigma z)}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\,\sigma\,dz
       = e^{t\mu} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{t\sigma z - \frac{1}{2}z^2}\,dz
       = e^{t\mu + \frac{1}{2}\sigma^2 t^2}

\left.\frac{\partial}{\partial t} M_X(t)\right|_{t=0}
  = \left.(\mu + \sigma^2 t)\, e^{t\mu + \frac{1}{2}\sigma^2 t^2}\right|_{t=0} = \mu = E(X)

\left.\frac{\partial^2}{\partial t^2} M_X(t)\right|_{t=0}
  = \left.\left[\sigma^2 + (\mu + \sigma^2 t)^2\right] e^{t\mu + \frac{1}{2}\sigma^2 t^2}\right|_{t=0}
  = \sigma^2 + \mu^2 = E(X^2)
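A quick symbolic check of the last two lines, using SymPy (an illustrative sketch, not from the text):

    import sympy as sp

    t, mu, sigma = sp.symbols('t mu sigma', real=True)
    M = sp.exp(mu*t + sigma**2 * t**2 / 2)           # normal mgf
    print(sp.diff(M, t, 1).subs(t, 0))               # mu                 = E(X)
    print(sp.expand(sp.diff(M, t, 2).subs(t, 0)))    # mu**2 + sigma**2   = E(X^2)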

5.5.9 §§ Sums of Independent Normal Random Variables


§§ Background
What is the total amount of food eaten by all individuals in PDC at LUMS on a given day?
In a given month? In a given year? How can we scale our results to compare the average,
variance, and standard deviation of these food totals to have an estimate of the total amount
of food required to cater for the needs of individuals?

1. If X_1, X_2, \ldots, X_n are independent normal random variables with means
   \mu_1, \mu_2, \ldots, \mu_n and variances \sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2, then the sum
   X_1 + X_2 + \cdots + X_n is normal with mean \sum_{i=1}^{n}\mu_i and variance \sum_{i=1}^{n}\sigma_i^2, i.e.,

   \sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n}\mu_i,\ \sum_{i=1}^{n}\sigma_i^2\right)
   \quad\Rightarrow\quad
   Z = \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n}\mu_i}{\sqrt{\sum_{i=1}^{n}\sigma_i^2}} \sim N(0, 1)

2. If X_1, X_2, \ldots, X_n are independent normal random variables with common mean \mu
   and common variance \sigma^2, then X_1 + X_2 + \cdots + X_n is normal with mean n\mu and
   variance n\sigma^2, i.e.,

   \sum_{i=1}^{n} X_i \sim N(n\mu,\ n\sigma^2)
   \quad\Rightarrow\quad
   Z = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \sim N(0, 1)

Example 5.5.8
The weight of each of the eight individuals is approximately normally distributed with a mean
equal to 150 pounds and a standard deviation of 35 pounds each. What is the probability
that the total weight of eight people who occupy an elevator exceeds 1300 pounds?
Solution:
Let X be the weight of a single individual, normal with µ = 150 and σ = 35. Since the
individual weights are independent and normal, the total weight \sum_{i=1}^{8} X_i is also
normal: \sum_{i=1}^{8} X_i \sim N(8(150),\ 8(35)^2). Convert the total into z and use the
normal distribution table:

P\left(\sum_{i=1}^{8} X_i > 1300\right)
  = P\left(Z > \frac{1300 - 8(150)}{\sqrt{8(35)^2}}\right)
  = P(Z > 1.01) = 1 - 0.8438 = 0.1562

There is a 15.62% chance that the total weight of eight individuals in the elevator exceeds
1300 pounds.
Example 5.5.8
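The same calculation in a couple of lines of SciPy (illustrative sketch, not from the text):

    from math import sqrt
    from scipy.stats import norm

    n, mu, sigma = 8, 150, 35
    print(round(norm.sf(1300, loc=n*mu, scale=sqrt(n)*sigma), 4))   # P(total > 1300) ≈ 0.156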


5.6 IJ Exponential Distribution


Waiting is painful. What is the expected time until an air conditioning system fails as shown
in Figure 5.6.1? When a mother is waiting for her three children to call her, what is the
probability that the first call will arrive within the next 5 minutes?

§§ Background

Figure 5.6.1.

How to model the time to events X, e.g.,

1. Customer service: time on hold on a help line

2. Medicine: remaining years of life for a cancer patient

3. Ecology: dispersal distance of a seed

4. Seismology: Time elapsed before an earthquake occurs in a given region

In such cases, let X be the time between successive occurrences. Clearly, X is a continuous
random variable whose range consists of the non-negative real numbers.

• It is expected that most calls, times or distances will be short and a few will be long.
So the density should be large near x = 0 and decreasing as x increases.

5.6.1 §§ Link between Poisson & Exponential Distribution


§§ Poisson Process
The number of events that occur in a window of time or region in space


1. Events occur randomly, but with a long-term average rate of λ per unit time. e.g.,
λ = 10 per hour or λ = 24 ˆ 10 per day

2. The events are rare enough that in a very short time interval, there is a negligible
chance of more than one event.

3. Poisson distribution provides an appropriate description of the number of events per


time interval

4. Exponential distribution provides a description of the length of time between events

5. The important point is we know the average time between events but they are randomly
spaced (stochastic). Let X be the wait time until the first call at a Customer Centre
from any start point in this setting.

6. If the wait time for a call is at least t minutes, then how many calls occurred in the
first t minutes?

Definition 5.6.1 (Exponential Random Variable).

• Let X be the time elapsed after a Poisson event.

• Let Y be the number of events in a time interval [0, t ), i.e., Y „ Poisson(λt)

P(X > t) = P(no event occurs in the interval [0, t)) = P(Y = 0) = \frac{e^{-\lambda t}(\lambda t)^0}{0!} = e^{-\lambda t}

P(X \le t) = 1 - P(X > t) = 1 - e^{-\lambda t}

The time gap between successive events from a Poisson process (with mean number
of events λ ą 0 per unit interval) is an exponential random variable with rate
parameter λ.


5.6.2 §§ Exponential Distribution: (cd f )

Definition 5.6.2 (Exponential Distribution: (cd f )).

The cd f is shown in Figure 5.6.2.

F(x) = \int_{0}^{x} \lambda e^{-\lambda t}\,dt
     = \begin{cases} 1 - e^{-\lambda x} & x \ge 0\\ 0 & \text{otherwise} \end{cases}

Figure 5.6.2.

[Exponential cdf F(t) with λ = 1/30, plotted for t from 0 to 300.]


5.6.3 §§ Exponential Distribution: (pd f )

Definition 5.6.3 (Exponential Distribution: (pd f )).

A random variable X ~ Exp(λ) if its density is given by

f(x) = \frac{d}{dx}F(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0\\ 0 & \text{otherwise} \end{cases}

The exponential distribution has only one parameter, λ.

The pd f is shown in Figure 5.6.3.

Figure 5.6.3.

[Exponential density f(t) with λ = 1/30, decreasing from λ at t = 0 toward 0.]

Some examples of the densities pd f s and cd f s for Exponential random variables with
various values of λ are given in Figure 5.6.4 & Figure 5.6.5 respectively.


Figure 5.6.4.

[Exponential densities f(t) for λ = 0.05, 1, 2, 4, plotted against time.]

Figure 5.6.5.

[Exponential cdfs F(t) for λ = 0.05, 1, 2, 4, plotted against time.]

These examples show that no matter what the λ parameter is, the density starts at λ
when x = 0 and then quickly moves closer to 0 as x Ñ 8. The cd f starts at 0 but quickly
climbs close to 1 as x Ñ 8. The pd f and cd f curves are steeper for larger values of λ.

5.6.4 §§ Exponential Distribution: Expectation & Variance


• E( X ) = 1/λ is the expected time between successive occurrences


• Var ( X ) = 1/λ2

That is, for exponential distribution, mean & standard deviation are equal.

Example 5.6.4 (Arrival Time of Factory Workers)


The arrival times of workers at a factory first-aid room satisfy a Poisson process with an
average of 1.8 per hour.

1. What is the expectation of the time between two arrivals at the first-aid room?

2. What is the probability that there is at least 1 hour between two arrivals at the first-aid
room?

3. What is the distribution of the number of workers visiting the first-aid room during a
4-hour period?

4. What is the probability that at least four workers visit the first-aid room during a
4-hour period?

Solution:
Let X be the time between 2 arrivals, then X „ exp(λ), with λ = 1.8

1. E(X) = 1/1.8 = 0.5556 hours.

2. P(X ≥ 1) = 1 − P(X < 1) = 1 − (1 − e^{−1.8(1)}) = e^{−1.8} = 0.1653

3. The number of workers Y visiting the first-aid room during a 4-hour period is Poisson
   with parameter λt = 1.8 × 4 = 7.2.

4. P(Y \ge 4) = 1 - P(Y \le 3)
             = 1 - e^{-7.2}\left(\frac{7.2^0}{0!} + \frac{7.2^1}{1!} + \frac{7.2^2}{2!} + \frac{7.2^3}{3!}\right)
             = 0.9281

Example 5.6.4
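Both the exponential and the Poisson pieces of this example are one-liners in SciPy (note that expon uses scale = 1/λ); a hedged sketch for illustration:

    from scipy.stats import expon, poisson

    lam = 1.8
    X = expon(scale=1/lam)                 # time between arrivals, in hours
    print(round(X.mean(), 4))              # 0.5556
    print(round(X.sf(1), 4))               # P(X >= 1) = e^{-1.8} ≈ 0.1653
    print(round(poisson.sf(3, 4*lam), 4))  # P(Y >= 4) for Y ~ Poisson(7.2) ≈ 0.9281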


Definition 5.6.5 (Exponential Distribution: Memoryless Property).

P( T ą t1 + t2 |T ą t1 ) = P( T ą t2 ); for all t1 , t2 ě 0

• From the point of view of waiting time, the memoryless property means that
it does not matter how long you have waited so far. If you have waited for
at least t1 time, the distribution of waiting time (from time t1 ) until t2 is the
same as when you started at time zero.

• The likelihood of an event is completely independent of the past history (mem-


oryless). Both the exponential density and the geometric distribution share
this ’memoryless’ property

A memoryless wait for a bus would mean that the probability that a bus arrived in
the next minute is the same whether you just got to the station or if you’ve been
sitting there for twenty minutes already.

Example 5.6.6 (Arrival Time of Factory Workers Cont’d)


What is the probability that there is at least 1 hour between two arrivals at the first-aid room,
if no worker arrived in the 1st half an hour?
Solution:
Here we need conditional probability which in the current scenario can be computed by using
the Memoryless Property

P(X > 1 \mid X > 0.5) = P(X > 0.5) = 1 - P(X < 0.5) = 1 - (1 - e^{-1.8(0.5)}) = e^{-0.9} = 0.4066

Example 5.6.6

Summary: Relationship between Poisson & Exponential


  Poisson Distribution                          Exponential Distribution
  X: number of successes per unit time          X: time between successive successes
  Discrete random variable                      Continuous random variable
  λ: mean number of successes per unit time     β = 1/λ: expected time between successive successes


5.7 IJ Home Work


1. For the cdf given below

   F(x) = \begin{cases}
     0 & x \le -1\\
     \dfrac{x+1}{4} & -1 < x \le 1\\
     \dfrac{x}{2} & 1 < x \le 2\\
     1 & x > 2
   \end{cases}

(a) Find the pdf


(b) Find E( x )
(c) Find Var ( x )

2. Ninety identical electrical circuits are monitored at an extreme temperature to see how
long they last before failing. The 50th failure occurs after 263 minutes. If the failure
times are modeled with an exponential distribution,

a. when would you predict that the 80th failure will occur?
b. At what time will only 5% of the circuits fail?

3. In a study of the bone disease osteoporosis, heights of 351 elderly women were measured.
Suppose that their heights follow a normal distribution with µ = 160cm, but unknown
σ. Suppose that 2.27% of those women are taller than 170 cm, what is the standard
deviation? For a random sample of ten women, what is the probability that exactly
two will be shorter than 155cm?

4. A soft-drink machine can be regulated so that it discharges an average of µ ounces per


cup. If the ounces of fill are normally distributed with standard deviation 0.3 ounce,
give the setting for µ so that 8-ounce cups will overflow only 1% of the time.

5. The operator of a pumping station has observed that demand for water during early
afternoon hours has an approximately exponential distribution with mean 100 cfs (cubic
feet per second). Find the probability that the

a. demand will exceed 200 cfs during the early afternoon on a randomly selected day.
b. demand will exceed 200 cfs on a given day, given that previous demand was at
least 150 cfs?

What water-pumping capacity should the station maintain during early afternoons so
that the probability that demand will exceed capacity on a randomly selected day is
only .01?

6. Five students are waiting to talk to the TA when office hours begin. The TA talks
to the students one at a time, starting with the first student and ending with the
fifth student, with no breaks between students. Suppose that the time taken by the


TA to talk to a student has a normal distribution with a mean of 8 minutes and a


standard deviation of 2 minutes, and suppose that the times taken by the students are
independent of each other. What is the probability that the total time taken by the
TA to talk to all five students is longer than 45 minutes?

§§ Answers
1. f(x) = \begin{cases} \dfrac{1}{4} & -1 < x \le 1\\ \dfrac{1}{2} & 1 < x \le 2\\ 0 & \text{otherwise} \end{cases};
   \quad E(X) = 3/4; \quad Var(X) = 37/48

2. 732.4; 16.6537

3. σ = 5; 0.2844

4. µ = 6.953 ounce

5. 0.1353; 0.6065; 460.52 cfs

6. 0.132

Chapter 6

Limit Theorems

AS YOU READ . . .

1. What is the law of large numbers?


2. What is Chebyshev’s Inequality?
3. What are limit theorems?
4. What are the conditions under which the sum of large number of variables is approxi-
mately normally distributed?

6.1 IJ Limit Theorems


This is a very important result in Probability Theory with several different variations.

• Laws of Large Numbers: For large n, the average of a large number of iid1 random
variables converges to the expected value.
• Central Limit Theorems: Determining conditions under which the sum of a large num-
ber of random variables has an approximately normal distribution.

6.1.1 §§ Chebyshev Inequality


§§ Background
If we know the probability distribution of a random variable X (either the pd f in the con-
tinuous case or the pm f in the discrete case), we may then compute E( X ) and Var ( X ), if
these exist. However, for unknown Probability Distribution of X, we cannot compute quan-
tities such as P(|X ´ E( X )| ď C ). Chebyshev Inequality is used to find an approximation of
the distribution in this scenario, when both E( X ) and Var ( X ) are unknown. It is a way of

1 independent & identically distributed


quantifying the fact that a random variable is ‘relatively close’ to its expected value ‘most
of the time’. It gives bounds that quantify both ‘how close’ and ‘how much of the time’ the
random variable is to its expected value.

Definition 6.1.1 (Chebyshev’s Inequality).

§§ Bounding Probabilities using Expectations & Variance


Let X be a random variable with mean µ and standard deviation σ. Then, for any k ≥ 1,

P(|X - \mu| \le k\sigma) \ge 1 - \frac{1}{k^2};

that is, at least 1 − 1/k² of observations fall within µ ± kσ.

Compare this inequality with the Empirical Rule, which also bounds probabilities for
k = 1, 2, 3. The difference is that Chebyshev's Inequality applies even when the distribution
is not known. Figure 6.1.1 shows 1 − 1/k² as the shaded area that falls between µ − kσ and µ + kσ.


Figure 6.1.1.

[Relative frequency curve; the shaded area 1 − 1/k² lies between µ − kσ and µ + kσ.]

Chebyshev Inequality

• For k = 2, at least 1 − 1/2² = 0.75 of observations fall within 2 standard deviations of the mean.

• For k = 3, at least 1 − 1/3² ≈ 0.89 of observations fall within 3 standard deviations of the mean.

Example 6.1.2

1. The number of customers per day (Y) at a sales counter, has been observed for a long
period of time and found to have mean 20 and standard deviation 2. The probability
distribution of Y is not known. What can be said about the probability that, tomorrow
Y will be greater than 16 but less than 24?


2. A mail-order computer business has six telephone lines. Let X denote the number of
lines in use at a specified time. Compute µ & σ for the distribution below. Using
k = 2, 3, what does Chebyshev Inequality suggest about the upper bound relative to
the corresponding probability? Interpret.

x 0 1 2 3 4 5 6
p(x) .10 .15 .20 .25 .20 .06 .04

Solution:

1. µ = 20; σ = 2

   P(16 < Y < 24) = P(20 − 2 × 2 < Y < 20 + 2 × 2) = P(µ − 2σ < Y < µ + 2σ)

   ∴ k = 2, and Chebyshev's Inequality guarantees that at least 3/4 of the observations fall
   between 16 and 24.

2. µ = 2.64; σ = 1.53961

   a. For k = 2, at least 75% of observations fall within (−0.43922, 5.71922).

   b. For k = 3, at least 89% of observations fall within (−1.97883, 7.25883).

Example 6.1.2

6.2 IJ Central Limit Theorem (CLT)


§§ Background
How to model the chance behavior for:

1. the electricity consumption in a city at any given time that is the sum of the demands
of a large number of individual consumers

2. the quantity of water in a reservoir may be thought of as representing the sum of a


very large number of individual contributions.

3. the error of measurement in a physical experiment is composed of many unobservable


small errors which may be considered additive.

In these examples, the interest is to model the sum of either demands or quantity of water
as a sum of individual contributions or the measurement error as the sum of unobservable
small errors. What will be the distribution of the sum in these examples?


6.2.1 §§ Sample Total (CLT)


Definition 6.2.1 (Central Limit Theorem (CLT): Sample Total).

Let X be a random variable with finite mean µ and finite variance σ². Suppose you repeatedly
draw independent samples of size n from the distribution of X. Then as n → ∞, the
distribution of the sample total X_1 + X_2 + \cdots + X_n becomes approximately normal, i.e.,

\sum_{i=1}^{n} X_i \approx N(n\mu,\ n\sigma^2), \qquad
z = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \approx N(0, 1)

\sqrt{n\sigma^2} is called the standard error of the total. In other words,

\lim_{n\to\infty} P\left(\frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \le a\right)
  = \int_{-\infty}^{a} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz = F(a)

Example 6.2.2

1. When a batch of a certain chemical product is prepared, the amount of a particular


impurity in the batch is a random variable with mean value 4.0 g and standard deviation
1.5 g. If 50 batches are independently prepared, what is the (approximate) probability
that the total amount of impurity is between 175 and 190 g?
2. Consider the volumes of soda remaining in 100 cans of soda that are nearly empty. Let
X1 , . . . , X100 , denote the volumes (in ounces) of cans one through one hundred, respec-
tively. Suppose that the volumes X j are independent, and that each X j is Uniformly
distributed between 0 and 2. Find the probability that the 100 cans of soda contain
less than 90 ounces of soda in total.
Solution:
1. Let X be the amount of impurity in one batch, with mean µ = 4.0 g and standard
   deviation σ = 1.5 g; n = 50. By the CLT, the total T is approximately normal:

   P(175 \le T \le 190)
     = P\left(\frac{175 - 50(4)}{\sqrt{50(1.5^2)}} \le Z \le \frac{190 - 50(4)}{\sqrt{50(1.5^2)}}\right)
     = F(-0.94) - F(-2.36) = 0.1645


In other words, there is 16.45% chance that the total amount of impurity in 50 batches
is between 175 & 190g.

2. The volumes X_j ~ U(0, 2); therefore µ = (0 + 2)/2 = 1 and σ² = (2 − 0)²/12 = 1/3; n = 100.

   P(T \le 90) = P\left(Z \le \frac{90 - 100(1)}{\sqrt{100(1/3)}}\right) = P(Z \le -1.73)
               = 1 - 0.9582 = 0.0418

In other words, the probability that the total volume of soda in 100 cans of soda is less
than 90 ounces is approximately 4.18%.

Example 6.2.2
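Part 1 of this example, recomputed with SciPy under the CLT normal approximation (illustrative; the exact z-values give about 0.164, while the rounded table lookups in the text give 0.1645):

    from math import sqrt
    from scipy.stats import norm

    n, mu, sigma = 50, 4.0, 1.5
    T = norm(loc=n*mu, scale=sqrt(n)*sigma)    # approximate distribution of the total
    print(round(T.cdf(190) - T.cdf(175), 4))   # ≈ 0.164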

6.2.2 §§ Sample Mean (CLT)


Figure 6.2.1 shows the main idea of Central Limit Theorem using a hypothetical example.

Figure 6.2.1.

6.2.3 §§ Simulations
§§§ Simulation Study 1: Normal Population
Figure 6.2.2 displays the distribution of 10,000 data points simulated from a N (µ = 60; σ =
1).


Figure 6.2.2.

[Histogram of the simulated population of heights, centered at µ = 60.]

Parent Distribution: Normal

A random sample of size n = 30 was drawn from the population data simulated &
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.3.


Figure 6.2.3.

[Histogram of the 100,000 sample means, centered at µ = 60 and much narrower than the parent distribution.]

Distribution of the Sample Means x̄

Both distributions are centered at µ = 60, but pay attention to their variability: the
distribution of the 100,000 sample means is much narrower, showing much less variability.

§§§ Simulation Study 2: Uniform Population

Figure 6.2.4 displays the 10,000 data points simulated from a U (0, 1).


Figure 6.2.4.

[Histogram of 10,000 data points simulated from U(0, 1): approximately flat over (0, 1).]

Parent Distribution U(0,1)

A random sample of size n = 30 was drawn from the population data simulated &
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.5.


Figure 6.2.5.

[Histogram of the 100,000 sample means: bell-shaped and centered at 0.5.]

Distribution of the Sample Means x̄

Both distributions are centered at µ = 0.5. This shows that regardless of the parent
population, the distribution of the sample means is approximately normal with a mean of
µ = 0.5. The same phenomenon is again observed in the variability: the distribution of the
100,000 sample means is much narrower, showing much less variability. A minimal simulation
sketch is given below.
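The sketch below (assumed parameters, NumPy required; not from the text) reproduces the spirit of Simulation Study 2:

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.uniform(0, 1, size=(100_000, 30))   # 100,000 samples of size n = 30 from U(0, 1)
    xbar = samples.mean(axis=1)
    print(xbar.mean(), xbar.std())   # ≈ 0.5 and ≈ sqrt(1/12)/sqrt(30) ≈ 0.053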
Definition 6.2.3 (Central Limit Theorem (CLT): Sample Mean X̄).

The Central Limit Theorem basically says that, even for non-normal data, the distribution
of the sample means is approximately normal, no matter what the distribution of the original
data looks like, as long as the sample size is large enough (usually at least 30) and all
samples have the same size.²

X \sim N(\mu, \sigma^2), \qquad Z = \frac{X - \mu}{\sigma} \sim N(0, 1)

\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)

\sigma/\sqrt{n} is called the standard error of the mean.


Example 6.2.4
A coffee dispensing machine is supposed to dispense a mean of 7.00 fluid ounces of coffee
per cup with standard deviation 0.25 fluid ounces. The distribution approximates a normal
distribution. What is the probability that, when 12 cups are dispensed, their mean volume
is more than 7.15 fluid ounces?
Solution:
Let X be the amount of coffee dispensed, X ~ N(µ = 7, σ = 0.25); n = 12.

P(\bar{x} > 7.15) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{7.15 - 7}{0.25/\sqrt{12}}\right)
                  = P(Z > 2.08) = 1 - 0.9812 = 0.0188

In other words, there is 1.88% chance that the average amount of coffee dispensed exceeds
7.15 ounces.
Example 6.2.4

Example 6.2.5
The fracture strength of tempered glass averages 14 (measured in thousands of pounds per
square inch) and has standard deviation 2. What is the probability that the average fracture
strength of 100 randomly selected pieces of this glass exceeds 14.5? Find an interval that
includes, with probability 0.95, the average fracture strength of 100 randomly selected pieces
of this glass.
Solution:
 
P(\bar{x} > 14.5) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{14.5 - 14}{2/\sqrt{100}}\right)
                  = P(Z > 2.5) = 1 - 0.9938 = 0.0062

There is 0.6% chance that the average fracture strength of 100 randomly selected pieces
of this glass exceeds 14.5.
The central 95% means the area of 5% is divided equally in the 2 tails of the normal
curve. Therefore, P( Z ď z2 ) = 0.95 + 0.05/2 = 0.975 gives the cd f corresponding to z2 .

2 Statistics For Dummies, 2nd Edition, Deborah J. Rumsey


Looking inside the normal distribution table, we find the corresponding z-value to be 1.96.

P(-1.96 < Z < 1.96) = 0.95, \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}

\bar{x}_2 = 14 + 1.96 \times \frac{2}{\sqrt{100}} = 14.392, \qquad
\bar{x}_1 = 14 - 1.96 \times \frac{2}{\sqrt{100}} = 13.608

P(13.608 < \bar{x} < 14.392) = 0.95

Example 6.2.5

6.2.4 §§ Normal Approximation to Binomial

If X ~ bin(n, p), then E(X) = np and Var(X) = np(1 − p).

If the binomial probability histogram is not too skewed, then as n → ∞,

X \approx N(np,\ np(1-p)), \qquad Z = \frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1),

i.e., the binomial distribution approaches the normal for large n. This phenomenon is shown
in the simulated data distribution in Figure 6.2.6.


Figure 6.2.6.

[Probability histogram of the binomial distribution with n = 40 and p = 0.2; roughly bell-shaped.]

Binomial Distribution for n = 40 & p = 0.2

1. A general rule: approximation is reasonable as long as both np ě 5 and n(1 ´ p) ě 5


2. The normal approximation accuracy improves as n Ñ 8.
3. Caveat: Suppose X = 12. Then P(X = 12) > 0 for a discrete random variable, but under
   the approximating normal distribution P(X = 12) = 0. Therefore, some care must be
   taken with the endpoints of the intervals involved.
To use the normal distribution to approximate the probability of obtaining exactly 12 (i.e.,
P( X = 12)), we would find the area under the normal curve from X = 11.5 to X = 12.5, the
lower and upper boundaries of 12, (see Figure 6.2.7). The small correction of 0.5 is used to
allow for the fact of using normal distribution to approximate binomial probabilities.


Figure 6.2.7.

[Binomial probability histogram with a normal curve overlaid; the bar for X = 12 spans 11.5 to 12.5.]

Continuity Correction

§§ Continuity Correction
To find the binomial probabilities on the left-hand side of the expressions below, calculate the normal approximation shown on the right-hand side.
Binomial ≈ Normal

P(X = k) ≈ P(k − 1/2 ≤ X ≤ k + 1/2)
P(a ≤ X ≤ b) ≈ P(a − 1/2 ≤ X ≤ b + 1/2)
P(X > k) ≈ P(X > k + 1/2)
P(X ≥ k) ≈ P(X > k − 1/2)
P(X < k) ≈ P(X < k − 1/2)
P(X ≤ k) ≈ P(X < k + 1/2)

156
Limit Theorems 6.3 Home Work

Example 6.2.6
At a certain local restaurant, students are known to prefer Japanese pan noodles 40% of the
time. Consider 2000 randomly chosen students, what is the probability that at most 840 of
the students eat Japanese pan noodles there?
Solution:
p = 0.4; n = 2000; np = 2000(0.4) = 800; n(1 − p) = 2000(0.6) = 1200; np(1 − p) = 480.
As both np ≥ 5 and n(1 − p) ≥ 5, the normal approximation to the binomial is appropriate here.

P(X ≤ 840) ≈ P(X ≤ 840 + 0.5)
          = P(Z ≤ (840.5 − 800)/√480)
          = P(Z ≤ 1.85)
          = 0.9677

The probability that at most 840 students eat Japanese pan noodles is 0.9677.
Example 6.2.6
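A minimal sketch comparing the exact binomial probability of Example 6.2.6 with the continuity-corrected normal approximation; the use of scipy is an assumption made here, since the text itself works from normal tables:

```python
from scipy.stats import binom, norm
import math

n, p = 2000, 0.4                          # values from Example 6.2.6
mu = n * p                                # 800
sd = math.sqrt(n * p * (1 - p))           # sqrt(480)

exact = binom.cdf(840, n, p)              # exact P(X <= 840)
approx = norm.cdf((840 + 0.5 - mu) / sd)  # continuity-corrected normal approximation

print(exact, approx)                      # both are close to 0.97
```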

6.3 IJ Home Work


1. Ali is pursuing a major in computer science. He notices that a memory chip containing
2¹² = 4096 bits is full of data that seems to have been generated, bit-by-bit, at random,
with 0’s and 1’s equally likely, and the bits are stored independently. If each bit is
equally likely to be a 0 or 1, estimate the probability that there are actually 2140 or
more 1’s stored in the memory chip?
2. The service times for customers coming through a checkout counter in a retail store are
independent random variables with mean 1.5 minutes and variance 1.0. Approximate
the probability that 100 customers can be served in less than 2 hours of total service
time.
3. The quality of computer disks is measured by the number of missing pulses. Brand
X is such that 80% of the disks have no missing pulses. If 100 disks of brand X are
inspected, what is the probability that 15 or more contain missing pulses?
4. Consider the lengths of calls handled by Zahir in a call center. The calls are indepen-
dent Exponential random variables, and each call lasts, on average, 1/3 of an hour.
On a particular day, Zahir records the lengths of 24 consecutive calls. What is the
approximate probability that the average of these 24 calls exceeds 1/4 of an hour?
5. At an auction, exactly 282 people place requests for an item. The bids are placed
‘blindly,’ which means that they are placed independently, without knowledge of the
actions of any other bidders. Assume that each bid (measured in dollars) is a continuous
random variable with a mean of $14.9 and a standard deviation of $2.54. Find the
probability that the sum of all the bids exceeds $4150.

157
Limit Theorems 6.3 Home Work

6. A machine is shut down for repairs if a random sample of 100 items selected from the
daily output of the machine reveals at least 15% defectives. (Assume that the daily
output is a large number of items.) If on a given day the machine is producing only
10% defective items, what is the probability that it will be shut down?

§§ Answers
1. 0.0021

2. 0.0013

3. 0.9162

4. 0.8888

5. 0.8869

6. 0.0668

158
Chapter 7

Joint Distributions

AS YOU READ . . .

1. What is a Joint Distribution?

2. How do you model the joint chance behavior of more than one random variable?

3. What are marginal distributions?

4. What is convolution? How is it useful to find the distribution of sums of independent


random variables?

7.1 IJ Bivariate Distributions


§§ Real Life Examples
In science and in real life, we are often interested in two (or more) random variables at
the same time, e.g., the interest might be in the joint distribution of the values of various
physiological variables in medical studies

1. Stress Index & Blood Pressure

2. Heights of parents & of offsprings

3. Frequency of exercise and the rate of heart disease in adults

4. Level of air pollution and rate of respiratory illness in Lahore

5. Census: Studying several variables, such as income, age, and gender, provides detailed information about the society where the census is performed.

In general, if X and Y are two random variables, a joint probability distribution defines their
simultaneous chance behavior.

159
Joint Distributions 7.2 Joint Distributions: Discrete case

Example 7.1.1 (Rolling of 2 Dice)

1. Let X = the outcome on the 1st die D1 = t1, 2, 3, 4, 5, 6u


2. Let Y = the outcome on the 2nd die D2 = t1, 2, 3, 4, 5, 6u

What is the probability that X takes on a particular value x, and Y takes on a particular
value y? i.e., what is P( X = x, Y = y)?
The entries in the cells of the Table 7.1 show the joint probabilities associated with X & Y.

x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6

Table 7.1

Example 7.1.1

7.2 IJ Joint Distributions: Discrete case


Definition 7.2.1 (Joint Probability Mass Function).

The joint probability mass function of a pair of discrete random variables X & Y is:

PX,Y ( x, y) = P( X = x & Y = y)

Properties:

1. 0 ≤ P_{X,Y}(x, y) ≤ 1 for all x, y

2. ∑_x ∑_y P_{X,Y}(x, y) = 1

The joint probability mass function for the roll of 2 dice is shown in Figure 7.2.1. A
nonzero probability is assigned to a point ( x, y) in the plane if and only if x = 1, 2, . . . , 6
and y = 1, 2, . . . , 6. Thus, exactly 36 points in the plane are assigned nonzero probabilities
of 1/36. Further, the probabilities are assigned in such a way that the sum of the nonzero
probabilities is equal to 1

160
Joint Distributions 7.2 Joint Distributions: Discrete case

Figure 7.2.1.

3-D plot of P(X, Y) against Die 1 and Die 2: Joint Probability Mass Function for Two Dice Roll

161
Joint Distributions 7.2 Joint Distributions: Discrete case

Definition 7.2.2 (Marginal Distributions).

If we are given a joint probability distribution for X & Y, we can obtain the indi-
vidual probability distribution for X or for Y.

1. The probability mass function of X alone, called the marginal probability


mass function of X, is defined by:
P_X(x) = p(x) = ∑_y P(x, y)

2. The probability mass function of Y alone, called the marginal probability mass
function of Y, is defined by:
P_Y(y) = p(y) = ∑_x P(x, y)

The term marginal, as applied to the univariate probability functions of X and Y,


has intuitive meaning. To find PX ( x ), we sum p( x, y) over all values of y and hence
accumulate the probabilities on the x axis (or margin).
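To make the row/column summation concrete, here is a small sketch (numpy is an assumed tool and the array name is illustrative) that computes the marginal pmfs of Table 7.1 from the joint pmf:

```python
import numpy as np

# Joint pmf of the two dice (Table 7.1): a 6 x 6 array of 1/36, rows = x, columns = y.
joint = np.full((6, 6), 1 / 36)

p_x = joint.sum(axis=1)   # marginal of X: sum over y (columns)
p_y = joint.sum(axis=0)   # marginal of Y: sum over x (rows)

print(p_x)                # each entry is 1/6
print(p_y)                # each entry is 1/6
print(joint.sum())        # total probability is 1
```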

7.2.1 §§ Joint Cumulative Distribution Function (cd f )

Example 7.2.3 (Rolling of 2 Dice Cont’d)

FX,Y ( a, b) = P( X ď a & Y ď b)

The joint cdf for 2 dice rolls is given in Table 7.2 below. The Table entries can be filled in by accumulating the probabilities in Table 7.1 from the lower end up to given values of X & Y.

Fxy 1 2 3 4 5 6
1 1/36 2/36 . . . .
2 2/36 4/36 . . . .
3 3/36 6/36 . . . .
4 . . . . . .
5 . . . . . .
6 . . . . . 36/36

Table 7.2

Example 7.2.3

162
Joint Distributions 7.2 Joint Distributions: Discrete case

Figure 7.2.2.

3-D plot of F(X, Y) against Die 1 and Die 2: Joint Cumulative Distribution Function

The joint Cumulative Distribution Function for the roll of 2 dice is shown in Figure 7.2.2. A nonzero cumulative probability is assigned to a point (x, y) in the plane if and only if x = 1, 2, . . . , 6 and y = 1, 2, . . . , 6. These cumulative probabilities are increasing in x and y and approach the maximum value of 1.

163
Joint Distributions 7.2 Joint Distributions: Discrete case

Definition 7.2.4 (Joint Cumulative Distribution Function: Discrete Case).

The joint cd f of a pair of discrete random variables X & Y is

FX,Y ( a, b) = P( X ď a & Y ď b)

The joint cd f satisfies the following properties:

1. 0 ď FX,Y ( a, b) ď 1 @ a, b

2. lim_{a→−∞, b→−∞} F_{X,Y}(a, b) = 0

3. lim_{a→∞, b→∞} F_{X,Y}(a, b) = 1

4. If X and Y are independent, then F_{X,Y}(a, b) = F_X(a) · F_Y(b)

5. If a < b and c < d, then F_{X,Y}(a, c) ≤ F_{X,Y}(b, d)

Example 7.2.5 (Association of Gender & CHD)


An association study of CHD1 with 2 possible outcomes: Present coded as ( X ) = 1 & Absent
coded as ( X ) = 0, with gender coded as (Y ) = 1 for males & (Y ) = 0 for females. The joint
frequency distribution of X & Y is given in Table:

                 CHD Absent (X = 0)   CHD Present (X = 1)   Total
Female (Y = 0)   977                  23                    1000
Male (Y = 1)     948                  52                    1000
Total            1925                 75                    2000

1. Find FX,Y (1, 0)


2. Find FX,Y (0, 1)
Solution:
Before starting to use the contingency table, convert it into joint pm f by dividing cell fre-
quencies by overall total of 2000. Remember that your Table should be arranged with values
of X, Y along with corresponding joint probabilities in ascending order for X & Y.
1. FX,Y (1, 0) = P( X ď 1, Y ď 0) = P( X = 0, Y = 0) + P( X = 1, Y = 0) = 977/2000 +
23/2000 = 1/2
2. FX,Y (0, 1) = P( X ď 0, Y ď 1) = P( X = 0, Y = 0) + P( X = 0, Y = 1) = 977/2000 +
948/2000 = 1925/2000
Example 7.2.5

1 CHD: Coronary heart disease

164
Joint Distributions 7.2 Joint Distributions: Discrete case

7.2.2 §§ Independent Random Variables

Example 7.2.6 (Rolling of 2 Dice Cont’d)

x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6

P_{X,Y}(x, y) = 1/36 = P(X = x) · P(Y = y) = 1/6 × 1/6

This holds true for all x & y.
Example 7.2.6

Definition 7.2.7 (Independent Random Variables).

Two discrete random variables are independent if either of the following equivalent conditions holds:

1. p_{X,Y}(x, y) = p_X(x) · p_Y(y) for all x & y

2. F_{X,Y}(x, y) = F_X(x) · F_Y(y) for all x & y

Equivalently, in terms of conditional pmfs,

(a) p_{X|Y}(x|y) = p_X(x)

(b) p_{Y|X}(y|x) = p_Y(y)

Example 7.2.8 (Association of Gender & CHD Cont’d)


The joint probability distribution of X & Y is given in Table:

CHD Absent CHD Present


( X = 0) ( X = 1)
Female (Y = 0) 977/2000 23/2000
Male (Y = 1) 948/2000 52/2000

Are CHD & Gender independent random variables?


Solution:
For independence p X,Y ( x, y) = p X ( x ).pY (y) @ x& y

165
Joint Distributions 7.3 Joint Distributions: Continuous Case

CHD Absent CHD Present P (Y )


( X = 0) ( X = 1)
Female 1925/2000 ˆ 1/2 ‰ 977/2000 75/2000 ˆ 1/2 ‰ 23/2000 1/2
(Y = 0 )
Male 1925/2000 ˆ 1/2 ‰ 948/2000 75/2000 ˆ 1/2 ‰ 52/2000 1/2
(Y = 1 )
P( X ) 1925/2000 75/2000

Table 7.3

The condition for independence, i.e., p_{X,Y}(x, y) = p_X(x) · p_Y(y) for all x & y, does not hold true (see Table 7.3). So X & Y are not independent: there is an association between Gender & CHD.
Example 7.2.8
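A short sketch of the same independence check done numerically (numpy assumed); it compares the joint pmf of Example 7.2.8 with the product of its marginals:

```python
import numpy as np

# Joint pmf of (X = CHD, Y = Gender) from Example 7.2.8 (rows: Y = 0, 1; columns: X = 0, 1).
joint = np.array([[977, 23],
                  [948, 52]]) / 2000

p_x = joint.sum(axis=0)             # marginal of X (CHD)
p_y = joint.sum(axis=1)             # marginal of Y (Gender)
product = np.outer(p_y, p_x)        # what the joint pmf would be under independence

print(joint)
print(product)
print(np.allclose(joint, product))  # False: X and Y are not independent
```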

7.3 IJ Joint Distributions: Continuous Case

Definition 7.3.1 (Joint Probability Density Function (pd f )).

When X and Y are continuous random variables, the joint density function f ( x, y)
describes the likelihood that the pair ( X, Y ) belongs to the neighborhood of the
point ( x, y). The joint pd f of X, Y „ U (0, 1) is visualized as a surface lying above
the xy plane (see Figure 7.3.1).
Properties:

1. The joint density is always nonnegative, i.e.,

f X,Y ( x, y) ě 0

2.
ż8 ż8
f ( x, y)dx.dy = 1
´8 ´8

3. The joint density can be integrated to get probabilities, i.e., if A and B are sets of real numbers, then

P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy

166
Joint Distributions 7.3 Joint Distributions: Continuous Case

Figure 7.3.1.

Surface plot of f(x, y) for X, Y ~ U(0, 1): Joint Probability Density Function

Definition 7.3.2 (Marginal Probability Density Function).

If X and Y are continuous random variables with joint probability density function
f XY ( x, y), then the marginal density functions for X can be retrieved by integrating
over all y values:

f_X(x) = ∫_y f(x, y) dy.

Similarly, the marginal density function for Y can be retrieved by integrating over all x values:

f_Y(y) = ∫_x f(x, y) dx.

Example 7.3.3

167
Joint Distributions 7.3 Joint Distributions: Continuous Case

Consider the joint pd f for X and Y:

f(x, y) = (12/7)(x² + xy) for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1
        = 0 elsewhere

1. Find the marginal pd f of X & Y

2. Find P( X ą Y ).

Solution:

f_X(x) = ∫₀¹ (12/7)(x² + xy) dy
       = (12/7) [ x²y + xy²/2 ]₀¹
       = (12/7)x² + (6/7)x

f_Y(y) = ∫₀¹ (12/7)(x² + xy) dx
       = (12/7) [ x³/3 + x²y/2 ]₀¹
       = (1/7)(4 + 6y)

P(X > Y) = ∫₀¹ ∫_y¹ (12/7)(x² + xy) dx dy
         = ∫₀¹ (12/7) [ x³/3 + x²y/2 ]_y¹ dy
         = ∫₀¹ ( 12/21 + (12/14)y − (12/21)y³ − (12/14)y³ ) dy
         = 9/14

Example 7.3.3
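The normalization of the joint pdf and P(X > Y) = 9/14 from Example 7.3.3 can be double-checked by numerical integration; scipy's dblquad is an assumed tool here, and note that it integrates over the first argument of the integrand (the inner variable):

```python
from scipy.integrate import dblquad

# Joint pdf from Example 7.3.3; dblquad expects f(y, x) with y as the inner variable.
f = lambda y, x: (12 / 7) * (x**2 + x * y)

# Total probability: y from 0 to 1 (inner), x from 0 to 1 (outer).
total, _ = dblquad(f, 0, 1, 0, 1)

# P(X > Y): for each x in (0, 1), y runs from 0 to x.
p_x_gt_y, _ = dblquad(f, 0, 1, 0, lambda x: x)

print(total)       # approximately 1.0
print(p_x_gt_y)    # approximately 9/14 = 0.642857...
```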

168
Joint Distributions 7.3 Joint Distributions: Continuous Case

7.3.1 §§ Joint Cumulative Distribution Function (cd f )

Definition 7.3.4 (Joint Cumulative Distribution Function (cd f ): Continuous Case).

Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y), then
the joint cumulative distribution function F ( a, b) is defined as:
P(X ≤ a and Y ≤ b) = F(a, b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

Properties:

1. 0 ď FX,Y ( a, b) ď 1 @ a, b

2. lim_{a,b→−∞} F_{X,Y}(a, b) = 0

3. lim_{a,b→∞} F_{X,Y}(a, b) = 1

4. If a < b and c < d, then F_{X,Y}(a, c) ≤ F_{X,Y}(b, d)

We can obtain the joint pdf from the joint cdf by

f(x, y) = ∂² F(x, y) / (∂x ∂y)

wherever the derivative is defined.

Figure 7.3.2 shows the joint cdf of X, Y ~ U(0, 1). The probability F(x, y) corresponds to the volume under f(x, y) = 1, which is shaded. That F(x, y) is an increasing function of x and y is also evident from the shaded part.

169
Joint Distributions 7.3 Joint Distributions: Continuous Case

Figure 7.3.2.

Surface plot of F(x, y) for X, Y ~ U(0, 1): Joint Cumulative Distribution Function

Example 7.3.5 (Joint pd f from Joint cd f )


The joint cd f of X and Y
F(x, y) = (1/16) xy(x + y); for 0 ≤ x ≤ 2 & 0 ≤ y ≤ 2
1. Find the joint pd f f ( x, y)
2. Find the marginal pd f ’s of X & Y respectively.
Solution:
f(x, y) = ∂²/(∂x ∂y) [ (1/16) xy(x + y) ]; for 0 ≤ x ≤ 2 & 0 ≤ y ≤ 2
        = (1/16) ∂/∂x ( x² + 2xy )
        = (1/16)(2x + 2y)
        = (1/8)(x + y)

170
Joint Distributions 7.3 Joint Distributions: Continuous Case

f(x, y) = (1/8)(x + y) for 0 ≤ x ≤ 2; 0 ≤ y ≤ 2
        = 0 elsewhere

f_X(x) = ∫₀² (1/8)(x + y) dy
       = (1/4)(x + 1)

f_Y(y) = ∫₀² (1/8)(x + y) dx
       = (1/4)(y + 1)

The marginal pdf of X is:

f_X(x) = (1/4)(x + 1) for 0 ≤ x ≤ 2
       = 0 elsewhere

The marginal pdf of Y is:

f_Y(y) = (1/4)(y + 1) for 0 ≤ y ≤ 2
       = 0 elsewhere

Example 7.3.5

Example 7.3.6
Consider the joint pd f for X and Y:
f(x, y) = (12/7)(x² + xy) for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1
        = 0 elsewhere
Find the joint cd f of X & Y
Solution:
Using dummy variables s and t for the integration,

F_{X,Y}(x, y) = ∫₀^y ∫₀^x (12/7)(s² + st) ds dt
             = ∫₀^y (12/7) [ s³/3 + s²t/2 ]₀^x dt
             = (12/7) [ x³t/3 + x²t²/4 ]₀^y
             = (1/7) x²y(4x + 3y)

171
Joint Distributions 7.3 Joint Distributions: Continuous Case

Example 7.3.6

7.3.2 §§ Independent Random Variables

Definition 7.3.7 (Independent Random Variables).

Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y) and
marginal densities f X ( x ), f Y (y). We say they are independent if

1. f X,Y ( x, y) = f X ( x ). f Y (y) @ x & y

(a) f X|Y ( x|y) = f X ( x )


(b) f Y|X (y|x ) = f Y (y)

2. FX,Y ( x, y) = FX ( x ).FY (y) @ x & y

If we know the joint density of X and Y , then we can use the definition to see if
they are independent.

Example 7.3.8
For the Example 7.3.3

f(x, y) = (12/7)(x² + xy) for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1
        = 0 elsewhere

Are X & Y independent?


Solution:

f_X(x) = (12/7)x² + (6/7)x

f_Y(y) = (1/7)(4 + 6y)

Since f(x, y) ≠ f_X(x) × f_Y(y), X & Y are not independent.


Example 7.3.8

172
Joint Distributions 7.4 Convolution

7.4 IJ Convolution

IJ Distribution of Sums of Independent Random Variables

Example 7.4.1
Let S be the sum that appears on the roll of 2 dice, i.e., S = X1 + X2 ;

P(S = 2) = 1/36
P(S = 3) = P( X1 = 1, X2 = 2) + P( X1 = 2, X2 = 1)
= P ( X1 = 1 ) P ( X2 = 2 ) + P ( X1 = 2 ) P ( X2 = 1 )
= 2/36

Example 7.4.1

Figure 7.4.1.

173
Joint Distributions 7.4 Convolution

Definition 7.4.2 (Convolution).

1. We have independent random variables X and Y with known distributions.

2. It is often important to be able to calculate the distribution of Z = X + Y


from the distributions of X and Y when X and Y are independent.

In probability theory, convolution is a mathematical operation that allows the


derivation of the distribution of a sum of two random variables from the distri-
butions of the two summands.

7.4.1 §§ Convolution: Discrete Case

Definition 7.4.3 (Convolution: Discrete Case).

Suppose that X and Y are independent, integer-valued random variables having probability mass functions P_X and P_Y. Then Z = X + Y is also an integer-valued random variable, and its probability mass function can be found using the Law of Total Probability and independence:

P_{X+Y}(z) = P(X + Y = z)
           = ∑_k P(X = k, Y = z − k)        (Law of Total Probability)
           = ∑_k P(X = k) · P(Y = z − k)    (Independence)
           = ∑_k P_X(k) P_Y(z − k)

174
Joint Distributions 7.4 Convolution

Figure 7.4.2.

Rolling of 3 Dice

Example 7.4.4
Let X₁ and X₂ be the outcomes of two dice rolls, let S₂ = X₁ + X₂ be the sum of these outcomes, and let S₃ = X₁ + X₂ + X₃ be the sum for three dice rolls. The distribution for S₃ is then the convolution of the distribution for S₂ with the distribution for X₃. Find P(S₃ = 7).
Solution:

P(S₃ = 7) = P(S₂ = 6) P(X₃ = 1)
          + P(S₂ = 5) P(X₃ = 2)
          + P(S₂ = 4) P(X₃ = 3)
          + P(S₂ = 3) P(X₃ = 4)
          + P(S₂ = 2) P(X₃ = 5)
          = 5/36 × 1/6 + 4/36 × 1/6 + 3/36 × 1/6 + 2/36 × 1/6 + 1/36 × 1/6
          = 15/216

Example 7.4.4
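A small sketch of the discrete convolution (numpy assumed): convolving the pmf of two dice with the pmf of a single die reproduces P(S₃ = 7) = 15/216.

```python
import numpy as np

die = np.full(6, 1 / 6)        # pmf of one die over faces 1..6

s2 = np.convolve(die, die)     # pmf of S2 = X1 + X2 over sums 2..12
s3 = np.convolve(s2, die)      # pmf of S3 = X1 + X2 + X3 over sums 3..18

# s3[k] is P(S3 = k + 3), so P(S3 = 7) is s3[4].
print(s3[7 - 3], 15 / 216)     # both print 0.0694444...
```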

175
Joint Distributions 7.4 Convolution

Example 7.4.5

1. Let X_i ~ Bernoulli(p), i = 1, . . . , n, be independent. Then ∑_{i=1}^{n} X_i ~ Bin(n, p).

2. Let X ~ Bin(n₁, p) and Y ~ Bin(n₂, p) be independent. Then X + Y ~ Bin(n₁ + n₂, p).

3. Let X ~ Poi(λ) and Y ~ Poi(µ) be independent. Then X + Y ~ Poi(λ + µ).

4. Let X_i, i = 1, . . . , r, be independent random variables having the geometric distribution with parameter p, i.e., P[X_i = k] = p(1 − p)^{k−1} for k = 1, 2, . . . (k: number of trials until the 1st success). Then ∑_{i=1}^{r} X_i is Negative Binomial with parameters p and the fixed number of successes r.

Example 7.4.5

7.4.2 §§ Convolution: Continuous Case


Definition 7.4.6 (Convolution: Continuous Case).

Suppose that X and Y are independent, continuous random variables having probability density functions f_X and f_Y. Then the density of their sum is the convolution of their densities, i.e., the sum Z = X + Y is a continuous random variable with density

f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
           = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx

Example 7.4.7 (Sums of Independent Uniform Random Variables)


Let X & Y be independent uniform random variables, each on [0, 1]:

f_X(x) = f_Y(y) = 1 if 0 ≤ x, y ≤ 1, and 0 otherwise.

Let Z = X + Y. The minimum possible value of Z is zero when x = 0 & y = 0, mid-


interval value of z = 1 when x + y = 1, and the maximum possible value is two, when

176
Joint Distributions 7.4 Convolution

x = 1 & y = 1. Thus the sum Z is defined only on the interval (0, 2) since the probability
of z ă 0 or z ą 2 is zero, that is,

P(Z < 0) = 0 & P(Z > 2) = 0.

f_Z(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx

Finding ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx amounts to finding the length of the set {x ∈ (0, 1) : z − x ∈ (0, 1)}. As f_X(x) = 1 if 0 ≤ x ≤ 1 and 0 otherwise,

f_Z(z) = ∫₀¹ f_Y(z − x) dx

Plotting the region defined by the limits 0 ă x ă 1; & 0 ă z ´ x ă 1 in the ( x, z) plane, we


get the integrand as shown in Figure 7.4.3. The limits of integration on x depend on the
value of z: The integrand is 1 when 0 ă x ă 1 and 0 ă z ´ x ă 1 Ñ z ´ 1 ă x ă z, and zero
otherwise. There are three cases (as in Figure 7.4.3);

1. When 0 < z < 1, the limits run from x = 0 to x = z, so f_Z(z) = ∫₀^z 1 dx = z.

2. When 1 < z < 2, the limits run from x = z − 1 to x = 1, so f_Z(z) = ∫_{z−1}^{1} 1 dx = 2 − z.

3. When z < 0 or z > 2, the integrand is zero, so f_Z(z) = 0.

f_Z(z) = z       if 0 ≤ z ≤ 1,
       = 2 − z   if 1 < z ≤ 2,
       = 0       otherwise

(See the pdf of Z in Figure 7.4.4).


Example 7.4.7
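A quick simulation sketch (numpy assumed; sample and bin counts are arbitrary choices) showing that the sum of two independent U(0, 1) variables follows the triangular density derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(0, 1, 100_000) + rng.uniform(0, 1, 100_000)

# Compare the empirical density with f_Z(z) = z on [0, 1] and 2 - z on (1, 2].
hist, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
mid = (edges[:-1] + edges[1:]) / 2
triang = np.where(mid <= 1, mid, 2 - mid)

print(np.max(np.abs(hist - triang)))   # small; the histogram follows the triangular shape
```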

177
Joint Distributions 7.4 Convolution

Figure 7.4.3.

178
Joint Distributions 7.4 Convolution

Figure 7.4.4.

Plot of f(z) on [0, 2]: Convolution of two independent Uniform Densities: Triangular Distribution

Example 7.4.8 (Sums of Independent Normal Distribution)


If X₁, X₂, . . . , X_n are independent Normal random variables with expected values µ₁, µ₂, . . . , µ_n and variances σ₁², σ₂², . . . , σ_n², respectively, then ∑_{i=1}^{n} X_i = X₁ + X₂ + · · · + X_n is also a Normal random variable with expected value ∑_{i=1}^{n} µ_i = µ₁ + µ₂ + · · · + µ_n and variance ∑_{i=1}^{n} σ_i² = σ₁² + σ₂² + · · · + σ_n², i.e.,

∑_{i=1}^{n} X_i ~ N( ∑_{i=1}^{n} µ_i , ∑_{i=1}^{n} σ_i² )

Example 7.4.8

Example 7.4.9 (Sums of Independent Exponential Distribution)


Let X and Y be independent exponential random variables, each having parameter λ. Let

179
Joint Distributions 7.5 Home Work

Z = X + Y. The density of the sum Z is

f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx
           = ∫₀^z λe^{−λx} · λe^{−λ(z−x)} dx
           = λ² z e^{−λz}

This density is called the Gamma(2, λ) density. The convolution of n = 2 i.i.d.² Exponential densities results in the Gamma(2, λ) density.
Example 7.4.9

7.5 IJ Home Work


1. An association study of color blindness ( X ) with 2 possible outcomes: Yes=1 for color
blind & No=0 for not color blinded with gender (Y ) coded as 1 for females & 0 for
males. The joint frequency distribution of X & Y is given in Table:

Color blinded/Gender Male=0 Female=1 Total


Yes=1 16 2 18
No=0 240 254 494
Total 256 256 512

(a) Find FX,Y (1, 0)


(b) Find FX,Y (0, 1)

2. Let ( X, Y ) have the joint pmf given in Table below:

        X = 1   X = 2   X = 3
Y = 1   0       0.3     0.1
Y = 2   0       0.2     0.2
Y = 3   0.1     0.1     0

(a) Find P( X ) & P(Y )


(b) Find P( X = Y ); & P( X ą Y ),
(c) Find P(Y = 2|X = 2)

3. Consider the following joint pm f , f (0, 0) = 1/12; f (1, 0) = 5/12; f (0, 1) = f (1, 1) =
3/12; f ( x, y) = 0 for all other values. Find the marginal distributions of X & Y
respectively.

2 independent identically distributed

180
Joint Distributions 7.5 Home Work

4. Suppose that a radioactive particle is randomly located in a square with sides of unit
length. Let X and Y denote the coordinates of the particle’s location. A reasonable
model for the relative frequency histogram for X and Y is the bivariate analogue of the
univariate uniform density function:

f ( x, y) = 1 for 0 ď X ď 1; 0 ď Y ď 1
= 0 elsewhere

a. Verify that f ( x; y) is a valid pd f .


b. Find F (0.2, 0.4).
c. Find P(0.1 ď X ď 0.3; 0 ď Y ď 0.5)

5. Suppose that two continuous random variables X and Y have a joint probability density
function

f X,Y ( x, y) = A( x ´ 3)y for ´ 2 ď X ď 3 & 4 ď Y ď 6


= 0 otherwise

a. Find A.
b. Construct the marginal probability density functions f X ( x ) and f Y (y).
c. Are the random variables X and Y independent?

§§ Answers
1. 256/512; 494/512

2. p X (1) = 0.1, p X (2) = 0.6; p X (3) = 0.3


pY (1) = 0.4, pY (2) = 0.4; & pY (3) = 0.2
P( X = Y ) = 0.2; & P( X ą Y ) = 0.6, P( X = 2|Y = 2) = 1/2

3. f X (0) = 1/3, f X (1) = 2/3


f Y (0) = 1/2, f Y (1) = 1/2.

4. 0.08; 0.10
5. A = −1/125; f_X(x) = 2(3 − x)/25 for −2 ≤ x ≤ 3; f_Y(y) = y/10 for 4 ≤ y ≤ 6. The random variables X and Y are independent.

181
Joint Distributions 7.5 Home Work

182
Chapter 8

Properties of Expectation

AS YOU READ . . .

1. What is expectation for jointly distributed random variables?

2. What are the conditional distributions & the respective expectations?

3. What are the Covariance & Correlation measures?

8.1 IJ Jointly Distributed Variables: Expectation for Discrete Case

Definition 8.1.1 (Expectation: Discrete Case).

If X and Y are jointly distributed discrete random variables, then the marginal pmf of X, P_X(x), is defined as:

P_X(x) = ∑_y P_{XY}(x, y)

The mean of X is still given by:

E[X] = ∑_x x · P_X(x)
     = ∑_x x · ∑_y P_{XY}(x, y)

183
Properties of Expectation
8.2 Jointly Distributed Variables: Expectation for Continuous Case

8.2 IJ Jointly Distributed Variables: Expectation for Continuous Case

Definition 8.2.1 (Expectation: Continuous Case).

If X and Y are jointly continuous random variables, then the marginal pdf of X is defined as:

f_X(x) = ∫_y f_{XY}(x, y) dy

The mean of X is given by:

E[X] = ∫_{−∞}^{∞} x · f_X(x) dx
     = ∫_{−∞}^{∞} x · ∫_{−∞}^{∞} f_{XY}(x, y) dy dx

§§ Expectation: Properties

Since E[X] is a weighted average of the possible values of X, it follows that if X always lies between a and b, then so must its expected value. That is, if

P(a ≤ X ≤ b) = 1,

then a ≤ E[X] ≤ b.

184
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable

8.3 IJ Some Function of Jointly Distributed Random Variable


Definition 8.3.1 (Expectation: Some Function of Random Variables).

Suppose that X and Y are random variables and g(x, y) is a function of two variables.

1. For discrete X and Y

E[g(X, Y)] = ∑_y ∑_x g(x, y) P(x, y)

2. For continuous X and Y

E[g(X, Y)] = ∫_y ∫_x g(x, y) f(x, y) dx dy

A fundamental property of the expectation operator is that it is linear. If X and Y are jointly distributed random variables and a, b are real numbers, then

E[aX + bY] = aE[X] + bE[Y]

Example 8.3.2 (Association between Gender & Color-blindness)


The joint pmf of Color-blindness is coded as X = 0 for No, X = 1 for Yes & Gender coded
as Y = 0 for Males, Y = 1 for Females is given below:

X=Color blinded/Y=Gender Male=0 Female=1 Total


No=0 240/512 254/512 494/512
Yes=1 16/512 2/512 18/512
Total 256/512 256/512 1
 
Let g( X, Y ) = XY. Find E g( X, Y )
Solution:

E[g(X, Y)] = ∑_y ∑_x g(x, y) P(x, y)
           = (0 × 0) × 240/512 + (0 × 1) × 254/512
           + (1 × 0) × 16/512 + (1 × 1) × 2/512
           = 0 + 0 + 0 + 2/512
           = 2/512

Example 8.3.2

185
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable


Example 8.3.3 (Expectation & Variance of X̄)
Let X₁, . . . , X_n be iid¹ random variables having distribution function F, expected value µ and variance σ². Let X̄ = ∑_{i=1}^{n} X_i / n.

E(X̄) = E( ∑_{i=1}^{n} X_i / n )
     = (1/n) E( ∑_{i=1}^{n} X_i )
     = (1/n) ∑_{i=1}^{n} E(X_i)
     = (1/n) · nµ
     = µ

Var(X̄) = Var( ∑_{i=1}^{n} X_i / n )
       = (1/n)² · Var( ∑_{i=1}^{n} X_i )
       = (1/n)² · ∑_{i=1}^{n} Var(X_i)
       = (1/n)² · nσ²
       = σ²/n

The same results were also displayed in the Central Limit Theorem (see Figure 6.2.3 & Figure 6.2.5). The reason for the much smaller variability is now mathematically evident: the variance of the distribution of the sample mean x̄ is scaled down by a factor of n, the sample size.
Example 8.3.3
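A simulation sketch illustrating E(X̄) = µ and Var(X̄) = σ²/n (numpy assumed; the normal distribution and the values µ = 10, σ = 3, n = 25 are arbitrary illustration choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 10.0, 3.0, 25          # arbitrary illustration values

# 50,000 samples of size n; one mean per sample.
xbar = rng.normal(mu, sigma, size=(50_000, n)).mean(axis=1)

print(xbar.mean())                    # close to mu = 10
print(xbar.var())                     # close to sigma^2 / n = 9 / 25 = 0.36
```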

1 independent identically distributed

186
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable

8.3.1 §§ Expectation of Sums of Jointly Distributed Random Variables

Definition 8.3.4 (Expectation of Sums of Jointly Distributed Random Variables).

Suppose that X and Y are random variables with joint mass function P_{XY} and marginal probability mass functions P_X and P_Y. Then E[X + Y] is given by

E[X + Y] = ∑_x ∑_y (x + y) P(x, y)
         = E(X) + E(Y)

Example 8.3.5 (Association between Gender & Color-blindness Cont’d)


For Example 8.3.2, find E[X + Y] and show that E[X + Y] = E(X) + E(Y).

Solution:

Let g(X, Y) = X + Y

187
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable

E[g(X, Y)] = ∑_y ∑_x g(x, y) P(x, y)
           = ∑_y ∑_x (x + y) · P(x, y)
           = (0 + 0) × 240/512 + (0 + 1) × 254/512
           + (1 + 0) × 16/512 + (1 + 1) × 2/512
           = 0 + 254/512 + 16/512 + 2(2/512)
           = 274/512

E[X] = ∑ x P(x, y)
     = 0 × 240/512 + 0 × 254/512 + 1 × 16/512 + 1 × 2/512
     = 18/512

E[Y] = ∑ y P(x, y)
     = 0 × 240/512 + 1 × 254/512 + 0 × 16/512 + 1 × 2/512
     = 256/512

E[X + Y] = 274/512 = 18/512 + 256/512 = E(X) + E(Y)

Thus E[X + Y] = E(X) + E(Y).

Another result is also evident: since X & Y are indicator random variables, their expectations equal their respective marginal probabilities at X = 1 and at Y = 1.
Example 8.3.5

8.3.2 §§ Expectation of Sums of Functions of Jointly Distributed Random Variables

Definition 8.3.6 (Expectation of Sums of Functions of Random Variables).

Suppose that X and Y are random variables and g( x, y) and h( x, y) are some func-
tions of the two variables, then:
     
E g( X, Y ) ˘ h( X, Y ) = E g( X, Y ) ˘ E h( X, Y )

188
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable

Example 8.3.7 (Association between Gender & Color-blindness Cont’d)


For Example 8.3.2, let g(X, Y) = X + Y and h(X, Y) = X − Y. Find E[X − Y] and show that

E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)]

Solution:
Let g(X, Y) = X + Y and h(X, Y) = X − Y.
From Example 8.3.5, E[g(X, Y)] = 274/512.

E[h(X, Y)] = ∑_y ∑_x h(x, y) P(x, y)
           = ∑_y ∑_x (x − y) · P(x, y)
           = (0 − 0) × 240/512 + (0 − 1) × 254/512
           + (1 − 0) × 16/512 + (1 − 1) × 2/512
           = 0 − 254/512 + 16/512 + 0
           = −238/512

E[g(X, Y) + h(X, Y)] = E[(X + Y) + (X − Y)]
                     = E[2X]
                     = 2E[X]
                     = 2 × 18/512
                     = 18/256

E[g(X, Y)] + E[h(X, Y)] = E[X + Y] + E[X − Y]
                        = 274/512 + (−238/512)
                        = 18/256

∴ E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)]

Example 8.3.7

189
Properties of Expectation 8.4 Conditional Distribution

Definition 8.3.8 (Independent Random Variables).

Two discrete random variables are independent if:

PX,Y ( x, y) = PX ( x ) ¨ PY (y) @ x& y

If two random variables are independent, then the expectation of the product factors
into a product of expectations, i.e.,
     
E g ( X ) h (Y ) = E g ( X ) ¨ E h (Y )

In particular,
E( XY ) = E( X ) ¨ E(Y )

8.4 IJ Conditional Distribution

8.4.1 §§ Conditional Distributions: Discrete Case

Definition 8.4.1 (Discrete Case: Conditional Distribution).

The conditional probability mass function of a discrete random variable X, given


the value of the other random variable Y, for all values of y such that pY (y) ą 0 is

P_{X|Y}(x|y) = P(X = x | Y = y)
             = p_{X,Y}(x, y) / p_Y(y)

§§ Discrete Case: Conditional Distribution Properties

A conditional probability distribution PX|y ( x ) has the following properties: For discrete
random variables ( X, Y )

1. PX|y ( x ) ě 0

ÿ
2. PX|y ( x ) = 1
x

3. PX|y ( x ) = P( X = x|Y = y)

190
Properties of Expectation 8.4 Conditional Distribution

8.4.2 §§ Conditional Expectation: Discrete Case

Definition 8.4.2 (Discrete Case: Conditional Expectation).

The expectation of a function of two random variables conditioned on one of them


taking a certain value can be computed using the conditional pmf or pdf.
The conditional mean of Y given X = x, denoted as E(Y|X ) is:
ÿ
E(Y|x ) = yP(Y|X = x )
y

Example 8.4.3 (Association between Gender & Color-blindness Cont’d)


For the Example 8.3.2

(a). Find the conditional distribution of X given Y = 0 and X given Y = 1.

(b). Find the E( X|Y = 0)

(c). Find the E( X|Y = 1)

Solution:

(a).

P(X|Y = 0) = P(X, Y = 0) / P(Y = 0)

P(X = 0|Y = 0) = (240/512) / (256/512) = 15/16

P(X = 1|Y = 0) = (16/512) / (256/512) = 1/16

The conditional distribution of X|Y = 0 is

P(X = 0|Y = 0) = 15/16;  P(X = 1|Y = 0) = 1/16

The conditional probability distribution P( X|Y = 0) is a probability mass function as


it satisfies the properties of pm f .

191
Properties of Expectation 8.4 Conditional Distribution

P(X|Y = 1) = P(X, Y = 1) / P(Y = 1)

P(X = 0|Y = 1) = (254/512) / (256/512) = 127/128

P(X = 1|Y = 1) = (2/512) / (256/512) = 1/128

Similarly, the conditional probability distribution P( X|Y = 1) is also a probability


mass function.

(b).

  ÿ
E X|Y = 0 = xP( X|Y = 0)
x
= 0 ˆ 15/16 + 1 ˆ 1/16
= 1/16

(c).

  ÿ
E X|Y = 1 = xP( X|Y = 1)
x
= 0 ˆ 127/128 + 1 ˆ 1/128
= 1/128

E( X|Y ) is a random variable that is a function of Y. So the pm f of E( X|Y ) is:

E( X|Y ) 1/16 1/128 Sum


p (Y ) 1/2 1/2 1

Example 8.4.3

192
Properties of Expectation 8.4 Conditional Distribution

8.4.3 §§ Conditional Distribution: Continuous Case


Definition 8.4.4 (Continuous Case: Conditional Distribution).

The conditional probability density function of a continuous random variable X,


given the value of the other random variable Y = y for all values of y such that
f (Y ) ą 0, is

f_{X|Y}(x|y) = f(X = x | Y = y)
             = f_{X,Y}(x, y) / f_Y(y)

§§ Continuous Case: Conditional Distribution Properties


A conditional probability distribution f X|y ( x ) has the following properties: For continuous
random variables ( X, Y )

1. f X|y ( x ) ě 0
ż
2. f X|y ( x|y)dx = 1
x

3. f X|y ( x ) = f ( X = x|Y = y)

8.4.4 §§ Conditional Expectation: Continuous Case


Definition 8.4.5 (Continuous Case: Conditional Expectation).

The conditional mean of Y given X = x, denoted as E(Y|X ), is

ż
E(Y|X = x ) = y f (Y|X = x )dy

The conditional mean reduces a distribution to a single summary measure.

Example 8.4.6
The joint cd f of X and Y

F(x, y) = (1/16) xy(x + y); for 0 ≤ x ≤ 2 & 0 ≤ y ≤ 2
Find the conditional expectation E(Y|X ).

193
Properties of Expectation 8.4 Conditional Distribution

Solution:
The joint pdf, marginal pdf of X & Y were computed in Example 7.3.5.
f(x, y) = (1/8)(x + y); for 0 ≤ x ≤ 2; 0 ≤ y ≤ 2
        = 0 elsewhere

f_X(x) = (1/4)(x + 1); for 0 ≤ x ≤ 2
       = 0 elsewhere

f_Y(y) = (1/4)(y + 1); for 0 ≤ y ≤ 2
       = 0 elsewhere

f(Y|X) = [ (1/8)(x + y) ] / [ (1/4)(x + 1) ]
       = (1/2) (x + y)/(x + 1)

E(Y|X) = ∫₀² y · (1/2) (x + y)/(x + 1) dy
       = 1/(2(x + 1)) · [ xy²/2 + y³/3 ]₀²
       = (x + 4/3)/(x + 1)
Example 8.4.6

8.4.5 §§ Properties of Conditional Expectation


Definition 8.4.7 (Conditional Expectation of Some Function of Random Variable).

Suppose that X and Y are random variables and g(x, y) is a function of two variables.

• For discrete X & Y

E[g(X)|Y = y] = ∑_x g(x) P_{X|Y}(x|y)

• For continuous X & Y

E[g(X)|Y = y] = ∫_x g(x) f_{X|Y}(x|y) dx

194
Properties of Expectation 8.4 Conditional Distribution

Definition 8.4.8 (Properties of Conditional Expectation).

1. E[ ∑_i X_i | Y = y ] = ∑_i E[ X_i | Y = y ]

2. Law of Iterated Expectations (Total Expectation Theorem): Iterated expectation is a useful tool to compute the expected value of a quantity that depends on several random variables. As we have seen previously in Example 8.4.3, E(X|Y) is a random variable that is a function of Y, so its expectation can be calculated as E[E[X|Y]]. The idea is that the expected value can be obtained as the expectation of the conditional expectation.

E[E[X|Y]] = ∑_y E[X|Y = y] · P(Y = y)     (Discrete)
          = ∫_y E[X|Y = y] · f_Y(y) dy     (Continuous)
          = E[X]

3. If X and Y are independent random variables, then

E[X|Y = y] = E[X]
E[Y|X = x] = E[Y]

195
Properties of Expectation 8.5 Covariance

Example 8.4.9 (Association between Gender & Color-blindness Cont’d)


 
Find E[E[X|Y]] for this dataset.
Solution:
The pm f of E[ X|Y ] was evaluated in Example 8.4.3.
E( X|Y ) 1/16 1/128 Sum
p (Y ) 1/2 1/2 1
As E[X|Y] is a random variable, its expectation can be computed using the law of iterated expectations as:

E[E[X|Y]] = ∑_y E[X|Y] P(Y = y)
          = 1/16 × 1/2 + 1/128 × 1/2
          = 9/256
          = E(X)

Example 8.4.9

Example 8.4.10 (Wage Distribution)


Suppose in a firm of 100 employees, 40 have a University degree. Let Y be the average income.
The average income of University degree employees is twice the average income of the non-
University degree employees. Suppose that the average income of the non-University degree
employees is Rs.100,000, what is the expected monthly income Y? Identify the expectation
property/law used if any.
Solution:
Let X be degree status, with X = 0 for non-University degree employees and X = 1 for University degree employees. We are given that P(X = 1) = 0.4 and P(X = 0) = 0.6.
We need the conditional expectations of income given degree status.

E(Y) = E[Y|X = 0] P(X = 0) + E[Y|X = 1] P(X = 1)
     = 100000 × 0.6 + 200000 × 0.4
     = 60,000 + 80,000
     = 140,000
Using the law of iterated expectations.
Example 8.4.10
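A minimal sketch of the same iterated-expectation calculation in code, using the values from Example 8.4.10 (the variable names are illustrative):

```python
# Law of iterated expectations for Example 8.4.10.
p_x = {0: 0.6, 1: 0.4}                  # P(X = 0), P(X = 1): degree status
e_y_given_x = {0: 100_000, 1: 200_000}  # E[Y | X = x]: average income per group

e_y = sum(e_y_given_x[x] * p_x[x] for x in p_x)
print(e_y)                              # 140000.0
```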

8.5 IJ Covariance
The covariance matrix of a random vector captures the interaction between the components of the vector. It contains the variance of each variable on the diagonal and the covariances between the different variables in the off-diagonal entries.

196
Properties of Expectation 8.5 Covariance

Example 8.5.1 (Iris Data Set: Covariance Matrix)


The covariance matrix between Petal Length & Petal Width in Fisher's Iris dataset is given below:

Σ = [ σ_x²   σ_xy ]   =   [ 0.68   0.29 ]
    [ σ_xy   σ_y² ]       [ 0.29   0.18 ]

Figure 8.5.1 shows the scatterplot matrix of Petal Length, Petal Width, Sepal Length & Sepal Width in Fisher's Iris dataset.

Figure 8.5.1.

Pairwise scatterplots of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width: Scatterplot Matrix for Iris Dataset

Example 8.5.1

Example 8.5.2 (Covariance: Shape of the Data)

197
Properties of Expectation 8.5 Covariance

The covariance matrix defines the shape of the data. Diagonal spread is captured by the
covariance, while axis-aligned spread is captured by the variance, (see Figure 8.5.2).

Figure 8.5.2.

Link Between Shape of the Data & the Covariance Matrix:

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

Example 8.5.2

198
Properties of Expectation 8.5 Covariance

Definition 8.5.3 (Covariance).

The covariance is a measure of joint variability between two random variables X and Y and, denoted σ_XY, is:

σ_XY = E[ (X − E(X)) · (Y − E(Y)) ]
     = E[XY] − E[X] · E[Y]

If X and Y are independent, then

σ_XY = E[XY] − E[X] · E[Y]
     = E[X] · E[Y] − E[X] · E[Y]
     = 0

However, the converse is not true.

Example 8.5.4
Let X and Y be two independent Bernoulli random variables with parameter p = 1/2. Con-
sider the random variables
U = X+Y
V = X´Y

Note that
PU (0) = P( X = 0; Y = 0)
= 1/4
PV (0) = P( X = 1; Y = 1) + P( X = 0; Y = 0)
= 1/2
PU,V (0, 0) = P( X = 0; Y = 0)
= 1/4
PU,V (0, 0) ‰ PU (0) PV (0)
so U and V are not independent. However, they are uncorrelated, as

σ_UV = E[UV] − E[U] · E[V]
     = E[(X + Y)(X − Y)] − E[X + Y] · E[X − Y]
     = E(X²) − E(Y²) − E(X)² + E(Y)²
     = 0
The final equality holds because X and Y have the same distribution.
Example 8.5.4
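A simulation sketch (numpy assumed) of Example 8.5.4: U = X + Y and V = X − Y come out uncorrelated, yet knowing U = 0 pins down V, showing they are not independent.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(0, 2, 1_000_000)     # Bernoulli(1/2)
y = rng.integers(0, 2, 1_000_000)     # independent Bernoulli(1/2)
u, v = x + y, x - y

print(np.cov(u, v)[0, 1])             # near 0: U and V are uncorrelated
# Dependence: P(V = 0 | U = 0) = 1, while P(V = 0) = 1/2.
print(np.mean(v[u == 0] == 0), np.mean(v == 0))
```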

199
Properties of Expectation 8.6 Correlation

8.5.1 §§ Properties

Definition 8.5.5 (Covariance: Properties).

Cov(X, Y) = Cov(Y, X)   (Symmetry)
Cov(X, X) = Var(X) ≥ 0
Cov(aX, Y) = a Cov(X, Y)
Cov( ∑_i X_i , ∑_j Y_j ) = ∑_i ∑_j Cov(X_i, Y_j)

Example 8.5.6 (Association between Gender & Color-blindness Cont’d)


Find the covariance σXY between Color-blindness X & Gender Y in Example 8.3.2.
Solution:

σ_XY = E[ (X − E(X)) · (Y − E(Y)) ]
     = E[XY] − E[X] · E[Y]

E[XY] = 2/512 was computed in Example 8.3.2, while E[X] = 18/512 & E[Y] = 256/512 in Example 8.3.5.

σ_XY = E[XY] − E[X] · E[Y]
     = 2/512 − 18/512 × 256/512
     = −7/512

There is weak negative association between Gender & color-blindness.


Example 8.5.6

8.6 IJ Correlation

§§ Background
The covariance does not take into account the magnitude of the variances of the random
variables involved. Correlation quantifies the strength of the linear relationship between a
pair of random variables.

200
Properties of Expectation 8.6 Correlation

Figure 8.6.1.

Definition 8.6.1 (Correlation: Strength of Linear Relationship).

The correlation is a measure of mutual relationship between two random variables


X and Y, denoted by ρ( X, Y ). Correlation is a scaled version of covariance and
is obtained by normalizing the covariance using the standard deviations of both
variables. Mathematically, ρ( X, Y ) is defined as:

ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

Note: The variance is only zero when a random variable is constant. So, as long as
X and Y are not constant, then the correlation between them is well-defined.

´1 ď ρ( X, Y ) ď 1

• A value of ρ( X, Y ) « 1 indicates a high degree of linear relationship between


X and Y,

• ρ( X, Y ) « 0 means weak linear relationship

• A value of ρ( X, Y ) « ´1 indicates a high degree of negative linear relationship


between X and Y.

201
Properties of Expectation 8.6 Correlation

The scatter plots below illustrate the cases of strong and weak (positive or negative) cor-
relation. Figure 8.6.2 shows the strong positive association between duration of the eruption
and waiting time between eruptions for the Old Faithful geyser in Yellowstone National Park,
Wyoming, USA.

Figure 8.6.2.

Scatterplot of waiting time (y-axis) against eruption duration (x-axis): Scatterplot for Old Faithful Geyser Dataset; r = 0.90

202
Properties of Expectation 8.6 Correlation

Figure 8.6.3 shows the relationship between weight of patient after study period (lbs)
(Postwt) and weight of patient before study period (lbs) (Prewt) for young female anorexia
patients. There seems to be no strong linear relationship between Postwt & Prewt.

Figure 8.6.3.

Scatterplot of Postwt (y-axis) against Prewt (x-axis): Scatterplot between Prewt & Postwt in Anorexia Dataset; r = 0.33.

203
Properties of Expectation 8.6 Correlation

Figure 8.6.4 shows the relationship between Miles/gallon (mpg) and Displacement (cu.in.)
(disp) in cars dataset. There seems to be a negative relationship between mpg & disp.

Figure 8.6.4.

Scatterplot of disp (y-axis) against mpg (x-axis): Scatterplot between mpg and disp in cars dataset; r = −0.85

Example 8.6.2 (Association between Gender & Color-blindness Cont’d)


Find the correlation ρ XY between Color-blindness X & Gender Y in Example 8.3.2.

Solution:

204
Properties of Expectation 8.6 Correlation

ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

Var(X) = E[X²] − (E[X])²
Var(Y) = E[Y²] − (E[Y])²

E[X²] = ∑ x² P(x, y)
      = 0² × 240/512 + 0² × 254/512 + 1² × 16/512 + 1² × 2/512
      = 18/512
Var(X) = 18/512 − (18/512)²
       = 0.0339

E[Y²] = ∑ y² P(x, y)
      = 0² × 240/512 + 1² × 254/512 + 0² × 16/512 + 1² × 2/512
      = 256/512
      = 1/2
Var(Y) = 1/2 − (1/2)²
       = 0.25

From Example 8.5.6, Cov(X, Y) = −7/512.

ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )
        = (−7/512) / √(0.0339 × 0.25)
        = −0.1485

There is weak negative correlation.

Example 8.6.2

Example 8.6.3
Find a simplified expression for the correlation between 10X and Y + 4.
Solution:

205
Properties of Expectation 8.6 Correlation

Certain properties of covariance and variance are in effect here.

Cov(10X, Y + 4) = 10 Cov(X, Y)
Var(10X) = 100 Var(X)
Var(Y + 4) = Var(Y)

ρ(10X, Y + 4) = Cov(10X, Y + 4) / √( Var(10X) · Var(Y + 4) )
             = 10 Cov(X, Y) / ( 10 SD(X) · SD(Y) )
             = Cov(X, Y) / ( SD(X) · SD(Y) )
             = ρ(X, Y)

Example 8.6.3

Example 8.6.4
Let X be a Uniform random variable on the interval [0, 1], and let Y = X². Find the correlation between X and Y.

Solution:

206
Properties of Expectation 8.6 Correlation

ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

E[X] = ∫₀¹ x dx = 1/2
E[X²] = ∫₀¹ x² dx = 1/3
Var(X) = 1/3 − (1/2)² = 1/12

E[Y] = E[X²] = 1/3
E[Y²] = E[X⁴] = ∫₀¹ x⁴ dx = 1/5
Var(Y) = 1/5 − (1/3)² = 4/45

Cov(X, Y) = E[XY] − E(X)E(Y)
          = E[X³] − E(X)E(Y)
E[X³] = ∫₀¹ x³ dx = 1/4
Cov(X, Y) = 1/4 − (1/2)(1/3) = 1/12

ρ(X, Y) = (1/12) / √(1/12 × 4/45)
        = 0.968

There is a strong linear relationship between X and Y = X².


Example 8.6.4
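A Monte Carlo sketch (numpy assumed; sample size arbitrary) confirming the correlation of roughly 0.968 between X ~ U(0, 1) and Y = X²:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 1_000_000)
y = x**2

print(np.corrcoef(x, y)[0, 1])   # approximately 0.968, matching the exact calculation
```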

207
Properties of Expectation 8.7 Home Work

8.7 IJ Home Work


1. Let ( X, Y ) have the joint pmf given in Table:

        X = 1   X = 2   X = 3
Y = 1   0       0.3     0.1
Y = 2   0       0.2     0.2
Y = 3   0.1     0.1     0

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results

2. Consider the following joint pmf, f (0, 0) = 1/12; f (1, 0) = 5/12; f (0, 1) = f (1, 1) =
3/12
f ( x, y) = 0 for all other values.

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results

3. Let ( X, Y ) have the joint pd f that is given below:


f(x, y) = (12/7)(x² + xy) for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1
        = 0 otherwise

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results

4. Let X and Y be continuous random variables with joint pdf

f(x, y) = 3x for 0 ≤ y ≤ x ≤ 1
        = 0 otherwise

Find Cov( X, Y ), Cor ( X, Y )

5. Let X and Y correspond to the horizontal and vertical coordinates in the triangle with
corners at (2, 0), (0, 2), and the origin. Let
f(x, y) = (15/28)(xy² + y) for (x, y) inside the triangle
        = 0 otherwise

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results

208
Properties of Expectation 8.7 Home Work

§§ Answers
1. -0.16; -0.3563

2. -1/12; -0.3535

3. -0.0034; -0.056

4. 0.01875; 0.397

5. -0.0986; -0.5958

209
Properties of Expectation 8.7 Home Work

210
Bibliography

[1] DeCoursey W. J., Statistics and Probability for Engineering Applications With Mi-
crosoft Excel, Newnes, 2003.

[2] Devore J. L., Probability and Statistics for Engineering & Sciences, Brooks/Cole, 2012.

[3] Forsyth D., Probability and Statistics for Computer Science, Springer, 2018.

[4] Hayter A., Probability and Statistics for Engineers & Scientists, Brooks/Cole, 2012.

[5] Montgomery, Douglas C., Runger George C., Applied Statistics and Probability for
Engineers, John Wiley & Sons, Inc, 2011.

[6] Mendenhall W., Beaver R.J., and Beaver, B. M., Introduction to Probability and Statis-
tics, 14th Edition , Brooks/Cole, 2013.

[7] Meyer P.L., Introductory Probability And Statistical Applications, Addison-Wesley,


1970

[8] Rice J. A., Mathematical Statistics and Data Analysis, 3rd Edition, 2007

[9] Ross S., A First Course in Probability, 9th Edition, 2012

[10] Ross S., Introduction to Probability Models, 10th Edition, 2010

[11] Ross S., Introduction to Probability and Statistics For Engineers And Scientists , 3rd
Edition, 2004

[12] Triola E., M., Elementary Statistics, Pearson Education, New York 2005.

[13] Walpole R. E., Myers, R. H., Myers, S. L. and Ye, K.,Probability and Statistics for
Engineers & Scientists, Brooks/Cole, 2012.

211
Index

Axioms, 27 Events, 10, 26


At least Events, 26
Bayes’ Theorem, 45 Dependent Events, 40
Birthday Paradox, 35 Disjoint Events, 26
Equally Likely Events, 26
Central Limit Theorem (CLT) Independent, 39
Sample Mean, 148 Null Events, 26
Sample Total, 147 Expectation, 57
Chebyshev Inequality, 144 Continuous, 103
Conditional Probability, 36 Discrete, 58
Continuity Correction, 156
Convolution, 174 Indicator Random Variable, 60
Continuous Case, 176
Joint Distribution, 160
Discrete Case, 174
Conditional
Correlation, 201
expectation properties, 195
Counting Rules, 13 Continuous
Combinations, 21 Conditional, 193
Multinomial Coefficient, 18 Expectation, 184
Multiplication Rule, 15 Discrete
Permutation, 17 Conditional, 190
Covariance, 199 Expectation, 183
Cumulative Distribution Function (cdf), 54 Independent, 190
Marginal pdf, 167
Distribution
Marginal pmf, 162
Bernoulli, 66 pdf, 166
Binomial, 67, 68 pmf, 160
Exponential, 133
Geometric, 80 Median, 60
Hypergeometric, 90 Memoryless Property
Negative Binomial, 84 Exponential Distribution, 139
Normal, 116 Geometric Distribution, 83
Poisson, 73, 76 Moment, 58
Uniform, 111 Moment Generating Function, 130

212
INDEX INDEX

Normal approximation
Binomial, 154

Odds, 34

pdf, 98
Poker Hand, 11
Probability
Equally-likely events, 33
Probability Laws
Complement Rule, 27
Inclusion-exclusion principle, 29
Law-of-Total Probability, 43
Multiplication Law, 38
Dependent Events, 39, 42
Independent Events, 39
Probability Mass Function (pmf), 52

Random Variable, 49
Continuous, 51
Discrete, 50
Randomness, 7

Sample Space, 9

Tree Diagram, 10

Variance
Continuous, 105
Discrete, 61

213
