
Learning Module in EDA 101

Luciano M. Medrano Jr.

College of Engineering
President Ramon Magsaysay State University

January 3, 2021
Contents

Preface ix

I. Probability and Random Variables 1

1. Sample Spaces and Events 3


1.1. Random Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Sample Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3. Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4. Counting Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2. Probability 13
2.1. Probability of an Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2. Addition Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3. Conditional Probability, Independence and the Product Rule . . . . . . . . . . 17
2.3.1. Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.2. Product Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.3. Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4. Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3. Discrete Random Variables and Probability Distributions 29


3.1. Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2. Discrete Probability Distribution Functions . . . . . . . . . . . . . . . . . . . . 30
3.3. Cumulative Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4. Mean and Variance of a Random Variable . . . . . . . . . . . . . . . . . . . . . 34
3.5. Discrete Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6. Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.7. Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.8. Other Distributions (Optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.8.1. Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . 43
3.8.2. Negative Binomial and Geometric Distributions . . . . . . . . . . . . . . 44
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4. Continuous Probability Distributions 51


4.1. Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2. Continuous Probability Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3. Cumulative Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4. Mean and Variance of a Continuous Random Variable . . . . . . . . . . . . . . 54
4.5. Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.6. Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7. Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66


5. Joint Probability Distributions 69


5.1. Two Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.1. Joint Probability Function . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.2. Marginal Probability Distribution . . . . . . . . . . . . . . . . . . . . . 72
5.1.3. Conditional Probability Distribution . . . . . . . . . . . . . . . . . . . . 73
5.1.4. Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2. More Than Two Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3. Linear Functions of Random Variables . . . . . . . . . . . . . . . . . . . . . . . 79
5.4. General Function of Random Variables . . . . . . . . . . . . . . . . . . . . . . . 81
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

II. Estimation, Statistical Inference and Model Verification 85

6. Point Estimation and Sampling Distribution 87


6.1. Point Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.2. General Concepts of Point Estimation . . . . . . . . . . . . . . . . . . . . . . . 89
6.2.1. Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2.2. Variance of a Point Estimator . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2.3. Standard Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2.4. Mean Squared Error of an Estimator . . . . . . . . . . . . . . . . . . . . 93
6.3. Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.1. Sample Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.2. Sample Proportion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.3.3. Sample Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

7. Statistical Intervals 105


7.1. Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.2. Confidence Interval for a Population Parameter . . . . . . . . . . . . . . . . . . 106
7.2.1. Population Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.2.2. Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.3. Population Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.2.4. One-Sided Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . 115
7.3. Prediction Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.4. Tolerance Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.5. Confidence Interval Comparing Two Population Parameters . . . . . . . . . . . 120
7.5.1. Confidence Interval for the Difference of Two Means . . . . . . . . . . . 120
7.5.2. Confidence Interval for Difference of Two Proportions . . . . . . . . . . 125
7.5.3. Confidence Interval for the Ratio of Two Variances . . . . . . . . . . . . 126
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

8. Test of Hypothesis for a Single Population 131


8.1. Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.1.1. Statistical Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.1.2. Testing a Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.1.3. Connection Between Hypothesis Tests and Confidence Intervals . . . . . 137
8.1.4. General Procedure for Hypothesis Tests . . . . . . . . . . . . . . . . . . 137
8.2. Test on the Mean of a Population . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.2.1. Variance Known . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.2.2. Variance Unknown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.3. Test on the Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . 144


8.4. Test on the Variance of a Population . . . . . . . . . . . . . . . . . . . . . . . . 146


Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

A. Statistical Tables 149


A-1. Cumulative Standard Normal Distribution . . . . . . . . . . . . . . . . . . . . . 149
A-2. Percentage Points tα,ν of the t Distribution . . . . . . . . . . . . . . . . . . . . 151
A-3. Percentage Points χ²α,ν of the Chi-Squared Distribution . . . . . . . . . . . . . 152
A-4. Critical Values of the F-distribution . . . . . . . . . . . . . . . . . . . . . . . . 153
A-5. Factors for Tolerance Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A-6. Statistical Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

B. Statistical Calculations with Casio fx-570/991ES PLUS 163

C. Answers to Exercises 169

D. Proofs 171

Bibliography 177

Index 180

List of Figures
3.1. Probability mass function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2. Probability histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4. Discrete cumulative distribution function . . . . . . . . . . . . . . . . . . . . . 34
3.5. Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6. Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.1. Probability density function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


4.2. Continuous cumulative distribution function . . . . . . . . . . . . . . . . . . . . 53
4.3. Continuous uniform distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.8. Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.10. Exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

6.4. χ2 distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.4. Student’s t-distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109


7.7. F density curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.4. R software printout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142


8.5. R software printout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Preface

Intended Audience
This is a self-learning module for the undergraduate students of the College of Engineering,
President Ramon Magsaysay State University. It is a compilation of materials from different
textbooks for a one-semester course in engineering data analysis.

Organization of the Module


This module is presented in two parts. The first part covers the essential theories of probability
and the distributions of a random variable and serves as the foundation for the development
of the second part. The second part deals with linear regression and statistical inference. One
criticism of this module is that it is very long; there is great variety in both the content and
the level of these topics.
The first five chapters cover the basic concepts of sample and event spaces, probability,
discrete and continuous random variables, expected value, and joint probability distributions.
The mathematical and theoretical details are avoided to address the essential topics needed in
the development of statistical inference.
Chapter 6 begins the statistical inference with the sampling distribution, the central limit
theorem, and point estimation of parameters. This chapter also discusses some important
properties of estimators.
Chapter 7 deals with interval estimation. Topics include confidence intervals for the means,
variances, standard deviations and proportions, prediction intervals and tolerance intervals.
This chapter also covers interval estimation for two samples from different populations.
Chapters 8 and 9 discuss hypothesis tests for the means, proportions, and variances of one
sample and two samples from two different populations, respectively. Methods for determining
appropriate sample sizes are discussed to enable the students to solve real-life engineering
problems.
Chapters 10 and 11 present simple and multiple linear regression including model adequacy
checking and regression model diagnostics. Matrix algebra is used to present multiple linear
regressions. This enables the students without access to computer software or statistical
packages to perform multiple regression. Matrix calculation may be skipped if the student has
access to Microsoft Excel®, Minitab®, R software or other statistical packages. A tutorial on
how to use the free R software can be found in the appendix.
Chapter 12 deals with single- and two-factor experiments. The notions of randomization,
blocking, two-factor designs and interactions are emphasized.

Featured in the Module

Key Definitions and Concepts

Throughout the module, key definitions and concepts are enclosed in boxes to highlight their
importance. For example:

Definition 6.5
If all unbiased estimators of θ are considered, the one with the smallest variance is called
the minimum variance unbiased estimator (MVUE).


Equations and Formulas

Important equations are numbered on the right. Named equations, expressions and formulas
are enclosed in boxes with appropriate titles. For example:

Partitions
The number of ways of partitioning a set of n objects into r cells with n1 elements in
the first cell, n2 elements in the second, and so forth, is

    (n choose n1, n2, . . . , nr) = n!/(n1!n2! · · · nr!)    (1.8)

Red Links

The soft copy of this handout abounds with numbers and text in red. These are links to an
equation, example, exercise, table, figure, section or page number. Click a link to jump to
that particular location in the handout. For example: "The tree diagram in Example 1.2
describes the sample space of all combinations of the three messages. The size of the sample
space is equal to the number of branches in the lowest level of the tree, and this quantity
equals 2 × 2 × 2 = 8."

Example Problems

A set of example problems provides the student with (detailed) solutions. Comments are
included as interpretation of the numerical values sought for in the problem. For example:

Example 2.5
John is going to graduate from an industrial engineering department in a university by the
end of the semester. After being interviewed at two companies he likes, he assesses that his
probability of getting an offer from company A is 0.8, and his probability of getting an offer
from company B is 0.6. If he believes that the probability that he will get offers from both
companies is 0.5, what is the probability that he will get at least one offer from these two
companies?
Let A be the event that John will get an offer from company A, B be the event that he will
get an offer from company B. Then A ∩ B is the event that he will get offers from both
companies. We have

P[A] = 0.8
P[B] = 0.6
P[A ∩ B] = 0.5
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
         = 0.8 + 0.6 − 0.5 = 0.9

Based on his assessment, he has a 90% chance that he will get at least one offer from the two
companies.

Learning Objectives

Learning objectives at the start of each chapter guide the students in what they are expected
to take away from that chapter. For example:

Learning Objectives
At the end of this chapter, you should be able to:
1. Apply counting techniques to calculate the probabilities of events
2. Calculate the probabilities of joint events such as unions and intersections from the
   probabilities of individual events
3. Interpret probabilities and use the probabilities of outcomes to calculate probabilities
   of events in discrete sample spaces
4. Interpret and calculate conditional probabilities of events

x
Chapter Summary

Chapters that are heavy on explanation, particularly the later ones, have a chapter summary
to highlight specific ideas needed for the attainment of the learning objectives. Important
formulas can be found here. For example:

• A random variable X assigns a numerical value to each element of the sample space S.
  It defines mutually exclusive events that are exhaustive of S.
• Every discrete random variable has a corresponding probability distribution called a
  probability mass function f(x), which is the probability that the random variable X is
  equal to the value x, written as P[X = x].
• A cumulative distribution function F(x) can be obtained from a probability mass
  function f(x). Conversely, the probability mass function can be determined from a
  cumulative distribution function.
• The mean or expected value µX or E[X] is defined as

      µ = E[X] = Σ(all x) x f(x)

Exercises

Each chapter has a sufficient number of exercises that cover the scope of the chapter topics.
For example:

3-2. A random experiment consists of flipping two coins. Let the random variable X denote
the number of coins that landed face up. Enumerate the outcomes of each event arising
from X.

3-3. In a random experiment, two six-sided dice are rolled. Let Y be the total of the
outcomes of the dice. Enumerate the outcomes of each event arising from Y.

3-4. An overseas shipment of 5 foreign automobiles contains 2 that have slight paint
blemishes. If an agency receives 3 of these automobiles at random, list the elements of the
sample space S, using the letters B and N for blemished and nonblemished, respectively;
then to each sample point assign a value x of the random variable X representing the number
of automobiles with paint blemishes purchased by the agency.

3-5. Let W be a random variable giving the number of heads minus the number of tails in
three tosses of a coin. List the elements of the sample space S for the three tosses of the
coin and to each sample point assign a value w of W.

To the Students
The topics included in this module are the minimum competency requirements in an Engineer-
ing Data Analysis course set by the Commission on Higher Education (CHED). It is therefore
expected that all topics will be covered in one semester.
Due to the effect of the Covid-19 pandemic, a regular face-to-face class cannot be conducted
to deliver the contents to you. This module is designed to deliver them via distance-learning.
This has an undesirable effect: the burden of teaching has now shifted toward the student.
To ease this burden, it is advised that you post your questions and clarifications of ideas
encountered in your self-study in the Edmodo class for online learning (class code j6frwt, join
url https://edmo.do/j/iwwzrn) or send them through email at lmedranojr@prmsu.edu.ph no
later than Wednesday of every week. This should allow your instructor to prepare additional
learning materials to be delivered on Friday via the Zoom app or as a soft copy (pdf).
The contents of this handout will be available in installments. New chapters will be added
at the end of the allocated period for the previous chapter.
Assessment shall be in the form of assignments, chapter tests and major examinations.

Part I.

Probability and Random Variables

1. Sample Spaces and Events
Learning Objectives

At the end of this chapter, you should be able to:

1. Understand and describe sample spaces and events for random experiments with graphs,
tables, lists, or tree diagrams

2. Use permutations and combinations to count the number of outcomes in both an event
and the sample space

1.1. Random Experiment


When we conduct a scientific experiment, we take measurements on characteristics of interest.
These characteristics can be the time of freefall of a ball bearing or the current in a copper
wire. Repeated measurements of the same experiment may differ slightly because of the
small variations in variables that are not controlled in our experiment. Consequently, this
experiment is said to have a random component. This random variation, however small, is
almost always present, and its magnitude may be large enough that valid conclusions
cannot be drawn from the data of our experiment.
Our goal therefore is to understand and quantify the type of variations and incorporate
them into our analysis to make informed judgments from our results that are not invalidated
by the variation.
Definition 1.1
An experiment that can result in different outcomes, even though it is repeated in the
same manner every time, is called a random experiment.

For the example of measuring current in a copper wire, variations in measurements of


current can be expected because of uncontrollable inputs like ambient temperature. Ohm’s
Law might be a suitable approximation for small variations in the measurements. However, if
the variations are large, we might need to extend our model to include the variation.

1.2. Sample Spaces


To model and analyze a random experiment, we must understand the set of possible outcomes
from the experiment. In this introduction to probability, we use the basic concepts of sets and
operations on sets. It is assumed that the reader is familiar with these topics.
Definition 1.2
The set of all possible outcomes of a random experiment is called the sample space of
the experiment. The sample space is denoted as S.

A sample space is often defined based on the objectives of the analysis. The following
example illustrates several alternatives.


Example 1.1
Consider an experiment that selects a cell phone camera and records the recycle time of a
flash (the time taken to ready the camera for another flash).
The possible values for this time depend on the resolution of the timer and on the minimum
and maximum recycle times. However, because the time is positive it is convenient to define
the sample space as simply the positive side of the real number line

S = {x|x > 0}

If it is known that all recycle times are between 1.5 and 5 seconds, the sample space can be

S = {x|1.5 ≤ x ≤ 5.0}

If the objective of the analysis is to consider only whether the recycle time is low, medium,
or high, the sample space can be taken to be the set of three outcomes

S = {low, medium, high}

If the objective is only to evaluate whether or not a particular camera conforms to a mini-
mum recycle time specification, the sample space can be simplified to a set of two outcomes

S = {yes, no}

that indicates whether or not the camera conforms.

It is useful to distinguish between two types of sample spaces.


Definition 1.3
A sample space is discrete if it consists of a finite or countably infinite set of outcomes.
A sample space is continuous if it contains an interval (either finite or infinite) of real
numbers.

In Example 1.1, the choice S = {x|x > 0} is an example of a continuous sample space,
whereas S = {yes, no} is a discrete sample space. As mentioned, the best choice of a sample
space depends on the objectives of the study.

Sample spaces can also be described graphically with tree diagrams. When a sample space
can be constructed in several steps or stages, we can represent each of the n1 ways of completing
the first step as a branch of a tree. Each of the ways of completing the second step can be
represented as n2 branches starting from the ends of the original branches, and so forth.
Example 1.2
Each message in a digital communication system is classified as to whether it is received
within the time specified by the system design. If three messages are classified, use a tree
diagram to represent the sample space of possible outcomes.
Each message can be received either on time T or late L. The possible results for three
messages can be displayed by eight branches in the tree diagram below.


Start

T L

T L T L

T L T L T L T L

{T T T } {T T L} {T LT } {T LL} {LT T } {LT L} {LLT } {LLL}

In the tree diagram, the particular branch {LT L} represents a late first message, an on time
second message, and a late third message.
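If you have Python available, the sample space of Example 1.2 can also be generated by direct
enumeration. The following sketch (an illustration only; the module itself does not require any
software) lists all 2 × 2 × 2 = 8 outcomes:

    from itertools import product

    # Each of the three messages is either on time (T) or late (L).
    sample_space = ["".join(outcome) for outcome in product("TL", repeat=3)]

    print(sample_space)       # ['TTT', 'TTL', 'TLT', 'TLL', 'LTT', 'LTL', 'LLT', 'LLL']
    print(len(sample_space))  # 8, matching the lowest level of the tree diagram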

1.3. Events
Often we are interested in a collection of related outcomes from a random experiment. Related
outcomes can be described by subsets of the sample space, and set operations can also be
applied.
Definition 1.4
An event is a subset of the sample space of a random experiment.

If no outcomes belong to an event, then the event is empty and is denoted by ∅.


We can also be interested in describing new events from combinations of existing events.
Because events are subsets, we can use basic set operations such as unions, intersections, and
complements to form other events of interest. Some of the basic set operations are summarized
here in terms of events:

• The union of two events is the event that consists of all outcomes that are contained in
either of the two events. We denote the union as E1 ∪ E2 .

• The intersection of two events is the event that consists of all outcomes that are con-
tained in both of the two events. We denote the intersection as E1 ∩ E2 .

• The complement of an event in a sample space is the set of outcomes in the sample
space that are not in the event. We denote the complement of the event E as E′.

Example 1.3
Consider the sample space {T T T, T T L, T LT, T LL, LT T, LT L, LLT, LLL} in Example 1.2.
Suppose that the subset of outcomes for which two messages are received on time is denoted
E1 , and the subset of outcomes for which the second message is late is denoted by E2 .

E1 = {T T L, T LT, LT T }
E2 = {T LT, T LL, LLT, LLL}
E1 ∪ E2 = {T T L, T LT, LT T, T LL, LLT, LLL}
E1 ∩ E2 = {T LT }
E1′ = {T T T, T LL, LT L, LLT, LLL}

The message combination {T LT } is an outcome of both events E1 and E2 and is not listed
twice in the event E1 ∪ E2 .
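The event operations above map directly onto Python's set type. A small sketch of
Example 1.3 (an illustration only):

    # Python sets mirror the event operations of Example 1.3.
    S  = {"TTT", "TTL", "TLT", "TLL", "LTT", "LTL", "LLT", "LLL"}
    E1 = {"TTL", "TLT", "LTT"}              # exactly two messages received on time
    E2 = {s for s in S if s[1] == "L"}      # second message late

    print(E1 | E2)   # union E1 ∪ E2, six outcomes
    print(E1 & E2)   # intersection E1 ∩ E2, only {'TLT'}
    print(S - E1)    # complement E1′, the five remaining outcomes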


Definition 1.5
Two events, denoted as E1 and E2 , such that

E1 ∩ E2 = ∅

are said to be mutually exclusive.

Additional results involving events are summarized in the following. The definition of the
complement of an event implies that
(E′)′ = E    (1.1)

The distributive law for set operations implies that

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)    (1.2)

DeMorgan’s laws imply that


(A ∪ B)′ = A′ ∩ B′
(A ∩ B)′ = A′ ∪ B′    (1.3)

1.4. Counting Techniques


In many of the examples in this chapter, it is easy to determine the number of outcomes in
each event. In more complicated examples, determining the outcomes in the sample space (or
an event) becomes more difficult. Instead, counts of the numbers of outcomes in the sample
space and various events are used to analyze the random experiments. Simple rules can be
used to simplify the calculations.
The tree diagram in Example 1.2 describes the sample space of all combinations of the three
messages. The size of the sample space is equal to the number of branches in the lowest level
of the tree, and this quantity equals 2 × 2 × 2 = 8.

Multiplication Rule

If an operation can be performed in n1 ways, and if for each of these a second operation
can be performed in n2 ways, and for each of the first two a third operation can be
performed in n3 ways, and so forth, then the sequence of k operations can be performed
in n1 · n2 · · · nk ways.

Example 1.4
The design for a website is to use one of the four colors, a font from among three, and three
different positions for an image. Calculate the number of web designs possible.
From the multiplication rule, 4 × 3 × 3 = 36 web designs are possible.

The use of the multiplication rule and other counting techniques enables one to easily deter-
mine the number of outcomes in a sample space or event and this, in turn, allows probabilities
of events to be determined.
Another useful calculation finds the number of ordered sequences of the elements of a set.
Definition 1.6
A permutation of the elements of a set is an ordered sequence of the elements.


Permutation of Distinct Elements


The number of permutations of n different elements is n!, where

n! = n(n − 1)(n − 2) · · · 3 · 2 · 1 (1.4)

This result follows from the multiplication rule. A permutation can be constructed by
selecting the element to be placed in the first position of the sequence from the n elements,
then selecting the element for the second position from the remaining n − 1 elements, then
selecting the element for the third position from the remaining n − 2 elements, and so forth.
Permutations such as these are sometimes referred to as linear permutations.
Example 1.5
A food company has four different recipes for a potential new product and wishes to compare
them through consumer taste tests. In these tests, a participant is given the four types of
food to taste in a random order and is asked to rank various aspects of their taste. How
many different rankings of the four types of food are possible?
The four types of food are to be ranked (ordered sequence) by a participant. A permutation
of the four types gives 4! = 4 × 3 × 2 × 1 = 24 different rankings.

In some situations, we are interested in the number of arrangements of only some of the
elements of a set. The following result also follows from the multiplication rule.

Permutation of Subsets
The number of permutations of n distinct objects taken r at a time is
n Pr = n!/(n − r)!    (1.5)

Example 1.6
In one year, three awards (research, teaching, and service) will be given to a class of 25
graduate students in a statistics department. If each student can receive at most one award,
how many possible selections are there?
Since the awards are distinguishable, it is a permutation problem. The total number of
sample points is
25 P3 = 25!/(25 − 3)! = 25!/22! = 25 · 24 · 23 = 13,800
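A quick check of this count, for readers with Python 3.8 or later (math.perm was added in
that version); this is an illustration only:

    import math

    # math.perm(n, r) computes n!/(n - r)!.
    print(math.perm(25, 3))                          # 13800
    print(math.factorial(25) // math.factorial(22))  # the same value from n!/(n - r)!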

Example 1.7
A president and a treasurer are to be chosen from a student club consisting of 50 people.
How many different choices of officers are possible if

(a) there are no restrictions;


The total number of choices of officers is
50 P2 = 50!/48! = 50 × 49 = 2,450

(b) A will serve only if he is president;


Let E be the event that A will serve only if he is president. E consists of two events (or
situations): (i) E1 where A is the president, and (ii) E2 where A is neither the president
nor the treasurer, or simply A is not an officer. In E1 , there are 49 choices for the treasurer
position. In E2 , the officers will be selected from a pool of 49 students (A will not be an
officer) so that there are 49 P2 = 2352 different choices for the positions. Therefore, the total
number of choices is 49 + 2352 = 2401.

(c) B and C will serve together or not at all;


Let E1 be the event that B and C serve together. The number of choices for them to be
officers is 2! = 2.
Let E2 be the event that B and C do not serve at all. The number of choices for the positions
is 48 P2 = 2256.
Therefore, the total number of choices is 2 + 2256 = 2258.

Counting (and probability) problems can have multiple solutions. Consider Example 1.7(b).
Student A can be the president, the treasurer, or a member of the club. These three situations
cover all the 2,450 possible choices of officers. Let E′ be the event that A is the treasurer;
then E is the event that A is not the treasurer. The number of elements of E′ is 49, which is
the number of choices for the position of president. Therefore, the number of choices for the
officers of the club (where A is president or member) is 2450 − 49 = 2401.
Permutations that occur by arranging objects in a circle are called circular permutations.
Two circular permutations are not considered different unless corresponding objects in the two
arrangements are preceded or followed by a different object as we proceed in a clockwise
direction. For example, if we rearrange the 12 numbers of a clock, one possible rearrangement
is the 3 on the north (12 o’clock) position and the numbers 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, and 2
follow in clockwise order. But this cannot be called a new arrangement, since the clockwise
order is the same as the standard placement of the twelve numbers on the clock.

Circular Permutation
The number of permutations of n objects arranged in a circle is

(n − 1)! (1.6)

Example 1.8
Eight people are to be seated around a dining table. How many different arrangements are
possible

(a) with no restrictions;


The total number of arrangements is

(8 − 1)! = 7! = 5040

(b) if two people insist on sitting next to each other?


We solve this in several steps. The first step is to count the two as one group. There are
7 units (6 individuals, 1 group) to be arranged around the table. According to the circular
permutation rule, there are (7 − 1)! = 6! = 720 different arrangements.
The second step is to count the number of arrangements of the people within the group. In
this case, the number of ways is 2! = 2.
According to the multiplication rule, the total number of arrangements is

720 × 2 = 1440
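For readers who want to verify this count by brute force, the sketch below (an illustration
only) fixes one person's seat to remove the rotational symmetry of the round table and then
tests every arrangement of the remaining seven; persons 0 and 1 play the role of the pair who
insist on sitting together:

    from itertools import permutations

    # Persons are numbered 0..7; person 0's seat is fixed (circular permutation).
    count = 0
    for arrangement in permutations(range(1, 8)):
        seats = (0,) + arrangement
        for i in range(8):                    # check every adjacent pair around the circle
            if {seats[i], seats[(i + 1) % 8]} == {0, 1}:
                count += 1
                break

    print(count)  # 1440, matching the answer above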

Example 1.8(b) illustrates a strategy when certain elements group themselves. Each group
is counted as a unit in a (linear or circular) permutation, and the rearrangement of the elements
within each group contributes a certain factor according to the multiplication rule.
So far we have considered permutations of distinct objects. That is, all the objects were
completely different or distinguishable. Sometimes we are interested in counting the number
of ordered sequences for objects that are not all different. The following result is a useful,
general calculation.

Permutation of Similar Objects

The number of permutations of n = n1 + n2 + · · · + nr objects of which n1 are of one
type, n2 are of a second type, . . . , and nr are of an rth type is

    n!/(n1!n2! · · · nr!)    (1.7)

Example 1.9
Code 39 is a common bar code system that consists of narrow and wide bars (black) sepa-
rated by either wide or narrow spaces (white). Each character contains nine elements (five
bars and four spaces). The code for a character starts and ends with a bar (either narrow
or wide) and a (white) space appears between each bar. The original specification (since
revised) used exactly two wide bars and one wide space in each character. For example, if
b and B denote narrow and wide (black) bars, respectively, and w and W denote narrow
and wide (white) spaces, a valid character is bwBwBW bwb (the number 6). One character
is held back as a start and stop delimiter. How many other characters can be coded by this
system? Can you explain the name of the system?
The four white spaces occur between the five black bars. In the first step, focus on the bars.
The number of permutations of five black bars when two are B and three are b is
5!/(2!3!) = 10
In the second step, consider the white spaces. A code has three narrow spaces w and one
wide space W so there are
4!/(3!1!) = 4
possible locations for the wide space. Therefore, the number of possible codes is 10×4 = 40.
If one code is held back as a start/stop delimiter, then 39 other characters can be coded by
this system (and the name comes from this result).
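A short Python check of the Code 39 count (an illustration only):

    from math import factorial

    bars   = factorial(5) // (factorial(2) * factorial(3))  # B, B, b, b, b -> 10 arrangements
    spaces = factorial(4) // (factorial(3) * factorial(1))  # W, w, w, w    -> 4 arrangements
    print(bars * spaces)  # 40 codes in total, so 39 remain after the start/stop delimiter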

Often we are concerned with the number of ways of partitioning a set of n objects into r
subsets called cells. A partition has been achieved if the intersection of every possible pair of
the r subsets is the empty set ∅ and if the union of all subsets gives the original set. The
order of the elements within a cell is of no importance.

Partitions
The number of ways of partitioning a set of n objects into r cells with n1 elements in
the first cell, n2 elements in the second, and so forth, is
(n choose n1, n2, . . . , nr) = n!/(n1!n2! · · · nr!)    (1.8)


Example 1.10
In how many ways can 7 graduate students be assigned to 1 triple and 2 double hotel rooms
during a conference?
The total number of possible partitions would be

(7 choose 3, 2, 2) = 7!/(3!2!2!) = 210

We can think of a partition as a permutation of similar objects. In the case of the 7
graduate students, we can reclassify them as T, B or D where T represents a student assigned
to a triple room, and B and D represent a student assigned to one of the double hotel rooms.
The rearrangement of the code consisting of 3 T ’s, 2 B’s and 2 D’s yields

    7!/(3!2!2!) = 210

different room assignments for the graduate students.
We can also count the number of ways the three rooms can be filled in this manner: (1) Assign
3 students to the triple room, (2) assign two (of the remaining 4) students to a double room,
and (3) assign the (remaining) 2 students to the other double room. By the multiplication
rule, the total number of assignments is
(7 choose 3)(4 choose 2)(2 choose 2) = 35 × 6 × 1 = 210
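The two counting routes just described can be compared directly in Python (a sketch, not
part of the module's required material):

    from math import comb, factorial

    multinomial = factorial(7) // (factorial(3) * factorial(2) * factorial(2))
    stepwise    = comb(7, 3) * comb(4, 2) * comb(2, 2)   # fill the rooms one at a time
    print(multinomial, stepwise)   # 210 210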
In many problems, the n objects are partitioned into two cells, r of these having a specific
trait and the remaining n − r not having that trait.
Definition 1.7
A partition of a set into two cells is called a combination. The number of possible
partitions into two sets is usually shortened to (n choose r).


Combinations
The number of combinations of n distinct objects taken r at a time is
(n choose r, n − r) = (n choose r) = n!/(r!(n − r)!)    (1.9)

The calculator notation for a combination is n Cr .


Example 1.11
A mother-participant samples eight food products and is asked to pick the best, the second
best, and the third best. She buys the three products she likes best to take home as
pasalubong for the family.

(a) How many different rankings are possible?


Only three of the eight products were ranked. Thus, the number of ways to rank them is
8 P3 = 8!/5! = 336

(b) How many different pasalubong are possible?


Since her three best choices are now reclassified as pasalubong, the other 5 products are
considered not pasalubong. This is a partition into two cells, or a combination. The number
of ways of choosing a pasalubong of three items is
8 C3 = 8!/(3!5!) = 56
Example 1.12
A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good
components and 3 defective components. A sample of 3 is taken by the inspector. How
many different ways can the inspector get

(a) all good components?


There are (4 choose 3) = 4 ways to get all 3 sampled components in good condition.

(b) 2 defective components?


For this event to occur, the inspector must be able to get 2 (of 3) defective components and
1 (of 4) good component. There are (3 choose 2) = 3 different ways to get two defective
components and (4 choose 1) = 4 ways to get one good component. By the multiplication
rule, the total number of ways is 3 × 4 = 12.
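Both counts can be checked with math.comb (an illustration only):

    from math import comb

    print(comb(4, 3))                # (a) all three components good: 4 ways
    print(comb(3, 2) * comb(4, 1))   # (b) two defective and one good: 12 ways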

Exercises

Give the sample space of each random experiment. State whether it is discrete or continuous.

1-1. counting the number of hits (visits) in a day at a high-volume website

1-2. An order for a computer system can specify memory of 4, 8 or 12 gigabytes and disk
storage of 300, 500 or 1000 gigabytes.

1-3. Calls are repeatedly placed to a busy phone line until a connection is achieved.

1-4. A sample of two items is selected without replacement from a batch. Describe the
ordered sample space, if the batch contains the items labeled a, b, c, d.

1-5. A set of two samples is selected from a batch. Describe the unordered sample space,
if the batch contains the items labeled a, b, c, d, e.

Determine the number of outcomes in each sample or event space.

1-6. The following table summarizes 204 endothermic reactions involving sodium bicarbonate.

                                    Heat Absorbed (cal)
    Final Temp Conditions (K)   Below Target   Above Target
    266                              12             40
    271                              44             16
    274                              56             36

Let A denote the event that a reaction's final temperature is 271 K or less. Let B denote
the event that the heat absorbed is below target. Determine the number of reactions in each
of the following events.
a) A ∩ B    b) A′    c) A ∪ B    d) A ∪ B′    e) A′ ∩ B′

1-7. A wireless garage door opener has a code determined by the up or down setting of 12
switches. How many outcomes are in the sample space of the codes?

1-8. A manufacturing process consists of 10 operations that can be completed in any order.
How many different production sequences are possible?

1-9. A manufacturing operation consists of 10 operations. However, five operations must be
completed before any of the remaining five assembly operations can begin. Within each set
of five, operations can be completed in any order. How many operation sequences are
possible?

1-10. A batch of 140 semiconductor chips is inspected by choosing a sample of 5 chips.
Assume 10 chips do not conform to customer requirements.
a) How many different samples are possible?
b) How many samples of five contain exactly one non-conforming chip?
c) How many samples of five contain at least three non-conforming chips?

1-11. In the design of an electromechanical product, 12 components are to be stacked in a
cylindrical casing in a manner that minimizes the impact of shocks. One end of the casing is
designated as bottom and the other is the top.
a) If all components are different, how many different designs are possible?
b) If seven components are identical to one another, but the others are different, how many
different designs are possible?
c) If three components are of one type and identical to one another, and four components
are of another type and identical to one another, but the others are different, how many
different designs are possible?

1-12. Consider the design of a communication system.
a) How many three-digit phone prefixes that are used to represent a particular geographical
area (such as an area code) can be created from the digits 0 through 9?
b) How many three-digit phone prefixes are possible in which no digit appears more than
once in each prefix?
c) As in part (a), how many three-digit phone prefixes are possible that do not start with
0 or 1, but contain 0 or 1 as the middle digit?

1-13. A bin of 50 parts contains 5 that are defective. A sample of 10 parts is selected at
random, without replacement. How many samples contain at least four defective parts?

1-14. A California study concluded that following 7 simple health rules can extend a person's
life by as much as 11 years. These 7 rules are: no smoking, get regular exercise, use alcohol
only in moderation, get 7-8 hours of sleep, maintain proper weight, eat breakfast, and do not
eat between meals. In how many ways can a person adopt 5 of these rules to follow
a) if the person presently violates all 7 rules?
b) if the person never drinks and always eats breakfast?

1-15. Four married couples have bought 8 seats in the same row for a concert. In how many
different ways can they be seated
a) with no restrictions?
b) if each couple is to sit together?
c) if all the men sit together to the right of all the women?

1-16. a) How many three-digit numbers can be formed from the digits 0 through 6 if each
digit can be used only once?
b) How many of these are odd numbers?
c) How many are greater than 330?

1-17. Six people line up to get on a bus. How many different arrangements are possible
a) without restriction on these people?
b) if 3 specific persons, among the 6, insist on following each other?
c) if 2 specific persons, among the 6, refuse to follow each other?

1-18. A certain brand of shoes comes in 5 different styles, with each style available in 4
distinct colors. If the store wishes to display pairs of these shoes showing all of its various
styles and colors, how many different pairs will the store have on display?

1-19. In how many different ways can a true-false test consisting of 9 questions be answered?

1-20. In how many ways can 4 boys and 5 girls sit in a row if the boys and girls must
alternate?

1-21. In how many ways can 3 oaks, 4 pines, and 2 maples be arranged along a property
line if one does not distinguish among trees of the same kind?

1-22. If a multiple-choice test consists of 5 questions, each with 4 possible answers of which
only 1 is correct,
a) in how many different ways can a student check off one answer to each question?
b) in how many ways can a student check off one answer to each question and get all the
answers wrong?

1-23. Find the number of ways that 6 teachers can be assigned to 4 sections of an
introductory psychology course if no teacher is assigned to more than one section.

1-24. How many ways are there to select 3 candidates from 8 equally qualified recent
graduates for openings in an accounting firm?

1-25. In how many ways can 5 different trees be planted in a circle?

1-26. A contractor wishes to build 9 houses, each different in design. In how many ways can
he place these houses on a street if 6 lots are on one side of the street and 3 lots are on the
opposite side?

1-27. How many bridge hands (consisting of 13 cards) are possible containing 4 spades,
6 diamonds, 1 club and 2 hearts?

1-28. The following circuit operates if and only if there is a path of functional devices from
left to right. Let the event/outcome {ABcDe} denote that devices 1, 2 and 4 are functional
(uppercase A, B and D, respectively) and devices 3 and 5 are not functional (lowercase c
and e, respectively).

    [Circuit diagram: the left and right terminals are joined by two parallel paths, a top
    path through devices 1 and 2 in series and a bottom path through devices 3, 4 and 5
    in series.]

a) Enumerate the outcomes of the sample space consisting of all possible combinations of
the five devices.
b) How many of those outcomes have the circuit operational?
c) How many of those outcomes render the circuit not functional?
2. Probability
Learning Objectives
At the end of this chapter, you should be able to do the following:
1. Apply counting techniques to calculate the probabilities of events
2. Calculate the probabilities of joint events such as unions and intersections from the
probabilities of individual events
3. Interpret probabilities and use the probabilities of outcomes to calculate probabilities of
events in discrete sample spaces
4. Interpret and calculate conditional probabilities of events
5. Determine the independence of events and use independence to calculate probabilities
6. Use Bayes’ theorem to calculate conditional probabilities

2.1. Probability of an Event


Perhaps it was humankind’s unquenchable thirst for gambling that led to the early devel-
opment of probability theory. In an effort to increase their winnings, gamblers called upon
mathematicians to provide optimum strategies for various games of chance. Some of the math-
ematicians providing these strategies were Pascal, Leibniz, Fermat, and James Bernoulli. As
a result of this development of probability theory, statistical inference, with all its predictions
and generalizations, has branched out far beyond games of chance to encompass many other
fields associated with chance occurrences, such as politics, business, weather forecasting, and
scientific research. For these predictions and generalizations to be reasonably accurate, an
understanding of basic probability theory is essential.
Probability is used to quantify the likelihood, or chance, that an outcome of a random
experiment will occur. “The chance of rain today is 30%” is a statement that quantifies our
feeling about the possibility of rain. The likelihood of an outcome is quantified by assigning
a number from the interval [0, 1] to the outcome (or a percentage from 0 to 100%). Higher
numbers indicate that the outcome is more likely than lower numbers. A zero (0) indicates an
outcome will not occur. A probability of 1 indicates that an outcome will occur with certainty.
Probabilities for a random experiment are often assigned on the basis of a reasonable model
of the system under study. One approach is to base probability assignments on the simple
concept of equally likely outcomes. When the model of equally likely outcomes is assumed,
the probabilities are chosen to be equal.
Definition 2.1 (Equally Likely Outcomes)

Whenever a sample space consists of N possible outcomes that are equally likely, the
probability of each outcome is 1/N .

Definition 2.2
For a discrete sample space, the probability of an event E, denoted as P[E], equals
the sum of the probabilities of the outcomes in E.


Example 2.1
A coin is tossed twice. What is the probability that at least 1 head occurs?
Here, S = {HH, HT, T H, T T } for the experiment. By assuming the model of equally likely
outcomes,

P[{HH}] = P[{HT }] = P[{T H}] = P[{T T }] = 1/4

Let E be the event that at least 1 head occurs, then E = {HH, HT, T H} and

P[E] = P[{HH}] + P[{HT }] + P[{T H}] = 1/4 + 1/4 + 1/4 = 3/4
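A Python sketch of this calculation (an illustration only); exact fractions are used so the
result prints as 3/4 rather than a rounded decimal:

    from itertools import product
    from fractions import Fraction

    S = list(product("HT", repeat=2))        # four equally likely outcomes
    p = Fraction(1, len(S))                  # each outcome has probability 1/4
    E = [s for s in S if "H" in s]           # at least one head
    print(sum(p for _ in E))                 # 3/4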
A typical probability problem consists of determining the outcomes of the experiment and
the outcomes of a particular event. You will find the counting techniques of Section 1.4 to be
useful in determining the probability of an event.
Definition 2.3
For a random experiment with a finite sample space S, the probability of an event E is
the fraction
P[E] = n(E)/n(S)    (2.1)
where n(E) and n(S) denote the number of outcomes in the event space E and sample
space S.

Example 2.2
A statistics class for engineers consists of 25 industrial, 10 mechanical, 10 electrical, and 8
civil engineering students. If a person is randomly selected by the instructor to answer a
question, find the probability that the student chosen is

(a) an industrial engineering major;


The size of the sample space is n(S) = 25 + 10 + 10 + 8 = 53. If I is the event that the
student randomly selected is an industrial engineering major, n(I) = 25 and

P[I] = n(I)/n(S) = 25/53

(b) a civil engineering or an electrical engineering major.


If C ∪ E is the event that the student randomly selected is a civil engineering or an electrical
engineering major, n(C ∪ E) = 8 + 10 = 18 and
P[C ∪ E] = 18/53
Example 2.3
In a poker hand consisting of 5 cards, find the probability of holding 2 aces and 3 jacks.
The number of ways of being dealt 2 aces from 4 cards is

    (4 choose 2) = 4!/(2!2!) = 6

and the number of ways of being dealt 3 jacks from 4 cards is

    (4 choose 3) = 4!/(3!1!) = 4


The number of ways of being dealt 2 aces and 3 jacks is 6 × 4 = 24 by the multiplication
rule.
The total number of poker hands consisting of any 5 cards from 52 cards is

    (52 choose 5) = 52!/(5!47!) = 2,598,960

Therefore, the desired probability is

    24/2,598,960 = 9.2345 × 10−6

Interpretation: Since the probability of this poker hand (called a full house) is small (occurring about
nine times in a million), this hand is rare and is considered a strong poker hand.
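A quick check of this computation (an illustration only):

    from math import comb

    favorable = comb(4, 2) * comb(4, 3)   # choose 2 of the 4 aces and 3 of the 4 jacks
    total     = comb(52, 5)               # any 5-card poker hand
    print(favorable / total)              # ≈ 9.2345e-06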

Example 2.4
A bin of 50 manufactured parts contains 3 defective parts and 47 nondefective parts. A
sample of 6 parts is selected from the 50 parts without replacement. That is, each part can
be selected only once, and the sample is a subset of the 50 parts. What is the probability
that exactly 2 defective parts are selected in the sample?
The sample space consists of all possible (unordered) subsets of 6 parts selected without
replacement,
n(S) = (50 choose 6) = 15,890,700

The number of subsets of the sample that contain 2 defective parts is (3 choose 2) = 3, and
the number of subsets of the sample that contain 4 nondefective parts is (47 choose 4) =
178,365. Therefore, the probability that exactly 2 defective parts are selected is


(3 × 178,365)/15,890,700 = 0.0337

If the outcomes of an experiment are not equally likely to occur, the probabilities must be
assigned on the basis of prior knowledge or experimental evidence. For example, if a coin is
not balanced, we could estimate the probabilities of heads and tails by tossing the coin a large
number of times and recording the outcomes. According to the relative frequency definition
of probability, the true probabilities would be the fractions of heads and tails that occur in
the long run.
In most of the applications of probability in this book, the relative frequency interpretation
of probability is the operative one. Its foundation is the statistical experiment rather than
subjectivity (the use of intuition, personal beliefs, and other indirect information in arriving at
probabilities), and it is best viewed as the limiting relative frequency. As a result, many
applications of probability in science and engineering must be based on experiments that can
be repeated.
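The relative frequency interpretation can be demonstrated by simulation. The sketch below
(an illustration only) repeats the sampling of Example 2.4 many times; the observed fraction
of samples with exactly 2 defective parts settles near the exact probability 0.0337:

    import random

    bin_of_parts = [1] * 3 + [0] * 47      # 1 marks a defective part, 0 a nondefective part
    trials = 100_000
    hits = sum(1 for _ in range(trials)
               if sum(random.sample(bin_of_parts, 6)) == 2)   # 6 parts, no replacement
    print(hits / trials)                   # fluctuates around the exact value 0.0337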

2.2. Addition Rules


Now that the probability of an event has been defined, we can collect the assumptions that we
have made concerning probabilities into a set of axioms that the probabilities in any random
experiment must satisfy. The axioms ensure that the probabilities assigned in an experiment
can be interpreted as relative frequencies and that the assignments are consistent with our
intuitive understanding of relationships between relative frequencies. For example, if event A is
contained in event B, we should have P[A] ≤ P[B]. The axioms do not determine probabilities;


the probabilities are assigned based on our knowledge of the system under study. However,
the axioms enable us to easily calculate the probabilities of some events from knowledge of the
probabilities of other events.

Axioms of Probability

If S is the sample space of a random experiment and E is an event of S,

(1) P[S] = 1

(2) 0 ≤ P[E] ≤ 1

(3) P[∅] = 0

The following formulas are the addition rules for probability.

Addition Rules
• For any events A and B of a sample space S,

P[A ∪ B] = P[A] + P[B] − P[A ∩ B] (2.2)

• If A and B are mutually exclusive events of a sample space S,

P[A ∪ B] = P[A] + P[B] (2.3)

• For any event E,


P[E] + P[E′] = 1    (2.4)

• If E1 , E2 , . . . , Ek are k mutually exclusive events of a sample space S,

P[E1 ∪ E2 ∪ · · · ∪ Ek ] = P[E1 ] + P[E2 ] + · · · + P[Ek ] (2.5)

The collection of events {E1 , E2 , . . . , Ek } of a sample space S is called exhaustive of S if


E1 , E2 , . . . , Ek are mutually exclusive and E1 ∪ E2 ∪ · · · ∪ Ek = S.
Example 2.5
John is going to graduate from an industrial engineering department in a university by the
end of the semester. After being interviewed at two companies he likes, he assesses that his
probability of getting an offer from company A is 0.8, and his probability of getting an offer
from company B is 0.6. If he believes that the probability that he will get offers from both
companies is 0.5, what is the probability that he will get at least one offer from these two
companies?
Let A be the event that John will get an offer from company A, B be the event that he
will get an offer from company B. Then A ∩ B is the event that he will get offers from both
companies. We have

P[A] = 0.8
P[B] = 0.6
P[A ∩ B] = 0.5
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
= 0.8 + 0.6 − 0.5 = 0.9


Based on his assessment, he has a 90% chance that he will get at least one offer from the two
companies.

Example 2.6
In a high school graduating class of 100 students, 54 studied mathematics, 69 studied history,
and 35 studied both mathematics and history. If one of these students is selected at random,
find the probability that

(a) the student took mathematics or history;


Let M and H denote that a student studied mathematics and history respectively.

n(S) = 100
n(M ) = 54 P[M ] = 0.54
n(H) = 69 P[H] = 0.69
n(M ∩ H) = 35 P[M ∩ H] = 0.35

Therefore, P[M ∪ H] = P[M ] + P[H] − P[M ∩ H] = 0.54 + 0.69 − 0.35 = 0.88.

(b) the student did not take either of these subjects;


This event is (M ∪ H)′.

P[(M ∪ H)′] = 1 − P[M ∪ H] = 1 − 0.88 = 0.12

(c) the student took history but not mathematics.


This event is H ∩ M′. The event H is partitioned into two events, H ∩ M (studied both
history and math) and H ∩ M′ (studied history but not math), and we can write

P[H] = P[H ∩ M ] + P[H ∩ M′]

for which

P[H ∩ M′] = P[H] − P[H ∩ M ] = 0.69 − 0.35 = 0.34

More complicated probabilities, such as P[A ∪ B ∪ C], can be determined by repeated use


of (2.2) and by using some basic set operations. The probability for a union of 3 sets is stated
below.
Addition Rule for 3 Sets
For events A, B and C of a sample space S,

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B]
               − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]    (2.6)

2.3. Conditional Probability, Independence and the Product Rule


2.3.1. Conditional Probability
Sometimes probabilities need to be reevaluated as additional information becomes available.
A useful way to incorporate additional information into a probability model is to assume that
the outcome that will be generated is a member of a given event. This event, say A, defines
the conditions that the outcome is known to satisfy. Then probabilities can be revised to
include this knowledge. In some applications, the practitioner is interested in the probability


structure under certain restrictions. For instance, in epidemiology, rather than studying the
chance that a person from the general population has diabetes, it might be of more interest to
know this probability for a distinct group such as Asian women in the age range of 35 to 50
or Hispanic men in the age range of 40 to 60. This type of probability is called a conditional
probability.
Definition 2.4
The probability of an event B under the knowledge that the outcome will be in event A
is denoted as
P[B | A]

and this is the conditional probability of B, given A.

Suppose that our sample space S is the population of adults in a small town who have
completed the requirements for a college degree. We shall categorize them according to gender
and employment status. The data are given in Table 2.1.

Table 2.1: Categorization of Adults in a Small Town


Employed Unemployed Total
Male 460 40 500
Female 140 260 400
Total 600 300 900

One of these individuals is to be selected at random for a tour throughout the country to
publicize the advantages of establishing new industries in the town. We shall be concerned
with the following events:

M : a man is chosen;

E : the person chosen is employed

The probability that a man is chosen at random is


P[M ] = 500/900 = 5/9

Using the reduced sample space E, we find that

P[M | E] = 460/600 = 23/30
In determining P[M ], we use the Total column since no additional information is given. In
P[M | E], the probability that the person randomly selected is a man when it is known that
the person is employed, the Employed column is used to compute the probability. In a similar
manner, we can compute P[E | M ] by using only the values in the Male row.

P[E | M ] = 460/500 = 23/25
500 25

Conditional Probability

The conditional probability of B given A is defined as

P[B | A] = P[B ∩ A] / P[A]          (2.7)


Let us refer to Table 2.1 to determine P[M |E].

P[M | E] = P[M ∩ E] / P[E]

where P[M ∩ E] and P[E] are found from the sample space S.

P[M ∩ E] = n(M ∩ E)/n(S) = 460/900
P[E] = n(E)/n(S) = 600/900

We note that in Table 2.1, n(M ∩ E) is the number of employed males. Hence,

P[M | E] = (460/900)/(600/900) = 460/600 = 23/30
as before.
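The same computation can be scripted. A small Python sketch (ours, not from the text)
reproduces the conditional probabilities of Table 2.1 via Equation (2.7), using exact fractions:

    # Conditional probabilities from the counts in Table 2.1, Eq. (2.7).
    from fractions import Fraction

    n_S = 900                        # total adults
    n_M, n_E, n_ME = 500, 600, 460   # males, employed, employed males

    P_M = Fraction(n_M, n_S)
    P_E = Fraction(n_E, n_S)
    P_ME = Fraction(n_ME, n_S)

    print(P_ME / P_E)   # P[M | E] = 23/30
    print(P_ME / P_M)   # P[E | M] = 23/25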
Example 2.7
The probability that a regularly scheduled flight departs on time is P[D] = 0.83; the proba-
bility that it arrives on time is P[A] = 0.82; and the probability that it departs and arrives
on time is P[D ∩ A] = 0.78. Find the probability that a plane

(a) arrives on time, given that it departed on time,


The probability that it arrives on time, knowing that it departed on time, is

P[A | D] = P[A ∩ D] / P[D] = 0.78/0.83 = 0.94

(b) departed on time, given that it has arrived on time.


The probability that a plane departed on time, having known that it arrived on time, is

P[D | A] = P[D ∩ A] / P[A] = 0.78/0.82 = 0.95

2.3.2. Product Rule


The conditional probability definition in Equation (2.7) can be rewritten to provide a formula
known as the multiplication rule for probabilities.

Multiplication Rule

P[B ∩ A] = P[B | A] P[A]          (2.8)

We can also write Equation (2.8) as

P[B ∩ A] = P[A | B] P[B]

since P[A ∩ B] = P[B ∩ A].


Example 2.8
The probability that the first stage of a numerically controlled machining operation for
high-rpm pistons meets specifications is 0.90. Failures are due to metal variations, fixture
alignment, cutting blade condition, vibration, and ambient environmental conditions. Given


that the first stage meets specifications, the probability that a second stage of machining
meets specifications is 0.95. What is the probability that both stages meet specifications?
Let E1 and E2 denote the events that the first and second stages meet specifications, re-
spectively. The probability requested is

P[E1 ∩ E2 ] = P[E2 | E1 ] P[E1 ] = 0.95 × 0.9 = 0.855

Although it is also true that P[E1 ∩ E2 ] = P[E1 | E2 ] P[E2 ], the information provided in the
problem does not match this second formulation.


The multiplication rule for probability can be extended to more than two events. The
formula below gives the probability for the intersection of three events.

Multiplication Rule (3 events)

P[A ∩ B ∩ C] = P[A] P[B | A] P[C | A ∩ B]          (2.9)

2.3.3. Independence

In some cases, the conditional probability P[B | A] might equal P[B]. In this special case,
knowledge that the outcome of the experiment is in event A does not affect the probability
that the outcome is in event B.
Example 2.9
Consider the inspection described in Example 2.4. Six parts are selected randomly from a
bin of 50 parts, but assume that the selected part is replaced before the next one is selected.
The bin contains 3 defective parts and 47 nondefective parts. What is the probability that
the second part is defective given that the first part is defective?
In shorthand notation, the requested probability is P[B | A], where A and B denote the
events that the first and second parts are defective, respectively. Because the first part is
replaced prior to selecting the second part, the bin still contains 50 parts, of which 3 are
defective. Therefore, the probability of B does not depend on whether or not the first part
is defective. That is,
P[B | A] = 3/50

Also, the probability that both parts are defective is

P[B ∩ A] = P[B | A] P[A] = (3/50) × (3/50) = 9/2500

The preceding example illustrates the following conclusions. In the special case that
P[B | A] = P[B], we obtain

P[A ∩ B] = P[B | A] P[A] = P[B] P[A]

and

P[A | B] = P[A ∩ B] / P[B] = P[B] P[A] / P[B] = P[A]

These conclusions lead to an important definition.


Definition 2.5
Two events are independent if any one of the following statements is true:

(1) P[B | A] = P[B]

(2) P[A | B] = P[A]

(3) P[A ∩ B] = P[A] P[B]

Furthermore, if A and B are independent events, the following pairs of events are also
independent:

(i) A′ and B
(ii) A and B′
(iii) A′ and B′

Consequently,

P[A′ ∩ B] = P[A′] P[B]
P[A ∩ B′] = P[A] P[B′]
P[A′ ∩ B′] = P[A′] P[B′]

The concept of independence is an important relationship between events and is used
throughout this handout. A mutually exclusive relationship between two events is based only
on the outcomes that compose the events. However, an independence relationship depends on
the probability model used for the random experiment. Often, independence is assumed to be
part of the random experiment that describes the physical system under study.
Example 2.10 (Series Circuit)
The following circuit operates only if there is a path of functional devices from left to right.
The probability that each device functions is shown on the graph. Assume that devices fail
independently. What is the probability that the circuit operates?

[Circuit diagram: two devices in series with reliabilities 0.8 and 0.9]
Let L and R denote the events that the left and right devices operate, respectively. There
is a path only if both operate. The probability that the circuit operates is

P[L ∩ R] = P[L] P[R] = 0.8(0.9) = 0.72

Example 2.11 (Parallel Circuit)


The following circuit operates only if there is a path of functional devices from left to right.
The probability that each device functions is shown on the graph. Assume that devices fail
independently. What is the probability that the circuit operates?

[Circuit diagram: two devices in parallel between nodes a and b, with reliabilities 0.8 (top) and 0.9 (bottom)]
Let T and B denote the events that the top and bottom devices operate, respectively. There
is a path if at least one device operates. The probability that the circuit operates is

P[T ∪ B] = P[T ] + P[B] − P[T ∩ B]
         = P[T ] + P[B] − P[T ] P[B]
         = 0.8 + 0.9 − 0.8(0.9) = 0.98

by virtue of independence.

DeMorgan’s law can also be used to compute P[T ∪ B].

P[T ∪ B] = 1 − P[(T ∪ B)′]          (Eq. 2.4)
         = 1 − P[T ′ ∩ B′]          (Eq. 1.3)
         = 1 − P[T ′] P[B′]         (Ind. Prop. 3)
         = 1 − (0.2)(0.1) = 1 − 0.02 = 0.98

When considering three or more events, we can extend the definition of independence with
the following general result.

Independence (Multiple Events)

The events E1 , E2 , . . . , Ek are independent if and only if

P[E1 ∩ E2 ∩ · · · ∩ Ek ] = P[E1 ] P[E2 ] · · · P[Ek ] (2.10)

Example 2.12 (Advanced Circuit)


The following circuit operates only if there is a path of functional devices from left to right.
The probability that each device functions is shown on the graph. Assume that devices fail
independently. What is the probability that the circuit operates?

[Circuit diagram: from node a to node b, three stages in series — three parallel devices (0.9 each), then two parallel devices (0.95 each), then a single device (0.99)]

The solution can be obtained from a partition of the graph into three columns. Let L =
L1 ∪ L2 ∪ L3 denote the event that there is a path of functional devices only through the
three units on the left.

P[L] = P[L1 ∪ L2 ∪ L3 ] = 1 − P[(L1 ∪ L2 ∪ L3 )′]
     = 1 − P[L1′ ∩ L2′ ∩ L3′ ]
     = 1 − P[L1′ ] P[L2′ ] P[L3′ ] = 1 − (0.1)³ = 0.999

Similarly, let M = M1 ∪ M2 denote the event that there is a path of functional devices only
through the two units in the middle.

P[M ] = P[M1 ∪ M2 ] = 1 − P[(M1 ∪ M2 )′] = 1 − P[M1′ ∩ M2′ ]
      = 1 − P[M1′ ] P[M2′ ] = 1 − (0.05)² = 0.9975

The probability that there is a path of functional devices only through the one unit on the
right is simply the probability that the device functions, namely, P[R] = 0.99. Therefore,
with the independence assumption used again, the solution is

P[L ∩ M ∩ R] = P[L] P[M ] P[R] = 0.999(0.9975)(0.99) = 0.987
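The series/parallel reasoning translates directly into code. A Python sketch (the helper
functions are ours, introduced only for illustration) reproduces this computation:

    # Reliability of independent devices composed in series and in parallel,
    # applied to the circuit of Example 2.12.
    from math import prod

    def series(*p):
        """All blocks must work: multiply reliabilities (independence)."""
        return prod(p)

    def parallel(*p):
        """At least one block works: complement of all blocks failing."""
        return 1 - prod(1 - pi for pi in p)

    left = parallel(0.9, 0.9, 0.9)      # 0.999
    middle = parallel(0.95, 0.95)       # 0.9975
    right = 0.99
    print(series(left, middle, right))  # ≈ 0.987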


2.4. Bayes’ Theorem


Sometimes the probability of an event is given under each of several conditions. With enough
of these conditional probabilities, the probability of the event can be recovered. For any event
B, we can write B as the union of the part of B in A and the part of B in A′. That is,

B = (B ∩ A) ∪ (B ∩ A′)

Because A and A′ are mutually exclusive, A ∩ B and A′ ∩ B are mutually exclusive. Therefore,
from the probability of the union of mutually exclusive events in Equation (2.3) and the
multiplication rule in Equation (2.8) the following total probability rule is obtained.

Total Probability Rule

P[B] = P[B | A] P[A] + P[B | A′] P[A′]          (2.11)

For example, in Table 2.1, the event M , that a man is chosen, is a union of two events: (1)
employed male E ∩ M , and (2) unemployed male E′ ∩ M . Thus,

P[M ] = P[E ∩ M ] + P[E′ ∩ M ] = P[M | E] P[E] + P[M | E′] P[E′]
      = (460/600)(600/900) + (40/300)(300/900) = 5/9
The result of the above computation can be verified easily from the table.
Example 2.13
One bag contains 4 white balls and 3 black balls, and a second bag contains 3 white balls
and 5 black balls. One ball is drawn from the first bag and placed unseen in the second
bag. What is the probability that a ball now drawn from the second bag is black?
Let B1 , B2 , and W1 represent, respectively, the drawing of a black ball from bag 1, a black
ball from bag 2, and a white ball from bag 1. We are interested in the union of the mutually
exclusive events B1 ∩ B2 and W1 ∩ B2 . The various possibilities and their probabilities are
illustrated in the probability tree diagram.
P[B2 ] = P[(B1 ∩ B2 ) ∪ (W1 ∩ B2 )]
       = P[B1 ∩ B2 ] + P[W1 ∩ B2 ]
       = P[B1 ] P[B2 | B1 ] + P[W1 ] P[B2 | W1 ]
       = (3/7)(6/9) + (4/7)(5/9) = 38/63

[Probability tree diagram: from Bag 1 (4W, 3B), draw B1 with probability 3/7 or W1 with
probability 4/7. After B1, Bag 2 holds 3W, 6B, so P[B2 | B1] = 6/9; after W1, Bag 2 holds
4W, 5B, so P[B2 | W1] = 5/9. Branch products: P[B1 ∩ B2] = (3/7)(6/9),
P[B1 ∩ W2] = (3/7)(3/9), P[W1 ∩ B2] = (4/7)(5/9), P[W1 ∩ W2] = (4/7)(4/9).]


The reasoning used to develop Equation (2.11) can be applied more generally.

Total Probability Rule (3 or more events)

If the events E1 , E2 , . . . , Ek are mutually exclusive and exhaustive of S, then

P[B] = P[B | E1 ] P[E1 ] + P[B | E2 ] P[E2 ] + · · · + P[B | Ek ] P[Ek ]          (2.12)

Example 2.14
Assume the following probabilities for product failure subject to levels of contamination in
a semiconductor manufacturing:

Probability of Failure Level of Contamination


0.10 High
0.01 Medium
0.001 Low

In a particular production run, 20% of the chips are subjected to high levels of contami-
nation, 30% to medium levels of contamination, and 50% to low levels of contamination.
What is the probability that a product using one of these chips fails?
We are given the following:

P[H] = 0.2, P[M ] = 0.3, P[L] = 0.5

where H, M and L are events representing High, Medium and Low levels of contamination.
If F is the event that a product using one of these chips fails,

P[F ] = P[F | H] P[H] + P[F | M ] P[M ] + P[F | L] P[L]
      = 0.1(0.2) + 0.01(0.3) + 0.001(0.5) = 0.0235

which can be interpreted as just the weighted average of the three probabilities of failure.
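The weighted-average view is easy to mechanize. A Python sketch (ours, not from the text)
applies the total probability rule, Equation (2.12), to this example:

    # Total probability rule for Example 2.14's contamination levels.
    priors = {"High": 0.2, "Medium": 0.3, "Low": 0.5}          # P[level]
    fail_given = {"High": 0.10, "Medium": 0.01, "Low": 0.001}  # P[F | level]

    p_fail = sum(fail_given[lvl] * priors[lvl] for lvl in priors)
    print(p_fail)   # 0.0235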

The examples in this chapter indicate that information is often presented in terms of con-
ditional probabilities. These conditional probabilities commonly provide the probability of an
event (such as failure) given a condition (such as high or low contamination). But after a
random experiment generates an outcome, we are naturally interested in the probability that
a condition was present (high contamination) given an outcome (a semiconductor failure).
Thomas Bayes addressed this essential question in the 1700s and developed the fundamental
result known as Bayes’ theorem.
From the definition of conditional probability,

P[A] P[B | A] = P[A ∩ B] = P[B ∩ A] = P[B] P[A | B]

for which we can write

P[A | B] = P[A] P[B | A] / P[B]          (2.13)

This is a useful result that enables us to solve for P[A | B] in terms of P[B | A].

If the events A1 , A2 , . . . , Ak are mutually exclusive and exhaustive of S, then for any event
Ai ,
P[Ai | B] = P[Ai ] P[B | Ai ] / P[B]
Expanding P[B] using Equation (2.12) we obtain the general result, which is known as Bayes’
theorem.


Bayes’ Theorem

P[Ai | B] = P[Ai ] P[B | Ai ] / (P[A1 ] P[B | A1 ] + P[A2 ] P[B | A2 ] + · · · + P[Ak ] P[B | Ak ])          (2.14)

Notice that the numerator always equals one of the terms in the sum in the denominator.
Example 2.15
Because a new medical procedure has been shown to be effective in the early detection of
an illness, a medical screening of the population is proposed. The probability that the test
correctly identifies someone with the illness as positive is 0.99, and the probability that the
test correctly identifies someone without the illness as negative is 0.95. The incidence of
the illness in the general population is 0.0001. You take the test, and the result is positive.
What is the probability that you have the illness?
Let I denote the event that a person has the illness, and let T denote the event that the
test signals positive. The event T | I is the event that the test indicates illness of an actually
ill person. This probability is P[T | I] = 0.99. The event T ′ | I ′ is the event that the test
indicates a well person who is indeed well, and this probability is P[T ′ | I ′] = 0.95. From
these, we determine the following conditional probabilities.

P[T ′ | I] = 1 − P[T | I] = 1 − 0.99 = 0.01
P[T | I ′] = 1 − P[T ′ | I ′] = 1 − 0.95 = 0.05

You took the test, and the result is positive, and you want to determine the probability
that you have the illness. This probability is

P[I | T ] = P[I ∩ T ] / P[T ]                                     (Eq. 2.7)
          = P[I] P[T | I] / P[T ]                                 (Eq. 2.8)
          = P[I] P[T | I] / (P[I] P[T | I] + P[I ′] P[T | I ′])   (Eq. 2.11)

The incidence of the illness in the general population is 0.0001. This is equivalent to P[I] =
0.0001 and P[I ′] = 1 − P[I] = 0.9999. Therefore,

P[I | T ] = 0.0001(0.99) / [0.0001(0.99) + 0.9999(0.05)] = 0.002

Practical Interpretation: The probability of your having the illness given a positive result from
the test is only 0.002. Surprisingly, even though the test is effective, in the sense that P[T | I] is
high and P[T | I ′] is low, because the incidence of the illness in the general population is low, the
chances are quite small that you actually have the disease even if the test is positive.
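A short Python sketch (ours, not from the text) reproduces the screening computation and
makes the roles of the three given probabilities explicit:

    # Bayes' theorem, Eq. (2.14), for the medical screening of Example 2.15.
    p_ill = 0.0001    # P[I], incidence in the population
    sens = 0.99       # P[T | I], test positive given ill
    spec = 0.95       # P[T' | I'], test negative given well

    p_pos = sens * p_ill + (1 - spec) * (1 - p_ill)   # P[T], total probability
    p_ill_given_pos = sens * p_ill / p_pos            # P[I | T]
    print(round(p_ill_given_pos, 4))                  # ≈ 0.002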

Example 2.16 (Bayesian Network)


Bayesian networks are used on the Web sites of high-technology manufacturers to allow
customers to quickly diagnose problems with products. Here is an oversimplified example.
A printer manufacturer obtained the following probabilities from a database of test results.
Printer failures are associated with three types of problems: hardware, software, and other
(such as connectors) with probabilities of 0.1, 0.6, and 0.3, respectively. The probability
of a printer failure given a hardware problem is 0.9, given a software problem is 0.2, and


given any other type of problem is 0.5. If a customer enters the manufacturer’s Web site to
diagnose a printer failure, what is the most likely cause of the problem?
Let A, B and C be events representing hardware, software, and other printer problems
respectively, where
P[A] = 0.1, P[B] = 0.6, P[C] = 0.3
If F is the event of a printer failure,

P[F | A] = 0.9,   P[F | B] = 0.2,   P[F | C] = 0.5

and,

P[F ] = P[A] P[F | A] + P[B] P[F | B] + P[C] P[F | C]
      = 0.1(0.9) + 0.6(0.2) + 0.3(0.5) = 0.36

Thus,

P[A | F ] = P[A] P[F | A] / P[F ] = 0.09/0.36 = 3/12
P[B | F ] = P[B] P[F | B] / P[F ] = 0.12/0.36 = 4/12
P[C | F ] = P[C] P[F | C] / P[F ] = 0.15/0.36 = 5/12

Therefore, problems other than hardware and software are the most likely cause of printer
failure.
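The posterior computation for all three causes can be written compactly. A Python sketch
(ours, not from the text):

    # Posterior probabilities over printer-problem causes, Example 2.16.
    priors = {"hardware": 0.1, "software": 0.6, "other": 0.3}
    fail_given = {"hardware": 0.9, "software": 0.2, "other": 0.5}

    p_fail = sum(fail_given[c] * priors[c] for c in priors)             # 0.36
    posterior = {c: fail_given[c] * priors[c] / p_fail for c in priors}
    print(max(posterior, key=posterior.get))   # 'other' is the most likely cause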

Exercises
Probability
2-1. An injection-molded part is equally likely to be obtained from any one of the eight
cavities on a mold.
a) What is the sample space?
b) What is the probability that a part is from cavity 1 or 2?
c) What is the probability that a part is from neither cavity 3 nor 4?

2-2. Samples of emissions from three suppliers are classified for conformance to air-quality
specifications. The results from 100 samples are summarized as follows:

                        Conforms
                        Yes   No
              1         22     8
  Supplier    2         25     5
              3         30    10

Let A denote the event that a sample is from supplier 1, and let B denote the event that a
sample conforms to specifications. If a sample is selected at random, determine the following
probabilities:
a) P[A]         b) P[B]         c) P[A′]
d) P[A ∩ B]     e) P[A ∪ B]     f) P[A′ ∩ B]

2-3. A box contains 500 envelopes, of which 75 contain $100 in cash, 150 contain $25, and
275 contain $10. An envelope may be purchased for $25. What is the sample space for the
different amounts of money? Assign probabilities to the sample points and then find the
probability that the first envelope purchased contains less than $100.

2-4. A pair of fair dice is tossed. Find the probability of getting
a) a total of 8;
b) at most a total of 5.

2-5. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and
a dictionary, what is the probability that
a) the dictionary is selected?
b) 2 novels and 1 book of poems are selected?

2-6. The probabilities that a service station will pump gas into 0, 1, 2, 3, 4, or 5 or more
cars during a certain 30-minute period are 0.03, 0.18, 0.24, 0.28, 0.10 and 0.17, respectively.
Find the probability that in this 30-minute period
a) more than 2 cars receive gas;
b) at most 4 cars receive gas.

2-7. A shipment of 12 television sets contains 3 defective sets. In how many ways can a
hotel purchase 5 of these sets and receive 2 of the defective sets?

Addition Rules
2-8. If P[A] = 0.3, P[B] = 0.2, and P[A ∩ B] = 0.1, determine the following probabilities:


a) P[A′]        b) P[A ∪ B]       c) P[A′ ∩ B]
d) P[A ∩ B′]    e) P[(A ∪ B)′]    f) P[A′ ∪ B]

2-9. If A, B and C are mutually exclusive events with P[A] = 0.2, P[B] = 0.3, and
P[C] = 0.4, determine the following probabilities:
a) P[A ∪ B ∪ C]       b) P[A ∩ B ∩ C]
c) P[A ∩ B]           d) P[(A ∪ B) ∩ C]
e) P[A′ ∩ B′ ∩ C′]

2-10. Consider the bar code in Example 1.9. Suppose that all 40 codes are equally likely
(none is held back as a delimiter). Determine the probability for each of the following:
a) The first bar is wide or the second bar is wide.
b) Neither the first nor the second bar is wide.
c) The first bar is wide or the second bar is not wide.
d) The first bar is wide or the first space is wide.

2-11. Interest centers around the life of an electronic component. Suppose it is known that
the probability that the component survives for more than 6000 hours is 0.42. Suppose also
that the probability that the component survives no longer than 4000 hours is 0.04.
a) What is the probability that the life of the component is less than or equal to 6000 hours?
b) What is the probability that the life is greater than 4000 hours?

2-12. Consider the situation of Exercise 2-11. Let A be the event that the component fails
a particular test and B be the event that the component displays strain but does not
actually fail. Event A occurs with probability 0.20, and event B occurs with probability 0.35.
a) What is the probability that the component does not fail the test?
b) What is the probability that the component works perfectly well (i.e., neither displays
strain nor fails the test)?
c) What is the probability that the component either fails or shows strain in the test?

Conditional Probability, Independent Events
2-13. A class in advanced physics is composed of 10 juniors, 30 seniors, and 10 graduate
students. The final grades show that 3 of the juniors, 10 of the seniors, and 5 of the graduate
students received an A for the course. If a student is chosen at random from this class and
is found to have earned an A, what is the probability that he or she is a senior?

2-14. The probability that a married man watches a certain television show is 0.4, and the
probability that a married woman watches the show is 0.5. The probability that a man
watches the show, given that his wife does, is 0.7. Find the probability that
a) a married couple watches the show;
b) a wife watches the show, given that her husband does;
c) at least one member of a married couple will watch the show.

2-15. A real estate agent has 8 master keys to open several new homes. Only 1 master key
will open any given house. If 40% of these homes are usually left unlocked, what is the
probability that the real estate agent can get into a specific home if the agent selects 3
master keys at random before leaving the office?

2-16. Consider the bar code in Example 1.9. Suppose that all 40 codes are equally likely
(none is held back as a delimiter). Determine the probability for each of the following:
a) The second bar is wide given that the first bar is wide.
b) The third bar is wide given that the first two bars are not wide.
c) The first bar is wide given that the last bar is wide.

2-17. A manufacturer of a flu vaccine is concerned about the quality of its flu serum.
Batches of serum are processed by three different departments having rejection rates of 0.10,
0.08, and 0.12, respectively. The inspections by the three departments are sequential and
independent.
a) What is the probability that a batch of serum survives the first departmental inspection
but is rejected by the second department?
b) What is the probability that a batch of serum is rejected by the third department?

2-18. Computer keyboard failures are due to faulty electrical connects (12%) or mechanical
defects (88%). Mechanical defects are related to loose keys (27%) or improper assembly
(73%). Electrical connect defects are caused by defective wires (35%), improper connections
(13%), or poorly welded wires (52%).
a) Find the probability that a failure is due to loose keys.
b) Find the probability that a failure is due to improperly connected or poorly welded wires.

2-19. Given that P[A] = 0.5 and P[B] = 0.4, determine P[A ∪ B] if A and B are
a) mutually exclusive;      b) independent.

2-20. Refer to the table of Exercise 2-2. Let A denote the event that a sample is from
supplier 1, and let B denote the event that a sample conforms to specifications.
a) Are events A and B independent?
b) Determine P[B | A].

2-21. In a test of a printed circuit board using a random test pattern, an array of 10 bits is
equally likely to be 0 or 1. Assume the bits are independent.
a) What is the probability that all bits are 1s?
b) What is the probability that all bits are 0s?
c) What is the probability that exactly 5 bits are 1s and 5 bits are 0s?

2-22. The following circuit operates if and only if there is a path of functional devices from
left to right. The probability that each device functions is as shown. Assume that the
probability that a device is functional does not depend on whether or not other devices are
functional. What is the probability that the circuit operates?


[Circuit diagram for Exercise 2-22: a top path of devices 0.9, 0.8, 0.7 in series (the 0.9
device is labeled A) and a bottom path of devices 0.95, 0.95, 0.95 in series, the two paths
in parallel between nodes a and b]

2-23. Consider the bar code in Example 1.9. Suppose that all 40 codes are equally likely
(none is held back as a delimiter). Let A denote the event that the first bar is wide and B
denote the event that the second bar is wide. Determine the following:
a) P[A]      b) P[B]      c) P[A ∩ B]
d) Are A and B independent events?

2-24. Assume that A and B are independent events. Show that the following pairs of
events are also independent.
a) A′, B      b) A, B′      c) A′, B′

Bayes’ Theorem
2-25. A recreational equipment supplier finds that among orders that include tents, 40%
also include sleeping mats. Only 5% of orders that do not include tents do include sleeping
mats. Also, 20% of orders include tents. Determine the following probabilities:
a) The order includes sleeping mats.
b) The order includes a tent given it includes sleeping mats.

2-26. The probabilities of poor print quality given no printer problem, misaligned paper,
high ink viscosity, or printer-head debris are 0.1, 0.3, 0.4, and 0.6, respectively. The
probabilities of no printer problem, misaligned paper, high ink viscosity, or printer-head
debris are 0.8, 0.02, 0.08, and 0.1, respectively.
a) Determine the probability of high ink viscosity given poor print quality.
b) Given poor print quality, what problem is most likely?

2-27. Police plan to enforce speed limits by using radar traps at four different locations
within the city limits. The radar traps at each of the locations L1 , L2 , L3 , and L4 will be
operated 40%, 30%, 20%, and 30% of the time. A person who is speeding on her way to
work has probabilities of 0.2, 0.1, 0.5, and 0.2, respectively, of passing through these
locations.
a) What is the probability that she will receive a speeding ticket?
b) If the person received a speeding ticket on her way to work, what is the probability that
she passed through the radar trap located at L2 ?

2-28. In the situation of Exercise 2-22, what is the probability that device A
a) does not work if the system does not work?
b) does not work if the system operates?

2-29. A paint-store chain produces and sells latex and semigloss paint. Based on long-range
sales, the probability that a customer will purchase latex paint is 0.75. Of those that
purchase latex paint, 60% also purchase rollers. But only 30% of semigloss paint buyers
purchase rollers. A randomly selected buyer purchases a roller and a can of paint. What is
the probability that the paint is latex?

2-30. Denote by A, B, and C the events that a grand prize is behind doors A, B, and C,
respectively. Suppose you randomly picked a door, say A. The game host opened a door,
say B, and showed there was no prize behind it. Now the host offers you the option of
either staying at the door that you picked (A) or switching to the remaining unopened door
(C). Use probability to explain whether you should switch or not.

3. Discrete Random Variables and Probability
Distributions
Learning Objectives
At the end of this chapter, you should be able to:
1. Understand random variables
2. Determine probabilities from probability mass functions and the reverse
3. Determine probabilities and probability mass functions from cumulative distribution
functions and the reverse
4. Calculate means and variances for discrete random variables
5. Understand the assumptions for some common discrete probability distributions
6. Select an appropriate discrete probability distribution to calculate probabilities in spe-
cific applications
7. Calculate probabilities and determine means and variances for some common discrete
probability distributions

3.1. Discrete Random Variables


Statistics is concerned with making inferences about populations and population character-
istics. Experiments are conducted with results that are subject to chance. The testing of a
number of electronic components is an example of a statistical experiment, a term that is
used to describe any process by which several chance observations are generated. It is often
important to allocate a numerical description to the outcome. For example, the sample space
giving a detailed description of each possible outcome when three electronic components are
tested may be written
S = {N N N, N N D, N DN, N DD, DN N, DN D, DDN, DDD}
where N denotes nondefective and D denotes defective. One is naturally concerned with the
number of defectives that occur. Thus, each point in the sample space will be assigned a
numerical value of 0, 1, 2, or 3. These values are, of course, random quantities determined by
the outcome of the experiment.
Definition 3.1
A random variable is a function that assigns a real number to each outcome in the
sample space of a random experiment.

We shall use an uppercase letter, say X, to denote a random variable and its corresponding
lowercase letter, x in this case, for one of its values. In the electronic component testing
illustration, if the random variable represents the number of defective items, we notice that
the random variable X assumes the value 2 for all outcomes of the event
E = {N DD, DN D, DDN }


and assumes the value 0 for the event E = {N N N }. In function notation,


X(N DD) = X(DN D) = X(DDN ) = 2
We can think of a random variable as a scheme for partitioning a sample space S into k (or
sometimes countably infinite) mutually exclusive events Ek exhaustive of S. In the electronic
component testing, these mutually exclusive and exhaustive events of S arising from the random
variable X are
E0 = {N N N }
E1 = {N N D, N DN, DN N }
E2 = {N DD, DN D, DDN }
E3 = {DDD}

Example 3.1
Two balls are drawn in succession without replacement from an urn containing 4 red balls
and 3 black balls. If the random variable Y is the number of red balls, then
Outcome y
RR 2
RB 1
BR 1
BB 0

Example 3.2
A stockroom clerk returns three safety helmets at random to three steel mill employees who
had previously checked them. If Smith, Jones, and Brown, in that order, receive one of the
three helmets, list the sample points for the possible orders of returning the helmets, and find
the value m of the random variable M that represents the number of correct matches.
If S, J, and B stand for Smith’s, Jones’s, and Brown’s helmets, respectively, then the possible
arrangements in which the helmets may be returned and the number of correct matches are

Outcome m
SJB 3
SBJ 1
JSB 1
JBS 0
BSJ 0
BJS 1

A random variable is called a discrete random variable if its set of possible values is
countable. When a random variable can take on values on a continuous scale, it is called a
continuous random variable.
In most practical problems, continuous random variables represent measured data, such as
all possible heights, weights, temperatures, distance, or life periods, whereas discrete random
variables represent count data, such as the number of defectives in a sample of k items, the
number of highway fatalities per year in a city, or the number of coin tosses until a tail turns
up.

3.2. Discrete Probability Distribution Functions


A discrete random variable assumes each of its values with a certain probability. In the
electronic testing illustration in Section 3.1, the probability that X assumes the value 2 is 3/8,
assuming the model of equally likely outcomes, and we write

P[X = 2] = 3/8
In Example 3.2, the probability for each value of the random variable M is summarized by
the table.
m            0     1     3
P[M = m]    1/3   1/2   1/6

Note that the values of m exhaust all possible cases and hence the probabilities add to 1.
Random variables are so important in random experiments that sometimes we essentially
ignore the original sample space of the experiment and focus on the probability distribution
of the random variable.
The probability distribution of a random variable X is a description of the probabilities
associated with the possible values of X. For a discrete random variable, the distribution is
often specified by just a list of the possible values along with the probability of each. In some
cases, it is convenient to express the probability in terms of a formula called the probability
mass function.
Definition 3.2
For a discrete random variable X with possible values x1 , x2 , . . . , xn , a probability
mass function is a function f (x) such that

(1) f (xi ) ≥ 0

(2) f (x1 ) + f (x2 ) + · · · + f (xn ) = 1

(3) f (xi ) = P[X = xi ]

In Example 3.2, we can write the probability mass function of the random variable M as

f (m) = 1/3   if m = 0
        1/2   if m = 1
        1/6   if m = 3
        0     otherwise

while the probability mass function for the electronic component testing illustration can be
expressed by the formula

f (x) = C(3, x)/8,    x = 0, 1, 2, 3

where C(3, x) is the number of ways to choose x of the 3 components.
Example 3.3
Let the random variable Y denote the number of semiconductor wafers that need to be
analyzed in order to detect a large particle of contamination. Assume that the probability
that a wafer contains a large particle is 0.01 and that the wafers are independent. Determine
the probability distribution of Y .
Let c denote a wafer in which a large particle is present, and let a denote a wafer in which
it is absent. The sample space of the experiment is infinite, and it can be represented as all
possible sequences that start with a string of a’s and end with c. That is,
S = {c, ac, aac, aaac, aaaac, . . .}
Consider a few cases. We have P[Y = 1] = P[{c}] = 0.01. Also, using the independence
assumption, P[Y = 2] = P[{ac}] = 0.99(0.01) = 0.0099. A general formula is

P[Y = y] = P[{aa . . . ac}] = (0.99)^(y−1) (0.01),    y = 1, 2, 3, . . .

where the outcome consists of (y − 1) a’s followed by a single c.
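A numerical check of this formula is immediate. The Python sketch below (ours, not from
the text) evaluates the pmf and shows that its probabilities sum to (nearly) 1:

    # The pmf of Y in Example 3.3 and a numerical check of Property (2).
    def f(y, p=0.01):
        """P[Y = y]: (y - 1) clean wafers followed by one contaminated wafer."""
        return (1 - p) ** (y - 1) * p

    print(f(1), f(2))                          # 0.01, 0.0099
    print(sum(f(y) for y in range(1, 2000)))   # ≈ 1 (a geometric series)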

We can see that Property (1) is satisfied by the random variables M , X and Y of Exam-
ple 3.2, the electronic component testing, and Example 3.3, respectively. Property (2) can be
easily verified for M and X and is left as an exercise for the random variable Y .
It is often helpful to look at a probability distribution in graphic form. One might plot the
points (m, f (m)) of Example 3.2 to obtain Figure 3.1. By joining the points to the horizontal
axis, we obtain a probability mass function plot. Figure 3.1 makes it easy to see what values
of m are most likely to occur.
Instead of plotting the points (m, f (m)), we more frequently construct rectangles, as in
Figure 3.2. Here the rectangles are constructed so that their bases of equal width are centered
at each value m and their heights are equal to the corresponding probabilities given by f (m).
The bases are constructed so as to leave no space between the rectangles. Figure 3.2 is called
a probability histogram.
Since each base in Figure 3.2 has unit width, P[M = m] is equal to the area of the rectangle
centered at m. Even if the bases were not of unit width, we could adjust the heights of the
rectangles to give areas that would still equal the probabilities of M assuming any of its values
m. This concept of using areas to represent probabilities is necessary for our consideration of
the probability distribution of a continuous random variable.
The probability histogram of Example 3.1 is shown in Figure 3.3.

[Figure 3.1: Probability mass function plot of f (m) versus m]

[Figure 3.2: Probability histogram of f (m) versus m]


[Figure 3.3: Probability histogram for Example 3.1 — f (x) versus x for x = 0, 1, 2]

3.3. Cumulative Distribution Functions


An alternate method for describing a random variable’s probability distribution is with cumu-
lative probabilities such as P[X ≤ x]. Furthermore, cumulative probabilities can be used to
find the probability mass function of a discrete random variable.
In general, for any discrete random variable X with possible values x1 , x2 , . . ., the events
X = x1 , X = x2 , . . . are mutually exclusive. Therefore,
P[X ≤ x] = Σ_{xi ≤ x} P[X = xi ]

This leads to the following definition.


Definition 3.3
The cumulative distribution function of a discrete random variable X, denoted as
F (x), is
F (x) = P[X ≤ x] = Σ_{xi ≤ x} P[X = xi ]          (3.1)

The cumulative distribution function F (x) of a discrete random variable satisfies the follow-
ing properties:
(1) F (x) = Σ_{xi ≤ x} f (xi )

(2) 0 ≤ F (x) ≤ 1

(3) If a ≤ b then F (a) ≤ F (b)
Even if a random variable can assume only integer values, the cumulative distribution func-
tion is defined at noninteger values. In Example 3.2,
P[M ≤ 1.5] = P[M = 0] + P[M = 1] = 1/3 + 1/2 = 5/6
The cumulative distribution function F (x) is piecewise constant on the interval xi ≤ x <
xi+1 . This characteristic gives the graph an appearance similar to staircase steps. The graph
of the cumulative distribution function of Example 3.2, which appears in Figure 3.4, is called
a step function. The function notation for this cumulative distribution function is

F (m) = 0     if m < 0
        1/3   if 0 ≤ m < 1
        5/6   if 1 ≤ m < 3
        1     if 3 ≤ m



[Figure 3.4: Cumulative distribution function for Example 3.2 — a step function in F (m) versus m]

Example 3.4
Determine the probability mass function of X from the cumulative distribution function:

F (x) = 0     if x < −2
        0.2   if −2 ≤ x < 0
        0.7   if 0 ≤ x < 2
        1     if 2 ≤ x

The domain of the probability mass function consists of the included endpoints of each
interval, xi = −2, 0, 2. The value of f (x) at each xi is determined by f (xi ) = F (xi ) − F (xi−1 )
for i = 2, 3, and f (x1 ) is taken to be equal to F (x1 ).

f (x1 ) = f (−2) = F (−2) = 0.2
f (x2 ) = f (0) = F (x2 ) − F (x1 ) = F (0) − F (−2) = 0.7 − 0.2 = 0.5
f (x3 ) = f (2) = F (x3 ) − F (x2 ) = F (2) − F (0) = 1 − 0.7 = 0.3

Therefore,

f (x) = 0.2   if x = −2
        0.5   if x = 0
        0.3   if x = 2

f (xi ) is the difference between values of F (x) at consecutive subintervals, and the xi ’s are the left
endpoints of each subinterval.
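This jump-size procedure is easy to mechanize. A Python sketch (ours, not from the text)
recovers the pmf of Example 3.4 from the cdf's step points:

    # Recovering the pmf from the jumps of the cdf at its step points.
    steps = [(-2, 0.2), (0, 0.7), (2, 1.0)]   # (x_i, F(x_i)) at each step

    pmf, prev = {}, 0.0
    for x, F in steps:
        pmf[x] = round(F - prev, 10)          # the jump of F at x equals f(x)
        prev = F

    print(pmf)   # {-2: 0.2, 0: 0.5, 2: 0.3}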

3.4. Mean and Variance of a Random Variable


Consider the following. If two coins are tossed 16 times and X is the number of heads that
occur per toss, then the values of X are 0, 1, and 2. Suppose that the experiment yields no
heads, one head, and two heads a total of 4, 7, and 5 times, respectively. The average number
of heads per toss of the two coins is then

[0(4) + 1(7) + 2(5)] / 16 = 1.0625
This is an average value of the data and yet it is not a possible outcome of {0, 1, 2}. Hence,
an average is not necessarily a possible outcome for the experiment.
Let us now restructure our computation for the average number of heads so as to have the
following equivalent form:
0(4/16) + 1(7/16) + 2(5/16) = 1.0625


The numbers 4/16, 7/16 and 5/16 are the fractions of the total tosses resulting in 0, 1, and 2 heads,
respectively. These fractions are also the relative frequencies for the different values of X in
our experiment. In fact, then, we can calculate the mean, or average, of a set of data by
knowing the distinct values that occur and their relative frequencies, without any knowledge
of the total number of observations in our set of data. Therefore, if 4/16 of the tosses result in
no heads, 7/16 of the tosses result in one head, and 5/16 of the tosses result in two heads, the mean
number of heads per toss would be 1.0625 no matter what the total number of tosses were.
This method of relative frequencies is used to calculate the average number of heads per
toss of two coins that we might expect in the long run. We shall refer to this average value as
the mean of the random variable X.
Definition 3.4
The mean or expected value of the discrete random variable X with probability mass
function f (x), denoted as µX or E[X], is

µX = E[X] = Σ_{all x} x f (x)          (3.2)

We shall write µX simply as µ when it is clear to which random variable we refer.


For the random variable M in Example 3.2, its mean or expected value is

µ = E[M ] = 0(1/3) + 1(1/2) + 3(1/6) = 1
3 2 6
On the average, one of the three mill employees will receive his helmet and the other two will
receive helmets switched between them.
Example 3.5
A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good
components and 3 defective components. A sample of 3 is taken by the inspector. Find the
expected value of the number of good components in this sample.
Let X represent the number of good components in the sample. The mutually exclusive
events are X = 0, X = 1, X = 2 and X = 3 exhaustive of S.
x       number of ways*   f (x)    x f (x)
0        1                1/35     0
1       12                12/35    12/35
2       18                18/35    36/35
3        4                 4/35    12/35
Total   35                1        60/35

The inspector can expect 60/35 = 12/7 ≈ 1.7 good components on the average.

* see Example 1.12

Example 3.6
A salesperson for a medical device company has two appointments on a given day. At the
first appointment, he believes that he has a 70% chance to make the deal, from which he
can earn $1000 commission if successful. On the other hand, he thinks he only has a 40%
chance to make the deal at the second appointment, from which, if successful, he can make
$1500. What is his expected commission based on his own probability belief? Assume that
the appointment results are independent of each other.
Let Y denote the total commission of the salesperson in the appointments. The table below
summarizes his total commission and the associated probabilities in parentheses.


                      Second: 1500 (0.4)   Second: 0 (0.6)
First: 1000 (0.7)     2500 (0.28)          1000 (0.42)
First: 0 (0.3)        1500 (0.12)          0 (0.18)

His expected commission is
His expected commission is
µ = E[Y ] = 2500(0.28) + 1000(0.42) + 1500(0.12) + 0(0.18) = 1300

Examples 3.5 and 3.6 are designed to allow the reader to gain some insight into what we
mean by the expected value of a random variable.
The mean, or expected value, of a random variable X is of special importance in statistics
because it describes where the probability distribution is centered. By itself, however, the
mean does not give an adequate description of the shape of the distribution. We also need to
characterize the variability in the distribution. The most important measure of variability of
a random variable X is the variance.
Definition 3.5
Let X be a discrete random variable with probability mass function f (x) and mean
µ = E[X]. The variance of X, denoted σX² or V[X], is

σX² = V[X] = E[(X − µ)²] = Σ_{all x} (x − µ)² f (x)          (3.3)

We shall write the variance simply as σ² when it is clear to which random variable we refer.
Computing σ² can be quite tedious using the definition. We derive another expression for
the variance that will be easier to apply.

σX² = V[X] = Σ (x − µ)² f (x) = Σ (x² − 2xµ + µ²) f (x)
    = Σ x²f (x) − Σ 2xµf (x) + Σ µ²f (x)
    = Σ x²f (x) − 2µ Σ xf (x) + µ² Σ f (x)
    = Σ x²f (x) − 2µ(µ) + µ²(1)
    = Σ x²f (x) − µ²

The alternate form of the variance of the random variable X is

σX² = V[X] = Σ_{all x} x²f (x) − µ²          (3.4)

Example 3.7
Compute the variance of the random variable M of Example 3.2.
We show the computation for µ and σ² via a table.

x       f (x)   x f (x)   x²   x²f (x)
0       1/3     0         0    0
1       1/2     1/2       1    1/2
3       1/6     1/2       9    3/2
Total           1              2

We find µ = Σ xf (x) = 1 and σ² = Σ x²f (x) − µ² = 2 − (1)² = 1.
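A Python sketch (ours, not from the text) repeats the table computation with exact
fractions, using Equations (3.2) and (3.4):

    # Mean and variance of M (Example 3.2) via Eqs. (3.2) and (3.4).
    from fractions import Fraction as F

    pmf = {0: F(1, 3), 1: F(1, 2), 3: F(1, 6)}

    mu = sum(x * p for x, p in pmf.items())               # Eq. (3.2)
    var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # Eq. (3.4)
    print(mu, var)   # 1 1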

The variance of Example 3.5 is found to be

σ² = Σ x²f (x) − µ² = 0(1/35) + 1(12/35) + 4(18/35) + 9(4/35) − (12/7)² = 24/49


It should be noted that the unit of σ² is the square of the unit of the random variable X.
The unit of the variance makes the interpretation difficult. A measure of variability that has
the same unit as the random variable is the standard deviation.
Definition 3.6
The standard deviation σX of the random variable X is defined as
σX = √(σX²) = √(V[X])          (3.5)

Chebyshev’s Theorem
We stated that the variance of a random variable tells us something about the variability
of the observations about the mean. If a random variable has a small variance or standard
deviation, we would expect most of the values to be grouped around the mean. Therefore, the
probability that the random variable assumes a value within a certain interval about the mean
is greater than for a similar random variable with a larger standard deviation. If we think of
probability in terms of area, we would expect a distribution with a large value of σ to indicate
a greater variability, and therefore we should expect the area to be more spread out from µ,
and a distribution with a small standard deviation should have most of its area close to µ.
The following theorem, due to the Russian mathematician P. L. Chebyshev, gives a conser-
vative estimate of the probability that a random variable assumes a value within k standard
deviations of its mean for any real number k.

Chebyshev’s Theorem

The probability that any random variable X will assume a value within k standard
deviations of the mean is at least 1 − 1/k². That is,

P[µ − kσ < X < µ + kσ] ≥ 1 − 1/k²          (3.6)

For k = 2, the theorem states that the random variable X has a probability of at least
1 − 1/4 = 3/4 of falling within two standard deviations of the mean. That is, three-fourths or
more of the observations of any distribution lie in the interval µ − 2σ < x < µ + 2σ.
With some manipulation, Equation (3.6) can be reformulated as

P[|X − µ| ≥ kσ] ≤ 1/k²          (3.7)
Example 3.8
A random variable X has a mean µ = 8, a variance σ² = 9, and an unknown probability
distribution. Find:

(a) P[−4 < X < 20]


We have µ − kσ = −4 and µ + kσ = 20 for µ = 8 and σ = 3, which gives k = 4. Therefore,

P[−4 < X < 20] ≥ 1 − 1/4² = 15/16

(b) P[|X − 8| ≥ 6]
Using Equation (3.7), we have kσ = 6 for which k = 2. Therefore,

P[|X − 8| ≥ 2(3)] ≤ 1/2² = 1/4


Chebyshev’s theorem holds for any distribution of observations, and for this reason the
results are usually weak. The value given by the theorem is a lower bound only. That is,
we know that the probability of a random variable falling within two standard deviations
of the mean can be no less than 3/4, but we never know how much more it might actually
be. Only when the probability distribution is known can we determine exact probabilities.
For this reason we call the theorem a distribution-free result. When specific distributions are
assumed, as in later sections and future chapters, the results will be less conservative. The
use of Chebyshev’s theorem is relegated to situations where the form of the distribution is
unknown.

Function of a Random Variable


The variance of a random variable X can be considered to be the expected value of a specific
function of X, namely g(X) = (X − µ)². In general, the expected value of any function g(X)
is defined in a similar manner.
Definition 3.7
If X is a discrete random variable with probability mass function f (x), and g is a
function of X, the expected value µg(X) and variance σ²g(X) of the random variable g(X)
are defined as

µg(X) = E[g(X)] = Σ_{all x} g(x) f (x)          (3.8)

σ²g(X) = V[g(X)] = Σ_{all x} (g(x) − µg(X))² f (x)          (3.9)

Example 3.9
Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and
5:00 P.M. on any sunny Friday has the following probability distribution:
x       4      5      6     7     8     9
f (x)   1/12   1/12   1/4   1/4   1/6   1/6

Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the
manager.

(a) Find P[g(X) ≤ 12].


We simplify the inequality by solving for the random variable X.

g(X) ≤ 12 ⇐⇒ 2X − 1 ≤ 12 ⇐⇒ 2X ≤ 13 ⇐⇒ X ≤ 13/2

Hence,
P[g(X) ≤ 12] = P[X ≤ 13/2] = P[X ≤ 6.5]
             = P[X = 4] + P[X = 5] + P[X = 6]
             = 1/12 + 1/12 + 1/4 = 5/12

The probability that the attendant receives no more than $12 pay is 5/12.

(b) Find the expected pay of the attendant.


The average pay of the attendant is E[g(X)].

E[g(X)] = Σ_{all x} g(x)f (x) = Σ_{x=4}^{9} (2x − 1)f (x)
        = 7(1/12) + 9(1/12) + 11(1/4) + 13(1/4) + 15(1/6) + 17(1/6) = $12.67
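A Python sketch (ours, not from the text) evaluates E[g(X)] for this example directly
from Equation (3.8):

    # Expected pay E[g(X)] with g(X) = 2X - 1, Example 3.9.
    from fractions import Fraction as F

    pmf = {4: F(1, 12), 5: F(1, 12), 6: F(1, 4), 7: F(1, 4), 8: F(1, 6), 9: F(1, 6)}

    def g(x):
        return 2 * x - 1

    expected_pay = sum(g(x) * p for x, p in pmf.items())
    print(float(expected_pay))   # ≈ 12.67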

A function of X that is of particular interest and will be useful in a later chapter is the linear
function g(X) = aX + b for some constants a and b.

Linear Function of X
If X is a discrete random variable with probability mass function f (x), mean µX , and
variance σX², then the mean and variance of the linear function g(X) = aX + b for some
constants a and b are

E[g(X)] = E[aX + b] = aE[X] + b          (3.10)
V[g(X)] = V[aX + b] = a²V[X]          (3.11)

3.5. Discrete Uniform Distribution


The simplest discrete random variable is one that assumes only a finite number of possible
values, each with equal probability. A random variable X that assumes each of the values
x1 , x2 , . . . , xn with equal probability 1/n is frequently of interest.
Definition 3.8
A discrete uniform random variable X has a discrete uniform distribution if each
of the n possible values x1 , x2 , . . . , xn has probability mass function f (x) = 1/n.

We state here the mean and variance of a particular case of a discrete uniform distribution.

Discrete Uniform Distribution


Suppose that X is a discrete uniform variable that takes on consecutive integer values
a, a + 1, a + 2, . . . , b. The mean and variance of X are

µ = E[X] = (a + b)/2          (3.12)
σ² = V[X] = [(b − a + 1)² − 1]/12          (3.13)

3.6. Binomial Distribution


An experiment often consists of repeated trials, each with two possible outcomes that may be
labeled success or failure. The most obvious application deals with the testing of items as they
come off an assembly line, where each trial may indicate a defective or a nondefective item.
Each trial is called a Bernoulli trial. It is usually assumed that the trials that constitute
the random experiment are independent. This implies that the outcome from one trial has no
effect on the outcome to be obtained from any other trial. Furthermore, it is often reasonable
to assume that the probability of a success in each trial is constant. The process is referred to
as a Bernoulli process.

39
3. Discrete Random Variables and Probability Distributions

The Bernoulli Process


Strictly speaking, the Bernoulli process must possess the following properties:
1. The experiment consists of repeated trials.
2. The repeated trials are independent.
3. Each trial results in an outcome that may be classified as a success or a failure.
4. The probability of success, denoted by p, remains constant from trial to trial.
Consider the set of Bernoulli trials where three items are selected at random from a man-
ufacturing process, inspected, and classified as defective or nondefective. A defective item is
designated a success. The number of successes is a random variable X assuming integral values
from 0 through 3. If we assume that the process produces 25% defectives, we have p = 0.25.
Since the items are selected independently, we have

P[{N DN }] = P[{N }] P[{D}] P[{N }] = (3/4)(1/4)(3/4) = 9/64

Similar calculations yield the probabilities for the other possible outcomes {DN N } and {N N D}.
Thus,

P[X = 1] = 3(9/64) = 27/64

Similar calculations can be done for the cases X = 0, X = 2 and X = 3. Therefore, the
probability distribution of X is

x       0       1       2      3
f (x)   27/64   27/64   9/64   1/64

Binomial Distribution
Many processes can be thought of as consisting of a sequence of Bernoulli trials, such as, for
example, the repeated tossing of a coin or the repeated examination of objects to determine
whether or not they are defective. In such cases, a random variable of interest is the number
of successes obtained within a fixed number of trials n, where a success is defined in an
appropriate manner. Such a random variable is called a binomial random variable.
Definition 3.9
If the binomial random variable X is the number x of trials that result in a success in
a Bernoulli process having n trials, the probability mass function of X is
f (x) = C(n, x) p^x (1 − p)^(n−x),    x = 0, 1, 2, . . . , n          (3.14)

The binomial probability mass function is shown in Figure 3.5 with n = 10 and p = 0.35.
The binomial random variable is probably the most important of all discrete probability
distributions. Its probability distribution is called a binomial distribution.

Binomial Distribution

The mean µ and variance σ² of the binomial random variable X with parameters n and
p, the number of trials and the probability of a success, respectively, are

µ = E[X] = np          (3.15)
σ² = V[X] = np(1 − p)          (3.16)


[Figure 3.5: Binomial probability mass function with n = 10, p = 0.35]

The proof of Equations (3.15) and (3.16) can be found in the appendix.
Example 3.10
Each sample of water has a 10% chance of containing a particular organic pollutant. Assume
that the samples are independent with regard to the presence of the pollutant.

(a) Find the probability that in the next 18 samples, exactly 2 contain the pollutant.
Let X be the number of samples that contain the pollutant in the next 18 samples analyzed.
X is a binomial random variable with p = 0.1 and n = 18. Therefore,

P[X = 2] = C(18, 2) (0.1)² (0.9)^16 = 0.2835

(b) Find the probability that 3 to 5 of the 20 samples contain the pollutant.
The required probability is P[3 ≤ X ≤ 5].

P[3 ≤ X ≤ 5] = P[X = 3] + P[X = 4] + P[X = 5]
             = C(20, 3)(0.1)³(0.9)^17 + C(20, 4)(0.1)⁴(0.9)^16 + C(20, 5)(0.1)⁵(0.9)^15
             = 0.3118

(c) Find the mean and standard deviation of the number of pollutants in 16 samples.

µ = np = 16(0.1) = 1.6
σ = √(σ²) = √(np(1 − p)) = √(16(0.1)(0.9)) = 1.2
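These binomial computations need nothing beyond the standard library. A Python sketch
(ours, not from the text):

    # Binomial probabilities for Example 3.10, Eq. (3.14).
    from math import comb

    def binom_pmf(x, n, p):
        """P[X = x] for a binomial random variable."""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    print(round(binom_pmf(2, 18, 0.1), 4))                          # 0.2835
    print(round(sum(binom_pmf(x, 20, 0.1) for x in (3, 4, 5)), 4))  # 0.3118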

3.7. Poisson Distribution


Experiments yielding numerical values of a random variable X, the number of outcomes oc-
curring during a given time interval or in a specified region, are called Poisson experiments.
The given time interval may be of any length, such as a minute, a day, a week, a month, or
even a year. For example, a Poisson experiment can generate observations for the random
variable X representing the number of telephone calls received per hour by an office. The
specified region could be a line segment, an area, a volume, or perhaps a piece of material. In
such instances, X might represent the number of field mice per acre, the number of bacteria
in a given culture, or the number of typing errors per page. A Poisson experiment is derived

41
3. Discrete Random Variables and Probability Distributions

from the Poisson process and possesses the following properties.

Poisson Process
1. The number of outcomes occurring in one time interval or specified region of space is
independent of the number that occur in any other disjoint time interval or region. In
this sense we say that the Poisson process has no memory.
2. The probability that a single outcome will occur during a very short time interval or in
a small region is proportional to the length of the time interval or the size of the region
and does not depend on the number of outcomes occurring outside this time interval or
region.
3. The probability that more than one outcome will occur in such a short time interval or
fall in such a small region is negligible.

Poisson Distribution
The number X of outcomes occurring during a Poisson experiment is called a Poisson ran-
dom variable, and its probability distribution is called the Poisson distribution.
Definition 3.10
The probability mass function of the Poisson random variable X, representing the num-
ber of outcomes occurring in a given time interval or specified region denoted by t, is

f (x) = e^(−λt) (λt)^x / x!,    x = 0, 1, 2, . . .          (3.17)

where λ is the average number of outcomes per unit time, distance, area, or volume.

The probability distribution of the Poisson random variable X with λt = 2.8 is shown in
Figure 3.6.

[Figure 3.6: Poisson probability mass function with λt = 2.8]

Example 3.11
Ten is the average number of oil tankers arriving each day at a certain port. The facilities
at the port can handle at most 15 tankers per day.

(a) What is the probability of finding 8 oil tankers on a given day?


We are given λ = 10 (oil tankers per day), so we take t = 1 (day).

f (8) = e^(−10) (10)⁸ / 8! = 0.1126


(b) What is the probability that on a given day tankers have to be turned away?
Tankers will be turned away if the number of tankers exceeds the port’s capacity of 15. Thus,
the probability we seek is P[X > 15].

P[X > 15] = 1 − P[X ≤ 15]
          = 1 − Σ_{x=0}^{15} e^(−10) 10^x / x!
          = 1 − 0.9513 = 0.0487

Poisson Distribution
The mean and variance of the Poisson random variable are

µ = λt          (3.18)
σ² = λt          (3.19)

The proof can be found in the appendix.
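A Python sketch (ours, not from the text) reproduces the tanker computations of
Example 3.11 from Equation (3.17):

    # Poisson probabilities, Eq. (3.17), with mean λt = 10.
    from math import exp, factorial

    def poisson_pmf(x, mean):
        """P[X = x] for a Poisson random variable with mean λt."""
        return exp(-mean) * mean**x / factorial(x)

    print(round(poisson_pmf(8, 10), 4))                       # 0.1126
    p_at_most_15 = sum(poisson_pmf(x, 10) for x in range(16))
    print(round(1 - p_at_most_15, 4))                         # 0.0487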

3.8. Other Distributions (Optional)


3.8.1. Hypergeometric Distribution
The simplest way to view the distinction between the binomial distribution of Section 3.6
and the hypergeometric distribution is to note the way the sampling is done. The types of
applications for the hypergeometric are very similar to those for the binomial distribution.
We are interested in computing probabilities for the number of observations that fall into a
particular category. But in the case of the binomial distribution, independence among trials is
required. As a result, if that distribution is applied to, say, sampling from a lot of items (deck
of cards, batch of production items), the sampling must be done with replacement of each
item after it is observed to maintain a constant probability of a success. On the other hand,
the hypergeometric distribution does not require independence and is based on sampling done
without replacement.
In general, we are interested in the probability of selecting x successes from the k items
labeled successes and n − x failures from the N − k items labeled failures when a random
sample of size n is selected from N items. The characteristics of the hypergeometric random
variable are summarized below.
Definition 3.11
The hypergeometric random variable X, the number of successes in a random sample
of size n without replacement selected from N items of which k are labeled success and
N − k labeled failure, has the probability mass function
f (x) = C(k, x) C(N − k, n − x) / C(N, n),    x = max(0, n + k − N ), . . . , min(k, n)          (3.20)

The mean and variance of the hypergeometric random variable are

µ = np          (3.21)
σ² = np(1 − p)(N − n)/(N − 1)          (3.22)

where p = k/N.


Example 3.12
A particular part that is used as an injection device is sold in lots of 40. The lot is deemed
unacceptable if it contains 3 or more defectives. A sampling procedure is to select 5 components
at random and reject the lot if a defective part is found. Comment on the utility of this plan.
We first examine the utility of the plan when the lot is considered acceptable, that is, it
contains fewer than 3 defective parts. When k = 0, the sampling procedure will find the 5
components to be in good condition, so the lot is not rejected. If k = 1 defective part
(and N − k = 39 good parts), the probability that the sampling procedure will not reject
the lot, that is, x = 0, is

P[X = 0] = C(1, 0) C(39, 5) / C(40, 5) = 0.875

So there is a 1 − 0.875 = 12.5% probability that the lot will be rejected even though it is deemed
acceptable.
When k = 2, the probability of rejecting an acceptable lot is

1 − C(2, 0) C(38, 5) / C(40, 5) = 0.2372

The plan is not desirable because of the moderate probability of rejecting an acceptable lot.
We now examine the utility of the plan when the lot is deemed unacceptable. We take the
case k = 3.

P[X = 0] = C(3, 0) C(37, 5) / C(40, 5) = 0.6624
Under this scheme, 66.24% of the lots will not be rejected when they are actually unaccept-
able. Once again, the plan is not desirable.
The probability of not rejecting an unacceptable lot decreases as k increases, but it remains high for lots that are unacceptable in the first place.
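The hypergeometric probabilities above can be checked with a few lines of Python (a sketch assuming SciPy; note that scipy.stats.hypergeom takes the population size, the number of successes in the population, and the sample size, in that order):

    # Sketch: checking the sampling-plan probabilities of Example 3.12.
    from scipy.stats import hypergeom

    N, n = 40, 5  # lot size and sample size

    # k = 1 defective: probability the sample contains no defective (lot accepted)
    print(hypergeom.pmf(0, N, 1, n))      # ≈ 0.875

    # k = 2 defectives: probability the acceptable lot is rejected
    print(1 - hypergeom.pmf(0, N, 2, n))  # ≈ 0.2372

    # k = 3 defectives: probability the unacceptable lot is NOT rejected
    print(hypergeom.pmf(0, N, 3, n))      # ≈ 0.6624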

3.8.2. Negative Binomial and Geometric Distributions

Let us consider an experiment where the properties are the same as those listed for a binomial
experiment, with the exception that the trials will be repeated until a fixed number of successes
occur. Therefore, instead of the probability of x successes in n trials, where n is fixed, we are
now interested in the probability that the k th success occurs on the xth trial. Experiments of
this kind are called negative binomial experiments.
The number X of trials required to produce k successes in a negative binomial experiment
is called a negative binomial random variable and its probability distribution is called
the negative binomial distribution.
Definition 3.12
The probability mass function of the negative binomial random variable X, the number
of the trial on which the k th success occurs is

f(x) = C(x − 1, k − 1) p^k (1 − p)^{x−k},   x = k, k + 1, k + 2, . . .    (3.23)

where p is the probability of a success, constant from trial to trial.


The mean and variance of the negative binomial random variable are
µ = k/p    (3.24)
σ² = k(1 − p)/p²    (3.25)

Example 3.13
In an NBA (National Basketball Association) championship series, the team that wins four
games out of seven is the winner. Suppose that teams A and B face each other in the
championship games and that team A has probability 0.55 of winning a game over team B.

(a) What is the probability that team A will win the series in 6 games?
A wins the series by claiming its fourth win (k = 4) in the sixth game (x = 6).

P[X = 6] = C(5, 3) (0.55)^4 (0.45)^2 = 0.1583

(b) What is the probability that team A will win the series?
A will win the series by claiming its fourth win in 4, 5, 6 or 7 games (x = 4, 5, 6, 7).
P[4 ≤ X ≤ 7] = Σ_{x=4}^{7} C(x − 1, 3) (0.55)^4 (0.45)^{x−4} = 0.6083
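A numerical check (a Python sketch assuming SciPy; scipy.stats.nbinom counts the failures before the k-th success, so its argument is x − k rather than x):

    # Sketch: checking Example 3.13 with SciPy's negative binomial.
    from scipy.stats import nbinom

    k, p = 4, 0.55  # wins needed and per-game win probability

    # (a) series won in exactly 6 games -> 6 - 4 = 2 losses before the 4th win
    print(nbinom.pmf(6 - k, k, p))                            # ≈ 0.1583

    # (b) series won in 4 to 7 games -> at most 3 losses before the 4th win
    print(sum(nbinom.pmf(x - k, k, p) for x in range(4, 8)))  # ≈ 0.6083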

A special case of the negative binomial distribution where k = 1 is a probability distribution
for the number of trials required for a single, or first, success. The negative binomial
probability mass function reduces to the form

f(x) = p(1 − p)^{x−1},   x = 1, 2, 3, . . .
The successive terms of the probability mass function constitute a geometric progression. It
is customary to refer to the random variable as a geometric random variable and its
distribution as geometric distribution.

Chapter Summary
• A random variable X assigns a numerical value to each element of the sample space S. It
defines mutually exclusive events that are exhaustive of S.
• Every discrete random variable has a corresponding probability distribution called a proba-
bility mass function f (x) which is the probability that the random variable X is equal to the
value x, written as P[X = x].
• A cumulative distribution function F (x) can be obtained from a probability mass function
f (x). Conversely, the probability mass function can be determined from a cumulative distri-
bution function.
• The mean µ or expected value is defined as
µ = E[X] = Σ_{all x} x f(x)

• The variance σ² or V[X] is defined as

σ² = Σ_{all x} (x − µ)² f(x)


An alternate form of the variance is

σ² = Σ_{all x} x² f(x) − µ²

• Chebyshev's theorem states that the proportion of the values of the random variable X lying
within kσ of the mean is at least 1 − 1/k², stated as

P[|X − µ| < kσ] ≥ 1 − 1/k²
• The mean and variance of a linear function of the random variable X are

E[aX + b] = aE[X] + b
V[aX + b] = a² V[X]

• The discrete uniform random variable X, with n possible values a, a + 1, a + 2, . . . , b, has
probability mass function, mean and variance

f(x) = 1/n
µ = (a + b)/2
σ² = [(b − a + 1)² − 1]/12
• For the binomial random variable X, the number of trials that result in a success in a Bernoulli
process having n trials,
 
f(x) = C(n, x) p^x (1 − p)^{n−x},   x = 0, 1, 2, . . . , n
µ = np
σ² = np(1 − p)

• For the Poisson random variable X,


f(x) = e^{−λt} (λt)^x / x!,   x = 0, 1, 2, . . .
µ = λt
σ² = λt

• In a hypergeometric distribution,
f(x) = C(k, x) C(N − k, n − x) / C(N, n),   x = 0, 1, 2, . . . , min(k, n)
µ = np
σ² = np(1 − p)(N − n)/(N − 1)

where p = k/N
• If the random variable X follows a negative binomial distribution,
f(x) = C(x − 1, k − 1) p^k (1 − p)^{x−k},   x = k, k + 1, k + 2, . . .
µ = k/p
σ² = k(1 − p)/p²


Exercises
3-1. Let X be a random variable for each case. Give the possible values of the random variable.
a) number of coins landing face up in a toss of 10 coins
b) the length of time to play 18 holes of golf
c) the amount of milk produced yearly by a particular cow
d) number of defective parts among 150 units tested
e) the number of eggs laid each month by a hen
f) number of patients arriving in an emergency room in any day
g) number of lines in use at a particular time in a voice communication system with 50 lines
h) A healthcare provider schedules 30 minutes for each patient's visit, but some visits require extra time. The random variable is the number of patients treated in an eight-hour day.

3-2. A random experiment consists of flipping two coins. Let the random variable X denote the number of coins that landed face up. Enumerate the outcomes of each event arising from X.

3-3. In a random experiment, two six-sided dice are rolled. Let Y be the total of the outcomes of the dice. Enumerate the outcomes of each event arising from Y.

3-4. An overseas shipment of 5 foreign automobiles contains 2 that have slight paint blemishes. If an agency receives 3 of these automobiles at random, list the elements of the sample space S, using the letters B and N for blemished and nonblemished, respectively; then to each sample point assign a value x of the random variable X representing the number of automobiles with paint blemishes purchased by the agency.

3-5. Let W be a random variable giving the number of heads minus the number of tails in three tosses of a coin. List the elements of the sample space S for the three tosses of the coin and to each sample point assign a value w of W.

3-6. The sample space of a random experiment is S = {a, b, c, d, e, f}, and each outcome is equally likely. A random variable X is defined as follows:

outcome   a    b    c     d     e    f
x         0    0    1.5   1.5   2    3

Determine the probability mass function of X. Use the probability mass function to determine the following probabilities:
a) P[X = 1.5]
b) P[0.5 < X < 2.7]
c) P[X > 3]
d) P[0 ≤ X < 2]
e) P[X = 0 or X = 2]

3-7. Verify that the function f(x) is a probability mass function, and determine the requested probabilities.

x      −2    −1    0     1     2
f(x)   0.2   0.4   0.1   0.2   0.1

a) P[X ≤ 2]
b) P[X > −2]
c) P[−1 ≤ X ≤ 1]
d) P[X ≤ −1 or X = 2]

3-8. Verify that the given function is a probability mass function, and determine the requested probabilities.

f(x) = (2x + 1)/25,   x = 0, 1, 2, 3, 4

a) P[X = 4]
b) P[X ≤ 1]
c) P[2 ≤ X < 4]
d) P[X > −10]

3-9. An assembly consists of two mechanical components. Suppose that the probabilities that the first and second components meet specifications are 0.95 and 0.98, respectively. Assume that the components are independent. Determine the probability mass function of the number of components in the assembly that meet specifications.

3-10. Consider the circuit in Example 2.10. Assume that devices fail independently. What is the probability mass function of the number of failed devices?

3-11. The space shuttle flight control system called Primary Avionics Software Set (PASS) uses four independent computers working in parallel. At each critical step, the computers 'vote' to determine the appropriate step. The probability that a computer will ask for a roll to the left when a roll to the right is appropriate is 0.0001. Let X denote the number of computers that vote for a left roll when a right roll is appropriate. What is the probability mass function of X?

3-12. Determine the value c so that the function f(x) = c(x² + 4) for x = 0, 1, 2, 3 can serve as a probability distribution of the discrete random variable X.

3-13. Show that Property (2) of Definition 3.2 holds for Example 3.3.

3-14. Construct the cumulative distribution function of X, the number of imperfections per 10 meters of a synthetic fabric in continuous rolls of uniform width, given its probability mass function.

x      0      1      2      3      4
f(x)   0.41   0.37   0.16   0.05   0.01

3-15. An investment firm offers its customers municipal bonds that mature after varying numbers of years. Given that the cumulative distribution function of T, the number of years to maturity for a randomly selected bond, is

F(t) = 0 for t < 1;  1/4 for 1 ≤ t < 3;  1/2 for 3 ≤ t < 5;  3/4 for 5 ≤ t < 7;  1 for 7 ≤ t

find
a) P[T = 5]
b) P[T > 3]
c) P[1.4 < T < 6]
d) P[T ≤ 5 | T > 2]

3-16. Compute the mean and variance of the random variable in Exercise 3-7.

3-17. Compute the mean and variance of the random variable in Exercise 3-8.

3-18. Compute the mean and standard deviation of the random variable in Exercise 3-14.

3-19. A random variable X has a mean µ = 10 and a variance σ² = 4. Using Chebyshev's theorem, find
a) P[|X − 10| ≥ 3]
b) P[5 < X < 15]
c) the value of the constant c such that P[|X − 10| ≥ c] ≤ 0.04

3-20. Seventy new jobs are opening up at an automobile manufacturing plant, and 1000 applicants show up for the 70 positions. To select the best 70 from among the applicants, the company gives a test that covers mechanical skill, manual dexterity, and mathematical ability. The mean grade on this test turns out to be 60, and the scores have a standard deviation of 6. Can a person who scores 84 count on getting one of the jobs? Assume that the distribution is symmetric about the mean and use Chebyshev's theorem.

3-21. Let X be a random variable with the following probability distribution:

x      −3    6     9
f(x)   1/6   1/2   1/3

Find E[g(X)] where g(X) = (2X + 1)².

3-22. Let the random variable X have a discrete uniform distribution on the integers 1 ≤ x ≤ 3. Determine the mean and variance of X.

3-23. Thickness measurements of a coating process are made to the nearest hundredth of a millimeter. The thickness measurements are uniformly distributed with values 0.15, 0.16, 0.17, 0.18, and 0.19. Determine the mean and variance of the coating thickness for this process.

3-24. The random variable X has a binomial distribution with n = 10 and p = 0.08. Determine the following probabilities.
a) P[X = 5]
b) P[X ≤ 2]
c) P[X ≥ 9]
d) P[3 ≤ X < 5]

3-25. Determine the cumulative distribution function of a binomial random variable with n = 3 and p = 1/4.

3-26. Suppose that airplane engines operate independently and fail with probability equal to 0.4. Assuming that a plane makes a safe flight if at least one-half of its engines run, determine whether a 4-engine plane or a 2-engine plane has the higher probability for a successful flight.

3-27. Because not all airline passengers show up for their reserved seat, an airline sells 125 tickets for a flight that holds only 120 passengers. The probability that a passenger does not show up is 0.10, and the passengers behave independently.
a) What is the probability that every passenger who shows up can take the flight?
b) What is the probability that the flight departs with empty seats?

3-28. Suppose that X has a Poisson distribution with a mean of 4. Determine the following probabilities:
a) P[X = 0]
b) P[X ≤ 2]
c) P[X = 4]
d) P[X = 8]

3-29. On average, 3 traffic accidents per month occur at a certain intersection. What is the probability that in any given month at this intersection
a) exactly 5 accidents will occur?
b) fewer than 3 accidents will occur?
c) at least 2 accidents will occur?

3-30. The number of cracks in a section of a highway that are significant enough to require repair is assumed to follow a Poisson distribution with a mean of two cracks per mile. What is the probability that there are no cracks that require repair in 5 miles of highway?

3-31. Suppose that the number of customers who enter a bank in an hour is a Poisson random variable, and suppose that P[X = 0] = 0.05. Determine the mean and variance of X.

Hypergeometric and Negative Binomial Distributions

3-32. Suppose that X has a hypergeometric distribution with N = 20, n = 4, and k = 4. Determine the following:
a) P[X = 1]
b) P[X = 4]
c) P[X ≤ 2]
d) µX and σ²X

3-33. A batch contains 36 bacteria cells and 12 of the cells are not capable of cellular replication. Suppose that you examine three bacteria cells selected at random without replacement.
a) What is the probability mass function of the number of cells in the sample that can replicate?
b) What are the mean and variance of the number of cells in the sample that can replicate?
c) What is the probability that at least one of the selected cells cannot replicate?

3-34. A slitter assembly contains 48 blades. Five blades are selected at random and evaluated each day for sharpness. If any dull blade is found, the assembly is replaced with a newly sharpened set of blades.
a) If 10 of the blades in an assembly are dull, what is the probability that the assembly is replaced the first day it is evaluated?
b) If 10 of the blades in an assembly are dull, what is the probability that the assembly is not replaced until the third day of evaluation? [Hint: Assume that the daily decisions are independent, and use the geometric distribution.]
c) Suppose that on the first day of evaluation, 2 of the blades are dull; on the second day of evaluation, 6 are dull; and on the third day of evaluation, 10 are dull. What is the probability that the assembly is not replaced until the third day of evaluation? [Hint: Assume that the daily decisions are independent. However, the probability of replacement changes every day.]

3-35. Suppose that X is a negative binomial random variable with p = 0.2 and k = 4. Determine the following:
a) E[X]
b) P[X = 19]
c) P[X = 21]

3-36. Heart failure is due to either natural occurrences (87%) or outside factors (13%). Outside factors are related to induced substances or foreign objects. Natural occurrences are caused by arterial blockage, disease, and infection. Assume that causes of heart failure for the individuals are independent.
a) What is the probability that the first patient with heart failure who enters the emergency room has the condition due to outside factors?
b) What is the probability that the third patient with heart failure who enters the emergency room is the first one due to outside factors?
c) What is the mean number of heart failure patients with the condition due to natural causes who enter the emergency room before the first patient with heart failure from outside factors?

3-37. Suppose that the random variable X has a geometric distribution with p = 0.5. Determine the following probabilities:
a) P[X = 1]
b) P[X = 4]
c) P[X = 8]
d) P[X ≤ 2]
e) P[X ≥ 2]

3-38. Suppose that the random variable X has a geometric distribution with a mean of 2.5. Determine the following probabilities:
a) P[X = 1]
b) P[X = 4]
c) P[X ≤ 3]

4. Continuous Probability Distributions
Physical quantities such as time, length, area, temperature, pressure, load, and intensity, when
they need to be described probabilistically, are modeled by continuous random variables. This
chapter deals with distributions of continuous random variables. A number of important
continuous distributions are introduced in this chapter. The nature and applications of these
distributions in science and engineering are discussed. An understanding of the situations
in which these random variables arise enables us to choose an appropriate distribution for the
scientific phenomenon under consideration.

Learning Objectives
At the end of this chapter, you should be able to do the following:

1. Compute probabilities from probability density functions and cumulative distribution


functions

2. Calculate mean and variance of continuous random variables

3. Understand the assumptions of some continuous probability distributions

4. Calculate probabilities of some continuous probability distributions

4.1. Continuous Random Variables


Definition 4.1
A continuous random variable is a function whose range is an interval of real num-
bers.

Recall that a random variable is a discrete random variable if it is defined over a sample
space having a finite or countably infinite number of points. In this case, the random variable
takes on discrete values and it is possible to enumerate all the values it may assume. When
a sample space has an infinite number of sample points, the associated random variable is
continuous with its values distributed over one or more intervals on the real number line. This
distinction is necessary because discrete and continuous random variables require different
ways of assigning probability values.

4.2. Continuous Probability Functions


Definition 4.2
The function f (x) is a probability density function of the continuous random variable
X defined over the set of real numbers if

(1) f(x) ≥ 0

(2) ∫_{−∞}^{∞} f(x) dx = 1


(3) P[a ≤ X ≤ b] = ∫_a^b f(x) dx

As shown in Figure 4.1, the graph lies entirely above the x-axis (Property 1). Property (2)
requires that the total area between the curve and the x-axis is 1, and property (3) is illustrated
by the shaded region in red.

Figure 4.1: Probability density function

Note that since X is a continuous random variable,

P[a < X < b] = P[a < X ≤ b] = P[a ≤ X < b] = P[a ≤ X ≤ b]

Example 4.1
Consider the function
f(x) = (1/3)x² for −1 < x < 2,   f(x) = 0 elsewhere

(a) Show that it is a probability density function of some continuous random variable X.
We show that Properties (1) and (2) are satisfied.
(1) Clearly, (1/3)x² ≥ 0 for all real numbers x.

(2) We must show that ∫_{−∞}^{∞} (1/3)x² dx = 1.

∫_{−∞}^{∞} (1/3)x² dx = ∫_{−1}^{2} (1/3)x² dx = (1/9)x³ |_{−1}^{2} = (1/9)(8 + 1) = 1

(b) Determine P[0 < X ≤ 1].


P[0 < X ≤ 1] = ∫_0^1 (1/3)x² dx = (1/9)x³ |_0^1 = 1/9
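Readers can confirm such integrals numerically; the sketch below (assuming SciPy is available) checks that the density integrates to 1 and reproduces P[0 < X ≤ 1]:

    # Sketch: numerical checks for the density of Example 4.1.
    from scipy.integrate import quad

    f = lambda x: x**2 / 3  # density on (-1, 2), zero elsewhere

    total, _ = quad(f, -1, 2)   # Property (2): total probability
    prob, _ = quad(f, 0, 1)     # P[0 < X <= 1]
    print(total, prob)          # ≈ 1.0 and ≈ 0.1111 (= 1/9)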

Example 4.2
Let the continuous random variable X denote the diameter of a hole drilled in a sheet
metal component. The target diameter is 12.5 millimeters. Most random disturbances to
the process result in larger diameters. Historical data show that the distribution of X can
be modeled by a probability density function f (x) = 20 exp(−20(x − 12.5)), for x ≥ 12.5.

(a) What proportion of parts is between 12.6 and 12.8 millimeters?


The desired proportion is the probability P[12.6 < X < 12.8].


P[12.6 < X < 12.8] = ∫_{12.6}^{12.8} 20 exp(−20(x − 12.5)) dx
                   = −e^{−20(x−12.5)} |_{12.6}^{12.8} = e^{−2} − e^{−6} = 0.1329

(b) If a part with a diameter greater than 12.60 mm is scrapped, what proportion of parts
is scrapped?

P[X > 12.6] = 1 − P[X ≤ 12.6]
            = 1 − ∫_{12.5}^{12.6} 20 exp(−20(x − 12.5)) dx = e^{−2} = 0.1353

The complete probability density function for Example 4.2 is

f(x) = 20e^{−20(x−12.5)} for x > 12.5,   f(x) = 0 elsewhere

4.3. Cumulative Distribution Functions


Definition 4.3
If X is a continuous random variable with probability density function f (x), the cumu-
lative distribution function F (x) is defined as
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(t) dt    (4.1)

Example 4.3
Find the cumulative distribution function of the random variable of Example 4.2.
F(x) = ∫_{12.5}^{x} 20 exp(−20(t − 12.5)) dt
     = −e^{−20(t−12.5)} |_{12.5}^{x} = 1 − e^{−20(x−12.5)},   x ≥ 12.5

The graph of the cumulative distribution function of the random variable X of Example 4.2
is shown in Figure 4.2.

Figure 4.2: Cumulative distribution function of X of Example 4.2
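With the closed-form F(x) in hand, the probabilities of Example 4.2 reduce to simple function evaluations; a quick sketch using only the standard library:

    # Sketch: using the CDF of Example 4.3 to redo Example 4.2.
    from math import exp

    F = lambda x: 1 - exp(-20 * (x - 12.5)) if x >= 12.5 else 0.0

    print(F(12.8) - F(12.6))  # P[12.6 < X < 12.8] ≈ 0.1329
    print(1 - F(12.6))        # proportion scrapped ≈ 0.1353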

The cumulative distribution function has the following properties:


(1) lim_{x→−∞} F(x) = 0

(2) lim_{x→+∞} F(x) = 1

(3) P[a < X ≤ b] = F(b) − F(a)


Moving the right boundary of the region to the far left brings the area closer to zero (Property 1); moving it to the far right brings the area closer to unity (Property 2). Property (3) can be deduced from the property of the definite integral
∫_c^a f(x) dx + ∫_a^b f(x) dx = ∫_c^b f(x) dx
by taking c = −∞.
The probability density function of a continuous random variable can be determined if its
cumulative distribution function is known by applying the Fundamental Theorem of Calculus.
F′(x) = d/dx ∫_{−∞}^{x} f(t) dt = f(x)    (4.2)

Example 4.4
Suppose that for some continuous random variable X, F(x) = (x⁴ − 1)/80 for 1 ≤ x ≤ 3.

(a) What is the probability that X assumes a value between 1.2 and 2.6?
We apply Property 3 to compute P[1.2 < X < 2.6].

P[1.2 < X < 2.6] = F(2.6) − F(1.2)
                 = (2.6⁴ − 1)/80 − (1.2⁴ − 1)/80 = 0.5453

(b) Find the density function and use it to compute P[1.2 < X < 2.6].
f(x) = F′(x) = (1/80)(4x³) = x³/20

P[1.2 < X < 2.6] = ∫_{1.2}^{2.6} x³/20 dx = 0.5453

4.4. Mean and Variance of a Continuous Random Variable


Definition 4.4
Let X be a continuous random variable with density f(x). The mean or expected value of X, denoted µX or E[X], and the variance of X, denoted σ²X or V[X], are defined as

µX = E[X] = ∫_{−∞}^{∞} x f(x) dx    (4.3)

σ²X = V[X] = ∫_{−∞}^{∞} (x − µX)² f(x) dx    (4.4)

It is easy to show that

σ² = ∫_{−∞}^{∞} x² f(x) dx − µ²    (4.5)
The standard deviation of a continuous random variable is defined in the same way as the
standard deviation of a discrete random variable.
σ = √(σ²) = √(V[X])


Example 4.5
Compute the mean and standard deviation of the random variable of Example 4.2.
Integration by parts can be used to show that

µ = −(x + 1/20) e^{−20(x−12.5)} |_{12.5}^{∞} = 12.5 + 1/20 = 12.55

We use Equation (4.5) and integrate by parts twice to compute σ².

σ² = ∫_{12.5}^{∞} x² · 20e^{−20(x−12.5)} dx − µ²
   = −(x² + (1/10)x + 1/200) e^{−20(x−12.5)} |_{12.5}^{∞} − 12.55²
   = [12.5² + (1/10)(12.5) + 1/200] − 12.55²
   = 157.505 − 157.5025 = 0.0025

σ = √0.0025 = 0.05

Example 4.6
Compute the mean and variance of the random variable of Example 4.1.
µ = ∫_{−1}^{2} x · (1/3)x² dx = ∫_{−1}^{2} (1/3)x³ dx
  = (1/12)x⁴ |_{−1}^{2} = (1/12)(16 − 1) = 5/4

σ² = ∫_{−1}^{2} x² · (1/3)x² dx − µ² = ∫_{−1}^{2} (1/3)x⁴ dx − (5/4)²
   = (1/15)x⁵ |_{−1}^{2} − 25/16 = (1/15)(32 + 1) − 25/16 = 51/80

4.5. Continuous Uniform Distribution


Definition 4.5
The continuous uniform random variable has a probability density function
f(x) = 1/(b − a),   a < x < b    (4.6)
Its distribution is called continuous uniform distribution.

The density function of a continuous uniform random variable X is shown in Figure 4.3.

Figure 4.3: Continuous uniform probability density function


Example 4.7
Let the continuous random variable X denote the current measured in a thin copper wire
in milliamperes. Assume the density function of X is f (x) = 5 for 4.9 ≤ x ≤ 5.1 mA. What
is the probability that a current measurement is between 4.95 mA and 5.1 mA?
The desired probability is
P[4.95 < X < 5.1] = ∫_{4.95}^{5.1} f(x) dx = ∫_{4.95}^{5.1} 5 dx = 0.75
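A sketch of the same computation with scipy.stats.uniform (which is parameterized by the left endpoint and the width of the interval):

    # Sketch: the current-measurement probability of Example 4.7.
    from scipy.stats import uniform

    X = uniform(loc=4.9, scale=0.2)  # uniform on [4.9, 5.1] mA
    print(X.cdf(5.1) - X.cdf(4.95))  # ≈ 0.75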

The mean and variance of a continuous uniform random variable X are given below. The
derivations are left as exercises.
Continuous Uniform Distribution
µ = (a + b)/2    (4.7)
σ² = (b − a)²/12    (4.8)

The cumulative distribution function of a continuous uniform random variable X is obtained by
integration. The function F(x) and its graph are shown below.

F(x) = ∫_a^x 1/(b − a) dt = (x − a)/(b − a),   a ≤ x < b    (4.9)

Figure 4.4: Cumulative distribution function of a continuous uniform random variable

The uniform distribution is one of the simplest distributions and is commonly used in situ-
ations where there is no reason to give unequal likelihoods to possible ranges assumed by the
random variable over a given interval. For example, the arrival time of a flight might be con-
sidered uniformly distributed over a certain time interval, or the distribution of the distance
from the location of live loads on a bridge to an end support might be adequately represented
by a uniform distribution over the bridge span. Let us also comment that one often assigns
a uniform distribution to a specific random variable simply because of a lack of information,
beyond knowing the range of values it spans.

4.6. Normal Distribution


Undoubtedly, the most widely used model for a continuous measurement is a normal ran-
dom variable and its distribution, normal distribution, is the most important continuous
probability distribution. Its graph, called the normal curve, is the bell-shaped curve of Figure 4.5. Its equation was derived by Carl Friedrich Gauss from a study of errors in repeated measurements of the same quantity.


Figure 4.5: The normal curve

The density function of a normal random variable X with parameters a and b is


f(x) = (1/√(2πb)) exp[−(x − a)²/(2b)],   −∞ < x < ∞
Once the values of the parameters a and b are specified, the normal curve is completely
determined. In this chapter, we shall assume that these two parameters are known, perhaps
from previous investigations. Later, we shall make statistical inferences when these parameters
are unknown and have been estimated from the available experimental data.
The mean and variance of a normal random variable X with parameters a and b are
µ = ∫_{−∞}^{∞} x · (1/√(2πb)) e^{−(x−a)²/(2b)} dx = a

σ² = ∫_{−∞}^{∞} (x − µ)² · (1/√(2πb)) e^{−(x−a)²/(2b)} dx = b

With these results, we write the probability density function of a normal random variable as

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}    (4.10)
Figure 4.6 shows two normal curves having the same parameter σ² but different parameters µ. Figure 4.7 shows two normal curves with the same parameter µ but different parameters σ². Lastly, Figure 4.8 shows two normal curves with different parameters µ and σ².

Figure 4.6: Normal curves with µ2 < µ1 and σ1² = σ2²

From inspection of Figures 4.6-4.8, we note the following properties of the normal curve:
(1) The curve is symmetric about a vertical axis through x = µ.
(2) The normal curve approaches the horizontal axis asymptotically in either direction away from x = µ.
The normal density function has no closed-form antiderivative; probability calculations are approximated numerically, for example with a calculator.


Figure 4.7: Normal curves with µ1 = µ2 and σ1² < σ2²

Figure 4.8: Normal curves with µ1 < µ2 and σ1² < σ2²

Example 4.8
The time X until recharge for a battery in a laptop computer under common conditions is
normally distributed with µ = 260 minutes and σ = 50 minutes. Find the probability that
a fully charged laptop lasts

(a) anywhere from 3 to 4 hours;


P[180 < X < 240] = ∫_{180}^{240} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx = 0.2898

(b) longer than 3 hours;


We use the symmetry of the curve: P[−∞ < X < µ] = P[µ < X < ∞] = 0.5.

P[X > 180] = ∫_{180}^{∞} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx
           = ∫_{180}^{260} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx + 0.5
           = 0.4452 + 0.5 = 0.9452

(c) less than 270 minutes;


P[X < 270] = ∫_{−∞}^{270} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx
           = 0.5 + ∫_{260}^{270} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx
           = 0.5 + 0.0793 = 0.5793


(d) longer than 300 minutes.

P[X > 300] = ∫_{300}^{∞} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx
           = 0.5 − ∫_{260}^{300} (1/(50√(2π))) e^{−(x−260)²/(2·50²)} dx
           = 0.5 − 0.2881 = 0.2119
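The four probabilities can be verified with scipy.stats.norm (a sketch; norm.cdf takes the point, the mean, and the standard deviation):

    # Sketch: checking Example 4.8 with SciPy's normal distribution.
    from scipy.stats import norm

    mu, sigma = 260, 50  # minutes

    print(norm.cdf(240, mu, sigma) - norm.cdf(180, mu, sigma))  # (a) ≈ 0.2898
    print(1 - norm.cdf(180, mu, sigma))                         # (b) ≈ 0.9452
    print(norm.cdf(270, mu, sigma))                             # (c) ≈ 0.5793
    print(1 - norm.cdf(300, mu, sigma))                         # (d) ≈ 0.2119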

The notation X ∼ N(µ, σ²) is used to denote a random variable X that is normally distributed.

Standard Normal Distribution


Let us now consider the random variable
Z = (X − µX)/σX

where X ∼ N(µ, σ²). Note that

Z = (X − µ)/σ = (1/σ)X − µ/σ
is a linear function of X. Also, Equations (3.10) and (3.11) hold for continuous random
variables. Hence, the mean and variance of Z are
E[Z] = (1/σ)E[X] − µ/σ = µ/σ − µ/σ = 0
V[Z] = (1/σ)² V[X] = σ²/σ² = 1
To find the probability density function of Z, we use the probability density function of X.
∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²} dx = ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−z²/2} (σ dz)
                                              = ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz

since σz = x − µ and σ dz = dx. Therefore,

f(z) = (1/√(2π)) e^{−z²/2}

is the probability density function of Z. Furthermore, the probability density function of X ∼ N(0, 1) is

f(x) = (1/√(2π)) e^{−x²/2}

We conclude that Z also follows a normal distribution, with µ = 0 and σ² = 1. This particular normal distribution is called the standard normal distribution and is denoted by N(0, 1).
Definition 4.6
If X follows a normal distribution with mean µ and variance σ², the random variable

Z = (X − µ)/σ    (4.11)

is called the standard normal random variable with probability density function

f(z) = (1/√(2π)) e^{−z²/2}    (4.12)


Example 4.9
Determine the value of P[−1.6 < Z < 0.74] for the standard normal random variable Z.
P[−1.6 < Z < 0.74] = ∫_{−1.6}^{0.74} (1/√(2π)) e^{−z²/2} dz = 0.7155
Table A-1 of Appendix A provides cumulative probabilities for a standard normal random
variable. The use of Table A-1 is illustrated by the following example.
Example 4.10
Assume that Z is a standard normal random variable.

(a) Find P[Z < 1.52].


Read down the z0 column to the row that equals 1.5.
z0     0.00       0.01       0.02       0.03
0.0    0.500000   0.503989   0.507978   0.511966
...
1.5    0.933193   0.934478   0.935745   0.936992
The column headings refer to the hundredths digit of the value of z0 in P[Z < z0 ]. Thus,

P[Z < 1.52] = 0.935745

(b) P[Z > 1.26].

P[Z > 1.26] = 1 − P[Z ≤ 1.26] = 1 − 0.896165 = 0.103835

(c) P[−1.25 < Z < 0.37].

P[−1.25 < Z < 0.37] = P[Z < 0.37] − P[Z < −1.25] = 0.644309 − 0.105650 = 0.538659

If X ∼ N(µ, σ²),

P[x1 < X < x2] = P[x1 − µ < X − µ < x2 − µ]
               = P[(x1 − µ)/σ < (X − µ)/σ < (x2 − µ)/σ]
               = P[z1 < Z < z2]

where

z1 = (x1 − µ)/σ,   z2 = (x2 − µ)/σ
The areas described by P[x1 < X < x2] and P[z1 < Z < z2] are shown on the same axes (see Figure 4.9). The shaded regions have equal areas.

Figure 4.9: Normal (orange) and standard normal (violet) distributions


The next example illustrates how to compute probability values for a normal distribution
using Table A-1.
Example 4.11
The life of a semiconductor laser at a constant power is normally distributed with a mean
of 7000 hours and a standard deviation of 600 hours.

(a) What is the probability that a laser fails before 5000 hours?
Let X be the life of a semiconductor laser, where X ∼ N(7000, 6002 ). The probability we
seek is P[X < 5000]. We will use Equation (4.11) to convert x = 5000 to z-score.
z = (5000 − 7000)/600 = −3.33
From Table A-1,
P[X < 5000] = P[Z < −3.33] = 0.000434

(b) If three lasers are used in a product and they are assumed to fail independently, what
is the probability that all three are still operating after 8000 hours?
Let p be the probability that each laser is still operating after 8000 hours, where

p = P[X > 8000] = P[Z > 1.67] = 1 − P[Z ≤ 1.67] = 0.047460

Thus, the probability that three independent lasers are still operating after 8000 hours is
p3 = 0.000107.
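A numerical check of both parts (a sketch using scipy.stats.norm; norm.sf is the survival function 1 − cdf):

    # Sketch: checking the laser-life probabilities of Example 4.11.
    from scipy.stats import norm

    mu, sigma = 7000, 600  # hours

    print(norm.cdf(5000, mu, sigma))  # (a) ≈ 0.000429 (table value 0.000434 uses z = -3.33)
    p = norm.sf(8000, mu, sigma)      # P[X > 8000] ≈ 0.0478
    print(p**3)                       # (b) ≈ 0.000109 (table rounding gives 0.000107)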

According to Chebyshev's theorem on page 37, the probability that a random variable
assumes a value within k = 2 standard deviations of the mean is at least 3/4. If the random
variable has a normal distribution, the z values corresponding to x1 = µ − 2σ and x2 = µ + 2σ
are z1 = −2 and z2 = 2 respectively, and

P[−2 < Z < 2] = 0.9545

which is a much stronger statement than that given by Chebyshev's theorem.


Sometimes, we are required to find the value of z corresponding to a specified probability
that falls between values listed in Table A-1. For convenience, we shall always choose the z
value corresponding to the tabular probability that comes closest to the specified probability.
Example 4.12
Find the value of z0 such that

(a) P[Z < z0 ] = 0.365.


We use Table A-1 in reverse, looking for the entry nearest 0.365.

z0     −0.09      ···   −0.05      −0.04
−3.9   0.000033   ···   0.000039   0.000041
...
−0.3   0.348268   ···   0.363169   0.366928

Since 0.365 is slightly nearer to 0.363169 (the entry for z = −0.35), we take z0 = −0.35. Because (0.363169 + 0.366928)/2 = 0.365049 ≈ 0.365, the average of the two adjacent z values can also be used, that is, z0 = (−0.35 + (−0.34))/2 = −0.345.

(b) P[Z > z0 ] = 0.19.


We express the equation as a cumulative probability.

P[Z ≤ z0 ] = 1 − P[Z > z0 ] = 1 − 0.19 = 0.81

z0    ···   0.07       0.08
0.8   ···   0.807850   0.810570

We find z0 = 0.88.
The next example illustrates how to find a value x that satisfies a given condition.
Example 4.13
Refer to the semiconductor laser of Example 4.11. What is the life in hours that 90% of the
lasers exceed?
We are looking for the life x such that P[X > x] = 0.90. We find x in two steps:

i) Find z such that P[Z > z] = 0.90 or P[Z < z] = 0.1.


From Table A-1, z = −1.28.

ii) Convert z-score to x using Equation (4.11)

x = µ + zσ = 7000 + (−1.28)(600) = 6232

10% of the semiconductor lasers have a life of 6232 hours or less, and 90% of them have a life
exceeding 6232 hours.
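Inverse lookups like Examples 4.12 and 4.13 correspond to the percent-point function (the inverse cumulative distribution function); a sketch:

    # Sketch: inverse normal lookups for Examples 4.12 and 4.13.
    from scipy.stats import norm

    print(norm.ppf(0.365))   # z0 with P[Z < z0] = 0.365, ≈ -0.345
    print(norm.ppf(0.81))    # z0 with P[Z > z0] = 0.19,  ≈ 0.878

    # Example 4.13: life exceeded by 90% of the lasers
    mu, sigma = 7000, 600
    print(norm.ppf(0.10, mu, sigma))  # ≈ 6231 hours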

4.7. Exponential Distribution


The discussion of the Poisson distribution defined a random variable to be the number of
outcomes in a time interval or a region in space which we shall call distance in this section.
The distance between outcomes is another random variable that is often of interest. Let the
random variable X denote the length from any starting point until an outcome occurs. As
you might expect, the distribution of X can be obtained from knowledge of the distribution
of the number of flaws. The key to the relationship is the following concept. The distance
to the first outcome exceeds x units of distance if and only if there are no outcomes within a
distance of x units – simple but sufficient for an analysis of the distribution of X.
In general, let the random variable N denote the number of outcomes in x units. If the
mean number of outcomes is λ per unit distance, N has a Poisson distribution with mean λx.
Now, the probability that the first outcome occurs after a distance x, P[X > x], is the same as the probability that no outcomes occur within that same distance x, P[N = 0]. Thus,

P[X > x] = P[N = 0] = e^{−λx} (λx)^0 / 0! = e^{−λx}    (4.13)
Therefore,
F(x) = P[X ≤ x] = 1 − P[X > x] = 1 − e^{−λx}

is the cumulative distribution of X. By differentiating F(x), the probability density function of X is

f(x) = F′(x) = λe^{−λx},   x ≥ 0
The derivation of the distribution of X depends only on the assumption that the outcomes
follow a Poisson process. Also, the starting point for measuring X does not matter because
the probability of the number of outcomes in a given distance of a Poisson process depends
only on the distance of the interval, not on the location. For any Poisson process, the following
general result applies.


Definition 4.7
The random variable X, the distance between successive events from a Poisson process with mean number of events λ > 0 per unit distance, is an exponential random variable with parameter λ. The probability density function of X is

f(x) = λe^{−λx},   x ≥ 0    (4.14)

The exponential distribution obtains its name from the exponential function in the prob-
ability density function. See plots of the exponential distribution for selected values of λ in
Figure 4.10.
Figure 4.10: Exponential density functions for λ = 1, 0.4, and 0.2

The following results are easily obtained and are left as an exercise.

Exponential Distribution

µ = 1/λ    (4.15)
σ² = 1/λ²    (4.16)

The quantity µ is called the mean distance or mean lifetime.


Example 4.14
In a large corporate computer network, user log-ons to the system can be modeled as a
Poisson process with a mean of 25 log-ons per hour.

(a) What is the probability that there are no log-ons in an interval of six minutes?
Let X denote the time in hours from the start of the interval until the first log-on. Then
X has an exponential distribution with λ = 25 log-ons per hour. We are interested in
P[X > 0.1], since six minutes is 0.1 hour.

P[X > 0.1] = ∫_{0.1}^{∞} 25e^{−25x} dx = −e^{−25x} |_{0.1}^{∞} = e^{−2.5} = 0.0821

(b) What is the probability that the time until the next log-on is between two and three
minutes?
P[1/30 < X < 1/20] = F(1/20) − F(1/30) = 0.7135 − 0.5654 = 0.1481

(c) Determine the interval of time such that the probability that no log-on occurs in the
interval is 0.90.


We seek the number x such that P[X > x] = 0.90.

P[X > x] = 0.90
e^{−25x} = 0.90    (see Equation 4.13)
x = −(1/25) ln 0.90 = 0.0042

About 90% of the time, the next log-on occurs at least 0.0042 hour, or about 0.25 minute, after the previous one.
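The same answers follow from scipy.stats.expon, which is parameterized by the scale 1/λ (a sketch):

    # Sketch: checking the log-on probabilities of Example 4.14.
    from scipy.stats import expon

    lam = 25                  # log-ons per hour
    X = expon(scale=1 / lam)  # scale = mean = 1/λ

    print(X.sf(0.1))                  # (a) P[X > 0.1]  ≈ 0.0821
    print(X.cdf(1/20) - X.cdf(1/30))  # (b)             ≈ 0.1481
    print(X.isf(0.90))                # (c) x with P[X > x] = 0.90, ≈ 0.0042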

The exponential distribution plays an important role in both queuing theory and reliability
problems. Time between arrivals at service facilities and time to failure of component parts and
electrical systems often are nicely modeled by the exponential distribution. Other applications
include survival times in biomedical experiments and computer response time. In reliability
theory, equipment breakdown can be modeled by the exponential distribution.
Example 4.15
The lifetime of a mechanical assembly in a vibration test is exponentially distributed with
a mean of 400 hours.

(a) What is the probability that an assembly on test fails in less than 100 hours?
Let X be the time before a mechanical assembly in a vibration test fails, measured in hours.
The mean lifetime is µ = 400 hours, so λ = 1/400.

P[X < 100] = F(100) = 1 − e^{−100/400} = 0.2212

(b) What is the probability that an assembly operates for more than 500 hours before
failure?
P[X > 500] = e^{−500/400} = 0.2865

If an assembly has been on test for 500 hours without a failure, one may feel that a failure
is ‘due’. That is, the probability of a failure in the next 100 hours is higher than 0.2212. To
see if this is true, we compute the requested probability which is a conditional probability.

P[X < 600 | X > 500] = P[(X < 600) ∩ (X > 500)] / P[X > 500]
                     = P[500 < X < 600] / P[X > 500]
                     = (F(600) − F(500)) / e^{−500/400}
                     = 0.06337 / 0.28650 = 0.2212
The probability that a failure is due in the next 100 hours when 500 hours have passed
without a failure is the same as the probability of a failure in 100 hours immediately after
starting the counter. The fact that 500 hours have passed does not change the probability
of a failure in the next 100 hours. This is called the memoryless property of the exponential
distribution. This memoryless property is stated in Equation (4.17).

P[X < t1 + t2 | X > t1] = P[X < t2]    (4.17)
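A quick numerical illustration of the memoryless property for this assembly (a sketch using only the standard library):

    # Sketch: memoryless property of the exponential lifetime in Example 4.15.
    from math import exp

    F = lambda x: 1 - exp(-x / 400)  # CDF, mean lifetime 400 hours

    cond = (F(600) - F(500)) / (1 - F(500))  # P[X < 600 | X > 500]
    print(cond, F(100))                      # both ≈ 0.2212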
Chapter Summary

• If f(x) is the probability density function of a continuous random variable X, the probability of the random variable assuming a value in the interval [a, b] is

P[a < X < b] = ∫_a^b f(x) dx

• The cumulative distribution of a continuous random variable X with density f(x) is

F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(t) dt

• The mean and variance of a continuous random variable X with density f(x) are

µ = ∫_{−∞}^{∞} x f(x) dx
σ² = ∫_{−∞}^{∞} x² f(x) dx − µ²

• If X is a continuous uniform random variable in the interval [a, b],

f(x) = 1/(b − a)
F(x) = (x − a)/(b − a)
µ = (a + b)/2
σ² = (b − a)²/12

• A normal random variable X with mean µ and variance σ² has the density function

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}

• The standard normal random variable Z has mean µ = 0 and variance σ² = 1.


• Probability values for the standard normal variable are approximated using a scientific cal-
culator or by using a table of cumulative values of the standard normal random variable.
• The linear function

Z = (X − µ)/σ

transforms a normal random variable X to a standard normal random variable Z. The random variable X is said to be standardized.
• The exponential distribution with parameter λ has density and cumulative distribution functions

f(x) = λe^{−λx},   x > 0
F(x) = 1 − e^{−λx}

• The mean and variance of the exponential distribution are

µ = 1/λ
σ² = 1/λ²

• The mean of an exponential distribution is also called mean distance or mean lifetime.


Exercises
4-1. The shelf life, in days, for bottles of a certain prescribed medicine is a random variable having the density function

f(x) = 20,000/(x + 100)³ for x > 0,   f(x) = 0 elsewhere

Find the probability that a bottle of this medicine will have a shelf life of
a) at least 200 days;
b) anywhere from 80 to 120 days.

4-2. The total number of hours, measured in units of 100 hours, that a family runs a vacuum cleaner over a period of one year is a continuous random variable X that has the density function

f(x) = x for 0 < x < 1;  2 − x for 1 ≤ x < 2;  0 elsewhere

Find the probability that over a period of one year, a family runs their vacuum cleaner
a) less than 120 hours;
b) between 50 and 100 hours.

4-3. A test instrument needs to be calibrated periodically to prevent measurement errors. After some time of use without calibration, it is known that the probability density function of the measurement error is f(x) = 1 − 0.5x for 0 < x < 2 millimeters.
a) If the measurement error within 0.5 millimeters is acceptable, what is the probability that the error is not acceptable before calibration?
b) What is the value of measurement error exceeded with probability 0.2 before calibration?

4-4. The distribution of X is approximated with a triangular probability density function f(x) = 0.0025x − 0.075 for 30 < x ≤ 50 and f(x) = −0.0025x + 0.175 for 50 < x ≤ 70. Determine the following:
a) P[X ≤ 40]
b) P[40 ≤ X ≤ 60]
c) Value x exceeded with probability 0.99.

4-5. Suppose that for a random variable X,

f(x) = (3/k)(8x − x²),   0 < x < 8

for some number k.
a) Find the value of k so that f(x) is a density function.
b) Determine the cumulative distribution function of X.
c) Use the cumulative distribution function to determine P[X < 2].

4-6. Find the cumulative distribution function of the random variable in Exercise 4-1.

4-7. Find the cumulative distribution function of the random variable in Exercise 4-2.

4-8. Find the cumulative distribution of the random variable in Exercise 4-4.

4-9. What is the mean shelf life of a bottle of the prescribed medicine in Exercise 4-1?

4-10. Find the mean and standard deviation of the random variable in Exercise 4-2.

4-11. Find the mean and standard deviation of the random variable in Exercise 4-4.

4-12. The thickness of a conductive coating in micrometers has a density function of 600/x² for 100 < x < 120. Determine the mean and variance of the coating thickness.

4-13. The thickness of photoresist applied to wafers in semiconductor manufacturing at a particular location on the wafer is uniformly distributed between 0.2050 and 0.2150 micrometers. Determine the following:
a) Cumulative distribution function of photoresist thickness
b) Proportion of wafers that exceeds 0.2125 micrometers in photoresist thickness
c) Thickness exceeded by 10% of the wafers
d) Mean and variance of photoresist thickness

4-14. Derive Equation (4.7).

4-15. Derive Equation (4.8).

4-16. Use Table A-1 to determine the following probabilities for the standard normal random variable Z.
a) P[Z < 1.32]
b) P[Z < 3.0]
c) P[Z > 1.45]
d) P[Z > −2.15]
e) P[−2.34 < Z < 1.76]

4-17. Use Table A-1 to determine the following probabilities for the standard normal random variable Z.
a) P[−1 < Z < 1]
b) P[−2 < Z < 2]
c) P[−3 < Z < 3]
d) P[Z < 3]
e) P[0 < Z < 1.5]

4-18. Assume that Z has a standard normal distribution. Use Table A-1 to determine the value for z that solves each of the following:
a) P[Z < z] = 0.9
b) P[Z < z] = 0.5
c) P[Z > z] = 0.1
d) P[Z > z] = 0.9
e) P[−1.24 < Z < z] = 0.8

4-19. Assume that Z has a standard normal distribution. Use Table A-1 to determine the value for z that solves each of the following:
a) P[−z < Z < z] = 0.95
b) P[−z < Z < z] = 0.99
c) P[−z < Z < z] = 0.68
d) P[−z < Z < z] = 0.9973

4-20. Assume that X is normally distributed with a mean of 10 and a standard deviation of 2. Determine the following:
a) P[X < 13]
b) P[X > 9]
c) P[6 < X < 14]

4-21. Assume that X is normally distributed with a mean of 10 and a standard deviation of 2. Determine the value for x that solves each of the following:
a) P[X > x] = 0.5
b) P[X > x] = 0.95
c) P[x < X < 10] = 0.2
d) P[−x < X − 10 < x] = 0.95
e) P[−x < X − 10 < x] = 0.99

4-22. The compressive strength of samples of cement can be modeled by a normal distribution with a mean of 6000 kilograms per square centimeter and a standard deviation of 100 kilograms per square centimeter.
a) What is the probability that a sample's strength is less than 6250 kg/cm²?
b) What is the probability that a sample's strength is between 5800 and 5900 kg/cm²?

4-23. The tensile strength of paper is modeled by a normal distribution with a mean of 35 pounds per square inch and a standard deviation of 2 pounds per square inch.
a) What is the probability that the strength of a sample is less than 40 lb/in²?
b) If the specifications require the tensile strength to exceed 30 lb/in², what proportion of the samples is scrapped?

4-24. The fill volume of an automated filling machine used for filling cans of carbonated beverage is normally distributed with a mean of 12.4 fluid ounces and a standard deviation of 0.1 fluid ounce.
a) What is the probability that a fill volume is less than 12 fluid ounces?
b) If all cans less than 12.1 or more than 12.6 ounces are scrapped, what proportion of cans is scrapped?
c) Determine specifications that are symmetric about the mean that include 99% of all cans.

4-25. Cholesterol is a fatty substance that is an important part of the outer lining (membrane) of cells in the body of animals. Its normal range for an adult is 120-240 mg/dl. The Food and Nutrition Institute of the Philippines found that the total cholesterol level for Filipino adults has a mean of 159.2 mg/dl and 84.1% of adults have a cholesterol level less than 200 mg/dl (http://www.fnri.dost.gov.ph/). Suppose that the total cholesterol level is normally distributed.
a) Determine the standard deviation of this distribution.
b) What is the 25th percentile of this distribution? That is, P[X < x] = 0.25.
c) What is the 75th percentile (also called third quartile) of this distribution?
d) What is the value of the cholesterol level that exceeds 90% of the population?
e) An adult is at moderate risk if cholesterol level is more than one but less than two standard deviations above the mean. What percentage of the population is at moderate risk according to this criterion?
f) An adult whose cholesterol level is more than two standard deviations above the mean is thought to be at high risk. What percentage of the population is at high risk?
g) An adult whose cholesterol level is less than one standard deviation below the mean is thought to be at low risk. What percentage of the population is at low risk?
h) What is the cholesterol range for an adult that is at moderate risk?
i) What is the cholesterol range for an adult that is at low risk?

4-26. The weight of a sophisticated running shoe is normally distributed with a mean of 12 ounces and a standard deviation of 0.5 ounce.
a) What is the probability that a shoe weighs more than 13 ounces?
b) What must the standard deviation of weight be in order for the company to state that 99.9% of its shoes weigh less than 13 ounces?
c) If the standard deviation remains at 0.5 ounce, what must the mean weight be for the company to state that 99.9% of its shoes weigh less than 13 ounces?

4-27. Assume that a random variable is normally distributed with a mean of 24 and a standard deviation of 2. Consider an interval of length one unit that starts at the value a so that the interval is [a, a + 1]. For what value of a is the probability of the interval greatest? Does the standard deviation affect that choice of interval?

4-28. Suppose that f(x) = e^{−x} for x > 0. Determine the following:
a) P[1 < X]
b) P[1 < X < 2.5]
c) t such that P[X < t] = 0.10

4-29. Suppose that X has an exponential distribution with λ = 2. Determine the following:
a) P[X ≤ 0]
b) P[X > 2]
c) P[X ≤ 1]
d) P[1 < X < 2]

4-30. Suppose that X has an exponential distribution with mean equal to 10. Determine the following:
a) P[X > 10]
b) P[X > 20]
c) P[X < 30]
d) P[X < 5]
e) P[X < 15 | X > 10]

4-31. The distance between major cracks in a highway follows an exponential distribution with a mean of five miles.
a) What is the probability that there are no major cracks in a 10-mile stretch of the highway?
b) What is the probability that the first major crack occurs between 12 and 15 miles of the start of inspection?
c) Given that there are no cracks in the first five miles inspected, what is the probability that there are no major cracks in the next 10 miles inspected?

4-32. The CPU of a personal computer has a lifetime that is exponentially distributed with a mean lifetime of six years. You have owned this CPU for three years.
a) What is the probability that the CPU fails in the next three years?
b) Assume that your corporation has owned 10 CPUs for three years, and assume that the CPUs fail independently. What is the probability that at least one fails within the next three years?

4-33. The time between calls to a plumbing supply business is exponentially distributed with a mean time between calls of 15 minutes.
a) What is the probability that there are no calls within a 30-minute interval?
b) What is the probability that at least one call arrives within a 10-minute interval?
c) What is the probability that the first call arrives between 5 and 10 minutes after opening?
d) Determine the length of an interval of time such that the probability of at least one call in the interval is 0.90.

4-34. The time between arrivals of taxis at a busy intersection is exponentially distributed with a mean of 10 minutes.
a) What is the probability that you wait longer than one hour for a taxi?
b) Suppose that you have already been waiting for one hour for a taxi. What is the probability that one arrives within the next 10 minutes?
c) Determine x such that the probability that you wait more than x minutes is 0.10.
d) Determine x such that the probability that you wait less than x minutes is 0.90.
e) Determine x such that the probability that you wait less than x minutes is 0.50.

4-35. The life of automobile voltage regulators has an exponential distribution with a mean life of six years. You purchase a six-year-old automobile, with a working voltage regulator, and plan to own it for six years.
a) What is the probability that the voltage regulator fails during your ownership?
b) If your regulator fails after you own the automobile three years and it is replaced, what is the mean time until the next failure?

4-36. Assume that the flaws along a magnetic tape follow a Poisson distribution with a mean of 0.2 flaw per meter. Let X denote the distance between two successive flaws.
a) What is the mean of X?
b) What is the probability that there are no flaws in 10 consecutive meters of tape?
c) How many meters of tape need to be inspected so that the probability that at least one flaw is found is 90%?

4-37. Derive Equation (4.15).

4-38. Derive Equation (4.16).

5. Joint Probability Distributions
Our study of random variables and their probability distributions in the preceding chapters is
restricted to one-dimensional sample spaces, in that we recorded outcomes of an experiment as
values assumed by a single random variable. There will be situations, however, where we may
find it desirable to record the simultaneous outcomes of several random variables. The study
of probability distributions for more than one random variable is the focus of this chapter.

Learning Objectives
At the end of this chapter, you should be able to do the following:
1. Use joint probability mass functions and joint probability density functions to calculate
probabilities

2. Calculate marginal and conditional probability distributions from joint probability dis-
tributions

3. Interpret and calculate covariances and correlations between random variables

4. Calculate means and variances for linear combinations of random variables and calculate
probabilities for linear combinations of normally distributed random variables

5. Determine the distribution of a general function of a random variable


In the previous chapters, we studied probability distributions for a single random variable.
However, it is often useful to have more than one random variable defined in a random experi-
ment. For example, the continuous random variable X can denote the length of one dimension
of an injection-molded part, and the continuous random variable Y might denote the length
of another dimension. We might be interested in probabilities that can be expressed in terms
of both X and Y .
In general, if X and Y are two random variables, the probability distribution that defines
their simultaneous behavior is called a joint probability distribution. In this chapter, we
investigate some important properties of these joint distributions.

5.1. Two Random Variables


For simplicity, we begin by considering random experiments in which only two random vari-
ables are studied. In later sections, we generalize the presentation to the joint probability
distribution of more than two random variables.

5.1.1. Joint Probability Function


If X and Y are discrete random variables, the joint probability distribution of X and Y is a
description of the set of points (x, y) in the range of (X, Y) along with the probability of each
point. Also, P[X = x and Y = y] is usually written as P[X = x, Y = y]. The joint probability
distribution of two random variables is sometimes referred to as the bivariate probability
distribution or bivariate distribution of the random variables. One way to describe the
joint probability distribution of two discrete random variables is through a joint probability
mass function f (x, y).


Definition 5.1
The joint probability mass function of the discrete random variables X and Y ,
denoted f (x, y), satisfies

(1) f(x, y) ≥ 0

(2) Σ_{all x} Σ_{all y} f(x, y) = 1

(3) f(x, y) = P[X = x, Y = y]

Example 5.1
Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens,
and 3 green pens. Let X be the number of blue pens selected and Y be the number of red
pens selected.

(a) Find the joint probability mass function f (x, y).


Clearly, x = 0, 1, 2 and y = 0, 1, 2 with the restriction 0 ≤ x + y ≤ 2 since two ballpoint
pens are selected. The possible pairs of values (x, y) are (0, 0), (0, 1), (0, 2), (1, 0), (1, 1) and
(2, 0). f(1, 0) represents the probability that 1 blue pen and no red pen are selected.
The probability mass function is
f(x, y) = C(3, x) C(2, y) C(3, 2 − x − y) / C(8, 2)

The probability at each mass point (x, y) is shown in Table 5.1.

(b) Find the probability of selecting at least one green pen at random.
Let G1 and G2 be the events of selecting 1 and 2 green pens.

P[G1 ∪ G2] = P[G1] + P[G2]
           = [f(1, 0) + f(0, 1)] + f(0, 0)
           = [C(3, 1)C(3, 1)/C(8, 2) + C(2, 1)C(3, 1)/C(8, 2)] + C(3, 2)/C(8, 2)
           = (9/28 + 6/28) + 3/28 = 9/14

Table 5.1: Joint Probability Distribution for Example 5.1

        y = 0    y = 1    y = 2    Total
x = 0   3/28     3/14     1/28     5/14
x = 1   9/28     3/14     0        15/28
x = 2   3/28     0        0        3/28
Total   15/28    3/7      1/28     1
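The joint probability mass function and Table 5.1 can be reproduced by direct computation (a sketch using math.comb from the Python standard library):

    # Sketch: building Table 5.1 from the joint pmf of Example 5.1.
    from math import comb

    def f(x, y):
        """Joint pmf: x blue and y red pens in a sample of 2 from 3+2+3 pens."""
        if x < 0 or y < 0 or x + y > 2:
            return 0.0
        return comb(3, x) * comb(2, y) * comb(3, 2 - x - y) / comb(8, 2)

    for x in range(3):
        print([f(x, y) for y in range(3)])

    # probability of at least one green pen: f(1,0) + f(0,1) + f(0,0)
    print(f(1, 0) + f(0, 1) + f(0, 0))  # ≈ 0.6429 (= 9/14)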

The joint probability distribution of two continuous random variables X and Y can be
specified by providing a method for calculating the probability that X and Y assume a value
in any region R of two-dimensional space. Analogous to the probability density function of
a single continuous random variable, a joint probability density function can be defined over
two-dimensional space.

70
5.1. Two Random Variables

Definition 5.2
The joint probability density function of the continuous random variables X and
Y , denoted f (x, y), satisfies

(1) f(x, y) ≥ 0

(2) $\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$

(3) $\displaystyle P[(X, Y) \in R] = \iint_R f(x, y)\, dx\, dy$ for any region R in the xy-plane.

Example 5.2
Let X and Y be continuous random variables with joint density function
$$f(x, y) = \tfrac{1}{36} x^2 y, \qquad -1 \le x \le 2,\; 1 \le y \le 5$$

(a) Calculate P[X ≥ 0, 1 ≤ Y < 4].


The probability we seek is
$$P[0 \le X < 2,\, 1 \le Y < 4] = \int_0^2 \int_1^4 \tfrac{1}{36} x^2 y \, dy\, dx$$

We first compute the inner integral, leaving the variable x 'untouched' since the variable of integration is y.

$$\int_1^4 \tfrac{1}{36} x^2 y \, dy = \tfrac{1}{36} x^2 \cdot \frac{y^2}{2}\Big|_{y=1}^{y=4} = \tfrac{1}{36} x^2 \left(\tfrac{15}{2}\right) = \tfrac{5}{24} x^2$$

We now compute the outer integral.

$$\int_0^2 \int_1^4 \tfrac{1}{36} x^2 y \, dy\, dx = \int_0^2 \tfrac{5}{24} x^2 \, dx = \tfrac{5}{24} \cdot \frac{x^3}{3}\Big|_0^2 = \tfrac{5}{24}\left(\tfrac{8}{3}\right) = \frac{5}{9}$$

(b) Calculate P[X ≥ 1, X ≤ Y < 4].

$$P[X \ge 1,\, X \le Y < 4] = P[1 \le X < 2,\, X \le Y < 4] = \int_1^2 \int_x^4 \tfrac{1}{36} x^2 y \, dy\, dx$$
$$= \int_1^2 \tfrac{1}{36} x^2 \cdot \frac{y^2}{2}\Big|_{y=x}^{y=4} dx = \int_1^2 \tfrac{1}{36} x^2 \left(8 - \frac{x^2}{2}\right) dx = \int_1^2 \left(\tfrac{2}{9} x^2 - \tfrac{1}{72} x^4\right) dx$$
$$= \tfrac{2}{27} x^3 \Big|_1^2 - \tfrac{1}{360} x^5 \Big|_1^2 = \tfrac{2}{27}(8 - 1) - \tfrac{1}{360}(32 - 1) = \frac{467}{1080}$$

(c) Calculate $P\left[-1 < X < \tfrac{1}{2}(Y - 1),\, 1 < Y < 5\right]$.

Inserting the limits of integration, we realize that the double integral cannot be evaluated in this order:

$$\int_{-1}^{\frac{1}{2}(y-1)} \int_1^5 \tfrac{1}{36} x^2 y \, dy\, dx$$

So we switch the limits of integration and the differentials. The order of integration is with respect to x first, then with respect to y.

$$\int_1^5 \int_{-1}^{\frac{1}{2}(y-1)} \tfrac{1}{36} x^2 y \, dx\, dy = \int_1^5 \tfrac{1}{36} y \cdot \frac{x^3}{3}\Big|_{x=-1}^{x=\frac{1}{2}(y-1)} dy$$
$$= \frac{1}{36 \cdot 3} \int_1^5 y\left[\tfrac{1}{8}(y - 1)^3 + 1\right] dy = \frac{1}{36 \cdot 3 \cdot 8} \int_1^5 y(y - 1)^3\, dy + \frac{1}{36 \cdot 3} \int_1^5 y \, dy$$
$$= \frac{1}{36 \cdot 3 \cdot 8 \cdot 20}\, (y - 1)^4 (4y + 1) \Big|_1^5 + \frac{1}{36 \cdot 3 \cdot 2}\, y^2 \Big|_1^5$$
$$= \frac{4^4 \cdot 21}{36 \cdot 3 \cdot 8 \cdot 20} + \frac{24}{36 \cdot 3 \cdot 2} = \frac{19}{45}$$

5.1.2. Marginal Probability Distribution


If more than one random variable is defined in a random experiment, it is important to distin-
guish between the joint probability distribution of X and Y and the probability distribution
of each variable individually. The individual probability distribution of a random variable is
referred to as its marginal probability distribution.
Definition 5.3 (Marginal Probability Function)

If f(x, y) is the joint probability mass function of the discrete random variables X and Y , the marginal probability mass functions of the random variables are

$$f(x) = \sum_{\text{all } y} f(x, y) \tag{5.1}$$
$$f(y) = \sum_{\text{all } x} f(x, y) \tag{5.2}$$

If f(x, y) is the joint probability density function of the continuous random variables X and Y , the marginal probability density functions of the random variables are

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \tag{5.3}$$
$$f(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \tag{5.4}$$

The marginal probability mass function of Example 5.1 can be found at the total row and
total column (called margins) of Table 5.1. The marginal distribution of X is the total column
and the marginal distribution of Y is the total row.

$$f_X(x) = \begin{cases} 5/14 & x = 0 \\ 15/28 & x = 1 \\ 3/28 & x = 2 \end{cases} \qquad\qquad f_Y(y) = \begin{cases} 15/28 & y = 0 \\ 3/7 & y = 1 \\ 1/28 & y = 2 \end{cases}$$

Example 5.3
Obtain the marginal probability density function of X in Example 5.2 and verify Property 2
of Definition 4.2.
$$f(x) = \int_{-\infty}^{\infty} \tfrac{1}{36} x^2 y \, dy = \int_1^5 \tfrac{1}{36} x^2 y \, dy = \tfrac{1}{36} x^2 \cdot \frac{y^2}{2}\Big|_{y=1}^{y=5} = \tfrac{1}{36} x^2 (12) = \tfrac{1}{3} x^2$$

$$\int_{-\infty}^{\infty} f(x)\, dx = \int_{-1}^{2} \tfrac{1}{3} x^2 \, dx = \frac{x^3}{9}\Big|_{-1}^{2} = \frac{8}{9} - \left(-\frac{1}{9}\right) = 1$$
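As a sanity check (again assuming SciPy), the marginal density just obtained integrates to 1 over its support:

    from scipy.integrate import quad

    fX = lambda x: x**2 / 3          # marginal density of X from Example 5.3
    total, _ = quad(fX, -1, 2)
    print(total)                     # 1.0, verifying Property 2 of Definition 4.2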

The mean and variance of the random variable X can be obtained by first calculating the
marginal probability distribution of X from a joint probability distribution using Definition 5.3
and computing E[X] and V[X].

5.1.3. Conditional Probability Distribution


When two random variables are defined in a random experiment, knowledge of one can change the probabilities that we associate with the values of the other. Recall that in Example 5.1, X and Y denote the number of blue pens and red pens, respectively. We can expect the probability that X = 1 to be greater at Y = 0 than at Y = 1. From the notation of conditional probability in Section 2.3.1, we can write such conditional probabilities as P[X = 1 | Y = 0] and P[X = 1 | Y = 1]. Consequently, the random variables X and Y are expected to be dependent. Knowledge of the value obtained for Y changes the probabilities associated with the values of X.
Example 5.4
Compute P[X = 1 | Y = 0] and P[X = 1 | Y = 1] for Example 5.1.

$$P[X = 1 \mid Y = 0] = \frac{P[X = 1, Y = 0]}{P[Y = 0]} = \frac{f(1, 0)}{f_Y(0)} = \frac{9/28}{15/28} = \frac{3}{5}$$

$$P[X = 1 \mid Y = 1] = \frac{P[X = 1, Y = 1]}{P[Y = 1]} = \frac{f(1, 1)}{f_Y(1)} = \frac{3/14}{3/7} = \frac{1}{2}$$

Also,

$$P[X = 0 \mid Y = 0] + P[X = 1 \mid Y = 0] + P[X = 2 \mid Y = 0] = \frac{3/28}{15/28} + \frac{9/28}{15/28} + \frac{3/28}{15/28} = 1$$
This set of probabilities defines the conditional distribution of X given that Y = 0.
The conditional probability for a joint probability density function is defined below.
Definition 5.4
Let X and Y be continuous random variables. The conditional probability density function of Y given X = x, denoted $f_{Y|x}(y)$, is

$$f_{Y|x}(y) = \frac{f(x, y)}{f(x)}, \qquad f(x) > 0 \tag{5.5}$$

Because fY |x (y) is a density function, the following properties are satisfied:

(1) $f_{Y|x}(y) \ge 0$

(2) $\displaystyle \int_{-\infty}^{\infty} f_{Y|x}(y)\, dy = 1$

(3) $\displaystyle P[a < Y < b \mid X = x] = \int_a^b f_{Y|x}(y)\, dy$

Example 5.5
Obtain the conditional probability density fY|x(y) from the joint probability density function of the continuous random variables X and Y of Example 5.2, and compute P[2 < Y < 3 | X = 1.5].

$$f_{Y|x}(y) = \frac{f(x, y)}{f(x)} = \frac{\frac{1}{36} x^2 y}{\frac{1}{3} x^2} = \frac{1}{12}\, y$$

$$P[2 < Y < 3 \mid X = 1.5] = \int_2^3 \tfrac{1}{12} y \, dy = \frac{5}{24}$$

Example 5.6
A privately owned business operates both a drive-in facility and a walk-in facility. On a
randomly selected day, let X and Y , respectively, be the proportions of the time that the
drive-in and the walk-in facilities are in use, and suppose that the joint density function of
these random variables is

$$f(x, y) = \tfrac{2}{5}(2x + 3y), \qquad 0 \le x \le 1,\; 0 \le y \le 1$$

(a) Determine P[Y | X = x].

We first find f(x), the marginal distribution of X, and apply Equation (5.5) to determine the desired conditional probability.

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_0^1 \tfrac{2}{5}(2x + 3y)\, dy = \tfrac{2}{5}\left(2xy + \frac{3y^2}{2}\right)\Big|_{y=0}^{y=1} = \tfrac{2}{5}\left(2x + \tfrac{3}{2}\right)$$

$$f_{Y|x}(y) = \frac{f(x, y)}{f(x)} = \frac{\frac{2}{5}(2x + 3y)}{\frac{2}{5}\left(2x + \frac{3}{2}\right)} = \frac{2(2x + 3y)}{4x + 3}$$

(b) Compute the conditional probability P[0.2 < X < 0.6 | Y = 0.8].

First, we determine the marginal density fY(0.8).

$$f_Y(0.8) = \int_0^1 \tfrac{2}{5}(2x + 2.4)\, dx = \tfrac{2}{5}\left(x^2 + 2.4x\right)\Big|_0^1 = \tfrac{2}{5}(3.4)$$

The desired probability is

$$P[0.2 < X < 0.6 \mid Y = 0.8] = \int_{0.2}^{0.6} f_{X|y}(x \mid 0.8)\, dx = \int_{0.2}^{0.6} \frac{f(x, 0.8)}{f_Y(0.8)}\, dx$$
$$= \int_{0.2}^{0.6} \frac{\frac{2}{5}(2x + 2.4)}{\frac{2}{5}(3.4)}\, dx = \frac{1}{3.4}\left(x^2 + 2.4x\right)\Big|_{0.2}^{0.6} = \frac{1}{3.4}(1.80 - 0.52) = \frac{32}{85}$$
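This conditional probability is easy to confirm with a one-dimensional quadrature (SciPy assumed):

    from scipy.integrate import quad

    # f_{X|y}(x | 0.8) = (2x + 2.4) / 3.4 on 0 <= x <= 1
    p, _ = quad(lambda x: (2 * x + 2.4) / 3.4, 0.2, 0.6)
    print(p)   # 32/85 ≈ 0.376471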

Because Y conditioned on X = x is itself a random variable, it has a mean and variance.


Definition 5.5
The conditional mean µY|x and conditional variance σ²Y|x are

$$\mu_{Y|x} = E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|x}(y)\, dy \tag{5.6}$$
$$\sigma^2_{Y|x} = V[Y \mid X = x] = \int_{-\infty}^{\infty} y^2 f_{Y|x}(y)\, dy - \mu^2_{Y|x} \tag{5.7}$$
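For instance, applying Definition 5.5 to the conditional density fY|x(y) = y/12 of Example 5.5 gives the following (a numerical sketch assuming SciPy):

    from scipy.integrate import quad

    f_cond = lambda y: y / 12                        # f_{Y|x}(y) on 1 <= y <= 5
    mu, _ = quad(lambda y: y * f_cond(y), 1, 5)      # conditional mean, Eq. (5.6)
    ey2, _ = quad(lambda y: y**2 * f_cond(y), 1, 5)  # E[Y^2 | X = x]
    print(mu, ey2 - mu**2)                           # ≈ 3.444 and ≈ 1.136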

Independence

In general, the conditional probability distribution function fY|x(y) is a function of x and y. This is illustrated in Example 5.6(a). When the conditional probability distribution function fY|x(y) does not depend on x, the random variables X and Y are independent.
Definition 5.6 (Independence)

Let X, Y be continuous random variables with joint probability density function fXY(x, y). Suppose that the conditional probability density and marginal density functions of Y are fY|x(y) and fY(y), respectively. If fY|x(y) = fY(y) for all x, then X and Y are independent. If the equality fails for some value of x, then X and Y are not independent.

Example 5.7
Show that the random variables X and Y of Example 5.2 are independent.
In Example 5.3, we determined the marginal density of X to be

$$f_X(x) = \tfrac{1}{3} x^2$$

We compute fX|y(x) next.

$$f_Y(y) = \int_{-1}^{2} \tfrac{1}{36} x^2 y \, dx = \tfrac{1}{36} y \cdot \frac{x^3}{3}\Big|_{x=-1}^{x=2} = \tfrac{1}{36} y\,(3) = \tfrac{1}{12} y$$

$$f_{X|y}(x) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{\frac{1}{36} x^2 y}{\frac{1}{12} y} = \tfrac{1}{3} x^2$$

Since fX(x) = fX|y(x), we conclude that X and Y are independent.

For the random variables X and Y , if any one of the following properties is true, then the
other properties are also true, and X and Y are independent.

(i) fY |x (y) = fY (y)

(ii) fX|y (x) = fX (x)

(iii) fXY (x, y) = fX (x)fY (y)

(iv) P[X ∈ A, Y ∈ B] = P[X ∈ A] · P[Y ∈ B] for any sets A and B in the range of X and Y
respectively.


In Example 5.3, we found the marginal density of X to be fX(x) = (1/3)x². In Example 5.7, we showed that fY(y) = (1/12)y. The product

$$f_X(x)\, f_Y(y) = \left(\tfrac{1}{3} x^2\right)\left(\tfrac{1}{12} y\right) = \tfrac{1}{36} x^2 y = f_{XY}(x, y)$$

satisfies Property (iii). We conclude that X and Y are independent. Furthermore,

$$P[X \ge 0,\, 1 \le Y < 4] = P[X \ge 0] \cdot P[1 \le Y < 4] = \left(\int_0^2 \tfrac{1}{3} x^2\, dx\right)\left(\int_1^4 \tfrac{1}{12} y\, dy\right)$$
$$= \left(\frac{x^3}{9}\Big|_0^2\right)\left(\frac{y^2}{24}\Big|_1^4\right) = \left(\frac{8}{9}\right)\left(\frac{5}{8}\right) = \frac{5}{9}$$

using Property (iv). The computed probability agrees with Example 5.2(a).

5.1.4. Covariance and Correlation


When two or more random variables are defined on a probability space, it is useful to describe
how they vary together; that is, it is useful to measure the relationship between the variables.
A common measure of the relationship between two random variables is the covariance. To
define the covariance, we need to describe the expected value of a function of two random
variables h(X, Y ). The definition simply extends the one for a function of a single random
variable.
Definition 5.7
If X and Y are random variables with joint probability distribution function f (x, y), the
expected value of a function h(X, Y ) of the random variables is
$$E[h(X, Y)] = \begin{cases} \displaystyle\sum_{\text{all } x} \sum_{\text{all } y} h(x, y) f(x, y) & X, Y \text{ discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y) f(x, y)\, dy\, dx & X, Y \text{ continuous} \end{cases} \tag{5.8}$$

The expected value E[h(X, Y )] can be thought of as the weighted average of h(x, y) for each
point in the range of (X, Y ). The value of E[h(X, Y )] represents the average value of h(X, Y )
that is expected in a long sequence of repeated trials of the random experiment.
Definition 5.8
The covariance of the random variables X and Y , denoted by σXY or cov(X, Y ), is
defined as

σXY = cov(X, Y ) = E[(X − µX )(Y − µY )] = E[XY ] − µX µY (5.9)

The covariance is defined for both continuous and discrete random variables by the same
formula.
Example 5.8
Compute the covariance of the random variables in Example 5.1.
We reproduce the table of the joint probability mass function for easy reference.


    f(x, y)    y = 0    y = 1    y = 2    fX(x)
    x = 0       3/28     3/14     1/28     5/14
    x = 1       9/28     3/14       0     15/28
    x = 2       3/28       0        0      3/28
    fY(y)      15/28     3/7      1/28       1

We tabulate the values of xyf (x, y).

    xyf(x, y)   y = 0    y = 1    y = 2
    x = 0         0        0        0
    x = 1         0       3/14      0
    x = 2         0        0        0

Thus, E[XY], the sum of all entries in the table, is

$$E[XY] = \frac{3}{14}$$

Also,

$$\mu_X = 0 \cdot \frac{5}{14} + 1 \cdot \frac{15}{28} + 2 \cdot \frac{3}{28} = \frac{3}{4}$$
$$\mu_Y = 0 \cdot \frac{15}{28} + 1 \cdot \frac{3}{7} + 2 \cdot \frac{1}{28} = \frac{1}{2}$$

Therefore,

$$\sigma_{XY} = E[XY] - \mu_X \mu_Y = \frac{3}{14} - \frac{3}{4} \cdot \frac{1}{2} = -\frac{9}{56}$$

Covariance is a measure of linear relationship between the random variables. If the relation-
ship between the random variables is nonlinear, the covariance might not be sensitive to the
relationship.
There is another measure of the relationship between two random variables that is often
easier to interpret than the covariance.
Definition 5.9
The correlation between the random variables X and Y , denoted ρXY , is defined as
$$\rho_{XY} = \frac{\sigma_{XY}}{\sqrt{V[X]\, V[Y]}} \tag{5.10}$$

The correlation just scales the covariance by the product of the standard deviation of each
variable. Consequently, the correlation is a dimensionless quantity that can be used to compare
the linear relationships between pairs of variables in different units.
For the random variables in Example 5.1,

$$V[X] = \sum_{x=0}^{2} x^2 f(x) - \mu_X^2 = 0^2\left(\frac{5}{14}\right) + 1^2\left(\frac{15}{28}\right) + 2^2\left(\frac{3}{28}\right) - \left(\frac{3}{4}\right)^2 = \frac{45}{112}$$
$$V[Y] = 0^2\left(\frac{15}{28}\right) + 1^2\left(\frac{3}{7}\right) + 2^2\left(\frac{1}{28}\right) - \left(\frac{1}{2}\right)^2 = \frac{9}{28}$$
$$\rho_{XY} = \frac{-\frac{9}{56}}{\sqrt{\frac{45}{112} \times \frac{9}{28}}} = -0.4472$$
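The covariance and correlation can be reproduced by direct enumeration of the joint pmf; the sketch below reuses the hypothetical helper f(x, y) from the Example 5.1 code above:

    pts = [(x, y) for x in range(3) for y in range(3)]
    EXY = sum(x * y * f(x, y) for x, y in pts)
    EX = sum(x * f(x, y) for x, y in pts)
    EY = sum(y * f(x, y) for x, y in pts)
    VX = sum(x**2 * f(x, y) for x, y in pts) - EX**2
    VY = sum(y**2 * f(x, y) for x, y in pts) - EY**2
    cov = EXY - EX * EY
    print(cov, cov / (VX * VY) ** 0.5)   # -9/56 ≈ -0.1607 and ≈ -0.4472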

For any two random variables X and Y ,

−1 ≤ ρXY ≤ 1

Two random variables with nonzero correlation are said to be correlated. Similar to co-
variance, the correlation is a measure of the linear relationship between random variables.
For independent random variables, we do not expect any relationship in their joint probability distribution. That is,

$$\sigma_{XY} = 0$$
if the random variables are independent. The result is left as an exercise.

5.2. More Than Two Random Variables


More than two random variables can be defined in a random experiment. Results for multiple
random variables are straightforward extensions of those for two random variables. A summary
for the continuous random variables is provided here. For the discrete case, simply replace the
integral with a summation.
Definition 5.10
A probability density function f (x1 , x2 , . . . , xn ) for the continuous random variables
X1 , X2 , . . . , Xn has the following properties:

(1) f(x₁, x₂, . . . , xₙ) ≥ 0

(2) $\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1$

(3) For any region R of n-dimensional space,

$$P[(X_1, \ldots, X_n) \in R] = \int \cdots \int_R f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$

The marginal probability density function of the random variable Xi can be obtained by integrating the joint density function over all the random variables except Xi. For example, the marginal density of X2 is

$$f(x_2) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, x_3, \ldots, x_n)\, dx_1\, dx_3 \cdots dx_n$$

Likewise, the joint probability density function of several of the random variables can be obtained by integrating the joint density over the remaining random variables.
The mean and variance of the random variable Xi from a joint distribution f (x1 , x2 , . . . , xn )
can be obtained by applying the definition of the mean and variance of a single random variable
(Equations 4.3 and 4.4) using its marginal density function f (xi ).
Conditional probability distributions can be developed for multiple random variables by
an extension of ideas used for the two random variables. That is, if the random variables
X1 , . . . , Xm , Y1 , . . . , Yn have a joint probability density function f (x1 , . . . , xm , y1 , . . . , yn ), the
conditional probability distribution of the X’s given the Y ’s is

$$f_{X_1,\ldots,X_m \mid y_1,\ldots,y_n}(x_1, \ldots, x_m) = \frac{f_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1, \ldots, x_m, y_1, \ldots, y_n)}{f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n)}$$


Similar to the results for only two random variables, independence of the random variables
X1 , X2 , . . . , Xn implies that

fX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = fX1 (x1 )fX2 (x2 ) · · · fXn (xn ) (5.11)

Example 5.9
Suppose that X1 , X2 , and X3 represent the thickness in micrometers of a substrate, an
active layer, and a coating layer of a chemical product, respectively. Assume that the
random variables are independent and normally distributed with µ1 = 10000, µ2 = 1000,
µ3 = 80, σ1 = 250, σ2 = 20, and σ3 = 4, respectively. The specifications for the thickness
of the substrate, active layer, and coating layer are 9200 < x1 < 10800, 950 < x2 < 1050,
and 75 < x3 < 85, respectively.

(a) What proportion of chemical products meets all thickness specifications?


The requested probability is

P[9200 < X1 < 10800, 950 < X2 < 1050, 75 < X3 < 85]

Because the random variables are independent,

P[9200 < X1 < 10800, 950 < X2 < 1050, 75 < X3 < 85]
= P[9200 < X1 < 10800] P[950 < X2 < 1050] P[75 < X3 < 85]
= P[−3.2 < Z < 3.2] P[−2.5 < Z < 2.5] P[−1.25 < Z < 1.25]

after standardizing. From Appendix A-1, the desired probability is

(0.998626)(0.987581)(0.788701) = 0.777836

(b) Which one of the three thicknesses has the least probability of meeting specifications?
The thickness of the coating layer has the least probability of meeting specifications. Con-
sequently, a priority should be to reduce variability in this part of the process.
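A quick check with scipy.stats.norm (an assumed dependency, not part of the text) reproduces the three marginal probabilities and their product:

    from scipy.stats import norm

    p1 = norm.cdf(10800, 10000, 250) - norm.cdf(9200, 10000, 250)
    p2 = norm.cdf(1050, 1000, 20) - norm.cdf(950, 1000, 20)
    p3 = norm.cdf(85, 80, 4) - norm.cdf(75, 80, 4)
    print(p1 * p2 * p3)   # ≈ 0.7778
    print(p3)             # coating layer, the smallest at ≈ 0.7887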

5.3. Linear Functions of Random Variables


A random variable is sometimes defined as a function of one or more random variables. In
this section, results for linear functions are highlighted because of their importance in the
remainder of the handout. For example, if the random variables X1 and X2 denote the length
and width, respectively, of a manufactured part, Y = 2X1 + 2X2 is a random variable that
represents the perimeter of the part.
In this section, we develop results for random variables that are linear combinations of
random variables.
Definition 5.11
Given random variables X1 , X2 , . . . , Xk and constants a1 , a2 , . . . , ak ,

Y = a1 X1 + a2 X2 + · · · + ak Xk (5.12)

is a linear combination of X1 , X2 , . . . , Xk .

The expected value and variance of the random variable Y = a1 X1 + a2 X2 + · · · + ak Xk are

E[Y ] = a1 E[X1 ] + a2 E[X2 ] + · · · + ak E[Xk ] (5.13)


$$V[Y] = a_1^2 V[X_1] + a_2^2 V[X_2] + \cdots + a_k^2 V[X_k] + 2 \sum\sum_{i<j} a_i a_j\, \mathrm{cov}(X_i, X_j) \tag{5.14}$$

We prove Equation (5.13) for the case k = 2.

$$E[Y] = E[a_1 X_1 + a_2 X_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (a_1 x_1 + a_2 x_2) f(x_1, x_2)\, dx_1\, dx_2$$
$$= a_1 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1, x_2)\, dx_1\, dx_2 + a_2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\, dx_1\, dx_2$$
$$= a_1 E[X_1] + a_2 E[X_2]$$

If the random variables X1, X2, . . . , Xk are independent, cov(Xi, Xj) = 0 for i ≠ j, and the variance of the linear function of the random variables is

V[a1 X1 + a2 X2 + · · · + ak Xk ] = a21 V[X1 ] + a22 V[X2 ] + · · · + a2k V[Xk ] (5.15)

The particular linear function that represents the average of random variables with identical
means and variances is used quite often in subsequent chapters. We highlight the results for
this special case.
Definition 5.12 (Average)

Let X1, X2, . . . , Xn be independent and identically distributed random variables with E[Xi] = µ and V[Xi] = σ². Define the random variable X̄ by

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \tag{5.16}$$

The expected value and variance of X̄ are

$$E\left[\overline{X}\right] = \mu \tag{5.17}$$
$$V\left[\overline{X}\right] = \frac{\sigma^2}{n} \tag{5.18}$$

Example 5.10
An automated filling machine fills soft-drink cans. The mean fill volume is 12.1 fluid ounces,
and the standard deviation is 0.1 oz. Assume that the fill volumes of the cans are indepen-
dent, normal random variables. What is the probability that the average volume of 10 cans
selected from this process is less than 12 oz?
Let X1, X2, . . . , X10 denote the fill volumes of the 10 cans. The average fill volume X̄ is a normal random variable with

$$E\left[\overline{X}\right] = 12.1 \qquad V\left[\overline{X}\right] = \frac{0.1^2}{10} = 0.001$$

Consequently,

$$P\left[\overline{X} < 12\right] = P\left[Z < \frac{12 - 12.1}{\sqrt{0.001}}\right] = P[Z < -3.16] = 0.000789$$
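The same probability in one line with scipy.stats.norm (assumed available):

    from scipy.stats import norm

    # X-bar ~ N(12.1, 0.1^2 / 10)
    print(norm.cdf(12, 12.1, 0.1 / 10**0.5))   # ≈ 0.00078 (tables give 0.000789 at z = -3.16)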


Another useful result concerning linear functions of random variables is a reproductive


property that holds for independent, normal random variables.

Reproductive Property

If X1, X2, . . . , Xn are independent normal random variables with E[Xi] = µi and V[Xi] = σi², then

$$Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$$

is a normal random variable with mean and variance

$$E[Y] = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n \tag{5.19}$$
$$V[Y] = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2 \tag{5.20}$$

Example 5.11
Let the random variables X1 and X2 denote the length and width, respectively, of a manu-
factured part. Assume that X1 is normal(ly distributed) with E[X1 ] = 2 cm and standard
deviation 0.1 cm and that X2 is normal with E[X2 ] = 5 cm and standard deviation 0.2 cm.
Also assume that X1 and X2 are independent. Determine the probability that the perimeter
exceeds 14.5 cm.
Let Y = 2X1 + 2X2 be the perimeter of the manufactured part. Y is normally distributed
with

µY = E[Y ] = 2(2) + 2(5) = 14 (5.21)


σY2 = V[Y ] = 4(0.1)2 + 4(0.2)2 = 0.2 (5.22)

Thus,

$$P[Y > 14.5] = P\left[Z > \frac{14.5 - 14}{\sqrt{0.2}}\right] = P[Z > 1.12] = 0.131776$$
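Numerically (SciPy assumed):

    from scipy.stats import norm

    mu = 2 * 2 + 2 * 5                       # E[Y] = 14
    sd = (4 * 0.1**2 + 4 * 0.2**2) ** 0.5    # sqrt(V[Y]) = sqrt(0.2)
    print(norm.sf(14.5, mu, sd))             # ≈ 0.1318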

5.4. General Function of Random Variables


In many situations in statistics, it is necessary to derive the probability distribution of a
function of one or more random variables. In this section, we present some results that are
helpful in solving this problem.
Suppose that X is a discrete random variable with probability distribution fX (x). Let
Y = h(X) be a function of X that defines a one-to-one transformation between the values
of X and Y and that we wish to find the probability distribution of Y . By a one-to-one
transformation, we mean that each value x is related to one and only one value of y = h(x)
and that each value of y is related to one and only one value of x, say x = u(y) where u(y) is
found by solving y = h(x) for x in terms of y.
Now the random variable Y takes on the value y when X takes on the value u(y). Therefore,
the probability distribution of Y is
fY (y) = P[Y = y] = P[X = u(y)] = fX (u(y))
We now consider the situation in which the random variables are continuous. Let Y = h(X)
with X continuous and the transformation one to one. The equation y = h(x) can be solved
for x in terms of y, say x = u(y). The probability distribution of Y is
fY (y) = fX (u(y))|J| (5.23)
where J = u′(y) is called the Jacobian of the transformation and the two vertical bars denote the absolute value.


Example 5.12
Let X be a continuous random variable with probability distribution

$$f(x) = \frac{x}{8}, \qquad 0 < x < 4$$

Find the probability distribution of the random variable $Y = \sqrt{X} + 4$.

The graph of $y = \sqrt{x} + 4$ over the interval 0 < x < 4 shows a one-to-one transformation, mapping 0 < x < 4 onto 4 < y < 6. Therefore, the equation can be solved uniquely for x: $x = u(y) = (y - 4)^2$.

$$f_Y(y) = f_X(u(y))\,|u'(y)| = \frac{(y - 4)^2}{8}\, |2(y - 4)| = \frac{(y - 4)^3}{4}$$

Thus, $f(y) = \tfrac{1}{4}(y - 4)^3$ for 4 < y < 6.
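The transformed density can also be checked by simulation (a sketch assuming NumPy; X is sampled by inverting its cdf F(x) = x²/16):

    import numpy as np

    rng = np.random.default_rng(0)
    x = 4 * np.sqrt(rng.uniform(size=200_000))   # X has density x/8 on (0, 4)
    y = np.sqrt(x) + 4                           # the transformation Y = sqrt(X) + 4
    # Exact: P[Y <= 5] = integral of (y - 4)^3 / 4 from 4 to 5 = 1/16
    print((y <= 5).mean())                       # ≈ 0.0625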

Chapter Summary

• If X and Y are discrete random variables, the joint probability mass function f (x, y) is the
probability of the event X = x and Y = y,

f (x, y) = P[X = x, Y = y]

• If X and Y are continuous random variables, f (x, y) is called the joint probability density
function. It does not represent the probability that X = x and Y = y. The probability
P[(X, Y ) ∈ R] is determined by
$$P[(X, Y) \in R] = \iint_R f(x, y)\, dx\, dy$$

• The marginal probability mass function of X from the joint probability mass function f (x, y)
is
$$f(x) = \sum_{\text{all } y} f(x, y)$$

• The marginal probability density function of X from the joint probability density function
f(x, y) is
$$f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$

• The conditional probability function fY|x(y) is the quotient of the joint probability function and the marginal probability function.

$$f_{Y|x}(y) = \frac{f(x, y)}{f(x)}$$

• The conditional mean and conditional variance are


$$\mu_{Y|x} = \int_{-\infty}^{\infty} y\, f_{Y|x}(y)\, dy$$
$$\sigma^2_{Y|x} = \int_{-\infty}^{\infty} y^2 f_{Y|x}(y)\, dy - \mu^2_{Y|x}$$


• The covariance cov(X, Y) of the random variables X and Y is

$$\mathrm{cov}(X, Y) = E[XY] - \mu_X \mu_Y$$

where

$$E[XY] = \sum_{\text{all } x}\sum_{\text{all } y} xy\, f(x, y) \qquad X, Y \text{ discrete}$$
$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x, y)\, dx\, dy \qquad X, Y \text{ continuous}$$

• The mean and variance of the linear combination of the random variables X1, X2, . . . , Xk are

$$E[a_1 X_1 + \cdots + a_k X_k] = a_1 E[X_1] + \cdots + a_k E[X_k]$$
$$V[a_1 X_1 + \cdots + a_k X_k] = a_1^2 V[X_1] + \cdots + a_k^2 V[X_k] + 2 \sum\sum_{i<j} a_i a_j\, \mathrm{cov}(X_i, X_j)$$

If the random variables are independent,

$$V[a_1 X_1 + \cdots + a_k X_k] = a_1^2 V[X_1] + \cdots + a_k^2 V[X_k]$$

• If X1, X2, . . . , Xk are independent normal random variables with means µ1, µ2, . . . , µk and variances σ1², σ2², . . . , σk² respectively, the linear combination a1X1 + a2X2 + · · · + akXk is normally distributed with mean and variance

$$E[a_1 X_1 + a_2 X_2 + \cdots + a_k X_k] = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_k \mu_k$$
$$V[a_1 X_1 + a_2 X_2 + \cdots + a_k X_k] = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_k^2 \sigma_k^2$$

Exercises

5-1. Show that the following function satisfies the properties of a joint probability mass function.

      x     y    f(x, y)
     1.0    1      1/4
     1.5    2      1/8
     1.5    3      1/4
     2.5    4      1/4
     3.0    5      1/8

Determine the following:
a) P[X < 2.5, Y < 3]   b) P[X < 2.5]
c) P[Y < 3]   d) P[X > 1.8, Y > 4.7]
e) E[X], E[Y], V[X], and V[Y]
f) Marginal probability distribution of X
g) fY|X=1.5(y)   h) fX|Y=2(x)
i) E[Y | X = 1.5]

5-2. If the joint probability distribution of X and Y is given by f(x, y) = (x + y)/30 for x = 0, 1, 2, 3; y = 0, 1, 2, find
a) P[X ≤ 2, Y = 1]   b) P[X > 2, Y ≤ 1]
c) P[X > Y]   d) P[X + Y = 4]

5-3. Determine the values of c so that the following functions represent joint probability distributions of the random variables X and Y:
a) f(x, y) = cxy, for x = 1, 2, 3; y = 1, 2, 3
b) f(x, y) = c|x − y|, for x = −2, 0, 2; y = −2, 3

5-4. From a sack of fruit containing 3 oranges, 2 apples, and 3 bananas, a random sample of 4 pieces of fruit is selected. Give the joint probability distribution of the number of oranges and the number of apples in the sample.

5-5. The conditional probability distribution of Y|x is fY|x(y) = xe^(−xy) for y > 0, and the marginal probability distribution of X is a continuous uniform distribution over 0 to 10. Determine
a) P[Y < 2 | X = 2]   b) E[Y | X = 2]
c) E[Y | x]   d) fXY(x, y)   e) fY(y)

5-6. A fast-food restaurant operates both a drive-through facility and a walk-in facility. On a randomly selected day, let X and Y, respectively, be the proportions of the time that the drive-through and walk-in facilities are in use, and suppose that the joint density function of these random variables is

    f(x, y) = (2/3)(x + 2y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1

a) Find the marginal density of X and determine the probability that the drive-through facility is busy less than one-half of the time.
b) Find the marginal density of Y.
c) Find the conditional probability function P[Y | X = x].
d) Are the random variables independent?
e) Find µX and µY.

5-7. A candy company distributes boxes of chocolates with a mixture of creams, toffees, and cordials. Suppose that the weight of each box is 1 kilogram, but the individual weights of the creams, toffees, and cordials vary from box to box. For a randomly selected box, let X and Y represent the weights of the creams and the toffees, respectively, and suppose that the joint density function of these variables is f(x, y) = 24xy where 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and x + y ≤ 1.
a) Find the probability that in a given box the cordials account for more than 1/2 of the weight.
b) Find the marginal density for the weight of the creams.
c) Find the probability that the weight of the toffees in a box is less than 1/8 of a kilogram if it is known that creams constitute 3/4 of the weight.

5-8. Let X denote the diameter of an armored electric cable and Y denote the diameter of the ceramic mold that makes the cable. Both X and Y are scaled so that they range between 0 and 1. Suppose that X and Y have the joint density

    f(x, y) = 1/y,  0 < x < y < 1

Find P[X + Y > 1/2].

5-9. The length and width of panels used for interior doors (in inches) are denoted as X and Y, respectively. Suppose that X and Y are independent, continuous uniform random variables for 17.75 ≤ x ≤ 18.25 and 4.75 ≤ y ≤ 5.25, respectively.
a) Determine the probability that the area of a panel exceeds 90 square inches.
b) What is the probability that the perimeter of a panel exceeds 46 inches?

5-10. Determine the covariance and correlation for the following joint probability distribution:

      x         1     1     2     4
      y         3     4     5     6
    f(x, y)    1/8   1/4   1/2   1/8

5-11. Determine the covariance and correlation of the random variables in Exercise 5-6.

5-12. X and Y are independent, normal random variables where X ∼ N(0, 4) and Y ∼ N(10, 9). Determine the following:
a) E[2X + 3Y]   b) V[2X + 3Y]
c) P[2X + 3Y < 30]   d) P[2X + 3Y < 40]

5-13. Making handcrafted pottery generally takes two major steps: wheel throwing and firing. The time of wheel throwing and the time of firing are normally distributed random variables with means of 40 minutes and 60 minutes and standard deviations of 2 minutes and 3 minutes, respectively.
a) What is the probability that a piece of pottery will be finished within 95 minutes?
b) What is the probability that it will take longer than 110 minutes?

5-14. A mechanical assembly used in an automobile engine contains four major components. The weights of the components are independent and normally distributed with the following means and standard deviations (in ounces):

    Component           µ      σ
    Left case           4.0    0.4
    Right case          5.5    0.5
    Bearing assembly   10.0    0.2
    Bolt assembly       8.0    0.5

a) What is the probability that the weight of an assembly exceeds 29.5 ounces?
b) What is the probability that the mean weight of eight independent assemblies exceeds 29 ounces?

5-15. The weight of a small candy is normally distributed with a mean of 0.1 ounce and a standard deviation of 0.01 ounce. Suppose that 16 candies are placed in a package and that the weights are independent.
a) What are the mean and variance of the package's net weight?
b) What is the probability that the net weight of a package is less than 1.6 ounces?
c) If 17 candies are placed in each package, what is the probability that the net weight of a package is less than 1.6 ounces?

5-16. The weight of adobe bricks for construction is normally distributed with a mean of 3 pounds and a standard deviation of 0.25 pound. Assume that the weights of the bricks are independent and that a random sample of 25 bricks is chosen. What is the probability that the mean weight of the sample is less than 2.95 pounds?

5-17. Assume that the weights of individuals are independent and normally distributed with a mean of 160 pounds and a standard deviation of 30 pounds. Suppose that 25 people squeeze into an elevator that is designed to hold 4300 pounds.
a) What is the probability that the load (total weight) exceeds the design limit?
b) What design limit is exceeded by 25 occupants with probability 0.0001?
Part II.

Estimation, Statistical Inference and


Model Verification

6. Point Estimation and Sampling Distribution
The purpose of this chapter is to introduce the concept of sampling and to present some
distribution theoretical results that are engendered by sampling. In addition, the student will
be given an introduction to the role that the sample mean and variance will play in statistical
inference in later chapters. It is a connecting chapter – it merges the distribution theory of
the previous chapters into the statistical theory of the remaining chapters. The intent is to
present here in one location some of the distributions that are associated with sampling and
that will be necessary in the study of the theory of statistics, especially estimation and testing
hypotheses.

Learning Objectives
At the end of this chapter, you should be able to do the following:
1. Explain the general concepts of estimating the parameters of a population or a proba-
bility distribution
2. Explain the important role of the normal distribution as a sampling distribution
3. Understand the central limit theorem
4. Explain important properties of point estimators, including bias, variance, and mean
square error

6.1. Point Estimation


Parameters
In statistical inference, the term parameter is used to denote a quantity θ (Greek theta), say,
that is a property of an unknown probability distribution. For example, it may be the mean,
variance, or a particular quantile of the probability distribution. Parameters are unknown,
and one of the goals of statistical inference is to estimate them.
Parameters can be thought of as representing a quantity of interest about a general pop-
ulation. In earlier chapters, probability calculations were made based on given values of the
parameters of the probability distributions, but in practice the parameters are unknown since
the probability distribution that characterizes observations from the population is unknown.
An experimenter’s goal is to find out as much as possible about these parameters since they
provide an understanding of the underlying probability distribution that characterizes the
population.
Example 6.1

(a) Let p0 be the probability that a machine breakdown is due to operator misuse. This is
a parameter because it depends upon the probability distribution that governs the causes
of the machine breakdowns. In practice p0 is an unknown quantity, but it may be estimated
from the records of machine breakdown causes.

(b) Let µ and σ 2 be the mean and variance of the probability distribution of % scrap
when an ingot is passed once through the rollers. These are unknown parameters that


are properties of the unknown underlying probability distribution governing the % scrap
obtained from the rolling process.

Statistics
Definition 6.1
Let X1 , . . . , Xn be independent random variables having the same probability distribu-
tion function f (x). The random variables constitute a random sample (of size n) from
a population.

The primary purpose in taking a random sample is to obtain information about the unknown
population parameters. Suppose, for example, that we wish to reach a conclusion about the
proportion of people in a locality who prefer a particular brand of soft drink. Let p represent
the unknown value of this proportion. It is impractical to question every individual in the
population to determine the true value of p. To make an inference regarding the true proportion
p, a more reasonable procedure would be to select a random sample (of an appropriate size)
and use the observed proportion p̂ of people in this sample favoring the brand of soft drink.
The sample proportion, p̂, is computed by dividing the number of individuals in the sample
who prefer the brand of soft drink by the total sample size n. Thus, p̂ is a function of the
observed values in the random sample. Because many random samples are possible from a
population, the value of p̂ will vary from sample to sample. That is, p̂ is a random variable.
Such a random variable is called a statistic.
Definition 6.2
A statistic is a function of observable random variables, which is itself an observable
random variable.

Whereas a parameter is a property of a population or a probability distribution, a statistic


is a property of a sample from the population. In contrast to parameters, statistics take
observed values and consequently can be thought of as being known. However, in the discussion
of statistical estimation it is useful to remember that statistics are actually observations of
random variables with their own probability distributions.
For example, suppose that a sample of n observations is collected from a particular
probability distribution f (x). The data values recorded, x1 , x2 , . . . , xn , are the observed values
of a set of n random variables X1 , X2 , . . . , Xn , and each has the probability distribution f (x).
In general, a statistic is any function

h(X1 , X2 , . . . , Xn )

of these random variables. The observed value of the statistic

h(x1 , x2 , . . . , xn )

can be calculated from the observed data values x1 , x2 , . . . , xn .


Examples of statistics are the sample mean and sample variance:

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \tag{6.1}$$
$$S^2 = \frac{\sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2}{n - 1} \tag{6.2}$$


Estimation

Estimation is a procedure by which the information contained within a sample is used to


investigate properties of the population from which the sample is drawn. In particular, a
point estimate of an unknown parameter θ is a statistic θ̂ that is in some sense a “best
guess” of the value of θ. Notice that a caret or “hat” placed over a parameter signifies a
statistic used as an estimate of the parameter.
For a given data set x1 , x2 , . . . , xn , the sample mean and sample variance take the observed
values
$$\overline{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad \text{and} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \overline{x})^2}{n - 1}$$

These are the point estimates µ̂ and σ̂ 2 respectively.


Of course, an experimenter does not in general believe that a point estimate θ̂ is exactly
equal to the unknown parameter θ. Nevertheless, good point estimates are chosen to be good
indicators of the actual values of the unknown parameter θ. In certain situations, however,
there may be two or more good point estimates of a certain parameter which could yield
slightly different numerical values.
Remember that point estimates can only be as good as the data set from which they are
calculated. Again, this is a question of how representative the sample is of the population
relating to the parameter that is being estimated. In addition, if a data set has some obvious
outliers, then these observations should be removed from the data set before the point estimates
are calculated.
Example 6.2

(a) Consider the unknown parameter p0 , which represents the probability that a machine
breakdown is due to operator misuse. Suppose that a representative sample of n machine
breakdowns is recorded, of which x0 are due to operator misuse. The statistic x0 /n is an
obvious point estimator of the unknown parameter p0, and this may be written

$$\hat{p}_0 = \frac{x_0}{n}$$

(b) Given a representative sample x1, . . . , xn of % scrap values, point estimates of the unknown parameters µ and σ², the mean and variance of the probability distribution of % scrap when an ingot is passed once through the rollers, are

$$\hat{\mu} = \frac{1}{n}(x_1 + \cdots + x_n) \qquad \text{and} \qquad \hat{\sigma}^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \overline{x})^2$$

6.2. General Concepts of Point Estimation

This section considers two basic criteria for determining good point estimates of a particular
parameter, namely, unbiased estimates and minimum variance estimates. These criteria help
us decide which statistics to use as point estimates. In general, when there is more than one
obvious point estimate for a parameter, these criteria can be used to compare the possible
choices of point estimate.


6.2.1. Unbiased Estimator


Definition 6.3
If X1 , X2 , . . . , Xn is a random sample from a population with density function f (x), the
statistic
Θ̂ = h(X1 , X2 , . . . , Xn )
is called a point estimator of the unknown parameter θ. After the sample x1 , x2 , . . . , xn
has been selected, the point estimator Θ̂ takes on a single numerical value

θ̂ = h(x1 , x2 , . . . , xn )

called the point estimate of θ.

The statistics

$$\overline{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2$$

are point estimators of the mean µ and the variance σ², respectively. These are not the only estimators of µ and σ². For example,

$$\hat{\mu} = \frac{1}{n + 2}(2X_1 + X_2 + X_3 + \cdots + X_{n-1} + 2X_n)$$

is also an estimator of µ.
Definition 6.4

The point estimator Θ̂ is an unbiased estimator of the unknown parameter θ if

$$E\left[\hat{\Theta}\right] = \theta$$

If the estimator is not unbiased, the quantity

$$\mathrm{bias}\left(\hat{\Theta}\right) = E\left[\hat{\Theta}\right] - \theta$$

is called the bias of the estimator.

It is easy to show that X̄ is an unbiased estimator of µ. The proof that the sample variance S² is an unbiased estimator of σ² can be found in the Appendix.
Example 6.3

(a) In Example 6.2(a), the random variable X0 , the number of machine breakdowns due to
misuse, is a binomial random variable with parameter θ = p0 . The point estimate of p0 is
the value

$$\hat{p}_0 = \frac{x_0}{n}$$

The expected value of X0 is E[X0] = np0. Consequently,

$$E[\hat{p}_0] = E\left[\frac{X_0}{n}\right] = \frac{1}{n}(np_0) = p_0$$

indicating that x0/n is an unbiased estimator of p0.


(b) In Example 6.2(b), the sample mean is an unbiased estimator of µ since

$$E\left[\overline{X}\right] = E\left[\frac{1}{n}(X_1 + \cdots + X_n)\right] = \frac{1}{n}(\underbrace{\mu + \cdots + \mu}_{n \text{ terms}}) = \frac{1}{n}(n\mu) = \mu$$

(c) The estimator

$$\hat{\Theta} = \frac{1}{n + 2}\left(2X_1 + X_2 + X_3 + \cdots + X_{n-2} + X_{n-1} + 2X_n\right)$$

is an unbiased estimator of µ.

Since the random variables Xi have the same probability distribution function f(x), E[Xi] = µ. Using Equation (5.13),

$$E\left[\hat{\Theta}\right] = \frac{1}{n + 2}\, E[2X_1 + X_2 + \cdots + X_{n-1} + 2X_n] = \frac{1}{n + 2}(2\mu + \mu + \cdots + \mu + 2\mu)$$
$$= \frac{1}{n + 2}(2 + 1 + \cdots + 1 + 2)\mu = \frac{1}{n + 2}(n + 2)\mu = \mu$$

6.2.2. Variance of a Point Estimator


We saw in the previous section that a parameter may have more than one estimator. The property of unbiasedness alone cannot be relied on to select an estimator; a method to select among unbiased estimators is needed.

As well as looking at the expectation E[Θ̂] of the estimator Θ̂, it is important to consider the variance V[Θ̂] of the estimator. It is generally desirable to have unbiased point estimates with as small a variance as possible.

For example, suppose that two point estimators Θ̂1 and Θ̂2 have symmetric distributions. Moreover, suppose that their distributions are both centered at θ so that they are unbiased point estimators of θ, as shown in Figure 6.1. Which is the better estimator? Since

$$V\left[\hat{\Theta}_1\right] < V\left[\hat{\Theta}_2\right]$$

Θ̂1 is clearly a better point estimator than Θ̂2. It is better in the sense that it is likely to provide an estimate closer to the true value θ than the estimate provided by Θ̂2. In mathematical terms, this can be written

$$P\left[\left|\hat{\theta}_1 - \theta\right| < \delta\right] > P\left[\left|\hat{\theta}_2 - \theta\right| < \delta\right]$$

for any value of δ > 0, as illustrated in Figure 6.2.


Definition 6.5
If all unbiased estimators of θ are considered, the one with the smallest variance is called
the minimum variance unbiased estimator (MVUE).

In a sense, the MVUE is most likely among all unbiased estimators to produce an estimate
θ̂ that is close to the true value of θ. We can develop methodology to identify the MVUE in


[Figure 6.1: Density functions of two unbiased estimators Θ̂1 and Θ̂2]

[Figure 6.2: $P[|\hat{\theta}_1 - \theta| < \delta] > P[|\hat{\theta}_2 - \theta| < \delta]$]

many practical situations but it will not be covered in this handout. When we do not know
whether an MVUE exists, we could still use a minimum variance principle to choose among
competing unbiased estimators.
Example 6.4
Compute the variance of the two unbiased estimators of µ using a random sample of size n:

$$\hat{\mu}_1 = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) \qquad \hat{\mu}_2 = \frac{1}{n + 2}(2X_1 + X_2 + \cdots + X_{n-1} + 2X_n)$$

We apply Equation (5.15).

$$V[\hat{\mu}_1] = \frac{1}{n^2}\left(V[X_1] + V[X_2] + \cdots + V[X_n]\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$

$$V[\hat{\mu}_2] = \frac{1}{(n + 2)^2}\left(4V[X_1] + V[X_2] + \cdots + V[X_{n-1}] + 4V[X_n]\right) = \frac{1}{(n + 2)^2}\left(4\sigma^2 + (n - 2)\sigma^2 + 4\sigma^2\right) = \frac{n + 6}{(n + 2)^2}\,\sigma^2$$

Which of the two estimators of µ has a smaller variance? We determine the size n of the sample such that the variance of µ̂1 is less than the variance of µ̂2.

$$V[\hat{\mu}_1] < V[\hat{\mu}_2]$$
$$\frac{\sigma^2}{n} < \frac{n + 6}{(n + 2)^2}\,\sigma^2$$
$$(n + 2)^2 < n(n + 6)$$
$$n^2 + 4n + 4 < n^2 + 6n$$
$$4n + 4 < 6n \implies 2 < n$$

For samples of size greater than 2, µ̂1 has a smaller variance than µ̂2.

If n = 2,

$$V[\hat{\mu}_1] = \frac{\sigma^2}{2} \qquad V[\hat{\mu}_2] = V\left[\tfrac{1}{4}(2X_1 + 2X_2)\right] = \frac{\sigma^2}{2}$$

so that the two estimators have equal variances.

When n = 1, their variances are

$$V[\hat{\mu}_1] = \sigma^2 \qquad V[\hat{\mu}_2] = V\left[\tfrac{1}{3}(2X_1)\right] = \frac{4}{9}\,\sigma^2$$

For a random sample of size n = 1, µ̂2 is the better estimator of µ in the sense that it has smaller variance than µ̂1. However,

$$E[\hat{\mu}_2] = E\left[\tfrac{1}{3}(2X_1)\right] = \frac{2}{3}\,\mu$$

and µ̂2 is not an unbiased estimator of µ.

As for the estimator µ̂1, a random sample of size, say n = 25, has a smaller variance than a random sample of size, say n = 20. The same can be said for µ̂2.
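A small simulation (NumPy assumed) illustrates the comparison for n = 5, where V[µ̂1] = σ²/5 = 0.2 and V[µ̂2] = 11σ²/49 ≈ 0.224:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 5, 200_000
    x = rng.normal(0.0, 1.0, size=(reps, n))   # mu = 0, sigma^2 = 1
    mu1 = x.mean(axis=1)
    mu2 = (2 * x[:, 0] + x[:, 1:-1].sum(axis=1) + 2 * x[:, -1]) / (n + 2)
    print(mu1.var(), mu2.var())   # ≈ 0.200 and ≈ 0.224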

6.2.3. Standard Error


When the numerical value or point estimate of a parameter is reported, it is usually desirable
to give some idea of the precision of estimation. The measure of precision usually employed is
the standard error of the estimator that has been used.
Definition 6.6

The standard error of an estimator Θ̂ is its standard deviation, given by

$$\mathrm{se}\left(\hat{\Theta}\right) = \sigma_{\hat{\Theta}} = \sqrt{V\left[\hat{\Theta}\right]}$$

If the standard error involves unknown parameters, an estimate of the unknown parameter is used. The resulting quantity is called a standard error estimate and is denoted $\hat{\sigma}_{\hat{\Theta}}$.

6.2.4. Mean Squared Error of an Estimator


Sometimes it is necessary to use a biased estimator. In such cases, the mean squared error of
the estimator can be important.
Definition 6.7

The mean squared error of an estimator Θ̂ is the quantity


 2 
MSE Θ̂ = E Θ̂ − θ
 


In some circumstances it may be useful to compare two point estimates that have different
expectations and different variances. For example, in Figure 6.3, the point estimate θ̂2 has a
smaller bias than the point estimate θ̂1 , but it also has a larger variance. In such cases, it is
usual to prefer the point estimate that minimizes the value of mean square error.

[Figure 6.3: Two biased estimators Θ̂1 and Θ̂2]

Finally, it is worth remarking that the properties of a point estimate generally depend on the
size n of the sample from which they are constructed. In particular, the variances of sensible
point estimates decrease as the sample size n increases. Notice that it is reassuring if the
variance of a point estimate tends to 0 as the sample size becomes larger and larger, and if the
point estimate is either unbiased or has a bias that also tends to 0 as the sample size becomes
larger and larger (such point estimates are said to be consistent), since in this case the point
estimate can be made to be as accurate as required by taking a sufficiently large sample size.

6.3. Sampling Distribution


6.3.1. Sample Mean
Definition 6.8
Let X1 , X2 , . . . , Xn be a random sample of size n from a population with density function
f (x), mean µ and variance σ 2 . The random variable X, defined by

$$\overline{X} = \frac{X_1 + \cdots + X_n}{n}$$

is called the sample mean.

Since X̄ is a statistic, it is itself a random variable with mean and variance

$$E\left[\overline{X}\right] = \mu \tag{6.3}$$
$$V\left[\overline{X}\right] = \frac{\sigma^2}{n} \tag{6.4}$$
The distribution of a random sample is called a sampling distribution. Theoretically
the distribution of X can be found. In general, we would suspect that the distribution of X
depends on the density f (x) from which the random sample was selected, and indeed it does.
Two characteristics of the distribution of X, its mean and variance, do not depend on the
density f (x) per se but depend only on two characteristics of the density f (x), as stated in
Equations (6.3) and (6.4).
The sampling distributions of X and S 2 should be viewed as the mechanisms from which
we make inferences on the parameters µ and σ 2 . The sampling distribution of X with sample
size n is the distribution that results when an experiment is conducted over and over (always with sample size n) and the many values of X̄ are computed. This sampling distribution, then, describes


the variability of sample averages around the population mean µ. Knowledge of the sampling
distribution of X arms us with the knowledge of a “typical” discrepancy between an observed
x value and true µ. The same principle applies in the case of the distribution of S 2 . The
sampling distribution produces information about the variability of s2 values around σ 2 in
repeated experiments.
Example 6.5
An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. The resistance follows a normal distribution. Find
the probability that a random sample of 25 resistors will have an average resistance of fewer
than 95 ohms.
According to the reproductive property in Section 5.3, the sampling distribution of X̄ is normal with E[X̄] = µ = 100 and V[X̄] = σ²/n = 10²/25 = 4. Thus,

$$P\left[\overline{X} < 95\right] = P\left[\frac{\overline{X} - E\left[\overline{X}\right]}{\sqrt{V\left[\overline{X}\right]}} < \frac{95 - 100}{\sqrt{4}}\right] = P[Z < -2.5] = 0.0062$$

If the distribution of resistance is normal with mean 100 ohms and standard deviation of 10 ohms,
finding a random sample of resistors with a sample mean less than 95 ohms is a rare event. If
this actually happens, it casts doubt as to whether the true mean is really 100 ohms or if the true
standard deviation is really 10 ohms.
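With scipy.stats.norm (assumed available) the computation is immediate:

    from scipy.stats import norm

    print(norm.cdf(95, 100, 10 / 25**0.5))   # ≈ 0.0062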

We now state one of the most important results in the whole of probability theory. It may
explain why many naturally occurring phenomena are observed to have distributions similar
to the normal distribution, because they may be considered to be composed of the aggregate
of many smaller random events.

Central Limit Theorem

Let f(x) be a probability function with mean µ and variance σ². Let X̄ be the mean of a random sample of size n from a population with distribution f(x). Then

$$\overline{X} \sim N\left(\mu,\, \frac{\sigma^2}{n}\right)$$

for sufficiently large n.

The Central Limit Theorem tells us that X is approximately, or asymptotically, distributed


as a normal distribution with mean µ and variance σ 2 /n.
The astonishing thing about the theorem is the fact that nothing is said about the form of
the original probability function. Whatever the distribution function, the sample mean X will
have approximately the normal distribution for large samples.
The importance of the theorem, as far as practical applications are concerned, is the fact
that the mean X of a random sample from any distribution with variance σ 2 and mean µ is
approximately distributed as a normal random variable with mean µ and variance σ 2 /n.
When is the sample size large enough so that the Central Limit Theorem can be assumed
to apply? The answer depends on how close the underlying distribution is to the normal. A
general rule is that the approximation is adequate as long as n ≥ 30, although the approxima-
tion is often good for much smaller values of n, particularly if the distribution of the random
variables Xi has a probability density function with a shape reasonably similar to the normal
bell-shaped curve. In most cases encountered in practice, this guideline is very conservative,


and the Central Limit Theorem will apply for sample sizes much smaller than 30.
The next example makes use of the Central Limit Theorem.
Example 6.6
Suppose that a random variable has a continuous distribution

f (x) = 0.5 for 4 ≤ x ≤ 6

(a) Find the distribution of the sample mean of a random sample of size 40.

The mean and variance of a continuous uniform distribution are given by Equations (4.7) and (4.8).

$$\mu = E[X] = \frac{a + b}{2} = \frac{4 + 6}{2} = 5$$
$$\sigma^2 = V[X] = \frac{1}{12}(b - a)^2 = \frac{1}{3}$$

By the CLT, the distribution of X̄ is approximately normal with mean and variance

$$\mu_{\overline{X}} = 5 \qquad \sigma^2_{\overline{X}} = \frac{1/3}{40} = \frac{1}{120}$$

(b) What is the probability that a random sample of size 40 has a mean value between 4.8 and 5.3?

$$P\left[4.8 < \overline{X} < 5.3\right] = P\left[\frac{4.8 - 5}{\sqrt{1/120}} < \frac{\overline{X} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} < \frac{5.3 - 5}{\sqrt{1/120}}\right] = P[-2.19 < Z < 3.29] = 0.984870$$
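The same probability via scipy.stats.norm (assumed available):

    from scipy.stats import norm

    s = (1 / 120) ** 0.5                                 # standard deviation of X-bar
    print(norm.cdf(5.3, 5, s) - norm.cdf(4.8, 5, s))     # ≈ 0.9852 (table rounding gives the text's 0.9849)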

In real-life applications, both the mean µ and variance σ² are unknown, so that V[X̄] is also unknown. An unbiased estimate of σ², the sample variance, is used to estimate the standard error.

Now consider the case of two independent populations X and Y. Suppose that the first population has mean µX and variance σX² and the second population has mean µY and variance σY². If random samples of sizes nX and nY are taken from the first and second populations, respectively, then by Equations (6.3) and (6.4),

$$E\left[\overline{X}\right] = \mu_X \qquad E\left[\overline{Y}\right] = \mu_Y$$
$$V\left[\overline{X}\right] = \frac{\sigma_X^2}{n_X} \qquad V\left[\overline{Y}\right] = \frac{\sigma_Y^2}{n_Y}$$

Furthermore,

$$E\left[\overline{X} - \overline{Y}\right] = E\left[\overline{X}\right] - E\left[\overline{Y}\right] = \mu_X - \mu_Y$$
$$V\left[\overline{X} - \overline{Y}\right] = V\left[\overline{X}\right] + V\left[\overline{Y}\right] = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}$$

If the sample sizes nX and nY are sufficiently large, the distributions of X̄ and Ȳ are approximately normal by the Central Limit Theorem, so that

$$\frac{\left(\overline{X} - \overline{Y}\right) - E\left[\overline{X} - \overline{Y}\right]}{\sqrt{V\left[\overline{X} - \overline{Y}\right]}} \sim N(0, 1)$$



Example 6.7
Two independent experiments are run in which two different types of paint are compared.
Thirty-six specimens are painted using type A, and the drying time, in hours, is recorded for
each. The same is done with type B. The population standard deviations are both known
to be 1.0. Assuming that the mean drying time is equal for the two types of paint, find
P[X̄A − X̄B > 0.5].

Since the sample sizes are greater than 30, we can apply the Central Limit Theorem, where

$$\overline{X}_A - \overline{X}_B \sim N\left(\mu_A - \mu_B,\; \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}\right)$$

Thus,

$$P\left[\overline{X}_A - \overline{X}_B > 0.5\right] = P\left[\frac{\left(\overline{X}_A - \overline{X}_B\right) - (\mu_A - \mu_B)}{\sqrt{V\left[\overline{X}_A - \overline{X}_B\right]}} > \frac{0.5 - 0}{\sqrt{\frac{1.0^2}{36} + \frac{1.0^2}{36}}}\right] = P\left[Z > \frac{0.5}{\sqrt{1/18}}\right] = 0.017003$$

6.3.2. Sample Proportion


Let X be a Bernoulli random variable with mass function

$$f(x) = \begin{cases} p & x = 1 \\ 1 - p & x = 0 \end{cases}$$

where p is the probability of a success. The mean and variance of X are

$$E[X] = 0(1 - p) + 1(p) = p$$
$$V[X] = (0 - p)^2(1 - p) + (1 - p)^2 p = p(1 - p)$$

Consider the random sample X1, X2, . . . , Xn from the population X, but with unknown parameter p. The mean and variance of the statistic $\sum_{i=1}^n X_i$ are

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np$$
$$V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i] = \sum_{i=1}^{n} p(1 - p) = np(1 - p)$$

We can see that the statistic $\sum_{i=1}^n X_i$, which represents the number of trials resulting in a success, follows a binomial distribution with parameter p.

The sample mean

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is an estimator of p with

$$E\left[\overline{X}\right] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}(np) = p$$

so that $\hat{p} = \overline{X}$ is unbiased for p. Thus, we can write the estimate as

$$\overline{x} = \frac{x_0}{n}$$

where x0 represents the number of trials resulting in a success among n trials.
For sufficiently large n, the distribution of p̂ is approximately normal by virtue of the Central Limit Theorem, with mean p and variance

$$V[\hat{p}] = V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\, np(1 - p) = \frac{p(1 - p)}{n}$$

The standard error of the estimator is

$$\mathrm{se}\left(\hat{P}\right) = \sigma_{\hat{P}} = \sqrt{\frac{p(1 - p)}{n}}$$

Since the success probability p is unknown, the standard error is really also unknown since it depends upon p. However, it is customary to estimate the standard error by replacing p by the observed value $\hat{p} = \overline{x} = x_0/n$, so that

$$\hat{\sigma}_{\hat{P}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \tag{6.5}$$

is the standard error estimate of the proportion p.


Example 6.8
Suppose that the probability p that a vaccine provokes a serious adverse reaction is unknown. The vaccine is administered to n = 500,000 head of cattle, and x0 = 372 are observed to suffer the reaction.

(a) Find the point estimate of p.

$$\hat{p} = \frac{372}{500\,000} = 0.000744$$

(b) Calculate the standard error estimate.

$$\mathrm{se}\left(\hat{p}\right) = \sqrt{\frac{0.000744 \times 0.999256}{500\,000}} = 3.856 \times 10^{-5}$$
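In plain Python:

    p_hat = 372 / 500_000
    se = (p_hat * (1 - p_hat) / 500_000) ** 0.5
    print(p_hat, se)   # 0.000744 and ≈ 3.856e-05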

A comparison of this calculation with discussion in Section 3.6 provides a distinct contrast
between the different uses of probability theory and statistical inference. In a binomial distri-
bution, the probability of an adverse reaction p is taken to be known, and probability theory
then allows the number of cattle suffering a reaction to be predicted. However, the situation
is now reversed. In this discussion, the number of cattle suffering a reaction is observed, and
hence is known, and statistical inference is used to estimate the probability of an adverse
reaction p.
Now consider the case of two independent populations. Suppose that we want to estimate the difference in the proportions of success between the populations, denoted p1 − p2. Our unbiased estimates of p1 and p2 are

$$\hat{p}_1 = \frac{x_0}{n_1} \qquad \hat{p}_2 = \frac{y_0}{n_2}$$

where x0 and y0 represent the numbers of trials resulting in a success and n1 and n2 are the sample sizes from the two populations.

We choose the estimator

$$\hat{P}_1 - \hat{P}_2 = \frac{X_0}{n_1} - \frac{Y_0}{n_2}$$

since this is unbiased for p1 − p2:

$$E\left[\hat{P}_1 - \hat{P}_2\right] = E\left[\hat{P}_1\right] - E\left[\hat{P}_2\right] = p_1 - p_2$$

The variance of this estimator is

$$V\left[\hat{P}_1 - \hat{P}_2\right] = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$

and the standard error estimate is

$$\mathrm{se}\left(\hat{P}_1 - \hat{P}_2\right) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \tag{6.6}$$

Equations (6.5) and (6.6) are valid when the conditions of the Central Limit Theorem are met. In the next chapters, we will find the utility of these equations. Finally, we state that the sample mean X̄ is the MVUE of µ.

6.3.3. Sample Variance


The statistic S 2 is an unbiased point estimator of the population variance σ 2 , and the numerical
value s2 computed from the sample data is called the point estimate of σ 2 . The sample variance
can be thought of as an estimate of the variance σ 2 of the unknown underlying probability
distribution of the observations in the data set. It provides an indication of the variability in
the sample in the same way that the variance σ 2 provides an indication of the variability of a
probability distribution.
An alternative computational formula for the sample variance is

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$

Gamma Function
Definition 6.9
The function

$$\Gamma(\nu) = \int_0^{\infty} x^{\nu - 1} e^{-x}\, dx \qquad \text{for } \nu > 0$$

is called the gamma function.

The gamma function has the following properties:

i) Γ(1) = 1

ii) Γ(ν) = (ν − 1)Γ(ν − 1)

iii) Γ(n) = (n − 1)! for positive integer n

iv) $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$


Example 6.9
Evaluate the following:

(a) Γ(5).
Using Property (iii) with n = 5, Γ(5) = (5 − 1)! = 4! = 24.

(b) $\Gamma\left(\tfrac{7}{2}\right)$.
Property (ii) will be used recursively, together with Property (iv).

$$\Gamma\left(\tfrac{7}{2}\right) = \tfrac{5}{2}\,\Gamma\left(\tfrac{5}{2}\right) = \tfrac{5}{2} \cdot \tfrac{3}{2}\,\Gamma\left(\tfrac{3}{2}\right) = \tfrac{15}{4} \cdot \tfrac{1}{2}\,\Gamma\left(\tfrac{1}{2}\right) = \tfrac{15}{8}\sqrt{\pi}$$

Chi-Squared Distribution
Definition 6.10
The continuous random variable χ², whose density is

$$f(x) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{1}{2}\right)^{\nu/2} x^{(\nu/2) - 1} e^{-x/2} \qquad \text{where } x > 0$$

is said to have a chi-squared distribution with ν degrees of freedom, where ν is a positive integer.

Graphs of several chi-squared distributions are shown in Figure 6.4.

[Figure 6.4: Chi-squared density functions for ν = 2, 5, and 9]

Example 6.10
Let χ² be a chi-squared random variable with ν = 6 degrees of freedom. Compute P[0.6 < χ² < 3.2].

$$P\left[0.6 < \chi^2 < 3.2\right] = \int_{0.6}^{3.2} \frac{1}{\Gamma(6/2)}\left(\frac{1}{2}\right)^{6/2} x^{6/2 - 1} e^{-x/2}\, dx = \int_{0.6}^{3.2} \frac{1}{2!} \cdot \frac{1}{8}\, x^2 e^{-x/2}\, dx = \int_{0.6}^{3.2} \tfrac{1}{16}\, x^2 e^{-x/2}\, dx = 0.2130$$
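The same probability using scipy.stats.chi2 (assumed available):

    from scipy.stats import chi2

    print(chi2.cdf(3.2, df=6) - chi2.cdf(0.6, df=6))   # ≈ 0.2130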

Suppose X1 , X2 , . . . , Xn consitute a random sample from a population that is normally
distributed with mean µ and variance σ². The statistics X and S² are independent random variables, with X being normal with mean µ and variance σ²/n. The statistic

χ²(ν) = (n − 1)S²/σ²        (6.7)
has a chi-squared distribution with ν = n − 1 degrees of freedom.
Example 6.11
The time it takes a central processing unit to process a certain type of job is normally
distributed with mean 20 seconds and standard deviation 3 seconds. If a sample of 15 such
jobs is observed, what is the probability that the sample variance will exceed 12?
The random sample of size n = 15 is from a population that is normally distributed with
µ = 20 and σ = 3.
We first convert the event S² > 12 to a chi-squared event.

S² > 12
(n − 1)S²/σ² > (n − 1)(12)/σ²
χ²(ν) > (14)(12)/3²
χ²(14) > 56/3

Therefore,

P[S² > 12] = P[χ²(14) > 56/3]
           = 1 − P[χ²(14) ≤ 56/3]
           = 1 − ∫ from 0 to 56/3 of [1 / (Γ(14/2) 2^(14/2))] x^(14/2 − 1) e^(−x/2) dx
           = 1 − (1/(6! · 128)) ∫ from 0 to 56/3 of x⁶ e^(−x/2) dx = 0.1781

Although the population variance is σ² = 9, it is not surprising, probability-wise, to find a sample with variance greater than 12.
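The same probability can be checked with the survival function of the chi-squared distribution; a minimal sketch, assuming Python with scipy:

    from scipy.stats import chi2

    n, sigma2 = 15, 9
    x = (n - 1) * 12 / sigma2                # the chi-squared value 56/3
    print(chi2.sf(x, df=n - 1))              # about 0.1781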

The distributional result for S² turns out to be very important for the problem of estimating a normal population mean. This is because the standard error of the sample mean µ̂ = X is σ/√n. The dependence of the standard error on the unknown variance σ² is rather awkward, but the sample variance S² can be used to overcome the problem. Eliminating the unknown parameter σ² in favor of the sample variance S² yields the statistic

(X − µ)/(S/√n)

called the t-statistic. The distribution of this statistic is called the Student's t distribution. This result is very important since in practice an experimenter knows the values of n and the observed sample mean x and sample variance s², and so knows everything in the quantity

(x − µ)/(s/√n)

except for µ. This allows the experimenter to make useful inferences about µ, as described in
the next chapter.


Chapter Summary
• A parameter is a property of the population whereas a statistic is a property of the sample.
• The quantity

  E[Θ̂] − θ

  is called the bias of the estimator Θ̂. If this quantity is zero, the estimator Θ̂ is called an unbiased estimator of θ.
• The estimator Θ̂ that has the least variance among all unbiased estimators is called the
minimum variance unbiased estimator (MVUE).
• The standard deviation of an estimator is also called the standard error of the estimator.
• If a random sample is taken from a population with mean µ and variance σ²,

  E[X] = µ        V[X] = σ²/n

• The sample mean

  X = (X1 + X2 + · · · + Xn)/n

  is the MVUE of the population mean µ.
• The sample proportion

  P̂ = X0/n

  is an unbiased estimator of the population proportion p.
• The sample variance

  S² = (1/(n − 1)) Σ (Xi − X)²        (sum over i = 1, …, n)

  is an unbiased estimator of the population variance σ².


• (Central Limit Theorem) The sampling distribution of X is approximately normal with mean µ and variance σ²/n for sufficiently large n.
• The following distributions are approximately standard normal for a sample size n > 30.

  1. (X − E[X]) / √V[X]

  2. ((X − Y) − E[X − Y]) / √V[X − Y]

  3. (P̂ − E[P̂]) / √V[P̂]

  4. ((P̂X − P̂Y) − E[P̂X − P̂Y]) / √V[P̂X − P̂Y]


• The statistic

  (n − 1)S²/σ²

  is a chi-squared random variable with n − 1 degrees of freedom.

• The statistic

  (X − E[X]) / (S/√n)

  is a Student's t random variable with n − 1 degrees of freedom.

Exercises
6-1. Consider a sample X1, X2, . . . , Xn of normally distributed random variables with mean µ and variance σ² = 7.
a) If n = 15, what is P[X − µ ≤ 0.4]?
b) What is this probability if n = 50?

6-2. The compressive strength of concrete is normally distributed with µ = 2500 psi and σ = 50 psi. Find the probability that a random sample of n = 5 specimens will have a sample mean compressive strength that falls in the interval from 2499 psi to 2510 psi.

6-3. A normal population has mean 100 and variance 25. How large must the random sample be if you want the standard error of the sample average to be 1.5?

6-4. If all possible samples of size n = 16 are drawn from a normal population with mean equal to 50 and standard deviation equal to 5, what is the probability that a sample mean x will fall in the interval from µx − 1.96σx to µx − 0.4σx? Assume that the sample means can be measured to any degree of accuracy.

6-5. The random variable X, representing the number of cherries in a cherry puff, has the following probability distribution:

x           4     5     6     7
P[X = x]    0.2   0.4   0.3   0.1

a) Find the mean µ and variance σ² of X.
b) Find E[X] and V[X] for random samples of 36 cherry puffs.
c) Find the probability that the average number of cherries in 36 cherry puffs will be less than 5.5.

6-6. The tar contents of 8 brands of cigarettes selected at random from the latest list released by the Federal Trade Commission are as follows: 7.3, 8.6, 10.4, 16.1, 12.2, 15.1, 14.5, and 9.3 milligrams. Calculate
a) the sample mean; b) the sample variance.

6-7. Data on the oxide thickness in angstroms of semiconductor wafers are as follows: 425, 431, 416, 419, 421, 436, 418, 410, 431, 433, 423, 426, 410, 435, 436, 428, 411, 426, 409, 437, 422, 428, 413, 416.
a) Calculate a point estimate of the mean oxide thickness for all wafers in the population.
b) Calculate a point estimate of the standard deviation of oxide thickness for all wafers in the population.
c) Calculate the standard error of the point estimate from part (a).
d) Calculate a point estimate of the proportion of wafers in the population that have oxide thickness of more than 430 angstroms.

6-8. A computer software package calculated some numerical summaries of a sample of data. The results are displayed here:

Variable          x
N                 ?
Mean              ?
SE Mean           2.05
StDev             10.25
Variance          ?
Sum               3761.70
Sum of Squares    ?

a) Find all the missing quantities.
b) What is the estimate of the mean of the population from which this sample was drawn?

6-9. The amount of time that a drive-through bank teller spends on a customer is a random variable with a mean µ = 3.2 minutes and a standard deviation σ = 1.6 minutes. If a random sample of 64 customers is observed, find the probability that their mean time at the teller's window is
a) at most 2.7 minutes;
b) more than 3.5 minutes;
c) at least 3.2 minutes but less than 3.4 minutes.

6-10. The amount of time that a customer spends waiting at an airport check-in counter is a random variable with mean 8.2 minutes and standard deviation 1.5 minutes. Suppose that a random sample of n = 49 customers is observed. Find the probability that the average time waiting in line for these customers is
a) less than 10 minutes; b) less than 6 minutes; c) between 5 and 10 minutes.

6-11. The mean score for freshmen on an aptitude test at a certain college is 540, with a standard deviation of 50. Assume the means to be measured to any degree of accuracy. What is the probability that two groups selected at random, consisting of 32 and 50 students, respectively, will differ in their mean scores by
a) more than 20 points?
b) an amount between 5 and 10 points?

6-12. Of n1 randomly selected engineering students at University X1, x1 owned a Casio calculator, and of n2 randomly selected engineering students at University X2, x2 owned a Casio calculator. Let p1 and p2 be the probability that randomly selected X1 and X2 engineering students, respectively, own Casio calculators.
a) Suppose that n1 = 200, x1 = 150, n2 = 250, and x2 = 185. Compute an estimate p̂1 − p̂2.
b) Compute an estimate of V[P̂1 − P̂2].

6-13. Show that

Θ̂ = (1/n) Σ (Xi − X)²        (sum over i = 1, …, n)

is a biased estimator of σ². That is, show that E[Θ̂] is not exactly σ².
7. Statistical Intervals
The previous chapter dealt with the point estimation of a parameter. Such point estimates are quite useful, yet they leave something to be desired. Whenever the point estimator under consideration has a probability density function, there is no reason to expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. Hence, it seems desirable that a point estimate be accompanied by some measure of its possible error, for instance, an interval about the point estimate together with some measure of assurance that the true value of the parameter lies within the interval. Instead of estimating the true value of the parameter by a single point, one might estimate that the true value is contained in some interval. This is called interval estimation.

Learning Objectives
At the end of this chapter, you should be able to do the following:

1. Construct confidence intervals on the mean of a normal distribution, using either the
normal distribution or the t-distribution

2. Construct confidence intervals on the variance and standard deviation of a normal dis-
tribution

3. Construct confidence intervals on a population proportion

4. Construct prediction intervals for a future observation

5. Construct a tolerance interval for a normal population

6. Explain the three types of interval estimates: confidence intervals, prediction intervals,
and tolerance intervals

7. Construct confidence intervals on the difference of the means of two independent popu-
lations

8. Construct confidence intervals on the difference of two population proportions

9. Construct confidence intervals on the ratio of two population variances

7.1. Confidence Interval


An interval estimate of a population parameter θ is an interval of the form

θ̂1 < θ < θ̂2

where θ̂1 and θ̂2 depend on the value of the statistic Θ̂ for a particular sample and also on
the sampling distribution of Θ̂. Information about the precision of estimation is conveyed
by the length of the interval. A short interval implies precise estimation. We cannot be
certain that the interval contains the true unknown population parameter – only a sample
from the full population is used to compute the point estimate and the interval. However,


the confidence interval is constructed so that there is high confidence that it does contain the
unknown population parameter.
Definition 7.1
A confidence interval for an unknown parameter θ is an interval that contains a set of
plausible values of the parameter. It is associated with a confidence level or degree of
confidence 1 − α, which measures the probability that the confidence interval actually
contains the unknown parameter value.

The lower and upper limits of the confidence interval can be determined from the observed
sample and the sampling distribution. Because the interval may or may not contain the true
parameter θ, a probability value 1 − α is associated with it, and this is stated as
P[θ̂1 < θ < θ̂2] = 1 − α

For a given 1 − α (or α), it is possible to construct different intervals satisfying the equation
above. Figure 7.1 shows two intervals for θ = µ of a standard normal distribution at 1 − α =
0.6991. The second interval is more precise than the first, in the sense that its width 2.32 is
shorter than the first interval whose width is 2.70. Among all intervals with the same degree
of confidence, the logical choice is the shortest interval.

Figure 7.1: Two confidence intervals with equal degrees of confidence (endpoints −2.13 and 0.57 versus −1.65 and 0.67)

7.2. Confidence Interval for a Population Parameter


7.2.1. Population Mean
Consider the problem of finding a confidence interval θ̂1 < µ < θ̂2 for the population mean
µ for a given α such that the width is as small as possible. We know that the confidence
limits θ̂1 and θ̂2 are dependent on the probability distribution f (x). For the standard normal
distribution f (z), it can be shown using calculus that the confidence limits θ̂1 = −zα/2 and
θ̂2 = zα/2 give the shortest interval, where zα/2 is a z-value that leaves an area of α/2 to its
right (Figure 7.2). Owing to the symmetry of the density curve, the same area α/2 lies to the
left of −zα/2 . Mathematically,

P[Z < −zα/2 ] = P[Z > zα/2 ] = α/2 or P[−zα/2 < Z < zα/2 ] = 1 − α

The derivation of the confidence limits can be found in the Appendix.

Variance σ² Known
Suppose that a random sample X1, X2, . . . , Xn has a common probability distribution with unknown mean µ but known variance σ². If n ≥ 30, the sampling distribution of X is approximately normal according to the Central Limit Theorem. On the other hand, if n < 30 and


Figure 7.2: Critical number zα/2 (an area of α/2 lies to its right under the standard normal curve)

if we assume that the population is normal, then the distribution of X is exactly normal. In both cases,

(X − E[X]) / √V[X] = (X − µ)/(σ/√n) ∼ N(0, 1)

Thus

1 − α = P[−zα/2 < Z < zα/2] = P[−zα/2 < (X − µ)/(σ/√n) < zα/2]

We express the interval

−zα/2 < (X − µ)/(σ/√n) < zα/2

for µ by multiplying the relation above by −σ/√n and adding X, yielding the interval

X − zα/2 · σ/√n < µ < X + zα/2 · σ/√n

For a set of observations x1, x2, . . . , xn with mean x, a 100(1 − α)% confidence interval for the population mean µ is

x − zα/2 · σ/√n < µ < x + zα/2 · σ/√n        (7.1)

Different samples will yield different values of x and therefore produce different interval
estimates of the parameter µ, as shown in Figure 7.3. The dot at the center of each interval is
the position of the point estimate x for that random sample. Note that all of these intervals are
of the same width, since their widths depend only on the choice of zα/2 once x is determined. A
higher degree of confidence increases the value of zα/2 making the intervals wider. In general,
for a selection of zα/2 , 100(1 − α)% of the intervals will cover µ.

Figure 7.3: Interval estimates of µ for different samples (ten samples shown)


Example 7.1
The average zinc concentration recovered from a sample of measurements taken in 36 different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence intervals for the mean zinc concentration in the river. Assume that the population standard deviation is 0.3 gram per milliliter.
We have the following information: x = 2.6, n = 36 and σ = 0.3.
For a 95% confidence interval, α = 0.05 and the z-value that leaves an area α/2 = 0.025 to its right is 1.96 (Table A-1), and we write z0.025 = 1.96. Thus, a 95% confidence interval for the mean zinc concentration in the river is

2.6 − 1.96(0.3/√36) < µ < 2.6 + 1.96(0.3/√36)

which simplifies to

2.50 < µ < 2.70

At the 99% confidence level, z0.005 = 2.58. A 99% confidence interval for µ is

2.6 − 2.58(0.3/√36) < µ < 2.6 + 2.58(0.3/√36)

or

2.47 < µ < 2.73
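A minimal computational sketch of Example 7.1, assuming Python with scipy (norm.ppf returns the z-value with the stated area to its left):

    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n = 2.6, 0.3, 36
    for conf in (0.95, 0.99):
        z = norm.ppf(1 - (1 - conf) / 2)     # critical value z_{alpha/2}
        moe = z * sigma / sqrt(n)            # margin of error
        print(conf, xbar - moe, xbar + moe)
    # 95% gives about (2.50, 2.70); 99% gives about (2.47, 2.73)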

The 100(1 − α)% confidence interval provides an estimate of the accuracy of the point estimate. If µ is actually the center value of the interval, then x estimates µ without error. Most of the time, however, x will not be exactly equal to µ and the point estimate will be in error. The size of this error will be the absolute value of the difference between µ and x, and this difference will not exceed the quantity zα/2 · σ/√n, called the margin of error, with a 100(1 − α)% degree of confidence.
Frequently, the size of the sample is desired to ensure that the margin of error in estimating µ will be less than a specified amount ε, that is,

zα/2 · σ/√n ≤ ε

Solving this inequality gives the following formula for n.

n ≥ (zα/2 σ / ε)²        (7.2)

When solving for the sample size n, all fractional values are rounded up to the next whole
number. By adhering to this principle, the degree of confidence never falls below 100(1 − α)%.
Strictly speaking, the formula in Equation (7.2) is applicable only if the variance of the
population is known prior to selecting the sample. Lacking this information, a preliminary
sample of size n ≥ 30 may be taken to provide an estimate of σ. Then, using s as an
approximation for σ, the number of observations can be determined to provide the needed
degree of accuracy.
Example 7.2
How large a sample is required if a 95% degree of confidence is desired that the estimate of µ in Example 7.1 is off by less than 0.05?

n ≥ (σ zα/2 / ε)² = (0.3 × 1.96 / 0.05)² = 138.3

Rounding up, a sample of size n = 139 is required.


Variance σ² Unknown
Suppose that the population of interest has a normal distribution with unknown mean µ and unknown variance σ². Assume that a random sample of size n is available, and let X and S² be the sample mean and variance, respectively. The statistic

(X − µ)/(σ/√n)

cannot be used to construct a confidence interval since σ is not known. A reasonable choice is
to replace σ with the sample standard deviation S. A logical question is what effect replacing σ
with S has on the distribution of the statistic. If n is large, the answer to this question is “very
little,” and we can proceed to use the confidence interval based on the normal distribution.
However, n is usually small in most engineering problems, and in this situation, the resulting
statistic is not necessarily normal. A different distribution must be employed to construct the
confidence interval.
Definition 7.2
Let a random sample of size n be from a normal distribution with unknown mean µ and unknown variance σ². The random variable

T = (X − µ)/(S/√n)

has a Student's t-distribution with ν = n − 1 degrees of freedom.

The probability density function of the T(ν) random variable is

f(x) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] · [1 + x²/ν]^(−(ν+1)/2)

where ν is the number of degrees of freedom. Its mean and variance are µ = 0 and σ² = ν/(ν − 2), respectively, where ν > 2.


Several t density curves are shown in Figure 7.4. The general appearance of the t distribution
is similar to the standard normal distribution in that both distributions are symmetric and
unimodal, and the maximum ordinate value is reached when the mean µ = 0. However, the t
distribution has heavier tails than the normal; that is, it has more probability in the tails than
does the normal distribution. As the number of degrees of freedom ν → ∞, the limiting form
of the t distribution is the standard normal distribution. Generally, the number of degrees
of freedom for t is the number of degrees of freedom associated with the estimated standard
deviation.

Figure 7.4: Probability density functions of the t distribution (ν = 2, 5, 9) compared with N(0, 1)

Table A-2 provides percentage points of the t distribution. We will let tα,ν be the value of
the random variable T(ν) with ν degrees of freedom above which is an area (or probability)


Figure 7.5: Percentage point and critical number tα,ν of the t distribution

α. Thus, tα,ν is an upper-tailed 100α percentage point of the t distribution with ν degrees of
freedom. This percentage point is shown in Figure 7.5. A section of the table is shown below.
The α values are the column headings, and the degrees of freedom are listed in the left column.
To illustrate the use of the table, note that the t-value with 10 degrees of freedom having an
area of 0.05 to the right is t0.05,10 = 1.812. That is,
P[T(10) > 1.812] = 0.05

Table 7.1: Critical value tα,ν at 100α percentage points of the t distribution

                                  α
ν      0.1     0.05    0.025    0.01     0.005    0.0025    0.0005
1      3.078   6.314   12.706   31.821   63.657   127.321   636.619
⋮
10     ···     1.812   ···      ···      ···      ···       4.587
⋮
∞      1.282   1.645   1.960    2.326    2.576    2.807     3.291

We can easily find a 100(1 − α)% confidence interval on the mean of a normal distribution with unknown variance by proceeding essentially as in the known-variance case.
The distribution of

(X − µ)/(S/√n)

is T(ν) with ν = n − 1 degrees of freedom. Letting tα/2,ν be the upper 100(α/2) percentage point of the t distribution with ν degrees of freedom, then

P[−tα/2,ν < T < tα/2,ν] = 1 − α

or

P[−tα/2,ν < (X − µ)/(S/√n) < tα/2,ν] = 1 − α

Rearranging the inequality yields

P[X − tα/2,ν · S/√n < µ < X + tα/2,ν · S/√n] = 1 − α

For a random sample of observations x1, x2, . . . , xn with mean x and standard deviation s, a 100(1 − α)% confidence interval for the mean µ when the variance σ² is unknown is

x − tα/2,ν · s/√n < µ < x + tα/2,ν · s/√n        (7.3)


The assumption underlying this confidence interval is that the sample is from a normal
population. However, the t distribution-based confidence interval is relatively insensitive or
robust to this assumption. Checking the normality assumption by constructing a normal
probability plot of the data is a good general practice. Small to moderate departures from
normality are not a cause for concern.
Example 7.3
The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2,
and 9.6 liters. Find a 95% confidence interval for the mean contents of all such containers,
assuming an approximately normal distribution.
The sample mean and variance of the given data are x = 10.0 and s² = 0.08.ᵃ Referring to Table A-2, t0.025,6 = 2.447. Hence, a 95% confidence interval for µ is

10.0 − 2.447(√0.08/√7) < µ < 10.0 + 2.447(√0.08/√7)

which reduces to 9.74 < µ < 10.26.


Perhaps a good summary of these results is to say that there is a high degree of confidence that the mean content of similar containers of sulfuric acid is somewhere between 9.74 and 10.26 liters. This is very useful information because it indicates the content of similar sulfuric acid containers to be expected over a certain period of time.
a
A user guide for statistical calculation of the sample variance with Casio fx-570/991 ES PLUS is included
in the Appendix.
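The interval of Example 7.3 can be reproduced directly from the data; a minimal sketch assuming Python with scipy (t.ppf gives the t-value with the stated area to its left):

    from math import sqrt
    from scipy.stats import t

    data = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)    # sample variance, 0.08
    moe = t.ppf(0.975, df=n - 1) * sqrt(s2 / n)          # t_{0.025,6} = 2.447
    print(xbar - moe, xbar + moe)                        # about (9.74, 10.26)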

Often statisticians recommend that even when normality cannot be assumed, σ is unknown, and n ≥ 30, s can replace σ and the confidence interval

x − zα/2 · s/√n < µ < x + zα/2 · s/√n        (7.4)

may be used. This is often referred to as a large-sample confidence interval. The justification
lies only in the presumption that with a sample as large as 30 and the population distribution
not too skewed, S will be very close to the true σ and thus the Central Limit Theorem
prevails. It should be emphasized that this is only an approximation and the quality of the
result becomes better as the sample size grows larger.
Example 7.4
The mathematics scores of a random sample of 500 senior high school students in an aptitude
test are collected, and the sample mean and standard deviation are found to be 501 and
112, respectively. Find a 99% confidence interval on the mean mathematics scores for the
students.
It is reasonable to use the normal approximation since the sample size is large. From the table of standard normal probabilities, z0.005 = 2.58. Hence, a 99% confidence interval for µ is

501 − 2.58(112/√500) < µ < 501 + 2.58(112/√500)

which yields

488.1 < µ < 513.9

7.2.2. Population Proportion


Consider a random sample of size n, where each Xi is a Bernoulli random variable. The statistic X = P̂, the sample proportion (having the characteristic), is an unbiased estimator of the population proportion p, with variance pq/n.

According to the Central Limit Theorem, the limiting form of

(P̂ − p) / √(p(1 − p)/n)

is the standard normal random variable. Thus, a 100(1 − α)% confidence interval for the true population proportion p satisfies

P[−zα/2 < (P̂ − p)/√(p(1 − p)/n) < zα/2] = 1 − α

This may be rearranged as

P[P̂ − zα/2 √(p(1 − p)/n) < p < P̂ + zα/2 √(p(1 − p)/n)] = 1 − α

Unfortunately, the upper and lower limits of the confidence interval contain the unknown parameter p. A solution that is often satisfactory is to replace the standard error √(p(1 − p)/n) by its estimate √(P̂(1 − P̂)/n), which results in

P[P̂ − zα/2 √(P̂(1 − P̂)/n) < p < P̂ + zα/2 √(P̂(1 − P̂)/n)] ≈ 1 − α

If p̂ is the proportion of observations in a random sample of size n, an approximate 100(1 − α)% confidence interval on the proportion p of the population is

p̂ − zα/2 √(p̂(1 − p̂)/n) < p < p̂ + zα/2 √(p̂(1 − p̂)/n)        (7.5)

Example 7.5
In a random sample of 85 automobile engine crankshaft bearings, 10 have a surface finish that is rougher than the specifications allow. Find a 95% confidence interval for the true proportion of bearings that exceed the specified roughness.
A point estimate of the proportion of bearings in the population that exceed the roughness specification is

p̂ = 10/85 = 2/17

A 95% confidence interval for p is

2/17 − 1.96 √((2/17)(15/17)/85) < p < 2/17 + 1.96 √((2/17)(15/17)/85)

which simplifies to 0.049 < p < 0.186.
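A minimal sketch of the traditional interval in Equation (7.5), applied to the data of Example 7.5 and assuming Python with scipy:

    from math import sqrt
    from scipy.stats import norm

    x, n = 10, 85
    p_hat = x / n
    moe = norm.ppf(0.975) * sqrt(p_hat * (1 - p_hat) / n)
    print(p_hat - moe, p_hat + moe)          # about (0.049, 0.186)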
In situations when the sample size can be selected, we may choose the sample size n to be 100(1 − α)% confident that the margin of error is less than some specified value ε. Obtained in a manner similar to Equation (7.2), the desired sample size is

n ≥ pq (zα/2 / ε)²        (7.6)
An estimate of p is required to use Equation (7.6). If an estimate p̂ from a previous sample
is available, it can be substituted for p in the equation, or perhaps a subjective estimate can be


made. If these alternatives are unsatisfactory, a preliminary sample can be taken, p̂ computed,
and the equation used to determine how many additional observations are required to estimate
p with the desired accuracy. Another approach to choosing n uses the fact that the sample
size from Equation (7.6) will always be a maximum for p = 0.5, that is, p(1 − p) ≤ 0.25, and
this can be used to obtain an upper bound on n. In other words, there is at least 1 − α degree
of confidence that the error in estimating p by p̂ is less than ε if the sample size is
n = 0.25 (zα/2 / ε)²        (7.7)
Example 7.6
Consider the situation in Example 7.5. How large a sample is required if a 95% degree of confidence is desired that the error in using p̂ to estimate p is less than 0.05?
We use p̂ = 2/17 as an initial estimate of p. The required sample size is

n ≥ p̂q̂ (zα/2 / ε)² = (2/17)(15/17)(1.96/0.05)² = 160

If a 95% degree of confidence is desired that the estimate p̂ of the true proportion p be within 0.05 regardless of the value of p, the sample size can be determined from Equation (7.7).

n = 0.25 (zα/2 / ε)² = 0.25 (1.96/0.05)² = 385
Comparing the sample sizes obtained by applying Equations (7.6) and (7.7), we see that
information concerning p, provided by a preliminary sample or from experience, enables us to
choose a smaller sample while maintaining our required degree of accuracy.
There is a different way to construct a confidence interval on a binomial proportion than the traditional approach in Equation (7.5). Starting with the interval

−zα/2 < (p̂ − p)/√(p(1 − p)/n) < zα/2

the inequality may be solved for p. Writing z = zα/2 for brevity, the resulting confidence interval for p is

(2np̂ + z²)/(2(n + z²)) − [nz/(n + z²)] √(p̂q̂/n + z²/(4n²)) < p < (2np̂ + z²)/(2(n + z²)) + [nz/(n + z²)] √(p̂q̂/n + z²/(4n²))        (7.8)
Example 7.7
Reconsider the crankshaft bearing data introduced in Example 7.5. Construct a 95% confidence interval θ̂1 < p < θ̂2 using Equation (7.8).
With p̂ = 2/17, n = 85 and z = zα/2 = 1.96,

θ̂2 = [p̂ + z²/(2n) + z √(p̂q̂/n + z²/(4n²))] / (1 + z²/n)
    = [2/17 + 1.96²/170 + 1.96 √((2/17)(15/17)/85 + 1.96²/(4 · 85²))] / (1 + 1.96²/85) = 0.203

θ̂1 = [p̂ + z²/(2n) − z √(p̂q̂/n + z²/(4n²))] / (1 + z²/n)
    = [2/17 + 1.96²/170 − 1.96 √((2/17)(15/17)/85 + 1.96²/(4 · 85²))] / (1 + 1.96²/85) = 0.065
Note that Equation (7.8) yields more accurate results for small sample size n. However, it
is more complicated to calculate, and the gain in accuracy that it provides diminishes when
the sample size is large enough. Hence, Equation (7.5) is commonly used in practice.
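For comparison, a minimal sketch of Equation (7.8) on the same data (assuming Python with scipy) reproduces the limits of Example 7.7:

    from math import sqrt
    from scipy.stats import norm

    x, n = 10, 85
    p_hat = x / n
    z = norm.ppf(0.975)                      # z_{0.025} = 1.96

    center = p_hat + z ** 2 / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    denom = 1 + z ** 2 / n
    print((center - half) / denom, (center + half) / denom)   # about (0.065, 0.203)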


7.2.3. Population Variance


Let a random sample of size n be from a normal population with unknown variance σ². The 100(1 − α)% confidence interval for σ² shall be based on the unbiased point estimator S² and the statistic

(n − 1)S²/σ²

which has a chi-squared distribution with n − 1 degrees of freedom (Section 6.3.3).
We proceed in a manner similar to the interval estimation for the population mean µ. A two-sided confidence interval θ̂1 < σ² < θ̂2 is determined satisfying

P[θ̂1 < σ² < θ̂2] = 1 − α

Utilizing the chi-squared distribution χ²(ν), the ends u and v of the shortest interval are desired such that

P[u < χ²(ν) < v] = 1 − α

Due to the lack of symmetry of the chi-squared distribution as shown in Figure 6.4, the lower and upper limits are chosen to leave an area α/2 on each side. The critical numbers are denoted χ²1−α/2,ν and χ²α/2,ν, respectively. Figure 7.6 shows the critical numbers. It should be noted that the interval obtained in this manner is not the shortest.

Figure 7.6: Critical values χ²1−α/2,ν and χ²α/2,ν that leave an area α/2 in each tail

Table A-3 provides the percentage points of the χ²(ν) distribution. The α values are the column headings, and the degrees of freedom are listed in the first column. To illustrate the use of the table, note that the chi-squared value with 10 degrees of freedom having an area of 0.90 to the right is χ²0.90,10 = 4.87. That is,

P[χ²(10) > 4.87] = 0.90

The 100(1 − α)% confidence interval for σ² is

(n − 1)S²/χ²α/2,ν < σ² < (n − 1)S²/χ²1−α/2,ν

Given a sample of observations x1, x2, . . . , xn with sample variance s², a 100(1 − α)% confidence interval for σ² is

(n − 1)s²/χ²α/2,ν < σ² < (n − 1)s²/χ²1−α/2,ν        (7.9)
Example 7.8
The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence interval for the standard deviation of the weights of all such packages of grass seed distributed by this company, assuming a normal population.
With a calculator, we find that s² = 0.286. From Table A-3,

χ²0.025,9 = 19.02        χ²0.975,9 = 2.70

Therefore, a 95% confidence interval for σ is

√((9)(0.286)/19.02) < σ < √((9)(0.286)/2.70)

or simply 0.368 < σ < 0.976.
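Note that in scipy the percentage point χ²α/2,ν, which leaves an area α/2 to its right, is obtained as chi2.ppf(1 − α/2, ν). A minimal sketch of Example 7.8 under that convention:

    from math import sqrt
    from scipy.stats import chi2

    n, s2 = 10, 0.286
    lower = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)   # divide by chi^2_{0.025,9} = 19.02
    upper = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)   # divide by chi^2_{0.975,9} = 2.70
    print(sqrt(lower), sqrt(upper))                    # about (0.368, 0.976)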

7.2.4. One-Sided Confidence Intervals


The confidence intervals and resulting confidence bounds discussed thus far are two-sided (i.e.,
both upper and lower bounds are given). However, there are many applications in which only
one bound is sought. For example, if the measurement of interest is tensile strength, the
engineer receives better information from a lower bound only. This bound communicates the
worst-case scenario. On the other hand, if the measurement is something for which a relatively
large value of µ is not profitable or desirable, then an upper confidence bound is of interest. An
example would be a case in which inferences need to be made concerning the mean mercury
composition in a river. An upper bound is very informative in this case.
One-sided confidence bounds are developed in the same fashion as two-sided intervals. How-
ever, the source is a one-sided probability statement that makes use of the Central Limit
Theorem:

P[(X − µ)/(σ/√n) < zα] = 1 − α
We can then manipulate the probability statement much as before and obtain

P[µ > X − zα · σ/√n] = 1 − α

Similar manipulation of P[−zα < Z] = 1 − α yields

P[µ < X + zα · σ/√n] = 1 − α

As a result, the lower and upper bounds respectively follow.

If X is the mean of a random sample of size n from a population with variance σ², the one-sided 100(1 − α)% confidence bounds for µ are given by

µ < X + zα · σ/√n        (upper one-sided bound)
X − zα · σ/√n < µ        (lower one-sided bound)

Example 7.9
Find an upper bound for the 99% one-sided confidence interval for the zinc concentration in Example 7.1.
We find z0.01 = 2.33 in Table A-1.

x + zα · σ/√n = 2.6 + 2.33(0.3/√36) = 2.72

We can be 99% confident that the mean zinc concentration in the river will not exceed 2.72 grams per milliliter.

One-sided confidence bounds on the mean of a normal distribution are also of interest and are easy to find. Simply use the appropriate lower or upper interval and replace the unknown standard deviation σ with its estimate s and the critical value zα with tα,ν. The intervals are given below.

µ < x + tα,ν · s/√n        (upper one-sided bound)
x − tα,ν · s/√n < µ        (lower one-sided bound)
Example 7.10
Hospital workers who are routinely involved in administering radioactive tracers to patients are subject to a radiation exposure emanating from the skin of the patient. In an experiment to assess the amount of this exposure, radiation levels were measured at a distance of 50 cm from n = 28 patients who had been injected with a radioactive tracer, and a sample mean x = 5.145 and sample standard deviation s = 0.7524 are obtained. Find an upper bound of a 99% confidence interval for µ.

µ ≤ x + t0.01,27 · s/√n = 5.145 + (2.473)(0.7524/√28) = 5.497
One-sided confidence intervals for a population proportion p can be used instead of a two-sided confidence interval if an experimenter is interested in obtaining only an upper bound or a lower bound on the population proportion. Their format is similar to the two-sided confidence interval except that a critical point zα is employed in place of zα/2. The constructed one-sided confidence intervals are

p < p̂ + zα √(p̂q̂/n)        (upper one-sided bound)
p̂ − zα √(p̂q̂/n) < p        (lower one-sided bound)
Example 7.11
Consider the cattle inoculation problem in Example 6.8. Suppose that the vaccine can be approved for widespread use if it can be established that on average no more than one in a thousand cattle will suffer a serious adverse reaction. Is the number of observed cattle that suffer the reaction low enough for the vaccine to be approved for widespread use, at 99% level of confidence?
We seek an upper bound for p. We have p̂ = 0.000744 and z0.01 = 2.33. Thus, an upper bound for p is

p̂ + 2.33 √(p̂(1 − p̂)/n) = 0.0008

Since this bound does not exceed 0.001, the vaccine will be approved for widespread use at the 99% degree of confidence.
The one-sided confidence intervals for σ² are obtained by replacing α/2 by the one-sided area α. The confidence intervals are given below.

σ² < (n − 1)s²/χ²1−α,ν        (upper one-sided bound)
(n − 1)s²/χ²α,ν < σ²        (lower one-sided bound)


7.3. Prediction Intervals


In some problem situations, predicting a future observation of a variable is of interest. This is
a different problem than estimating the mean of that variable, so a confidence interval is not
appropriate. In this section, a 100(1 − α)% prediction interval on a future value of a normal
random variable will be obtained.
Suppose that X1 , X2 , . . . , Xn is a random sample from a normal population, and the objec-
tive is to predict the value of Xn+1 , a single future observation. A point estimator of Xn+1 is
X, the sample mean. The random variable Xn+1 − X is called the prediction error, whose
expected value is

E[Xn+1 − X] = µ − µ = 0

The variance of the prediction error is

V[Xn+1 − X] = V[Xn+1] + V[X] = σ² + σ²/n = (1 + 1/n)σ²
because the future observation Xn+1 is independent of the random sample and hence of X. The random variable Xn+1 − X follows a normal distribution. Therefore,

[(Xn+1 − X) − E[Xn+1 − X]] / √V[Xn+1 − X] = (Xn+1 − X) / (σ√(1 + 1/n)) ∼ N(0, 1)

The resulting 100(1 − α)% prediction interval for Xn+1 is

X − zα/2 σ√(1 + 1/n) < Xn+1 < X + zα/2 σ√(1 + 1/n)        (7.10)

If the population variance σ² is unknown, but the distribution of the population is normal, then the prediction interval for Xn+1 is

X − tα/2,ν S√(1 + 1/n) < Xn+1 < X + tα/2,ν S√(1 + 1/n)        (7.11)

Example 7.12
A meat inspector has randomly selected 30 packs of 95% lean beef. The sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a 99% prediction interval for the leanness of a new pack. Assume normality.
The critical number of the t-distribution is t0.005,29 = 2.756. Hence, a 99% prediction interval for a new observation Xn+1 is

96.2 − (2.756)(0.8)√(1 + 1/30) < Xn+1 < 96.2 + (2.756)(0.8)√(1 + 1/30)

which simplifies to 93.96 < Xn+1 < 98.44.
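A minimal sketch of the prediction interval in Example 7.12, assuming Python with scipy:

    from math import sqrt
    from scipy.stats import t

    n, xbar, s = 30, 96.2, 0.8
    tcrit = t.ppf(0.995, df=n - 1)           # t_{0.005,29} = 2.756
    moe = tcrit * s * sqrt(1 + 1 / n)
    print(xbar - moe, xbar + moe)            # about (93.96, 98.44)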

The derivation of the lower and upper one-sided confidence bounds of the prediction intervals
is left as an exercise.
To this point very little attention has been paid to the concept of outliers, or aberrant
observations. The majority of scientific investigators are keenly sensitive to the existence of
outlying observations or so-called faulty or “bad data.” Outlier detection is certainly of interest
here since there is an important relationship between outlier detection and prediction intervals.
It is convenient to view an outlying observation as one that comes from a population with
a mean that is different from the mean that governs the rest of the sample of size n being
studied. The prediction interval produces a bound that “covers” a future single observation
with probability 1 − α if it comes from the population from which the sample was drawn.


As a result, a methodology for outlier detection involves the rule that an observation is an
outlier if it falls outside the prediction interval computed without including the questionable
observation in the sample. As a result, for the prediction interval of Example 7.12, if a new
pack of beef is measured and its leanness is outside the interval (93.96, 98.44), that observation
can be viewed as an outlier.

7.4. Tolerance Intervals


As discussed in the previous chapter, the scientist or engineer may be less interested in estimat-
ing parameters than in gaining a notion about where an individual observation or measurement
might fall. Such situations call for the use of prediction intervals. However, there is yet a third
type of interval that is of interest in many applications. Once again, suppose that interest
centers around the manufacturing of a component part and specifications exist on a dimension
of that part. In addition, there is little concern about the mean of the dimension. But unlike
in the scenario in Section 7.3, one may be less interested in a single observation and more inter-
ested in where the majority of the population falls. If process specifications are important, the
manager of the process is concerned about long-range performance, not the next observation.
One must attempt to determine bounds that, in some probabilistic sense, “cover” values in
the population (i.e., the measured values of the dimension).
One method of establishing the desired bounds is to determine a confidence interval on a
fixed proportion of the measurements. This is best motivated by visualizing a situation in
which random sampling is done from a normal distribution with known mean µ and variance
σ². Clearly, a bound that covers the middle 95% of the population of observations is
µ ± 1.96σ
This is called a tolerance interval. However, in practice, µ and σ are seldom known; thus,
the user must apply
x ± 1.96s
However, because of sampling variability in x and s, it is likely that this interval will contain
less than 95% of the values in the population. The solution to this problem is to replace 1.96
with some value k that will make the proportion of the distribution contained in the interval
95% with some level of confidence. This leads to a definition.
Definition 7.3
For a normal distribution of measurements with unknown mean µ and unknown standard
deviation σ, tolerance limits are given by

x ± ks

where k is determined such that one can assert with 100(1 − α)% confidence that the
given limits contain at least the fraction p of the measurements. The fraction p is also
called the coverage of the tolerance interval.

Example 7.13
Consider Example 7.12. With the information given, find a tolerance interval that gives two-
sided 95% bounds on 90% of the distribution of packages of 95 percent lean beef. Assume
the data came from an approximately normal distribution.
Recall that n = 30, x = 96.2 and s = 0.8. From Table A-5, k = 2.140.

x − ks = 96.2 − 2.140(0.8) = 94.49


x + ks = 96.2 + 2.140(0.8) = 97.91


The meat inspector can be 95% confident that at least 90% of the distribution of packages
will be 94.49 to 97.91 percent lean beef.

Distinction among Confidence Intervals, Prediction Intervals, and Tolerance Intervals


It is important to reemphasize the difference among the three types of intervals discussed and
illustrated in the preceding sections. The computations are straightforward, but interpretation
can be confusing. In real-life applications, these intervals are not interchangeable because their
interpretations are quite distinct.
In the case of confidence intervals, we are attentive only to the population parameter. If a
specification will be set by which a customer will not accept an item, the population parameter
must take a backseat. It is important that the engineer know where the majority of the values
are going to be. Thus, tolerance limits should be used. Surely, when tolerance limits on
any process output are tighter than process specifications, that is good news for the process
manager.
It is true that the tolerance limit interpretation is somewhat related to the confidence in-
terval. The 100(1 − α)% tolerance interval on, say, the proportion 0.95 can be viewed as a
confidence interval on the middle 95% of the corresponding normal distribution.
Prediction intervals are applicable when it is important to determine a bound on a single
value. The mean is not the issue here, nor is the location of the majority of the population.
Rather, the location of a single new observation is required.
Example 7.14 (Machine Quality)
A machine produces metal pieces that are cylindrical in shape. A sample of these pieces is
taken and the diameters are found to be 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, and 1.03
centimeters. Use these data to calculate three interval types and draw interpretations that
illustrate the distinction between them in the context of the system. For all computations,
assume an approximately normal distribution. The sample mean and standard deviation
for the given data are x = 1.0056 and s = 0.0246.

(a) Find a 99% confidence interval on the mean diameter.


The 99% confidence interval for the mean diameter is given by

x ± tα/2,ν · s/√n = 1.0056 ± 3.355(0.0246/√9) = 1.0056 ± 0.0275

Thus, the 99% confidence bounds are 0.9781 and 1.0331.

(b) Compute a 99% prediction interval on a measured diameter of a single metal piece taken
from the machine.
The 99% prediction interval for a future observation is given by

x ± tα/2,ν s√(1 + 1/n) = 1.0056 ± 3.355(0.0246)√(1 + 1/9) = 1.0056 ± 0.0870

with the bounds being 0.9186 and 1.0926.

(c) Find the 99% tolerance limits that will contain 95% of the metal pieces produced by
this machine.
From Table A-5 for n = 9, 1 − α = 0.99 and p = 0.95, the value of k is 4.550 for two-sided
limits. Hence, the 99% tolerance limits are given by

x ± ks = 1.0056 ± 4.550(0.0246)

with the bounds being 0.8937 and 1.1175. One can be 99% confident that such interval will
contain the central 95% of the distribution of diameters produced.


The example illustrates that the three types of limits can give appreciably different results
even though they are all 99% bounds. In the case of the confidence interval on the mean, 99%
of such intervals cover the population mean diameter. Thus, we can say with 99% confidence
that the mean diameter produced by the process is between 0.9781 and 1.0331 centimeters.
Emphasis is placed on the mean, with less concern about a single reading or the general nature
of the distribution of diameters in the population. In the case of the prediction limits, the
bounds 0.9186 and 1.0926 are based on the distribution of a single “new” metal piece taken from
the process, and again 99% of such limits will cover the diameter of a new measured piece. On
the other hand, the tolerance limits give the engineer a sense of where the “majority,” say the
central 95%, of the diameters of measured pieces in the population reside. The 99% tolerance
limits, 0.8937 and 1.1175, are numerically quite different from the other two bounds. If these
bounds appear alarmingly wide to the engineer, it reflects negatively on process quality. On
the other hand, if the bounds represent a desirable result, the engineer may conclude that a
majority (95% here) of the diameters are in a desirable range. Again, a confidence interval
interpretation may be used: namely, 99% of such calculated bounds will cover the middle 95%
of the population of diameters.

7.5. Confidence Interval Comparing Two Population Parameters


This section covers confidence interval estimation for the difference of parameters θ1 − θ2 of two independent populations. We will find that the construction of such intervals depends on the distribution of

[(θ̂1 − θ̂2) − E[θ̂1 − θ̂2]] / √V[θ̂1 − θ̂2]

and the distributions of the two populations themselves.


A similar argument can be made for the case of confidence interval estimation on the dif-
ference of proportions of two independent populations or on the ratio of variances of two
independent normal populations.

The Experimental Conditions and the Experimental Unit


For the case of confidence interval estimation on the difference between two means, we need to
consider the experimental conditions in the data-taking process. It is assumed that there are
two independent random samples from distributions with means µ1 and µ2 , respectively. It is
important that experimental conditions emulate this ideal described by these assumptions as
closely as possible. Quite often, the experimenter should plan the strategy of the experiment
accordingly. For almost any study of this type, there is a so-called experimental unit, which
is that part of the experiment that produces experimental error and is responsible for the
population variance σ². In a drug study, the experimental unit is the patient or subject. In
an agricultural experiment, it may be a plot of ground. In a chemical experiment, it may be
a quantity of raw materials. It is important that differences between the experimental units
have minimal impact on the results. The experimenter will have a degree of insurance that
experimental units will not bias results if the conditions that define the two populations are
randomly assigned to the experimental units.

7.5.1. Confidence Interval for the Difference of Two Means


Variances Known
Suppose two independent populations have means µ1 and µ2 and variances σ1² and σ2², respectively. Suppose that two random samples, one from each population, have sizes n1 and n2, respectively. A 100(1 − α)% confidence interval for the difference of the means µ1 − µ2 must satisfy

P[θ̂1 < µ1 − µ2 < θ̂2] = 1 − α

A natural choice for the estimator of µ1 − µ2 is the difference X1 − X2, with expected value and variance

E[X1 − X2] = µ1 − µ2
V[X1 − X2] = σ1²/n1 + σ2²/n2

If the two populations follow a normal distribution, then the statistic

[(X1 − X2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)

is the standard normal random variable. On the other hand, if the populations are not normally distributed but the conditions of the Central Limit Theorem are met, then the distribution of the statistic is approximately normal. In both cases,

P[−zα/2 < [(X1 − X2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2) < zα/2] = 1 − α

If x1 and x2 are means of independent random samples of size n1 and n2 from populations with known variances σ1² and σ2², a 100(1 − α)% confidence interval for the difference of the population means µ1 − µ2 is

(x1 − x2) − zα/2 √(σ1²/n1 + σ2²/n2) < µ1 − µ2 < (x1 − x2) + zα/2 √(σ1²/n1 + σ2²/n2)        (7.12)

Example 7.15
A study was conducted in which two types of engines, A and B, were compared. Gas
mileage, in miles per gallon, was measured. Fifty experiments were conducted using engine
type A and 75 experiments were done with engine type B. The gasoline used and other
conditions were held constant. The average gas mileage was 36 miles per gallon for engine
A and 42 miles per gallon for engine B. Find a 96% confidence interval on µB − µA , where
µA and µB are population mean gas mileages for engines A and B, respectively. Assume
that the population standard deviations are 6 and 8 for engines A and B, respectively.
The point estimate of µB −µA is xB −xA = 42−36 = 6. Using α = 0.04, we find z0.02 = 2.05
from Table A-1. Hence, with substitution in the formula above, the 96% confidence interval
is
6 − 2.05 √(64/75 + 36/50) < µB − µA < 6 + 2.05 √(64/75 + 36/50)
or simply 3.43 < µB − µA < 8.57.
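A minimal sketch of Example 7.15, assuming Python with scipy (note that norm.ppf(0.98) returns 2.054, slightly more precise than the rounded table value 2.05):

    from math import sqrt
    from scipy.stats import norm

    diff = 42 - 36                           # xB - xA
    z = norm.ppf(0.98)                       # z_{0.02} for 96% confidence
    moe = z * sqrt(64 / 75 + 36 / 50)        # variances 8^2 and 6^2
    print(diff - moe, diff + moe)            # about (3.42, 8.58); the rounded z = 2.05
                                             # gives the text's (3.43, 8.57)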

This procedure for estimating the difference between two means is applicable if σ1² and σ2² are
known. If the variances are not known and the two distributions involved are approximately
normal, the t-distribution becomes involved, as in the case of a single sample. If one is not
willing to assume normality, large samples (say greater than 30) will allow the use of s1 and
s2 in place of σ1 and σ2 , respectively, with the rationale that s1 ≈ σ1 and s2 ≈ σ2 . Again, of
course, the confidence interval is an approximate one.


Variances Unknown but Assumed Equal

Consider the case where σ1² and σ2² are unknown, but there is reason to assume that σ1² = σ2² = σ². The variance of the estimator X1 − X2 is

V[X1 − X2] = σ²(1/n1 + 1/n2)

and the statistic

Z = [(X1 − X2) − (µ1 − µ2)] / (σ√(1/n1 + 1/n2))

follows a standard normal distribution. According to Equation (6.7), the statistics

(n1 − 1)S1²/σ²        and        (n2 − 1)S2²/σ²

are independent chi-squared random variables with (n1 − 1) and (n2 − 1) degrees of freedom. Consequently, their sum

V = (n1 − 1)S1²/σ² + (n2 − 1)S2²/σ²

has a chi-squared distribution with

ν = n1 + n2 − 2

degrees of freedom. Thus, the statistic

T = Z/√(V/ν)

has a t-distribution with ν degrees of freedom.
Therefore, a 100(1 − α)% confidence interval is based on

−tα/2,ν < Z/√(V/ν) < tα/2,ν

By writing

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)        (7.13)

the random variable T can be written as

T = [(X1 − X2) − (µ1 − µ2)] / (Sp √(1/n1 + 1/n2))

If x1 and x2 are means of independent random samples of size n1 and n2, respectively, from approximately normal populations with unknown but equal variances, a 100(1 − α)% confidence interval for µ1 − µ2 is

(x1 − x2) − tα/2,ν · sp √(1/n1 + 1/n2) < µ1 − µ2 < (x1 − x2) + tα/2,ν · sp √(1/n1 + 1/n2)        (7.14)

Example 7.16
A study was conducted to estimate the difference in the amounts of the chemical orthophosphorus measured at two different stations on a river. Fifteen samples were collected from station 1, and 12 samples were obtained from station 2. The 15 samples from station 1 had an average orthophosphorus content of 3.84 milligrams per liter and a standard deviation of 3.07 milligrams per liter, while the 12 samples from station 2 had an average content of 2.49 milligrams per liter and a standard deviation of 1.65 milligrams per liter. Find a 95% confidence interval for the difference in the true average orthophosphorus contents at these two stations, assuming that the observations came from normal populations with equal variances.
The point estimate for µ1 − µ2 is x1 − x2 = 3.84 − 2.49 = 1.35. The pooled standard deviation sp using Equation (7.13) is

sp = √([14(3.07²) + 11(1.65²)] / (15 + 12 − 2)) = 2.545

The critical number is t0.025,ν = 2.060 at ν = 15 + 12 − 2 = 25 degrees of freedom. Therefore, the 95% confidence interval is

1.35 − 2.060(2.545)√(1/15 + 1/12) < µ1 − µ2 < 1.35 + 2.060(2.545)√(1/15 + 1/12)

or simply −0.68 < µ1 − µ2 < 3.38.
Since 0 is in the interval, we cannot conclude with 95% confidence that the true average orthophosphorus contents at the two stations differ.
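A minimal sketch of the pooled-variance interval of Example 7.16, assuming Python with scipy:

    from math import sqrt
    from scipy.stats import t

    x1, s1, n1 = 3.84, 3.07, 15
    x2, s2, n2 = 2.49, 1.65, 12
    nu = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / nu)   # about 2.545
    moe = t.ppf(0.975, df=nu) * sp * sqrt(1 / n1 + 1 / n2)
    print(x1 - x2 - moe, x1 - x2 + moe)                          # about (-0.68, 3.38)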

Equal Sample Sizes


The procedure for constructing confidence intervals for µ1 − µ2 with σ1 = σ2 = σ unknown
requires the assumption that the populations are normal. Slight departures from either the
equal variance or the normality assumption do not seriously alter the degree of confidence
for our interval. (A procedure is presented in Section 7.5.3 for testing the equality of two
unknown population variances based on the information provided by the sample variances.)
If the population variances are considerably different, we still obtain reasonable results when
the populations are normal, provided that n1 = n2 . Therefore, in planning an experiment, one
should make every effort to equalize the size of the samples.

Variances Unknown and Assumed Unequal

Consider the problem of finding an interval estimate of µ1 − µ2 when the unknown population variances are not likely to be equal. The statistic most often used in this case is

T = [(X1 − X2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)

which has approximately a t-distribution with ν degrees of freedom, where

ν = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]

Since ν is seldom an integer, it is rounded down to the nearest integer.

If random samples from two independent and normally distributed populations have means x1 and x2 and variances s1² and s2², a 100(1 − α)% confidence interval for the difference of two means is given by

(x1 − x2) − tα/2,ν √(s1²/n1 + s2²/n2) < µ1 − µ2 < (x1 − x2) + tα/2,ν √(s1²/n1 + s2²/n2)        (7.15)


Example 7.17
Rework Example 7.16 assuming that the population variances are not equal.
The degrees of freedom is

ν = (3.07²/15 + 1.65²/12)² / [(3.07²/15)²/(15 − 1) + (1.65²/12)²/(12 − 1)] = 22.2

which is always rounded down to the nearest integer. Hence, ν = 22 and t0.025,22 = 2.074. The 95% confidence interval is

1.35 − 2.074 √(3.07²/15 + 1.65²/12) < µ1 − µ2 < 1.35 + 2.074 √(3.07²/15 + 1.65²/12)

which simplifies to −0.57 < µ1 − µ2 < 3.27.
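A minimal sketch of the unequal-variance computation of Example 7.17, assuming Python with scipy (int() performs the rounding down of ν):

    from math import sqrt
    from scipy.stats import t

    x1, s1, n1 = 3.84, 3.07, 15
    x2, s2, n2 = 2.49, 1.65, 12
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    nu = int((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))  # 22
    moe = t.ppf(0.975, df=nu) * sqrt(v1 + v2)
    print(nu, x1 - x2 - moe, x1 - x2 + moe)   # nu = 22, interval about (-0.57, 3.27)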

Confidence Interval for Paired Observations


At this point, we shall consider estimation procedures for the difference of two means when
the samples are not independent and the variances of the two populations are not necessarily
equal. The situation considered here deals with a very special experimental condition, namely
that of paired observations. Unlike in the situations described earlier, the conditions of the
two populations are not assigned randomly to experimental units. Rather, each homogeneous
experimental unit receives both population conditions; as a result, each experimental unit has
a pair of observations, one for each population. For example, if we run a test on a new diet
using 15 individuals, the weights before and after going on the diet form the information for our
two samples. The two populations are “before” and “after,” and the experimental unit is the
individual. Obviously, the observations in a pair have something in common. To determine if
the diet is effective, we consider the differences d1 , d2 , . . . , dn in the paired observations. These
differences are the values of a random sample D1 , D2 , . . . , Dn from a population of differences
that we shall assume to be normally distributed with mean µD = µ1 − µ2 and variance σD². We estimate σD² by sd², the variance of the differences that constitute our sample. The point estimator of µD is given by D.
A 100(1 − α)% confidence interval for µD can be established by writing

P[−tα/2,ν < T < tα/2,ν] = 1 − α

It is now a routine procedure to determine an interval based on the differences of paired observations and degrees of freedom ν = n − 1. The result is given below.

d − tα/2,ν · sd/√n < µD < d + tα/2,ν · sd/√n        (7.16)
Example 7.18
Table 7.2 shows the radar detection distances in miles for 24 targets. The observations xi are for the standard system and the observations yi are for the new system. This is a paired experiment since the 24 targets are detected by both systems. Find a 90% confidence interval for the difference in detection distances of the two systems.
We find d = −0.261 and sd = 1.305 using a calculator. Also, t0.05,23 = 1.714. Thus, a 90% confidence interval is

−0.261 − (1.714)(1.305/√24) < µD < −0.261 + (1.714)(1.305/√24)

or simply −0.717 < µD < 0.196.


Table 7.2: Radar detection systems data set


xi yi di xi yi di
48.40 51.14 -2.74 52.03 52.37 -0.34
47.73 46.48 1.25 51.96 52.90 -0.94
51.30 50.90 0.40 49.15 50.67 -1.52
50.49 49.82 0.67 48.12 49.50 -1.38
47.06 47.99 -0.93 51.97 51.29 0.68
53.02 53.20 -0.18 53.24 51.60 1.64
48.96 46.76 2.20 55.87 54.48 1.39
52.03 54.44 -2.41 45.60 45.62 -0.02
51.09 49.85 1.24 51.80 52.24 -0.44
47.35 47.45 -0.10 47.64 47.33 0.31
50.15 50.66 -0.51 49.90 51.13 -1.23
46.59 47.92 -1.33 55.89 57.86 -1.97
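A minimal sketch of the paired-sample interval of Example 7.18, computed from the summary statistics d and sd (assuming Python with scipy):

    from math import sqrt
    from scipy.stats import t

    n, dbar, sd = 24, -0.261, 1.305
    moe = t.ppf(0.95, df=n - 1) * sd / sqrt(n)   # t_{0.05,23} = 1.714, 90% confidence
    print(dbar - moe, dbar + moe)                # about (-0.72, 0.20)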

7.5.2. Confidence Interval for Difference of Two Proportions


Consider the problem where we wish to estimate the difference between two binomial parame-
ters p1 and p2 . In Section 6.3.2, we found that the estimator P̂1 − P̂2 is an unbiased estimator
of p1 − p2 , its variance is
p1 q1 p2 q2
V P̂1 − P̂2 = +
 
n1 n2
Also, the random variable  
P̂1 − P̂2 − (p1 − p2 )
q
p1 p1 p2 q2
n1 + n2

is approximately standard normal provided that the quantities n1 p̂1 , n1 q̂1 , n2 p̂2 and n2 q̂2 are
all greater than 5. Thus, if p̂1 and p̂2 are proportions of success from random samples of size
n1 and n2 , a 100(1 − α)% confidence interval for p1 − p2 is
(p̂1 − p̂2) − zα/2 √(p̂1 q̂1 /n1 + p̂2 q̂2 /n2) < p1 − p2 < (p̂1 − p̂2) + zα/2 √(p̂1 q̂1 /n1 + p̂2 q̂2 /n2)    (7.17)
Example 7.19
A certain change in a process for manufacturing component parts is being considered. Sam-
ples are taken under both the existing and the new process so as to determine if the new
process results in an improvement. If 75 of 1500 items from the existing process are found
to be defective and 80 of 2000 items from the new process are found to be defective, find a
90% confidence interval for the true difference in the proportion of defectives between the
existing and the new process.
Let p1 and p2 be the true proportions of defectives for the existing and new processes,
respectively.
75 80
p̂1 = = 0.05 p̂2 = = 0.04
1500 2000
From Table A-1, z0.05 = 1.64. Substituting these values, we find the margin of error
1.64 √(0.05(0.95)/1500 + 0.04(0.96)/2000) = 0.0117
Thus, the confidence limits are (0.05 − 0.04) ± 0.0117 and a 90% confidence interval is
−0.0017 < p1 − p2 < 0.0217


Since the interval contains positive and negative values, there is no reason to believe that the new
process produces a significant decrease in the proportion of defectives over the existing method.
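A quick numerical check in R: the sketch below recomputes the interval of Example 7.19 from the counts (the names x1, n1, x2, n2 are ours; qnorm(0.95) gives the exact z0.05 rather than the table value 1.64).

> x1 <- 75; n1 <- 1500; x2 <- 80; n2 <- 2000
> p1 <- x1/n1; p2 <- x2/n2
> se <- sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
> (p1 - p2) + c(-1, 1) * qnorm(0.95) * se    # about (-0.0017, 0.0217)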

7.5.3. Confidence Interval for the Ratio of Two Variances


F Distribution
We have motivated the t-distribution in part by its application to problems in which there is
comparative sampling (i.e., a comparison between two sample means). For example, some of
our examples in future chapters will take a more formal approach: a chemical engineer collects
data on two catalysts, a biologist collects data on two growth media, or a chemist gathers data
on two methods of coating material to inhibit corrosion.
information shed light on two population means, it is often the case that a comparison of
variability is equally important, if not more so. The F -distribution finds enormous application
in comparing sample variances. Applications of the F -distribution are found in problems
involving two or more samples. The statistic F is defined to be the ratio of two independent
chi-squared random variables, each divided by its number of degrees of freedom.
Definition 7.4
Let U and V be two independent random variables having chi-squared distributions
with ν1 and ν2 degrees of freedom, respectively. The distribution of the statistic
(U/ν1 )/(V /ν2 ) is given by the density function

f(x) = [Γ((ν1 + ν2)/2) (ν1/ν2)^(ν1/2) / (Γ(ν1/2) Γ(ν2/2))] · x^(ν1/2 − 1) / (1 + ν1 x/ν2)^((ν1+ν2)/2) ,    x > 0

We will make considerable use of the random variable F in future chapters. However,
the density function will not be used and is given only for completeness. The curve of the
F -distribution depends not only on the two parameters ν1 and ν2 but also on the order in
which we state them. Once these two values are given, we can identify the curve. Typical
F -distributions are shown in Figure 7.7.

Figure 7.7: F density curves for (ν1, ν2) = (6, 10), (10, 20), and (16, 4)

Let fα be the f -value above which we find an area equal to α. This is illustrated by the
shaded region in Figure 7.8(b). Table A-4 gives values of fα only for α = 0.10, α = 0.05,
α = 0.025, α = 0.01 and α = 0.005 for various combinations of the degrees of freedom ν1 and
ν2 . Hence, the f -value with 6 and 10 degrees of freedom, leaving an area of 0.05 to the right,
is f0.05 (6, 10) = 3.22. Table A-4 can also be used to find values of f1−α using the formula
f1−α(ν1, ν2) = 1 / fα(ν2, ν1)    (7.18)
For example,
f0.95(6, 10) = 1 / f0.05(10, 6) = 1 / 4.06 = 0.246
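R's F quantile function supplies the same values directly; a small sketch (qf takes the lower-tail probability, so the upper-α point fα(ν1, ν2) is qf(1 − α, ν1, ν2)):

> qf(0.95, 6, 10)        # f_0.05(6, 10), about 3.22
> 1 / qf(0.95, 10, 6)    # f_0.95(6, 10) via Equation (7.18), about 0.246
> qf(0.05, 6, 10)        # the same value computed directly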


Figure 7.8: Critical numbers of the F -distribution: (a) f1−α ; (b) fα

Comparing Two Population Variances


A point estimate of the ratio of two population variances σ1²/σ2² is given by the ratio s1²/s2² of
the sample variances. Hence, the statistic S1²/S2² is called an estimator of σ1²/σ2².
If σ1² and σ2² are the variances of normal populations, we can establish an interval estimate
of σ1²/σ2² by using the statistic

F = (S1²/σ1²) / (S2²/σ2²)
The random variable F has an F -distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of
freedom. Therefore, we may write

P[f1−α/2(ν1, ν2) < F < fα/2(ν1, ν2)] = 1 − α

where f1−α/2(ν1, ν2) and fα/2(ν1, ν2) are values of the F -distribution with ν1 and ν2 degrees of
freedom, leaving areas of 1 − α/2 and α/2 to the right, respectively.
Substituting for F and rearranging the inequality, we obtain

P[ (1/fα/2(ν1, ν2)) (S1²/S2²) < σ1²/σ2² < fα/2(ν2, ν1) (S1²/S2²) ] = 1 − α

Thus, if s1² and s2² are the variances of independent random samples of size n1 and n2 from
normal populations, a 100(1 − α)% confidence interval for σ1²/σ2² is

(1/fα/2(ν1, ν2)) (s1²/s2²) < σ1²/σ2² < fα/2(ν2, ν1) (s1²/s2²)    (7.19)

An approximate 100(1 − α)% confidence interval for σ1 /σ2 is obtained by taking the square
root of each endpoint of the interval in Equation (7.19).
Example 7.20
A confidence interval for the difference in the mean orthophosphorus contents, measured
in milligrams per liter, at two stations on the river was constructed in Example 7.17 on
page 124 by assuming the normal population variances to be unequal. Justify this assumption
by constructing a 98% confidence interval for σ1²/σ2², where σ1² and σ2² are the variances of
the populations of orthophosphorus contents at station 1 and station 2, respectively.
We have n1 = 15, n2 = 12, s1 = 3.07 and s2 = 0.80. For a 98% confidence interval, α = 0.02.
Interpolating in Table A-4, we find f0.01 (14, 11) ≈ 4.30 and f0.01 (11, 14) ≈ 3.87. Therefore,


the 98% confidence interval for σ1²/σ2² is

(1/4.30) (3.07²/0.80²) < σ1²/σ2² < (3.87) (3.07²/0.80²)

which simplifies to 3.425 < σ1²/σ2² < 56.991.

Since this interval does not allow for the possibility of σ1²/σ2² being equal to 1, we were correct in
assuming that σ1² ≠ σ2² in Example 7.17.
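The same interval can be sketched in R with exact F quantiles in place of the interpolated table values, so the endpoints differ slightly from 3.425 and 56.991 (the variable names are ours):

> s1 <- 3.07; s2 <- 0.80; n1 <- 15; n2 <- 12; alpha <- 0.02
> ratio <- s1^2 / s2^2
> ratio / qf(1 - alpha/2, n1 - 1, n2 - 1)    # lower limit, about 3.43
> ratio * qf(1 - alpha/2, n2 - 1, n1 - 1)    # upper limit, about 56.9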

Table A-6 on page 160 lists two-sided and one-sided confidence intervals for the parameters
µ, µ1 − µ2 , p, p1 − p2 , σ², and σ1²/σ2².

Exercises
7-1. Consider the one-sided confidence interval expressions for a mean of a normal population.
a) What value of zα would result in a 90% confidence interval?
b) What value of zα would result in a 95% confidence interval?
c) What value of zα would result in a 99% confidence interval?

7-2. A random sample has been taken from a normal distribution and the following confidence intervals constructed using the same data: 37.53 < µ < 49.87 and 35.59 < µ < 51.81.
a) What is the value of the sample mean?
b) One of these intervals is a 99% confidence interval and the other is a 95% confidence interval. Which one is the 95% confidence interval and why?

7-3. The diameter of holes for a cable harness is known to have a normal distribution with σ = 0.01 inch. A random sample of size 10 yields an average diameter of 1.5045 inch. Find a 99% two-sided confidence interval on the mean hole diameter.

7-4. The life in hours of a 75-watt light bulb is known to be normally distributed with σ = 25 hours. A random sample of 20 bulbs has a mean life of x̄ = 1014 hours.
a) Construct a 95% two-sided confidence interval on the mean life.
b) Construct a 95% lower-confidence bound on the mean life. Compare the lower bound of this confidence interval with the one in part (a).

7-5. Many cardiac patients wear an implanted pacemaker to control their heartbeat. A plastic connector module mounts on the top of the pacemaker. Assuming a standard deviation of 0.0015 inch and an approximately normal distribution, find a 95% two-sided confidence interval for the mean of the depths of all connector modules made by a certain manufacturing company. A random sample of 75 modules has an average depth of 0.310 inch.

7-6. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 40 hours. If a sample of 30 bulbs has an average life of 780 hours, find a 96% two-sided confidence interval for the population mean of all bulbs produced by this firm.

7-7. Suppose that in Exercise 7-4 we wanted the error in estimating the mean life from the two-sided confidence interval to be five hours at 95% confidence. What sample size should be used?

7-8. The heights of a random sample of 50 college students showed a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters.
a) Construct a 99% two-sided confidence interval for the mean height of all college students.
b) What can we assert with 99% confidence about the possible size of our error if we estimate the mean height of all college students to be 174.5 centimeters?

7-9. A random sample of 12 graduates of a certain secretarial school typed an average of 79.3 words per minute with a standard deviation of 7.8 words per minute. Assuming a normal distribution for the number of words typed per minute, find a 95% two-sided confidence interval for the average number of words typed by all graduates of this school.

7-10. Dairy cows at large commercial farms often receive injections of bST (Bovine Somatotropin), a hormone used to spur milk production. Bauman et al. (Journal of Dairy Science, 1989) reported that 12 cows given bST produced an average of 28.0 kg/d of milk. Assume that the standard deviation of milk production is 2.25 kg/d.
a) Find a 99% two-sided confidence interval for the true mean milk production.
b) If the farms want the confidence interval to be no wider than ±1.25 kg/d, what level of confidence would they need to use?

7-11. In a random sample of 1000 homes in a certain city, it is found that 228 are heated by oil. Find 99% two-sided confidence intervals for the proportion of homes in this city that are heated by oil.

7-12. A manufacturer of MP3 players conducts a set of comprehensive tests on the electrical functions of its product. All MP3 players must pass all tests prior to being sold. Of a random sample of 500 MP3 players, 15 failed one or more tests. Find a 90% two-sided confidence interval for the proportion of MP3 players from the population that pass all tests.

7-13. How large a sample is needed if we wish to be 99% confident that our sample proportion in Exercise 7-11 will be within 0.05 of the true proportion of homes in the city that are heated by oil?

7-14. A random sample of 20 students yielded a mean of x̄ = 72 and a variance of s² = 16 for scores on a college placement test in mathematics. Assuming the scores to be normally distributed, construct a 98% two-sided confidence interval for σ².

7-15. A manufacturer of car batteries claims that the batteries will last, on average, 3 years with a variance of 1 year. If 5 of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, construct a 95% two-sided confidence interval for σ² and decide if the manufacturer's claim that σ² = 1 is valid. Assume the population of battery lives to be approximately normally distributed.

7-16. A random sample of 100 automobile owners in the state of Virginia shows that an automobile is driven on average 23,500 kilometers per year with a standard deviation of 3900 kilometers. Assume the distribution of measurements to be approximately normal.
a) Construct a 99% two-sided confidence interval for the average number of kilometers an automobile is driven annually in Virginia.
b) What can we assert with 99% confidence about the possible size of our error if we estimate the average number of kilometers driven by car owners in Virginia to be 23,500 kilometers per year?

7-17. A machine produces metal pieces that are cylindrical in shape. A sample of pieces is taken, and the diameters are found to be 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, and 1.03 centimeters. Find a 99% two-sided confidence interval for the mean diameter of pieces from this machine, assuming an approximately normal distribution.

7-18. The following measurements were recorded for the drying time, in hours, of a certain brand of latex paint:

3.4 2.5 4.8 2.9 3.6
2.8 3.3 5.6 3.7 2.8
4.4 4.0 5.2 3.0 4.8

Assuming that the measurements represent a random sample from a normal population, find a 95% two-sided prediction interval for the drying time for the next trial of the paint.

7-19. Consider Exercise 7-9. Compute the 95% two-sided prediction interval for the next observed number of words per minute typed by a graduate of the secretarial school.

7-20. A type of thread is being studied for its tensile strength properties. Fifty pieces were tested under similar conditions, and the results showed an average tensile strength of 78.3 kilograms and a standard deviation of 5.6 kilograms. Assuming a normal distribution of tensile strengths, give a lower 95% prediction limit on a single observed tensile strength value. In addition, give a lower 95% tolerance limit that is exceeded by 99% of the tensile strength values.

7-21. A type of thread is being studied for its tensile strength properties. Fifty pieces were tested under similar conditions, and the results showed an average tensile strength of 78.3 kilograms and a standard deviation of 5.6 kilograms. Assuming a normal distribution of tensile strengths, give a lower 95% prediction limit on a single observed tensile strength value. In addition, give a lower 95% tolerance limit that is exceeded by 99% of the tensile strength values.

7-22. A random sample of 25 tablets of buffered aspirin contains, on average, 325.05 mg of aspirin per tablet, with a standard deviation of 0.5 mg. Find the 95% (two-sided) tolerance limits that will contain 90% of the tablet contents for this brand of buffered aspirin. Assume that the aspirin content is normally distributed.

7-23. A group of human factor researchers are concerned about reaction to a stimulus by airplane pilots in a certain cockpit arrangement. An experiment was conducted in a simulation laboratory, and 15 pilots were used with average reaction time of 3.2 seconds with a sample standard deviation of 0.6 second. It is of interest to characterize the extreme (i.e., worst case) scenario. To that end, do the following:
a) Give a particularly important one-sided 99% confidence bound on the mean reaction time. What assumption, if any, must you make on the distribution of reaction times?
b) Give a 99% one-sided prediction interval and give an interpretation of what it means. Must you make an assumption about the distribution of reaction times to compute this bound?
c) Compute a one-sided tolerance bound with 99% confidence that involves 95% of reaction times. Again, give an interpretation and assumptions about the distribution, if any.

7-24. A study was conducted to determine if a certain treatment has any effect on the amount of metal removed in a pickling operation. A random sample of 100 pieces was immersed in a bath for 24 hours without the treatment, yielding an average of 12.2 millimeters of metal removed and a sample standard deviation of 1.1 millimeters. A second sample of 200 pieces was exposed to the treatment, followed by the 24-hour immersion in the bath, resulting in an average removal of 9.1 millimeters of metal with a sample standard deviation of 0.9 millimeter. Compute a 98% two-sided confidence interval estimate for the difference between the population means. Does the treatment appear to reduce the mean amount of metal removed?

7-25. A local government awarded grants to the agricultural departments of 9 universities to test the yield capabilities of two new varieties of wheat. Each variety was planted on a plot of equal area at each university, and the yields, in kilograms per plot, were recorded as follows:

                     University
Variety   1   2   3   4   5   6   7   8   9
   1     38  23  35  41  44  29  37  31  38
   2     45  25  31  38  50  33  36  40  43

Find a 95% two-sided confidence interval for the mean difference between the yields of the two varieties, assuming the differences of yields to be approximately normally distributed. Explain why pairing is necessary in this problem.

7-26. The following data represent the running times of films produced by two motion-picture companies.

Company   Time (minutes)
   I      103   94  110   87   98
   II      97   82  123   92  175   88  118

Compute a 90% two-sided confidence interval for the difference between the average running times of films produced by the two companies. Assume that the running-time differences are approximately normally distributed with unequal variances.

7-27. In a study conducted at Virginia Tech on the development of ectomycorrhizal, a symbiotic relationship between the roots of trees and a fungus, in which minerals are transferred from the fungus to the trees and sugars from the trees to the fungus, 20 northern red oak seedlings exposed to the fungus Pisolithus tinctorus were grown in a greenhouse. All seedlings were planted in the same type of soil and received the same amount of sunshine and water. Half received no nitrogen at planting time, to serve as a control, and the other half received 368 ppm of nitrogen in the form NaNO3. The stem weights, in grams, at the end of 140 days were recorded as follows:

No Nitrogen   Nitrogen
   0.32         0.26
   0.53         0.43
   0.28         0.47
   0.37         0.49
   0.47         0.52
   0.43         0.75
   0.36         0.79
   0.42         0.86
   0.38         0.62
   0.43         0.46

Construct a 95% two-sided confidence interval for the difference in the mean stem weight between seedlings that receive no nitrogen and those that receive 368 ppm of nitrogen. Assume the populations to be normally distributed with equal variances.

7-28. Ten engineering schools in the United States were surveyed. The sample contained 250 electrical engineers, 80 being women; 175 chemical engineers, 40 being women. Compute a 90% two-sided confidence interval for the difference between the proportions of women in these two fields of engineering. Is there a significant difference between the two proportions?

7-29. A survey of 1000 students found that 274 chose professional baseball team A as their favorite team. In a similar survey involving 760 students, 240 of them chose team A as their favorite. Compute a 95% two-sided confidence interval for the difference between the proportions of students favoring team A in the two surveys. Is there a significant difference?

7-30. Construct a 90% two-sided confidence interval for σ1²/σ2² in Exercise 7-26. Should we have assumed σ1² = σ2² in constructing our confidence interval for µI − µII?

8. Test of Hypothesis for a Single Population
Often, the problem confronting the scientist or engineer is not so much the estimation of a
population parameter, as discussed in Chapter 7, but rather the formation of a data-based
decision procedure that can produce a conclusion about some scientific system. For example,
a medical researcher may decide on the basis of experimental evidence whether coffee drinking
increases the risk of cancer in humans; an engineer might have to decide on the basis of sample
data whether there is a difference between the accuracy of two kinds of gauges; or a sociologist
might wish to collect appropriate data to enable him or her to decide whether a person’s
blood type and eye color are independent variables. In each of these cases, the scientist or
engineer postulates or conjectures something about a system. In addition, each must make
use of experimental data and make a decision based on the data. In each case, the conjecture
can be put in the form of a statistical hypothesis. Procedures that lead to the acceptance or
rejection of statistical hypotheses such as these comprise a major area of statistical inference.

Learning Objectives
After careful study of this chapter, you should be able to do the following:

1. Structure engineering decision-making problems as hypothesis tests

2. Test hypotheses on the mean of a normal distribution using either a z-test or a t-test
procedure

3. Test hypotheses on a population proportion

4. Test hypotheses on the variance or standard deviation of a normal distribution

5. Use the p-value approach for making decisions in hypothesis tests

6. Explain and use the relationship between confidence intervals and hypothesis tests

8.1. Hypothesis Testing


8.1.1. Statistical Hypothesis
So far statistical inferences about an unknown population parameter θ have been based upon
the calculation of a point estimate and the construction of a confidence interval. An additional
methodology discussed in this chapter is hypothesis testing, which allows an experimenter to
assess the plausibility or credibility of a specific statement. The statements are called hypothe-
ses, and the decision-making procedure is called hypothesis testing. This is one of the most
useful aspects of statistical inference, because many types of decision-making problems, tests,
or experiments in the engineering world can be formulated as hypothesis-testing problems.
Furthermore, as we will see, a very close connection exists between hypothesis testing and
confidence intervals.
Statistical hypothesis testing and confidence interval estimation of parameters are the fun-
damental methods used at the data analysis stage of a comparative experiment in which the
engineer is interested, for example, in comparing the mean of a population to a specified value.
These simple comparative experiments are frequently encountered in practice and provide a


good foundation for the more complex experimental design problems that we will discuss in
later chapters.
Definition 8.1
A statistical hypothesis is a statement about the parameters of a population.

Because we use probability distributions to represent populations, a statistical hypothesis


may also be thought of as a statement about the probability distribution of a random variable.
The hypothesis will usually involve one or more parameters of this distribution.
The truth or falsity of a statistical hypothesis is never known with absolute certainty unless
we examine the entire population. This, of course, would be impractical in most situations.
Instead, we take a random sample from the population of interest and use the data contained
in this sample to provide evidence that either supports or does not support the hypothesis.
Evidence from the sample that is inconsistent with the stated hypothesis leads to a rejection
of the hypothesis.
It should be made clear to the reader that the decision procedure must include an awareness
of the probability of a wrong conclusion. For example, suppose that the hypothesis postulated
by the engineer is that the fraction defective p in a certain process is 0.10. The experiment
is to observe a random sample of the product in question. Suppose that 100 items are tested
and 12 items are found defective. It is reasonable to conclude that this evidence does not
refute the condition that the binomial parameter p = 0.10, and thus it may lead one not to
reject the hypothesis. However, it also does not refute p = 0.12 or perhaps even p = 0.15.
As a result, the reader must be accustomed to understanding that rejection of a hypothesis
implies that the sample evidence refutes it. Put another way, rejection means that there is a
small probability of obtaining the sample information observed when, in fact, the hypothesis
is true. For example, for our proportion-defective hypothesis, a sample of 100 revealing 20
defective items is certainly evidence for rejection. Why? If, indeed, p = 0.10, the probability
of obtaining 20 or more defectives is
Σ_{x=20}^{100} C(100, x) (0.10)^x (0.90)^(100−x) ≈ 0.002

With the resulting small risk of a wrong conclusion, it would seem safe to reject the hypothesis
that p = 0.10. In other words, rejection of a hypothesis tends to all but “rule out” the
hypothesis. On the other hand, it is very important to emphasize that failure to reject does
not rule out other possibilities. As a result, the firm conclusion is established by the data
analyst when a hypothesis is rejected.
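The tail probability quoted above is a one-line computation with R's binomial c.d.f.; a quick sketch:

> pbinom(19, size = 100, prob = 0.10, lower.tail = FALSE)   # P[X >= 20], about 0.002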

Null and Alternative Hypotheses


An experimenter may be interested in the plausibility of the statement µ = 20, say. In other
words, an experimenter may be interested in the plausibility that the population mean is equal
to a specific fixed value. If this fixed value is denoted by µ0 , then the experimenter’s statement
may formally be described by a null hypothesis

H0 : µ = µ0

The word hypothesis indicates that this statement will be tested with an appropriate data set.
It is useful to associate a null hypothesis with an alternative hypothesis, which is defined
to be the “opposite” of the null hypothesis. The alternative hypothesis usually represents the
question to be answered or the theory to be tested, and thus its specification is crucial. The
null hypothesis above has an alternative hypothesis

H1 : µ ≠ µ0


This is known as a two-sided problem since the alternative hypothesis concerns values of µ
both larger and smaller than µ0 . In a one-sided problem the experimenter allows the null
hypotheses to be broader so as to indicate that the specified value µ0 provides either an upper
or a lower bound for the population mean µ.
With one-sided sets of hypotheses, considerable care needs to be directed toward deciding
which should be the null hypothesis and which should be the alternative hypothesis. For
instance, in the fabric water-absorption problem of Example 8.1(d) below, why not take the null
hypothesis to be that the cotton fabric is suitable for dyeing? This matter is addressed in the
discussion of p-values.
Example 8.1
State the null and alternative hypotheses for each problem.

(a) The mean pull-off force of a connector depends on cure time. An experiment is per-
formed to demonstrate that the pull-off force is below 25 newtons.
H0 : µ ≥ 25
H1 : µ < 25

(b) A textile fiber manufacturer is investigating a new drapery yarn, which the company
claims has a mean thread elongation of (at least) 12 kilograms.
H0 : µ ≤ 12
H1 : µ > 12

(c) A manufacturer claims that its cars achieve an average of at least 35 miles per gallon in
highway driving. A consumer interest group tests this claim by driving a random selection
of the cars in highway conditions and measuring their fuel efficiency.
H0 : µ ≥ 35
H1 : µ < 35

(d) Suppose that a fabric is unsuitable for dyeing if its water pickup is less than 55%. Is
the cotton fabric under consideration suitable for dyeing?
H0 : µ ≤ 55%
H1 : µ > 55%

(e) The machine that produces metal cylinders is set to make cylinders with a diameter of
50 mm. Is it calibrated correctly?
Regardless of the machine setting there is always some variation in the cylinders produced,
so it makes sense to conclude that the machine is calibrated correctly if the mean cylinder
diameter µ is equal to the set amount.
H0 : µ = 50
H1 : µ ≠ 50

(f) A supplier claims that its products made from a graphite-epoxy composite material have
a tensile strength of 40. An experimenter may test this claim by collecting a random sample
of products and measuring their tensile strengths.
H0 : µ = 40
H1 : µ ≠ 40


As the reader gains more understanding of hypothesis testing, he or she should note that
the analyst arrives at one of the two following conclusions:
reject H0 in favor of H1 because of sufficient evidence in the data

fail to reject H0 because of insufficient evidence in the data.

8.1.2. Testing a Hypothesis


To illustrate the concepts used in testing a statistical hypothesis about a population, we present
the following example. A certain type of cold vaccine is known to be only 25% effective after
a period of 2 years. To determine if a new and somewhat more expensive vaccine is superior
in providing protection against the same virus for a longer period of time, suppose that 20
people are chosen at random and inoculated. (In an actual study of this type, the participants
receiving the new vaccine might number several thousand. The number 20 is being used here
only to demonstrate the basic steps in carrying out a statistical test.) If more than 8 of those
receiving the new vaccine surpass the 2-year period without contracting the virus, the new
vaccine will be considered superior to the one presently in use. The requirement that the
number exceed 8 is somewhat arbitrary but appears reasonable in that it represents a modest
gain over the 5 people who could be expected to receive protection if the 20 people had been
inoculated with the vaccine already in use. We are essentially testing the null hypothesis that
the new vaccine is equally effective after a period of 2 years as the one now commonly used.
The alternative hypothesis is that the new vaccine is in fact superior. This is equivalent to
testing the hypothesis that the binomial parameter for the probability of a success on a given
trial is p = 1/4 against the alternative that p > 1/4. This is usually written as follows:

H0 : p = 0.25
H1 : p > 0.25

Test Statistic
The test statistic on which we base our decision is X, the number of individuals in our test
group who receive protection from the new vaccine for a period of at least 2 years. The possible
values of X, from 0 to 20, are divided into two groups: those numbers less than or equal to
8 and those greater than 8. All possible scores greater than 8 constitute the critical region.
The last number that we observe in passing into the critical region is called the critical value.
In our illustration, the critical value is the number 8. Therefore, if x > 8, we reject H0 in favor
of the alternative hypothesis H1 . If x ≤ 8, we fail to reject H0 .

Decision Errors
The decision procedure just described could lead to either of two wrong conclusions. For
instance, the new vaccine may be no better than the one now in use (H0 true) and yet, in
this particular randomly selected group of individuals, more than 8 surpass the 2-year period
without contracting the virus. We would be committing an error by rejecting H0 in favor of
H1 when, in fact, H0 is true. Such an error is called a type I error.
A second kind of error is committed if 8 or fewer of the group surpass the 2-year period
successfully and we are unable to conclude that the vaccine is better when it actually is better
(H1 true). Thus, in this case, we fail to reject H0 when in fact H0 is false. This is called a
type II error.
In testing any statistical hypothesis, there are four possible situations that determine whether
our decision is correct or in error. These four situations are summarized in Table 8.1.
The probability of committing a type I error, also called the level of significance, is
denoted by α. In our illustration, a type I error will occur when more than 8 individuals


Table 8.1: Possible Situations for Testing a Statistical Hypothesis


                  H0 is true        H0 is false
Do not reject H0  Correct decision  Type II error
Reject H0         Type I error      Correct decision

inoculated with the new vaccine surpass the 2-year period without contracting the virus and
researchers conclude that the new vaccine is better when it is actually equivalent to the one in
use. Hence, if X is the number of individuals who remain free of the virus for at least 2 years,
α = P[type I error] = P[X > 8 | p = 0.25] = Σ_{x=9}^{20} C(20, x) (0.25)^x (0.75)^(20−x) = 0.0409

We say that the null hypothesis, p = 1/4, is being tested at the α = 0.0409 level of significance.
Sometimes the level of significance is called the size of the test. A critical region of size 0.0409
is very small, and therefore it is unlikely that a type I error will be committed. Consequently,
it would be most unusual for more than 8 individuals to remain immune to a virus for a 2-year
period using a new vaccine that is essentially equivalent to the one now on the market.
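In R, this significance level is again a single call to the binomial upper-tail probability; a quick sketch:

> pbinom(8, size = 20, prob = 0.25, lower.tail = FALSE)   # P[X > 8], about 0.0409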

p-value
The plausibility of a null hypothesis is measured with a p-value, which is a probability that
takes a value between 0 and 1. A p-value is sometimes referred to as the observed level of
significance. A p-value is constructed from a data set as illustrated in Figure 8.1. A useful way
of interpreting a p-value is to consider it as the plausibility or credibility of the null hypothesis.
The p-value increases with the plausibility of the null hypothesis, so that
The smaller the p-value, the less plausible is the null hypothesis.

Figure 8.1: p-value construction: the hypotheses H0 and H1 , together with the data set x1 , x2 , . . . , xn , determine the p-value, the plausibility of H0 based on the data

Figure 8.2 shows how an experimenter can interpret different levels of a p-value. If the
p-value is very small, less than 1% say, then an experimenter can conclude that the null
hypothesis is not a plausible statement. In other words, a p-value less than 0.01 indicates to
the experimenter that the null hypothesis H0 is not a credible statement. The experimenter can
then consider the alternative hypothesis H1 to be true. In such situations, the null hypothesis
is said to have been rejected in favor of the alternative hypothesis.
Figure 8.2: p-value interpretation: H0 is not plausible for p-values below 0.01 and plausible for p-values above 0.10; values between 0.01 and 0.10 form an intermediate region

If a p-value larger than 10% is obtained, then an experimenter should conclude that there is
no substantial evidence that the null hypothesis is not a plausible statement. In other words,


a p-value larger than 0.10 implies that there is no substantial evidence that the null hypothesis
H0 is false. The experimenter has learned that the null hypothesis is a credible statement
based upon the fact that there is no strong “inconsistency” between the data set and the null
hypothesis.
It is important to realize that when a p-value larger than 0.10 is obtained, the experimenter
should not conclude that the null hypothesis has been proven. If a null hypothesis is accepted,
then this simply means that the null hypothesis is a plausible statement. However, there will
be many other plausible statements and consequently many other different null hypotheses
that can also be accepted. The acceptance of a null hypothesis therefore indicates that the
data set does not provide enough evidence to reject the null hypothesis, but it does not indicate
that the null hypothesis has been proven to be true.
A p-value in the range 1%–10% is in an intermediate area. There is some evidence that
the null hypothesis is not plausible, but the evidence is not overwhelming. In a sense the
experiment is inconclusive but suggests that perhaps a further look at the problem is warranted.
If it is possible, the experimenter may wish to collect more information, that is, a larger data
set, to help clarify the matter. Sometimes a cutoff value of 0.05 is employed and the null
hypothesis is accepted if the p-value is larger than 0.05 and is rejected if the p-value is smaller
than 0.05.
With a two-sided hypothesis testing problem

H0 : µ = µ0 versus H1 : µ ≠ µ0

rejection of the null hypothesis allows the experimenter to conclude that µ 6= µ0 . Acceptance
of the null hypothesis indicates that µ0 is a plausible value of µ, together with many other
plausible values. Acceptance of the null hypothesis does not prove that µ is equal to µ0 .
With the one-sided hypothesis testing problem

H0 : µ = µ0 versus H1 : µ > µ0

rejection of the null hypothesis allows the experimenter to conclude that µ > µ0 . Acceptance
of the null hypothesis, however, indicates that it is plausible that µ ≤ µ0 , but that this has
not been proven. Consequently, it is seen that the “strongest” inference is available when the
null hypothesis is rejected.
The preceding consideration is important when an experimenter decides which should be
the null hypothesis and which should be the alternative hypothesis for one-sided problems. In
order to “prove” or establish the statement µ > µ0 it is necessary to take it as the alternative
hypothesis. It can then be established by demonstrating that its opposite µ ≤ µ0 is implausible.

A null hypothesis cannot be proven to be true; it can only be shown to be implausible.

For the problem in Example 8.1(f), the responsibility is on the experimenter to disprove the
supplier’s claim that µ = 40. That is why it is appropriate to take the null hypothesis as

H0 : µ = 40

A small p-value will demonstrate that this null hypothesis is not plausible and consequently
will establish that the supplier’s claim is not credible. If the p-value is not small, then the
experimenter must conclude that there is not enough evidence to disprove the supplier’s claim.
It may be helpful to realize that the supplier is being given the benefit of the doubt or, putting
it in legal terms, the supplier is “innocent” until proven “guilty.” In this sense, “guilt” (the
alternative hypothesis H1 : µ ≠ 40) is established by showing that the supplier’s “innocence”
(the null hypothesis) is implausible. If “innocence” is plausible (a large p-value), then the null
hypothesis is accepted and the supplier is acquitted. The important point is that the acquittal
is as a result of the failure to prove guilt, and not as a result of a proof of innocence.


Definition 8.2
The p-value is the probability of obtaining this data set or worse when the null hypothesis
is true.

A “worse” data set is one for which the null hypothesis is less plausible than it is for the
actual observed data set. The phrase when the null hypothesis is true indicates that the
probability calculation is made under the assumption that the null hypothesis is true, which
in practice means calculating a probability under the assumption that µ = µ0 .
This definition of a p-value explains the interpretation of p-values discussed. A p-value
smaller than some specified level of significance α reveals that if the null hypothesis H0 is true,
then the chance of observing the kind of data observed (or “worse”) is less than α. If the null
hypothesis is true, then it is unlikely that the experimenter obtains the kind of data set that
has been obtained. It is this argument that leads the experimenter to conclude that the null
hypothesis is implausible.
On the other hand, a p-value larger than α reveals that if the null hypothesis H0 is true,
then the chance of observing the kind of data observed is at least α. In other words, if the
null hypothesis is true, then it is not at all unlikely that the experimenter obtains the kind of
data set that has been obtained. Consequently, the null hypothesis is a plausible statement
and should be “accepted.”

8.1.3. Connection Between Hypothesis Tests and Confidence Intervals


A close relationship exists between the test of a hypothesis about any parameter, say θ, and
the confidence interval for θ. If [u, v] is a 100(1 − α)% confidence interval for the parameter θ,
the test of size α of the hypothesis
H0 : θ = θ0
H1 : θ ≠ θ0
will lead to rejection of H0 if and only if θ0 is not in the 100(1 − α)% confidence interval [u, v].
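This duality is visible in standard software output. As a rough R sketch using the pH data that appears later in Example 8.7, the two-sided p-value exceeds 0.05 exactly when the hypothesized mean lies inside the 95% confidence interval:

> x <- c(7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08)
> t.test(x, mu = 7.0)   # p = 0.106 > 0.05, and 7.0 lies inside the 95% CI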

8.1.4. General Procedure for Hypothesis Tests


This chapter develops hypothesis-testing procedures for many practical problems. Use of the
following sequence of steps in applying hypothesis-testing methodology is recommended.

6-step Hypothesis Test

A. Hypotheses: State the null and alternative hypotheses.

B. Level of significance: State the level of significance α.

C. Test statistic: Determine an appropriate test statistic.

D. Computation: Compute any necessary sample quantities, the value of the test
statistic, and the p-value.

E. Decision: Decide whether or not H0 should be rejected.

F. Conclusion: Report the decision in the problem context.

Statistical Versus Practical Significance


We noted previously that reporting the results of a hypothesis test in terms of a p-value is
very useful because it conveys more information than just the simple statement “reject H0 ”


or “fail to reject H0 .” That is, rejection of H0 at the 0.05 level of significance is much more
meaningful if the value of the test statistic is well into the critical region, greatly exceeding
the 5% critical value, than if it barely exceeds that value.
Even a very small p-value can be difficult to interpret from a practical viewpoint when
we are making decisions because, although a small p-value indicates statistical significance in
the sense that H0 should be rejected in favor of H1 , the actual departure from H0 that has
been detected may have little (if any) practical significance (engineers like to say “engineering
significance”). This is particularly true when the sample size n is large.

8.2. Test on the Mean of a Population


8.2.1. Variance Known
In this section, we consider hypothesis testing about the mean µ of a single normal population
where the variance of the population σ² is known. We will assume that a random sample
X1 , X2 , . . . , Xn has been taken from the population. Based on our previous discussion, the
sample mean X̄ is an unbiased point estimator of µ with variance σ²/n.
Suppose that we wish to test the hypotheses

H0 : µ = µ0    H1 : µ ≠ µ0

where µ0 is a specified constant. We have a random sample X1 , X2 , . . . , Xn from a normal
population. Because X̄ has a normal distribution (i.e., the sampling distribution of X̄ is
normal) with mean µ0 and standard deviation σ/√n if the null hypothesis is true, we could
calculate a p-value based on the computed value of the sample mean x̄.
It is usually more convenient to standardize the sample mean and use a test statistic based
on the standard normal distribution. That is, the test procedure for H0 : µ = µ0 uses the test
statistic:
Z = (X̄ − µ0) / (σ/√n)    (8.1)
If the null hypothesis H0 : µ = µ0 is true, E[X̄] = µ0 , and it follows that the distribution of Z
is the standard normal distribution.


The hypothesis testing procedure is as follows. Take a random sample of size n and compute
the value of the sample mean x̄. To test the null hypothesis using the p-value approach, we
would find the probability of observing a value of the sample mean that is at least as extreme
as x̄, given that the null hypothesis is true. The standard normal z-value that corresponds to
x̄ is found from the test statistic in Equation (8.1):

z = (x̄ − µ0) / (σ/√n)

The probability we are seeking is P[Z > |z|]. The reason that |z| is used is that the value of z
could be either positive or negative, depending on the observed sample mean. Because this is
a two-tailed test, this is only one-half of the p-value. Therefore, for the two-sided alternative
hypothesis, the p-value is
p-value = 2 · P[Z < −|z|] (8.2)
Now let’s consider the one-sided alternatives. Suppose that we are testing

H0 : µ = µ0 H1 : µ > µ0

Once again, suppose that we have a random sample of size n and that the sample mean is
x̄. We compute the test statistic from Equation (8.1) and obtain z. Because the test is an
upper-tailed test, only values of x̄ that are greater than µ0 are consistent with the alternative


hypothesis. Therefore, the p-value would be the probability that the standard normal random
variable is greater than the value z. This p-value is computed as

p-value = P[Z > z] (8.3)

The lower-tailed test involves the hypotheses

H0 : µ = µ0 H1 : µ < µ0

Suppose that we have a random sample of size n and that the sample mean is x̄. We compute
the test statistic from Equation (8.1) and obtain z. Because the test is a lower-tailed test, only
values of x̄ that are less than µ0 are consistent with the alternative hypothesis. Therefore, the
p-value would be the probability that the standard normal random variable is less than the
value z. This p-value is computed as

p-value = P[Z < z] (8.4)

The reference distribution for this test is the standard normal distribution. The test is
usually called a z-test.
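The three p-value formulas translate directly into R's normal c.d.f. pnorm; a brief sketch with an illustrative value of the test statistic:

> z <- 2.0                       # an illustrative value of the test statistic
> 2 * pnorm(-abs(z))             # two-sided p-value, Equation (8.2)
> pnorm(z, lower.tail = FALSE)   # upper-tailed p-value, Equation (8.3)
> pnorm(z)                       # lower-tailed p-value, Equation (8.4)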
We can also use the fixed significance level approach with the z-test. The only thing we have
to do is determine where to place the critical regions for the two-sided and one-sided alternative
hypotheses. First consider the two-sided alternative H1 : µ ≠ µ0 . Now if H0 : µ = µ0 is true,
the probability is 1 − α that the test statistic Z falls between −zα/2 and zα/2 . The regions
associated with zα/2 and −zα/2 are illustrated in Figure 8.3(a). Note that the probability is
α that the test statistic Z will fall in the region Z > zα/2 or Z < −zα/2 when H0 : µ = µ0 is true.
Clearly, a sample producing a value of the test statistic that falls in the tails of the distribution
of Z would be unusual if H0 : µ = µ0 is true; therefore, it is an indication that H0 is false.
Thus, we should reject H0 if either

z > zα/2 or z < −zα/2

and we should fail to reject H0 if


−zα/2 ≤ z ≤ zα/2
We may also develop fixed significance level testing procedures for the one-sided alternatives.
Consider the upper-tailed case H1 : µ > µ0 . In defining the critical region for this test, we
observe that a negative value of the test statistic Z would never lead us to conclude that
H0 : µ = µ0 is false. Therefore, we would place the critical region in the upper tail of the
standard normal distribution and reject H0 if the computed value z is too large. Refer to
Figure 8.3(b). That is, we would reject H0 if

z > zα

Similarly, to test the lower-tailed case H1 : µ < µ0 , we would calculate the test statistic z
and reject H0 if the value of z is too small. That is, the critical region is in the lower tail of
the standard normal distribution as in Figure 8.3(c), and we reject H0 if

z < −zα

Example 8.2
A random sample of 100 recorded deaths in the United States during the past year showed
an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years,
does this seem to indicate that the mean life span today is greater than 70 years? Use a
0.05 level of significance.


Figure 8.3: Critical regions for fixed significance level α: (a) two-tailed test, critical values ±zα/2 ; (b) upper-tailed test, critical value zα ; (c) lower-tailed test, critical value −zα

A. H0 : µ = 70
H1 : µ > 70

B. α = 0.05
C. Z = (X̄ − µ0) / (σ/√n)
D. z = (x̄ − µ0) / (σ/√n) = (71.8 − 70) / (8.9/√100) = 2.0225
p = P[Z > 2.02] = 0.021 692

E. Reject H0 since p < 0.05.

F. There is evidence that the mean life span today exceeds 70 years. The corresponding
p-value of the data set is p = 0.0217.

It is worth noting that the null hypothesis H0 cannot be rejected at a fixed level of significance
α = 0.01, since the p-value 0.0217 exceeds 0.01.
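As a numerical check, the z-statistic and p-value of Example 8.2 take two lines of R (a sketch; pnorm's unrounded answer is about 0.0216, against the 0.0217 obtained from z rounded to 2.02):

> z <- (71.8 - 70) / (8.9 / sqrt(100))
> pnorm(z, lower.tail = FALSE)   # upper-tailed p-value, about 0.0216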
Example 8.3
A manufacturer of sports equipment has developed a new synthetic fishing line that the
company claims has a mean breaking strength of 8 kilograms with a standard deviation
of 0.5 kilogram. Test the hypothesis that µ = 8 kilograms against the alternative that
µ 6= 8 kilograms if a random sample of 50 lines is tested and found to have a mean breaking
strength of 7.8 kilograms. Use a 0.01 level of significance.

A. H0 : µ = 8
H1 : µ ≠ 8

B. α = 0.01
C. Z = (X̄ − µ0) / (σ/√n)
D. z = (x̄ − µ0) / (σ/√n) = (7.8 − 8) / (0.5/√50) = −2.8284
p = 2 · P[Z < −| − 2.83|] = 2(0.002 327) = 0.004 654

E. Reject H0 since p < 0.01.

F. Based on a random sample of size 50, there is sufficient evidence that the mean
breaking strength of a new synthetic fishing line is different from 8 kg with p = 0.0047.
The data suggest that the mean breaking strength is less than 8 kg.


8.2.2. Variance Unknown


One would certainly suspect that tests on a population mean µ with σ² unknown, like confidence
interval estimation, should involve the use of the Student t-distribution. Strictly speaking,
the application of the Student t for both confidence intervals and hypothesis testing is developed
under the following assumptions. The random variables X1 , X2 , . . . , Xn represent a random
sample from a normal distribution with unknown µ and σ². Then the random variable
T = (X̄ − µ) / (S/√n)    (8.5)
has a Student t-distribution with ν = n − 1 degrees of freedom. The structure of the test is
identical to that for the case of σ known, with the exception that the value σ in the test statistic
is replaced by the computed estimate s and the standard normal distribution is replaced by a
t-distribution.
The reader should recall from the previous chapter that the t-distribution is symmetric
around the value zero. Thus, this two-tailed critical region applies in a fashion similar to
that for the case of known σ. For the two-sided alternative hypothesis at significance level α,
the two-tailed critical regions apply. For H1 : µ > µ0 , rejection results when t > tα,ν . For
H1 : µ < µ0 , the critical region is given by t < −tα,ν .
Example 8.4
A supplier claims that its products made from a graphite-epoxy composite material have
a tensile strength of 40. When the tensile strengths of 30 randomly selected products are
measured, the mean x̄ = 39.018 and standard deviation s = 2.299 are obtained. Test the
hypothesis against H1 : µ < 40 at the 5% level of significance.

A. H0 : µ = 40
H1 : µ < 40

B. α = 0.05
C. T(ν) = (X̄ − µ0) / (S/√n) ,   ν = n − 1
D. t = (x̄ − µ0) / (s/√n) = (39.018 − 40) / (2.299/√30) = −2.33955
k = 2√29 ∫₀^{π/2} cos^28 x dx = 2.528 326 236 (see Equation (D-13) of the appendix)
p = P[T(29) < −2.33955] = 0.5 − (1/k) ∫_{−2.33955}^{0} (1 + x²/29)^{−15} dx = 0.013 201

E. Reject H0 since p < 0.05.

F. The sample shows that the mean tensile strength is not 40 as claimed by the supplier.
The mean tensile strength is less than 40, with a p-value of p = 0.013201.

Using the fixed significance level test, the critical region for the rejection of the null hypoth-
esis is t < −t0.05,29 where t0.05,29 = 1.699 from Table A-2. Since the observed t value is less
than −1.699, the decision is to reject H0 in favor of H1 .
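R reproduces both the p-value and the fixed-level critical value without the integral bookkeeping; a two-line sketch:

> pt(-2.33955, df = 29)   # lower-tailed p-value, about 0.0132
> qt(0.05, df = 29)       # critical value -t_0.05,29, about -1.699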
Example 8.5
A growing concern of employers is time spent in activities like surfing the Internet and e-
mailing friends during work hours. The San Luis Obispo Tribune summarized the findings
from a survey of a large sample of workers in an article that ran under the headline “Who
Goofs Off 2 Hours a Day? Most Workers, Survey Says” (August 3, 2006). Suppose that
the company wants to determine whether the average amount of wasted time during an
8-hour work day for its employees is less than the reported 120 minutes. Each person in
a random sample of 10 employees was contacted and asked about daily wasted time at
work. (Participants would probably have to be guaranteed anonymity to obtain truthful
responses!) The resulting data are the following:

108 112 117 130 111 131 113 113 105 128

Do these data provide evidence that the mean wasted time for this company is less than
120 minutes? Carry out a hypothesis test with α = 0.05.

A. H0 : µ = 120
H1 : µ < 120

B. α = 0.05
C. T(ν) = (X̄ − µ0) / (S/√n)
D. x̄ = 116.8, s = 9.45
t = (x̄ − µ0) / (s/√n) = −1.0709
k = 2√9 ∫₀^{π/2} cos^8 x dx = 2.577 087 724
p = P[T(9) < −1.0709] = 0.5 − (1/k) ∫_{−1.0709}^{0} (1 + x²/9)^{−5} dx = 0.156 046

E. Because p-value > 0.05, we do not reject H0 .

F. There is not sufficient evidence to conclude that the mean wasted time per 8-hour
work day for employees at this company is less than 120 minutes.

Computer Printout for t-test


It should be of interest for the reader to see an annotated computer printout showing the
result of a single-sample t-test. The R software is used to perform a t-test for the data set in
Example 8.5. The printout is shown in Figure 8.4. The mean x̄ is 116.8, the
computed value of the test statistic is t = −1.0709 and the p-value is p = 0.156 at 9 degrees
of freedom. The output also gives the 95% one-sided confidence interval with an upper bound
122.2776.
> wasted.time <- c(108,112,117,130,111,131,113,113,105,128)
> t.test(wasted.time, mu=120, alternative="less", conf.level=0.95)

One Sample t-test

data: wasted.time
t = -1.0709, df = 9, p-value = 0.156
alternative hypothesis: true mean is less than 120
95 percent confidence interval:
-Inf 122.2776
sample estimates:
mean of x
116.8

Figure 8.4: R Software printout for a one-sample t-test


Example 8.6
The sodium content of twenty 300-gram boxes of organic cornflakes was determined. The
data (in milligrams) are as follows: 131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72,
128.33, 128.24, 129.65, 130.14, 129.29, 128.71, 129.00, 129.39, 130.42, 129.53, 130.12, 129.78,
130.92. Can you support a claim that the mean sodium content of this brand of cornflakes differs
from 130 milligrams at the α = 0.05 level of significance? Refer to the printout in Figure 8.5.
The mean sodium content of the twenty 300-gram boxes of organic flakes is 129.747. The
value of the test statistic is t = −1.291, and a p-value p = 0.2122 at ν = 19 degrees of
freedom. Based on these results, there is insufficient evidence to support a claim that the
mean sodium content differs from 130 milligrams.

> t.test(sodium,mu=130,alternative="two",conf.level=0.95)

One Sample t-test

data: sodium
t = -1.291, df = 19, p-value = 0.2122
alternative hypothesis: true mean is not equal to 130
95 percent confidence interval:
129.3368 130.1572
sample estimates:
mean of x
129.747

Figure 8.5: Printout for Example 8.6.

One-Sample Statistics
N Mean Std. Deviation S.E. Mean
pH 10 7.07 .04 .01

One-Sample Test
Test Value = 7
95% Confidence Interval
Sig. (2- Mean of the Difference
t df tailed) Difference Lower Upper
pH 1.80 9 .106 .03 -.01 .06

Figure 8.6: PSPP printout for a one-sample t-test

Example 8.7
Suppose that an engineer is interested in testing the bias in a pH meter. Data are collected
on a neutral substance (pH = 7.0). A sample of the measurements were taken with the data
as follows:

7.07 7.00 7.10 6.97 7.00 7.03 7.01 7.01 6.98 7.08

It is, then, of interest to test

H0 : µ = 7.0
H1 : µ ≠ 7.0

Interpret the output of the software package PSPP shown in Figure 8.6.


The data set yields a mean x̄ = 7.07 and a standard deviation s = 0.04. The test statistic
t = 1.80 yields a p-value p = 0.106, which suggests that the results are inconclusive. There is no
strong evidence suggesting a rejection of H0 (based on an α value of 0.05 or 0.10). We certainly
cannot truly conclude that the pH meter is unbiased.

8.3. Test on the Population Proportion


An observation x from a random variable X with a binomial distribution can be used to test
a hypothesis concerning the success probability p. A two-sided hypothesis testing problem
would be
H0 : p = p0    H1 : p ≠ p0
for a particular fixed value p0 . This is appropriate if an experimenter wishes to determine
whether there is significant evidence that the success probability is different from p0 . One-
sided sets of hypotheses
H0 : p = p0 H1 : p < p0
and
H0 : p = p0 H1 : p > p0
can also be used.
The p-values for these hypothesis tests can be calculated using the cumulative distribution
function of the binomial distribution, which for reasonably large values of n can be approxi-
mated by a normal distribution. If the normal approximation is employed, then
Z = (P̂ − p0) / √( P̂(1 − P̂)/n )

is taken to have approximately a standard normal distribution, so that if p = p0 , the “z-statistic”

(P̂ − p0) / √( p0(1 − p0)/n )
can be taken to be an observation from a standard normal distribution. Notice that the
hypothesized value p0 is used inside the square root term of the expression, and that when the
top and the bottom of the expression are multiplied by n, the z-statistic can be rewritten as
Z = (X − np0) / √( np0(1 − p0) )    (8.6)
The exact p-value for the two-sided hypothesis testing problem is usually calculated as

p-value = 2 · P[X ≥ x | p = p0 ] ,   x ≥ np0
p-value = 2 · P[X ≤ x | p = p0 ] ,   x ≤ np0

where the random variable X has a binomial distribution. Notice that under the null hypothesis
H0 , the expected value of the number of successes is np0 . Consequently, “worse” in the
definition of the p-value means values of the random variable X farther away from np0 than
is observed. These are values larger than x when x > np0 (p̂ > p0 ), and values smaller than x
when x < np0 (p̂ < p0 ). The tail probabilities of the binomial distribution

P[X ≥ x] and P[X ≤ x]

are then multiplied by 2 since it is a two-sided problem with the alternative hypothesis
H1 : p ≠ p0 allowing values of p both smaller and larger than p0 . Of course if p̂ = p0 , then the


p-value can be taken to be equal to 1, and there is clearly no evidence that the null hypothesis
is not plausible.
When the normal approximation to the distribution of p̂ is appropriate (in this case it can be
considered to be acceptable as long as np0 and n(1 − p0 ) are both larger than 5), the statistic
z = (p̂ − p0) / √( p0(1 − p0)/n ) = (x − np0) / √( np0(1 − p0) )
is calculated, and the p-value is
p-value = 2 · P[Z > z] ,   z > 0
p-value = 2 · P[Z < z] ,   z < 0
where the random variable Z has the standard normal distribution. In either case, the p-value
can be written as
p-value = 2 · P[Z < −|z|]
The normal approximation can be improved by employing a continuity correction of 0.5 in
the numerator of the z-statistic. If x − np0 > 0.5, the z-statistic

    z = (x − np0 − 0.5) / √(np0(1 − p0))

can be used, and if x − np0 < −0.5, the z-statistic

    z = (x − np0 + 0.5) / √(np0(1 − p0))
can be used. Notice that the continuity correction serves to bring the value of the z-statistic
closer to 0. The effect of employing the continuity correction becomes less important as the
sample size n gets larger.
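These formulas are straightforward to implement. The sketch below (the helper name
z_prop_cc is ours, not a library routine; SciPy is assumed for the normal cdf) computes the
continuity-corrected z-statistic and its two-sided p-value for hypothetical values of x, n, p0:

    from math import sqrt
    from scipy.stats import norm

    def z_prop_cc(x, n, p0):
        # Continuity-corrected z-statistic for testing H0: p = p0
        num = x - n * p0
        if num > 0.5:
            num -= 0.5
        elif num < -0.5:
            num += 0.5
        else:
            return 0.0  # x is within 0.5 of n*p0: no evidence against H0
        return num / sqrt(n * p0 * (1 - p0))

    z = z_prop_cc(x=53, n=100, p0=0.5)    # hypothetical data
    p_value = 2 * norm.cdf(-abs(z))       # two-sided p-value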
Example 8.8
A biologist is interested in whether opossums give birth to male and female progeny with
equal probabilities. A group of opossums is observed, and out of 23 births, 14 are male and
9 are female. Test the hypothesis at 5% level of significance.

A. H0 : p = 0.5
H1 : p 6= 0.5

B. α = 0.05

C. X, the number of male offspring

D. p = 2 · P[X ≥ 14] = 2 \sum_{x=14}^{23} \binom{23}{x} (0.5)^x (0.5)^{23−x} = 0.4049, since x > np0

E. Do not reject H0 .

F. With such large p-value p = 0.4049 there is no reason to doubt the validity of the null
hypothesis. Based on this data set the biologist realizes that there is not sufficient
evidence to conclude that male and female births are not equally likely.
Since np0 = n(1 − p0) = 23(0.5) = 11.5 > 5, a normal approximation to the distribution of
X should be reasonable. The value of the z-statistic with continuity correction is

    z = (x − np0 − 0.5) / √(np0(1 − p0)) = (14 − 23(0.5) − 0.5) / √(23(0.5)(0.5)) = 0.8341
which gives a p-value of
p = 2 · P[Z < −0.83] = 0.406 539
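Both calculations in this example can be checked in Python (a sketch; SciPy ≥ 1.7 provides
binomtest, and for p0 = 0.5 its two-sided convention coincides with doubling the tail
probability used in step D):

    from scipy import stats

    exact = stats.binomtest(k=14, n=23, p=0.5, alternative='two-sided')
    print(exact.pvalue)                    # about 0.405

    z = (14 - 23 * 0.5 - 0.5) / (23 * 0.5 * 0.5) ** 0.5
    print(2 * stats.norm.cdf(-abs(z)))     # about 0.404 (0.4065 with z rounded to 0.83)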


Example 8.9
A commonly prescribed drug for relieving nervous tension is believed to be only 60% effec-
tive. Experimental results with a new drug administered to a random sample of 100 adults
who were suffering from nervous tension show that 70 received relief. Is this sufficient ev-
idence to conclude that the new drug is superior to the one commonly prescribed? Use a
0.05 level of significance.

A. H0 : p = 0.60
H1 : p > 0.60

B. α = 0.05
C. Z = (X − np0) / √(np0(1 − p0))

D. z = (x − np0) / √(np0(1 − p0)) = (70 − 100(0.6)) / √(100(0.6)(0.4)) = 2.0412
   p = P[Z > 2.04] = P[Z < −2.04] = 0.020675

E. Reject H0

F. The sample provides sufficient evidence that the new drug is superior to the one
commonly prescribed, with a p-value p = 0.0207.
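A quick cross-check of steps C–D (a sketch assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    z = (70 - 100 * 0.6) / sqrt(100 * 0.6 * 0.4)   # 2.0412
    print(norm.sf(z))                               # P[Z > z], about 0.0206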

8.4. Test on the Variance of a Population

In this section, we are concerned with testing hypotheses concerning population variances or
standard deviations. Applications of one-sample tests on variances are certainly not difficult
to motivate. Engineers and scientists are confronted with studies in which they are required
to demonstrate that measurements involving products or processes adhere to specifications set
by consumers. The specifications are often met if the process variance is sufficiently small.
Attention is also focused on comparative experiments between methods or processes, where
inherent reproducibility or variability must formally be compared.
Let us first consider the problem of testing the null hypothesis H0 that the population
variance σ² equals a specified value σ0² against one of the usual alternatives σ² < σ0², σ² > σ0²,
or σ² ≠ σ0². The appropriate statistic on which to base our decision is the chi-squared statistic
of Equation (6.7), which was used in Chapter 7 to construct a confidence interval for σ².
Therefore, if we assume that the distribution of the population being sampled is normal, the
chi-squared value for testing σ² = σ0² is given by

    x² = (n − 1)s² / σ0²                                                            (8.7)

where n is the sample size, s² is the sample variance, and σ0² is the value of σ² given by the
null hypothesis. If H0 is true, x² is a value of the chi-squared distribution with ν = n − 1
degrees of freedom. Hence, for a two-tailed test at the α level of significance, the critical region
is χ² < χ²_{1−α/2,ν} or χ² > χ²_{α/2,ν}. For the one-sided alternative σ² < σ0², the critical region is
χ² < χ²_{1−α,ν}, and for the one-sided alternative σ² > σ0², the critical region is χ² > χ²_{α,ν}.
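The statistic of Equation (8.7) and its p-value are easy to script. The following sketch (the
function name chisq_variance_test is ours; SciPy is assumed) covers the two-sided and both
one-sided alternatives:

    from scipy.stats import chi2

    def chisq_variance_test(n, s2, sigma0_sq, alternative='greater'):
        # Chi-squared statistic of Equation (8.7) for H0: sigma^2 = sigma0^2
        x2 = (n - 1) * s2 / sigma0_sq
        nu = n - 1
        if alternative == 'greater':
            p = chi2.sf(x2, nu)                  # upper tail
        elif alternative == 'less':
            p = chi2.cdf(x2, nu)                 # lower tail
        else:
            p = 2 * min(chi2.sf(x2, nu), chi2.cdf(x2, nu))  # two-sided
        return x2, p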
Example 8.10
An automated filling machine is used to fill bottles with liquid detergent. A random sample
of 20 bottles results in a sample variance of fill volume of s2 = 0.0153 (fluid ounces)2 . If the
variance of fill volume exceeds 0.01 (fluid ounces)2 , an unacceptable proportion of bottles
will be underfilled or overfilled. Is there evidence in the sample data to suggest that the


manufacturer has a problem with underfilled or overfilled bottles? Use α = 0.05, and assume
that fill volume has a normal distribution.

A. H0 : σ 2 = 0.01
H1 : σ 2 > 0.01

B. α = 0.05
C. χ²(ν) = (n − 1)S² / σ0²

D. x² = (n − 1)s² / σ0² = 19(0.0153) / 0.01 = 29.07

   k = (18!/9!) · √(π/2¹⁷) = 86 376 969.03ᵃ

   p = P[χ²(ν) > x²] = 1 − (1/k) ∫₀^{29.07} √(x¹⁷) e^{−x/2} dx = 0.064892
E. Do not reject H0 .

F. With a p-value p = 0.065, there is no strong evidence that the variance of fill vol-
ume exceeds 0.01 (fluid ounces)2 . So there is no strong evidence of a problem with
incorrectly filled bottles.
ᵃ see Equation (D-15) of the appendix

The critical region for the rejection of H0 is χ2 > 30.14 obtained from Table A-3.
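As a numerical cross-check of this example (a sketch assuming SciPy):

    from scipy.stats import chi2

    x2 = 19 * 0.0153 / 0.01        # Equation (8.7): 29.07
    print(chi2.sf(x2, df=19))      # p-value, about 0.0649
    print(chi2.isf(0.05, df=19))   # critical value, about 30.14 (Table A-3)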

Exercises
8-1. For the alternative hypothesis H1 : µ ≠ 7 and variance known, calculate the p-value for
each of the following test statistics.
a) z = 2.05   b) z = −1.84   c) z = 0.4

8-2. For the alternative hypothesis H1 : µ > 10 and variance known, calculate the p-value for
each of the following test statistics.
a) z = 2.05   b) z = −1.84   c) z = 0.4

8-3. Output from a software package follows:

    One-sample Z
    Test of µ = 35 vs µ ≠ 35
    The assumed standard deviation = 8
    Variable    N    Mean     StDev   SE Mean   z   p
    x          25   35.710    1.475      ?      ?   ?

a) Fill in the missing items. What conclusions would you draw?
b) Is this a one-sided or a two-sided test?
c) What would the p-value be if the alternative hypothesis is H1 : µ > 35?

8-4. The mean water temperature downstream from a discharge pipe at a power plant cooling
tower should be no more than 100°F. Past experience has indicated that the standard deviation
of temperature is 2°F. The water temperature is measured on nine randomly chosen days, and
the average temperature is found to be 98°F. Perform the six-step hypothesis test procedure
at α = 0.05.

8-5. A manufacturer produces crankshafts for an automobile engine. The crankshaft wear
after 100,000 miles (in 0.0001 inch) is of interest because it is likely to have an impact on
warranty claims. A random sample of n = 15 shafts is tested and x̄ = 2.78. It is known that
σ = 0.9 and that wear is normally distributed. Test the hypotheses H0 : µ = 3 and H1 : µ ≠ 3
using α = 0.05.

8-6. The life in hours of a battery is known to be approximately normally distributed with
standard deviation σ = 1.25 hours. A random sample of 10 batteries has a mean life of
x̄ = 40.5 hours. Is there evidence to support the claim that battery life exceeds 40 hours?
Use α = 0.05.

8-7. Consider the following computer output:

    Test and CI for One Proportion
    Test of p = 0.4 vs p ≠ 0.4
     X     N    p̂         95% CI             z   p
    98   275    ?   (0.299759, 0.412968)     ?   ?

a) Is this a one-sided or a two-sided test?
b) Complete the missing items.

8-8. Suppose that 500 parts are tested in manufacturing and 10 are rejected. Test the
hypothesis H0 : p = 0.03 against H1 : p < 0.03 at α = 0.05. Find the p-value.

8-9. The advertised claim for batteries for cell phones is set at 48 operating hours with proper
charging procedures. A study of 5000 batteries is carried out and 15 stop operating prior to
48 hours. Do these experimental results support the claim that less than 0.2 percent of the
company’s batteries will fail during the advertised time period, with proper charging
procedures? Use a hypothesis-testing procedure with α = 0.01.

8-10. For the hypothesis test H0 : µ = 10 against H1 : µ > 10 with variance unknown and
n = 15, compute the p-value for each of the following test statistics.
a) t = 2.05   b) t = −1.84   c) t = 0.4

8-11. Consider the following computer output:

    One-sample T
    Test of µ = 12 vs µ ≠ 12
    Variable    N    Mean     StDev   SE Mean   t   p
    x          10   12.564      ?      0.296    ?   ?

a) How many degrees of freedom are there on the t-test statistic?
b) Fill in the missing items.
c) Calculate the p-value. What conclusions would you draw?

8-12. An article in the ASCE Journal of Energy Engineering (1999, Vol. 125, pp. 59-75)
describes a study of the thermal inertia properties of autoclaved aerated concrete used as a
building material. Five samples of the material were tested in a structure, and the average
interior temperatures (°C) reported were as follows: 23.01, 22.22, 22.04, 22.62, and 22.59.
Test the hypotheses H0 : µ = 22.5 versus H1 : µ ≠ 22.5, using α = 0.05. Find the p-value.

8-13. Cloud seeding has been studied for many decades as a weather modification procedure.
The rainfall in acre-feet from 20 clouds that were selected at random and seeded with silver
nitrate follows: 18.0, 30.7, 19.8, 27.1, 22.3, 18.8, 31.8, 23.4, 21.2, 27.9, 31.9, 27.1, 25.0, 24.7,
26.9, 21.8, 29.2, 34.8, 26.7, and 31.6. Can you support a claim that mean rainfall from seeded
clouds exceeds 25 acre-feet? Use α = 0.01. Find the p-value.

8-14. The increased availability of light materials with high strength has revolutionized the
design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and very
thin faces can result in much longer tee shots, especially for players of modest skills. This is
due partly to the “spring-like effect” that the thin face imparts to the ball. Firing a golf ball
at the head of the club and measuring the ratio of the ball’s outgoing velocity to the incoming
velocity can quantify this spring-like effect. The ratio of velocities is called the coefficient of
restitution of the club. An experiment was performed in which 15 drivers produced by a
particular club maker were selected at random and their coefficients of restitution measured.
In the experiment, the golf balls were fired from an air cannon so that the incoming velocity
and spin rate of the ball could be precisely controlled. It is of interest to determine whether
there is evidence (with α = 0.05) to support a claim that the mean coefficient of restitution
exceeds 0.82. The observations follow:

    0.8411   0.8191   0.8182   0.8125   0.8750
    0.8580   0.8532   0.8483   0.8276   0.7983
    0.8042   0.8730   0.8282   0.8359   0.8660

What can you conclude based on the results of the test?

8-15. Consider the test of H0 : σ² = 7 against H1 : σ² ≠ 7. What are the critical values for
the test statistic χ² for the following significance levels and sample sizes?
a) α = 0.01 and n = 20   b) α = 0.05 and n = 12   c) α = 0.10 and n = 15

8-16. Consider the test of H0 : σ² = 5 against H1 : σ² < 5. Calculate the p-value for each of
the following test statistics.
a) x² = 25.2 and n = 20   b) x² = 15.2 and n = 12   c) x² = 4.2 and n = 15

8-17. If the standard deviation of hole diameter exceeds 0.01 millimeters, there is an
unacceptably high probability that the rivet will not fit. Suppose that n = 15 and s = 0.008
millimeter. Is there strong evidence to indicate that the standard deviation of hole diameter
exceeds 0.01 millimeter? Use α = 0.01. State any necessary assumptions about the underlying
distribution of the data. Find the p-value for this test.
A. Statistical Tables
A-1. Cumulative Standard Normal Distribution

Table entries are the cumulative probabilities P[Z < z0 ].

z0 -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 -0.00
-3.9 0.000033 0.000034 0.000036 0.000037 0.000039 0.000041 0.000042 0.000044 0.000046 0.000048
-3.8 0.000050 0.000052 0.000054 0.000057 0.000059 0.000062 0.000064 0.000067 0.000069 0.000072
-3.7 0.000075 0.000078 0.000082 0.000085 0.000088 0.000092 0.000096 0.000100 0.000104 0.000108
-3.6 0.000112 0.000117 0.000121 0.000126 0.000131 0.000136 0.000142 0.000147 0.000153 0.000159
-3.5 0.000165 0.000172 0.000178 0.000185 0.000193 0.000200 0.000208 0.000216 0.000224 0.000233
-3.4 0.000242 0.000251 0.000260 0.000270 0.000280 0.000291 0.000302 0.000313 0.000325 0.000337
-3.3 0.000349 0.000362 0.000376 0.000390 0.000404 0.000419 0.000434 0.000450 0.000466 0.000483
-3.2 0.000501 0.000519 0.000538 0.000557 0.000577 0.000598 0.000619 0.000641 0.000664 0.000687
-3.1 0.000711 0.000736 0.000762 0.000789 0.000816 0.000845 0.000874 0.000904 0.000935 0.000968
-3.0 0.001001 0.001035 0.001070 0.001107 0.001144 0.001183 0.001223 0.001264 0.001306 0.001350
-2.9 0.001395 0.001441 0.001489 0.001538 0.001589 0.001641 0.001695 0.001750 0.001807 0.001866
-2.8 0.001926 0.001988 0.002052 0.002118 0.002186 0.002256 0.002327 0.002401 0.002477 0.002555
-2.7 0.002635 0.002718 0.002803 0.002890 0.002980 0.003072 0.003167 0.003264 0.003364 0.003467
-2.6 0.003573 0.003681 0.003793 0.003907 0.004025 0.004145 0.004269 0.004396 0.004527 0.004661
-2.5 0.004799 0.004940 0.005085 0.005234 0.005386 0.005543 0.005703 0.005868 0.006037 0.006210
-2.4 0.006387 0.006569 0.006756 0.006947 0.007143 0.007344 0.007549 0.007760 0.007976 0.008198
-2.3 0.008424 0.008656 0.008894 0.009137 0.009387 0.009642 0.009903 0.010170 0.010444 0.010724
-2.2 0.011011 0.011304 0.011604 0.011911 0.012224 0.012545 0.012874 0.013209 0.013553 0.013903
-2.1 0.014262 0.014629 0.015003 0.015386 0.015778 0.016177 0.016586 0.017003 0.017429 0.017864
-2.0 0.018309 0.018763 0.019226 0.019699 0.020182 0.020675 0.021178 0.021692 0.022216 0.022750
-1.9 0.023295 0.023852 0.024419 0.024998 0.025588 0.026190 0.026803 0.027429 0.028067 0.028717
-1.8 0.029379 0.030054 0.030742 0.031443 0.032157 0.032884 0.033625 0.034380 0.035148 0.035930
-1.7 0.036727 0.037538 0.038364 0.039204 0.040059 0.040930 0.041815 0.042716 0.043633 0.044565
-1.6 0.045514 0.046479 0.047460 0.048457 0.049471 0.050503 0.051551 0.052616 0.053699 0.054799
-1.5 0.055917 0.057053 0.058208 0.059380 0.060571 0.061780 0.063008 0.064255 0.065522 0.066807
-1.4 0.068112 0.069437 0.070781 0.072145 0.073529 0.074934 0.076359 0.077804 0.079270 0.080757
-1.3 0.082264 0.083793 0.085343 0.086915 0.088508 0.090123 0.091759 0.093418 0.095098 0.096800
-1.2 0.098525 0.100273 0.102042 0.103835 0.105650 0.107488 0.109349 0.111232 0.113139 0.115070
-1.1 0.117023 0.119000 0.121000 0.123024 0.125072 0.127143 0.129238 0.131357 0.133500 0.135666
-1.0 0.137857 0.140071 0.142310 0.144572 0.146859 0.149170 0.151505 0.153864 0.156248 0.158655
-0.9 0.161087 0.163543 0.166023 0.168528 0.171056 0.173609 0.176186 0.178786 0.181411 0.184060
-0.8 0.186733 0.189430 0.192150 0.194895 0.197663 0.200454 0.203269 0.206108 0.208970 0.211855
-0.7 0.214764 0.217695 0.220650 0.223627 0.226627 0.229650 0.232695 0.235762 0.238852 0.241964
-0.6 0.245097 0.248252 0.251429 0.254627 0.257846 0.261086 0.264347 0.267629 0.270931 0.274253
-0.5 0.277595 0.280957 0.284339 0.287740 0.291160 0.294599 0.298056 0.301532 0.305026 0.308538
-0.4 0.312067 0.315614 0.319178 0.322758 0.326355 0.329969 0.333598 0.337243 0.340903 0.344578
-0.3 0.348268 0.351973 0.355691 0.359424 0.363169 0.366928 0.370700 0.374484 0.378280 0.382089
-0.2 0.385908 0.389739 0.393580 0.397432 0.401294 0.405165 0.409046 0.412936 0.416834 0.420740
-0.1 0.424655 0.428576 0.432505 0.436441 0.440382 0.444330 0.448283 0.452242 0.456205 0.460172
-0.0 0.464144 0.468119 0.472097 0.476078 0.480061 0.484047 0.488034 0.492022 0.496011 0.500000

A-1. Cumulative Standard Normal Distribution (continued)

z0 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.500000 0.503989 0.507978 0.511966 0.515953 0.519939 0.523922 0.527903 0.531881 0.535856
0.1 0.539828 0.543795 0.547758 0.551717 0.555670 0.559618 0.563559 0.567495 0.571424 0.575345
0.2 0.579260 0.583166 0.587064 0.590954 0.594835 0.598706 0.602568 0.606420 0.610261 0.614092
0.3 0.617911 0.621720 0.625516 0.629300 0.633072 0.636831 0.640576 0.644309 0.648027 0.651732
0.4 0.655422 0.659097 0.662757 0.666402 0.670031 0.673645 0.677242 0.680822 0.684386 0.687933
0.5 0.691462 0.694974 0.698468 0.701944 0.705401 0.708840 0.712260 0.715661 0.719043 0.722405
0.6 0.725747 0.729069 0.732371 0.735653 0.738914 0.742154 0.745373 0.748571 0.751748 0.754903
0.7 0.758036 0.761148 0.764238 0.767305 0.770350 0.773373 0.776373 0.779350 0.782305 0.785236
0.8 0.788145 0.791030 0.793892 0.796731 0.799546 0.802337 0.805105 0.807850 0.810570 0.813267
0.9 0.815940 0.818589 0.821214 0.823814 0.826391 0.828944 0.831472 0.833977 0.836457 0.838913
1.0 0.841345 0.843752 0.846136 0.848495 0.850830 0.853141 0.855428 0.857690 0.859929 0.862143
1.1 0.864334 0.866500 0.868643 0.870762 0.872857 0.874928 0.876976 0.879000 0.881000 0.882977
1.2 0.884930 0.886861 0.888768 0.890651 0.892512 0.894350 0.896165 0.897958 0.899727 0.901475
1.3 0.903200 0.904902 0.906582 0.908241 0.909877 0.911492 0.913085 0.914657 0.916207 0.917736
1.4 0.919243 0.920730 0.922196 0.923641 0.925066 0.926471 0.927855 0.929219 0.930563 0.931888
1.5 0.933193 0.934478 0.935745 0.936992 0.938220 0.939429 0.940620 0.941792 0.942947 0.944083
1.6 0.945201 0.946301 0.947384 0.948449 0.949497 0.950529 0.951543 0.952540 0.953521 0.954486
1.7 0.955435 0.956367 0.957284 0.958185 0.959070 0.959941 0.960796 0.961636 0.962462 0.963273
1.8 0.964070 0.964852 0.965620 0.966375 0.967116 0.967843 0.968557 0.969258 0.969946 0.970621
1.9 0.971283 0.971933 0.972571 0.973197 0.973810 0.974412 0.975002 0.975581 0.976148 0.976705
2.0 0.977250 0.977784 0.978308 0.978822 0.979325 0.979818 0.980301 0.980774 0.981237 0.981691
2.1 0.982136 0.982571 0.982997 0.983414 0.983823 0.984222 0.984614 0.984997 0.985371 0.985738
2.2 0.986097 0.986447 0.986791 0.987126 0.987455 0.987776 0.988089 0.988396 0.988696 0.988989
2.3 0.989276 0.989556 0.989830 0.990097 0.990358 0.990613 0.990863 0.991106 0.991344 0.991576
2.4 0.991802 0.992024 0.992240 0.992451 0.992656 0.992857 0.993053 0.993244 0.993431 0.993613
2.5 0.993790 0.993963 0.994132 0.994297 0.994457 0.994614 0.994766 0.994915 0.995060 0.995201
2.6 0.995339 0.995473 0.995604 0.995731 0.995855 0.995975 0.996093 0.996207 0.996319 0.996427
2.7 0.996533 0.996636 0.996736 0.996833 0.996928 0.997020 0.997110 0.997197 0.997282 0.997365
2.8 0.997445 0.997523 0.997599 0.997673 0.997744 0.997814 0.997882 0.997948 0.998012 0.998074
2.9 0.998134 0.998193 0.998250 0.998305 0.998359 0.998411 0.998462 0.998511 0.998559 0.998605
3.0 0.998650 0.998694 0.998736 0.998777 0.998817 0.998856 0.998893 0.998930 0.998965 0.998999
3.1 0.999032 0.999065 0.999096 0.999126 0.999155 0.999184 0.999211 0.999238 0.999264 0.999289
3.2 0.999313 0.999336 0.999359 0.999381 0.999402 0.999423 0.999443 0.999462 0.999481 0.999499
3.3 0.999517 0.999534 0.999550 0.999566 0.999581 0.999596 0.999610 0.999624 0.999638 0.999651
3.4 0.999663 0.999675 0.999687 0.999698 0.999709 0.999720 0.999730 0.999740 0.999749 0.999758
3.5 0.999767 0.999776 0.999784 0.999792 0.999800 0.999807 0.999815 0.999822 0.999828 0.999835
3.6 0.999841 0.999847 0.999853 0.999858 0.999864 0.999869 0.999874 0.999879 0.999883 0.999888
3.7 0.999892 0.999896 0.999900 0.999904 0.999908 0.999912 0.999915 0.999918 0.999922 0.999925
3.8 0.999928 0.999931 0.999933 0.999936 0.999938 0.999941 0.999943 0.999946 0.999948 0.999950
3.9 0.999952 0.999954 0.999956 0.999958 0.999959 0.999961 0.999963 0.999964 0.999966 0.999967

A-2. Percentage Points tα,ν of the t Distribution

Table entries are the critical values tα,ν for which P[T > tα,ν ] = α.


α
ν 0.1 0.05 0.025 0.01 0.005 0.0025 0.0005
1 3.078 6.314 12.706 31.821 63.657 127.321 636.619
2 1.886 2.920 4.303 6.965 9.925 14.089 31.599
3 1.638 2.353 3.182 4.541 5.841 7.453 12.924
4 1.533 2.132 2.776 3.747 4.604 5.598 8.610
5 1.476 2.015 2.571 3.365 4.032 4.773 6.869
6 1.440 1.943 2.447 3.143 3.707 4.317 5.959
7 1.415 1.895 2.365 2.998 3.499 4.029 5.408
8 1.397 1.860 2.306 2.896 3.355 3.833 5.041
9 1.383 1.833 2.262 2.821 3.250 3.690 4.781
10 1.372 1.812 2.228 2.764 3.169 3.581 4.587
11 1.363 1.796 2.201 2.718 3.106 3.497 4.437
12 1.356 1.782 2.179 2.681 3.055 3.428 4.318
13 1.350 1.771 2.160 2.650 3.012 3.372 4.221
14 1.345 1.761 2.145 2.624 2.977 3.326 4.140
15 1.341 1.753 2.131 2.602 2.947 3.286 4.073
16 1.337 1.746 2.120 2.583 2.921 3.252 4.015
17 1.333 1.740 2.110 2.567 2.898 3.222 3.965
18 1.330 1.734 2.101 2.552 2.878 3.197 3.922
19 1.328 1.729 2.093 2.539 2.861 3.174 3.883
20 1.325 1.725 2.086 2.528 2.845 3.153 3.850
21 1.323 1.721 2.080 2.518 2.831 3.135 3.819
22 1.321 1.717 2.074 2.508 2.819 3.119 3.792
23 1.319 1.714 2.069 2.500 2.807 3.104 3.768
24 1.318 1.711 2.064 2.492 2.797 3.091 3.745
25 1.316 1.708 2.060 2.485 2.787 3.078 3.725
26 1.315 1.706 2.056 2.479 2.779 3.067 3.707
27 1.314 1.703 2.052 2.473 2.771 3.057 3.690
28 1.313 1.701 2.048 2.467 2.763 3.047 3.674
29 1.311 1.699 2.045 2.462 2.756 3.038 3.659
30 1.310 1.697 2.042 2.457 2.750 3.030 3.646
35 1.306 1.690 2.030 2.438 2.724 2.996 3.591
40 1.303 1.684 2.021 2.423 2.704 2.971 3.551
50 1.299 1.676 2.009 2.403 2.678 2.937 3.496
60 1.296 1.671 2.000 2.390 2.660 2.915 3.460
80 1.292 1.664 1.990 2.374 2.639 2.887 3.416
100 1.290 1.660 1.984 2.364 2.626 2.871 3.390
120 1.289 1.658 1.980 2.358 2.617 2.860 3.373
∞ 1.282 1.645 1.960 2.326 2.576 2.807 3.291

151
A. Statistical Tables

A-3. Percentage Points χ2α,ν of the Chi-Squared Distribution


h i
P X(ν) α,ν = α
2 > χ2

0
χ2α,ν

Critical value χ2α,ν


α
ν 0.995 0.99 0.975 0.95 0.9 0.1 0.05 0.025 0.01 0.005
1 0.00004 0.0002 0.001 0.004 0.02 2.71 3.84 5.02 6.63 7.88
2 0.01 0.02 0.05 0.10 0.21 4.61 5.99 7.38 9.21 10.60
3 0.07 0.11 0.22 0.35 0.58 6.25 7.81 9.35 11.34 12.84
4 0.21 0.30 0.48 0.71 1.06 7.78 9.49 11.14 13.28 14.86
5 0.41 0.55 0.83 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.68 0.87 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.99 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.72 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.64
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
35 17.19 18.51 20.57 22.47 24.80 46.06 49.80 53.20 57.34 60.27
40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
50 27.99 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15 79.49
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95
80 51.17 53.54 57.15 60.39 64.28 96.58 101.88 106.63 112.33 116.32
100 67.33 70.06 74.22 77.93 82.36 118.50 124.34 129.56 135.81 140.17
120 83.85 86.92 91.57 95.70 100.62 140.23 146.57 152.21 158.95 163.65

A-4. Critical Values of the F-distribution
f0.10 (ν1 , ν2 )
ν1
ν2 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 60.19 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.30
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76
5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.11
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.30
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16
10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.98
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.91
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85
14 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80
15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72
17 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69
18 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66
19 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.64



20 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 1.94 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61
21 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59
22 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57
23 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55
24 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.54
25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52
26 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.51
27 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 1.85 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.50
28 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48
29 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47
30 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46
40 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38
60 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.30
120 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.20
∞ 2.71 2.31 2.09 1.95 1.85 1.78 1.72 1.68 1.64 1.61 1.55 1.49 1.43 1.39 1.35 1.30 1.25 1.18 1.08
f0.05 (ν1 , ν2 )
ν1
ν2 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 161.45 199.50 215.71 224.58 230.16 233.99 236.77 238.88 240.54 241.88 243.91 245.95 248.01 249.05 250.10 251.14 252.20 253.25 254.19
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.49
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.37
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.41
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.14
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.02
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.97
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.85
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.82
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.79
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.74
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.72
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.70
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.68
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.66
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.65
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.63
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.52
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.40
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.27
∞ 3.85 3.00 2.61 2.38 2.22 2.11 2.02 1.95 1.89 1.84 1.76 1.68 1.58 1.53 1.47 1.41 1.33 1.24 1.11
f0.025 (ν1 , ν2 )
ν1
ν2 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 647.79 799.50 864.16 899.58 921.85 937.11 948.22 956.66 963.28 968.63 976.71 984.87 993.10 997.25 1001.4 1005.6 1009.8 1014.0 1017.8
2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39 39.40 39.41 39.43 39.45 39.46 39.46 39.47 39.48 39.49 39.50
3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47 14.42 14.34 14.25 14.17 14.12 14.08 14.04 13.99 13.95 13.91
4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90 8.84 8.75 8.66 8.56 8.51 8.46 8.41 8.36 8.31 8.26
5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 6.62 6.52 6.43 6.33 6.28 6.23 6.18 6.12 6.07 6.02
6 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52 5.46 5.37 5.27 5.17 5.12 5.07 5.01 4.96 4.90 4.86
7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82 4.76 4.67 4.57 4.47 4.41 4.36 4.31 4.25 4.20 4.15
8 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36 4.30 4.20 4.10 4.00 3.95 3.89 3.84 3.78 3.73 3.68
9 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03 3.96 3.87 3.77 3.67 3.61 3.56 3.51 3.45 3.39 3.34
10 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78 3.72 3.62 3.52 3.42 3.37 3.31 3.26 3.20 3.14 3.09
11 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59 3.53 3.43 3.33 3.23 3.17 3.12 3.06 3.00 2.94 2.89
12 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44 3.37 3.28 3.18 3.07 3.02 2.96 2.91 2.85 2.79 2.73
13 6.41 4.97 4.35 4.00 3.77 3.60 3.48 3.39 3.31 3.25 3.15 3.05 2.95 2.89 2.84 2.78 2.72 2.66 2.60
14 6.30 4.86 4.24 3.89 3.66 3.50 3.38 3.29 3.21 3.15 3.05 2.95 2.84 2.79 2.73 2.67 2.61 2.55 2.50
15 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12 3.06 2.96 2.86 2.76 2.70 2.64 2.59 2.52 2.46 2.40
16 6.12 4.69 4.08 3.73 3.50 3.34 3.22 3.12 3.05 2.99 2.89 2.79 2.68 2.63 2.57 2.51 2.45 2.38 2.32
17 6.04 4.62 4.01 3.66 3.44 3.28 3.16 3.06 2.98 2.92 2.82 2.72 2.62 2.56 2.50 2.44 2.38 2.32 2.26
18 5.98 4.56 3.95 3.61 3.38 3.22 3.10 3.01 2.93 2.87 2.77 2.67 2.56 2.50 2.44 2.38 2.32 2.26 2.20
19 5.92 4.51 3.90 3.56 3.33 3.17 3.05 2.96 2.88 2.82 2.72 2.62 2.51 2.45 2.39 2.33 2.27 2.20 2.14
20 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84 2.77 2.68 2.57 2.46 2.41 2.35 2.29 2.22 2.16 2.09
21 5.83 4.42 3.82 3.48 3.25 3.09 2.97 2.87 2.80 2.73 2.64 2.53 2.42 2.37 2.31 2.25 2.18 2.11 2.05
22 5.79 4.38 3.78 3.44 3.22 3.05 2.93 2.84 2.76 2.70 2.60 2.50 2.39 2.33 2.27 2.21 2.14 2.08 2.01



23 5.75 4.35 3.75 3.41 3.18 3.02 2.90 2.81 2.73 2.67 2.57 2.47 2.36 2.30 2.24 2.18 2.11 2.04 1.98
24 5.72 4.32 3.72 3.38 3.15 2.99 2.87 2.78 2.70 2.64 2.54 2.44 2.33 2.27 2.21 2.15 2.08 2.01 1.94
25 5.69 4.29 3.69 3.35 3.13 2.97 2.85 2.75 2.68 2.61 2.51 2.41 2.30 2.24 2.18 2.12 2.05 1.98 1.91
26 5.66 4.27 3.67 3.33 3.10 2.94 2.82 2.73 2.65 2.59 2.49 2.39 2.28 2.22 2.16 2.09 2.03 1.95 1.89
27 5.63 4.24 3.65 3.31 3.08 2.92 2.80 2.71 2.63 2.57 2.47 2.36 2.25 2.19 2.13 2.07 2.00 1.93 1.86
28 5.61 4.22 3.63 3.29 3.06 2.90 2.78 2.69 2.61 2.55 2.45 2.34 2.23 2.17 2.11 2.05 1.98 1.91 1.84
29 5.59 4.20 3.61 3.27 3.04 2.88 2.76 2.67 2.59 2.53 2.43 2.32 2.21 2.15 2.09 2.03 1.96 1.89 1.82
30 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57 2.51 2.41 2.31 2.20 2.14 2.07 2.01 1.94 1.87 1.80
40 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45 2.39 2.29 2.18 2.07 2.01 1.94 1.88 1.80 1.72 1.65
60 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33 2.27 2.17 2.06 1.94 1.88 1.82 1.74 1.67 1.58 1.49
120 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22 2.16 2.05 1.94 1.82 1.76 1.69 1.61 1.53 1.43 1.33
∞ 5.04 3.70 3.13 2.80 2.58 2.42 2.30 2.20 2.13 2.06 1.96 1.85 1.72 1.65 1.58 1.50 1.41 1.29 1.13
f0.01 (ν1 , ν2 )
ν1
ν2 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 4052.2 4999.5 5403.4 5624.6 5763.7 5859.0 5928.4 5981.1 6022.5 6055.9 6106.3 6157.3 6208.7 6234.6 6260.7 6286.8 6313.0 6339.4 6362.7
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.14
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.47
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.03
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.89
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.66
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.87
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.32
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.92
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.61
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.37
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.18
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.02
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.88
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.76
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.66
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.58
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.50
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.43
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.37
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.32
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.27
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.22
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.18
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.14
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.11
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.08
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.05
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.02
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.82
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.62
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.40
∞ 6.66 4.63 3.80 3.34 3.04 2.82 2.66 2.53 2.43 2.34 2.20 2.06 1.90 1.81 1.72 1.61 1.50 1.35 1.16
f0.005 (ν1 , ν2 )
ν1
ν2 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞
1 16211 20000 21615 22500 23056 23437 23715 23925 24091 24224 24426 24630 24836 24940 25044 25148 25253 25359 25452
2 198.50 199.00 199.17 199.25 199.30 199.33 199.36 199.37 199.39 199.40 199.42 199.43 199.45 199.46 199.47 199.47 199.48 199.49 199.50
3 55.55 49.80 47.47 46.19 45.39 44.84 44.43 44.13 43.88 43.69 43.39 43.08 42.78 42.62 42.47 42.31 42.15 41.99 41.85
4 31.33 26.28 24.26 23.15 22.46 21.97 21.62 21.35 21.14 20.97 20.70 20.44 20.17 20.03 19.89 19.75 19.61 19.47 19.34
5 22.78 18.31 16.53 15.56 14.94 14.51 14.20 13.96 13.77 13.62 13.38 13.15 12.90 12.78 12.66 12.53 12.40 12.27 12.16
6 18.63 14.54 12.92 12.03 11.46 11.07 10.79 10.57 10.39 10.25 10.03 9.81 9.59 9.47 9.36 9.24 9.12 9.00 8.89
7 16.24 12.40 10.88 10.05 9.52 9.16 8.89 8.68 8.51 8.38 8.18 7.97 7.75 7.64 7.53 7.42 7.31 7.19 7.09
8 14.69 11.04 9.60 8.81 8.30 7.95 7.69 7.50 7.34 7.21 7.01 6.81 6.61 6.50 6.40 6.29 6.18 6.06 5.96
9 13.61 10.11 8.72 7.96 7.47 7.13 6.88 6.69 6.54 6.42 6.23 6.03 5.83 5.73 5.62 5.52 5.41 5.30 5.20
10 12.83 9.43 8.08 7.34 6.87 6.54 6.30 6.12 5.97 5.85 5.66 5.47 5.27 5.17 5.07 4.97 4.86 4.75 4.65
11 12.23 8.91 7.60 6.88 6.42 6.10 5.86 5.68 5.54 5.42 5.24 5.05 4.86 4.76 4.65 4.55 4.45 4.34 4.24
12 11.75 8.51 7.23 6.52 6.07 5.76 5.52 5.35 5.20 5.09 4.91 4.72 4.53 4.43 4.33 4.23 4.12 4.01 3.92
13 11.37 8.19 6.93 6.23 5.79 5.48 5.25 5.08 4.94 4.82 4.64 4.46 4.27 4.17 4.07 3.97 3.87 3.76 3.66
14 11.06 7.92 6.68 6.00 5.56 5.26 5.03 4.86 4.72 4.60 4.43 4.25 4.06 3.96 3.86 3.76 3.66 3.55 3.45
15 10.80 7.70 6.48 5.80 5.37 5.07 4.85 4.67 4.54 4.42 4.25 4.07 3.88 3.79 3.69 3.58 3.48 3.37 3.27
16 10.58 7.51 6.30 5.64 5.21 4.91 4.69 4.52 4.38 4.27 4.10 3.92 3.73 3.64 3.54 3.44 3.33 3.22 3.13
17 10.38 7.35 6.16 5.50 5.07 4.78 4.56 4.39 4.25 4.14 3.97 3.79 3.61 3.51 3.41 3.31 3.21 3.10 3.00
18 10.22 7.21 6.03 5.37 4.96 4.66 4.44 4.28 4.14 4.03 3.86 3.68 3.50 3.40 3.30 3.20 3.10 2.99 2.89
19 10.07 7.09 5.92 5.27 4.85 4.56 4.34 4.18 4.04 3.93 3.76 3.59 3.40 3.31 3.21 3.11 3.00 2.89 2.79
20 9.94 6.99 5.82 5.17 4.76 4.47 4.26 4.09 3.96 3.85 3.68 3.50 3.32 3.22 3.12 3.02 2.92 2.81 2.70
21 9.83 6.89 5.73 5.09 4.68 4.39 4.18 4.01 3.88 3.77 3.60 3.43 3.24 3.15 3.05 2.95 2.84 2.73 2.63
22 9.73 6.81 5.65 5.02 4.61 4.32 4.11 3.94 3.81 3.70 3.54 3.36 3.18 3.08 2.98 2.88 2.77 2.66 2.56



23 9.63 6.73 5.58 4.95 4.54 4.26 4.05 3.88 3.75 3.64 3.47 3.30 3.12 3.02 2.92 2.82 2.71 2.60 2.50
24 9.55 6.66 5.52 4.89 4.49 4.20 3.99 3.83 3.69 3.59 3.42 3.25 3.06 2.97 2.87 2.77 2.66 2.55 2.44
25 9.48 6.60 5.46 4.84 4.43 4.15 3.94 3.78 3.64 3.54 3.37 3.20 3.01 2.92 2.82 2.72 2.61 2.50 2.39
26 9.41 6.54 5.41 4.79 4.38 4.10 3.89 3.73 3.60 3.49 3.33 3.15 2.97 2.87 2.77 2.67 2.56 2.45 2.34
27 9.34 6.49 5.36 4.74 4.34 4.06 3.85 3.69 3.56 3.45 3.28 3.11 2.93 2.83 2.73 2.63 2.52 2.41 2.30
28 9.28 6.44 5.32 4.70 4.30 4.02 3.81 3.65 3.52 3.41 3.25 3.07 2.89 2.79 2.69 2.59 2.48 2.37 2.26
29 9.23 6.40 5.28 4.66 4.26 3.98 3.77 3.61 3.48 3.38 3.21 3.04 2.86 2.76 2.66 2.56 2.45 2.33 2.23
30 9.18 6.35 5.24 4.62 4.23 3.95 3.74 3.58 3.45 3.34 3.18 3.01 2.82 2.73 2.63 2.52 2.42 2.30 2.19
40 8.83 6.07 4.98 4.37 3.99 3.71 3.51 3.35 3.22 3.12 2.95 2.78 2.60 2.50 2.40 2.30 2.18 2.06 1.95
60 8.49 5.79 4.73 4.14 3.76 3.49 3.29 3.13 3.01 2.90 2.74 2.57 2.39 2.29 2.19 2.08 1.96 1.83 1.71
120 8.18 5.54 4.50 3.92 3.55 3.28 3.09 2.93 2.81 2.71 2.54 2.37 2.19 2.09 1.98 1.87 1.75 1.61 1.45
∞ 7.91 5.33 4.30 3.74 3.37 3.11 2.92 2.77 2.64 2.54 2.38 2.21 2.02 1.92 1.81 1.69 1.56 1.39 1.18

A-5. Factors for Tolerance Levels

Values of k for Two-Sided Intervals


Confidence Level
0.90 0.95 0.99
Sample Probability of Coverage
Size 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
2 15.978 18.800 24.167 32.019 37.674 48.430 160.193 188.491 242.300
3 5.847 6.919 8.974 8.380 9.916 12.861 18.930 22.401 29.055
4 4.166 4.943 6.440 5.369 6.370 8.299 9.398 11.150 14.527
5 3.494 4.152 5.423 4.275 5.079 6.634 6.612 7.855 10.260
6 3.131 3.723 4.870 3.712 4.414 5.775 5.337 6.345 8.301
7 2.902 3.452 4.521 3.369 4.007 5.248 4.613 5.488 7.187
8 2.743 3.264 4.278 3.136 3.732 4.891 4.147 4.936 6.468
9 2.626 3.125 4.098 2.967 3.532 4.631 3.822 4.550 5.966
10 2.535 3.018 3.959 2.839 3.379 4.433 3.582 4.265 5.594
11 2.463 2.933 3.849 2.737 3.259 4.277 3.397 4.045 5.308
12 2.404 2.863 3.758 2.655 3.162 4.150 3.250 3.870 5.079
13 2.355 2.805 3.682 2.587 3.081 4.044 3.130 3.727 4.893
14 2.314 2.756 3.618 2.529 3.012 3.955 3.029 3.608 4.737
15 2.278 2.713 3.562 2.480 2.954 3.878 2.945 3.507 4.605
16 2.246 2.676 3.514 2.437 2.903 3.812 2.872 3.421 4.492
17 2.219 2.643 3.471 2.400 2.858 3.754 2.808 3.345 4.393
18 2.194 2.614 3.433 2.366 2.819 3.702 2.753 3.279 4.307
19 2.172 2.588 3.399 2.337 2.784 3.656 2.703 3.221 4.230
20 2.152 2.564 3.368 2.310 2.752 3.615 2.659 3.168 4.161
21 2.135 2.543 3.340 2.286 2.723 3.577 2.620 3.121 4.100
22 2.118 2.524 3.315 2.264 2.697 3.543 2.584 3.078 4.044
23 2.103 2.506 3.292 2.244 2.673 3.512 2.551 3.040 3.993
24 2.089 2.489 3.270 2.225 2.651 3.483 2.522 3.004 3.947
25 2.077 2.474 3.251 2.208 2.631 3.457 2.494 2.972 3.904
30 2.025 2.413 3.170 2.140 2.529 3.350 2.385 2.841 3.733
40 1.959 2.334 3.066 2.052 2.445 3.213 2.247 2.677 3.518
50 1.916 2.284 3.001 1.996 2.379 3.126 2.162 2.576 3.385
60 1.887 2.248 2.955 1.958 2.333 3.066 2.103 2.506 3.293
70 1.865 2.222 2.920 1.929 2.299 3.021 2.060 2.454 3.225
80 1.848 2.202 2.894 1.907 2.272 2.986 2.026 2.414 3.173
90 1.834 2.185 2.872 1.889 2.251 2.958 1.999 2.382 3.130
100 1.822 2.172 2.854 1.874 2.233 2.934 1.977 2.355 3.096


Values of k for One-Sided Intervals


Confidence Level
0.90 0.95 0.99
Sample Probability of Coverage
Size 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
2 10.253 13.090 18.500 20.581 26.260 37.094 103.029 131.426 185.617
3 4.258 5.311 7.340 6.155 7.656 10.553 13.995 17.370 23.896
4 3.188 3.957 5.438 4.162 5.144 7.042 7.380 9.083 12.387
5 2.742 3.400 4.666 3.407 4.203 5.741 5.362 6.578 8.939
6 2.494 3.092 4.243 3.006 3.708 5.062 4.411 5.406 7.335
7 2.333 2.894 3.972 2.755 3.399 4.642 3.859 4.728 6.412
8 2.219 2.754 3.783 2.582 3.187 4.354 3.497 4.285 5.812
9 2.133 2.650 3.641 2.454 3.031 4.143 3.240 3.972 5.389
10 2.066 2.568 3.532 2.355 2.911 3.981 3.048 3.738 5.074
11 2.011 2.503 3.443 2.275 2.815 3.852 2.898 3.556 4.829
12 1.966 2.448 3.371 2.210 2.736 3.747 2.777 3.410 4.633
13 1.928 2.402 3.309 2.155 2.671 3.659 2.677 3.290 4.472
14 1.895 2.363 3.257 2.109 2.614 3.585 2.593 3.189 4.337
15 1.867 2.329 3.212 2.068 2.566 3.520 2.521 3.102 4.222
16 1.842 2.299 3.172 2.033 2.524 3.464 2.459 3.028 4.123
17 1.819 2.272 3.137 2.002 2.486 3.414 2.405 2.963 4.037
18 1.800 2.249 3.105 1.974 2.453 3.370 2.357 2.905 3.960
19 1.782 2.227 3.077 1.949 2.423 3.331 2.314 2.854 3.892
20 1.765 2.208 3.052 1.926 2.396 3.295 2.276 2.808 3.832
21 1.750 2.190 3.028 1.905 2.371 3.263 2.241 2.766 3.777
22 1.737 2.174 3.007 1.886 2.349 3.233 2.209 2.729 3.727
23 1.724 2.159 2.987 1.869 2.328 3.206 2.180 2.694 3.681
24 1.712 2.145 2.969 1.853 2.309 3.181 2.154 2.662 3.640
25 1.702 2.132 2.952 1.838 2.292 3.158 2.129 2.633 3.601
30 1.657 2.080 2.884 1.777 2.220 3.064 2.030 2.515 3.447
40 1.598 2.010 2.793 1.697 2.125 2.941 1.902 2.364 3.249
50 1.559 1.965 2.735 1.646 2.065 2.862 1.821 2.269 3.125
60 1.532 1.933 2.694 1.609 2.022 2.807 1.764 2.202 3.038
70 1.511 1.909 2.662 1.581 1.990 2.765 1.722 2.153 2.974
80 1.495 1.890 2.638 1.559 1.964 2.733 1.688 2.114 2.924
90 1.481 1.874 2.618 1.542 1.944 2.706 1.661 2.082 2.883
100 1.470 1.861 2.601 1.527 1.927 2.684 1.639 2.056 2.850

A-6. Statistical Intervals

All intervals below have the common form

    two-sided:         θ̂ − c · se(Θ̂) ≤ θ ≤ θ̂ + c · se(Θ̂)
    one-sided lower:   θ ≥ θ̂ − c · se(Θ̂)
    one-sided upper:   θ ≤ θ̂ + c · se(Θ̂)

where the multiplier c is zα/2 (two-sided) or zα (one-sided) for z-intervals, and tα/2,ν or tα,ν
for t-intervals.

parameter θ = µ, estimate θ̂ = x̄
    se(Θ̂) = σ/√n, z-interval                              [σ known]
    se(Θ̂) = s/√n, z-interval                              [σ unknown, n ≥ 30]
    se(Θ̂) = s/√n, t-interval with ν = n − 1               [σ unknown]

parameter θ = µ1 − µ2, estimate θ̂ = x̄1 − x̄2
    se(Θ̂) = √(σ1²/n1 + σ2²/n2), z-interval                [σ1, σ2 known]
    se(Θ̂) = √(s1²/n1 + s2²/n2), z-interval                [σ1, σ2 unknown, n1 ≥ 30, n2 ≥ 30]
    se(Θ̂) = sp √(1/n1 + 1/n2), t-interval with ν = n1 + n2 − 2,
        sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)  [σ1 = σ2 but unknown]
    se(Θ̂) = √(s1²/n1 + s2²/n2), t-interval with
        ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
                                                           [σ1 ≠ σ2 and unknown]

parameter θ = µD, estimate θ̂ = d̄
    se(Θ̂) = sd/√n, t-interval with ν = n − 1              [paired observations, di = yi − xi]

parameter θ = p, estimate θ̂ = p̂
    se(Θ̂) = √(p̂(1 − p̂)/n), z-interval                     [p̂ – sample proportion of successes]

parameter θ = p1 − p2, estimate θ̂ = p̂1 − p̂2
    se(Θ̂) = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), z-interval  [p̂1, p̂2 – sample proportions]

For the variance σ² (estimate s², ν = n − 1):
    two-sided:         (n − 1)s²/χ²_{α/2,ν} ≤ σ² ≤ (n − 1)s²/χ²_{1−α/2,ν}
    one-sided lower:   σ² ≥ (n − 1)s²/χ²_{α,ν}
    one-sided upper:   σ² ≤ (n − 1)s²/χ²_{1−α,ν}

For the ratio σ1²/σ2² (estimate s1²/s2², ν1 = n1 − 1, ν2 = n2 − 1):
    two-sided:         (s1²/s2²) · 1/fα/2(ν1, ν2) ≤ σ1²/σ2² ≤ (s1²/s2²) · fα/2(ν2, ν1)
    one-sided lower:   σ1²/σ2² ≥ (s1²/s2²) · 1/fα(ν1, ν2)
    one-sided upper:   σ1²/σ2² ≤ (s1²/s2²) · fα(ν2, ν1)
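As an illustration of the table, the following sketch (assuming Python with NumPy and SciPy)
computes the two-sided t-interval for µ with σ unknown, using the pH data of Example 8.7:

    import numpy as np
    from scipy import stats

    data = np.array([7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08])
    n = data.size
    se = data.std(ddof=1) / np.sqrt(n)     # s / sqrt(n)
    t = stats.t.ppf(0.975, df=n - 1)       # t_{alpha/2, nu} with alpha = 0.05
    lo, hi = data.mean() - t * se, data.mean() + t * se
    # (lo, hi) is about (6.99, 7.06), consistent with Figure 8.6 after adding 7.0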
B. Statistical Calculations with Casio fx-570/991ES PLUS
Statistical Calculations (STAT)


To start a statistical calculation, perform the key operation N3(STAT)
to enter the STAT Mode and then use the screen that appears to select the
type of calculation you want to perform.
To select a statistical calculation type (the regression formula is shown in parentheses),
press the indicated key:

    Single-variable (X): 1(1-VAR)
    Paired-variable (X, Y), linear regression (y = A + Bx): 2(A+BX)
    Paired-variable (X, Y), quadratic regression (y = A + Bx + Cx²): 3(_+CX2)
    Paired-variable (X, Y), logarithmic regression (y = A + B ln x): 4(ln X)
    Paired-variable (X, Y), e exponential regression (y = Ae^(Bx)): 5(e^X)
    Paired-variable (X, Y), ab exponential regression (y = AB^x): 6(A•B^X)
    Paired-variable (X, Y), power regression (y = Ax^B): 7(A•X^B)
    Paired-variable (X, Y), inverse regression (y = A + B/x): 8(1/X)
Pressing any of the above keys (1 to 8) displays the Stat Editor.
Note: When you want to change the calculation type after entering the
STAT Mode, perform the key operation 11(STAT)1(Type) to display
the calculation type selection screen.

Inputting Data
Use the Stat Editor to input data. Perform the following key operation to
display the Stat Editor: 11(STAT)2(Data).
The Stat Editor provides 80 rows for data input when there is an X column
only, 40 rows when there are X and FREQ columns or X and Y columns, or
26 rows when there are X, Y, and FREQ columns.
Note: Use the FREQ (frequency) column to input the quantity (frequency) of
identical data items. Display of the FREQ column can be turned on (displayed)
or off (not displayed) using the Stat Format setting on the setup menu.

1 To select linear regression and input the following data:


(170, 66), (173, 68), (179, 75)
STAT

N3(STAT)2(A+BX)

STAT

170 = 173 = 179 =ce

STAT

66 = 68 = 75 =

Important: • All data currently input in the Stat Editor is deleted whenever
you exit the STAT Mode, switch between the single-variable and a paired-
variable statistical calculation type, or change the Stat Format setting on
the setup menu. • The following operations are not supported by the Stat
Editor: m, 1m(M–), 1t(STO). Pol, Rec, and multi-statements
also cannot be input with the Stat Editor.
To change the data in a cell: In the Stat Editor, move the cursor to the cell
that contains the data you want to change, input the new data, and then
press =.
To delete a line: In the Stat Editor, move the cursor to the line that you want
to delete and then press Y.
To insert a line: In the Stat Editor, move the cursor to the location where
you want to insert the line and then perform the following key operation:
11(STAT)3(Edit)1(Ins).
To delete all Stat Editor contents: In the Stat Editor, perform the following
key operation: 11(STAT)3(Edit)2(Del-A).

Obtaining Statistical Values from Input Data


To obtain statistical values, press A while in the Stat Editor and then
recall the statistical variable (σx, Σx2, etc.) you want. Supported statistical
variables and the keys you should press to recall them are shown below.
For single-variable statistical calculations, the variables marked with an
asterisk (*) are available.
Sum: Σx2*, Σx*, Σy2, Σy, Σxy, Σx3, Σx2y, Σx4
11(STAT) 3(Sum) 1 to 8
Number of Items: n*, Mean: x̄*, ȳ, Population Standard Deviation: σx*,
σy, Sample Standard Deviation: sx*, sy
11(STAT) 4(Var) 1 to 7
Regression Coefficients: A, B, Correlation Coefficient: r, Estimated
Values: x̂, ŷ
11(STAT) 5(Reg) 1 to 5
Regression Coefficients for Quadratic Regression: A, B, C, Estimated
Values: x̂1, x̂2, ŷ
11(STAT) 5(Reg) 1 to 6
• See the table at the beginning of this section of the manual for the regression
formulas.
• x̂, x̂1, x̂2 and ŷ are not variables. They are commands of the type that take
an argument immediately before them. See “Calculating Estimated Values”
for more information.
Minimum Value: minX*, minY, Maximum Value: maxX*, maxY
11(STAT) 6(MinMax) 1 to 4
Note: While single-variable statistical calculation is selected, you can input
the functions and commands for performing normal distribution calculation
from the menu that appears when you perform the following key operation:
11(STAT) 5 (Distr). See “Performing Normal Distribution Calculations”
for details.

2 To input the single-variable data x = {1, 2, 2, 3, 3, 3, 4, 4, 5}, using


the FREQ column to specify the number of repeats for each item
({xn; freqn} = {1;1, 2;2, 3;3, 4;2, 5;1}), and calculate the mean and
population standard deviation.
1N(SETUP)c4(STAT)1(ON)
N3(STAT)1(1-VAR) STAT

1 = 2 = 3 = 4 = 5 =ce
1=2=3=2=

A11(STAT)4(Var)2(x̄)=

A11(STAT)4(Var)3(σx)=
Results: Mean: 3 Population Standard Deviation: 1.154700538

3 To calculate the linear regression and logarithmic regression


correlation coefficients for the following paired-variable data and
determine the regression formula for the strongest correlation: (x, y)
= (20, 3150), (110, 7310), (200, 8800), (290, 9310). Specify Fix 3
(three decimal places) for results.
1N(SETUP)c4(STAT)2(OFF)
1N(SETUP)6(Fix)3
N3(STAT)2(A+BX) STAT FIX

20 = 110 = 200 = 290 =ce


3150 = 7310 =8800 = 9310=

A11(STAT)5(Reg)3(r)=
A11(STAT)1(Type)4(ln X)
A11(STAT)5(Reg)3(r)=

A11(STAT)5(Reg)1(A)=

A11(STAT)5(Reg)2(B)=
Results: Linear Regression Correlation Coefficient: 0.923
Logarithmic Regression Correlation Coefficient: 0.998
Logarithmic Regression Formula: y = −3857.984 + 2357.532 ln x
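The calculator’s results can be cross-checked in Python (a sketch assuming NumPy; fitting y
on ln x gives the same logarithmic model):

    import numpy as np

    x = np.array([20, 110, 200, 290])
    y = np.array([3150, 7310, 8800, 9310])
    B, A = np.polyfit(np.log(x), y, 1)            # y = A + B ln x
    r = np.corrcoef(np.log(x), y)[0, 1]
    print(round(A, 3), round(B, 3), round(r, 3))  # about -3857.984, 2357.532, 0.998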

Calculating Estimated Values
Based on the regression formula obtained by paired-variable statistical
calculation, the estimated value of y can be calculated for a given x-value.
The corresponding x-value (two values, x1 and x2, in the case of quadratic
regression) also can be calculated for a value of y in the regression
formula.

4 To determine the estimated value for y when x = 160 in the


regression formula produced by logarithmic regression of the data
in 3 . Specify Fix 3 for the result. (Perform the following operation
after completing the operations in 3 .)
A 160 11(STAT)5(Reg)5(ŷ)=
Result: 8106.898
Important: Regression coefficient, correlation coefficient, and estimated
value calculations can take considerable time when there are a large number
of data items.

Performing Normal Distribution Calculations


While single-variable statistical calculation is selected, you can perform
normal distribution calculation using the functions shown below from
the menu that appears when you perform the following key operation:
11(STAT)5(Distr).
P, Q, R: These functions take the argument t and determine a probability of
standard normal distribution as illustrated below.
[Diagrams: P(t) is the area under the standard normal curve to the left of t, Q(t) the area
between 0 and t, and R(t) the area to the right of t.]

't: This function is preceded by the argument X, and determines the
normalized variate t = (X − x̄)/σx.

5 For the single variable data {xn ; freqn} = {0;1, 1;2, 2;1, 3;2, 4;2, 5;2,
6;3, 7;4, 9;2, 10;1}, to determine the normalized variate ('t) when x
= 3, and P(t) at that point up to three decimal places (Fix 3).
1N(SETUP)c4(STAT)1(ON)
1N(SETUP)6(Fix)3N3(STAT)1(1-VAR)
0=1=2=3=4=5=6=7=9= STAT FIX

10=ce1=2=1=2=2=2=3=
4=2=1=

STAT FIX

A 3 11(STAT)5(Distr)4('t)=
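The same normal distribution calculation can be reproduced in Python (a sketch assuming
NumPy and SciPy; P(t) is the lower-tail area):

    import numpy as np
    from scipy.stats import norm

    x = np.repeat([0, 1, 2, 3, 4, 5, 6, 7, 9, 10],
                  [1, 2, 1, 2, 2, 2, 3, 4, 2, 1])
    t = (3 - x.mean()) / x.std()                # ddof=0 matches the population sigma_x
    print(round(t, 3), round(norm.cdf(t), 3))   # about -0.762 and 0.223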

C. Answers to Exercises
Chapter 1

1-6. (a) 56 (b) 92 (c) 168 (d) 148 (e) 36
1-7. 4,096
1-8. 3,628,800
1-9. 14,400
1-10. (a) 416,965,528 (b) 113,588,800 (c) 1,033,752
1-11. (a) 479,001,600 (b) 95,040 (c) 3,326,400
1-12. (a) 1000 (b) 720 (c) 160
1-13. 41,947,059
1-14. (a) 21 (b) 10
1-15. (a) 40,320 (b) 384 (c) 576
1-16. (a) 180 (b) 75 (c) 105
1-17. (a) 720 (b) 144 (c) 480
1-18. 20
1-19. 512
1-20. 2880
1-21. 1260
1-22. (a) 1024 (b) 243
1-23. 360
1-24. 56
1-25. 24
1-26. 362,880
1-27. 1,244,117,160
1-28. (b) 11 (c) 21

Chapter 2

2-1. (b) 1/4 (c) 3/4
2-2. (a) 0.3 (b) 0.77 (c) 0.7 (d) 0.22 (e) 0.85 (f) 0.55
2-3. 0.85
2-4. (a) 1/36 (b) 5/36
2-5. (a) 1/3 (b) 5/15
2-6. (a) 0.55 (b) 0.87
2-7. 7/22
2-8. (a) 0.7 (b) 0.4 (c) 0.1 (d) 0.2 (e) 0.6 (f) 0.8
2-9. (a) 0.9 (b) 0 (c) 0 (d) 0 (e) 0.1
2-10. (a) 0.7 (b) 0.3 (c) 0.7 (d) 0.55
2-11. (a) 0.58 (b) 0.96
2-12. (a) 0.8 (b) 0.45 (c) 0.55
2-13. 5/9
2-14. (a) 0.35 (b) 0.875 (c) 0.55
2-15. 0.625
2-16. (a) 1/4 (b) 1/3 (c) 1/4
2-17. (a) 0.72 (b) 0.09936
2-18. (a) 0.2376 (b) 0.078
2-19. (a) 0.9 (b) 0.7
2-20. (a) no (b) 11/15
2-21. (a) 1/1024 (b) 1/1024 (c) 63/256
2-22. 0.929258
2-23. (a) 0.2 (b) 0.2 (c) 0.1 (d) no
2-25. (a) 0.12 (b) 2/3
2-26. (a) 16/89 (b) no printer problem
2-27. (a) 0.27 (b) 1/9
2-28. (a) 0.2016 (b) 0.0923
2-29. 6/7

Chapter 3

3-6. (a) 1/3 (b) 1/2 (c) 0 (d) 2/3 (e) 1/2
3-7. (a) 1 (b) 0.8 (c) 0.7 (d) 0.7
3-8. (a) 9/25 (b) 4/25 (c) 12/25 (d) 1
3-9. f(0) = 0.02, f(1) = 0.068, f(2) = 0.931
3-10. f(0) = 0.02, f(1) = 0.26, f(2) = 0.72
3-11. f(x) = \binom{4}{x} (0.0001)^x (0.9999)^{4−x}
3-12. 1/30
3-14. F(x) = 0 for x < 0; 0.041 for 0 ≤ x < 1; 0.78 for 1 ≤ x < 2; 0.94 for 2 ≤ x < 3;
      0.99 for 3 ≤ x < 4; 1 for 4 ≤ x
3-15. (a) 1/4 (b) 1/2 (c) 1/4 (d) 2/3
3-16. µ = −0.5, σ² = 1.55
3-17. µ = 2.8, σ² = 8.64
3-18. µ = 0.88, σ² = 0.8456
3-19. (a) 4/9 (b) 21/25 (c) 10
3-20. yes
3-21. 12
3-22. µ = 2, σ² = 2
3-23. µ = 0.17, σ² = 0.0002
3-24. (a) 0.0005 (b) 0.959925 (c) 1.24554 × 10⁻⁹ (d) 0.039490
3-25. F(x) = 0 for x < 0; 27/64 for 0 ≤ x < 1; 27/32 for 1 ≤ x < 2; 63/64 for 2 ≤ x < 3;
      1 for 3 ≤ x
3-26. 2-engine plane
3-27. (a) 0.996141 (b) 0.988568
3-28. (a) 0.018316 (b) 0.238103 (c) 0.195369 (d) 0.029770
3-29. (a) 0.100819 (b) 0.423190 (c) 0.800852
3-30. 0.000045
3-31. ln 20 ≈ 3
3-32. (a) 448/969 (b) 1/4845 (c) 956/969 (d) µ = 4/5, σ² = 256/475
3-33. (a) f(x) = \binom{24}{x}\binom{12}{3−x} / \binom{36}{3} (b) µ = 2, σ² = 22/35
      (c) 1279/1785
3-34. (a) 0.706862 (b) 0.146468 (c) 0.070950
3-35. (a) 20 (b) 0.045937 (c) 0.047918
3-36. (a) 0.13 (b) 0.098397 (c) 7.7
3-37. (a) 1/2 (b) 1/16 (c) 1/256 (d) 3/4 (e) 1/2
3-38. (a) 2/5 (b) 54/625 (c) 98/125

Chapter 4

4-1. (a) 1/9 (b) 1000/9801
4-2. (a) 17/25 (b) 3/8
4-3. (a) 9/16 (b) 1.1056
4-4. (a) 1/8 (b) 3/4 (c) 32.8284
4-5. (a) 3/256 (b) F(x) = (3/64)x² − (1/256)x³ (c) 5/32
4-6. F(x) = 1 − 10000/(x + 100)²
4-7. F(x) = x²/2 for 0 < x < 1; −x²/2 + 2x − 1 for 1 ≤ x < 2
4-8. F(x) = 0.00125x² − 0.075x + 1.125 for 30 < x < 50;
     −0.00125x² + 0.175x − 5.125 for 50 < x < 70
4-9. 3/100
4-10. µ = 1, σ = √6/6
4-11. µ = 50, σ = 10√(2/3)
4-12. µ = 109.393, σ² = 33.186
4-13. (a) F(x) = 100x − 20.50 (b) 1/4 (c) 0.214 (d) µ = 0.21, σ² = 1/120000
4-16. (a) 0.906582 (b) 0.998650 (c) 0.073529 (d) 0.984222 (e) 0.950299
4-17. (a) 0.682690 (b) 0.9545 (c) 0.9973 (d) 0.99865 (e) 0.433193
4-18. (a) 1.285 (b) 0 (c) 1.285 (d) −1.285 (e) 1.325
4-19. (a) 1.96 (b) 2.575 (c) 0.95 (d) 3.00
4-20. (a) 0.933193 (b) 0.691462 (c) 0.9545
4-21. (a) 10 (b) 3.92 (c) 8.95 (d) 3.92 (e) 5.15
4-22. (a) 0.99379 (b) 0.135905
4-23. (a) 0.99379 (b) 0.99379
4-24. (a) < 0.000033 (b) 0.0241 (c) 12.14 < x < 12.66
4-25. (a) σ = 40.8 (b) 131.9 (c) 186.5 (d) 211.4 (e) 0.135905 (f) 0.02275 (g) 0.158655
      (h) 200 < x < 240.8 (i) x < 118.4
4-26. (a) 0.02275 (b) 0.32 (c) 11.5
4-27. a = 23.5. No.
4-28. (a) 0.3679 (b) 0.2858 (c) 0.1054
4-29. (a) 0 (b) 0.0183 (c) 0.8647 (d) 0.1170
4-30. (a) 0.3679 (b) 0.1353 (c) 0.9502 (d) 0.3935 (e) 0.3935
4-31. (a) 0.1353 (b) 0.04029 (c) 0.1353
4-32. (a) 0.3935 (b) 0.0067
4-33. (a) 0.1353 (b) 0.4866 (c) 0.2031 (d) 34.54
4-34. (a) 0.0025 (b) 0.6321 (c) 23.03 (d) 23.03 (e) 6.93
4-35. (a) 0.6321 (b) 6
4-36. (a) 5 (b) 0.1353 (c) 11.51
D. Proofs

Mean and Variance of the Binomial Random Variable

The probability mass function of the binomial random variable X is

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

According to Property (2) of a discrete random variable,

$$\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = 1 \tag{D-1}$$

The mean µ = E[X] is

$$\mu = E[X] = \sum_{\text{all } x} x f(x) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x \binom{n}{x} p^x (1-p)^{n-x}$$
$$= \sum_{x=1}^{n} x \cdot \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}$$
$$= \sum_{x=1}^{n} x \cdot \frac{n(n-1)!}{x(x-1)!(n-x)!}\, p^x (1-p)^{n-x}$$
$$= n \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!}\, p^x (1-p)^{n-x}$$

We introduce the substitution y = x − 1 and m = n − 1. For the limits of the summation, when x = 1,
y = 0, and when x = n, y = n − 1 = m. Also, n − x = (m + 1) − (y + 1) = m − y. Thus,

$$\mu = E[X] = n \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!}\, p^x (1-p)^{n-x}$$
$$= n \sum_{y=0}^{m} \frac{m!}{y!(m-y)!}\, p^{y+1} (1-p)^{m-y}$$
$$= np \underbrace{\sum_{y=0}^{m} \binom{m}{y} p^y (1-p)^{m-y}}_{\text{Equation (D-1)}} = np$$

The variance can be derived in a similar fashion. We use the alternate form given in Equation (3.4),
noting that x² = x(x − 1) + x.

$$\sigma^2 = V[X] = \sum_{\text{all } x} x^2 f(x) - \mu^2 = \left\{ \sum_{x=0}^{n} [x(x-1) + x] \binom{n}{x} p^x (1-p)^{n-x} \right\} - \mu^2$$
$$= \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x (1-p)^{n-x} + \underbrace{\sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x}}_{\mu} - \mu^2$$
$$= \sum_{x=2}^{n} x(x-1)\, \frac{n(n-1)(n-2)!}{x(x-1)(x-2)!(n-x)!}\, p^x (1-p)^{n-x} + \mu - \mu^2$$
$$= n(n-1) \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!(n-x)!}\, p^x (1-p)^{n-x} + \mu - \mu^2$$

The substitution y = x − 2 and m = n − 2 simplifies the summation.

$$\sigma^2 = V[X] = n(n-1) \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!(n-x)!}\, p^x (1-p)^{n-x} + \mu - \mu^2$$
$$= n(n-1)p^2 \sum_{y=0}^{m} \binom{m}{y} p^y (1-p)^{m-y} + \mu - \mu^2$$
$$= n(n-1)p^2 + np - (np)^2 = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p)$$
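As a quick numerical sanity check of the two results µ = np and σ² = np(1 − p), the short Python sketch below (the parameters n and p are chosen arbitrarily) sums the binomial probability mass function directly:

from math import comb

n, p = 12, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

print(sum(pmf))                                            # Property (2): 1.0
mu = sum(x * f for x, f in enumerate(pmf))                 # E[X]
var = sum(x * x * f for x, f in enumerate(pmf)) - mu**2    # Equation (3.4)
print(mu, n * p)                                           # both 3.6
print(var, n * p * (1 - p))                                # both 2.52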

Mean and Variance of the Poisson Random Variable

The probability mass function of the Poisson random variable X is

$$f(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots$$

and it satisfies

$$\sum_{x=0}^{\infty} \frac{e^{-\lambda t} (\lambda t)^x}{x!} = 1 \tag{D-2}$$

The mean and variance are derived in a similar fashion as for the binomial random variable.

$$\mu = \sum_{x=0}^{\infty} x f(x) = \sum_{x=1}^{\infty} x\, \frac{e^{-\lambda t} (\lambda t)^x}{x!} = \sum_{x=1}^{\infty} \frac{e^{-\lambda t} (\lambda t)^x}{(x-1)!}$$
$$= \sum_{y=0}^{\infty} \frac{e^{-\lambda t} (\lambda t)^{y+1}}{y!} = \sum_{y=0}^{\infty} \frac{e^{-\lambda t} (\lambda t)(\lambda t)^y}{y!} = \lambda t \underbrace{\sum_{y=0}^{\infty} \frac{e^{-\lambda t} (\lambda t)^y}{y!}}_{\text{Equation (D-2)}} = \lambda t$$

$$\sigma^2 = \sum_{x=0}^{\infty} x^2 f(x) - \mu^2 = \left[ \sum_{x=0}^{\infty} x(x-1) f(x) + \sum_{x=0}^{\infty} x f(x) \right] - \mu^2$$
$$= \sum_{x=2}^{\infty} x(x-1)\, \frac{e^{-\lambda t} (\lambda t)^x}{x!} + \mu - \mu^2 = \sum_{x=2}^{\infty} \frac{e^{-\lambda t} (\lambda t)^x}{(x-2)!} + \mu - \mu^2$$
$$= (\lambda t)^2 \sum_{y=0}^{\infty} \frac{e^{-\lambda t} (\lambda t)^y}{y!} + \mu - \mu^2 = (\lambda t)^2 + (\lambda t) - (\lambda t)^2 = \lambda t$$
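A similar check works for the Poisson results. The sketch below (the value of λt is arbitrary) truncates the infinite sums well past λt, computing each pmf term by recursion to avoid large factorials:

from math import exp

lam_t = 4.2                       # the product lambda*t
terms = []
f = exp(-lam_t)                   # f(0)
for x in range(60):               # terms beyond x = 60 are negligible here
    terms.append(f)
    f *= lam_t / (x + 1)          # recursion: f(x+1) = f(x) * lam_t / (x+1)

mu = sum(x * f for x, f in enumerate(terms))
var = sum(x * x * f for x, f in enumerate(terms)) - mu**2
print(sum(terms), mu, var)        # approximately 1, 4.2, 4.2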

S² is an Unbiased Estimator of σ²

Let a population be described by a random variable X with density function f(x), mean E[X] = µ, and
variance V[X] = σ². Assume that a random sample of size n is drawn from this population. For each
random variable Xᵢ in the random sample,

$$E[X_i] = \mu_{X_i} = \mu \tag{D-3}$$
$$V[X_i] = \sigma^2_{X_i} = \sigma^2 \tag{D-4}$$

We use the definition of S² in Equation (6.2) to compute E[S²].

$$E[S^2] = E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] \tag{D-5}$$

We begin by expanding the summation expression.

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \left( X_i^2 - 2X_i\bar{X} + \bar{X}^2 \right) = \sum_{i=1}^{n} X_i^2 - \sum_{i=1}^{n} 2X_i\bar{X} + \sum_{i=1}^{n} \bar{X}^2$$
$$= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + \bar{X}^2 \sum_{i=1}^{n} 1$$
$$= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\cdot n\bar{X} + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$

Thus, we can rewrite Equation (D-5) as

$$E[S^2] = \frac{1}{n-1}\, E\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]$$

The right-hand side expands by applying Equation (3.10), yielding

$$E[S^2] = \frac{1}{n-1}\, E\left[ \sum_{i=1}^{n} X_i^2 \right] - \frac{n}{n-1}\, E[\bar{X}^2]$$

Now, by Equation (5.13) with c₁ = c₂ = · · · = cₙ = 1,

$$E\left[ \sum_{i=1}^{n} X_i^2 \right] = E[X_1^2 + \cdots + X_n^2] = E[X_1^2] + \cdots + E[X_n^2] = \sum_{i=1}^{n} E[X_i^2] \tag{D-6}$$

Thus,

$$E[S^2] = \frac{1}{n-1} \sum_{i=1}^{n} E[X_i^2] - \frac{n}{n-1}\, E[\bar{X}^2] \tag{D-7}$$

Using the definition of the variance of the random variable Xᵢ,

$$V[X_i] = E[(X_i - \mu_{X_i})^2] = E[X_i^2] - E[2X_i\mu_{X_i}] + E[\mu_{X_i}^2] = E[X_i^2] - 2\mu_{X_i}^2 + \mu_{X_i}^2 = E[X_i^2] - \mu_{X_i}^2$$

The equation can be rewritten for E[Xᵢ²].

$$E[X_i^2] = V[X_i] + \mu_{X_i}^2 \tag{D-8}$$

If we replace Xᵢ by X̄ in Equation (D-8), we obtain

$$E[\bar{X}^2] = V[\bar{X}] + \mu_{\bar{X}}^2$$

which can be rewritten as

$$E[\bar{X}^2] = \frac{\sigma^2}{n} + \mu^2 \tag{D-9}$$

We use Equations (D-3), (D-4), (D-8) and (D-9) to simplify (D-7).

$$E[S^2] = \frac{1}{n-1} \sum_{i=1}^{n} E[X_i^2] - \frac{n}{n-1}\, E[\bar{X}^2]$$
$$= \frac{1}{n-1} \sum_{i=1}^{n} \left( V[X_i] + \mu_{X_i}^2 \right) - \frac{n}{n-1} \left( \frac{\sigma^2}{n} + \mu^2 \right)$$
$$= \frac{1}{n-1} \sum_{i=1}^{n} (\sigma^2 + \mu^2) - \frac{n}{n-1} \left( \frac{\sigma^2}{n} + \mu^2 \right)$$
$$= \frac{1}{n-1} (n\sigma^2 + n\mu^2) - \frac{1}{n-1}\sigma^2 - \frac{n}{n-1}\mu^2$$
$$= \frac{n}{n-1}\sigma^2 + \frac{n}{n-1}\mu^2 - \frac{1}{n-1}\sigma^2 - \frac{n}{n-1}\mu^2$$
$$= \left( \frac{n}{n-1} - \frac{1}{n-1} \right) \sigma^2 = \sigma^2$$
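The identity E[S²] = σ² can also be illustrated (though of course not proved) by simulation. The Python sketch below, with an arbitrarily chosen normal population, shows that dividing by n − 1 reproduces σ² on average, while dividing by n underestimates it by the factor (n − 1)/n:

import random

mu, sigma, n, reps = 10.0, 3.0, 5, 200_000
random.seed(1)

sum_unbiased = sum_biased = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_unbiased += ss / (n - 1)     # S^2 as defined in Equation (6.2)
    sum_biased += ss / n             # biased alternative

print(sum_unbiased / reps)   # about 9.0 = sigma^2
print(sum_biased / reps)     # about 7.2 = (n-1)/n * sigma^2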

Shortest Interval With Degree of Confidence (1 − α)


Let the unknowns u and v be the endpoints of the interval satisfying

$$P[u < Z < v] = 1 - \alpha$$

for a given α where 0 < α < 1. It is clear that v is a function of u (or the other way around) given the
condition above.

Let L = v − u be the length of the interval. Minimizing L requires setting dL/du = 0 and solving for
the unknown u. In this case,

$$\frac{dL}{du} = \frac{dv}{du} - 1$$

Hence,

$$\frac{dv}{du} = 1 \tag{D-10}$$

Now,

$$1 - \alpha = P[u < Z < v] = F(v) - F(u)$$

where

$$F(z) = \int_{-\infty}^{z} f(t)\, dt$$

and f(z) is the standard normal density function. Differentiating with respect to u,

$$\frac{d}{du}(1 - \alpha) = \frac{d}{du}\left( F(v) - F(u) \right) = f(v)\frac{dv}{du} - f(u)$$

Since the left-hand side is zero, it follows that

$$f(v)\frac{dv}{du} = f(u) \tag{D-11}$$

Combining Equations (D-10) and (D-11),

$$f(v) = f(u)$$

or

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v^2} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}$$

which leads to the equation

$$v^2 = u^2$$

whose solutions are

$$u = v \quad \text{and} \quad u = -v$$

When u = v,

$$P[u < Z < v] = P[v < Z < v] = 0$$

indicating that it is not a solution. On the other hand, when u = −v, the confidence limits have the
same numerical values but opposite signs. We take u < 0 and v > 0.

Because f(z) is symmetric about z = 0, the area α is split equally between the two “tails”, each with
an area of α/2. The value of v is denoted by z_{α/2}, and we write

$$P[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$$
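A numerical illustration of this result: for each left endpoint u we can solve F(v) = F(u) + (1 − α) for v and tabulate the interval length. With 1 − α = 0.95, the length in the Python sketch below (our own illustration, using simple bisection and the error-function form of the standard normal CDF) bottoms out at the symmetric choice u = −z₀.₀₂₅ ≈ −1.96:

from math import erf, sqrt

F = lambda z: 0.5 * (1 + erf(z / sqrt(2)))    # standard normal CDF

def v_given_u(u, p=0.95):
    lo, hi = u, 10.0                          # bisection for F(v) = F(u) + p
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(mid) - F(u) < p else (lo, mid)
    return lo

for u in (-3.0, -2.5, -1.96, -1.7):
    v = v_given_u(u)
    print(f"u = {u:6.2f}   v = {v:.4f}   length = {v - u:.4f}")
# lengths: 4.658, 4.208, 3.920 (minimum), 4.246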

Probability Calculation with Student t-distribution
The density function of the T random variable with ν = n − 1 degrees of freedom is

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \cdot \left( \frac{x^2}{\nu} + 1 \right)^{-(\nu+1)/2}$$

Let

$$k = \frac{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}{\Gamma\left(\frac{\nu+1}{2}\right)}$$

where the value of k can be obtained using a “calculator expression,” so that

$$P\left[ T_{(\nu)} < t \right] = 0.5 - \frac{1}{k} \int_{t}^{0} \left( \frac{x^2}{\nu} + 1 \right)^{-(\nu+1)/2} dx, \quad t < 0 \tag{D-12}$$

can be determined with a scientific calculator.

If ν = 1, 3, 5, . . ., then ν/2 is a half-integer. Repeated use of Property (ii) of the gamma function will
reduce ν/2 to 1/2 (see Example 6.9(b) on page 99).

$$\Gamma\left(\tfrac{\nu}{2}\right)\sqrt{\nu\pi} = \left(\tfrac{\nu}{2}-1\right)\left(\tfrac{\nu}{2}-2\right)\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\,\Gamma\left(\tfrac{1}{2}\right)\sqrt{\nu\pi} = \frac{(\nu-2)(\nu-4)\cdots(3)(1)}{(2)(2)\cdots(2)(2)}\, \pi\sqrt{\nu}$$

On the other hand, (ν + 1)/2 is a positive integer. By Property (iii) of the gamma function,

$$\Gamma\left(\tfrac{\nu+1}{2}\right) = \left(\tfrac{\nu+1}{2}-1\right)! = \left(\tfrac{\nu-1}{2}\right)!$$

Therefore,

$$k = \frac{(\nu-2)(\nu-4)\cdots(3)(1)\,\pi}{(2)(2)\cdots(2)(2)\,\left(\frac{\nu-1}{2}\right)!}\,\sqrt{\nu}$$

When ν = 2, 4, 6, . . ., ν/2 is a positive integer and

$$\Gamma\left(\tfrac{\nu}{2}\right)\sqrt{\nu\pi} = \left(\tfrac{\nu}{2}-1\right)!\,\sqrt{\nu\pi}$$

On the other hand, (ν + 1)/2 is a half-integer,

$$\Gamma\left(\tfrac{\nu+1}{2}\right) = \left(\tfrac{\nu+1}{2}-1\right)\left(\tfrac{\nu+1}{2}-2\right)\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\,\Gamma\left(\tfrac{1}{2}\right) = \frac{(\nu-1)(\nu-3)\cdots(3)(1)}{(2)(2)\cdots(2)(2)}\,\sqrt{\pi}$$

and

$$k = \frac{\left(\frac{\nu}{2}-1\right)!\,(2)(2)\cdots(2)(2)}{(\nu-1)(\nu-3)\cdots(3)(1)}\,\sqrt{\nu}$$

Both expressions for k can be expressed as

$$k = 2\sqrt{\nu} \int_{0}^{\pi/2} (\cos x)^{\nu-1}\, dx \tag{D-13}$$

The integral is called the Wallis integral formula.
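For a concrete degree of freedom, the two routes to k can be compared directly, and Equation (D-12) can then be evaluated by elementary numerical integration. The Python sketch below (ν = 5 and t = −2 are arbitrary choices; the midpoint rule stands in for the calculator's integration key) illustrates this:

from math import cos, pi, sqrt, factorial

nu = 5   # odd case

# k from the factorial form: (nu-2)(nu-4)...(3)(1) pi sqrt(nu)
# divided by 2^((nu-1)/2) ((nu-1)/2)!
num = 1
for m in range(nu - 2, 0, -2):
    num *= m
k_fact = num * pi * sqrt(nu) / (2 ** ((nu - 1) // 2) * factorial((nu - 1) // 2))

# k from the Wallis integral, Equation (D-13), by the midpoint rule
N = 20000
h = (pi / 2) / N
k_wallis = 2 * sqrt(nu) * sum(cos((i + 0.5) * h) ** (nu - 1) for i in range(N)) * h

print(k_fact, k_wallis)          # both about 2.6343

# P[T_(5) < -2] from Equation (D-12)
t = -2.0
h = -t / N
integral = sum(((t + (i + 0.5) * h) ** 2 / nu + 1) ** (-(nu + 1) / 2)
               for i in range(N)) * h
print(0.5 - integral / k_fact)   # about 0.051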

Probability Calculation with Chi-squared distribution


The density function of the chi-squared random variable χ² is

$$f(x) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{2^{\nu}}}\, \sqrt{x^{\nu-2}}\, e^{-x/2}$$

Let

$$k = \Gamma\left(\tfrac{\nu}{2}\right)\sqrt{2^{\nu}}$$

where a “calculator expression” for k is desired such that

$$P\left[ \chi^2_{(\nu)} < x^2 \right] = \frac{1}{k} \int_{0}^{x^2} \sqrt{x^{\nu-2}}\, e^{-x/2}\, dx \tag{D-14}$$

can be determined using a scientific calculator. If ν = 2, 4, 6, . . ., ν/2 is a positive integer. Hence, by
Property (iii) of the gamma function,

$$\Gamma\left(\tfrac{\nu}{2}\right) = \left(\tfrac{\nu}{2}-1\right)!$$

On the other hand, if ν = 1, 3, 5, . . ., then ν/2 is a half-integer.

$$\Gamma\left(\tfrac{\nu}{2}\right) = \left(\tfrac{\nu}{2}-1\right)\left(\tfrac{\nu}{2}-2\right)\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\,\sqrt{\pi}$$

which is equivalent to

$$\frac{(\nu-1)!\,\sqrt{\pi}}{2^{\nu-1}\left(\frac{\nu-1}{2}\right)!}$$

Therefore, an expression for k is

$$k = \begin{cases} \dfrac{(\nu-1)!}{\left(\frac{\nu-1}{2}\right)!}\sqrt{\dfrac{\pi}{2^{\nu-2}}}, & \nu = 1, 3, 5, \ldots \\[2ex] \left(\dfrac{\nu}{2}-1\right)!\,\sqrt{2^{\nu}}, & \nu = 2, 4, 6, \ldots \end{cases} \tag{D-15}$$
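The ν = 2 case gives a convenient test of Equations (D-14) and (D-15), since P[χ²₍₂₎ < c] = 1 − e^{−c/2} in closed form. The Python sketch below (our own illustration, again using the midpoint rule, with the upper limit written as c) also evaluates an odd-ν case:

from math import exp, sqrt, pi, factorial

def k_value(nu):
    # Equation (D-15)
    if nu % 2 == 1:
        return factorial(nu - 1) / factorial((nu - 1) // 2) * sqrt(pi / 2 ** (nu - 2))
    return factorial(nu // 2 - 1) * sqrt(2 ** nu)

def chi2_cdf(c, nu, steps=20000):
    # Equation (D-14): (1/k) * integral from 0 to c of sqrt(x^(nu-2)) exp(-x/2) dx
    h = c / steps
    total = sum(sqrt(((i + 0.5) * h) ** (nu - 2)) * exp(-(i + 0.5) * h / 2)
                for i in range(steps))
    return total * h / k_value(nu)

print(chi2_cdf(3.0, 2), 1 - exp(-1.5))   # both about 0.7769
print(chi2_cdf(3.0, 5))                  # about 0.3000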
Index

Bayes' theorem, 24, 25
Bernoulli process, 39
bias, 90
Central Limit Theorem, 95
Chebyshev's theorem, 37
combination, 10
confidence interval, 106
    large-sample, 111
confidence level, 106
continuous distribution
    chi-squared, 100, 114
    exponential, 63
    normal, 56
    standard normal, 59
    Student's t, 101, 109
    uniform, 55
continuous random variable, 30, 51
    cumulative distribution function, 53
    mean of, 54
    probability density function, 51
    standard deviation, 54
    variance of, 54
correlation, 77
covariance, 76
degree of confidence, 106
DeMorgan's laws, 6
discrete distribution, 30
    binomial, 40
    geometric, 45
    hypergeometric, 43
    negative binomial, 44
    Poisson, 42
    uniform, 39
discrete random variable, 30
    cumulative distribution function, 33
    mean of, 35
    probability mass function, 31
    standard deviation, 37
    variance of, 36
events, 5
    complement, 5
    exhaustive, 16
    independent, 21, 22
    intersection, 5
    mutually exclusive, 6
    union, 5
gamma function, 99
hypothesis, 132
    alternative, 132
    null, 132
hypothesis testing, 131
    one-sided, 136
    two-sided, 136
joint distribution, 69
    bivariate, 69
    conditional mean, 75
    conditional probability, 73
    conditional variance, 75
    independent random variables, 75
    linear combination, 79
    marginal distribution, 72
    probability density function, 71
    probability mass function, 70
level of significance, 134
margin of error, 108
mean squared error, 93
memoryless property, 42, 64
minimum variance unbiased estimator, 91
p-value, 135
parameter, 87
partition, 9
permutation, 6
    circular, 8
    of a subset, 7
    of similar elements, 9
point estimate, 89
point estimator, 90
Poisson process, 42
pooled variance, 123
prediction error, 117
probability, 13
    addition rules, 16, 17
    conditional, 18
    histogram, 32
    product rules, 19–22
    total, 23, 24
random sample, 88
random variable, 29
    binomial, 40
    chi-squared, 100
    continuous uniform, 55
    discrete uniform, 39
    exponential, 63
    function of, 38
    geometric, 45
    hypergeometric, 43
    linear function, 39
    negative binomial, 44
    normal, 56
    Poisson, 42
    standard normal, 59
    Student's t, 101
relative frequency, 15, 35
reproductive property, 81
sample mean, 88, 94
sample space, 3
    continuous, 4
    discrete, 4
sample variance, 88, 99
sampling distribution, 94
standard error, 93
statistic, 88
step function, 33
test statistic, 138, 141, 144, 146
tolerance interval, 118
    coverage, 118
tree diagram, 4, 23
unbiased estimator, 90
