
CBE 301

Application of Numerical
and Statistical Techniques
in Chemical and Biomolecular Engineering

Lecture Notes

Part 2


© Manolis Doxastakis
Chemical and Biomolecular Engineering
University of Tennessee, Knoxville, 2021
How To Use These Notes:
• As the cover page suggests, this text is a set of Lecture Notes, NOT a textbook!
Suggested titles of textbooks will be provided by the instructor and you can find
additional resources online and in the library.
• A large number of sources were used, including textbooks as well as notes written
by Prof. M. Nikolaou (Univ. of Houston).
• Certain topics covered in detail in textbooks are presented herein rather telegraphically
while others are elaborated on, particularly when they refer to material not often
covered in textbooks.
• In many places throughout the notes, space has been intentionally left blank for the
student to fill in the missing material and thereby work through the topic. That is
frequently done during lecture time.
• In other places, assignments are given as homework not to hand in (marked HWNTHI).
• Additional practice problems from past exams will be provided.
• The examples have been carefully selected to cover a variety of problems reflecting
the evolving nature of chemical and biomolecular engineering. While the emphasis is
on numerical methods, the physical picture is also important.
• There are several basic software tools used throughout: MATLAB, Mathematica,
and Excel. The student should be familiar with computational tools along with the
mathematical and programming principles of computation. Each tool does certain
tasks particularly well and may be adequate for others; therefore you are most efficient
when you learn how to use multiple tools.
• The code included with some examples is intentionally kept simple, to illustrate
concepts. Professional code is a lot more complicated, although the numerical recipe
involved is usually not very different. The emphasis herein is to critically demonstrate
applications of the discussed mathematical methods rather than to teach the technical
use of the software.
• The nature of the material requires active participation of the student. Therefore,
study and perform the numerical examples on your own and expand problems by
altering parameters and methods.

Notation:
◦ Uppercase, boldface: Matrices. e.g. M
◦ Lowercase, boldface: vectors. e.g. v
◦ Lowercase, italics: scalars. e.g. f
Contents
1 INTRODUCTION: STATISTICS AND PROBABILITY CONCEPTS
2 PROBABILITY
  2.1 Sample spaces and events
  2.2 Combination rules
  2.3 Probability laws
  2.4 Conditional probability and independent events
3 RANDOM VARIABLES AND DISTRIBUTIONS
  3.1 Discrete distributions
    3.1.1 Calculations with discrete distribution functions
    3.1.2 Expected value and other parameters of a discrete distribution
    3.1.3 The geometric distribution
    3.1.4 The binomial distribution
    3.1.5 The Poisson distribution
  3.2 Continuous distributions
    3.2.1 Expected value and other parameters of a continuous distribution
    3.2.2 Graphical representation of mean, median, and mode of distribution
    3.2.3 The normal distribution
  3.3 Joint distributions
    3.3.1 Discrete 2-D random variables and distributions
    3.3.2 Continuous 2-D random variables and distributions
4 STATISTICAL INFERENCE
  4.1 Descriptive statistics
  4.2 Graphical methods for data description
    4.2.1 Histograms
    4.2.2 Box-and-whisker plots (Box-plots)
    4.2.3 Pie charts
  4.3 Analytical methods for data representation
    4.3.1 Sample Statistics
  4.4 The Central Limit Theorem
  4.5 Point estimation
    4.5.1 Estimators for µ, p and σ²
  4.6 Interval estimation
    4.6.1 Estimate confidence interval for µ
    4.6.2 Estimate confidence interval for p
    4.6.3 Estimate confidence interval for σ, σ²
  4.7 Hypothesis testing
    4.7.1 Possible Errors In Testing A Statistical Hypothesis
    4.7.2 Significance Testing
    4.7.3 Hypothesis and significance tests on µ
    4.7.4 Hypothesis and significance tests on proportion p
    4.7.5 Hypothesis and significance tests on σ², σ
  4.8 Comparing population parameters
    4.8.1 Point estimation of µ1 − µ2
    4.8.2 Simplified confidence interval on µ1 − µ2 (or p1 − p2) for large samples
    4.8.3 Confidence interval on µ1 − µ2
    4.8.4 Hypothesis and significance tests on σ1² − σ2²
    4.8.5 Paired observations
5 REGRESSION AND CORRELATION
  5.1 Linear regression = linear least squares
    5.1.1 Properties of least-squares estimators
    5.1.2 Confidence intervals and hypothesis testing in linear least squares
    5.1.3 Correlation
  5.2 Multiple linear regression
  5.3 General least squares
  5.4 Polynomial least squares
  5.5 Multiple linear least squares
  5.6 Confidence intervals in multiple linear regression
6 STATISTICAL QUALITY CONTROL
  6.1 Background
  6.2 Quality, hypothesis testing, and Shewhart charts
  6.3 Process capability and six-sigma

1 INTRODUCTION: STATISTICS AND PROBABILITY CONCEPTS


Example 1 Distribution of properties in a population∗

Figure 1: Graphical representation methods for Particle Size Distribution; Histogram and Cumu-
lative Arithmetic Curve


Figure 2: Observed residence time distributions (RTD). (a) RTD for near plug-flow reactor; (b)
RTD for near perfectly mixed CSTR, (c) Packed-bed reactor, (d) RTD for packed-bed reactor in
(c), (e) CSTR with short-circuiting flow (bypass) and dead zone, (f) RTD for CSTR in (e).


∗ H. Brittain, Pharmaceutical Technology, 2002 and H. S. Fogler, Elements of Chemical Reaction Engineering


[Plot: Maxwell-Boltzmann molecular speed distribution for the noble gases He-4, Ne-20, Ar-40, Xe-132; probability density (s/m) vs. speed (m/s)]

Figure 3: Maxwell-Boltzmann speed distribution in noble gases (from en.wikipedia.org/wiki/Image:MaxwellBoltzmann.gif#file)

Example 2 Sampling from a population, test of proportion

A paper company receives 200,000 cartons from a supplier. It has been agreed originally
that the shipment of cartons should contain no more than 10 percent defective items. In
practice the quality-assurance group could sample n cartons, selected randomly, and either
accept or reject the shipment according to the number of defective items found in the sample.
How should such a test be designed, and what are the errors associated with the choice made?

Example 3 Determine the thickness of a film

Two samples were selected from different locations in a plastic film sheet. The thickness of
the respective samples was measured at 10 close but equally spaced points as:

Sample 1
1.473 1.484 1.484 1.425 1.448 1.367 1.276 1.485 1.390 1.350

Sample 2
1.310 1.501 1.485 1.435 1.348 1.417 1.500 1.469 1.474 1.452

Is the thickness between the two samples significantly different?
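One way to check, previewing the two-sample tests of Chapter 4 (a sketch, assuming MATLAB's Statistics Toolbox provides ttest2):

MATLAB
% Two-sample t-test on the film-thickness data at the 5% significance level
s1 = [1.473 1.484 1.484 1.425 1.448 1.367 1.276 1.485 1.390 1.350];
s2 = [1.310 1.501 1.485 1.435 1.348 1.417 1.500 1.469 1.474 1.452];
[h, pval] = ttest2(s1, s2)   % h = 0 means no significant difference at this level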


Example 4 Corrosion on pipe coatings∗

Pairs of pipes have been buried in 11 different locations to determine corrosion on nonbitu-
minous pipe coatings for underground use. One type includes a lead-coated steel pipe and
the other a bare steel pipe. The extent of corrosion on the 11 pairs has been determined as:

Soil type A B C D E F G H I J K
Lead-coated steel pipe 27.3 18.4 11.9 11.3 14.8 20.8 17.9 7.8 14.7 19.0 65.3
Bare steel pipe 41.4 18.9 21.7 16.8 9.0 19.3 32.1 7.4 20.7 34.4 76.2

Example 5 Parameter Estimation

Gilliland and Sherwood (1934) obtained mass transfer data for the evaporation of nine dif-
ferent liquids falling down the inside of a vertical wet-wall column into a turbulent air stream
flowing counter current to the liquid. They considered several air flow rates. The data in
the following table represent a relatively small sample of the data reported by these authors.

where Sh, Re, and Sc are the Sherwood, Reynolds, and Schmidt dimensionless numbers,
respectively. It is postulated that

Sh = B1 ReB2 ScB3

- What are the best values of the parameters B1, B2, and B3?

- What are confidence margins for these parameters?
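As a preview of Section 5.3 (general least squares): taking logarithms linearizes the postulated model to ln Sh = ln B1 + B2 ln Re + B3 ln Sc. A minimal sketch, assuming column vectors Sh, Re, Sc have been imported from the data table (variable names hypothetical):

MATLAB
% Linearized least-squares fit of Sh = B1*Re^B2*Sc^B3
X = [ones(length(Sh),1), log(Re), log(Sc)];   % design matrix for ln Sh
b = X \ log(Sh);                              % linear least-squares solution
B1 = exp(b(1)), B2 = b(2), B3 = b(3)          % back-transform the intercept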



Perry’s Chemical Engineers’ Handbook, 6th Edition, (1984)


Example 6 Statistical quality control of VCM

Vista Chemical Co. produces VCM in Lake Charles, LA via a continuous process. Finished
VCM is shipped to Vista’s plants in Aberdeen, MS, and Oklahoma City, OK to make poly-
vinylchloride (PVC) resin. One of the final steps in the VCM process is a caustic soda
(NaOH) “wash” to remove by-products. Following the caustic wash, the VCM passes through
a knock-out drum to remove any entrained caustic. Caustic removed from the VCM stream
gradually accumulates in the knock-out drum; the drum is periodically drained, based on
the level as determined visually from a sight glass on the drum. The knock-out drum is not
100% effective in removing entrained caustic from the VCM stream. Above 1.0 ppm, residual
caustic in the finished product severely impacts numerous PVC resin quality parameters.
Residual caustic in the finished VCM stream is sampled 4 times/day and tracked. The upper
specification limit (USL) is 1.0 ppm.

• What criteria should be used to monitor this process and avoid exceeding the 1.0 ppm
limit, after 15 data points have been collected (see below)?

• Did you consider any potential error in your measurements?


2 PROBABILITY
Course Learning Objectives
Explain the concept of discrete, continuous, joint probability distributions

• Understand the mathematical concept of probability, sample space and events

• Apply combination rules for events and enumerate permutations and combinations
when sample space/events known

• Know the probability laws

• Understand the concept of conditional probability and independence

• Apply Bayes’ theorem

Probability can be interpreted in three ways:

• Personal (level of belief). Example: percent chance of success for a first-time venture, e.g. selling lettuce on the web.

• Relative frequency. Example: out of 1000 plastic cups, 4 were defective; the chance of finding defective cups in future batches is ≈ 4/1000. In general,

P[event A] ≈ (# of times event A occurred) / (# of times experiment was run)

• Classical. Example: the probability of getting a 3 after tossing a die is 1/6 (assuming all six numbers are equally likely (?) to appear). In general,

P[A] ≈ (# of ways event A can occur) / (# of ways experiment can proceed)

- How do you test for loaded dice?


2.1 Sample spaces and events


Definition 1: Sample spaces, points and events

S = Sample Space
= Set of all possible outcomes for the experiment.
Sample point = an element of S
Event = any subset of S.
Impossible Event = empty space ∅

Example 7 Sample space and events for tossing of two dice

Sample Space = { 1/1, 1/2, 1/3, 1/4, 1/5, 1/6,
                 2/1, 2/2, 2/3, . . . ,
                 3/1, . . . ,
                 4/1, . . . ,
                 5/1, . . . ,
                 6/1, 6/2, 6/3, 6/4, 6/5, 6/6 }

Sample Point: 3/6
Sample Point: 6/3
Sample Point: 1/2

Event: {1/6}

Event: Getting at least one 6 = {1/6, 2/6, 3/6, 4/6, 5/6, 6/6, 6/1, 6/2, 6/3, 6/4, 6/5}

Event: Getting a sum of 7 = {1/6, 2/5, 3/4, 4/3, 5/2, 6/1}

Event: Getting 1 from one die and 6 from the other = {1/6, 6/1}


2.2 Combination rules


• Venn diagrams: http://en.wikipedia.org/wiki/Venn_diagram

Definition 2: Event A or B

A ∪ B (Union) Graphically:

Definition 3: Event A and B

A ∩ B (Intersection) Graphically:

Definition 4: Event not A

A′ (Complement with respect to S) Graphically:

Example 8 Combining events

• Event: Getting 7 or 11 = (Getting 7) ∪ (Getting 11)=


{1/6, 2/5, 3/4, 4/3, 5/2, 6/1} ∪ {5/6, 6/5} =
{1/6, 2/5, 3/4, 4/3, 5/2, 6/1, 5/6, 6/5}
P=8/36, if all points are equiprobable
• Event: Getting ≥ 8 and ≤ 10 = (Getting ≥ 8) ∩ (Getting ≤ 10)=
{2/6, 3/5, 4/4, 5/3, 6/2, 3/6, 4/5, 5/4, . . . , 6/6} ∩ {1/1, 1/2, 1/3, . . . , 4/6, 5/5, 6/4} =
{2/6, 3/5, 4/4, 5/3, 6/2, 3/6, 4/5, 5/4, 6/3, 4/6, 5/5, 6/4}
P=12/36, if all points are equiprobable
• Event: Getting any sum between 2 and 12 = S
• Event: not (Getting 7) = S - (Getting 7)= S - {1/6, 2/5, 3/4, 4/3, 5/2, 6/1}
P=(36-6)/36, if all points are equiprobable

Definition 5: Mutually exclusive events


Events A, B mutually exclusive ⇔ A ∩ B = ∅
Events A1, A2, . . . , An mutually exclusive ⇔ Ai ∩ Aj = ∅ for all i ≠ j in [1, n]

CAUTION: Do not confuse mutually exclusive events with independent events


Example 9 Mutually exclusive events


A=Getting a 7= { } B=Getting an 8= { }
A∩B =∅
When possible outcomes of an experiment are (believed to be) equally likely,

P[A] = (# of ways A can occur) / (# of ways experiment can proceed)

- How to compute # of ways experiment can proceed?


Definition 6: Permutation
Arrangement of objects in a definite order.

Definition 7: Combination
Selection of objects without regard to order.

Example 10 Permutations of 3 distinct objects


Andy, Bruce and Carol have a block of three tickets for a Volunteers game. In how many
different ways can they sit to watch the game?

Permutations of 3 objects: ABC ACB


BAC BCA
CAB CBA
Example 11 Permutations of 2 distinct objects out of 5
Andy, Bruce, Carol, Diane and Elaine are traveling in the same car. In how many different
ways can they arrange seating in the front two seats?

Permutations of 2 objects out of 5: AB AC AD AE BC BD BE CD CE DE


BA CA DA EA CB DB EB DC EC ED
Example 12 Combinations of 2 objects out of 5
Bubba’s Pizza offers five different toppings A, B, C, D and E. How many combinations of
two toppings are there?

Combinations of 2 objects out of 5: AB AC AD AE


BC BD BE
CD CE
DE


Theorem 1: Number of permutations of n distinct objects

n Pn = n!

Theorem 2: Number of permutations of r out of n distinct objects

n Pr = n!/(n − r)! = n(n − 1)(n − 2) · · · (n − r + 1)

Theorem 3: Number of combinations of r out of n objects


 
n Cr = (n choose r) = n!/(r!(n − r)!)

Example 13 Example 10 revisited

By Theorem 1,
3 P3 = 3! = 6

Example 14 Example 11 revisited

By Theorem 2,
5 P2 = 5!/3! = 20

Example 15 Example 12 revisited

By Theorem 3,
n = 5, r = 2 ⇒ n Cr = 5!/(2! 3!) = (5·4·3·2·1)/((2·1)(3·2·1)) = 10
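These counts are easily verified with MATLAB's built-in factorial and nchoosek:

MATLAB
% Counting permutations and combinations (Examples 13-15)
nPn = factorial(3)                  % 3P3 = 6 seatings
nPr = factorial(5)/factorial(5-2)   % 5P2 = 20 front-seat arrangements
nCr = nchoosek(5, 2)                % 5C2 = 10 topping pairs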


2.3 Probability laws


Axioms (=Claims, considered to be true) of probability:

1. P [S] = 1

2. P [A] ≥ 0 ∀A ⊂ S

3. P[A1 ∪ A2 ∪ A3 ∪ . . .] = P[A1] + P[A2] + P[A3] + . . . for every finite or infinite collection
{A1, A2, A3, . . .} of subsets of S that are mutually disjoint (Ai ∩ Aj = ∅ if i ≠ j)

Graphically:

Example 16 Laws of probability

Blood type distribution in the US:


A: 41%, B: 9%, AB: 4%, O: 46%
S={A, B, AB, O}
Probability of a person having A, B, or AB:
= P [A1 ∪ A2 ∪ A3 ] = P [A1 ] + P [A2 ] + P [A3 ] = 0.41 + 0.09 + 0.04 = 0.54
where A1 = {A}, A2 = {B}, and A3 = {AB}.

Theorem 4: Probability of impossible event is zero

P [∅] = 0

Proof: S = S ∪ ∅ ⇒ P [S] = P [S ∪ ∅] ⇒ P [S] = P [S] + P [∅] ⇒ P [∅] = 0


Theorem 5: Probability of complementary events

P [A0 ] = 1 − P [A]

Proof: S = A0 ∪ A ⇒ P [S] = P [A0 ∪ A] ⇒ 1 = P [A0 ] + P [A] ⇒ P [A0 ] = 1 − P [A]

Graphically:

Theorem 6: General addition rule

P[A1 ∪ A2] = P[A1] + P[A2] − P[A1 ∩ A2]

P[A1 ∪ A2 ∪ A3] = P[A1] + P[A2] + P[A3] − P[A1 ∩ A2] − P[A1 ∩ A3] − P[A2 ∩ A3]
                + P[A1 ∩ A2 ∩ A3]

Graphically:

Example 17 Connect two engines in parallel or in serial mode?


Given:
P[engine 1 works] = 0.9   (event A1)
P[engine 2 works] = 0.9   (event A2)
P[both engines work] = 0.81   (event A1 ∩ A2)

P[engine 1 or engine 2 works] = ?

P[A1 ∪ A2] = P[A1] + P[A2] − P[A1 ∩ A2] = 0.9 + 0.9 − 0.81 = 0.99


2.4 Conditional probability and independent events


Definition 8: Conditional probability
Probability [event A2 will occur if (or given) A1 has occurred] is defined as

P[A2|A1] ≜ P[A1 ∩ A2] / P[A1]

Graphically:

Theorem 7: The multiplication rule


P [A1 ∩ A2 ] = P [A2 |A1 ]P [A1 ]

Definition 9: Independent events


A1, A2 independent ⇔ P[A1 ∩ A2] = P[A1] P[A2]

Example 18 Are lead and mercury conc. in water independent?


Seawater sample analysis results:
EVENTS: A1 : Toxic levels of Pb are found GIVEN: P [A1 ] = 0.32
A2 : Toxic levels of Hg are found P [A2 ] = 0.16
A1 ∩ A2 : Toxic levels of both Pb and Hg are found P [A1 ∩A2 ] = 0.10

• Are A1 , A2 independent?

P[A1] P[A2] = (0.32)(0.16) = 0.05
P[A1 ∩ A2] = 0.10
⇒ not independent

• What is the probability (Toxic levels of Pb or Hg are found)?

P [A1 ∪ A2 ] = P [A1 ] + P [A2 ] − P [A1 ∩ A2 ] = 0.32 + 0.16 − 0.10 = 0.38

• What is the probability (Toxic levels of Pb are found after toxic levels of Hg are found)?

P[A1|A2] = P[A2 ∩ A1] / P[A2] = 0.10/0.16 = 0.63


Theorem 8: Probabilities of independent events


 
A1, A2 independent ⇔ { P[A2|A1] = P[A2] if P[A1] ≠ 0, and P[A1|A2] = P[A1] if P[A2] ≠ 0 }

Definition 10: Multiple independent events

A1, A2, . . . , Am independent ⇔ P[A1 ∩ A2 ∩ . . . ∩ Am] = P[A1] P[A2] · · · P[Am]

Theorem 9: Bayes’ theorem


Let A1 ∪ A2 ∪ . . . ∪ An = S, Ai ∩ Aj = ∅ for i ≠ j, and P[B] ≠ 0. Then

P[B] = Σ_{i=1}^{n} P[B|Ai] P[Ai]

P[Aj|B] = P[B|Aj] P[Aj] / P[B] = P[B|Aj] P[Aj] / Σ_{i=1}^{n} P[B|Ai] P[Ai]   for all j = 1, 2, . . . , n

Graphically (sets, trees):


Example 19 BAYES’ Theorem

EVENTS: A: Individual has type A blood


B: Individual has type B blood
AB: Individual has type AB blood
O: Individual has type O blood
TA: Individual is identified as having type A blood after lab analysis

GIVEN: P [A] = 0.41, P [TA|A] = 0.88


P [B] = 0.09, P [TA|B] = 0.04
P [AB] = 0.04, P [TA|AB] = 0.10
P [O] = 0.46, P [TA|O] = 0.04

• What is P[TA]?

P[TA] = P[TA|A] P[A] + P[TA|B] P[B] + P[TA|AB] P[AB] + P[TA|O] P[O] = 0.39

• What is P[A|TA]?

P[A|TA] = P[TA|A] P[A] / P[TA] = 0.3608/0.3868 = 0.93
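A quick numerical check of the two calculations above (base MATLAB):

MATLAB
% Total probability and Bayes' rule for the blood-typing example
pBlood = [0.41 0.09 0.04 0.46];    % P[A], P[B], P[AB], P[O]
pTA    = [0.88 0.04 0.10 0.04];    % P[TA|A], P[TA|B], P[TA|AB], P[TA|O]
PTA = sum(pTA .* pBlood)           % P[TA]   = 0.3868
PAgivenTA = pTA(1)*pBlood(1)/PTA   % P[A|TA] = 0.9328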

HWNTHI: Why is P[A] + P[B] + P[AB] + P[O] = 1 but
P[TA|A] + P[TA|B] + P[TA|AB] + P[TA|O] ≠ 1?

Graphically (sets,trees) :


3 RANDOM VARIABLES AND DISTRIBUTIONS


Course Learning Objectives
Explain the concept of discrete, continuous, joint probability distributions
State the significance of the Central Limit Theorem and its relationship to the Normal
Distribution

• Understand the concept of discrete/continuous random variables and distributions.

• Evaluate the probability, cumulative probability, expected value and variance of a variable given the density distribution.

• Understand 2D density functions and the concepts of marginal densities, independence, covariance and correlation.

• Formulate the conditional density in 2D density function applications.

Definition 11: Random variable


A variable whose value is determined by chance (i.e. can take values from an event space S)
Example 20 Random variables
X=number of defective items in a production batch of 1000 items.
Observed numerical value of X is x. Possible values of x: 0 ≤ x ≤ 1000.
X=number of days of the year that plant emissions do not meet federal or state standards.
Observed numerical value of X is x. Possible values of x : 0 ≤ x ≤ 365.
Definition 12: Discrete random variable
A random variable that can assume a finite or countably infinite (HWNTHI: What does
that mean?) number of possible values (i.e. values can be indexed by integers 1, 2, 3, . . . , ∞)
Example 21 Discrete random variable
X=number of days a machine is functional.
Possible values of x : 0 ≤ x, i.e. x = 0, 1, 2, 3, . . . , ∞.
Definition 13: Continuous random variable
A random variable that can assume values from an interval of the real numbers.
Example 22 Continuous random variables
X=Fraction of Atlanta-Knoxville route a car covers with 1 gallon of gasoline: 0 ≤ x ≤ 1.
X=Temperature in a reactor. 0 ≤ x.


3.1 Discrete distributions


Definition 14: Discrete density function (DDF)

f(x) ≜ P[X = x]    (1)

(AKA Probability function, Distribution function, Distribution, Probability distribution,
Probability mass function, Probability density function, Density function, Density)

- Required Properties:

f(x) ≥ 0,    Σ_{all x} f(x) = 1

Example 23 Discrete density function


f(x) = (1/2)^x for x = 1, 2, 3, . . .;   f(x) = 0 otherwise

• Is f(x) a DDF?

f(x) ≥ 0,   Σ_{all x} f(x) = Σ_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1    (How?)

• Read about Geometric series


• P [X = 3] =?

Figure 4: Example 23 f(x) in an Excel spreadsheet for the first ten values of x where f(x) ≠ 0

• Check:
Mathematica
Sum[(1/2)^x, {x, 1, Infinity}]


Definition 15: Discrete cumulative distribution function

F(x) ≜ P[X ≤ x]    (2)

Theorem 10: Connection between distribution and cumulative distributions


The discrete random variable X can take values x1, x2, . . .. Then:

F(xn) = Σ_{x ≤ xn} f(x)    (3)

and

f(xn) = F(xn) − F(xn−1)    (4)

Proof: Based on the laws of probabilities

Example 24 Discrete Cumulative Distribution for Example 23

F(x) = Σ_{k=1}^{x} (1/2)^k = (1/2) + (1/2)² + (1/2)³ + . . . + (1/2)^{x−1} + (1/2)^x

Figure 5: Example 24 F(x) in an Excel spreadsheet for the first ten values of x

• Check that f (X = 4) = F (4) − F (3)


3.1.1 Calculations with discrete distribution functions

Example 25 Example 23 and Example 24 revisited

• P[X ≥ 6] = ?

P[X ≥ 6] = P[{X = 6} ∪ {X = 7} ∪ {X = 8} ∪ {X = 9} ∪ {X = 10} ∪ . . .]
= P[X = 6] + P[X = 7] + P[X = 8] + P[X = 9] + P[X = 10] + . . .

or, with A = {X ≥ 6},

P[X ≥ 6] = 1 − P[A′] = 1 − P[X ≤ 5] = 1 − F(5) = 1 − 0.96875 = 0.03125

• P [3 ≤ X ≤ 5] =?
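A quick numerical check of the complement-rule result above (base MATLAB; the last bullet is left as an exercise):

MATLAB
% P[X >= 6] for f(x) = (1/2)^x via the complement rule
F5 = sum((1/2).^(1:5))   % F(5) = 0.96875
PX6 = 1 - F5             % P[X >= 6] = 0.03125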

Example 26 Probability of imperfections

The probability distribution of X, the number of imperfections per 10 meters of a synthetic


fiber in continuous rolls of uniform width, is given by:

x 0 1 2 3 4 5
f (x) 0.40 0.36 0.16 0.05 0.02 0.01

The cumulative distribution function is calculated as:

x 0 1 2 3 4 5
F (x) 0.40 0.76 0.92 0.97 0.99 1

• P [X ≤ 2] =?
• P [X ≥ 0] =?
• P [1 ≤ X ≤ 3] =?
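The cumulative row of the table is just a running sum of the density row; a minimal sketch (the bullet questions then reduce to reading off F):

MATLAB
% Build F(x) from f(x) with a cumulative sum
x  = 0:5;
fx = [0.40 0.36 0.16 0.05 0.02 0.01];
Fx = cumsum(fx)   % 0.40 0.76 0.92 0.97 0.99 1.00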

3.1.2 Expected value and other parameters of a discrete distribution

- Why? To obtain information about a distribution (and therefore about the probabilities of
the values of the random variable X) by knowing just a few parameters.

Example 27 Parameters that describe a distribution

Per capita income, Rich-Poor income gap, grade point average, points per game, average
pore size.


Definition 16: Expected value (mean, average) of discrete random variable


The expected value (mean, average) of the discrete random variable X with DDF f is

µ ≡ µX ≡ E[X] ≜ Σ_{all x} x f(x)    (5)

provided Σ_{all x} |x| f(x) < ∞

Example 28 Average value of X that follows the distribution function in Example 23

µ = Σ_{x=1}^{∞} x f(x) = 1(1/2) + 2(1/2)² + 3(1/2)³ + . . . = (1/2)/(1 − 1/2)² = 2
• Check:

Mathematica
Sum[a^x, {x, 1, Infinity}]
Sum[x*a^x, {x, 1, Infinity}]

Theorem 11: Linearity of expected value

E[cX] = cE[X]
E[X + Y ] = E[X] + E[Y ] (6)

Theorem 12: Expected value of constant is constant

E[c] = c (7)

Definition 17: Expected value of function of discrete random variable


The expected value of a function H(X) of the random variable X with DDF f is

E[H(X)] ≜ Σ_{all x} H(x) f(x)    (8)

provided Σ_{all x} |H(x)| f(x) < ∞

-19-
CBE301 Lecture Notes - Part 2 Manolis Doxastakis

Definition 18: Variance of discrete random variable


The variance of the discrete random variable X with DDF f and average µ is

Var(X) ≡ σ² ≜ E[(X − µ)²]    (9)

Example 29 Variance of X that follows the distribution function in Example 23

σ² = Σ_{x=1}^{∞} (x − µ)² f(x) = (1 − 2)²(1/2) + (2 − 2)²(1/2)² + (3 − 2)²(1/2)³ + . . . = 2

Theorem 13: Relationship between average and variance


σ² ≜ E[X²] − (E[X])²    (10)

Proof:

Theorem 14: Properties of variance

Var(c) = 0
Var(cX) = c² Var(X)
X, Y independent ⇒ Var(X + Y) = Var(X) + Var(Y)

Proof:

Definition 19: Standard deviation of a random variable


The standard deviation of the random variable X with DDF f and average µ is

σ ≜ √Var(X) ≡ √σ²    (11)

HWNTHI: Is σX+Y = σX + σY?


Example 30 Average and variance in Example 26

• µ = Σ_{x=0}^{5} x f(x) = 0.96 imperfections on average per 10 meters of synthetic fiber

• Variance σ² = Σ_{x=0}^{5} (x − µ)² f(x) = 1.0984

• Standard deviation σ = √σ² = 1.048

• Alternative formula: σ² = Σ_{x=0}^{5} x² f(x) − µ² = 2.02 − 0.96² = 1.0984

HWNTHI: What is the value of E[X − µ], that is Σ_{x=0}^{5} (x − µ) f(x)?

MATLAB
x = [0:5];
fx = [0.4,0.36,0.16,0.05,0.02,0.01];
mu = sum(x.*fx)
sigma2 = sum((x-mu).*(x-mu).*fx)
sigma2alt = sum(x.*x.*fx) - mu*mu

Mathematica
x = Range[0, 5];
fx = {0.4, 0.36, 0.16, 0.05, 0.02, 0.01};
mu = x . fx
sigma2 = ((x - mu)^2) . fx
sigma2alt = (x^2) . fx - mu*mu

• Careful how software performs element-by-element operations on vectors/matrices


HWNTHI: Make sure you learn how to import data (i.e. a vector) from files


3.1.3 The geometric distribution

Example 31 The probability of a lucky streak

After several experiments, a microchip manufacturer found that the probability of producing
a defective wafer D is p = 0.010. After starting the production line, the manufacturer wants
to know what is the probability of producing a defective D wafer exactly after 1, 2, 3, . . . etc
good wafers G have been produced.

S={D, GD, GGD, GGGD, . . .}

X can take values D, GD, GGD . . . or for convenience 1, 2, 3, . . .

P[X = 1] ≜ P[D] = p = 0.010
P[X = 2] ≜ P[GD] = (1 − p)p = (0.990)(0.010) = 0.0099
P[X = 3] ≜ P[GGD] = (1 − p)(1 − p)p = (1 − p)²p = (0.990)²(0.010)
. . .
P[X = x] ≜ P[GG . . . G D] = (1 − p)^{x−1} p    (x − 1 G's followed by D)

Definition 20: Geometric distribution

- Bernoulli trials

- Outcome of each Bernoulli trial is one of two alternatives with probabilities p, q = 1−p
(0 < p < 1)

- X =number of Bernoulli trial at which alternative 1 occurs for the first time

Theorem 15: Density function, average and variance of geometric distribution

f(x) = (1 − p)^{x−1} p,  x = 1, 2, 3, . . . ,   µ = 1/p,   σ² = (1 − p)/p²

HWNTHI: Is Σ_{all x} f(x) = 1?


Example 32 Geometric distribution for different values of p using software

MATLAB
x=1:1:15;
p=0.3; y03=p*(1-p).^(x-1);
p=0.5; y05=p*(1-p).^(x-1);
p=0.8; y08=p*(1-p).^(x-1);

figure % Make three plots in one figure
subplot(311); h1=stem(y03); legend('p=0.3'); set(h1,'Marker','square')
subplot(312); h2=stem(y05); legend('p=0.5'); set(h2,'Marker','o')
subplot(313); h3=stem(y08); legend('p=0.8'); set(h3,'Marker','*')

m03=sum(x.*y03) % Expected values for variable x
m05=sum(x.*y05)
m08=sum(x.*y08) % why is m03 not exactly 1/p?

Hx03=(x-m03).^2;
Hx05=(x-m05).^2;
Hx08=(x-m08).^2;
var03=sum(Hx03.*y03) % Variance of x (E[(X-m)^2])
var05=sum(Hx05.*y05)
var08=sum(Hx08.*y08)

% Alternative computation of variance (E[X^2]-(E[X])^2)
avar03=sum(x.^2.*y03)-(sum(x.*y03))^2
avar05=sum(x.^2.*y05)-(sum(x.*y05))^2
avar08=sum(x.^2.*y08)-(sum(x.*y08))^2

[Stem plots of the geometric density for p = 0.3, 0.5, 0.8, produced by the code above]

HWNTHI: Plot the geometric distribution in Excel or Mathematica


3.1.4 The binomial distribution

Example 33 Bernoulli trials


A coin has probability p = 0.60 to land heads up (H) when tossed. What is the probability
of getting exactly 2 heads in 5 tosses? (Recall: n Cr from Theorem 3)

P[X = HHTTT or HTHTT or HTTHT or HTTTH
  or THHTT or THTHT or THTTH
  or TTHHT or TTHTH
  or TTTHH]
= P[X = HHTTT] + P[X = HTHTT] + P[X = HTTHT] + P[X = HTTTH]
+ P[X = THHTT] + P[X = THTHT] + P[X = THTTH]
+ P[X = TTHHT] + P[X = TTHTH]
+ P[X = TTTHH]
= 10 (0.6)(0.6)(0.4)(0.4)(0.4) = 0.23 = 5C2 (0.60)² (1 − 0.60)^{5−2}

Definition 21: Binomial distribution

- Fixed number n of Bernoulli trials

- Outcome of each Bernoulli trial is one of two alternatives with probabilities p, 1 − p


(0 < p < 1)

- X =number of Bernoulli trials out of n that result in alternative one

Theorem 16: Density function, average and variance of binomial distribution


 
f(x) = nCx p^x (1 − p)^{n−x} = [n!/((n − x)! x!)] p^x (1 − p)^{n−x},   x = 0, 1, 2, 3, . . . , n    (12)

µ = np,   σ² = np(1 − p)

HWNTHI: Is Σ_{all x} f(x) = 1?
(Hint: Σ_{all x} f(x) = Σ_{all x} nCx p^x (1 − p)^{n−x} = (p + (1 − p))^n = . . .)


Example 34 Quality of production

The probability that an item manufactured by a certain machine is defective is 5% (p = 0.05).


What is the probability that a set of 5 items will not contain more than 1 defective item?
n = 5, p = 0.05
X = number of defective items out of 5

P[X ≤ 1] = P[{0} ∪ {1}] = f(0) + f(1)    (0 or 1 defective)
= 5C0 (0.05)^0 (0.95)^5 + 5C1 (0.05)^1 (0.95)^4 = 0.7738 + 0.2036 = 0.9774

What is the expected number of defective items in a lot of 1000 items produced by the above
machine?
µ = np = (1000)(0.05) = 50

What is the probability that the number of defective items in a lot of 1000 items produced by
the previous machine will be in the intervals [µ − σ, µ + σ], [µ − 2σ, µ + 2σ], [µ − 3σ, µ + 3σ]?

σ 2 = np(1 − p) = (1000)(0.05)(1 − 0.05) = 47.5 ⇒ σ = 6.892


P [µ − σ ≤ X ≤ µ + σ] = P [44 ≤ X ≤ 56] = f (44) + . . . + f (56) = 0.655
P [µ − 2σ ≤ X ≤ µ + 2σ] = P [37 ≤ X ≤ 63] = f (37) + . . . + f (63) = 0.950
P [µ − 3σ ≤ X ≤ µ + 3σ] = P [30 ≤ X ≤ 70] = f (30) + . . . + f (70) = 0.997
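The same numbers can be verified with the Statistics Toolbox functions binopdf/binocdf, assuming that toolbox is available:

MATLAB
% Binomial checks for Example 34
binocdf(1, 5, 0.05)                    % P[X <= 1] = 0.9774
n = 1000; p = 0.05;
binocdf(56, n, p) - binocdf(43, n, p)  % P[44 <= X <= 56] ~ 0.655
binocdf(63, n, p) - binocdf(36, n, p)  % P[37 <= X <= 63] ~ 0.950
binocdf(70, n, p) - binocdf(29, n, p)  % P[30 <= X <= 70] ~ 0.997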

Empirical Rule: For almost all distributions:


P [µ − σ ≤ X ≤ µ + σ] ≈ 0.68
P [µ − 2σ ≤ X ≤ µ + 2σ] ≈ 0.95
P [µ − 3σ ≤ X ≤ µ + 3σ] ≈ 1.0

Theorem 17: Chebyshev’s inequality


Let µ, σ be the average and standard deviation of X. Then for any positive k,

P[µ − kσ < X < µ + kσ] ≥ 1 − 1/k²

HWNTHI: Apply Chebyshev's inequality for k = 1, 2, 3, 4, 5.


Example 35 Shape of binomial distribution

The binomial distribution can be calculated using Eq. 12 (and the cumulative by summing
all previous terms). However, because it is so common, textbooks offer tables (careful, tables
often offer cumulative and the distribution can be calculated using Eq. 4).

[Excel spreadsheet tabulating f(x) and F(x) for x = 0, . . . , 40 with n = 40, p = 0.4, computed with BINOM.DIST, together with charts of f(x) and F(x); f(x) peaks near µ = np = 16]

Figure 6: Binomial distribution and cumulative binomial distribution for n = 40 and p = 0.4
using the Excel function BINOM.DIST


MATLAB
n=40
p=0.25:0.25:0.75
for k=1:3
    for i=1:1:40
        x(i)=i;
        x2(i)=i*i;
        y(i,k)=nchoosek(n,i) * p(k)^i * (1-p(k))^(n-i);
    end
    F(:,k)=cumsum(y(:,k));         % Cumulative distribution
    m=sum(x'.*y(:,k))              % Expected values for variable x
    var=sum(x2'.*y(:,k)) - m*m     % Variance
end

figure
subplot(321); stem(x,y(:,1)); legend('p=0.25')
subplot(322); stem(x,F(:,1))
subplot(323); stem(x,y(:,2)); legend('p=0.50')
subplot(324); stem(x,F(:,2))
subplot(325); stem(x,y(:,3)); legend('p=0.75')
subplot(326); stem(x,F(:,3))

[Stem plots of the binomial density (left) and cumulative distribution (right) for p = 0.25, 0.50, 0.75, produced by the code above]


3.1.5 The Poisson distribution

Approximates binomial distribution when p << 1: Distribution of rare events.

Definition 22: Poisson distribution

- Fixed number n of Bernoulli trials, n >> 1

- Outcome of each Bernoulli trial is one of two alternatives with probabilities p, 1 − p


(0 < p << 1)

- X =number of Bernoulli trials out of n that result in alternative one

Theorem 18: Density function, average and variance of Poisson distribution

f(x) = e^{−k} k^x / x!,   x = 0, 1, 2, 3, . . . ,   k > 0    (13)

µ = k,   σ² = k

HWNTHI: Is Σ_{all x} f(x) = 1? (Hint: Σ_{all x} f(x) = Σ_{x=0}^{∞} e^{−k} k^x/x! = e^{−k} Σ_{x=0}^{∞} k^x/x! = . . .)

Theorem 19: Approximation of the binomial distribution by the Poisson distribution


For n → ∞ with np=const. the binomial distribution is approximately equal to the Poisson
distribution with 0 < k = np, i.e.
 
n  x e−np (np)x
 p (1 − p)n−x ≈
x x!

Example 36 Resistant bacteria in a bacterial population

10^8 bacterial cells multiply once. The probability that a mutant resistant to antibiotics will
result from each multiplication is 10^{−8}. What is the probability that when all 10^8 bacteria
multiply once there will be (a) no resistant mutants, (b) at least one resistant mutant in the
offspring?

n = 10^8, p = 10^{−8} ⇒ k = np = 1 ⇒ f(0) = e^{−1}·1^0/0! = 0.37

P[At least one resistant mutant] = 1 − f(0) = 0.63
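A numerical check (a sketch; poisspdf and binopdf are Statistics Toolbox functions):

MATLAB
% Poisson approximation for the mutation example
k  = 1;                 % k = n*p = 1e8 * 1e-8
f0 = poisspdf(0, k)     % 0.3679: no resistant mutants
1 - f0                  % 0.6321: at least one resistant mutant
binopdf(0, 1e8, 1e-8)   % exact binomial agrees to several digits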


Example 37 Binomial vs. Poisson

[Excel spreadsheet comparing binomial and Poisson densities for x = 0, . . . , 40 with n = 40: p1 = 0.4 (k1 = np1 = 16) and p2 = 0.04 (k2 = np2 = 1.6), together with overlaid charts; the Poisson approximation is close for the rare-event case p2 = 0.04 and poor for p1 = 0.4]

Figure 7: Binomial distribution and Poisson distribution for n = 40 with p1 = 0.4 and p2 = 0.04,
using the Excel functions BINOM.DIST and POISSON.DIST

It is common for textbooks to offer tables of the cumulative Poisson distribution. Again
the density distribution can be calculated using Eq. 4 given tables or directly from the
formula in Eq. 13.


3.2 Continuous distributions


Definition 23: Continuous random variable X

x ∈ I ⊂ ℝ
P[X = x] = 0    (14)

Definition 24: Continuous density function



f : ℝ → ℝ is a density ⇔    (15)

1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
3. P[a ≤ X ≤ b] = ∫_a^b f(x) dx

Example 38 Continuous density function

X = Pb in gasoline

f(x) = 12.5x − 1.25 for 0.1 ≤ x ≤ 0.5;  f(x) = 0 elsewhere

Is ∫_{−∞}^{∞} f(x) dx = 1?

P[0.2 ≤ x ≤ 0.3] = ∫_{0.2}^{0.3} f(x) dx =            = 0.1875

Graphically:


Definition 25: Continuous cumulative distribution (density) function

F(x) ≜ P[X ≤ x]    (16)

Theorem 20: Connection between distributions and cumulative distributions


The continuous random variable X can take values x in an interval of the real line. Then

F(x) = ∫_{−∞}^{x} f(t) dt

and

f(x) = dF/dx
Proof: Based on the laws of probabilities
Graphically:

Example 39 Cumulative distribution for Example 38

f(x) = 12.5x − 1.25, 0.1 ≤ x ≤ 0.5

F(x) = ∫_{−∞}^{x} f(t) dt =
  0,                          x < 0.1
  6.25x² − 1.25x + 0.0625,    0.1 ≤ x ≤ 0.5
  1,                          x > 0.5

P[0.2 ≤ x ≤ 0.3] = F(0.3) − F(0.2)?
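A numerical check that integrating the density agrees with differencing the cumulative distribution (base MATLAB):

MATLAB
% Example 38/39 check: P[0.2 <= X <= 0.3]
f = @(x) 12.5*x - 1.25;
F = @(x) 6.25*x.^2 - 1.25*x + 0.0625;
integral(f, 0.2, 0.3)   % 0.1875
F(0.3) - F(0.2)         % 0.1875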


3.2.1 Expected value and other parameters of a continuous distribution

Why? To obtain information about a distribution by knowing just a few numbers.

Example 40 Parameters that describe a distribution

Per capita income, Rich-Poor income gap, grade point average, average pore size.

Definition 26: Expected value (mean, average) of continuous random variable


The expected value (mean, average) of the random variable X with CDF f is

µ ≡ µX ≡ E[X] ≜ ∫_{−∞}^{∞} x f(x) dx    (17)

provided ∫_{−∞}^{∞} |x| f(x) dx < ∞

Theorem 21: Linearity of expected value

E[cX] = cE[X]
E[X + Y ] = E[X] + E[Y ] (18)

Theorem 22: Expected value of constant is constant

E[c] = c (19)

Definition 27: Expected value of function of continuous random variable


The expected value of a function H(X) of the random variable X with CDF f is

E[H(X)] ≜ ∫_{−∞}^{∞} H(x) f(x) dx    (20)

provided ∫_{−∞}^{∞} |H(x)| f(x) dx < ∞

Definition 28: Variance of continuous random variable


The variance of the continuous random variable X with CDF f and average µ is

Var(X) ≡ σ² ≜ E[(X − µ)²]    (21)

-32-
CBE301 Lecture Notes - Part 2 Manolis Doxastakis

Theorem 23: Relationship between average and variance


σ² = E[X²] − (E[X])²    (22)

Proof:

Theorem 24: Properties of variance

Var(c) = 0    (23)
Var(cX) = c² Var(X)    (24)
X, Y independent ⇒ Var(X + Y) = Var(X) + Var(Y)    (25)

Proof:

Definition 29: Standard deviation of a random variable


The standard deviation of the random variable X with CDF f and average µ is

σ ≜ √Var(X) ≡ √σ²    (26)

HWNTHI: Is σX+Y = σX + σY?

Example 41 Average and variance in Example 38

f(x) = 12.5x − 1.25,  0.1 ≤ x ≤ 0.5

µX ≡ E[X] = ∫_{−∞}^{∞} x f(x) dx =            = 0.3667

Var(X) = E[X²] − (E[X])² = 0.1433 − 0.1345 = 0.0088

σ = √Var(X) = 0.094
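The same moments can be obtained by numerical integration with base MATLAB's integral:

MATLAB
% Mean, variance and standard deviation for Example 38
f    = @(x) 12.5*x - 1.25;
mu   = integral(@(x) x.*f(x), 0.1, 0.5)       % 0.3667
EX2  = integral(@(x) x.^2.*f(x), 0.1, 0.5);   % E[X^2] = 0.1433
varX = EX2 - mu^2                             % 0.0088
sigma = sqrt(varX)                            % 0.094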


3.2.2 Graphical representation of mean, median, and mode of distribution

- Mean (the point of balance of f(x))

Theorem 25: Mean is point of balance

Proof: Definition 26 ⇒ ∫_{−∞}^{∞} (x − µ) f(x) dx = 0

Theorem 26: Mean is (signed) area between F(x) and 1

Proof: Theorem 20: f(x) = dF/dx ⇒ x f(x) = x dF/dx ⇒ µ = ∫_{−∞}^{∞} x (dF/dx) dx = ∫_0^1 x dF

- Median (the point with a 50/50 area split under f(x); F(median) = 0.5)

- Mode (the peak point of f(x), i.e. an inflection point of F(x))


3.2.3 The normal distribution

Definition 30: Density function of normal distribution (Figure 8)

f(x) = (1/(σ√(2π))) e^{−(1/2)[(x−µ)/σ]²}    (27)

where −∞ < x < ∞, −∞ < µ < ∞, σ > 0.


HWNTHI: Check: is ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(1/2)[(x−µ)/σ]²} dx = 1?

(Hint: Compute first the integral I = ∫_{−∞}^{∞} e^{−x²} dx by computing the double integral
I² = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy in polar coordinates, where x = r cos θ, y = r sin θ ⇒ x² + y² = r².
Recall the integration formula for coordinate change in double integrals.)

Theorem 27: Average and variance of normal distribution


If X is normally distributed, then

E[X] = µ
Var(X) = σ²

Proof: Set z = (x − µ)/σ ⇒ dx = σ dz. Then

E[X] = (1/√(2π)) ∫_{−∞}^{∞} (µ + σz) e^{−z²/2} dz = µ (1/√(2π)) ∫_{−∞}^{∞} e^{−z²/2} dz + σ (1/√(2π)) ∫_{−∞}^{∞} z e^{−z²/2} dz = µ

since the first integral equals 1 (why?) and the second equals 0 (why?).

Var(X) = E[(X − µ)²] = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − µ)² e^{−(1/2)[(x−µ)/σ]²} dx

Set z = (x − µ)/σ ⇒ dx = σ dz. Then E[(X − µ)²] = (σ²/√(2π)) ∫_{−∞}^{∞} z² e^{−z²/2} dz = (σ²/√(2π)) ∫_{−∞}^{∞} z · z e^{−z²/2} dz

Integrate by parts with u = z ⇒ du = dz and ν = −e^{−z²/2} ⇒ dν = z e^{−z²/2} dz to get

Var(X) = (σ²/√(2π)) ( [−z e^{−z²/2}]_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−z²/2} dz ) = σ²

since the bracketed term is 0 and the remaining integral equals √(2π).

Notation: X ∼ N(µ; σ²) or X = N(µ; σ²): random variable X follows the normal distribution with mean µ and standard deviation σ.


Definition 31: Standard normal distribution (Figure 8)


Let X = N(µ; σ²). Then the random variable Z ≜ (X − µ)/σ ⇔ X = µ + Zσ is called Standard
Normal.

Theorem 28: Average and variance of standard normal distribution

Z ∼ N (0; 1)

[Plots of the standard normal density f(z) and cumulative F(z), marking F(−3) = 0.00135, F(−2) = 0.02275, F(−1) = 0.1587, F(0) = 0.5000, F(1) = 0.8413, F(2) = 0.9772, F(3) = 0.9987]

Figure 8: Standard normal distribution f(z) and cumulative F(z)

• The probability that X will fall between x1 and x2 can be written as the probability
that Z will be between z1 = (x1 − µ)/σ and z2 = (x2 − µ)/σ for ANY normal distribution:

P[x1 < X < x2] = (1/(σ√(2π))) ∫_{x1}^{x2} e^{−(1/2)[(x−µ)/σ]²} dx = (1/√(2π)) ∫_{z1}^{z2} e^{−z²/2} dz = P[z1 < Z < z2]

• Due to symmetry for ∼ N (0; 1), P [Z ≤ −z] = P [Z ≥ z].

Example 42 Calculations with normal distribution

Let X ∼ N (1; 0.0625) (µ = 1, σ = 0.25). Find P [0.9 ≤ X ≤ 1.5] using tables (Table 1) and
compare to:
MATLAB: normcdf(2,0,1)-normcdf(-0.4,0,1)
Mathematica:
CDF[NormalDistribution[0,1],2]-CDF[NormalDistribution[0,1],-0.4]


Table values: P[Z ≤ z]. Excel command: NORM.DIST(z,0,1,TRUE)

z F(z) z F(z) z F(z) z F(z) z F(z) z F(z) z F(z)


0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987
0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.9778 2.51 0.9940 3.01 0.9987
0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.9783 2.52 0.9941 3.02 0.9987
0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.9788 2.53 0.9943 3.03 0.9988
0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.9793 2.54 0.9945 3.04 0.9988
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946 3.05 0.9989
0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948 3.06 0.9989
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949 3.07 0.9989
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951 3.08 0.9990
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952 3.09 0.9990
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953 3.10 0.9990
0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955 3.11 0.9991
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956 3.12 0.9991
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957 3.13 0.9991
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959 3.14 0.9992
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960 3.15 0.9992
0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961 3.16 0.9992
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962 3.17 0.9992
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963 3.18 0.9993
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964 3.19 0.9993
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965 3.20 0.9993
0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966 3.21 0.9993
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967 3.22 0.9994
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968 3.23 0.9994
0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969 3.24 0.9994
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970 3.25 0.9994
0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971 3.26 0.9994
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972 3.27 0.9995
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973 3.28 0.9995
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974 3.29 0.9995
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974 3.30 0.9995
0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975 3.31 0.9995
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976 3.32 0.9995
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977 3.33 0.9996
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977 3.34 0.9996
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978 3.35 0.9996
0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979 3.36 0.9996
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979 3.37 0.9996
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980 3.38 0.9996
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981 3.39 0.9997
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981 3.40 0.9997
0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982 3.41 0.9997
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982 3.42 0.9997
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983 3.43 0.9997
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984 3.44 0.9997
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984 3.45 0.9997
0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985 3.46 0.9997
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985 3.47 0.9997
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986 3.48 0.9997
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986 3.49 0.9998

Table 1: Cumulative standard normal distribution. Only values for z > 0 are provided since if
z < 0 then P [Z ≤ z] = P [Z ≥ −z] = 1 − P [Z ≤ −z] where −z > 0.


Example 43 Shape of normal distribution with MATLAB

MATLAB
% Plot three normal distributions
% with mean at zero and different standard deviation
x = -10:0.1:10;
subplot(121);
plot(x,normpdf(x,0,1),'r',x,normpdf(x,0,2),'b',x,normpdf(x,0,3),'g')
title('Normal distributions with different \sigma'); % title after plot so it is not erased
legend('\sigma=1','\sigma=2','\sigma=3')
subplot(122);
plot(x,normcdf(x,0,1),'r',x,normcdf(x,0,2),'b',x,normcdf(x,0,3),'g')
title('Cumulative normal distributions with different \sigma');
legend('\sigma=1','\sigma=2','\sigma=3')

[Plots of the three normal densities (left) and cumulative distributions (right) for σ = 1, 2, 3]

Theorem 29: The normal distribution approximates the binomial distribution


Normal distribution = limit of the binomial as n → ∞, with µ ≈ np, σ ≈ √(np(1 − p)).

HWNTHI: Using Excel, plot the binomial distribution for (a) n = 20, p = 0.05, (b)
n = 200, p = 0.05, (c) n = 1000, p = 0.05, (d) n = 5000, p = 0.05, and compare it to the
normal distribution with corresponding average and variance.


Example 44 Calculations of cumulative normal distribution with MATLAB

MATLAB
% Plot standard normal distribution ranges
% 1 sigma, 2 sigma and 3 sigma
x = -4:0.1:4;
x3 = -3:0.1:3; % 3 sigma range
x2 = -2:0.1:2; % 2 sigma range
x1 = -1:0.1:1; % 1 sigma range
figure
subplot(121); title('3\sigma, 2\sigma and 1\sigma ranges'); hold on
plot(x,normpdf(x,0,1),'k')
area(x3,normpdf(x3,0,1),'FaceColor','r')
area(x2,normpdf(x2,0,1),'FaceColor','b')
area(x1,normpdf(x1,0,1),'FaceColor','g')
hold off
subplot(122); title('Cumulative standard normal distribution'); hold on
plot(x,normcdf(x,0,1),'k')
plot(x,ones(size(x))*normcdf(+0,0,1),'k') % horizontal reference levels
plot(x,ones(size(x))*normcdf(-3,0,1),'r')
plot(x,ones(size(x))*normcdf(+3,0,1),'r')
plot(x,ones(size(x))*normcdf(-2,0,1),'b')
plot(x,ones(size(x))*normcdf(+2,0,1),'b')
plot(x,ones(size(x))*normcdf(-1,0,1),'g')
plot(x,ones(size(x))*normcdf(+1,0,1),'g')
hold off

[Left: standard normal density with shaded 1σ, 2σ, 3σ ranges; right: cumulative standard normal distribution with the corresponding levels]


3.3 Joint distributions


3.3.1 Discrete 2-D random variables and distributions

Definition 32: Discrete 2-D random variable


X, Y discrete random variables ⇒ (X, Y ) two-dimensional discrete random variable

Definition 33: Discrete density function of 2-D discrete random variable

fXY(x, y) ≜ P[X = x and Y = y]

Necessary and sufficient conditions for fXY to be a DDF:

1. fXY(x, y) ≥ 0
2. Σ_{all x} Σ_{all y} fXY(x, y) = 1

Example 45 2D discrete density function

Task 1: Welding two joints


Task 2: Tightening three bolts
X = number of defective welds
Y = number of improperly tightened bolts

X\Y      0      1      2      3     fX(x) = Σ_{all y} fXY(x, y)
0      0.840  0.030  0.020  0.010   0.900
1      0.060  0.010  0.008  0.002   0.080
2      0.010  0.005  0.004  0.001   0.020
fY(y) = Σ_{all x} fXY(x, y):
       0.910  0.045  0.032  0.013   1.000

P [no errors] = P [X = 0 and Y = 0] = fXY (0, 0) = 0.840

P [exactly one error] = P [(X = 1 and Y = 0) or (X = 0 and Y = 1)]


= fXY (1, 0) + fXY (0, 1) = 0.060 + 0.030 = 0.090


Definition 34: Discrete marginal densities

fX(x) ≜ Σ_{all y} fXY(x, y)
fY(y) ≜ Σ_{all x} fXY(x, y)

Example 46 Discrete marginal densities in Example 45

See previous page.

3.3.2 Continuous 2-D random variables and distributions

Definition 35: Continuous 2-D random variable


X, Y continuous random variables ⇒ (X, Y) two-dimensional continuous random variable

Definition 36: Continuous density function of 2-D continuous random variable


Necessary and sufficient conditions for fXY to be a 2-D CDF:

1. fXY(x, y) ≥ 0, −∞ < x < ∞, −∞ < y < ∞
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dy dx = 1
3. P[a ≤ X ≤ b and c ≤ Y ≤ d] = ∫_a^b ∫_c^d fXY(x, y) dy dx

Example 47 2-D CDF



fXY(x, y) = c for 8.5 < x < 10.5, 120 < y < 240;  fXY(x, y) = 0 elsewhere

∫_{8.5}^{10.5} ∫_{120}^{240} c dy dx = 1 ⇒            ⇒ c = 1/240

P[9 ≤ x ≤ 10 and 125 ≤ y ≤ 140] =            = 15/240

Graphically:
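A numerical check with base MATLAB's integral2 (the integrand must accept arrays):

MATLAB
% Normalization and probability for the uniform 2-D density, c = 1/240
f = @(x,y) (1/240)*ones(size(x));
integral2(f, 8.5, 10.5, 120, 240)   % 1 (normalization)
integral2(f, 9, 10, 125, 140)       % 15/240 = 0.0625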


Definition 37: Continuous marginal densities

fX(x) ≜ ∫_{−∞}^{∞} fXY(x, y) dy
fY(y) ≜ ∫_{−∞}^{∞} fXY(x, y) dx

Definition 38: Independent random variables (discrete & continuous)

X, Y independent ⇔ fXY(x, y) = fX(x) fY(y)

Example 48 Marginal densities & independent variables, Example 45

For Example 45: fXY(0, 0) = 0.84 but fX(0) fY(0) = (0.9)(0.91) = 0.819 ⇒ X, Y not independent.

For Example 47: fXY(x, y) = 1/240, 8.5 ≤ x ≤ 10.5, 120 ≤ y ≤ 240

fX(x) = ∫_{120}^{240} (1/240) dy = 1/2
fY(y) = ∫_{8.5}^{10.5} (1/240) dx = 2/240

⇒ fX(x) fY(y) = (1/2)(2/240) = 1/240 = fXY(x, y) ⇒ X, Y independent.

Definition 39: Expected value of function of 2-D random variable


- X, Y discrete random variables:

E[H(X, Y)] ≜ Σ_{all x} Σ_{all y} H(x, y) fXY(x, y)

provided Σ_{all x} Σ_{all y} |H(x, y)| fXY(x, y) < ∞

- X, Y continuous random variables:

E[H(X, Y)] ≜ ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) fXY(x, y) dy dx

provided ∫_{−∞}^{∞} ∫_{−∞}^{∞} |H(x, y)| fXY(x, y) dy dx < ∞


Example 49 Expected values for Example 45

E[X] = = 0.12
E[Y ] = = 0.148
E[X + Y ] = = 0.268
E[XY ] = = 0.064

Example 50 Expected values for Example 47

E[X] = = 9.5
E[Y ] = = 180
E[X + Y ] =?
E[XY ] = = 1710

Definition 40: Covariance of two random variables


X, Y random variables (discrete or continuous):

σXY = Cov(X, Y) ≜ E[(X − µX)(Y − µY)]

Theorem 30: Covariance of two random variables

σXY = Cov(X, Y ) = E[XY ] − E[X]E[Y ]

Example 51 Covariance for variables of Example 45

Cov(X, Y ) = = 0.046

Example 52 Covariance for variables of Example 47

Cov(X, Y ) = =0

Theorem 31: Variance of the sum of two random variables


X, Y random variables

Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)    (28)

Contrast to Eq. 25


Theorem 32: Covariance and independence between random variables


X, Y independent ⇒ Cov(X, Y ) = 0

Proof: By Theorem 30, Cov(X, Y) =               ;
by Definition 38, X, Y independent ⇒ fXY(x, y) =               .

CAUTION: Cov(X, Y) = 0 ⇏ X, Y independent

Example 53 Cov(X, Y) = 0 ⇏ X, Y independent

Y\X      -2     -1      1      2    fY(y)
1         0    1/4    1/4     0     1/2
4        1/4    0      0     1/4    1/2
fX(x)    1/4   1/4    1/4    1/4     1

E[Y] = 5/2, E[X] = 0, E[XY] = 0 ⇒ Cov(X, Y) = 0

But X, Y are not independent, i.e. fXY(x, y) ≠ fX(x) fY(y). In fact, Y = X².

Definition 41: (Pearson) Correlation coefficient between X, Y


X, Y random variables (discrete or continuous):

ρXY ≜ Cov(X, Y) / √(Var(X) Var(Y))

Theorem 33: Cauchy-Schwarz inequality for correlation coefficient

−1 ≤ ρXY ≤ 1

or
|ρXY | ≤ 1

Proof: For any W, Z random variables, α real number, we have

E[(αW − Z)2 ] ≥ 0 ⇒
E[α2 W 2 − 2αW Z + Z 2 ] ≥ 0 ⇒
α2 E[W 2 ] − 2αE[W Z] + E[Z 2 ] ≥ 0


Let α = E[WZ]/E[W²]. Then, from the above inequality,

E[WZ]²/E[W²] − 2 E[WZ]²/E[W²] + E[Z²] ≥ 0 ⇒ −E[WZ]²/E[W²] + E[Z²] ≥ 0 ⇒ E[WZ]² / (E[W²] E[Z²]) ≤ 1

Now let W = X − µX, Z = Y − µY to get ρ²XY ≤ 1.

Theorem 34: Perfectly correlated random variables are linearly dependent

|ρXY| = 1 ⇔ Y = β0 + β1 X,  β0, β1 ∈ ℝ,  β1 ≠ 0

Proof: |ρXY| = 1 ⇒ ρ²XY = 1 ⇒ . . . ⇒ . . .

⇒ E[(αW − Z)²] = 0 ⇒ αW − Z = 0, with W = X − µX and Z = Y − µY,

⇒ αX − αµX − Y + µY = 0 ⇒ Y = α X + (µY − αµX), i.e. β1 = α and β0 = µY − αµX

Graphically: [scatter plots of (x, y) data with ρ near +1, ρ near −1, ρ near 0, and a nonlinear relation]

Example 54 Variables of Example 53

 Y \ X    −2     −1     1      2     | fY(y)
 1        0      1/4    1/4    0     | 1/2
 4        1/4    0      0      1/4   | 1/2
 fX(x)    1/4    1/4    1/4    1/4   | 1

E[Y] = 5/2, E[X] = 0, E[XY] = 0 ⇒ Cov(X, Y) = 0 ⇒ ρXY = 0. But Y = X²!

[Plot: the four points (x, y) = (−2, 4), (−1, 1), (1, 1), (2, 4) lie on the parabola y = x², yet ρ = 0.]

ρXY = 0 ⇒ X, Y uncorrelated ≠ unrelated: X, Y are simply not related linearly.

Example 55 Covariance and correlation for Example 45

fXY(x, y):

 X \ Y    0      1      2      3     | fX(x)
 0        0.840  0.030  0.020  0.010 | 0.900
 1        0.060  0.010  0.008  0.002 | 0.080
 2        0.010  0.005  0.004  0.001 | 0.020
 fY(y)    0.910  0.045  0.032  0.013 | 1.000

E[X²] = . . . = 0.16
E[Y²] = . . . = 0.29
E[X] = . . . = 0.12
E[Y] = . . . = 0.148
Var(X) = E[X²] − E[X]² = . . . = 0.146
Var(Y) = E[Y²] − E[Y]² = . . . = 0.268
Cov(X, Y) = E[XY] − E[X]E[Y] = . . . = 0.046

⇒ ρXY ≜ Cov(X, Y)/√(Var(X) Var(Y)) = 0.046/√((0.146)(0.268)) = 0.23
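All the moments above follow directly from the joint table; a minimal MATLAB sketch (the matrix fXY copied from the table) is:

MATLAB
% Moments of the discrete joint distribution of Example 55 (a sketch).
fXY = [0.840 0.030 0.020 0.010;
       0.060 0.010 0.008 0.002;
       0.010 0.005 0.004 0.001];          % rows: x = 0,1,2; columns: y = 0,1,2,3
x = (0:2)'; y = (0:3)';
fX = sum(fXY,2); fY = sum(fXY,1)';        % marginal densities
EX = x'*fX;   EY = y'*fY;                 % 0.12 and 0.148
EX2 = (x.^2)'*fX; EY2 = (y.^2)'*fY;       % 0.16 and 0.29
EXY = x'*fXY*y;                           % 0.064
covXY = EXY - EX*EY;                      % 0.046
rho = covXY/sqrt((EX2-EX^2)*(EY2-EY^2))   % 0.23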


Definition 42: Conditional density

fX|y(x) ≜ fXY(x, y)/fY(y)        fY|x(y) ≜ fXY(x, y)/fX(x)

For discrete random variables:

fX|y(x) = P[X = x | Y = y] = P[X = x and Y = y]/P[Y = y]

Example 56 Conditional density function for variables of Example 45

fXY(x, y):

 X \ Y    0      1      2      3     | fX(x)
 0        0.840  0.030  0.020  0.010 | 0.900
 1        0.060  0.010  0.008  0.002 | 0.080
 2        0.010  0.005  0.004  0.001 | 0.020
 fY(y)    0.910  0.045  0.032  0.013 | 1.000

fX|y(x), each column of fXY(x, y) divided by fY(y):

 X \ Y    0      1      2      3     | Sum
 0        0.923  0.667  0.625  0.769 | 2.984
 1        0.066  0.222  0.250  0.154 | 0.692
 2        0.011  0.111  0.125  0.077 | 0.324
 Sum      1.000  1.000  1.000  1.000 | 4.000

ÒP Hwnthi: Verify: Σ_{all x} fX|y(x) = 1

fY|x(y), each row of fXY(x, y) divided by fX(x):

 X \ Y    0      1      2      3     | Sum
 0        0.933  0.033  0.022  0.011 | 1.000
 1        0.750  0.125  0.100  0.025 | 1.000
 2        0.500  0.250  0.200  0.050 | 1.000
 Sum      2.183  0.408  0.322  0.086 | 3.000

ÒP Hwnthi: Verify: Σ_{all y} fY|x(y) = 1
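The two conditional tables are just column- and row-normalizations of the joint table; a minimal MATLAB sketch (implicit expansion, R2016b or later):

MATLAB
% Conditional densities from the joint table of Example 56 (a sketch).
fXY = [0.840 0.030 0.020 0.010;
       0.060 0.010 0.008 0.002;
       0.010 0.005 0.004 0.001];
fXgivenY = fXY./sum(fXY,1)   % each column divided by fY(y): columns sum to 1
fYgivenX = fXY./sum(fXY,2)   % each row divided by fX(x): rows sum to 1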


4 STATISTICAL INFERENCE: Estimates and hypothesis testing


Course Learning Objectives

• State the significance of the Central Limit Theorem and its relationship to the normal
distribution
• Calculate confidence intervals for population mean, variance, difference between means,
and proportion from sampling results under a variety of conditions
• Test a hypothesis for population mean, variance, difference between means, and ratio of
variances using sampling results
• Use graphical representations to examine data

4.1 Descriptive statistics


Definition 43: Population

A population is a group of items or events

Example 57 Populations

A batch of catalyst pellets

Definition 44: Random variable X associated with a population

A random variable X associated with a population is a characteristic property that takes


values according to an unknown probability density function f (x).

Example 58 Random variable X associated with a population

Diameter of each of the pellets in population of


Example 57


Definition 45: Sample


A sample is a subset of a population

Example 59 Sample

Two scoops from the batch of pellets of


Example 57

Definition 46: Random sample of size n


Collection of n objects from the entire population, associated with n independent random
variables X1 , X2 , . . . Xn , each with same distribution as X (the random variable for the entire
population).

Example 60 Random Sample

Population: 500 pellets
X: pellet diameter
n = 42 (42 pellets scooped out of 500)
X1: diameter of pellet #1
..
.
X42: diameter of pellet #42

- Which one to scoop?

Sampling is both art and science


Definition 47: Statistic


A function of the random variables X1 , X2 , . . . , Xn constituting a random sample.

Example 61 Statistic for random sample of catalyst pellets

Statistics:

(X1 + . . . + X42)/42
min{X1, . . . , X42}
max{X1, . . . , X42}

- Is X1 − µ a statistic?

Example 62 Checking for defective cartons

Following Example 2 you test 20 cartons (random sample). Each carton can be defective
or non-defective (independent random variable): If X is the number that are defective then
you can form an estimate of the proportion (needs to be less than 10%) as:

X/n = X/20

Outcome: F F T F F F F F T F F F T F F F F F F F
X:       0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0

Table 2: Testing for defective cartons

Another approach is to assign the value 0 to a carton that is not defective (False) and the
value 1 (True) to a defective one. Then your estimate of the proportion is simply the average
of this random variable.

Example 63 Particle size distributions in heterogeneous catalysts

Datye et al.∗ used electron microscopy to examine the particle size distribution in samples of
Pd/Al2 O3 supported metal catalysts and make conclusions about the sintering mechanism.
Suppose that Table 3 presents measurements on 300 such particles. What can we learn?

∗ Datye et al., Catalysis Today, 111, 59 (2006)


148.1 122.0 130.9 133.8 131.3 117.6 131.5 143.4 128.9 124.5 131.8 131.3
163.9 134.0 122.5 125.7 132.1 143.7 128.7 134.9 117.8 177.6 144.9 140.4
132.7 141.9 107.4 114.6 133.5 146.7 155.3 131.9 165.7 123.2 129.6 153.9
121.0 131.8 120.9 116.0 144.7 130.6 156.2 122.6 153.7 135.5 129.7 163.4
125.5 144.1 130.2 150.1 162.0 172.9 114.2 127.9 137.7 162.2 118.9 116.9
139.4 186.9 164.6 136.8 159.1 151.2 143.7 130.2 123.8 129.3 132.7 123.2
152.0 138.8 145.1 137.6 128.6 130.3 136.4 138.5 154.8 116.8 138.7 127.0
131.7 135.7 144.0 124.8 133.3 126.3 122.3 131.0 117.5 141.1 123.3 144.1
137.0 132.6 125.1 136.8 129.7 144.0 136.7 125.7 153.6 119.6 168.2 131.9
172.0 114.9 127.4 161.4 143.0 115.9 127.8 151.3 136.3 122.8 127.0 128.6
134.4 142.8 127.3 127.8 106.6 145.1 134.8 142.0 123.3 137.6 135.1 102.2
126.7 141.9 129.0 144.4 143.6 138.1 129.6 160.3 129.1 147.3 135.5 134.0
137.7 127.7 109.7 120.7 127.6 134.2 124.9 124.9 135.7 169.1 136.8 138.9
111.5 144.2 139.0 153.3 142.2 120.6 119.0 146.8 146.6 139.5 117.9 125.5
134.8 127.1 183.8 134.6 138.1 173.8 140.9 124.3 130.6 141.6 166.5 120.2
118.7 144.0 142.7 146.0 153.2 129.9 137.8 139.1 132.2 133.3 127.2 133.7
153.5 133.4 132.8 119.9 122.4 139.7 153.4 139.1 124.6 156.0 124.4 117.6
154.9 133.2 139.4 113.6 161.4 173.7 128.1 123.4 148.9 138.2 166.0 149.9
135.6 154.6 124.2 133.8 114.0 138.2 134.9 137.9 152.8 122.0 123.4 130.6
143.1 117.9 145.1 167.9 154.7 155.3 114.9 126.5 140.4 124.0 158.3 130.6
154.9 130.5 150.7 154.0 124.9 141.8 112.5 138.1 138.9 143.6 150.5 130.1
137.7 131.2 137.3 148.1 139.6 145.1 134.3 130.8 146.3 165.7 122.4 138.3
139.4 153.1 164.4 154.5 145.3 150.1 120.9 118.1 149.7 146.4 155.5 128.3
118.5 175.9 141.9 117.2 117.7 142.8 134.8 143.3 123.1 146.0 133.2 120.7
148.9 155.5 124.0 123.6 140.9 145.8 121.2 154.7 150.1 142.9 131.2 135.4

Table 3: Measurements of the size in nanometers of 300 catalyst particles.

4.2 Graphical methods for data description


4.2.1 Histograms

Example 64 Particle distribution from Example 63


Minimum value 102.2, maximum value 186.9, bin size 5.

 Bin    Count   Relative frequency
 100      0       0.00
 105      1       0.00
 110      3       0.01
 115      8       0.03
 120     19       0.06
 125     36       0.12
 130     33       0.11
 135     48       0.16
 140     42       0.14
 145     33       0.11
 150     20       0.07
 155     25       0.08
 160      8       0.03
 165      9       0.03
 170      7       0.02
 175      4       0.01
 180      2       0.01
 185      1       0.00
 190      1       0.00
 195      0       0.00
 200      0       0.00
 Total  300

[Histogram: relative frequency vs. particle size (nm), 100–200 nm.]
Figure 9: Histogram of the data in Table 3 using Excel FREQUENCY function.

• What bin size should one use for histograms?


Mathematica
ParticleSizes = Import["lognorm_particle.dat"];
bins1 = BinCounts[ParticleSizes, {90, 200, 10}];
bins2 = BinCounts[ParticleSizes, {90, 200, 2}];
BarChart[bins1, PlotRange -> All, ChartLabels -> Range[90, 200, 10]]
BarChart[bins2, PlotRange -> All, ChartLabels -> Range[90, 200, 2]]

Figure 10: Histograms of frequencies with different bin sizes

Better alternative:
MATLAB
ParticleSizes = importdata('lognorm_particle.dat');
bins1 = 90:10:200; bins2 = 90:1:200;
h1 = histcounts(ParticleSizes, bins1, 'Normalization', 'probability');
h2 = histcounts(ParticleSizes, bins2, 'Normalization', 'probability');
xmid1 = 0.5*(bins1(1:end-1) + bins1(2:end));   % bin centers
xmid2 = 0.5*(bins2(1:end-1) + bins2(2:end));
hold on
bar(xmid1, cumsum(h1), 'b', 'FaceAlpha', 0.2); plot(xmid2, cumsum(h2), 'o-')


Figure 11: Cumulative histograms of relative frequencies with different bin sizes


4.2.2 Box-and-whisker plots (Box-plots)

Example 65 Box-and-whisker plot for Example 63

[Box-and-whisker plot of the particle size data in Table 3; axis roughly 100–180 nm.]

4.2.3 Pie charts

Example 66 Pie chart for proportion of defective cartons from Example 62

ÒP Hwnthi Many graphical methods exist to visualize and assess the distribution. Search
for “Quantile” plots to compare to normal distribution.


4.3 Analytical methods for data representation


4.3.1 Sample Statistics

Definition 48: Sample mean

X̄ ≜ (X1 + . . . + Xn)/n

Remark: µX ≠ X̄
Remark 2: In case of Bernoulli trials, a sample proportion can be calculated as the mean
of a variable that takes values of 0 or 1 (see Example 62)

Definition 49: Sample median

n odd:  the middle value X_{(n+1)/2} of the ordered sample X1, . . . , Xn

n even: the average of the two middle values X_{n/2} and X_{n/2+1} of the ordered sample

Definition 50: Sample variance

S² ≜ Σ_{i=1}^{n} (Xi − X̄)²/(n − 1)

Definition 51: Sample standard deviation

S ≜ √(S²)

Theorem 35: Sample variance

S² = [n Σ_{i=1}^{n} Xi² − (Σ_{i=1}^{n} Xi)²] / [n(n − 1)]
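A minimal MATLAB sketch confirming that this computational formula agrees with the built-in var (which uses the n − 1 denominator by default):

MATLAB
% Check of Theorem 35 against MATLAB's built-in var (a sketch).
x = [3.4 3.6 4.0 0.4 2.0]; n = numel(x);
s2_formula = (n*sum(x.^2) - sum(x)^2)/(n*(n-1))
s2_builtin = var(x)    % the two values agree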

Definition 52: Sample range

R ≜ Xp − Xq,  where Xp = max_{1≤i≤n} {Xi}, Xq = min_{1≤i≤n} {Xi}


4.4 The Central Limit Theorem


- Why many distributions are close to normal

Theorem 36: The Central Limit Theorem: Sample averages are close to normal
Let X1, X2, . . . , Xn be a random sample (i.e. a collection of independent variables, identically
distributed to each other and X; recall Definition 46). Then the statistic (Definition 47)
X̄ = (X1 + . . . + Xn)/n (sample mean, Definition 48) approaches a normal distribution, as
n → ∞, with mean µ and variance σ²/n. Equivalent formulation:

lim_{n→∞} (X̄ − µ)/(σ/√n) ∼ N(0; 1)

Remark: The variables X and X1, X2, . . . , Xn do not have to be normally distributed.

Example 67 Experimental illustration of the Central Limit Theorem

Draw cards from a stack of 52 with reinsertion. Record and plot results as shown below,
assigning to each card the following values (13 cards × 4 suits):

 Card:  A  2  3  4  5  6  7  8  9  10  J   Q   K
 X:     1  2  3  4  5  6  7  8  9  10  10  10  10

(Record sheet: x1  x2  x3  x4  x̄)

• ÒP Hwnthi What is the expected relative frequency for X = 2 and what for X = 10?


Figure 12: Example data from an experiment where 80 cards have been drawn with replacement.
The two histograms present frequencies for observed values of single cards and quadruple averages.


MATLAB
% Draw number of cards with replacement and examine statistics of averages
num_cards = 80;
x = floor(rand(1, num_cards)*13) + 1;  % draw 80 random numbers from 1 to 13
for i = 1:num_cards
    if x(i) > 10
        x(i) = 10;                     % if value above 10 set it to 10 (for J, Q, K)
    end
end
s2x = var(x); sx = std(x);             % variance and standard deviation
sprintf('All cards: var = %f, std = %f \n', s2x, sx)

% Take quadruples of the value of X
j = 1;
for i = 1:4:num_cards-3
    x4(j) = mean(x(i:i+3));            % averages of quadruples (4 x's)
    j = j + 1;
end
s2x4 = var(x4); sx4 = std(x4);
sprintf('Quadruples: var = %f, std = %f \n', s2x4, sx4)

% Take octuples of the value of X
j = 1;
for i = 1:8:num_cards-7
    x8(j) = mean(x(i:i+7));            % averages of octuples (8 x's)
    j = j + 1;
end
s2x8 = var(x8); sx8 = std(x8);
sprintf('Octuples: var = %f, std = %f \n', s2x8, sx8)

% Code below is for plotting
xvec = 1:10;
histx = hist(x, xvec);
csumx = 100*cumsum(histx)/sum(histx);  % calculate and normalize cumulative
histx4 = hist(x4, xvec);
csumx4 = 100*cumsum(histx4)/sum(histx4);
histx8 = hist(x8, xvec);
csumx8 = 100*cumsum(histx8)/sum(histx8);

axis tight;
subplot(2,3,1), stem(xvec, histx);  title('All cards');
subplot(2,3,2), stem(xvec, histx4); title('Quadruples');
subplot(2,3,3), stem(xvec, histx8); title('Octuples');
subplot(2,3,4), stem(xvec, csumx);
subplot(2,3,5), stem(xvec, csumx4);
subplot(2,3,6), stem(xvec, csumx8);

ÒP Hwnthi What are your observations on the mean and variance of X and the averages
of quadruples and octuples?


Figure 13: MATLAB example output showing histograms and cumulative distributions for 80
cards taken as single, quadruples and octuples


Example 68 Illustration of the CLT for uniform distribution

f(x) = 1 for 0 < x < 1; 0 else

µX = . . . ,   σ²X = . . . ,

Computer simulation of 10,000 random samples from the above distribution

n = 2  ⇒ σX̄ = σX/√2 = . . . ,

n = 5  ⇒ σX̄ = σX/√5 = . . . ,

n = 20 ⇒ σX̄ = σX/√20 = . . . ,


MATLAB
% CLT and uniform distribution
num_sample = 10000;
bin_vec = 0.02:0.04:0.98;
x = rand(num_sample, 1);               % draw 10,000 random numbers
std(x)
subplot(1,4,1), hist(x, bin_vec)
legend('n=1');

j = 1;
for i = 1:2:num_sample-1
    x2(j) = mean(x(i:i+1));
    j = j + 1;
end
std(x2)
subplot(1,4,2), hist(x2, bin_vec)
legend('n=2');
j = 1;
for i = 1:5:num_sample-4
    x5(j) = mean(x(i:i+4));
    j = j + 1;
end
std(x5)
subplot(1,4,3), hist(x5, bin_vec)
legend('n=5');
j = 1;
for i = 1:20:num_sample-19
    x20(j) = mean(x(i:i+19));
    j = j + 1;
end
std(x20)
subplot(1,4,4), hist(x20, bin_vec)
legend('n=20');

[Output: histograms of sample means for n = 1, 2, 5, 20; the spread narrows as n grows.]


Example 69 Illustration of the CLT for exponential distribution

f(x) = 5e^{−5x} for x > 0; 0 else

µX = . . . ,   σ²X = . . . ,

Computer simulation of 10,000 random samples from the above distribution

n = 2  ⇒ σX̄ = σX/√2 = . . . ,

n = 5  ⇒ σX̄ = σX/√5 = . . . ,

n = 20 ⇒ σX̄ = σX/√20 = . . . ,


MATLAB
% CLT and 5exp(-5x) distribution
num_sample = 20000;
axis tight;

x = rand(num_sample, 1);      % draw 20,000 uniform random numbers
x = -(1/5)*log(x);            % inverse-transform sampling of the exponential
mean(x)
std(x)
subplot(1,4,1), hist(x, 50)
legend('n=1');

j = 1;
for i = 1:2:num_sample-1
    x2(j) = mean(x(i:i+1));
    j = j + 1;
end
std(x2)
subplot(1,4,2), hist(x2, 50)
legend('n=2');
j = 1;
for i = 1:5:num_sample-4
    x5(j) = mean(x(i:i+4));
    j = j + 1;
end
std(x5)
subplot(1,4,3), hist(x5, 50)
legend('n=5');
j = 1;
for i = 1:20:num_sample-19
    x20(j) = mean(x(i:i+19));
    j = j + 1;
end
std(x20)
subplot(1,4,4), hist(x20, 50)
legend('n=20');

[Output: histograms of sample means for n = 1, 2, 5, 20; the skewed exponential shape approaches a narrowing bell curve as n grows.]


Example 70 Significance of CLT: Are more experimental measurements better than one?

- Unknown distribution f(x) of experimental measurement error for the variable X with
average µ, standard deviation σX.
- Random sample: X̄ = (X1 + . . . + Xn)/n.
- Theorem 36 ⇒ µX̄ = µX, σX̄ = σX/√n
- We can make point estimates and interval estimates with specific confidence level and
perform hypothesis testing on means, variances and proportions.


4.5 Point estimation


Definition 53: Estimator
Statistic used to generate an estimate (number) of population parameter.
Notation: θ̂ estimator of θ. Requirements for θ̂:

1. θ̂ unbiased estimator of θ
E[θ̂] = θ

2. Var(θ̂) → 0 as sample size n → ∞

4.5.1 Estimators for µ, p and σ 2

Theorem 37: Sample average is unbiased estimator of population average

X̄ is an unbiased estimator of µ.

Proof: E[X̄] = E[(X1 + . . . + Xn)/n] = (E[X1] + . . . + E[Xn])/n = nµ/n = µ (≡ µX̄)

Example 71 Estimate of average of stack of cards (Example 67)

Data in Figure 12 → x̄80 = (x1 + . . . + x80)/80 = 6.3 ("True" average µX = 6.538. ÒP Hwnthi: Verify!)

Example 72 Does x̄ (Example 67) appear to be unbiased estimator of µX?

Revisit Example 67. Values of x̄4 (quadruple averages) for 20 experiments appear spread
above and below µX, as shown in the corresponding histogram of x̄4 values.


Theorem 38: Variance of estimator of population average decreases with sample size

Var(X̄) = σ²/n (≡ σ²X̄)

Proof:
Var(X̄) = E[(X̄ − µ)²] = E[((X1 + X2 + . . . + Xn)/n − µ)²]
        = (1/n²) E[(X1 + X2 + . . . + Xn − nµ)²]
        = (1/n²) E[((X1 − µ) + (X2 − µ) + . . . + (Xn − µ))²] = (1/n²) nσ² = σ²/n

(the cross terms vanish because the Xi are independent).

Theorem 39: Sample average is BLUE (Best Linear Unbiased Estimator)


Of all unbiased estimators of µ that use a linear combination of data, X̄ has the smallest
variance.

Example 73 Sample average spread for card experiment (Example 67)

σ²X̄4 ≡ Var(X̄4) = σ²X/4  ⇒  σX̄4 = σX/2

Compare spread of X and X̄4 below:


Theorem 40: Estimator of a proportion in Bernoulli trials

Bernoulli trials, with success probability p:

P̂ = X/n  ⇒  µP̂ = E[X/n] = np/n = p,   σ²P̂ = pq/n

Example 74 Estimate the proportion of defective cartons

Following Example 2, if 20 cartons are sampled and the outcome is as listed in Example 62,
your estimate of the proportion is:

p̂ = 3/20 = 0.15

As we will see later, the sample size is too small to make an accurate estimate.

Theorem 41: Sample variance is unbiased estimator of population variance

E[S²] = σ²X

Proof: Using the identity Σ_{i=1}^{n} (Xi − X̄)² = Σ_{i=1}^{n} Xi² − nX̄²,

E[S²] = (1/(n−1)) E[Σ_{i=1}^{n} Xi² − nX̄²] = (1/(n−1)) (n E[X²] − n E[X̄²])

Since E[X̄²] = Var(X̄) + E[X̄]² = σ²/n + E[X]² (Theorems 37 and 38),

E[S²] = (1/(n−1)) (n E[X²] − σ² − n E[X]²) = (1/(n−1)) (n(E[X²] − E[X]²) − σ²)
      = (1/(n−1)) (nσ² − σ²) = σ²

Remark: S is not an unbiased estimator for σ, i.e. E[S] ≠ σ

Example 75 Estimate of variance of stack of cards (Example 67)

Figure 12 → s²80 = [(x1 − x̄80)² + . . . + (x80 − x̄80)²]/(80 − 1) = 10.01, with x̄80 as in Example 71.

Example 76 Does S² (Example 67) appear to be unbiased estimator of σ²X?

Revisit Example 67. Values of s²4 (sample variances of quadruples) for 20 experiments
appear spread above and below σ²X. Notice the right tail end of the corresponding histogram.

ÒP Hwnthi: What is the value of σ²X for a stack of 52 cards?

Example 77 Estimate of standard deviation of stack of cards

Figure 12 → s²80 = [(x1 − x̄80)² + . . . + (x80 − x̄80)²]/(80 − 1) = 10.01, with x̄80 as in Example 71.

Therefore s80 = √10.01 = 3.16.


Example 78 Mean and variance of catalyst particles in Example 63

Mathematica
ParticleSizes = Import["lognorm_particle.dat"];
Mean[ParticleSizes]
Variance[ParticleSizes]
StandardDeviation[ParticleSizes]

X̄ = 137.1 nm, S² = 214.9, S = 14.66.

• Common software/calculators provide functions to calculate sample means, variances and
standard deviations. If you do not write your own function, always read the documentation
('HELP') for a built-in function before you use it!

4.6 Interval estimation


Definition 54: Confidence interval for parameter of a population
[L1, L2] is a (1 − α)-confidence interval for parameter θ of the entire population if:

P[L1 ≤ θ ≤ L2] = 1 − α

α is the probability of being outside the interval. If the distribution of θ is known, it is easy
to calculate L1, L2 such that P[L1 ≤ θ ≤ L2] = 1 − α.

Example 79 Confidence interval for average of stack of cards

Revisit Example 67. By the CLT (Theorem 36), X̄ = (X1 + . . . + X80)/80 is practically
normally distributed (even though X is not − in fact X follows a discrete distribution!) with
µX̄ = µX and σX̄ = σX/√80. Therefore:

P[µX̄ − 2σX̄ ≤ X̄ ≤ µX̄ + 2σX̄] = 0.95 (Why?)

P[µX − 2σX/√80 ≤ X̄ ≤ µX + 2σX/√80] ≈ 0.95 ⇔ P[−2 ≤ (X̄ − µX)/(σX/√80) ≤ 2] ≈ 0.95

because (X̄ − µX)/(σX/√n) is approximately standard normal.
Assume for now that σX ≈ sX = √10.01 = 3.16 (Example 75)∗. Then by rearranging:

∗ This temporary "cheating" will be resolved in the next section by the introduction of the T-distribution


" #
s80 s80
P X̄ − 2 √ ≤ µX ≤ X̄ + 2 √ = 0.95
80 80
for the random variable X̄. Given that we have an experimental value for X̄, we can conclude

" #
s80 s80
P x̄80 − 2 √ ≤ µX ≤ x̄80 + 2 √ = 0.95 ⇒
80 80
" #
3.16 3.16
P 6.3 − 2 √ ≤ µX ≤ 6.3 + 2 √ = 0.95 ⇒ P [5.6 ≤ µX ≤ 7.0] = 0.95
80 80

This is a confidence interval for µX with confidence 95%. Thus beyond the value of 6.3
(Example 71) we can provide with 95% confidence based on the data in Figure 12 that the
true mean is between this interval (recall that the true mean is 6.538) or provide an estimate
of 6.3±0.7 with 95% confidence.
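A minimal MATLAB sketch of this calculation (norminv requires the Statistics Toolbox; the notes round z0.025 = 1.96 to 2):

MATLAB
% Normal-approximation 95% CI for the card data (a sketch).
xbar = 6.3; s = sqrt(10.01); n = 80;
z = norminv(0.975)                % = 1.96
ci = xbar + [-1 1]*z*s/sqrt(n)    % approximately [5.6, 7.0]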

Confidence interval:

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n

• Careful: zα/2 is the point where P[Z ≥ zα/2] = α/2, i.e. the cumulative probability of the
distribution at zα/2 equals 1 − α/2!!! Learn how to find these either using software or Table 1.

ÒP Hwnthi: Calculate similar confidence intervals for confidence levels 67% and ∼99.7%.

How to estimate a confidence interval for a parameter θ of a population

1. Construct random variable Y (θ) which has θ as only unknown parameter and known
distribution if θ is fixed

2. Given α, find N1 , N2 such that P [N1 ≤ Y (θ) ≤ N2 ] = 1 − α

3. Solve for θ to find L1 , L2 such that P [L1 ≤ θ ≤ L2 ] = 1 − α


4.6.1 Estimate confidence interval for µ

• µ, σ of population are unknown. If X follows the normal distribution:

Theorem 42: Statistic involving µ that follows T-distribution

X ∼ N(µ, σ²) ⇒ (X̄ − µ)/(S/√n) follows the T-distribution with n − 1 degrees of freedom.

Definition 55: Student's T-distribution with γ degrees of freedom (DOF)

f(t) = [Γ((γ+1)/2) / (Γ(γ/2) √(πγ))] (1 + t²/γ)^{−(γ+1)/2},  −∞ < t < ∞,  µ = 0,  σ² = γ/(γ − 2), γ > 2

where the gamma function is defined as Γ(z) ≜ ∫_0^∞ t^{z−1} e^{−t} dt. f(t) is a function of both t and γ.

Figure 14: T-distribution for various degrees of freedom. Note the shape change as γ increases.
The right figure introduces a notation of tr as the point where the area to the right is r.

• Given α, find N1, N2 so P[N1 ≤ Y(µ) ≤ N2] = 1 − α: T-distribution

P[−tα/2 ≤ (X̄ − µ)/(S/√n) ≤ tα/2] = 1 − α

• Find L1, L2 such that P[L1 ≤ µ ≤ L2] = 1 − α:

P[X̄ − tα/2 S/√n ≤ µ ≤ X̄ + tα/2 S/√n] = 1 − α

i.e. 100(1 − α)% confidence bounds on µ are x̄ ± tα/2 s/√n. As with the normal distribution,
tables or software are required for efficient construction of confidence intervals.


Table 4: Cumulative T-distribution. Table values: P [T ≤ t].


Table provides cumulative probabilities for given t and γ.


Degrees of                              P[T ≤ t]
freedom  0.6000 0.7500 0.9000 0.9500 0.9750 0.9900 0.9950 0.9990 0.9995
1 0.3249 1.0000 3.0777 6.3138 12.7062 31.8205 63.6567 318.3088 636.6192
2 0.2887 0.8165 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.7649 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.7407 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.7267 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.7176 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.7111 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.7064 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.7027 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.6998 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.6974 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.6955 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178
13 0.2586 0.6938 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.6924 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.6912 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.6901 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.6892 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.6884 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.6876 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.6870 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.6864 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.6858 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.6853 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.6848 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.6844 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.6840 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.6837 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.6834 1.3125 1.7011 2.0484 2.4671 2.7633 3.4082 3.6739
29 0.2557 0.6830 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2556 0.6828 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
31 0.2555 0.6825 1.3095 1.6955 2.0395 2.4528 2.7440 3.3749 3.6335
32 0.2555 0.6822 1.3086 1.6939 2.0369 2.4487 2.7385 3.3653 3.6218
33 0.2554 0.6820 1.3077 1.6924 2.0345 2.4448 2.7333 3.3563 3.6109
34 0.2553 0.6818 1.3070 1.6909 2.0322 2.4411 2.7284 3.3479 3.6007
35 0.2553 0.6816 1.3062 1.6896 2.0301 2.4377 2.7238 3.3400 3.5911
36 0.2552 0.6814 1.3055 1.6883 2.0281 2.4345 2.7195 3.3326 3.5821
37 0.2552 0.6812 1.3049 1.6871 2.0262 2.4314 2.7154 3.3256 3.5737
38 0.2551 0.6810 1.3042 1.6860 2.0244 2.4286 2.7116 3.3190 3.5657
39 0.2551 0.6808 1.3036 1.6849 2.0227 2.4258 2.7079 3.3128 3.5581
40 0.2550 0.6807 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
41 0.2550 0.6805 1.3025 1.6829 2.0195 2.4208 2.7012 3.3013 3.5442
42 0.2550 0.6804 1.3020 1.6820 2.0181 2.4185 2.6981 3.2960 3.5377
43 0.2549 0.6802 1.3016 1.6811 2.0167 2.4163 2.6951 3.2909 3.5316
44 0.2549 0.6801 1.3011 1.6802 2.0154 2.4141 2.6923 3.2861 3.5258
45 0.2549 0.6800 1.3006 1.6794 2.0141 2.4121 2.6896 3.2815 3.5203
46 0.2548 0.6799 1.3002 1.6787 2.0129 2.4102 2.6870 3.2771 3.5150
47 0.2548 0.6797 1.2998 1.6779 2.0117 2.4083 2.6846 3.2729 3.5099
48 0.2548 0.6796 1.2994 1.6772 2.0106 2.4066 2.6822 3.2689 3.5051
49 0.2547 0.6795 1.2991 1.6766 2.0096 2.4049 2.6800 3.2651 3.5004
50 0.2547 0.6794 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960

Table 5: Cumulative T-distribution. Table values: P [T ≤ t].


Table provides t for given P and γ.


Example 80 Confidence interval for average of stack of cards

Revisit Example 67, Figure 12 and Example 79.

x̄80 = (x1 + . . . + x80)/80 = 6.3,   s²80 = [(x1 − x̄80)² + . . . + (x80 − x̄80)²]/(80 − 1) = 10.01

95% confidence, α = 0.05 ⇒ α/2 = 0.025. Then for n − 1 = 79 degrees of freedom, tα/2 = 1.990
(area to the right of t = 1.990 with γ = 79 is 0.025) and

x̄80 ± tα/2 s80/√n = 6.3 ± 1.990 √10.01/√80 = 6.3 ± 0.7 ∈ [5.6, 7.0] with 95% confidence

Compare result with normal approximation in Example 79.

• MATLAB check output of: tinv(0.975,79) and tcdf(1.9905,79)

• Mathematica: InverseCDF[StudentTDistribution[79], 0.975] and


CDF[StudentTDistribution[79], 1.99045]

• Excel: T.INV(0.975,79) and T.DIST(1.99045,79,1). Careful, not TINV and TDIST...

Tables/software often provide critical values (areas to the right) instead of CDF. Make sure
you understand what is available.

(Left: a table that provides the cumulative distribution for given t and DOF. Right: a table
that provides critical values of t for given α and DOF.)
Figure 15: Tables can provide either critical values or cumulative distributions


Theorem 43: T-distribution and the normal distribution

The T-distribution approaches the normal distribution for large number of DOF (see
Figure 14 above and the MATLAB script below).

MATLAB
% Plot T-distribution as a function of degrees of freedom
% and compare to normal
t = -4:0.1:4;
subplot(1,2,1),
plot(t, tpdf(t,1), t, tpdf(t,3), t, tpdf(t,6), ...
     t, tpdf(t,15), t, normpdf(t,0,1), 'o');
legend('\gamma =1', '\gamma =3', '\gamma =6', ...
       '\gamma=15', 'st. normal');
title('Probability Distribution')
subplot(1,2,2),
plot(t, tcdf(t,1), t, tcdf(t,3), t, tcdf(t,6), ...
     t, tcdf(t,15), t, normcdf(t,0,1), 'o');
legend('\gamma =1', '\gamma =3', '\gamma =6', ...
       '\gamma=15', 'st. normal');
title('Cumulative Probability Distribution')

[Output: PDF and CDF of the T-distribution for γ = 1, 3, 6, 15 against the standard normal.]

ÒP Hwnthi Calculate in Excel, MATLAB or Mathematica the CDF for the standard
normal at z = 2 and the Student's T-distribution at t = 2 and γ = 5, 10, 100, 500 and
examine how the values converge as the DOF increase.


Example 81 Large vs small sample and increasing confidence level

Revisit Example 67 and Figure 12 but use only the first 8 points for a 95% confidence interval
on µ.

x̄8 = (6 + 10 + 9 + 2 + 4 + 7 + 10 + 9)/8 = 7.125,   s²8 = 8.7,   s8 = 2.95

• DOF = 7, 95% confidence interval → (Table 5) t = 2.3646
7.125 ± t0.025 (2.95/√8) → 7.125 ± 2.3646 · 1.043 = 7.12 ± 2.47. Compare to Example 80.

• DOF = 7, 99% confidence interval → (Table 5) t = 3.4995
7.125 ± t0.005 (2.95/√8) → 7.125 ± 3.4995 · 1.043 = 7.12 ± 3.65

• Compare Example 80 and Example 79 with the normal distribution.
If we (incorrectly) use the standard normal for the sample of 8 data:
95% confidence interval → (Table 1) z = 1.96
7.125 ± z0.025 (2.95/√8) → 7.125 ± 1.96 · 1.043 = 7.12 ± 2.0 !
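A minimal MATLAB sketch of the two t-intervals (tinv requires the Statistics Toolbox):

MATLAB
% t-based intervals for the first 8 card draws (a sketch).
x = [6 10 9 2 4 7 10 9];
n = numel(x); se = std(x)/sqrt(n);
ci95 = mean(x) + [-1 1]*tinv(0.975, n-1)*se   % 7.12 +/- 2.47
ci99 = mean(x) + [-1 1]*tinv(0.995, n-1)*se   % 7.12 +/- 3.65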

Example 82 Confidence interval for average in air quality data

SO2, NOx data:

52.7 43.9 41.7 71.5 47.6 55.1
62.2 56.5 33.4 61.8 54.3 50.0
45.3 63.4 53.9 65.5 66.6 70.0
52.4 38.6 46.1 44.4 60.7 56.4

s² = 101.40, x̄ = 53.92

Let α = 0.05. Then for n − 1 = 23 degrees of freedom, tα/2 = 2.069 and

x̄ ± tα/2 s/√n = 53.92 ± 2.069 √101.40/√24 = 53.92 ± 4.25 ∈ [49.67, 58.17] with 95% confidence

• The T-distribution is applicable when X follows the normal distribution. How can we check?
• If we do not know whether X ∼ N(µ, σ²), what can we do? Hint: Check Examples 68 and 69.
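Regarding the first question, one common graphical check is a normal probability (quantile) plot; a minimal MATLAB sketch (normplot is in the Statistics Toolbox) using the data above:

MATLAB
% One graphical check of normality (a sketch).
data = [52.7 43.9 41.7 71.5 47.6 55.1 62.2 56.5 33.4 61.8 54.3 50.0 ...
        45.3 63.4 53.9 65.5 66.6 70.0 52.4 38.6 46.1 44.4 60.7 56.4];
normplot(data)   % an approximately straight line supports X ~ N(mu, sigma^2)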


4.6.2 Estimate confidence interval for p

Based on the CLT (Theorem 40), P̂ = X/n will follow a normal distribution if n is large
enough (recall Theorem 29). In practice, the approximation with a normal will be accurate
if np̂ ≥ 5. If p is the unknown proportion, µP̂ = p and σ²P̂ = pq/n, and a 1 − α confidence
interval follows from

P[−zα/2 < (P̂ − p)/√(pq/n) < zα/2] = 1 − α        (29)

Known p̂ = x/n and q̂ = 1 − p̂. Approximate solution,∗ assuming pq ≈ p̂q̂:

p̂ − zα/2 √(p̂q̂/n) < p < p̂ + zα/2 √(p̂q̂/n)        (30)

Example 83 Confidence interval for proportion of defective cartons

Following Example 2, if 20 cartons are sampled and the outcome is as listed in Example 62,
then np̂ = 3. You increase the sample taken to 100 cartons and find 7 defective. Then
p̂ = 0.07 and a 95% confidence interval is

0.07 − 1.96 √(0.07 · 0.93/100) < p < 0.07 + 1.96 √(0.07 · 0.93/100) ⇒ p = 0.07 ± 0.05
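A minimal MATLAB sketch of Eq. 30 for this example (norminv requires the Statistics Toolbox):

MATLAB
% 95% CI for a proportion with n = 100, x = 7 (a sketch).
n = 100; phat = 7/n; z = norminv(0.975);
ci = phat + [-1 1]*z*sqrt(phat*(1-phat)/n)   % approximately 0.07 +/- 0.05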

ÒP Hwnthi Provide a 99% confidence interval.

4.6.3 Estimate confidence interval for σ, σ²

- µ, σ of population are unknown. If X follows the normal distribution:

Theorem 44: Statistic involving σ² that follows Chi-square-distribution

X ∼ N(µ, σ²) ⇒ Σ_{i=1}^{n} (Xi − X̄)²/σ² ≡ (n − 1)S²/σ² follows the χ²-distribution with
n − 1 degrees of freedom.


∗ More accurate expressions are provided in statistics textbooks and can be useful for small samples


Definition 56: Chi-square (χ²) distribution with γ degrees of freedom

f(x) = x^{γ/2−1} e^{−x/2} / (2^{γ/2} Γ(γ/2)) for x > 0; 0 elsewhere,   µ = γ, σ² = 2γ

Figure 16: χ²-distribution for various degrees of freedom. Note that χ² ≥ 0 and the distribution is not symmetric.

Theorem 45: Chi-square-distribution and the normal distribution


The χ2 -distribution approaches the normal distribution for large DOF (see Figure 16 above).

MATLAB
% Plot chi2-distribution as a function of degrees of freedom
% and compare to normal (why 20 and 6.325 for the normal parameters?)
t = 0:0.5:40;
subplot(1,2,1),
plot(t, chi2pdf(t,1), t, chi2pdf(t,5), t, chi2pdf(t,10), ...
     t, chi2pdf(t,20), t, normpdf(t,20,6.325), 'o');
legend('\gamma =1', '\gamma =5', '\gamma =10', ...
       '\gamma=20', 'normal');
title('Probability Distribution')
subplot(1,2,2),
plot(t, chi2cdf(t,1), t, chi2cdf(t,5), t, chi2cdf(t,10), ...
     t, chi2cdf(t,20), t, normcdf(t,20,6.325), 'o');
legend('\gamma =1', '\gamma =5', '\gamma =10', ...
       '\gamma=20', 'normal');
title('Cumulative Probability Distribution')


[Output: PDF and CDF of the χ²-distribution for γ = 1, 5, 10, 20 against a normal with µ = 20, σ = 6.325.]

• Values for the cumulative distribution using Excel function: CHIDIST(x, deg_f).
(Careful! CHIDIST calculates the right-tail area! For the inverse you can use CHIINV)

• Given α, find N1, N2 such that P[N1 ≤ Y(σ²) ≤ N2] = 1 − α:

P[χ²_{1−α/2} ≤ (n − 1)S²/σ² ≤ χ²_{α/2}] = 1 − α

• Solve for σ² to find L1, L2 such that P[L1 ≤ σ² ≤ L2] = 1 − α:

P[(n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}] = 1 − α

i.e. 100(1 − α)% confidence bounds on σ² are [(n − 1)S²/χ²_{α/2}, (n − 1)S²/χ²_{1−α/2}]

Note that α/2 < 1 − α/2 ⇒ χ²_{α/2} > χ²_{1−α/2} ⇒ 1/χ²_{α/2} < 1/χ²_{1−α/2} !!!


Table 6: Cumulative Chi-square-distribution. Table values: P [X ≤ χ2 ].

Degrees of                                   P[X ≤ χ²]
freedom  0.005  0.010  0.025  0.050  0.075  0.100  0.250  0.500  0.750  0.900  0.925  0.950  0.975  0.990  0.995
1 0.0000 0.0002 0.0010 0.0039 0.0089 0.0158 0.1015 0.4549 1.3233 2.7055 3.1701 3.8415 5.0239 6.6349 7.8794
2 0.0100 0.0201 0.0506 0.1026 0.1559 0.2107 0.5754 1.3863 2.7726 4.6052 5.1805 5.9915 7.3778 9.2103 10.5966
3 0.0717 0.1148 0.2158 0.3518 0.4720 0.5844 1.2125 2.3660 4.1083 6.2514 6.9046 7.8147 9.3484 11.3449 12.8382
4 0.2070 0.2971 0.4844 0.7107 0.8969 1.0636 1.9226 3.3567 5.3853 7.7794 8.4963 9.4877 11.1433 13.2767 14.8603
5 0.4117 0.5543 0.8312 1.1455 1.3937 1.6103 2.6746 4.3515 6.6257 9.2364 10.0083 11.0705 12.8325 15.0863 16.7496
6 0.6757 0.8721 1.2373 1.6354 1.9415 2.2041 3.4546 5.3481 7.8408 10.6446 11.4659 12.5916 14.4494 16.8119 18.5476
7 0.9893 1.2390 1.6899 2.1673 2.5277 2.8331 4.2549 6.3458 9.0371 12.0170 12.8834 14.0671 16.0128 18.4753 20.2777
8 1.3444 1.6465 2.1797 2.7326 3.1440 3.4895 5.0706 7.3441 10.2189 13.3616 14.2697 15.5073 17.5345 20.0902 21.9550
9 1.7349 2.0879 2.7004 3.3251 3.7847 4.1682 5.8988 8.3428 11.3888 14.6837 15.6309 16.9190 19.0228 21.6660 23.5894
10 2.1559 2.5582 3.2470 3.9403 4.4459 4.8652 6.7372 9.3418 12.5489 15.9872 16.9714 18.3070 20.4832 23.2093 25.1882
11 2.6032 3.0535 3.8157 4.5748 5.1243 5.5778 7.5841 10.3410 13.7007 17.2750 18.2942 19.6751 21.9200 24.7250 26.7568
12 3.0738 3.5706 4.4038 5.2260 5.8175 6.3038 8.4384 11.3403 14.8454 18.5493 19.6020 21.0261 23.3367 26.2170 28.2995
13 3.5650 4.1069 5.0088 5.8919 6.5238 7.0415 9.2991 12.3398 15.9839 19.8119 20.8966 22.3620 24.7356 27.6882 29.8195
14 4.0747 4.6604 5.6287 6.5706 7.2415 7.7895 10.1653 13.3393 17.1169 21.0641 22.1795 23.6848 26.1189 29.1412 31.3193
15 4.6009 5.2293 6.2621 7.2609 7.9695 8.5468 11.0365 14.3389 18.2451 22.3071 23.4522 24.9958 27.4884 30.5779 32.8013
16 5.1422 5.8122 6.9077 7.9616 8.7067 9.3122 11.9122 15.3385 19.3689 23.5418 24.7155 26.2962 28.8454 31.9999 34.2672
17 5.6972 6.4078 7.5642 8.6718 9.4522 10.0852 12.7919 16.3382 20.4887 24.7690 25.9705 27.5871 30.1910 33.4087 35.7185
18 6.2648 7.0149 8.2307 9.3905 10.2053 10.8649 13.6753 17.3379 21.6049 25.9894 27.2178 28.8693 31.5264 34.8053 37.1565
19 6.8440 7.6327 8.9065 10.1170 10.9653 11.6509 14.5620 18.3377 22.7178 27.2036 28.4581 30.1435 32.8523 36.1909 38.5823
20 7.4338 8.2604 9.5908 10.8508 11.7317 12.4426 15.4518 19.3374 23.8277 28.4120 29.6920 31.4104 34.1696 37.5662 39.9968
21 8.0337 8.8972 10.2829 11.5913 12.5041 13.2396 16.3444 20.3372 24.9348 29.6151 30.9200 32.6706 35.4789 38.9322 41.4011
22 8.6427 9.5425 10.9823 12.3380 13.2819 14.0415 17.2396 21.3370 26.0393 30.8133 32.1424 33.9244 36.7807 40.2894 42.7957
23 9.2604 10.1957 11.6886 13.0905 14.0648 14.8480 18.1373 22.3369 27.1413 32.0069 33.3597 35.1725 38.0756 41.6384 44.1813
24 9.8862 10.8564 12.4012 13.8484 14.8525 15.6587 19.0373 23.3367 28.2412 33.1962 34.5723 36.4150 39.3641 42.9798 45.5585

25 10.5197 11.5240 13.1197 14.6114 15.6447 16.4734 19.9393 24.3366 29.3389 34.3816 35.7803 37.6525 40.6465 44.3141 46.9279
26 11.1602 12.1981 13.8439 15.3792 16.4410 17.2919 20.8434 25.3365 30.4346 35.5632 36.9841 38.8851 41.9232 45.6417 48.2899
27 11.8076 12.8785 14.5734 16.1514 17.2414 18.1139 21.7494 26.3363 31.5284 36.7412 38.1840 40.1133 43.1945 46.9629 49.6449
28 12.4613 13.5647 15.3079 16.9279 18.0454 18.9392 22.6572 27.3362 32.6205 37.9159 39.3801 41.3371 44.4608 48.2782 50.9934
29 13.1211 14.2565 16.0471 17.7084 18.8530 19.7677 23.5666 28.3361 33.7109 39.0875 40.5727 42.5570 45.7223 49.5879 52.3356
30 13.7867 14.9535 16.7908 18.4927 19.6639 20.5992 24.4776 29.3360 34.7997 40.2560 41.7619 43.7730 46.9792 50.8922 53.6720
31 14.4578 15.6555 17.5387 19.2806 20.4780 21.4336 25.3901 30.3359 35.8871 41.4217 42.9479 44.9853 48.2319 52.1914 55.0027
32 15.1340 16.3622 18.2908 20.0719 21.2951 22.2706 26.3041 31.3359 36.9730 42.5847 44.1309 46.1943 49.4804 53.4858 56.3281
33 15.8153 17.0735 19.0467 20.8665 22.1151 23.1102 27.2194 32.3358 38.0575 43.7452 45.3110 47.3999 50.7251 54.7755 57.6484
34 16.5013 17.7891 19.8063 21.6643 22.9379 23.9523 28.1361 33.3357 39.1408 44.9032 46.4884 48.6024 51.9660 56.0609 58.9639
35 17.1918 18.5089 20.5694 22.4650 23.7633 24.7967 29.0540 34.3356 40.2228 46.0588 47.6631 49.8018 53.2033 57.3421 60.2748
36 17.8867 19.2327 21.3359 23.2686 24.5911 25.6433 29.9730 35.3356 41.3036 47.2122 48.8353 50.9985 54.4373 58.6192 61.5812
37 18.5858 19.9602 22.1056 24.0749 25.4214 26.4921 30.8933 36.3355 42.3833 48.3634 50.0051 52.1923 55.6680 59.8925 62.8833
38 19.2889 20.6914 22.8785 24.8839 26.2540 27.3430 31.8146 37.3355 43.4619 49.5126 51.1726 53.3835 56.8955 61.1621 64.1814
39 19.9959 21.4262 23.6543 25.6954 27.0889 28.1958 32.7369 38.3354 44.5395 50.6598 52.3378 54.5722 58.1201 62.4281 65.4756
40 20.7065 22.1643 24.4330 26.5093 27.9258 29.0505 33.6603 39.3353 45.6160 51.8051 53.5010 55.7585 59.3417 63.6907 66.7660
41 21.4208 22.9056 25.2145 27.3256 28.7648 29.9071 34.5846 40.3353 46.6916 52.9485 54.6620 56.9424 60.5606 64.9501 68.0527
42 22.1385 23.6501 25.9987 28.1440 29.6058 30.7654 35.5099 41.3352 47.7663 54.0902 55.8211 58.1240 61.7768 66.2062 69.3360
43 22.8595 24.3976 26.7854 28.9647 30.4487 31.6255 36.4361 42.3352 48.8400 55.2302 56.9783 59.3035 62.9904 67.4593 70.6159
44 23.5837 25.1480 27.5746 29.7875 31.2934 32.4871 37.3631 43.3352 49.9129 56.3685 58.1336 60.4809 64.2015 68.7095 71.8926
45 24.3110 25.9013 28.3662 30.6123 32.1399 33.3504 38.2910 44.3351 50.9849 57.5053 59.2872 61.6562 65.4102 69.9568 73.1661
46 25.0413 26.6572 29.1601 31.4390 32.9882 34.2152 39.2197 45.3351 52.0562 58.6405 60.4390 62.8296 66.6165 71.2014 74.4365
47 25.7746 27.4158 29.9562 32.2676 33.8380 35.0814 40.1492 46.3350 53.1267 59.7743 61.5892 64.0011 67.8206 72.4433 75.7041
48 26.5106 28.1770 30.7545 33.0981 34.6895 35.9491 41.0794 47.3350 54.1964 60.9066 62.7378 65.1708 69.0226 73.6826 76.9688
49 27.2493 28.9406 31.5549 33.9303 35.5426 36.8182 42.0104 48.3350 55.2653 62.0375 63.8848 66.3386 70.2224 74.9195 78.2307
50 27.9907 29.7067 32.3574 34.7643 36.3971 37.6886 42.9421 49.3349 56.3336 63.1671 65.0303 67.5048 71.4202 76.1539 79.4900

Table 7: Cumulative Chi-square-distribution. Table provides values of χ2 for given probabilities.



Example 84 Confidence interval for variance

Observations of random variable (n = 25):

3.4 3.6 4.0 0.4 2.0
3.0 3.1 4.1 1.4 2.5
1.4 2.0 3.1 1.8 1.6
3.5 2.5 1.7 5.1 0.7
4.2 1.5 3.0 3.9 3.0

s² = [n Σ xi² − (Σ xi)²]/[n(n − 1)] = 1.407

Let α = 0.05 ⇒ α/2 = 0.025, 1 − α/2 = 0.975; n − 1 = 24 DOF

(n − 1)s²/χ²_{0.025} = (24)(1.407)/39.4 = 0.857

(n − 1)s²/χ²_{0.975} = (24)(1.407)/12.4 = 2.723

i.e. 0.857 ≤ σ² ≤ 2.723 and 0.926 ≤ σ ≤ 1.650 with 95% likelihood (confidence).

MATLAB check: chi2inv(0.975,24) = 39.3641 ≠ χ²_{0.975}
Mathematica: InverseCDF[ChiSquareDistribution[24], 0.025] = 12.4012 ≠ χ²_{0.025}
How are the above values used in calculations?

Example 85 Variance in catalyst size in Example 63

The average particle size in the sample of Example 63 can be found as 137.1 nm. The sample
variance is s² = 215 and the sample standard deviation s = 14.7.
99% confidence interval for the variance σ² in particle size:

1 − α = 0.99 ⇒ α = 0.01, α/2 = 0.005, 1 − α/2 = 0.995, n − 1 = 299

χ²_{0.005} = 365.7, χ²_{0.995} = 239.8 ⇒ 299·215/365.7 < σ² < 299·215/239.8 ⇒ 175.8 < σ² < 268.1

or 13.3 < σ < 16.4, where χ²_{0.005} and χ²_{0.995} were found using software for 299 DOF.

Since γ is large, the χ² is approximately normal with µ = γ and σ² = 2γ:

χ²_{0.005} ≈ √(2γ) z0.005 + γ = √(2·299) · 2.576 + 299 = 362

χ²_{0.995} ≈ √(2γ) z0.995 + γ = √(2·299) · (−2.576) + 299 = 236
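A minimal MATLAB sketch of these quantile calculations (chi2inv and norminv require the Statistics Toolbox; note the CDF-based arguments versus the notes' right-tail notation):

MATLAB
% Chi-square quantiles for gamma = 299 DOF (a sketch).
gam = 299; s2 = 215;
chi_upper = chi2inv(0.995, gam)    % = chi2_{0.005} in the notes' right-tail notation
chi_lower = chi2inv(0.005, gam)    % = chi2_{0.995}
ci_var = gam*s2./[chi_upper chi_lower]             % approximately [175.8, 268.1]
approx = gam + sqrt(2*gam)*norminv([0.995 0.005])  % normal approximation: ~[362, 236]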


4.7 Hypothesis testing


Definition 57: What is a Hypothesis
Assertion or conjecture about one or more populations

Definition 58: What is Hypothesis Testing?

Use sampled data to test whether a hypothesis is true.

How is Hypothesis Testing done?

• Form the hypothesis to be tested; call it H1

Definition 59: Research or alternative hypothesis


H1 is called the research or alternative hypothesis.
• Form the converse of the hypothesis H1 ; call it H0

Definition 60: Null hypothesis


H0 is called the null hypothesis.
• Assume that H0 is true, and hope to reject H0 by contradiction with the data, as follows:

1. Assuming H0 is true, create a statistic with a known probability distribution.

2. Using sampled data, calculate the value of that statistic.

3. If that value is in a range that is “very” improbable, based on the known probability
distribution of the previous step 1∗ , then it is “very” unlikely that H0 is true. Conse-
quently, it is “very” likely that H1 is true. Otherwise, H1 cannot be asserted.

Definition 61: Acceptance or rejection of a hypothesis


There is “high” or “low” probability the hypothesis is true.

Acceptance or rejection always corresponds to a level of confidence supported by sampled


data.


∗ The range of improbable values of the statistic is roughly 2 or 3 standard deviations of the statistic (at
∼5% or ∼0% improbability level) away from the average of the statistic, according to the Rule-of-thumb
after Example 34


Example 86 Is p > 0.10 in a 200,000 order of cartons (Example 2)?


Claim: If an item is picked at random from a shipment, then p ≡ P [defective] > 0.1.
Preliminary thinking: You design a test with random sampling of 20 items. How many
defective cartons should be found if n = 20 to claim p > 0.1 with specific level of confidence?
Hypothesis-testing thinking:
1. Define H1 : p > 0.1 (want to assert)

2. Define H0 : p ≤ 0.1 (want to reject) (the equality always goes with H0)

3. Assume H0 : p ≤ 0.1 is true (take p = 0.1)

(a) Define test statistic:
X = number of defective items out of 20; follows the binomial distribution.
P[X ≥ 4 | p = 0.1] = f(4) + f(5) + . . . + f(20) = 0.09 + 0.032 + . . . = 0.133 ≜ α
The calculation is easier performed using the CDF: P[X ≥ 4] = 1 − P[X ≤ 3].

Figure 17: P [X = x] and P [X ≥ x] for p ≤ 0.1, n = 20

For any value of p < 0.1, P[X ≥ 4] is smaller (compare the CDFs). Therefore, if
you reject H0 for p = 0.1, you can reject it for p < 0.1.
Agree to reject H0 if the observed x ≥ 4 (this is a design parameter).
(b) Collect experimental data: x = 5, i.e. 5 defective items found in a batch of 20.
(c) Was that likely? NO, based on the step above. Therefore, H0 is rejected.

H1 : p > 0.1 is accepted with confidence ≥ 1 − 0.133 = 87% in the above design ⇔ x ≥ 4
(the confidence level is in practice your design parameter).
• What is the true value of p?
• How to increase the confidence level?
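A minimal MATLAB sketch of the α calculation above (binocdf is in the Statistics Toolbox):

MATLAB
% False-alarm probability for the cut-off "reject H0 if x >= 4" (a sketch).
alpha = 1 - binocdf(3, 20, 0.1)   % P[X >= 4 | p = 0.1] = 0.133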


4.7.1 Possible Errors In Testing A Statistical Hypothesis

Definition 62: Type I error in hypothesis testing (False alarm)


H0 is mistakenly rejected, even though it is true.

Definition 63: Type II error in hypothesis testing (Missed alarm)


H0 is mistakenly accepted, even though it is false.

Definition 64: Probabilities of type I and type II errors in hypothesis testing

α ≜ P[type I error]
β ≜ P[type II error]

• Calculation of β requires knowledge of p. 1 − β is called the power of the test and is
examined under hypothetical scenarios to compare between different tests.
• Want α, β ≈ 0 (common: 0.10, 0.05, 0.01); however, decreasing α increases β for a fixed n.

Example 87 Probability of missed alarm in Example 86

If ptrue ≡ P[defective] = 0.15, what is P[not asserting the claim p > 0.1 | ptrue = 0.15], given
that you designed your test at the 87% confidence level?

β = P[fail to reject H0 | ptrue = 0.15] = P[X ≤ 3 | ptrue = 0.15]
  = f(0) + f(1) + f(2) + f(3) = 0.647!

The power of the test (the ability to reject H0) if p = 0.15 is only 0.353!

Set the limit to reject H0 at X ≥ 5 ⇒ α = 0.043 (96% confidence), but then if p = 0.15,
β = 0.83! By asking for more defective items to be found before rejecting p ≤ 0.1, you
increase the probability that you will not reject H0 although p > 0.1.
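A minimal MATLAB sketch of the missed-alarm calculations (binocdf is in the Statistics Toolbox):

MATLAB
% Missed-alarm probability for p_true = 0.15 (a sketch).
beta4 = binocdf(3, 20, 0.15)   % cut-off x >= 4: beta = 0.647 (power 0.353)
beta5 = binocdf(4, 20, 0.15)   % cut-off x >= 5: beta grows to about 0.83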

ÒP Hwnthi How do these numbers for β change if ptrue = 0.2?


Example 88 How to improve α and β for Example 86?

Increase the sample size! For example, a five-fold increase to n = 100, with cut-off value: 20
or more defective out of 100.

Figure 18: a) P [X = x] and P [X ≥ x] for n = 100

For ptrue = 0.15 ⇒ β ≈ 0. Of course, if a destructive test is implemented for testing
defective cartons, the cost of quality control is important: your test design can only specify
two out of three: α, β and/or n.

4.7.2 Significance Testing

- How to select confidence level α in step 3c in hypothesis thinking?


Instead of:

1. Selecting significance level α

2. Accepting or rejecting H0

Try:

1. Rejecting H0

2. Calculating significance level α at which H0 is rejected

Instead of designing the test with specific cut-off values and reject/accept based on sampling,
you always reject H0 and report the “significance” level. Large values of α correspond to
large probabilities of false alarm.


4.7.3 Hypothesis and significance tests on µ

When designing hypothesis tests, the significance level α reflects the probability (area) that
you reject H0 while it is true. For two-tailed tests this area is split between two tails, contrary
to one tail.

LEFT-TAILED TEST

Example 89 Does new resin reduce sulfates in water?

Claim: If a new resin is used to treat water, it reduces sulfates, i.e. the average concentration
of sulfates, µ, after treatment is lower than the average concentration of sulfates µ0 = 10
before treatment.
Preliminary thinking: Consider a random sample of size n. Then, by Theorem 37, the
best estimate of µ is x̄ = (x1 + . . . + xn)/n.
How much smaller than µ0 should x̄ be to assert the claim µ < µ0?
Hypothesis-testing thinking:

1. Define H1 : µ < µ0 (want to assert), choose α = 0.10

2. Define H0 : µ ≥ µ0 (want to reject)

3. Assume H0 : µ ≥ µ0 true; it suffices to check H0 at µ = µ0 (why?)

(a) Test statistic: (X̄ − µ0)/(S/√n) follows the T-distribution with n − 1 DOF, when X
is normal. The probability of falling in the critical (left-tail) area is
P[(X̄ − µ0)/(S/√n) ≤ −tα] = α.
Agree to reject H0 : µ ≥ µ0 if the observed (x̄ − µ0)/(s/√n) ≤ −tα.
(b) Collect data: Take a sample of size n = 25 (DOF 24) and calculate x̄ = 9.2, s = 2.
Value of the statistic: (x̄ − µ0)/(s/√n) = (9.2 − 10)/(2/√25) = −2.0.
(c) Compare the value of the statistic with the critical value for the given α:
α = 0.10 ⇒ tα = 1.32 (from Table 4; P[(X̄ − 10)/(S/√25) ≤ −t0.10] = 0.10).
−2.0 < −1.32 ⇒ H0 is rejected, since t = −2.0 falls in the critical area.

H1 : µ < µ0 = 10 is accepted with confidence 1 − 0.10 = 90%


- So what is the true value of µ?

- What if α = 0.05 or α = 0.01?

Significance-testing thinking: Instead of choosing α, the statistic t = −2.0 is calculated
and the critical area (area to the left) is reported as the significance (P-value) for rejecting H0.
With t = −2.0 and DOF = 24, P[T ≤ −2.0] = 0.0285, therefore H0 can be rejected with
significance level 0.03.
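A minimal MATLAB sketch of such P-value calculations (tcdf is in the Statistics Toolbox; a two-tailed P-value doubles the one-tailed area):

MATLAB
% P-values for t statistics with 24 DOF (a sketch).
p_left = tcdf(-2.0, 24)           % one-tailed: 0.0285
p_two  = 2*tcdf(-abs(0.75), 24)   % two-tailed for t = 0.75: about 0.46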

RIGHT-TAILED TEST

Example 90 Have sulfates in water increased?

Claim: Average concentration of sulfates, µ, increased above the long-term average µ0 = 10.
Preliminary thinking: Consider a random sample of size n. Then, by Theorem 37, the
best estimate of µ is x̄ = (x1 + . . . + xn)/n.
How much larger than µ0 should x̄ be to assert the claim µ > µ0?
Hypothesis-testing thinking:

1. Define H1 : µ > µ0 (want to assert)

2. Define H0 : µ ≤ µ0 (want to reject)

3. Assume H0 : µ ≤ µ0 is true

(a) Test statistic: (X̄ − µ0)/(S/√n) follows the T-distribution with n − 1 DOF, when X
is normal. The critical (right-tail) area satisfies P[(X̄ − µ0)/(S/√n) ≥ tα] = α.
Agree to reject H0 : µ ≤ µ0 if the observed (x̄ − µ0)/(s/√n) ≥ tα.
(b) Collect data: Take a sample of size n = 25 and calculate x̄ = 10.3, s = 2.
Value of the statistic: (x̄ − µ0)/(s/√n) = (10.3 − 10)/(2/√25) = 0.75.
(c) Compare the value of the statistic with the critical value for the given α:
α = 0.10 ⇒ tα = 1.32 (from Table 4; P[(X̄ − 10)/(S/√25) ≥ t0.10] = 0.10).
1.32 > 0.75 ⇒ H0 is not rejected.

H1 : µ > µ0 = 10 cannot be accepted with confidence 1 − 0.10 = 90%

- Significance-testing?


TWO-TAILED TEST

Example 91 Have sulfates in water shifted?

Claim: Average concentration of sulfates in water, µ, has shifted away from µ0 = 10.
Preliminary thinking: Consider a random sample of size n. Then, by Theorem 37, the
best estimate of µ is x̄ = (x1 + . . . + xn)/n.
How far from µ0 should x̄ be to assert the claim µ ≠ µ0?

Hypothesis-testing thinking:

1. Define H1 : µ ≠ µ0 (want to assert)

2. Define H0 : µ = µ0 (want to reject)

3. Assume H0 : µ = µ0 is true

(a) Test statistic: (X̄ − µ0)/(S/√n) follows the T-distribution with n − 1 DOF, when X
is normal. The critical (two-tail) areas satisfy P[(X̄ − µ0)/(S/√n) ≤ −tα/2] = α/2 and
P[(X̄ − µ0)/(S/√n) ≥ tα/2] = α/2.
Agree to reject H0 : µ = µ0 if the observed (x̄ − µ0)/(s/√n) ≤ −tα/2 or ≥ tα/2.
(b) Collect data: Take a sample of size n = 25 and calculate x̄ = 10.3, s = 2.
Value of the statistic: (x̄ − µ0)/(s/√n) = (10.3 − 10)/(2/√25) = 0.75.
(c) Compare the value of the statistic with the critical values for the given α:
α = 0.10 ⇒ tα/2 = 1.7 (from Table 4).
−1.7 < 0.75 < 1.7 ⇒ H0 is not rejected.

H1 : µ ≠ µ0 = 10 cannot be accepted with confidence 1 − 0.10 = 90%

Significance-testing thinking: Instead of choosing α, the statistic t = 0.75 is calculated
and the critical area (area to the right plus the same area to the left) is reported as the
significance (P-value) for rejecting H0. With t = 0.75 and DOF = 24, P[T ≤ −0.75] = 0.23,
therefore H0 can be rejected with significance level 2 × 0.23 = 0.46 (careful, this is for a
two-tailed test).

What if X is not normal?


4.7.4 Hypothesis and significance tests on proportion p

We already examined how hypothesis and significance testing can be performed with the
binomial distribution to assert a claim regarding the proportion p in Example 86.
Often, to reduce α and β, large samples are drawn (Example 88). In this case, it is
reasonable to assume that the binomial approximates a normal distribution with mean np0
and standard deviation √(np0 q0), based on the values of the null hypothesis H0.

Example 92 Is p ≠ 0.1 if x = 15 defective cartons are found in a sample of n = 100?

H1 : p ≠ 0.1
H0 : p = 0.1

p̂ = 0.15,   z = (x − np0)/√(np0 q0) = (p̂ − p0)/√(p0 q0/n) = 1.67 ⇒ P[Z ≤ −1.67] = 0.0475

H0 rejected with significance 2 × 0.0475 = 0.095.
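A minimal MATLAB sketch of this two-tailed z test (normcdf is in the Statistics Toolbox):

MATLAB
% Two-tailed z test for a proportion (a sketch).
n = 100; x = 15; p0 = 0.1;
z = (x/n - p0)/sqrt(p0*(1-p0)/n)   % = 1.67
pval = 2*normcdf(-abs(z))          % = 0.095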

ÒP Hwnthi What is the value of the corresponding statistic if the Binomial was used?

4.7.5 Hypothesis and significance tests on σ², σ

Same approach as previously; however, the appropriate statistic χ² = (n − 1)S²/σ² has to be used.

                        LEFT-TAILED TEST    RIGHT-TAILED TEST    TWO-TAILED TEST
Hypothesis to assert    H1 : σ² < σ0²       H1 : σ² > σ0²        H1 : σ² ≠ σ0²
Hypothesis to reject    H0 : σ² ≥ σ0²       H0 : σ² ≤ σ0²        H0 : σ² = σ0²

Test statistic: assuming H0 true, (n − 1)S²/σ0² is χ²-distributed with n − 1 DOF.
Exact if X ∼ N(µ, σ²), approximate otherwise.

Example 93 Variance in catalyst size in Example 63 ≠ 200?

Following Examples 63 and 85, H0 : σ² = 200, α = 0.05, γ = 299:

s² = 215,  α/2 = 0.025,  1 − α/2 = 0.975  ⇒  χ²_{α/2} = 348.8,  χ²_{1−α/2} = 253

χ² = 299 · 215/200 = 321.45

χ² is not in the critical region; do not reject H0 at 95% confidence.


4.8 Comparing population parameters

[Schematic: sample 1 of size n1 drawn from population 1, sample 2 of size n2 drawn from
population 2; question: µ1 − µ2 = ??]

Comparing means, variances or proportions amounts to performing tests using a statistic that
is a linear combination of sampled statistics. Theorems on combinations of random variables
are the basis of such tests (i.e. Theorems 21, 24, 31).

4.8.1 Point estimation of µ1 − µ2

Theorem 46: Difference of sample averages, unbiased estimator of difference of means

X̄(1) − X̄(2) is an unbiased estimator of µ1 − µ2.

Example 94 Comparing two materials

Plastic tubing vs copper tubing strength. Data from sampling each of the two populations:

 Plastic              | Copper
 3.0  5.3  6.9  4.1   | 7.1   9.3   8.2
 8.0  6.7  6.3        | 10.4  9.1   8.7
 7.1  4.2  7.2        | 12.1  10.7  10.6
 5.1  5.5  5.8        | 10.5  11.3  11.5

x̄(1) = 75.2/13 = 5.78        x̄(2) = 119.5/12 = 9.96

The point estimate of µ1 − µ2 is x̄(1) − x̄(2) = 5.78 − 9.96 = −4.18

In order to make a confidence interval or perform hypothesis testing we need information
on the variance of X̄(1) − X̄(2). Different methods are used based on the scenarios present.


4.8.2 Simplified confidence interval on µ1 − µ2 (or p1 − p2 ) for large samples

- Independent large samples. µ1 , σ1 , µ2 , σ2 of populations are unknown.


- Approximate solution (for large samples) based on Theorem 47 below.
σ2 σ2
- Random variable Y (X1 − X2 ), µY = µ1 − µ2 , σY2 = n11 + n22 (check Theorems 21,24)

Theorem 47: Distribution of estimator of µ1 − µ2 for normal random variables

X(1) ∼ N(µ1, σ1²), X(2) ∼ N(µ2, σ2²)  ⇒  Z =̂ [X̄(1) − X̄(2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2) ∼ N(0, 1)

(standard normal)

where n1, n2 are the sizes of the random samples drawn from each population.

Theorem 48: Theorem 47 when random variables are not normally distributed

By the Central Limit Theorem (Theorem 36), Theorem 47 is approximately true as n1, n2 → ∞.

P[ −zα/2 ≤ (X̄(1) − X̄(2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2) ≤ zα/2 ] = 1 − α

Given α we can now perform hypothesis testing or form confidence intervals for the true
µ1 − µ2 based on our sampled X̄(1) − X̄(2).
100(1 − α)% confidence bounds on µ1 − µ2 for large samples are:

µ1 − µ2 = X̄(1) − X̄(2) ± zα/2 √(σ1²/n1 + σ2²/n2) ≈ X̄(1) − X̄(2) ± zα/2 √(S1²/n1 + S2²/n2)

Since proportions can be considered as averages of random variables being true or false, for
large samples a similar approach can be used for the difference of proportions, given sample
estimates p̄1 and p̄2:

p1 − p2 ≈ p̄1 − p̄2 ± zα/2 √(p̄1 q̄1/n1 + p̄2 q̄2/n2)
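
A minimal MATLAB sketch of this large-sample interval (the summary values here are hypothetical, introduced only for illustration; norminv is from the Statistics Toolbox):

MATLAB: % Large-sample 95% CI on mu1 - mu2 (hypothetical summary data)
xbar1 = 5.2; s1_2 = 1.1; n1 = 60;      % hypothetical sample 1 summary
xbar2 = 4.8; s2_2 = 0.9; n2 = 50;      % hypothetical sample 2 summary
z = norminv(0.975);                    % z_{alpha/2} for alpha = 0.05
half = z*sqrt(s1_2/n1 + s2_2/n2);
CI = [xbar1 - xbar2 - half, xbar1 - xbar2 + half]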


4.8.3 Confidence interval on µ1 − µ2

- Independent small samples (potentially n1 ≠ n2). µ1, σ1, µ2, σ2 of populations are unknown.

- Since samples are small, the T-distribution is appropriate.
- An estimate of the variance of Y is required:

• If there is support for σ1² = σ2² (check section 4.8.4), use the pooled estimate Sp²(1/n1 + 1/n2), where
  Sp² =̂ [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2)
  with n1 + n2 − 2 degrees of freedom (n1 = n2 ⇒ Sp² = 0.5S1² + 0.5S2²), as in Theorem 49 below.

Theorem 49: Statistic involving µ1 − µ2 that follows T-distribution when σ1² = σ2²

X(1) ∼ N(µ1, σ1²), X(2) ∼ N(µ2, σ2²), and σ1² = σ2²  ⇒  [X̄(1) − X̄(2) − (µ1 − µ2)] / √(Sp²(1/n1 + 1/n2))
follows the T-distribution with n1 + n2 − 2 degrees of freedom, where Sp² =̂ [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2).

100(1 − α)% confidence bounds on µ1 − µ2 are:

X̄(1) − X̄(2) ± tα/2 √(Sp²(1/n1 + 1/n2))   if σ1² = σ2²;

 
• If there is support for σ1² ≠ σ2², use S1²/n1 + S2²/n2 as the variance estimate, with degrees of freedom equal to the integer part of
  (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]
  as in Theorem 50 below.

Theorem 50: Statistic involving µ1 − µ2 that follows T-distribution when σ1² ≠ σ2²

X(1) ∼ N(µ1, σ1²), X(2) ∼ N(µ2, σ2²), and σ1² ≠ σ2²  ⇒  [X̄(1) − X̄(2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)
follows approximately the T-distribution with degrees of freedom equal to the integer part of

(S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]

100(1 − α)% confidence bounds on µ1 − µ2 are:

X̄(1) − X̄(2) ± tα/2 √(S1²/n1 + S2²/n2)   if σ1² ≠ σ2²
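
The degrees-of-freedom formula above is easy to get wrong by hand; a minimal MATLAB sketch (using, for illustration, the sample variances that appear in Example 95 below):

MATLAB: % Welch approximate DOF for the unequal-variance statistic
s1_2 = 0.00782; s2_2 = 0.00478; n1 = 10; n2 = 10;
se2 = s1_2/n1 + s2_2/n2;
dof = floor( se2^2 / ( (s1_2/n1)^2/(n1-1) + (s2_2/n2)^2/(n2-1) ) )   % integer part; 17 here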
n1 n2


4.8.4 Hypothesis and significance tests on σ12 − σ22

Theorem 51: Statistic for ratio of variances from independent samples

X(1) ∼ N(µ1, σ1²), X(2) ∼ N(µ2, σ2²)  ⇒  F = (S1²/σ1²)/(S2²/σ2²) = (S1² σ2²)/(S2² σ1²)

follows the F-distribution with two parameters, v1 = n1 − 1, v2 = n2 − 1 degrees of freedom.

100(1 − α)% Confidence interval:

P[ f1−α/2(v1, v2) < F < fα/2(v1, v2) ] = 1 − α  ⇒

(S1²/S2²) · 1/fα/2(v1, v2)  <  σ1²/σ2²  <  (S1²/S2²) · fα/2(v2, v1)

using the relation f1−α/2(v1, v2) = 1/fα/2(v2, v1).
Careful when v1 ≠ v2!

Example 95 Thickness of film (Example 3)

Sample 1: 1.473 1.484 1.384 1.225 1.448 1.367 1.276 1.485 1.390 1.350   →  x̄1 = 1.388, s1² = 0.00782

Sample 2: 1.310 1.501 1.485 1.435 1.538 1.417 1.500 1.469 1.474 1.552   →  x̄2 = 1.468, s2² = 0.00478

Is the thickness different in the two samples?: x̄1 − x̄2 = −0.08

• Can we assume σ1² ≠ σ2² because we observe that s1²/s2² ≈ 1.64?
  H0: σ1² = σ2² = σ0², α = 0.1 → (tables/software) f0.05(9, 9) = 3.1789, f0.95(9, 9) = 0.3146
  Form statistic F = (0.00782/σ0²)/(0.00478/σ0²) = 1.64: not in critical area! Fail to reject H0.
  90% confidence interval: 1.64/3.1789 < σ1²/σ2² < 1.64 × 3.1789 ⇒ 0.52 < σ1²/σ2² < 5.21
  So we cannot support that σ1² ≠ σ2².

ÒP Hwnthi Check: MATLAB fcdf(3.1789,9,9) or finv(0.95,9,9)


Mathematica InverseCDF[FRatioDistribution[9, 9], 0.05]
Excel F.INV(0.05,9,9) or F.DIST(3.1789,9,9,1)


• Assume σ1² = σ2²:

H0: µ1 = µ2 = µ, 18 DOF, α = 0.1 ⇒ t0.95 = −1.7341

Sp² = (0.00782 + 0.00478)/2 = 0.0063,   t = (x̄1 − x̄2)/√(Sp²(1/n1 + 1/n2)) = −0.08/√(0.0063 × 0.2) = −2.25

t is in the critical region; therefore with 90% confidence we support that the thickness is dif-
ferent between the two samples.
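
A minimal MATLAB sketch of this pooled two-sample t-test (data from Example 95; tinv and tcdf are from the Statistics Toolbox):

MATLAB: % Pooled two-sample t-test on the film-thickness data
x1 = [1.473 1.484 1.384 1.225 1.448 1.367 1.276 1.485 1.390 1.350];
x2 = [1.310 1.501 1.485 1.435 1.538 1.417 1.500 1.469 1.474 1.552];
n1 = numel(x1); n2 = numel(x2);
sp2 = ((n1-1)*var(x1) + (n2-1)*var(x2))/(n1+n2-2);    % pooled variance, ~0.0063
t = (mean(x1) - mean(x2))/sqrt(sp2*(1/n1 + 1/n2))     % ~-2.25
tcrit = tinv(0.05, n1+n2-2)                           % ~-1.7341 for alpha = 0.1
pval = 2*tcdf(-abs(t), n1+n2-2)                       % two-tailed P-value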

ÒP Hwnthi Repeat exercise using a higher level of confidence.

4.8.5 Paired observations

When comparing population parameters from two samples, we assumed random and inde-
pendent samples. This is often not a valid assumption.

Example 96 Corrosion on pipe coatings: is it different?

In Example 4, pairs of pipes with different coatings were buried in 11 locations and the extent
of corrosion was measured as:

Soil Type Lead-coated (XL ) Bare (XB ) Difference (d)

A 27.3 41.4 -14.1


B 18.4 18.9 -0.5
C 11.9 21.7 -9.8
D 11.3 16.8 -5.5
E 14.8 9 5.8
F 20.8 19.3 1.5
G 17.9 32.1 -14.2
H 7.8 7.4 0.4
I 14.7 20.7 -6.0
J 19 34.4 -15.4
K 65.3 76.2 -10.9

x̄ 20.8 27.1 -6.3


s2 245 371 52.6

Do coatings present a difference in the extent of corrosion since x̄L − x̄B = −6.3?

Approach 1: H0: µL − µB = 0. We assume σ1² = σ2², Sp² = (245 + 371)/2 = 308, 20 DOF,
t = −6.3/√(308 × 2/11) = −0.84; α = 0.1 ⇒ t0.95 = −1.7247: with 90% confidence, no difference.


Approach 2: We observe that the assumption of random and independent sampling is not
valid, based on a simple plot as shown in the figure: the corrosion clearly depends on the soil
type, which seems to be a more important factor than the coating! As a result, we do not
have 20 degrees of freedom in our sampling, and Sp²/n1 + Sp²/n2 (estimated as 308 × 2/11 = 56)
overestimates the variance of the mean of XL − XB, given that there is covariance (see
Theorems 31, 24).

Solution: we pair observations, so we have 11 pairs and 10 DOF. For each pair we
calculate the difference d and perform hypothesis testing, checking whether d̄ is significantly
different from zero.

H0: µd = 0:  s² = 52.6, 10 DOF, t = −6.3/√(52.6/11) = −2.86; α = 0.1 ⇒ t0.95 = −1.8125: with 90%
confidence there is more corrosion on the bare pipe.
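
A minimal MATLAB sketch of the paired test (data from the table above):

MATLAB: % Paired t-test on the corrosion differences
xL = [27.3 18.4 11.9 11.3 14.8 20.8 17.9 7.8 14.7 19.0 65.3];
xB = [41.4 18.9 21.7 16.8 9.0 19.3 32.1 7.4 20.7 34.4 76.2];
d = xL - xB;  n = numel(d);
t = mean(d)/sqrt(var(d)/n)     % ~-2.86
tcrit = tinv(0.05, n-1)        % ~-1.8125 for alpha = 0.1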


5 REGRESSION AND CORRELATION


Course Learning Objectives

• Perform linear regression providing confidence intervals

• Understand the concept of linear and nonlinear regression

• Perform multiple linear regression

• Understand the concept of correlation

Definition 65: Dependent and independent variables


X: Independent or predictor variable
Y : Dependent or response variable
- Y is random due to external disturbances unaccounted for, or measurement noise.
- The values of X are known either without or with some random error.

Example 97 Independent and dependent variables

X: Reactor temperature, Y : Extent of conversion of reactants to products


X: Concentration of CO2 in atmosphere, Y : Atmospheric temperature

Definition 66: Regression


Use data to build a mathematical model that describes how X affects Y .

- Ideally want: fY |x (y)


- Settle for µY |x as a function of x

Figure 19: Curve of regression. Distribution of Y for a collection of xi (i.e. fY |x (y) ) is shown.


Definition 67: Curve of regression

Curve of regression is the plot of µY|x as a function of x.
- Experimental determination of curve of regression: Use a random sample

{(x1, Y|x1), . . . , (xn, Y|xn)} = {(x1, Y1), . . . , (xn, Yn)}

where {x1, . . . , xn} are known values and {Y1, . . . , Yn} are random variables.

- {Y1 , . . . , Yn } take different values by chance every time they are measured.

Figure 20: Collection of data for regression. 4 measurements are collected at each xi .

Definition 68: Controlled and uncontrolled study


Controlled Study: xi selected by experimenter
Observational Study: xi observed at random

Example 98 Controlled and uncontrolled studies

Controlled study: Effect of reactor temperature, X, on extent of conversion of reactants to


products, Y .

Uncontrolled study: Effect of concentration of CO2 in atmosphere, X, on atmospheric tem-


perature, Y .


Definition 69: Linear and nonlinear regression


Assumed curve of regression: µY |x = g(x, θ1 , θ2 , . . .) where θ is vector of unknown parameters.
g(x, θ1 , θ2 , . . .) linear in θ1 , θ2 , . . . ⇒ Linear regression (Relatively easy!)
g(x, θ1 , θ2 , . . .) not linear in at least one of θ1 , θ2 , . . . ⇒ Nonlinear regression (More difficult!)

Example 99 Linear and Nonlinear Regression

Linear regression: µY |x = θ1 + θ2 x + θ3 x2 + θ4 e−x


Nonlinear regression: µY |x = θ1 e−θ2 /x

Note: Nonlinearity of mapping x 7→ µY |x is not the same as nonlinearity of regression.

- Why is it called “Regression”? (Hint: Do a web search with keywords Francis Galton,
regression.)
- Who discovered regression? (Hint: Do a web search with keywords Gauss, Legendre, Ceres,
regression.)

5.1 Linear regression = linear least squares


System structure:

µY|xi = β0 + β1 xi (a number)  ⇒  Y|xi =̂ Yi (a random variable) = β0 + β1 xi + Ei,  with E[Ei] = 0

so that µY|xi = E[Y|xi − Ei] (Why?)

True relationship between measurements xi, yi:

yi = β0 + β1 xi + εi     (31)

where yi and xi are measured, while εi is impossible to measure.


(Figure: scatter of measurements (xi, yi) at x1, …, x6, and the fitted straight line through them.)

- Task: Estimate β0, β1. (Point estimates and confidence intervals)

Model structure: yi = b0 + b1 xi + ei
To estimate β0, β1 by b0, b1, minimize the sum of squared errors (SSE) over b0, b1:

min (SSE) =̂ min Σ ei² = min Σ (yi − b0 − b1 xi)²     (32)

The result of the minimization is (How?)

β̂1 = b1 = [n Σ xi yi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²],   β̂0 = b0 = ȳ − b1 x̄     (33)


Example 100 Simple linear regression

Examine relationship between humidity (X) and extent of solvent evaporation (Y ) in water-
reducible paints during sprayout (data from Journal of Coating Technology, 65, 1983)
n = 25,  Σxi = 1314.90,  Σyi = 235.70,  Σxi² = 76308.53,  Σyi² = 2286.07,  Σxi yi = 11824.44

Eq. 33 ⇒ β̂1 = b1 = −0.08, β̂0 = b0 = 13.64 ⇒

µ̂Y|x =̂ ŷ = 13.64 − 0.08x

(Figure: estimated line of regression of Y, solvent evaporation (%wt) vs. relative humidity (%).)

-Is the assumed model, Eq 31 , reasonable?


-What is the best guess of solvent evaporation when
humidity is 50%?
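
A minimal MATLAB sketch of Eq. 33 applied to the summary statistics above:

MATLAB: % Simple linear regression from the sums of Example 100
n = 25; Sx = 1314.90; Sy = 235.70; Sx2 = 76308.53; Sxy = 11824.44;
b1 = (n*Sxy - Sx*Sy)/(n*Sx2 - Sx^2)    % ~-0.0801
b0 = Sy/n - b1*Sx/n                    % ~13.64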

Definition 70: Residuals

ri =̂ yi − (β̂0 + β̂1 xi) = (measured) − (model fit ŷi)     (34)

For Eq. 31 to be valid, residuals must be “white” (What is white?)

Theorem 52: Best guess of y given x

µ̂Y|x = β̂0 + β̂1 x     (35)


Example 101 Residuals in Example 100: Are they white?

(Figure: residuals ri = measurement − model fit vs. measurement point i, i = 1, …, 25.)

Residuals appear white.

Example 102 Non-white Residuals

(Figure: residual vs. measurement point for five non-white patterns: drifting, patterning, growing noise, auto-correlation, and an outlier.)

Example 103 Best guess for given x in Example 100

µ̂Y |x=50 = ŷ = 13.64 − (0.08)(50) = 9.64% (36)


5.1.1 Properties of least-squares estimators

Definition 71: Parameters, estimators for parameters, and estimates of parameters in
simple linear regression

β0, β1: Numbers
B0 =̂ β̂0, B1 =̂ β̂1: Estimators for β0, β1
(b0, b1: values of B0, B1)

Theorem 53: Distribution of estimators in simple linear regression

Yi = β0 + β1 xi + Ei,  Ei ∼ N(0, σ²)  ⇒

B1 = β̂1 = [n Σ xi Yi − (Σ xi)(Σ Yi)] / [n Σ xi² − (Σ xi)²],   B0 = β̂0 = Ȳ − B1 x̄   ⇒

B1 ∼ N( β1, σ²/Σ(xi − x̄)² )   and   B0 ∼ N( β0, (Σ xi²) σ² / (n Σ(xi − x̄)²) )

Proof: Yi = β0 + β1 xi + Ei, Ei ∼ N(0, σ²) ⇒ Yi ∼ N(β0 + β1 xi, σ²) ⇒ B1 = Σi Ci Yi with Ci = (xi − x̄)/Σj(xj − x̄)²
⇒ … ⇒ B1, B0 are linear combinations of the Yi

Theorem 54: Estimator of variance of measurement noise

S² = σ̂² = SSE/(n − 2)     (37)


Definition 72: Useful universal notation for linear regression

sxx =̂ Σi (xi − x̄)²
syy =̂ Σi (yi − ȳ)²   or   Syy =̂ Σi (Yi − Ȳ)²   (What is the difference?)
sxy =̂ Σi (xi − x̄)(yi − ȳ)   or   Sxy =̂ Σi (xi − x̄)(Yi − Ȳ)   (What is the difference?)
sse =̂ Σi (yi − b0 − b1 xi)²   or   SSE =̂ Σi (Yi − B0 − B1 xi)²   (What is the difference?)

Theorem 55: Variance of parameter estimators in simple linear regression

SSE = Syy − 2B1 Sxy + B1² Sxx = Syy − B1 Sxy

B1 = Sxy/Sxx

σB1² = σ²/Sxx

σB0² = (Σ xi²) σ² / (n Sxx)


5.1.2 Confidence intervals and hypothesis testing in linear least squares

1. Hypothesis testing and confidence interval on β1 (slope):

H1: β1 {>, ≠, <} 0
H0: β1 {≤, =, ≥} 0

Theorem 56: Confidence interval for β1

P[ B1 − tα/2 S/√Sxx ≤ β1 ≤ B1 + tα/2 S/√Sxx ] = 1 − α     (38)

Proof: Tn−2 = (B1 − β1)/(S/√Sxx) is T-distributed with n − 2 DOF ⇒ …

Example 104 Does X have an effect on Y in Example 100?

Example 100 ⇒ µ̂Y|x = 13.64 − 0.08x   (is the slope −0.08 significantly ≠ 0?)

Sxx = 7150.05
Syy = 63.89
Sxy = −572.44

H0: β1 = 0
H1: β1 ≠ 0

SSE = Syy − b1 Sxy = 18.06  ⇒  S² = SSE/(n − 2) = 0.79

Therefore the observed value of T23 is t = b1/(s/√Sxx) = −7.62

P[T23 ≤ −7.62] < 0.0005 ⇒ two-tailed P < 0.001
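
A minimal MATLAB sketch of this slope test (summary values from above; tcdf is from the Statistics Toolbox):

MATLAB: % t-test on the slope, Example 104
Sxx = 7150.05; Syy = 63.89; Sxy = -572.44; n = 25;
b1 = Sxy/Sxx;                      % ~-0.0801
SSE = Syy - b1*Sxy;                % ~18.06
S = sqrt(SSE/(n-2));
t = b1/(S/sqrt(Sxx))               % ~-7.6
pval = 2*tcdf(-abs(t), n-2)        % << 0.001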


2. Hypothesis testing and confidence interval on β0 (intercept):

H1: β0 {>, ≠, <} 0
H0: β0 {≤, =, ≥} 0

Theorem 57: Confidence interval for β0

P[ B0 − tα/2 S √(Σ xi²)/√(n Sxx) ≤ β0 ≤ B0 + tα/2 S √(Σ xi²)/√(n Sxx) ] = 1 − α     (39)

Proof: (B0 − β0) / ( S √(Σ xi²)/√(n Sxx) ) is T-distributed with n − 2 DOF ⇒ …

Example 105 Does the straight line in Example 100 cross (0,0)?

P[ −0.08 − 2.807 √0.79/√7150.05 ≤ β1 ≤ −0.08 + 2.807 √0.79/√7150.05 ] = 0.99

i.e. −0.109 ≤ β1 ≤ −0.051 with 99% confidence.

3. Confidence interval on µY|x:

Theorem 58: Confidence interval on µY|x

P[ µ̂Y|x − tα/2 S √(1/n + (x − x̄)²/Sxx) ≤ µY|x ≤ µ̂Y|x + tα/2 S √(1/n + (x − x̄)²/Sxx) ] = 1 − α     (40)

Proof: (µ̂Y|x − µY|x) / ( S √(1/n + (x − x̄)²/Sxx) ) is T-distributed with n − 2 DOF ⇒ …


4. Confidence interval on Y|x:

Theorem 59: Confidence interval on Y|x

P[ Ŷ|x − tα/2 S √(1 + 1/n + (x − x̄)²/Sxx) ≤ Y|x ≤ Ŷ|x + tα/2 S √(1 + 1/n + (x − x̄)²/Sxx) ] = 1 − α     (41)

Proof: (Ŷ|x − Y|x) / ( S √(1 + 1/n + (x − x̄)²/Sxx) ) is T-distributed with n − 2 DOF ⇒ …

Example 106 90% Confidence interval for predictions of Example 100

Eq. 40 ⇒ µY|x = 13.6389 − 0.0800606x ± 1.51874 √(1/25 + 0.000139859(x − 52.5960)²)
with 90% confidence.

Eq. 41 ⇒ Y|x = 13.6389 − 0.0800606x ± 1.51874 √(26/25 + 0.000139859(x − 52.5960)²)
with 90% confidence.

(Figure: 90% confidence intervals on µY|x (left) and on Y|x (right): solvent evaporation (%wt) vs. relative humidity (%).)


Example 107 Simple linear regression: Effect of car weight on mileage

Eq. 33 ⇒ β̂1 = b1 = −4.03, β̂0 = b0 = 23.75 ⇒

µ̂Y|x =̂ ŷ = 23.75 − 4.03x

(Figures: estimated line of regression, car mileage (mpg) vs. car weight (tons), and residuals vs. measurement point i.)

-How sure are we that weight has an effect on mileage?

H1: β1 ≠ 0, H0: β1 = 0. Assume H0 is true. For β1 = 0, Tn−2 = (B1 − β1)/(S/√Sxx) is T-distributed
with n − 2 DOF. Observed value: t = −4.03/(√0.126/√0.581) ≈ −8.7 ⇒ P ≈ 0, H0 rejected.

-What is the best guess for average mileage of cars weighing 1.7 tons?
Eq. 35 ⇒ µ̂Y|x=1.7 =̂ ŷ = 23.75 − (4.03)(1.7) = 16.9 miles per gallon.

-What is the 90%-confidence interval for the average mileage of cars weighing 1.7 tons?
Eq. 40 ⇒ µY|x=1.7 = 16.9 ± 1.86 √0.126 √(1/10 + (1.7 − 1.675)²/0.581) = 16.9 ± 0.2 with 90% confidence.

-What is the 90%-confidence interval for the mileage of my car, if it weighs 1.7 tons?
Eq. 41 ⇒ Y|(x = 1.7) = 16.9 ± 1.86 √0.126 √(1 + 1/10 + (1.7 − 1.675)²/0.581) = 16.9 ± 0.7 with 90% confidence.
(Figure: 90% confidence intervals on µY|x (left) and on Y|x (right): car mileage (mpg) vs. car weight (tons).)


5.1.3 Correlation

Theorem 60: Estimator of (Pearson) correlation coefficient

ρ̂ =̂ R =̂ Sxy / √(Sxx Syy)     (42)

is the estimator of ρ = Cov(X, Y)/√((Var X)(Var Y)) (Definition 41).

Example 108 Correlation between methods for nitrates measurement

x:  25  40  120  75  150  300  270  400  450  575
y:  30  80  150  80  200  350  240  320  470  583

Eq. 42 ⇒ ρ̂ =̂ r =̂ sxy/√(sxx syy) = 0.978

(Figure: scatter plot of y vs. x for the ten measurement pairs.)

Theorem 61: Confidence interval and hypothesis testing for ρ

Assume (X, Y) follow the bi-variate normal distribution:

fXY(x, y) = 1/(2π σx σy √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ (x − µx)²/σx² − 2ρ (x − µx)(y − µy)/(σx σy) + (y − µy)²/σy² ] }

Then, for |ρ| close to 1,

P[ ((1+R) − (1−R) exp(2zα/2/√(n−3))) / ((1+R) + (1−R) exp(2zα/2/√(n−3)))  ≤  ρ
   ≤  ((1+R) − (1−R) exp(−2zα/2/√(n−3))) / ((1+R) + (1−R) exp(−2zα/2/√(n−3))) ] = 1 − α

Proof: [ ½ ln((1+R)/(1−R)) − ½ ln((1+ρ)/(1−ρ)) ] / √(1/(n−3)) is approximately standard normal ⇒ …

Theorem 62: Hypothesis testing for ρ

H1: ρ {>, ≠, <} 0
H0: ρ = 0

R √(n − 2) / √(1 − R²) is T-distributed with n − 2 DOF
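
A minimal MATLAB sketch of this test on the data of Example 108:

MATLAB: % Correlation estimate and t-test for H0: rho = 0
x = [25 40 120 75 150 300 270 400 450 575];
y = [30 80 150 80 200 350 240 320 470 583];
n = numel(x);
r = sum((x-mean(x)).*(y-mean(y))) / ...
    sqrt( sum((x-mean(x)).^2) * sum((y-mean(y)).^2) )   % ~0.978
t = r*sqrt(n-2)/sqrt(1-r^2)        % large value: reject H0
pval = 2*tcdf(-abs(t), n-2)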


Definition 73: Coefficient of determination

R² = 1 − SSE/Syy     (43)

The coefficient of determination is the fraction of the variability of Y explained by the linear regression model.

Theorem 63: Coeff. of determination is squared correlation coefficient between measured and model-generated Y

R² = Sŷy² / (Sŷŷ Syy)     (44)

Example 109 Coefficient of determination for Example 100 and Example 107

yi = measurement, ŷi = β̂0 + β̂1 xi = model-generated. Definition 72 ⇒

For Example 100: R² = 1 − 18.06/63.89 = 0.72

For Example 107: R² = 1 − 0.9993/10.46 = 0.90

-Is the model for Example 107 more linear than the model for Example 100?


5.2 Multiple linear regression


-Polynomial regression: µY|x = β0 + β1 x + β2 x² + . . . + βp x^p

-Multivariate linear regression, linear model: µY|x1,x2,…,xk = β0 + β1 x1 + . . . + βk xk

-General linear regression, nonlinear model:

µY|x1,…,xm = β0 + β1 φ1(x1, . . . , xm) + . . . + βk φk(x1, . . . , xm)

In all linear regression problems the solution can be easily found using matrix algebra.


5.3 General least squares


-Model:
µY|x1,x2,…,xp = β0 + β1 φ1(x1, . . . , xp) + . . . + βk φk(x1, . . . , xp)

-Random Sample:
{ (x1i, x2i, . . . , xpi, Y|x1i,…,xpi =̂ Yi) : i = 1, . . . , n }

⇒ Yi = β0 + β1 φ1(x1i, x2i, . . . , xpi) + . . . + βk φk(x1i, x2i, . . . , xpi) + Ei,  Ei ∼ N(0, σ²),  i = 1, . . . , n
or Y = Xβ + E

⇒ yi = β0 + β1 φ1(x1i, x2i, . . . , xpi) + . . . + βk φk(x1i, x2i, . . . , xpi) + ei,  i = 1, . . . , n

or
y = Xβ + e     (45)

Definition 74: General least-squares minimization to estimate β

minimize over b =̂ {b0, b1, . . . , bk}:   SSE =̂ Σ ei² =̂ ‖e‖2² =̂ ‖y − Xb‖2²     (46)

Theorem 64: General least-squares parameter estimator

The solution of Eq. 46 is

(XᵀX) b = Xᵀy  ⇔  β̂ = b = (XᵀX)⁻¹ Xᵀy     (47)
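
In MATLAB the normal equations need not be formed explicitly; a minimal sketch on a toy problem (the design matrix and observations below are hypothetical):

MATLAB: % Least squares via the normal equations vs. MATLAB's backslash
X = [ones(3,1) (1:3)'];        % toy 3x2 design matrix
y = [1.1; 1.9; 3.2];           % toy observations
b_normal = (X'*X)\(X'*y)       % solves Eq. 47 directly
b_qr     = X\y                 % same least-squares solution via QR; numerically preferable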

ÒP Hwnthi: What are X, b, y? What are X, b, y for simple linear regression?

- Estimate:

ŷ = µ̂Y|x1,…,xp = b0 + b1 φ1(x1, . . . , xp) + . . . + bk φk(x1, . . . , xp)


Example 110 Fluid flow through pipe

Chapra & Canale, Numerical Methods for Engineers, McGraw-Hill (5th Ed., Case Study 20.4, p. 551).
(Figure: pipe of diameter D and slope S carrying flowrate Q.)

Assumed model structure:

Q = a0 D^a1 S^a2     (48)

where:
Q: flowrate (ft³/s)
S: slope (ft/ft)
D: diameter (ft)
a0, a1, a2: coefficients to determine

Experiment   D, ft   S, ft/ft   Q, ft³/s
1            1       0.001      1.4
2            2       0.001      8.3
3            3       0.001      24.2
4            1       0.01       4.7
5            2       0.01       28.9
6            3       0.01       84.0
7            1       0.05       11.1
8            2       0.05       69.0
9            3       0.05       200

Eq. 48 is not linear in the parameters!
Linearization trick: Eq. 48 ⇒

log Q = log a0 + a1 log D + a2 log S  ⇒  µY|x1,x2 = β0 + β1 φ1(x1, x2) + β2 φ2(x1, x2)

with Y = log Q, β0 = log a0, β1 = a1, φ1(x1, x2) = x1 = log D, β2 = a2, φ2(x1, x2) = x2 = log S.

   
       [ log Q1 ]   [ 1  log D1  log S1 ] [ log a0 ]
e  =̂  [   ⋮    ] − [ ⋮     ⋮       ⋮   ] [   a1   ]
       [ log Q9 ]   [ 1  log D9  log S9 ] [   a2   ]
           y                 X                b

X =                       y =
1  0.        −3.          0.146128
1  0.30103   −3.          0.919078
1  0.477121  −3.          1.38382
1  0.        −2.          0.672098
1  0.30103   −2.          1.4609
1  0.477121  −2.          1.92428
1  0.        −1.30103     1.04532
1  0.30103   −1.30103     1.83885
1  0.477121  −1.30103     2.30103


XᵀX =                              Xᵀy =         Eq. 47 ⇒ b =
  9         2.33445   −18.9031      11.6915       1.74797
  2.33445   0.954791   −4.90315      3.94623      2.61584
−18.9031   −4.90315    44.078      −22.2077       0.53678

⇒ (a0, a1, a2) = (10^1.74797, 2.61584, 0.53678) = (55.97, 2.616, 0.5368)


(Figure: fitted surface Q = 55.97 D^2.616 S^0.5368 over the (D, S) plane, with Q in ft³/s.)

Linearization is starting point for nonlinear regression!


MATLAB: % Example of fluid flow through a pipe - Solution with Matlab

num_points=9;
k=2;

D=[1; 2; 3; 1; 2; 3; 1; 2; 3 ];
S=[ 0.001; 0.001; 0.001; 0.01; 0.01; 0.01; 0.05; 0.05; 0.05 ];
Q=[ 1.4; 8.3; 24.2; 4.7; 28.9; 84.0 ; 11.1; 69.0; 200];
logQ=log10(Q);

X=zeros(num_points,k+1);
for i=1:1:num_points,
    X(i,1)=1;
    X(i,2)=log10(D(i));
    X(i,3)=log10(S(i));
end
XTX=(X'*X);
INV_XTX=inv(XTX);
XTy=((X')*logQ);
b=INV_XTX*XTy

Y=X*b;
lnr=logQ-Y;                                  % Residuals
YYi= ( (logQ-mean(logQ)).*(logQ-mean(logQ)));
Syy=sum(YYi);
E2=lnr.*lnr;
SSE= sum(E2);
R2=1-(SSE/Syy)                               % Coefficient of determination
r=Q-10.^Y;                                   % Residuals of non-linear

subplot(121),plot([1:9],lnr,'-o',[ 1 9],[ 0 0]);
ylabel('r_i=\Delta logQ_i')
subplot(122),plot([1:9],r,'-o',[ 1 9],[ 0 0]);
ylabel('r_i=\Delta Q_i')

(Figure: residuals of the linearized fit, ri = Δ log Qi (left, ×10⁻³ scale), and of the nonlinear model, ri = ΔQi (right), vs. measurement point.)

5.4 Polynomial least squares


-Model:
Y|x = β0 + β1 x + β2 x² + . . . + βp x^p + E

-Random Sample:
{(x1, Y|x1), . . . , (xn, Y|xn)}

⇒ Yi = β0 + β1 xi + . . . + βp xi^p + Ei,  Ei independent, ∼ N(0, σ²),  i = 1, . . . , n

⇒ yi = β0 + β1 xi + . . . + βp xi^p + ei,  i = 1, . . . , n
or
y = Xβ + e     (49)

Definition 75: Polynomial least-squares minimization to estimate β

minimize over b =̂ {b0, b1, . . . , bp}:   SSE =̂ Σ ei² =̂ ‖e‖2² =̂ ‖y − Xb‖2²     (50)

Theorem 65: Polynomial least-squares parameter estimator

The solution of Eq. 50 is

(XᵀX) b = Xᵀy  ⇔  β̂ = b = (XᵀX)⁻¹ Xᵀy     (51)

Eq. 51 ⇒

[ n        Σxi        Σxi²       ···  Σxi^p     ] [ b0 ]   [ Σyi       ]
[ Σxi      Σxi²       Σxi³       ···  Σxi^(p+1) ] [ b1 ] = [ Σxi yi    ]
[ ⋮        ⋮          ⋮               ⋮         ] [ ⋮  ]   [ ⋮         ]
[ Σxi^p    Σxi^(p+1)  Σxi^(p+2)  ···  Σxi^(2p)  ] [ bp ]   [ Σxi^p yi  ]
   (XᵀX)                                            (b)      (Xᵀy)

-Estimate:
ŷ = µ̂Y|x = b0 + b1 x + . . . + bp x^p     (52)


Example 111 Polynomial linear regression

µY|x = β0 + β1 x + β2 x²

 x    y
 5   14.0
 5   12.5
10    7.0
10    5.0
15    2.1
15    1.8
20    6.2
20    4.9
25   13.2
25   14.6

Normal equations (Eq. 51):

[ n     Σxi    Σxi²  ] [ b0 ]   [ Σyi     ]
[ Σxi   Σxi²   Σxi³  ] [ b1 ] = [ Σxi yi  ]
[ Σxi²  Σxi³   Σxi⁴  ] [ b2 ]   [ Σxi² yi ]
       (XᵀX)

[ 10    150     2750      ] [ b0 ]   [ 81.3  ]
[ 150   2750    56,250    ] [ b1 ] = [ 1228  ]
[ 2750  56,250  1,223,750 ] [ b2 ]   [ 24555 ]

⇒ b0 = 27.3, b1 = −3.313, b2 = 0.111

(Figure: data points and fitted parabola, y vs. x.)

MATLAB: % Example of polynomial regression

num_points=10;
k=2;

x=[ 5; 5; 10; 10; 15; 15; 20; 20; 25; 25];
y=[ 14.0;12.5; 7.0; 5.0; 2.1; 1.8; 6.2; 4.9;13.2;14.6];

X=zeros(num_points,k+1);
for i=1:1:num_points,
    for j=1:1:k+1,
        X(i,j)=x(i)^(j-1);
    end
end
XTX=(X'*X)
INV_XTX=inv(XTX);
XTy=((X')*y)
b=INV_XTX*XTy

5.5 Multiple linear least squares


-Model:
µY|x1,x2,…,xk = β0 + β1 x1 + β2 x2 + . . . + βk xk

-Random Sample:
{ (x1i, x2i, . . . , xki, Y|x1i,x2i,…,xki =̂ Yi) : i = 1, 2, . . . , n }

⇒ Yi = β0 + β1 x1i + . . . + βk xki + Ei,  Ei ∼ N(0, σ²)

⇒ yi = β0 + β1 x1i + . . . + βk xki + ei
or
y = Xβ + e     (53)

Definition 76: Multiple linear least-squares minimization to estimate β

minimize over b =̂ {b0, b1, . . . , bk}:   SSE =̂ Σ ei² =̂ ‖e‖2² =̂ ‖y − Xb‖2²     (54)

Theorem 66: Multiple linear least-squares parameter estimator

The solution of Eq. 54 is

(XᵀX) b = Xᵀy  ⇔  β̂ = b = (XᵀX)⁻¹ Xᵀy     (55)

[ n      Σx1i       Σx2i       ···  Σxki      ] [ b0 ]   [ Σyi      ]
[ Σx1i   Σx1i²      Σx2i x1i   ···  Σxki x1i  ] [ b1 ] = [ Σyi x1i  ]
[ ⋮      ⋮          ⋮               ⋮         ] [ ⋮  ]   [ ⋮        ]
[ Σxki   Σxki x1i   Σxki x2i   ···  Σxki²     ] [ bk ]   [ Σyi xki  ]
   (XᵀX)                                          (b)      (Xᵀy)

-Estimate:
ŷ = µ̂Y|x = b0 + b1 x1 + b2 x2 + . . . + bk xk     (56)


Example 112 Multiple linear least squares

y   17.9  16.5  16.4  16.8  18.8  15.5  17.5  16.4  15.9  18.3
x1  1.35  1.90  1.70  1.80  1.30  2.05  1.60  1.80  1.85  1.40
x2  90    30    80    40    35    45    50    60    65    30

[ n     Σx1i      Σx2i     ] [ b0 ]   [ Σyi     ]
[ Σx1i  Σx1i²     Σx2i x1i ] [ b1 ] = [ Σx1i yi ]  ⇒
[ Σx2i  Σx1i x2i  Σx2i²    ] [ b2 ]   [ Σx2i yi ]

[ 10     16.75    525    ] [ b0 ]   [ 170     ]
[ 16.75  28.6375  874.5  ] [ b1 ] = [ 282.405 ]
[ 525    874.5    31,475 ] [ b2 ]   [ 8887.0  ]

b0 = 24.75, b1 = −4.16, b2 = −0.0149

(Figure: fitted plane, y vs. x1 and x2.)


5.6 Confidence intervals in multiple linear regression


µY|x1,…,xk = β0 + β1 x1 + β2 x2 + . . . + βk xk

      [ y(1) ]   [ 1  x1(1)  ···  xk(1) ] [ b0 ]
e  =  [  ⋮   ] − [ ⋮    ⋮          ⋮    ] [ ⋮  ]  ⇒
      [ y(n) ]   [ 1  x1(n)  ···  xk(n) ] [ bk ]
         y                  X                b

-Least-squares estimate of β:
β̂ = bopt = (XᵀX)⁻¹ Xᵀy

-(1 − α)-Confidence interval for β:

β̂i − tα/2 S √Cii ≤ βi ≤ β̂i + tα/2 S √Cii

where
α = 1 − Confidence level
tα/2 = appropriate point of the T-distribution with n − (k + 1) degrees of freedom

S =̂ √( SSE/(n − (k + 1)) ) = √( Σ e(i)²/(n − (k + 1)) ) = ‖y − Xβ̂‖2 / √(n − (k + 1))

Cii = [(XᵀX)⁻¹]ii   (ith diagonal element of (XᵀX)⁻¹)

-Confidence interval on µY|x1,…,xk for given x = [1, x1, . . . , xk]ᵀ:

µ̂Y|x1,…,xk ± tα/2 S √( xᵀ(XᵀX)⁻¹x )

-Prediction interval for given x = [1, x1, . . . , xk]ᵀ:

ŷ(x1, x2, . . . , xk) ± tα/2 S √( 1 + xᵀ(XᵀX)⁻¹x )


Example 113 Does temperature affect car mileage?

µY|x1,x2 = β0 + β1 x1 + β2 x2

Car       y(i)    x1(i)   x2(i)
Number    (mpg)   (Tons)  (°F)
1         17.9    1.35    90
2         16.5    1.90    30
3         16.4    1.70    80
4         16.8    1.80    40
5         18.8    1.30    35
6         15.5    2.05    45
7         17.5    1.60    50
8         16.4    1.80    60
9         15.9    1.85    65
10        18.3    1.40    30

X = [ 1  1.35  90 ;  ⋮ ;  1  1.40  30 ]

XᵀX = [ 10      16.75    525
        16.75   28.6375  874.5
        525     874.5    31,475 ]

Xᵀy = [ 170 ; 282.405 ; 8887 ]

⇒ β̂ = bopt = [24.75  −4.1593  −0.0149]ᵀ

S = √( SSE/(n − (k + 1)) ) = √( Σ e(i)²/(n − (k + 1)) ) = ‖y − Xbopt‖2 / √(n − (k + 1)) = 0.14   (S² = 0.021)

[C11 C22 C33] = [6.07  1.738  0.000258]

α = 0.05 ⇒ tα/2 = t0.025 = 2.365

Therefore

β0 = 24.75 ± (2.365)(0.14)√6.07 = 24.75 ± 0.825

β1 = −4.1593 ±

β2 = −0.0149 ±


MATLAB: % Example of multiple linear regression

num_points=10;
k=2;

format short e;

x1=[1.35; 1.90; 1.70; 1.80; 1.30; 2.05; 1.60; 1.80; 1.85; 1.40];
x2=[90; 30; 80; 40; 35; 45; 50; 60; 65; 30];
y=[17.9; 16.5; 16.4; 16.8; 18.8; 15.5; 17.5; 16.4; 15.9; 18.3];

X=zeros(num_points,k+1);
for i=1:1:num_points,
    X(i,1)=1;
    X(i,2)=x1(i);
    X(i,3)=x2(i);
end
X
XTX=(X'*X)
XTy=((X')*y)
INV_XTX=inv(XTX);

b=INV_XTX*XTy

Yn=X*b;
S=norm(y-Yn)/(sqrt(num_points-(k+1)))
CII = [ INV_XTX(1,1) ; INV_XTX(2,2) ; INV_XTX(3,3) ]

%a=0.05 => a/2=0.025 => t_a/2=tinv(0.975, num_points-(k+1)) % tinv is inverse CDF!

tinv(0.975, num_points-(k+1))*S*sqrt(CII(1))


6 STATISTICAL QUALITY CONTROL


6.1 Background
Statistical quality control methods have been used in industry since at least the early 1940s. The American
statistician W. Edwards Deming received international recognition for his efforts in assisting Japanese industry
in implementing industrial quality control methods. ∗

Definition 77: Causes of variability

Random: Due to chance; causes variation that leaves the process in statistical control.
Assignable: Not due to chance; causes variation that takes the process out of statistical control.

6.2 Quality, hypothesis testing, and Shewhart charts

Definition 78: Shewhart charts for statistical quality control
Some of the first control charts were developed by Walter A. Shewhart, working to improve the reliability
of the transmission systems of Bell Laboratories. ∗ A Shewhart chart continually determines whether the
variability of a parameter (quality indicator) monitored during production is due to random causes, by
checking whether the monitored parameter is within an upper and a lower control limit (UCL, LCL). UCL
and LCL are determined by statistical analysis of prior data. (Statistical quality control = repeated hypothesis
testing at each sampling point. What is H1?)


Figure 21: Production process in statistical control (a) and out of statistical control (b).

Example 114 Statistical quality control of VCM (Example 6) revisited

Use the first 15 data points to establish CL, UCL, LCL (USL = Upper specification limit = 1, given):
CL = x̄ = 0.4, s = 0.07 ⇒ UCL = x̄ + 3s ≈ 0.6, LCL = x̄ − 3s ≈ 0.2
P[LCL < X < UCL] = ?   P[X > UCL or X < LCL] = ?

∗ Read W. Edwards Deming

∗ Read more about Walter A. Shewhart


(Figure: control chart of ppm NaOH vs. time point of sampling k = 1, …, 25, showing CL = 0.4, UCL = 0.6, LCL = 0.2, and USL = 1.)

At time point 19, the system is out of statistical quality control, long before it is off-spec.
Better to act when the process is out of statistical control rather than wait until it is off-spec.

Monitoring Means: The lower control limit (LCL) and the upper control limit (UCL) represent the
minimum and maximum values that the sample mean X̄ can assume without raising an alarm:

LCL ≤ X̄ ≤ UCL,   LCL = µ0 − kσX̄,   UCL = µ0 + kσX̄

Typically, k = 2 or 3 and n = 4 or 5, with samples taken at fixed time intervals. For k = 3,

P[ −3σ/√n ≤ X̄ − µ0 ≤ 3σ/√n ] ≈ 0.99

(the exact probability is 0.9974).

Distribution of Run Length

• How often will we make the wrong decision of declaring the process out of control when, in fact, it is in
control?
Geometric distribution: P[Y = y] = f(y) = p q^(y−1) and E[Y] = 1/p. Average time = E[Y] · d, where d is the time between samples.

Example 115 Monitoring means

Production of bolts of length with µ0 = 0.5 in, standard error of the mean σ/√n = 0.01 in. 3-sigma control.

UCL = 0.53 in. LCL = 0.47 in.

• If the process produces bolts with the mean shifted to 0.51 in, what is the probability of a signal?

p = P[X̄ > 0.53 or X̄ < 0.47]

P[ (X̄ − 0.51)/0.01 > (0.53 − 0.51)/0.01 ] = P[Z > 2.0] = 0.0228

and P[ (X̄ − 0.51)/0.01 < (0.47 − 0.51)/0.01 ] = P[Z < −4.0] ≈ 0

p = 0.0228 + 0 = 0.0228, with average number of samples to have a signal 1/p = 43.86. If samples are taken
every hour → 43.86 hours.
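
A minimal MATLAB sketch of this average-run-length calculation:

MATLAB: % Probability of a signal and average run length after a mean shift
mu0 = 0.50; sem = 0.01;                  % standard error of the mean
UCL = mu0 + 3*sem; LCL = mu0 - 3*sem;
mu = 0.51;                               % shifted mean
p = (1 - normcdf((UCL-mu)/sem)) + normcdf((LCL-mu)/sem)   % ~0.0228
ARL = 1/p                                % ~43.9 samples (hours if sampled hourly)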


ÒP Hwnthi What happens if the average has shifted to 0.56 in?

• X̄ Chart (Mean)
Draw m samples of size n over a time period in which the process is assumed to be under control. Suggested
m ≥ 20 and n = 4 or 5.

µ̂0 = (Σj X̄j)/m,   σ = E[R]/d2,   E[R] ≈ R̄ = (Σj Rj)/m

where R is the sample range and d2 is taken from tables as a function of n (for n = 4, d2 = 2.059; for
n = 5, d2 = 2.326).

Example 116 Mean Chart

Sample Mean X̄j Range Rj


1 0.493 0.484 0.522 0.476 0.494 0.046
2 0.506 0.491 0.505 0.477 0.495 0.028
3 0.485 0.528 0.529 0.498 0.510 0.031
4 0.495 0.489 0.496 0.470 0.487 0.025
5 0.471 0.518 0.515 0.498 0.500 0.020
6 0.481 0.488 0.474 0.510 0.488 0.036
7 0.481 0.483 0.517 0.477 0.490 0.040
8 0.527 0.523 0.515 0.490 0.514 0.033
9 0.477 0.507 0.496 0.513 0.498 0.018
10 0.528 0.470 0.521 0.483 0.501 0.050
11 0.528 0.520 0.512 0.493 0.513 0.027
12 0.479 0.478 0.493 0.479 0.482 0.016
13 0.525 0.479 0.507 0.477 0.497 0.030
14 0.496 0.511 0.517 0.507 0.508 0.009
15 0.524 0.504 0.514 0.521 0.516 0.017
16 0.497 0.500 0.481 0.504 0.496 0.024
17 0.477 0.507 0.488 0.475 0.487 0.031
18 0.507 0.478 0.489 0.505 0.495 0.027
19 0.528 0.470 0.528 0.477 0.501 0.058
20 0.478 0.491 0.486 0.473 0.482 0.018
Total 9.952 0.584
Average 0.498 0.029

µ̂0 = 0.498,   σ̂ = R̄/d2 = 0.029/2.059 = 0.014   (d2 = 2.059 for n = 4)

and the 3-sigma bounds are µ̂0 ± 3σ̂/√n: 0.498 ± 0.021 (are the X̄j within the limits? What if there is one out of the
limits?)
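
A minimal MATLAB sketch of these X̄-chart limits (summary values from Example 116; d2 from tables):

MATLAB: % 3-sigma X-bar chart limits from m = 20 samples of size n = 4
Rbar = 0.029; xbarbar = 0.498; n = 4; d2 = 2.059;
sigma_hat = Rbar/d2;                     % ~0.014
UCL = xbarbar + 3*sigma_hat/sqrt(n)      % ~0.519
LCL = xbarbar - 3*sigma_hat/sqrt(n)      % ~0.477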
• R Chart (Range)
The theoretical bounds for the range are µR ± 3σR. We estimate µ̂R = R̄ and σ̂R = d3 R̄/d2, where d2 is as before
(mean chart) and d3 is given as a function of n (for n = 4, d3 = 0.880; for n = 5, d3 = 0.864).

Example 117 Range Chart

Using data from Example 116: µ̂R = R̄ = 0.029, d3 = 0.880, d2 = 2.059, so σ̂R = 0.880 × 0.029/2.059 = 0.012 and the
UCL, LCL control limits for the range chart are 0.029 ± 3 × 0.012. (What if LCL is negative?)

Other Shewhart control charts:

• P Chart (Proportion Defective): monitors the proportion of defective items produced

• C Chart (Average Number of Defects): monitors the average number of defects per item produced.

6.3 Process capability and six-sigma


(Figure: three cases, a low-capability process; a high-capability process, off-target; and a 6σ process; each showing LSL/USL and LCL/CL/UCL.)

Definition 79: Process capability

Relationship between UCL − LCL and USL − LSL.
- Want to have (a) UCL − LCL < USL − LSL, (b) CL = on-spec.

Example 118 Process capability for octane number in distillation


Want small spread of octane #, average on-spec (Why?)

Definition 80: Six-sigma process

USL − LSL = 6σ.

