Application of Numerical
and Statistical Techniques
in Chemical and Biomolecular Engineering
Lecture Notes
Part 2
© Manolis Doxastakis
Chemical and Biomolecular Engineering
University of Tennessee, Knoxville, 2021
How To Use These Notes:
• As the cover page suggests, this text is a set of Lecture Notes, NOT a textbook!
Suggested titles of textbooks will be provided by the instructor and you can find
additional resources online and in the library.
• A large number of sources were used, including textbooks as well as notes written
by Prof. M. Nikolaou (Univ. of Houston).
• Certain topics covered in detail in textbooks are presented herein rather telegraphically
while others are elaborated on, particularly when they refer to material not often
covered in textbooks.
• In many places throughout the notes, space has been intentionally left blank so that
the student can work through a topic by filling in the missing material.
That is frequently done during lecture time.
• In other places, assignments are given as homework not to hand in (☞ Hwnthi).
• Additional practice problems from past exams will be provided.
• The examples have been carefully selected to correspond to a variety of problems of
interest to the evolving field of chemical and biomolecular engineering. While the
emphasis is on numerical methods, the physical picture is also important.
• There are several basic software tools used throughout: MATLAB, Mathematica,
and Excel. The student should be familiar with computational tools along with the
mathematical and programming principles of computation. Each tool does certain
tasks particularly well and may be adequate for others; therefore you are most efficient
when you learn how to use multiple tools.
• The code included with some examples is intentionally kept simple, to illustrate con-
cepts. Professional code is a lot more complicated, although the numerical recipe
involved is usually not very different. The emphasis herein is to critically demonstrate
applications of the discussed mathematical methods rather than to teach the technical
use of the software.
• The nature of the material requires active participation of the student. Therefore,
study and perform the numerical examples on your own and expand problems by
altering parameters and methods.
Notation:
◦ Uppercase, boldface: matrices, e.g. M
◦ Lowercase, boldface: vectors, e.g. v
◦ Lowercase, italics: scalars, e.g. f
Contents

1 INTRODUCTION: STATISTICS AND PROBABILITY CONCEPTS
2 PROBABILITY
2.1 Sample spaces and events
2.2 Combination rules
2.3 Probability laws
2.4 Conditional probability and independent events
3 RANDOM VARIABLES AND DISTRIBUTIONS
3.1 Discrete distributions
3.1.1 Calculations with discrete distribution functions
3.1.2 Expected value and other parameters of a discrete distribution
3.1.3 The geometric distribution
3.1.4 The binomial distribution
3.1.5 The Poisson distribution
3.2 Continuous distributions
3.2.1 Expected value and other parameters of a continuous distribution
3.2.2 Graphical representation of mean, median, and mode of distribution
3.2.3 The normal distribution
3.3 Joint distributions
3.3.1 Discrete 2-D random variables and distributions
3.3.2 Continuous 2-D random variables and distributions
4 STATISTICAL INFERENCE
4.1 Descriptive statistics
4.2 Graphical methods for data description
4.2.1 Histograms
4.2.2 Box-and-whisker plots (Box-plots)
4.2.3 Pie charts
4.3 Analytical methods for data representation
4.3.1 Sample Statistics
4.4 The Central Limit Theorem
4.5 Point estimation
4.5.1 Estimators for µ, p and σ²
4.6 Interval estimation
4.6.1 Estimate confidence interval for µ
4.6.2 Estimate confidence interval for p
4.6.3 Estimate confidence interval for σ, σ²
4.7 Hypothesis testing
4.7.1 Possible Errors In Testing A Statistical Hypothesis
4.7.2 Significance Testing
4.7.3 Hypothesis and significance tests on µ
4.7.4 Hypothesis and significance tests on proportion p
4.7.5 Hypothesis and significance tests on σ², σ
4.8 Comparing population parameters
4.8.1 Point estimation of µ1 − µ2
4.8.2 Simplified confidence interval on µ1 − µ2 (or p1 − p2) for large samples
4.8.3 Confidence interval on µ1 − µ2
4.8.4 Hypothesis and significance tests on σ1² − σ2²
4.8.5 Paired observations
5 REGRESSION AND CORRELATION
5.1 Linear regression = linear least squares
5.1.1 Properties of least-squares estimators
5.1.2 Confidence intervals and hypothesis testing in linear least squares
5.1.3 Correlation
5.2 Multiple linear regression
5.3 General least squares
5.4 Polynomial least squares
5.5 Multiple linear least squares
5.6 Confidence intervals in multiple linear regression
6 STATISTICAL QUALITY CONTROL
6.1 Background
6.2 Quality, hypothesis testing, and Shewhart charts
6.3 Process capability and six-sigma
Figure 1: Graphical representation methods for Particle Size Distribution; Histogram and Cumu-
lative Arithmetic Curve
[Figure 2 panels (a)–(f): ideal vs. actual RTD curves, with annotations for channeling, bypassing, and dead zones.]
Figure 2: Observed residence time distributions (RTD). (a) RTD for near plug-flow reactor; (b)
RTD for near perfectly mixed CSTR, (c) Packed-bed reactor, (d) RTD for packed-bed reactor in
(c), (e) CSTR with short-circuiting flow (bypass) and dead zone, (f) RTD for CSTR in (e).
∗
H. Brittain, Pharmaceutical Technology, 2002 and H. S. Fogler, Elements of Chemical Reaction Engi-
neering
[Figure: speed distributions for ⁴He, ²⁰Ne, ⁴⁰Ar, and ¹³²Xe; x-axis: Speed (m/s).]
A paper company receives 200,000 cartons from a supplier. It was originally agreed
that the shipment of cartons should contain no more than 10 percent defective items. In
practice the quality-assurance group could sample n cartons, selected randomly, and either
accept or reject the shipment according to the number of defective items found in the sample.
How should such a test be designed and what are the errors associated with the choice made?
Two samples were selected from different locations in a plastic film sheet. The thickness of
the respective samples was measured at 10 close but equally spaced points as:
Sample 1
1.473 1.484 1.484 1.425 1.448 1.367 1.276 1.485 1.390 1.350
Sample 2
1.310 1.501 1.485 1.435 1.348 1.417 1.500 1.469 1.474 1.452
Pairs of pipes have been buried in 11 different locations to determine corrosion on nonbitu-
minous pipe coatings for underground use. One type includes a lead-coated steel pipe and
the other a bare steel pipe. The extent of corrosion on the 11 pairs has been determined as:
Soil type A B C D E F G H I J K
Lead-coated steel pipe 27.3 18.4 11.9 11.3 14.8 20.8 17.9 7.8 14.7 19.0 65.3
Bare steel pipe 41.4 18.9 21.7 16.8 9.0 19.3 32.1 7.4 20.7 34.4 76.2
Gilliland and Sherwood (1934) obtained mass transfer data for the evaporation of nine dif-
ferent liquids falling down the inside of a vertical wetted-wall column into a turbulent air stream
flowing countercurrent to the liquid. They considered several air flow rates. The data in
the following table represent a relatively small sample of the data reported by these authors,
where Sh, Re, and Sc are the Sherwood, Reynolds, and Schmidt dimensionless numbers,
respectively. It is postulated that

Sh = B₁ Re^{B₂} Sc^{B₃}
- What are the best values of the parameters B1, B2, and B3?
Vista Chemical Co. produces VCM in Lake Charles, LA via a continuous process. Finished
VCM is shipped to Vista’s plants in Aberdeen, MS, and Oklahoma City, OK to make poly-
vinyl chloride (PVC) resin. One of the final steps in the VCM process is a caustic soda
(NaOH) “wash” to remove by-products. Following the caustic wash, the VCM passes through
a knock-out drum to remove any entrained caustic. Caustic removed from the VCM stream
gradually accumulates in the knock-out drum; the drum is periodically drained, based on
the level as determined visually from a sight glass on the drum. The knock-out drum is not
100% effective in removing entrained caustic from the VCM stream. Above 1.0 ppm, residual
caustic in the finished product severely impacts numerous PVC resin quality parameters.
Residual caustic in the finished VCM stream is sampled 4 times/day and tracked. The upper
specification limit (USL) is 1.0 ppm.
• What criteria should be used to monitor this process and avoid exceeding the 1.0 ppm
limit, after 15 data points have been collected (see below)?
2 PROBABILITY
Course Learning Objectives
• Explain the concept of discrete, continuous, and joint probability distributions
• Apply combination rules for events and enumerate permutations and combinations
when the sample space/events are known
Probability can be interpreted in three ways:

• Personal (level of belief). Example: percent chance of success for a first-time venture,
e.g. selling lettuce on the web.

• Relative frequency. Example: out of 1000 plastic cups, 4 were defective; the chance of
finding defective cups in future batches is ≈ 4/1000.

P[event A] ≈ (# of times event A occurred) / (# of times experiment was run)

• Classical. Example: the probability of getting a 3 after tossing a die is 1/6 (assuming
all six numbers are equally likely (?) to appear).

P[A] = (# of ways event A can occur) / (# of ways experiment can proceed)
S = Sample Space
= Set of all possible outcomes for the experiment.
Sample point = an element of S
Event = any subset of S.
Impossible Event = empty space ∅
Event: {(1,6)}
Event: Getting at least one 6 = {(1,6), (2,6), (3,6), (4,6), (5,6), (6,6), (6,1), (6,2), (6,3), (6,4), (6,5)}
Definition 2: Event A or B
A ∪ B (Union) Graphically:
A ∩ B (Intersection) Graphically:
Definition 7: Combination
Selection of objects without regard to order.
nPn = n!

nPr = n!/(n − r)! = n(n − 1)(n − 2)···(n − r + 1)

By Theorem 1,
3P3 = 3! = 6

By Theorem 2,
5P2 = 5!/3! = 20

By Theorem 3,
n = 5, r = 2 ⇒ nCr = 5!/(2! 3!) = (5·4·3·2·1)/((2·1)·(3·2·1)) = 10
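These counts can be checked directly in MATLAB with the built-in factorial and nchoosek functions:

MATLAB
% Check the permutation and combination counts above
factorial(3)                  % 3P3 = 3! = 6
factorial(5)/factorial(5-2)   % 5P2 = 5!/3! = 20
nchoosek(5, 2)                % 5C2 = 5!/(2!3!) = 10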
1. P [S] = 1
2. P [A] ≥ 0 ∀A ⊂ S
Graphically:
P [∅] = 0
P[A′] = 1 − P[A]
Graphically:
Graphically:
P[A₂|A₁] ≐ P[A₁ ∩ A₂] / P[A₁]
Graphically:
• Are A₁, A₂ independent?
P[A₁]P[A₂] = (0.32)(0.16) = 0.05 ≠ 0.10 = P[A₁ ∩ A₂]
⇒ not independent
• What is the probability that toxic levels of Pb are found after toxic levels of Hg are found?
P[A₁|A₂] = P[A₂ ∩ A₁] / P[A₂] = 0.10/0.16 = 0.63
A₁, A₂, . . . , A_m independent ⇐⇒ P[A₁ ∩ A₂ ∩ . . . ∩ A_m] = P[A₁]P[A₂]···P[A_m]
Graphically (sets, trees):
• What is P[TA]?
P[TA] = P[TA|A]P[A] + P[TA|B]P[B] + P[TA|AB]P[AB] + P[TA|O]P[O] = 0.37
• What is P[A|TA]?
P[A|TA] = P[TA|A]P[A] / P[TA] = 0.92
Graphically (sets, trees):
f(x) ≐ P[X = x]    (1)

• Is f(x) a DDF?
f(x) ≥ 0,   Σ_{all x} f(x) = Σ_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1   (How?)
Figure 4: Example 23 — f(x) in an Excel spreadsheet for the first ten values of x where f(x) ≠ 0
• Check:
Mathematica
Sum[(1/2)^x, {x, 1, Infinity}]
F(x) ≐ P[X ≤ x]    (2)

and

f(x_n) = F(x_n) − F(x_{n−1})    (4)

F(x) = Σ_{k=1}^{x} (1/2)^k = (1/2)¹ + (1/2)² + (1/2)³ + . . . + (1/2)^{x−1} + (1/2)^x
Figure 5: Example 24 — F(x) in an Excel spreadsheet for the first ten values of x
• P[X ≥ 6] = ?
P[X ≥ 6] = P[{X = 6} ∪ {X = 7} ∪ {X = 8} ∪ {X = 9} ∪ {X = 10} ∪ . . .]
= P[X = 6] + P[X = 7] + P[X = 8] + P[X = 9] + P[X = 10] + . . .
or, with A = {X ≤ 5},
P[X ≥ 6] = 1 − P[A′] = 1 − P[X ≤ 5] = 1 − F(5) = 1 − 0.96875 = 0.03125
• P [3 ≤ X ≤ 5] =?
x      0     1     2     3     4     5
f(x)   0.40  0.36  0.16  0.05  0.02  0.01
F(x)   0.40  0.76  0.92  0.97  0.99  1.00
• P [X ≤ 2] =?
• P [X ≥ 0] =?
• P [1 ≤ X ≤ 3] =?
- Why? Information about a distribution (and therefore about the probabilities of values of the
random variable X) can be conveyed by knowing just a few parameters.
Examples: per capita income, rich–poor income gap, grade point average, points per game, average
pore size.
µ = Σ_{x=1}^{∞} x f(x) = 1·(1/2) + 2·(1/2)² + 3·(1/2)³ + . . . = (1/2)/(1 − 1/2)² = 2
• Check:
Mathematica
Sum[(a)^x, {x, 1, Infinity}]
Sum[x*(a)^x, {x, 1, Infinity}]
E[cX] = cE[X]
E[X + Y ] = E[X] + E[Y ] (6)
E[c] = c (7)
σ² = Σ_{x=1}^{∞} (x − µ)² f(x) = (1 − 2)²·(1/2) + (2 − 2)²·(1/2)² + (3 − 2)²·(1/2)³ + . . . = 2
Proof:
Var(c) = 0
Var(cX) = c² Var(X)
X, Y independent ⇒ Var(X + Y) = Var(X) + Var(Y)
Proof:
☞ Hwnthi: Is σ_{X+Y} = σ_X + σ_Y?
• µ = Σ_{x=0}^{5} x f(x) = 0.96 imperfections on average per 10 meters of synthetic fiber

• Variance σ² = Σ_{x=0}^{5} (x − µ)² f(x) = 1.0984

• Standard deviation = √σ² = 1.048

• Alternative formula: σ² = Σ_{x=0}^{5} x² f(x) − µ² = 2.02 − 0.96² = 1.0984

☞ Hwnthi: What is the value of E(X − µ), that is Σ_{x=0}^{5} (x − µ) f(x)?
MATLAB
x = 0:5;
fx = [0.4, 0.36, 0.16, 0.05, 0.02, 0.01];
mu = sum(x.*fx)
sigma2 = sum((x-mu).*(x-mu).*fx)
sigma2alt = sum(x.*x.*fx) - mu*mu

Mathematica
x = Range[0, 5];
fx = {0.4, 0.36, 0.16, 0.05, 0.02, 0.01};
mu = x.fx
sigma2 = ((x - mu)^2).fx
sigma2alt = (x^2).fx - mu*mu
After several experiments, a microchip manufacturer found that the probability of producing
a defective wafer D is p = 0.010. After starting the production line, the manufacturer wants
to know the probability of producing a defective wafer D exactly after 1, 2, 3, . . . etc.
good wafers G have been produced.
P[X = 1] ≐ P[D] = p = 0.010
P[X = 2] ≐ P[GD] = (1 − p)p = (0.990)(0.010) = 0.0099
P[X = 3] ≐ P[GGD] = (1 − p)(1 − p)p = (1 − p)²p = (0.990)²(0.010)
⋮
P[X = x] ≐ P[GG···G D] = (1 − p)^{x−1} p,   with x − 1 G's before the D
- Bernoulli trials
- Outcome of each Bernoulli trial is one of two alternatives with probabilities p, q = 1−p
(0 < p < 1)
- X = number of the Bernoulli trial at which alternative 1 occurs for the first time
f(x) = (1 − p)^{x−1} p,   x = 1, 2, 3, . . . ,    µ = 1/p,    σ² = (1 − p)/p²
☞ Hwnthi: Is Σ_{all x} f(x) = 1?
MATLAB
x = 1:1:15;
p = 0.3; y03 = p*(1-p).^(x-1);
p = 0.5; y05 = p*(1-p).^(x-1);
p = 0.8; y08 = p*(1-p).^(x-1);
[Figure: stem plots of the geometric distribution f(x) for p = 0.3, 0.5, and 0.8, x = 1, . . . , 15.]
µ = np,   σ² = np(1 − p)

☞ Hwnthi: Is Σ_{all x} f(x) = 1?
(Hint: Σ_{all x} f(x) = Σ_{all x} C(n, x) p^x (1 − p)^{n−x} = (p + (1 − p))^n = . . .)
What is the expected number of defective items in a lot of 1000 items produced by the above
machine?
µ = np = (1000)(0.05) = 50
What is the probability that the number of defective items in a lot of 1000 items produced by
the previous machine will be in the intervals [µ − σ, µ + σ], [µ − 2σ, µ + 2σ], [µ − 3σ, µ + 3σ]?
P[µ − kσ < X < µ + kσ] ≥ 1 − 1/k²
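As a check, the Chebyshev bound can be compared with the exact binomial probabilities; a minimal MATLAB sketch (binocdf is part of the Statistics Toolbox, which these notes already use):

MATLAB
% Compare the Chebyshev bound with the exact binomial probability
n = 1000; p = 0.05;
mu = n*p; sigma = sqrt(n*p*(1-p));
for k = 1:3
    % P[mu - k*sigma < X < mu + k*sigma] for the discrete variable X
    exact = binocdf(floor(mu + k*sigma), n, p) - binocdf(ceil(mu - k*sigma) - 1, n, p);
    bound = 1 - 1/k^2;   % Chebyshev lower bound
    fprintf('k = %d: exact = %.4f, bound >= %.4f\n', k, exact, bound)
end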
The binomial distribution can be calculated using Eq. 12 (and the cumulative by summing
all previous terms). However, because it is so common, textbooks offer tables (careful: tables
often give the cumulative distribution, from which the density can be recovered using Eq. 4).
Spreadsheet excerpt for n = 40, p = 0.4 (the charts of f(x) and F(x) appear in Figure 6):

x    f(x)          F(x)
0    1.336749E-09  1.336749E-09
1    3.564665E-08  3.698340E-08
2    4.634065E-07  5.003899E-07
3    3.913210E-06  4.413600E-06
4    2.413146E-05  2.854506E-05
5    0.000115831   0.000144376
6    0.000450454   0.000594830
7    0.001458613   0.002053443
8    0.004011185   0.006064628
9    0.009507995   0.015572624
10   0.019649857   0.035222480
11   0.035727012   0.070949492
12   0.057560186   0.128509678
13   0.082650523   0.211160202
14   0.106264959   0.317425160
15   0.122795063   0.440220224
16   0.127911524   0.568131748
17   0.120387317   0.688519065
18   0.102552159   0.791071224
...
35   6.040698E-10  1
36   5.593239E-11  1
37   4.031163E-12  1
38   2.121665E-13  1
39   7.253555E-15  1
40   1.208926E-16  1
Figure 6: Binomial distribution and cumulative binomial distribution for n = 40 and p = 0.4
using Excel function BINOM.DIST
MATLAB
n = 40;
p = 0.25:0.25:0.75;
for k = 1:3
    for i = 1:1:40
        x(i) = i;
        x2(i) = i*i;
        y(i,k) = nchoosek(n,i) * p(k)^i * (1-p(k))^(n-i);
    end
    F(:,k) = cumsum(y(:,k));       % cumulative distribution
    m = sum(x'.*y(:,k))            % expected value for variable x
    var = sum(x2'.*y(:,k)) - m*m   % variance
end
figure
subplot(321); stem(x, y(:,1)); legend('p=0.25')
subplot(322); stem(x, F(:,1))
subplot(323); stem(x, y(:,2)); legend('p=0.50')
subplot(324); stem(x, F(:,2))
subplot(325); stem(x, y(:,3)); legend('p=0.75')
subplot(326); stem(x, F(:,3))
[Figure: stem plots of the binomial distribution (left) and its cumulative distribution (right) for p = 0.25, 0.50, 0.75, n = 40.]
f(x) = e^{−k} k^x / x!,   x = 0, 1, 2, 3, . . . ,   k > 0    (13)

µ = k,   σ² = k

☞ Hwnthi: Is Σ_{all x} f(x) = 1?
(Hint: Σ_{all x} f(x) = Σ_{x=0}^{∞} e^{−k} k^x/x! = e^{−k} Σ_{x=0}^{∞} k^x/x! = . . .)
10⁸ bacterial cells multiply once. The probability that a mutant resistant to antibiotics will
result from each multiplication is 10⁻⁸. What is the probability that when all 10⁸ bacteria
multiply once there will be (a) no resistant mutants, (b) at least one resistant mutant in the
offspring?

n = 10⁸,  p = 10⁻⁸  ⇒  k = np = 1  ⇒  f(0) = e⁻¹·1⁰/0! = 0.37

P[At least one resistant mutant] = 1 − f(0) = 0.63
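A quick MATLAB check of this Poisson calculation:

MATLAB
% Poisson approximation for the mutant example (k = np = 1)
k = 1;
f0 = exp(-k)*k^0/factorial(0)   % P[no resistant mutants] = 0.37
P_atleast1 = 1 - f0             % = 0.63
% equivalently, with the Statistics Toolbox: poisspdf(0, 1)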
Figure 7: Binomial distribution and Poisson distribution for n = 40 with p₁ = 0.4 and p₂ = 0.04,
using Excel functions BINOM.DIST and POISSON.DIST
It is common for textbooks to offer tables of the cumulative Poisson distribution. Again
the density distribution can be calculated using Eq. 4 given tables or directly from the
formula in Eq. 13.
P[X = x] = 0,   x ∈ I ⊂ ℝ    (14)

X = Pb in gasoline

f(x) = 12.5x − 1.25 for 0.1 ≤ x ≤ 0.5;  f(x) = 0 elsewhere

Is ∫_{−∞}^{∞} f(x) dx = 1?

P[0.2 ≤ X ≤ 0.3] = ∫_{0.2}^{0.3} f(x) dx = . . . = 0.1875
Graphically:
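These integrals can also be checked numerically; a minimal MATLAB sketch:

MATLAB
% Check the Pb-in-gasoline density numerically
f = @(x) (12.5*x - 1.25).*(x >= 0.1 & x <= 0.5);   % pdf, zero elsewhere
integral(f, 0, 1)       % total probability; should return 1
integral(f, 0.2, 0.3)   % P[0.2 <= X <= 0.3] = 0.1875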
F(x) ≐ P[X ≤ x]    (16)

and

f(x) = dF/dx
Proof: Based on the laws of probabilities
Graphically:
Per capita income, Rich-Poor income gap, grade point average, average pore size.
provided ∫_{−∞}^{∞} |x| f(x) dx < ∞
E[cX] = cE[X]
E[X + Y ] = E[X] + E[Y ] (18)
E[c] = c (19)
provided ∫_{−∞}^{∞} |H(x)| f(x) dx < ∞
Proof:
Var(c) = 0 (23)
Var(cX) = c² Var(X)    (24)
X, Y independent ⇒ Var(X + Y) = Var(X) + Var(Y)    (25)
Proof:
☞ Hwnthi: Is σ_{X+Y} = σ_X + σ_Y?
- Mean (point of balance of f(x))
- Median (50/50 area split of f(x); F(median) = 0.5)
- Mode (peak point of f(x); inflection point of F(x))

[Figure: sketches of f(x) and F(x) illustrating the mean, median, and mode.]
f(x) = (1/(σ√(2π))) e^{−(1/2)[(x−µ)/σ]²}    (27)

(Hint: Compute first the integral I = ∫_{−∞}^{∞} e^{−x²} dx by computing the double integral
I² = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy in polar coordinates, where x = r cos θ, y = r sin θ ⇒ x² + y² = r².
Recall the integration formula for coordinate change in double integrals.)
Proof: Set z = (x − µ)/σ ⇒ dx = σ dz. Then

E[X] = (1/√(2π)) ∫_{−∞}^{∞} (µ + σz) e^{−z²/2} dz
     = µ · (1/√(2π)) ∫_{−∞}^{∞} e^{−z²/2} dz + σ · (1/√(2π)) ∫_{−∞}^{∞} z e^{−z²/2} dz = µ

(the first integral factor equals 1 (why?); the second equals 0 (why?))

Var(X) = E[(X − µ)²] = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − µ)² e^{−(1/2)[(x−µ)/σ]²} dx

Set z = (x − µ)/σ ⇒ dx = σ dz. Then

E[(X − µ)²] = (σ²/√(2π)) ∫_{−∞}^{∞} z² e^{−z²/2} dz = (σ²/√(2π)) ∫_{−∞}^{∞} z · z e^{−z²/2} dz

Integrate by parts with u = z ⇒ du = dz and dν = z e^{−z²/2} dz ⇒ ν = −e^{−z²/2}, to get

Var(X) = (σ²/√(2π)) ( [−z e^{−z²/2}]_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−z²/2} dz ) = σ²

(the bracketed term equals 0; the remaining integral equals √(2π))
Z ∼ N (0; 1)
[Figure: standard normal pdf f(z) (left) and CDF (right); CDF values marked: 0.8413 at z = 1, 0.9772 at z = 2, 0.9987 at z = 3, 0.1587 at z = −1, 0.02275 at z = −2, 0.00135 at z = −3.]
• The probability that X will fall between x₁ and x₂ can be written as the probability
that Z will be between z₁ = (x₁ − µ)/σ and z₂ = (x₂ − µ)/σ for ANY normal distribution:

P[x₁ < X < x₂] = (1/(σ√(2π))) ∫_{x₁}^{x₂} e^{−(1/2)[(x−µ)/σ]²} dx = (1/√(2π)) ∫_{z₁}^{z₂} e^{−z²/2} dz = P[z₁ < Z < z₂]
Let X ∼ N (1; 0.0625) (µ = 1, σ = 0.25). Find P [0.9 ≤ X ≤ 1.5] using tables (Table 1) and
compare to:
MATLAB: normcdf(2,0,1)-normcdf(-0.4,0,1)
Mathematica:
CDF[NormalDistribution[0,1],2]-CDF[NormalDistribution[0,1],-0.4]
Table 1: Cumulative standard normal distribution. Only values for z > 0 are provided since if
z < 0 then P [Z ≤ z] = P [Z ≥ −z] = 1 − P [Z ≤ −z] where −z > 0.
MATLAB
% Plot three normal distributions
% with mean at zero and different standard deviation
x = -10:0.1:10;
subplot(121);
plot(x, normpdf(x,0,1), 'r', x, normpdf(x,0,2), 'b', x, normpdf(x,0,3), 'g')
title('Normal distributions with different \sigma');
legend('\sigma=1', '\sigma=2', '\sigma=3')
subplot(122);
plot(x, normcdf(x,0,1), 'r', x, normcdf(x,0,2), 'b', x, normcdf(x,0,3), 'g')
title('Cumulative normal distributions with different \sigma');
legend('\sigma=1', '\sigma=2', '\sigma=3')
[Figure: normal pdfs (left) and CDFs (right) for σ = 1, 2, 3.]
☞ Hwnthi: Using Excel, plot the binomial distribution for (a) n = 20, p = 0.05, (b)
n = 200, p = 0.05, (c) n = 1000, p = 0.05, (d) n = 5000, p = 0.05, and compare it to the
normal distribution with corresponding average and variance.
MATLAB
% Plot standard normal distribution ranges
% 1 sigma, 2 sigma and 3 sigma
x = -4:0.1:4;
x3 = -3:0.1:3;   % 3 sigma range
x2 = -2:0.1:2;   % 2 sigma range
x1 = -1:0.1:1;   % 1 sigma range
figure
subplot(121); title('3\sigma, 2\sigma and 1\sigma ranges'); hold on
plot(x, normpdf(x,0,1), 'k')
area(x3, normpdf(x3,0,1), 'FaceColor', 'r')
area(x2, normpdf(x2,0,1), 'FaceColor', 'b')
area(x1, normpdf(x1,0,1), 'FaceColor', 'g')
hold off
subplot(122); title('Cumulative standard normal distribution'); hold on
plot(x, normcdf(x,0,1), 'k')
plot(x, ones(size(x))*normcdf(+0,0,1), 'k')
plot(x, ones(size(x))*normcdf(-3,0,1), 'r')
plot(x, ones(size(x))*normcdf(+3,0,1), 'r')
plot(x, ones(size(x))*normcdf(-2,0,1), 'b')
plot(x, ones(size(x))*normcdf(+2,0,1), 'b')
plot(x, ones(size(x))*normcdf(-1,0,1), 'g')
plot(x, ones(size(x))*normcdf(+1,0,1), 'g')
hold off
[Figure: shaded 1σ, 2σ, 3σ ranges of the standard normal pdf (left) and the corresponding levels on the CDF (right).]
1. f_XY(x, y) ≥ 0
2. Σ_{all x} Σ_{all y} f_XY(x, y) = 1
X\Y      0      1      2      3      f_X(x) = Σ_{all y} f_XY(x, y)
0        0.840  0.030  0.020  0.010  0.900
1        0.060  0.010  0.008  0.002  0.080
2        0.010  0.005  0.004  0.001  0.020
f_Y(y)   0.910  0.045  0.032  0.013  1.000
(f_Y(y) = Σ_{all x} f_XY(x, y))
f_X(x) ≐ Σ_{all y} f_XY(x, y)

f_Y(y) ≐ Σ_{all x} f_XY(x, y)

P[9 ≤ x ≤ 10 and 125 ≤ y ≤ 140] = 15/240
Graphically:
f_X(x) ≐ ∫_{−∞}^{∞} f_XY(x, y) dy

f_Y(y) ≐ ∫_{−∞}^{∞} f_XY(x, y) dx

X, Y independent ⇐⇒ f_XY(x, y) = f_X(x) f_Y(y)

provided ∫_{−∞}^{∞} ∫_{−∞}^{∞} |H(x, y)| f_XY(x, y) dy dx < ∞
E[X] = . . . = 0.12
E[Y] = . . . = 0.148
E[X + Y] = . . . = 0.268
E[XY] = . . . = 0.064

E[X] = . . . = 9.5
E[Y] = . . . = 180
E[X + Y] = ?
E[XY] = . . . = 1710

Cov(X, Y) = . . . = 0.046
Cov(X, Y) = . . . = 0

σ²_{aX+bY+c} = a²σ²_X + b²σ²_Y + 2ab·Cov(X, Y)    (28)

Contrast to Eq. 25.
Cov(X, Y) = 0 ⇏ X, Y independent

Y\X      −2    −1    1     2     f_Y(y)
1        0     1/4   1/4   0     1/2
4        1/4   0     0     1/4   1/2
f_X(x)   1/4   1/4   1/4   1/4   1

ρ_XY ≐ Cov(X, Y) / √(Var(X) Var(Y))

−1 ≤ ρ_XY ≤ 1,  or  |ρ_XY| ≤ 1

E[(αW − Z)²] ≥ 0 ⇒
E[α²W² − 2αWZ + Z²] ≥ 0 ⇒
α²E[W²] − 2αE[WZ] + E[Z²] ≥ 0
Let α = E[WZ]/E[W²]. Then, from the above inequality, . . .

|ρ_XY| = 1 ⇔ Y = β₀ + β₁X,   β₀, β₁ ∈ ℝ,   β₁ ≠ 0

⇒ Y = αX + (µ_Y − αµ_X),   i.e. β₁ = α and β₀ = µ_Y − αµ_X

Graphically:
[Figure: scatter plots of y vs. x illustrating different degrees of linear correlation.]
Y\X      −2    −1    1     2     f_Y(y)
1        0     1/4   1/4   0     1/2
4        1/4   0     0     1/4   1/2
f_X(x)   1/4   1/4   1/4   1/4   1

E[Y] = 5/2, E[X] = 0, E[XY] = 0 ⇒ Cov(X, Y) = 0 ⇒ ρ_XY = 0. But Y = X²!

ρ_XY = 0 ⇒ X, Y uncorrelated, NOT unrelated: X, Y are simply not related linearly.

[Figure: scatter plot of the four points (−2, 4), (−1, 1), (1, 1), (2, 4), for which ρ = 0.]
f_XY(x, y):

X\Y      0      1      2      3      f_X(x)
0        0.840  0.030  0.020  0.010  0.900
1        0.060  0.010  0.008  0.002  0.080
2        0.010  0.005  0.004  0.001  0.020
f_Y(y)   0.910  0.045  0.032  0.013  1.000
E[X²] = . . . = 0.16
E[Y²] = . . . = 0.29
E[X] = . . . = 0.12
E[Y] = . . . = 0.148
Var(X) = E[X²] − E[X]² = . . . = 0.146
Var(Y) = E[Y²] − E[Y]² = . . . = 0.268
Cov(X, Y) = E[XY] − E[X]E[Y] = . . . = 0.046

⇒ ρ_XY ≐ Cov(X, Y)/√(Var(X) Var(Y)) = 0.046/√((0.146)(0.268)) = 0.23
f_{X|y}(x) = P[X = x | Y = y] = P[X = x and Y = y] / P[Y = y],   X, Y discrete random variables
X\Y      0      1      2      3      f_X(x)
0        0.840  0.030  0.020  0.010  0.900
1        0.060  0.010  0.008  0.002  0.080
2        0.010  0.005  0.004  0.001  0.020
f_Y(y)   0.910  0.045  0.032  0.013  1.000

f_{X|y}(x):
X\Y      0      1      2      3      Sum
0        0.923  0.667  0.625  0.769  2.984
1        0.066  0.222  0.250  0.154  0.692
2        0.011  0.111  0.125  0.077  0.324
Sum      1.000  1.000  1.000  1.000  4.000

☞ Hwnthi: Verify: Σ_{all x} f_{X|y}(x) = 1

f_{Y|x}(y):
X\Y      0      1      2      3      Sum
0        0.933  0.033  0.022  0.011  1.000
1        0.750  0.125  0.100  0.025  1.000
2        0.500  0.250  0.200  0.050  1.000
Sum      2.183  0.408  0.322  0.086  3.000

☞ Hwnthi: Verify: Σ_{all y} f_{Y|x}(y) = 1
Example 57 Populations
Example 59 Sample
Statistics:
(X₁ + . . . + X₄₂)/42
min{X₁, . . . , X₄₂}
max{X₁, . . . , X₄₂}
- Is X1 − µ a statistic?
Following Example 2, you test 20 cartons (a random sample). Each carton can be defective
or non-defective (independent random variables). If X is the number that are defective, then
you can form an estimate of the proportion (which needs to be less than 10%) as:

X/n = X/20

Outcome: F F T F F F F F T F F F T F F F F F F F
X:       0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
Datye et al.∗ used electron microscopy to examine the particle size distribution in samples of
Pd/Al2 O3 supported metal catalysts and make conclusions about the sintering mechanism.
Suppose that Table 3 presents measurements on 300 such particles. What can we learn?
∗
Datye et al., Catalysis Today, 111, 59 (2006)
148.1 122.0 130.9 133.8 131.3 117.6 131.5 143.4 128.9 124.5 131.8 131.3
163.9 134.0 122.5 125.7 132.1 143.7 128.7 134.9 117.8 177.6 144.9 140.4
132.7 141.9 107.4 114.6 133.5 146.7 155.3 131.9 165.7 123.2 129.6 153.9
121.0 131.8 120.9 116.0 144.7 130.6 156.2 122.6 153.7 135.5 129.7 163.4
125.5 144.1 130.2 150.1 162.0 172.9 114.2 127.9 137.7 162.2 118.9 116.9
139.4 186.9 164.6 136.8 159.1 151.2 143.7 130.2 123.8 129.3 132.7 123.2
152.0 138.8 145.1 137.6 128.6 130.3 136.4 138.5 154.8 116.8 138.7 127.0
131.7 135.7 144.0 124.8 133.3 126.3 122.3 131.0 117.5 141.1 123.3 144.1
137.0 132.6 125.1 136.8 129.7 144.0 136.7 125.7 153.6 119.6 168.2 131.9
172.0 114.9 127.4 161.4 143.0 115.9 127.8 151.3 136.3 122.8 127.0 128.6
134.4 142.8 127.3 127.8 106.6 145.1 134.8 142.0 123.3 137.6 135.1 102.2
126.7 141.9 129.0 144.4 143.6 138.1 129.6 160.3 129.1 147.3 135.5 134.0
137.7 127.7 109.7 120.7 127.6 134.2 124.9 124.9 135.7 169.1 136.8 138.9
111.5 144.2 139.0 153.3 142.2 120.6 119.0 146.8 146.6 139.5 117.9 125.5
134.8 127.1 183.8 134.6 138.1 173.8 140.9 124.3 130.6 141.6 166.5 120.2
118.7 144.0 142.7 146.0 153.2 129.9 137.8 139.1 132.2 133.3 127.2 133.7
153.5 133.4 132.8 119.9 122.4 139.7 153.4 139.1 124.6 156.0 124.4 117.6
154.9 133.2 139.4 113.6 161.4 173.7 128.1 123.4 148.9 138.2 166.0 149.9
135.6 154.6 124.2 133.8 114.0 138.2 134.9 137.9 152.8 122.0 123.4 130.6
143.1 117.9 145.1 167.9 154.7 155.3 114.9 126.5 140.4 124.0 158.3 130.6
154.9 130.5 150.7 154.0 124.9 141.8 112.5 138.1 138.9 143.6 150.5 130.1
137.7 131.2 137.3 148.1 139.6 145.1 134.3 130.8 146.3 165.7 122.4 138.3
139.4 153.1 164.4 154.5 145.3 150.1 120.9 118.1 149.7 146.4 155.5 128.3
118.5 175.9 141.9 117.2 117.7 142.8 134.8 143.3 123.1 146.0 133.2 120.7
148.9 155.5 124.0 123.6 140.9 145.8 121.2 154.7 150.1 142.9 131.2 135.4
[Figure: histogram of the 300 particle size measurements; x-axis: Particle size (nm). Tail of the accompanying frequency table: bins 190, 195, and 200 nm have counts 1, 0, and 0 (relative frequency 0.00); total count 300.]
Mathematica
ParticleSizes = Import["lognorm_particle.dat"];
bins1 = BinCounts[ParticleSizes, {90, 200, 10}];
bins2 = BinCounts[ParticleSizes, {90, 200, 2}];
BarChart[bins1, PlotRange -> All, ChartLabels -> Range[90, 200, 10]]
BarChart[bins2, PlotRange -> All, ChartLabels -> Range[90, 200, 2]]
[Figure: histograms of the particle size data with 10 nm bins (left) and 2 nm bins (right).]
Better alternative:
MATLAB
ParticleSizes = importdata('lognorm_particle.dat');
bins1 = 90:10:200; bins2 = 90:1:200;
h1 = histcounts(ParticleSizes, bins1, 'Normalization', 'probability');
h2 = histcounts(ParticleSizes, bins2, 'Normalization', 'probability');
xmid1 = 0.5*(bins1(1:end-1) + bins1(2:end));
xmid2 = 0.5*(bins2(1:end-1) + bins2(2:end));
hold on
bar(xmid1, cumsum(h1), 'b', 'FaceAlpha', 0.2); plot(xmid2, cumsum(h2), 'o-')
[Figure: cumulative relative frequency vs. particle size (nm) for the two bin widths.]
Figure 11: Cumulative histograms of relative frequencies with different bin sizes
[Figure: box-and-whisker plot of the particle size data; y-axis from 100 to 180 nm.]
☞ Hwnthi: Many graphical methods exist to visualize and assess a distribution. Search
for “quantile” (Q–Q) plots to compare data to the normal distribution.
X̄ ≐ (X₁ + . . . + X_n)/n

Remark: µ_X ≠ X̄
Remark 2: In the case of Bernoulli trials, a sample proportion can be calculated as the mean
of a variable that takes values of 0 or 1 (see Example 62).

For the ordered sample X₁, . . . , X_n, the sample median X̃ is the middle observation
X_{(n+1)/2} when n is odd, and the average of the two middle observations X_{n/2} and
X_{n/2+1} when n is even.
S² = [ n Σ_{i=1}^{n} X_i² − (Σ_{i=1}^{n} X_i)² ] / [ n(n − 1) ]

R ≐ X_p − X_q

where

X_p = max_{1≤i≤n}{x_i},   X_q = min_{1≤i≤n}{x_i}
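MATLAB provides built-in equivalents of these sample statistics; a minimal sketch using the particle size data file from the histogram examples:

MATLAB
% Descriptive statistics of a sample
x = importdata('lognorm_particle.dat');   % particle size data
xbar = mean(x)       % sample mean
xmed = median(x)     % sample median
s2 = var(x)          % sample variance S^2 (n-1 in the denominator)
s = std(x)           % sample standard deviation
R = max(x) - min(x)  % sample range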
Theorem 36: The Central Limit Theorem: Sample averages are close to normal
Let X₁, X₂, . . . , X_n be a random sample (i.e. a collection of independent variables, identically
distributed to each other and X; recall Definition 46). Then the statistic (Definition 47)
X̄ = (X₁ + . . . + X_n)/n (sample mean, Definition 48) approaches a normal distribution, as n → ∞,
with mean µ and variance σ²/n. Equivalent formulation:

lim_{n→∞} (X̄ − µ)/(σ/√n) ∼ N(0; 1)
• ☞ Hwnthi: What is the expected relative frequency for X = 2 and what for X = 10?
Figure 12: Example data from an experiment where 80 cards have been drawn with replacement.
The two histograms present frequencies for observed values of single cards and quadruple averages.
MATLAB
% Draw a number of cards with replacement and examine statistics of averages
num_cards = 80;
x = floor(rand(1,num_cards)*13) + 1;   % draw 80 random numbers from 1 to 13
x(x > 10) = 10;                        % if value above 10 set it to 10 (for J, Q, K)
s2x = var(x); sx = std(x);             % variance and standard deviation
sprintf('All cards: var = %f, std = %f \n', s2x, sx)
x4 = mean(reshape(x, 4, []));          % averages of quadruples
x8 = mean(reshape(x, 8, []));          % averages of octuples
xvec = 1:10;                           % histogram bins (card values)
histx  = hist(x,  xvec); csumx  = cumsum(histx);
histx4 = hist(x4, xvec); csumx4 = cumsum(histx4);
histx8 = hist(x8, xvec); csumx8 = cumsum(histx8);
subplot(2,3,1), stem(xvec, histx);  title('All cards');
subplot(2,3,2), stem(xvec, histx4); title('Quadruples');
subplot(2,3,3), stem(xvec, histx8); title('Octuples');
subplot(2,3,4), stem(1:10, csumx);
subplot(2,3,5), stem(1:10, csumx4);
subplot(2,3,6), stem(1:10, csumx8);
☞ Hwnthi: What are your observations on the mean and variance of X and of the averages
of quadruples and octuples?
[Figure: stem plots of frequencies (top row) and cumulative frequencies (bottom row) for single cards, quadruples, and octuples.]
Figure 13: MATLAB example output showing histograms and cumulative distributions for 80
cards taken as single, quadruples and octuples
f(x) = 1 for 0 < x < 1;  f(x) = 0 elsewhere

µ_X = . . . ,
σ²_X = . . . ,
n = 2  ⇒  σ_X̄ = σ_X/√2 = . . . ,
n = 5  ⇒  σ_X̄ = σ_X/√5 = . . . ,
n = 20 ⇒  σ_X̄ = σ_X/√20 = . . . ,
MATLAB
% CLT and uniform distribution
num_sample = 10000;
bin_vec = 0.02:0.04:0.98;
x = rand(num_sample, 1);   % draw 10,000 random numbers
std(x)
subplot(1,4,1), hist(x, bin_vec)
legend('n=1');
j = 1;
for i = 1:2:num_sample-1
    x2(j) = mean(x(i:i+1));
    j = j + 1;
end
std(x2)
subplot(1,4,2), hist(x2, bin_vec)
legend('n=2');
j = 1;
for i = 1:5:num_sample-4
    x5(j) = mean(x(i:i+4));
    j = j + 1;
end
std(x5)
subplot(1,4,3), hist(x5, bin_vec)
legend('n=5');
j = 1;
for i = 1:20:num_sample-19
    x20(j) = mean(x(i:i+19));
    j = j + 1;
end
std(x20)
subplot(1,4,4), hist(x20, bin_vec)
legend('n=20');
[Figure: histograms of averages of n = 1, 2, 5, 20 uniform random numbers.]
f(x) = 5e^{−5x} for x > 0;  f(x) = 0 elsewhere

µ_X = . . . ,
σ²_X = . . . ,
n = 2  ⇒  σ_X̄ = σ_X/√2 = . . . ,
n = 5  ⇒  σ_X̄ = σ_X/√5 = . . . ,
n = 20 ⇒  σ_X̄ = σ_X/√20 = . . . ,
MATLAB
% CLT and 5exp(-5x) distribution
num_sample = 20000;
% draw the samples by inverse-transform sampling
% (the sampling step itself was lost from the original listing; this is
%  one standard way to generate exponential variates in base MATLAB)
x = -log(rand(num_sample,1))/5;
std(x)
subplot(1,4,1), hist(x, 50)
legend('n=1');
j = 1;
for i = 1:2:num_sample-1
    x2(j) = mean(x(i:i+1));
    j = j + 1;
end
std(x2)
subplot(1,4,2), hist(x2, 50)
legend('n=2');
j = 1;
for i = 1:5:num_sample-4
    x5(j) = mean(x(i:i+4));
    j = j + 1;
end
std(x5)
subplot(1,4,3), hist(x5, 50)
legend('n=5');
j = 1;
for i = 1:20:num_sample-19
    x20(j) = mean(x(i:i+19));
    j = j + 1;
end
std(x20)
subplot(1,4,4), hist(x20, 50)
legend('n=20');
[Figure: histograms of averages of n = 1, 2, 5, 20 samples drawn from 5e^{−5x}.]
Example 70 Significance of CLT: Are more experimental measurements better than one?
- Unknown distribution f(x) of experimental measurement error for the variable X with
average µ, standard deviation σ_X.
- Random sample: X̄ = (X₁ + . . . + X_n)/n.
- Theorem 36 ⇒ µ_X̄ = µ_X,  σ_X̄ = σ_X/√n
- We can make point estimates and interval estimates with specific confidence level and
perform hypothesis testing on means, variances and proportions.
1. θ̂ unbiased estimator of θ
E[θ̂] = θ
Theorem 38: Variance of estimator of population average decreases with sample size

Var(X̄) = σ²/n   (≡ σ²_X̄)

Proof:
Var(X̄) = E[(X̄ − µ)²] = E[ ( (X₁ + X₂ + . . . + X_n)/n − µ )² ]
= (1/n²) E[ (X₁ + X₂ + . . . + X_n − nµ)² ]
= (1/n²) E[ ((X₁ − µ) + (X₂ − µ) + . . . + (X_n − µ))² ] = (1/n²)·nσ² = σ²/n
P̂ = X/n  ⇒  µ_P̂ = E[X/n] = np/n = p,   σ²_P̂ = pq/n
Following Example 2, if 20 cartons are sampled and the outcome is as listed in Example 62,
your estimate of the proportion is:

p̄ = 3/20 = 0.15
As we will see later, the sample size is too small to make an accurate estimate.
E[S²] = σ²_X

Proof:
E[S²] = E[ Σ_{i=1}^{n} (X_i − X̄)²/(n − 1) ] = (1/(n − 1)) Σ_{i=1}^{n} E[ (X_i − (X₁ + X₂ + . . . + X_n)/n)² ]
= (1/((n − 1)n²)) Σ_{i=1}^{n} E[ (nX_i − X₁ − X₂ − . . . − X_n)² ]

Expanding the square inside the expectation gives, for each i,

n²X_i² + X₁² + X₂² + . . . + X_n²
− 2nX_iX₁ − 2nX_iX₂ − . . . − 2nX_iX_n   (n terms)
+ 2X₁X₂ + 2X₁X₃ + . . . + 2X₁X_n   (n − 1 terms)
+ 2X₂X₃ + . . . + 2X₂X_n   (n − 2 terms)
+ . . . + 2X_{n−1}X_n   (1 term)

Taking expectations and using independence (E[X_jX_k] = E[X]² for j ≠ k, E[X_j²] = E[X²]),
noting that among the −2nX_iX_j terms the one with j = i contributes −2nE[X²] while the
other n − 1 contribute −2nE[X]² each, and that (n − 1) + (n − 2) + . . . + 2 + 1 = n(n − 1)/2:

E[S²] = (1/((n − 1)n²)) · n ( n²E[X²] + nE[X²] − 2n(n − 1)E[X]² − 2nE[X²] + (n − 1)nE[X]² )
= (1/((n − 1)n²)) · n ( n²E[X²] − nE[X²] − n(n − 1)E[X]² )
= E[X²] − E[X]² = σ²
Mathematica
ParticleSizes = Import["lognorm_particle.dat"];
Mean[ParticleSizes]
Variance[ParticleSizes]
StandardDeviation[ParticleSizes]
P[L₁ ≤ θ ≤ L₂] = 1 − α

because (X̄ − µ_X)/(σ_X/√n) is approximately standard normal.
Assume for now that σ_X ≈ s_X = √10.01 = 3.16 (Example 75)∗. Then by rearranging:

∗ This temporary “cheating” will be resolved in the next section by the introduction of the T-distribution.
P[ X̄ − 2·s₈₀/√80 ≤ µ_X ≤ X̄ + 2·s₈₀/√80 ] = 0.95

for the random variable X̄. Given that we have an experimental value for X̄, we can conclude

P[ x̄₈₀ − 2·s₈₀/√80 ≤ µ_X ≤ x̄₈₀ + 2·s₈₀/√80 ] = 0.95 ⇒
P[ 6.3 − 2·3.16/√80 ≤ µ_X ≤ 6.3 + 2·3.16/√80 ] = 0.95 ⇒ P[5.6 ≤ µ_X ≤ 7.0] = 0.95

This is a confidence interval for µ_X with confidence 95%. Thus, beyond the single value of 6.3
(Example 71), we can state with 95% confidence, based on the data in Figure 12, that the
true mean lies within this interval (recall that the true mean is 6.538), or provide an estimate
of 6.3 ± 0.7 with 95% confidence.

Confidence interval:

X̄ − z_{α/2}·σ/√n < µ < X̄ + z_{α/2}·σ/√n

☞ Hwnthi: Calculate similar confidence intervals for confidence levels 67% and ∼99.7%.
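A minimal MATLAB sketch of this interval, using z_{α/2} from norminv instead of the rounded value of 2:

MATLAB
% 95% confidence interval for the mean (xbar = 6.3, s = 3.16, n = 80)
xbar = 6.3; s = 3.16; n = 80;
z = norminv(1 - 0.05/2, 0, 1);            % = 1.96; the notes round to 2
[xbar - z*s/sqrt(n), xbar + z*s/sqrt(n)]  % approximately [5.6, 7.0]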
1. Construct random variable Y (θ) which has θ as only unknown parameter and known
distribution if θ is fixed
X ∼ N(µ, σ²) ⇒ (X̄ − µ)/(S/√n) follows the T-distribution with n − 1 degrees of freedom.

Definition 55: Student’s T-distribution with γ degrees of freedom (DOF)

f(t) = Γ((γ + 1)/2) / ( Γ(γ/2) √(πγ) ) · (1 + t²/γ)^{−(γ+1)/2},   −∞ < t < ∞,   µ = 0,   σ² = γ/(γ − 2), γ > 2

where the gamma function is defined as Γ(z) ≐ ∫₀^∞ t^{z−1} e^{−t} dt. f(t) is a function of both t and γ.
Figure 14: T-distribution for various degrees of freedom. Note the shape change as γ increases.
The right figure introduces a notation of tr as the point where the area to the right is r.
i.e. 100(1 − α)% confidence bounds on µ are x̄ ± t_{α/2} s/√n. As with the normal distribution,
tables or software are required for efficient construction of confidence intervals.
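In MATLAB, t_{α/2} comes from the inverse CDF tinv; a sketch for the 80-point example revisited below:

MATLAB
% t-based 95% bounds; tinv gives the inverse CDF, so t_{alpha/2} = tinv(0.975, n-1)
n = 80; s = sqrt(10.01); xbar = 6.3;
t = tinv(0.975, n - 1)                    % = 1.990 for 79 DOF
[xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)]  % approximately [5.6, 7.0]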
Degrees of freedom \ P[T ≤ t]:
         0.6000   0.7500   0.9000   0.9500   0.9750   0.9900   0.9950   0.9990   0.9995
1 0.3249 1.0000 3.0777 6.3138 12.7062 31.8205 63.6567 318.3088 636.6192
2 0.2887 0.8165 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.7649 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.7407 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.7267 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.7176 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.7111 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.7064 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.7027 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.6998 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.6974 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.6955 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178
13 0.2586 0.6938 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.6924 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.6912 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.6901 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.6892 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.6884 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.6876 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.6870 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.6864 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.6858 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.6853 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.6848 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.6844 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.6840 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.6837 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.6834 1.3125 1.7011 2.0484 2.4671 2.7633 3.4082 3.6739
29 0.2557 0.6830 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2556 0.6828 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
31 0.2555 0.6825 1.3095 1.6955 2.0395 2.4528 2.7440 3.3749 3.6335
32 0.2555 0.6822 1.3086 1.6939 2.0369 2.4487 2.7385 3.3653 3.6218
33 0.2554 0.6820 1.3077 1.6924 2.0345 2.4448 2.7333 3.3563 3.6109
34 0.2553 0.6818 1.3070 1.6909 2.0322 2.4411 2.7284 3.3479 3.6007
35 0.2553 0.6816 1.3062 1.6896 2.0301 2.4377 2.7238 3.3400 3.5911
36 0.2552 0.6814 1.3055 1.6883 2.0281 2.4345 2.7195 3.3326 3.5821
37 0.2552 0.6812 1.3049 1.6871 2.0262 2.4314 2.7154 3.3256 3.5737
38 0.2551 0.6810 1.3042 1.6860 2.0244 2.4286 2.7116 3.3190 3.5657
39 0.2551 0.6808 1.3036 1.6849 2.0227 2.4258 2.7079 3.3128 3.5581
40 0.2550 0.6807 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
41 0.2550 0.6805 1.3025 1.6829 2.0195 2.4208 2.7012 3.3013 3.5442
42 0.2550 0.6804 1.3020 1.6820 2.0181 2.4185 2.6981 3.2960 3.5377
43 0.2549 0.6802 1.3016 1.6811 2.0167 2.4163 2.6951 3.2909 3.5316
44 0.2549 0.6801 1.3011 1.6802 2.0154 2.4141 2.6923 3.2861 3.5258
45 0.2549 0.6800 1.3006 1.6794 2.0141 2.4121 2.6896 3.2815 3.5203
46 0.2548 0.6799 1.3002 1.6787 2.0129 2.4102 2.6870 3.2771 3.5150
47 0.2548 0.6797 1.2998 1.6779 2.0117 2.4083 2.6846 3.2729 3.5099
48 0.2548 0.6796 1.2994 1.6772 2.0106 2.4066 2.6822 3.2689 3.5051
49 0.2547 0.6795 1.2991 1.6766 2.0096 2.4049 2.6800 3.2651 3.5004
50 0.2547 0.6794 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960
95% confidence, α = 0.05 ⇒ α/2 = 0.025. Then for n − 1 = 79 degrees of freedom, t_{α/2} = 1.990
(the area to the right of t = 1.990 with γ = 79 is 0.025) and

x̄₈₀ ± t_{α/2} s₈₀/√n = 6.3 ± 1.990·√10.01/√80 = 6.3 ± 0.7, i.e. from 5.6 to 7.0 with 95% confidence
Tables/software often provide critical values (areas to the right) instead of CDF. Make sure
you understand what is available.
Figure 15: Tables can provide either critical values or cumulative distributions
MATLAB
% Plot T-distribution as a function of degrees of freedom
% and compare to normal
t = -4:0.1:4;
subplot(1,2,1),
plot(t, tpdf(t,1), t, tpdf(t,3), t, tpdf(t,6), ...
     t, tpdf(t,15), t, normpdf(t,0,1), 'o');
legend('\gamma =1', '\gamma =3', '\gamma =6', ...
       '\gamma=15', 'st. normal');
title('Probability Distribution')
subplot(1,2,2),
plot(t, tcdf(t,1), t, tcdf(t,3), t, tcdf(t,6), ...
     t, tcdf(t,15), t, normcdf(t,0,1), 'o');
legend('\gamma =1', '\gamma =3', '\gamma =6', ...
       '\gamma=15', 'st. normal');
title('Cumulative Probability Distribution')
[Figure: T-distribution pdf (left) and CDF (right) for γ = 1, 3, 6, 15 compared to the standard normal.]
☞ Hwnthi: Calculate in Excel, MATLAB or Mathematica the CDF of the standard
normal at z = 2 and of the Student-t distribution at t = 2 for γ = 5, 10, 100, 500, and
examine how the values converge as the DOF increase.
Revisit Example 67 and Figure 12, but use only the first 8 points for a 95% confidence interval
on µ.

x̄₈ = (6 + 10 + 9 + 2 + 4 + 7 + 10 + 9)/8 = 7.125,   s₈² = 8.7,   s₈ = 2.95

• DOF = 7, 95% confidence interval → (Table 5) t = 2.3646
7.125 ± t₀.₀₂₅·2.95/√8 → 7.125 ± 2.3646 × 1.043 = 7.12 ± 2.47. Compare to Example 80.

• DOF = 7, 99% confidence interval → (Table 5) t = 3.4995
7.125 ± t₀.₀₀₅·2.95/√8 → 7.125 ± 3.4995 × 1.043 = 7.12 ± 3.65
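The same interval can be verified directly from the data; a minimal MATLAB sketch:

MATLAB
% Check the 8-point interval directly from the data
x = [6 10 9 2 4 7 10 9];
xbar = mean(x)                           % 7.125
s = std(x)                               % 2.95
xbar + tinv([0.025 0.975], 7)*s/sqrt(8)  % 95% CI: 7.12 +/- 2.47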
s2 = 101.40, x̄ = 53.92
• The t-distribution is applicable when X follows the normal distribution. How can we check?
• If we do not know whether X ∼ N (µ, σ 2 ) what can we do? Hint: Check Examples 68 and
69.
Based on the CLT (Theorem 40), P̂ = X/n will follow a normal distribution if n is large
enough (recall Theorem 29). In practice, the approximation with a normal will be accurate
if np̂ ≥ 5. If p is the unknown proportion, µ_P̂ = p and σ²_P̂ = pq/n, and a 1 − α confidence
interval follows from

P[ −z_{α/2} < (P̂ − p)/√(pq/n) < z_{α/2} ] = 1 − α    (29)

Known: p̂ = x/n and q̂ = 1 − p̂. Approximate solution,∗ assume pq ≈ p̂q̂:

p̂ − z_{α/2}·√(p̂q̂/n) < p < p̂ + z_{α/2}·√(p̂q̂/n)    (30)
Following Example 2, if 20 cartons are sampled and the outcome is as listed in Example 62,
then np̂ = 3. You increase the sample taken to 100 cartons and find 7 defective. Then
p̂ = 0.07 and a 95% confidence interval is

0.07 − 1.96·√(0.07 × 0.93/100) < p < 0.07 + 1.96·√(0.07 × 0.93/100) ⇒ p = 0.07 ± 0.05
∗
More accurate expressions are provided in statistics textbooks and can be useful for small samples
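A minimal MATLAB sketch of Eq. 30 for this example:

MATLAB
% 95% interval for the defective-carton proportion (x = 7, n = 100)
phat = 7/100; n = 100;
z = norminv(0.975);
phat + z*[-1 1]*sqrt(phat*(1-phat)/n)   % 0.07 +/- 0.05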
f(x) = x^{γ/2−1} e^{−x/2} / (2^{γ/2} Γ(γ/2)) for x > 0;  f(x) = 0 elsewhere.   µ = γ,   σ² = 2γ
Figure 16: χ²-distribution for various degrees of freedom. Note that χ² ≥ 0 and that the distribution is not symmetric.
MATLAB
% Plot chi2-distribution as a function of degrees of freedom
% and compare to normal (why 20 and 6.325 for the parameters?)
t = 0:0.5:40;
subplot(1,2,1),
plot(t, chi2pdf(t,1), t, chi2pdf(t,5), t, chi2pdf(t,10), ...
     t, chi2pdf(t,20), t, normpdf(t,20,6.325), 'o');
legend('\gamma =1', '\gamma =5', '\gamma =10', ...
       '\gamma=20', 'normal');
title('Probability Distribution')
subplot(1,2,2),
plot(t, chi2cdf(t,1), t, chi2cdf(t,5), t, chi2cdf(t,10), ...
     t, chi2cdf(t,20), t, normcdf(t,20,6.325), 'o');
legend('\gamma =1', '\gamma =5', '\gamma =10', ...
       '\gamma=20', 'normal');
title('Cumulative Probability Distribution')
-77-
CBE301 Lecture Notes - Part 2 Manolis Doxastakis
[Figure: χ² pdf (left) and CDF (right) for γ = 1, 5, 10, 20 compared to the normal N(20, 6.325²).]
• Values for the cumulative distribution using the Excel function CHIDIST(x, deg_f).
(Careful! CHIDIST calculates the right-tail area! For the inverse you can use CHIINV.)

i.e. 100(1 − α)% confidence bounds on σ² are

[ (n − 1)S²/χ²_{α/2} ,  (n − 1)S²/χ²_{1−α/2} ]

where the subscript denotes the area to the right. Note that
α/2 < 1 − α/2 ⇒ χ²_{α/2} > χ²_{1−α/2} ⇒ 1/χ²_{α/2} < 1/χ²_{1−α/2} !!!
Degrees of freedom \ P[χ² ≤ x]:
         0.005  0.010  0.025  0.050  0.075  0.100  0.250  0.500  0.750  0.900  0.925  0.950  0.975  0.990  0.995
1 0.0000 0.0002 0.0010 0.0039 0.0089 0.0158 0.1015 0.4549 1.3233 2.7055 3.1701 3.8415 5.0239 6.6349 7.8794
2 0.0100 0.0201 0.0506 0.1026 0.1559 0.2107 0.5754 1.3863 2.7726 4.6052 5.1805 5.9915 7.3778 9.2103 10.5966
3 0.0717 0.1148 0.2158 0.3518 0.4720 0.5844 1.2125 2.3660 4.1083 6.2514 6.9046 7.8147 9.3484 11.3449 12.8382
4 0.2070 0.2971 0.4844 0.7107 0.8969 1.0636 1.9226 3.3567 5.3853 7.7794 8.4963 9.4877 11.1433 13.2767 14.8603
5 0.4117 0.5543 0.8312 1.1455 1.3937 1.6103 2.6746 4.3515 6.6257 9.2364 10.0083 11.0705 12.8325 15.0863 16.7496
6 0.6757 0.8721 1.2373 1.6354 1.9415 2.2041 3.4546 5.3481 7.8408 10.6446 11.4659 12.5916 14.4494 16.8119 18.5476
7 0.9893 1.2390 1.6899 2.1673 2.5277 2.8331 4.2549 6.3458 9.0371 12.0170 12.8834 14.0671 16.0128 18.4753 20.2777
8 1.3444 1.6465 2.1797 2.7326 3.1440 3.4895 5.0706 7.3441 10.2189 13.3616 14.2697 15.5073 17.5345 20.0902 21.9550
9 1.7349 2.0879 2.7004 3.3251 3.7847 4.1682 5.8988 8.3428 11.3888 14.6837 15.6309 16.9190 19.0228 21.6660 23.5894
10 2.1559 2.5582 3.2470 3.9403 4.4459 4.8652 6.7372 9.3418 12.5489 15.9872 16.9714 18.3070 20.4832 23.2093 25.1882
11 2.6032 3.0535 3.8157 4.5748 5.1243 5.5778 7.5841 10.3410 13.7007 17.2750 18.2942 19.6751 21.9200 24.7250 26.7568
12 3.0738 3.5706 4.4038 5.2260 5.8175 6.3038 8.4384 11.3403 14.8454 18.5493 19.6020 21.0261 23.3367 26.2170 28.2995
13 3.5650 4.1069 5.0088 5.8919 6.5238 7.0415 9.2991 12.3398 15.9839 19.8119 20.8966 22.3620 24.7356 27.6882 29.8195
14 4.0747 4.6604 5.6287 6.5706 7.2415 7.7895 10.1653 13.3393 17.1169 21.0641 22.1795 23.6848 26.1189 29.1412 31.3193
15 4.6009 5.2293 6.2621 7.2609 7.9695 8.5468 11.0365 14.3389 18.2451 22.3071 23.4522 24.9958 27.4884 30.5779 32.8013
16 5.1422 5.8122 6.9077 7.9616 8.7067 9.3122 11.9122 15.3385 19.3689 23.5418 24.7155 26.2962 28.8454 31.9999 34.2672
17 5.6972 6.4078 7.5642 8.6718 9.4522 10.0852 12.7919 16.3382 20.4887 24.7690 25.9705 27.5871 30.1910 33.4087 35.7185
18 6.2648 7.0149 8.2307 9.3905 10.2053 10.8649 13.6753 17.3379 21.6049 25.9894 27.2178 28.8693 31.5264 34.8053 37.1565
19 6.8440 7.6327 8.9065 10.1170 10.9653 11.6509 14.5620 18.3377 22.7178 27.2036 28.4581 30.1435 32.8523 36.1909 38.5823
20 7.4338 8.2604 9.5908 10.8508 11.7317 12.4426 15.4518 19.3374 23.8277 28.4120 29.6920 31.4104 34.1696 37.5662 39.9968
21 8.0337 8.8972 10.2829 11.5913 12.5041 13.2396 16.3444 20.3372 24.9348 29.6151 30.9200 32.6706 35.4789 38.9322 41.4011
22 8.6427 9.5425 10.9823 12.3380 13.2819 14.0415 17.2396 21.3370 26.0393 30.8133 32.1424 33.9244 36.7807 40.2894 42.7957
23 9.2604 10.1957 11.6886 13.0905 14.0648 14.8480 18.1373 22.3369 27.1413 32.0069 33.3597 35.1725 38.0756 41.6384 44.1813
24 9.8862 10.8564 12.4012 13.8484 14.8525 15.6587 19.0373 23.3367 28.2412 33.1962 34.5723 36.4150 39.3641 42.9798 45.5585
25 10.5197 11.5240 13.1197 14.6114 15.6447 16.4734 19.9393 24.3366 29.3389 34.3816 35.7803 37.6525 40.6465 44.3141 46.9279
26 11.1602 12.1981 13.8439 15.3792 16.4410 17.2919 20.8434 25.3365 30.4346 35.5632 36.9841 38.8851 41.9232 45.6417 48.2899
27 11.8076 12.8785 14.5734 16.1514 17.2414 18.1139 21.7494 26.3363 31.5284 36.7412 38.1840 40.1133 43.1945 46.9629 49.6449
28 12.4613 13.5647 15.3079 16.9279 18.0454 18.9392 22.6572 27.3362 32.6205 37.9159 39.3801 41.3371 44.4608 48.2782 50.9934
29 13.1211 14.2565 16.0471 17.7084 18.8530 19.7677 23.5666 28.3361 33.7109 39.0875 40.5727 42.5570 45.7223 49.5879 52.3356
30 13.7867 14.9535 16.7908 18.4927 19.6639 20.5992 24.4776 29.3360 34.7997 40.2560 41.7619 43.7730 46.9792 50.8922 53.6720
31 14.4578 15.6555 17.5387 19.2806 20.4780 21.4336 25.3901 30.3359 35.8871 41.4217 42.9479 44.9853 48.2319 52.1914 55.0027
32 15.1340 16.3622 18.2908 20.0719 21.2951 22.2706 26.3041 31.3359 36.9730 42.5847 44.1309 46.1943 49.4804 53.4858 56.3281
33 15.8153 17.0735 19.0467 20.8665 22.1151 23.1102 27.2194 32.3358 38.0575 43.7452 45.3110 47.3999 50.7251 54.7755 57.6484
34 16.5013 17.7891 19.8063 21.6643 22.9379 23.9523 28.1361 33.3357 39.1408 44.9032 46.4884 48.6024 51.9660 56.0609 58.9639
35 17.1918 18.5089 20.5694 22.4650 23.7633 24.7967 29.0540 34.3356 40.2228 46.0588 47.6631 49.8018 53.2033 57.3421 60.2748
36 17.8867 19.2327 21.3359 23.2686 24.5911 25.6433 29.9730 35.3356 41.3036 47.2122 48.8353 50.9985 54.4373 58.6192 61.5812
37 18.5858 19.9602 22.1056 24.0749 25.4214 26.4921 30.8933 36.3355 42.3833 48.3634 50.0051 52.1923 55.6680 59.8925 62.8833
38 19.2889 20.6914 22.8785 24.8839 26.2540 27.3430 31.8146 37.3355 43.4619 49.5126 51.1726 53.3835 56.8955 61.1621 64.1814
39 19.9959 21.4262 23.6543 25.6954 27.0889 28.1958 32.7369 38.3354 44.5395 50.6598 52.3378 54.5722 58.1201 62.4281 65.4756
40 20.7065 22.1643 24.4330 26.5093 27.9258 29.0505 33.6603 39.3353 45.6160 51.8051 53.5010 55.7585 59.3417 63.6907 66.7660
41 21.4208 22.9056 25.2145 27.3256 28.7648 29.9071 34.5846 40.3353 46.6916 52.9485 54.6620 56.9424 60.5606 64.9501 68.0527
42 22.1385 23.6501 25.9987 28.1440 29.6058 30.7654 35.5099 41.3352 47.7663 54.0902 55.8211 58.1240 61.7768 66.2062 69.3360
43 22.8595 24.3976 26.7854 28.9647 30.4487 31.6255 36.4361 42.3352 48.8400 55.2302 56.9783 59.3035 62.9904 67.4593 70.6159
44 23.5837 25.1480 27.5746 29.7875 31.2934 32.4871 37.3631 43.3352 49.9129 56.3685 58.1336 60.4809 64.2015 68.7095 71.8926
45 24.3110 25.9013 28.3662 30.6123 32.1399 33.3504 38.2910 44.3351 50.9849 57.5053 59.2872 61.6562 65.4102 69.9568 73.1661
46 25.0413 26.6572 29.1601 31.4390 32.9882 34.2152 39.2197 45.3351 52.0562 58.6405 60.4390 62.8296 66.6165 71.2014 74.4365
47 25.7746 27.4158 29.9562 32.2676 33.8380 35.0814 40.1492 46.3350 53.1267 59.7743 61.5892 64.0011 67.8206 72.4433 75.7041
48 26.5106 28.1770 30.7545 33.0981 34.6895 35.9491 41.0794 47.3350 54.1964 60.9066 62.7378 65.1708 69.0226 73.6826 76.9688
49 27.2493 28.9406 31.5549 33.9303 35.5426 36.8182 42.0104 48.3350 55.2653 62.0375 63.8848 66.3386 70.2224 74.9195 78.2307
50 27.9907 29.7067 32.3574 34.7643 36.3971 37.6886 42.9421 49.3349 56.3336 63.1671 65.0303 67.5048 71.4202 76.1539 79.4900
(n − 1)s²/χ²₀.₉₇₅ = (24)(1.407)/12.4 = 2.723

i.e. 0.857 ≤ σ² ≤ 2.723 and 0.926 ≤ σ ≤ 1.650 with 95% likelihood (confidence).
The average particle size in the sample of Example 63 can be found as 137.1 nm. The sample
variance is s² = 215 and the sample standard deviation s = 14.7.
99% confidence interval for the variance σ² in particle size:

1 − α = 0.99 ⇒ α = 0.01,  α/2 = 0.005,  1 − α/2 = 0.995,  n − 1 = 299

χ²₀.₀₀₅ = 365.7,  χ²₀.₉₉₅ = 239.8  ⇒  299 × 215/365.7 < σ² < 299 × 215/239.8  ⇒  175.8 < σ² < 268.1

or 13.3 < σ < 16.4, where χ²₀.₀₀₅ and χ²₀.₉₉₅ were found using software for 299 DOF.
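For 299 DOF, beyond the printed table, software gives the critical values directly; a minimal MATLAB sketch:

MATLAB
% Critical values for 299 DOF and the resulting 99% interval
chi2inv(0.995, 299)              % 365.7
chi2inv(0.005, 299)              % 239.8
[299*215/365.7, 299*215/239.8]   % 175.8 < sigma^2 < 268.1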
3. If that value is in a range that is “very” improbable, based on the known probability
distribution of step 1∗, then it is “very” unlikely that H0 is true. Consequently, it is
“very” likely that H1 is true. Otherwise, H1 cannot be asserted.
∗
The range of improbable values of the statistic is roughly 2 or 3 standard deviations of the statistic (at
∼5% or ∼0% improbability level) away from the average of the statistic, according to the Rule-of-thumb
after Example 34
For any value of p < 0.1, P[X ≥ 4] is smaller (compare the CDFs). Therefore, if you reject H0 for p = 0.1, you can reject it for any p < 0.1.
Agree to reject H0 if the observed x ≥ 4 (this is a design parameter).
(b) Collect experimental data: x = 5, i.e. 5 defective items found in a batch of 20.
(c) Was that likely? No, based on the step above. Therefore, H0 is rejected.
H1: p > 0.1 is accepted with confidence ≥ 1 − 0.133 = 87% in the above design ⇔ x ≥ 4
(the confidence level is in practice your design parameter).
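The design probability 0.133 can be checked numerically; a minimal sketch using MATLAB's binocdf (Statistics Toolbox assumed):

n = 20; p0 = 0.1;
P_reject = 1 - binocdf(3, n, p0)   % P[X >= 4 | p = 0.1] ~ 0.133, the false-alarm rate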
• What is the true value of p?
• How to increase confidence level?
If ptrue ≡ P[defective] = 0.15, what is P[not asserting the claim p > 0.1 | ptrue = 0.15], given that you designed your test at the 87% confidence level?
The power of the test (the ability to reject H0) if p = 0.15 is only 0.353!
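The quoted power follows from the same binomial CDF; a sketch, again assuming binocdf:

power = 1 - binocdf(3, 20, 0.15)   % P[X >= 4 | p = 0.15] ~ 0.35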
Increase the sample size! For example, a five-fold increase to n = 100 . . ., with cut-off value: 20 or more defective out of 100.
Try:
1. Rejecting H0: instead of designing the test with specific cut-off values and rejecting/accepting based on the sampling, you always reject H0 and report the “significance” level. Large values of α correspond to large probabilities of false alarm.
2. Accepting or rejecting H0
When designing hypothesis tests, the significance level α reflects the probability (area) that you reject H0 while it is true. For two-tailed tests this area is split between two tails, in contrast to a one-tailed test.
LEFT-TAILED TEST
RIGHT-TAILED TEST
3. Assume H0 : µ ≤ µ0 is true
- Significance-testing?
TWO-TAILED TEST
Claim: Average concentration of sulfates in water, µ, has shifted away from µ0 = 10.
Preliminary thinking: Consider a random sample of size n. Then, by Theorem 37, the best estimate of µ is x̄ = (x₁ + . . . + xₙ)/n.
How far from µ0 should x̄ be to assert the claim µ ≠ µ0?
Hypothesis-testing thinking:
3. Assume H0 : µ = µ0 is true
We already examined how hypothesis and significance testing can be performed with the
Binomial distribution to assert a claim regarding the proportion p in Example 86.
Often, to reduce α and β, large samples are drawn (Example 88). In this case, it is reasonable to assume that the Binomial approximates a normal distribution with mean np₀ and standard deviation √(np₀q₀), based on the values of the null hypothesis H0.
p̂ = 0.15,  z = (x − np₀)/√(np₀q₀) = (p̂ − p₀)/√(p₀q₀/n) = 1.67 ⇒ P[Z ≥ 1.67] = P[Z ≤ −1.67] = 0.0475
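A quick numerical check of the normal-approximation statistic (normcdf from the Statistics Toolbox assumed; n = 100 as in Example 88):

n = 100; p0 = 0.10; phat = 0.15;
z = (phat - p0)/sqrt(p0*(1-p0)/n)   % ~1.67
pval = 1 - normcdf(z)               % one-tailed, ~0.048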
ÒP Hwnthi What is the value of the corresponding statistic if the Binomial was used?
s² = 215, α/2 = 0.025, 1 − α/2 = 0.975 ⇒ χ²_{α/2} = 348.8, χ²_{1−α/2} = 253
χ² = (n − 1)s²/σ0² = (299 · 215)/200 = 321.4, which lies between 253 and 348.8, so H0: σ² = 200 cannot be rejected.
[Figure: two random samples, of sizes n₁ and n₂, drawn from Population 1 and Population 2; question: µ₁ − µ₂ = ??]
Comparing means, variances or proportions of two populations amounts to performing tests using a statistic that is a linear combination of sampled statistics. Theorems on combinations of random variables are the basis of such tests (i.e. Theorems 21, 24, 31).
Plastic: 3.0, 5.3, 6.9, 4.1, 8.0, 6.7, 6.3, 7.1, 4.2, 7.2, 5.1, 5.5, 5.8 (n₁ = 13)
Copper: 7.1, 9.3, 8.2, 10.4, 9.1, 8.7, 12.1, 10.7, 10.6, 10.5, 11.3, 11.5 (n₂ = 12)
x̄⁽¹⁾ = 75.2/13 = 5.78,  x̄⁽²⁾ = 119.5/12 = 9.96
(standard normal), where n₁, n₂ are the sizes of the random samples drawn from each population.
Theorem 48 also holds approximately when the random variables are not normally distributed, provided the samples are large (Central Limit Theorem).
Given α we can now perform hypothesis testing or form confidence intervals for the true µ₁ − µ₂ based on our sampled X̄⁽¹⁾ − X̄⁽²⁾:
100(1 − α)% confidence bounds on µ₁ − µ₂ for large samples are:
µ₁ − µ₂ = X̄⁽¹⁾ − X̄⁽²⁾ ± z_{α/2}·√(σ₁²/n₁ + σ₂²/n₂) ≈ X̄⁽¹⁾ − X̄⁽²⁾ ± z_{α/2}·√(S₁²/n₁ + S₂²/n₂)
Since proportions can be considered as averages of random variables being true or false, for large samples a similar approach can be used for the difference of proportions, given sample estimates p̄₁ and p̄₂:
p₁ − p₂ ≈ p̄₁ − p̄₂ ± z_{α/2}·√(p̄₁q̄₁/n₁ + p̄₂q̄₂/n₂)
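As a sketch of how such large-sample bounds are computed (the sample summaries here are hypothetical, not from an example in these notes):

xbar1 = 5.2; xbar2 = 4.1; s1 = 1.1; s2 = 0.9;    % hypothetical sample summaries
n1 = 60; n2 = 50; alpha = 0.05;
me = norminv(1-alpha/2)*sqrt(s1^2/n1 + s2^2/n2); % margin of error
CI = [(xbar1-xbar2) - me, (xbar1-xbar2) + me]    % 95% CI on mu1 - mu2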
• If there is support for σ₁² = σ₂² (check Section 4.8.4), use the pooled estimate Sₚ²(1/n₁ + 1/n₂), where
Sₚ² := ((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2)
with n₁ + n₂ − 2 degrees of freedom (n₁ = n₂ ⇒ Sₚ² = 0.5S₁² + 0.5S₂²)
Theorem 49: Statistic involving µ₁ − µ₂ that follows the T-distribution when σ₁² = σ₂²
• If there is support for σ₁² ≠ σ₂², use S₁²/n₁ + S₂²/n₂ with the integer part of
(S₁²/n₁ + S₂²/n₂)² / [ (S₁²/n₁)²/(n₁ − 1) + (S₂²/n₂)²/(n₂ − 1) ]
as the degrees of freedom
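Both variants are available through MATLAB's ttest2 (Statistics Toolbox assumed); a sketch using the plastic/copper data above:

plastic = [3.0 5.3 6.9 4.1 8.0 6.7 6.3 7.1 4.2 7.2 5.1 5.5 5.8];
copper  = [7.1 9.3 8.2 10.4 9.1 8.7 12.1 10.7 10.6 10.5 11.3 11.5];
[h, p]   = ttest2(plastic, copper)                       % pooled: assumes equal variances
[hW, pW] = ttest2(plastic, copper, 'Vartype', 'unequal') % Welch DOF as above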
using the relation f_{1−α/2}(ν₁, ν₂) = 1/f_{α/2}(ν₂, ν₁).
Careful when ν₁ ≠ ν₂!
• Can we assume σ₁² ≠ σ₂² because we observe that s₁²/s₂² ≈ 1.64?
H0: σ₁² = σ₂² = σ0², α = 0.1 → (tables/software) f_{0.05}(9, 9) = 3.1789, f_{0.95}(9, 9) = 0.3146
Form the statistic F = 0.00782σ0²/0.00478σ0² = 1.64 — not in the critical region! Fail to reject H0.
90% confidence interval: 1.64/3.1789 < σ₁²/σ₂² < 1.64 · 3.1789 ⇒ 0.52 < σ₁²/σ₂² < 5.21
So we cannot support that σ₁² ≠ σ₂².
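The F points and the interval can be reproduced with finv (Statistics Toolbox assumed):

alpha = 0.10; v1 = 9; v2 = 9;
fU = finv(1-alpha/2, v1, v2)   % 3.1789
fL = finv(alpha/2, v1, v2)     % 0.3146 (= 1/fU only because v1 = v2)
F = 1.64;                      % observed s1^2/s2^2
CI = [F/fU, F*fU]              % ~[0.52, 5.21] for sigma1^2/sigma2^2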
In the critical region: therefore, with 90% confidence we support that the thickness is different between the two samples.
When comparing population parameters from two samples, we assumed random and independent samples. This is often not a valid assumption.
Do coatings present a difference in the extent of corrosion since x̄L − x̄B = −6.3?
Approach 2: We observed that the assumption of random and independent sampling is not valid, based on a simple plot as shown in the figure: the corrosion clearly depends on the soil type, which seems to be a more important factor than the coating! As a result, we do not have 20 degrees of freedom in our sampling, and Sₚ²/n₁ + Sₚ²/n₂ (estimated as 308·2/11 ≈ 56) overestimates the variance of the mean of X̄L − X̄B, given that there is covariance (see Theorems 31, 24).
Solution: we pair the observations, so we have 11 pairs and 10 DOF. For each pair we calculate the difference d and perform hypothesis testing, checking whether d̄ is significantly different from zero.
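In MATLAB this is a one-sample t-test on the differences; a minimal sketch, with d standing for the 11 pairwise differences (the actual corrosion data are not reproduced here, so a placeholder vector is used):

% d = xL - xB;        % 11 pairwise differences from the coating data (not listed here)
d = randn(11,1);      % placeholder vector only, so the sketch runs
[h, p] = ttest(d)     % tests H0: mean difference = 0, with 10 DOF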
Figure 19: Curve of regression. The distribution of Y for a collection of xᵢ (i.e. f_{Y|x}(y)) is shown.
where {x1 , . . . , xn } are known values and {Y1 , . . . , Yn } are random variables.
- {Y1 , . . . , Yn } take different values by chance every time they are measured.
Figure 20: Collection of data for regression. 4 measurements are collected at each xi .
- Why is it called “Regression”? (Hint: Do a web search with keywords Francis Galton,
regression.)
- Who discovered regression? (Hint: Do a web search with keywords Gauss, Legendre, Ceres,
regression.)
µ_{Y|xᵢ} = β₀ + β₁·xᵢ (a number) ⇒ Y|xᵢ = β₀ + β₁·xᵢ + Eᵢ (a random variable, Yᵢ), with E[Eᵢ] = 0, so that µ_{Y|xᵢ} = E[Y|xᵢ − Eᵢ] (Why?)
yᵢ = β₀ + β₁·xᵢ + εᵢ    (31)
where yᵢ and xᵢ are measured, while εᵢ is impossible to measure.
[Figure: scatter plots of data points (x₁, y₁), . . . , (x₆, y₆), with and without a fitted straight line]
min_{b₀,b₁} (SSE) := min_{b₀,b₁} Σᵢ₌₁ᴺ eᵢ² = min_{b₀,b₁} Σᵢ₌₁ᴺ (yᵢ − b₀ − b₁·xᵢ)²    (32)
β̂₁ = b₁ = [n·Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / [n·Σxᵢ² − (Σxᵢ)²],   β̂₀ = b₀ = ȳ − b₁·x̄    (33)
Examine the relationship between humidity (X) and the extent of solvent evaporation (Y) in water-reducible paints during sprayout (data from Journal of Coating Technology, 65, 1983)
n = 25, Σxᵢ = 1314.90, Σyᵢ = 235.70, Σxᵢ² = 76308.53, Σyᵢ² = 2286.07, Σxᵢyᵢ = 11824.44
µ̂_{Y|x} = ŷ := 13.64 − 0.08x
[Figure: data and fitted line; x-axis: relative humidity, %]
µ̂_{Y|x} = β̂₀ + β̂₁·x    (35)
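The coefficients follow directly from Eq. 33 and the sums above; a quick MATLAB check:

n = 25; Sx = 1314.90; Sy = 235.70;
Sxx_sum = 76308.53; Sxy_sum = 11824.44;
b1 = (n*Sxy_sum - Sx*Sy)/(n*Sxx_sum - Sx^2)   % ~ -0.080
b0 = Sy/n - b1*Sx/n                           % ~ 13.64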
Residual = Measurement − Model Fit
[Figure: residuals vs. measurement point i for the 25 paint measurements]
[Figure: typical residual patterns vs. measurement point i — drifting, patterning, outlier]
β₀, β₁: numbers
B₀ := β̂₀, B₁ := β̂₁: estimators for β₀, β₁ (b₀, b₁ are values of B₀, B₁)
B₁ ∼ N(β₁, σ²/Σᵢ₌₁ⁿ(xᵢ − x̄)²)  and  B₀ ∼ N(β₀, [Σᵢ₌₁ⁿxᵢ² / (n·Σᵢ₌₁ⁿ(xᵢ − x̄)²)]·σ²)
Proof: Yᵢ = β₀ + β₁xᵢ + Eᵢ, with Eᵢ ∼ N(0, σ²) ⇒ Yᵢ ∼ N(β₀ + β₁xᵢ, σ²) ⇒ B₁ = Σᵢ₌₁ⁿ Cᵢ·Yᵢ with Cᵢ := (xᵢ − x̄)/Σⱼ₌₁ⁿ(xⱼ − x̄)² ⇒ . . .
S² = σ̂² = SSE/(n − 2)    (37)
B₁ = Sxy/Sxx,  σ²_{B₁} = σ²/Sxx,  σ²_{B₀} = (Σxᵢ²)·σ²/(n·Sxx)
Proof: T_{n−2} = (B₁ − β₁)/(S/√Sxx) is T-distributed with n − 2 DOF ⇒ . . .
Sxx = 7150.05, Syy = 63.89, Sxy = −572.44
H0: β₁ = 0,  H1: β₁ ≠ 0
SSE = Syy − b₁·Sxy = 18.06 ⇒ S² = SSE/(n − 2) = 0.79
Therefore the observed value of T₂₃ is t = b₁/(s/√Sxx) = −7.62
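A sketch reproducing the statistic and attaching a two-tailed p-value (tcdf from the Statistics Toolbox assumed):

Sxx = 7150.05; Syy = 63.89; Sxy = -572.44; n = 25;
b1  = Sxy/Sxx;
SSE = Syy - b1*Sxy;
S   = sqrt(SSE/(n-2));
t   = b1/(S/sqrt(Sxx))        % ~ -7.6
pval = 2*tcdf(-abs(t), n-2)   % far below 0.05: reject H0: beta1 = 0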
Proof: (B₀ − β₀) / (S·√(Σxᵢ²/(n·Sxx))) is T-distributed with n − 2 DOF ⇒ . . .
Example 105 Does the straight line in Example 100 cross (0,0)?
P = 0.99 that
−0.08 − 2.807·√0.79/√7150.05 ≤ β₁ ≤ −0.08 + 2.807·√0.79/√7150.05
i.e. −0.109 ≤ β₁ ≤ −0.051
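The T point 2.807 and the interval can be checked with tinv:

t = tinv(0.995, 23)                          % 2.807
CI = -0.08 + [-1 1]*t*sqrt(0.79/7150.05)     % ~[-0.109, -0.051]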
3. Confidence interval on µY |x :
Proof: (µ̂_{Y|x} − µ_{Y|x}) / (S·√(1/n + (x − x̄)²/Sxx)) is T-distributed with n − 2 DOF ⇒ . . .
[Figure: two panels showing the fitted line with confidence bands over 0 ≤ x ≤ 100]
[Figure: mileage data with fitted line vs. car weight, tons (left); residuals vs. measurement point, i (right)]
- What is the best guess for the average mileage of cars weighing 1.7 tons?
Eq. 35 ⇒ µ̂_{Y|x=1.7} = ŷ := 23.75 − (4.03)(1.7) = 16.9 miles per gallon.
- What is the 90%-confidence interval for the average mileage of cars weighing 1.7 tons?
Eq. 40 ⇒ µ_{Y|x=1.7} = 16.9 ± 1.86·√0.126·√(1/10 + (1.7 − 1.675)²/0.581) = 16.9 ± 0.2 with 90% confidence.
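A sketch reproducing the interval (tinv assumed; summary numbers as above):

n = 10; xbar = 1.675; Sxx = 0.581; S = sqrt(0.126);
yhat = 23.75 - 4.03*1.7;                             % 16.9 mpg
me = tinv(0.95, n-2)*S*sqrt(1/n + (1.7-xbar)^2/Sxx)  % ~0.2
CI = [yhat-me, yhat+me]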
[Figure: two panels, mileage vs. car weight, tons, showing the fitted line with bands over 0.5–3 tons]
5.1.3 Correlation
x    y
25   30
40   80
120  150
75   80
[Figure: scatter plot of the (x, y) data]
Eq. 42 ⇒ ρ̂ = r
Proof: [½·ln((1 + R)/(1 − R)) − ½·ln((1 + ρ)/(1 − ρ))] / √(1/(n − 3)) is approximately standard normal ⇒ . . .
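A sketch of the resulting test of H0: ρ = ρ0 (atanh implements ½·ln((1+r)/(1−r)); the r and n below are hypothetical, not from the table above):

r = 0.9; n = 30; rho0 = 0;               % hypothetical sample correlation and size
z = (atanh(r) - atanh(rho0))*sqrt(n-3);  % approximately standard normal under H0
pval = 2*(1 - normcdf(abs(z)))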
Example 109 Coefficient of determination for Example 100 and Example 107
For Example 100: R² = 1 − 18.06/63.89 = 0.72
For Example 107: R² = 1 − 0.9993/10.46 = 0.90
- Is the model for Example 107 more linear than the model for Example 100?
[Figure: regression of y on a single regressor x, and on two regressors x₁ and x₂]
In all linear regression problems the solution can be easily found using matrix algebra.
- Random Sample:
{(x₁ᵢ, x₂ᵢ, . . . , xₚᵢ, Y|x₁ᵢ, . . . , xₚᵢ) : i = 1, . . . , n}, with Yᵢ := Y|x₁ᵢ, . . . , xₚᵢ
or
y = Xβ + e    (45)
Chapra & Canale, Numerical Methods for Engineers, McGraw-Hill (5th Ed., Case Study 20.4, p. 551)
[Figure: channel cross-section labeled with diameter D, slope S, and flowrate Q]
where:
Q: flowrate (ft³/s), S: slope (ft/ft), D: diameter (ft), a₀, a₁, a₂: coefficients to determine

Experiment   D, ft   S, ft/ft   Q, ft³/s
1            1       0.001      1.4
2            2       0.001      8.3
3            3       0.001      24.2
4            1       0.01       4.7
5            2       0.01       28.9
6            3       0.01       84.0
7            1       0.05       11.1
8            2       0.05       69.0
9            3       0.05       200

Eq. 48 is not linear in the parameters!
Linearization trick: Eq. 48 ⇒
log Q = log a₀ + a₁·log D + a₂·log S ⇒ µ_{Y|x₁,x₂} = β₀ + β₁·φ₁(x₁, x₂) + β₂·φ₂(x₁, x₂)
with Y = log Q, β₀ = log a₀, β₁ = a₁, β₂ = a₂, and φ₁(x₁, x₂) = x₁ = log D, φ₂(x₁, x₂) = x₂ = log S
e := y − X·b, where
y = [log Q₁; log Q₂; . . . ; log Q₉],  X = [1, log D₁, log S₁ ; 1, log D₂, log S₂ ; . . . ; 1, log D₉, log S₉],  b = [log a₀; a₁; a₂]
X = [1  0         −3;
     1  0.30103   −3;
     1  0.477121  −3;
     1  0         −2;
     1  0.30103   −2;
     1  0.477121  −2;
     1  0         −1.30103;
     1  0.30103   −1.30103;
     1  0.477121  −1.30103]
y = [0.146128; 0.919078; 1.38382; 0.672098; 1.4609; 1.92428; 1.04532; 1.83885; 2.30103]
XᵀX = [9, 2.33445, −18.9031 ; 2.33445, 0.954791, −4.90315 ; −18.9031, −4.90315, 44.078]
Xᵀy = [11.6915; 3.94623; −22.2077]
Eq. 47 ⇒ b = [1.74797; 2.61584; 0.53678]
[Figure: fitted surface for Q (ft³/s) as a function of D (ft) and S]
num_points=9;
k=2;
D=[1; 2; 3; 1; 2; 3; 1; 2; 3 ];
S=[ 0.001; 0.001; 0.001; 0.01; 0.01; 0.01; 0.05; 0.05; 0.05 ];
Q=[ 1.4; 8.3; 24.2; 4.7; 28.9; 84.0 ; 11.1; 69.0; 200];
logQ=log10(Q);
X=zeros(num_points,k+1);
for i=1:1:num_points,
X(i,1)=1;
X(i,2)=log10(D(i));
X(i,3)=log10(S(i));
end
XTX=(X’*X);
INV_XTX=inv(XTX);
XTy=((X’)*logQ);
b=INV_XTX*XTy
Y=X*b;
lnr=logQ-Y; % Residuals
YYi= ( (logQ-mean(logQ)).*(logQ-mean(logQ)));
Syy=sum(YYi);
E2=lnr.*lnr;
SSE= sum(E2);
R2=1-(SSE/Syy) % Coefficient of determination
r=Q-10.^Y; % Residuals of non-linear
[Figure: residuals of the linearized fit, rᵢ = ∆log Qᵢ (left, ×10⁻³ scale), and residuals of the non-linear model, rᵢ = ∆Qᵢ (right), vs. measurement point i]
- Random Sample:
{(x₁, Y|x₁), . . . , (xₙ, Y|xₙ)}
⇒ Yᵢ = β₀ + β₁xᵢ + . . . + βₚxᵢ^p + Eᵢ, i = 1, . . . , n, with the Eᵢ independent, ∼ N(0, σ²)
⇒ yᵢ = β₀ + β₁xᵢ + . . . + βₚxᵢ^p + eᵢ, i = 1, . . . , n
or
y = Xβ + e    (49)
Eq. 51 ⇒ (XᵀX)·b = Xᵀy, i.e.
[ n, Σxᵢ, Σxᵢ², . . . , Σxᵢ^p ;
  Σxᵢ, Σxᵢ², . . . , Σxᵢ^{p+1} ;
  . . . ;
  Σxᵢ^p, Σxᵢ^{p+1}, Σxᵢ^{p+2}, . . . , Σxᵢ^{2p} ] · [b₀; b₁; . . . ; bₚ] = [Σyᵢ; Σxᵢyᵢ; . . . ; Σxᵢ^p·yᵢ]
- Estimate:
ŷ = µ̂_{Y|x} = b₀ + b₁x + . . . + bₚx^p    (52)
µ_{Y|x} = β₀ + β₁x + β₂x²

x:  5    5    10   10   15   15   20   20   25   25
y: 14.0 12.5  7.0  5.0  2.1  1.8  6.2  4.9 13.2 14.6

Normal equations:
[ n, Σxᵢ, Σxᵢ² ; Σxᵢ, Σxᵢ², Σxᵢ³ ; Σxᵢ², Σxᵢ³, Σxᵢ⁴ ]·[b₀; b₁; b₂] = [Σyᵢ; Σxᵢyᵢ; Σxᵢ²yᵢ]
[ 10, 150, 2750 ; 150, 2750, 56,250 ; 2750, 56,250, 1,223,750 ]·[b₀; b₁; b₂] = [81.3; 1228; 24555]
⇒ b₀ = 27.3, b₁ = −3.313, b₂ = 0.111

[Figure: data points and fitted parabola, y vs. x over 0 ≤ x ≤ 30]
num_points=10; k=2;   % quadratic polynomial: p = 2
x=[5; 5; 10; 10; 15; 15; 20; 20; 25; 25];
y=[14.0; 12.5; 7.0; 5.0; 2.1; 1.8; 6.2; 4.9; 13.2; 14.6];
X=zeros(num_points,k+1);
for i=1:1:num_points,
for j=1:1:k+1,
X(i,j)=x(i)^(j-1);    % Vandermonde columns: 1, x, x^2
end
end
XTX=(X'*X)
INV_XTX=inv(XTX);
XTy=((X')*y)
b=INV_XTX*XTy
⇒ Yᵢ = β₀ + β₁x₁ᵢ + . . . + βₖxₖᵢ + Eᵢ, with Eᵢ ∼ N(0, σ²)
⇒ yᵢ = β₀ + β₁x₁ᵢ + . . . + βₖxₖᵢ + eᵢ
or
y = Xβ + e    (53)
- Estimate:
ŷ = µ̂_{Y|x} = b₀ + b₁x₁ + b₂x₂ + . . . + bₖxₖ    (56)
y:  17.9 16.5 16.4 16.8 18.8 15.5 17.5 16.4 15.9 18.3
x1: 1.35 1.90 1.70 1.80 1.30 2.05 1.60 1.80 1.85 1.40
x2: 90   30   80   40   35   45   50   60   65   30

[ n, Σx₁ᵢ, Σx₂ᵢ ; Σx₁ᵢ, Σx₁ᵢ², Σx₁ᵢx₂ᵢ ; Σx₂ᵢ, Σx₁ᵢx₂ᵢ, Σx₂ᵢ² ]·[b₀; b₁; b₂] = [Σyᵢ; Σx₁ᵢyᵢ; Σx₂ᵢyᵢ] ⇒
[ 10, 16.75, 525 ; 16.75, 28.6375, 874.5 ; 525, 874.5, 31,475 ]·[b₀; b₁; b₂] = [170; 282.405; 8887.0]
[Figure: fitted plane for y (mpg) as a function of x₁ (tons) and x₂ (°F)]
e := [y⁽¹⁾; . . . ; y⁽ⁿ⁾] − [1, x₁⁽¹⁾, · · · , xₖ⁽¹⁾ ; . . . ; 1, x₁⁽ⁿ⁾, · · · , xₖ⁽ⁿ⁾]·[b₀; . . . ; bₖ] ⇒ e = y − X·b
- Least-squares estimate of β:
β̂ = b_opt = (XᵀX)⁻¹Xᵀy
- (1 − α)-Confidence interval for β:
β̂ᵢ − t_{α/2}·S·√Cᵢᵢ ≤ βᵢ ≤ β̂ᵢ + t_{α/2}·S·√Cᵢᵢ
where
α = 1 − confidence level
t_{α/2} = appropriate point of the T-distribution with n − (k + 1) degrees of freedom
S := √(SSE/(n − (k + 1))) = √(Σe(i)²/(n − (k + 1))) = ‖y − Xβ̂‖/√(n − (k + 1))
Cᵢᵢ = [(XᵀX)⁻¹]ᵢᵢ (i-th diagonal element of (XᵀX)⁻¹)
µ_{Y|x₁,x₂} = β₀ + β₁x₁ + β₂x₂

Car Number   y⁽ⁱ⁾ (mpg)   x₁⁽ⁱ⁾ (Tons)   x₂⁽ⁱ⁾ (°F)
1            17.9         1.35           90
2            16.5         1.90           30
3            16.4         1.70           80
4            16.8         1.80           40
5            18.8         1.30           35
6            15.5         2.05           45
7            17.5         1.60           50
8            16.4         1.80           60
9            15.9         1.85           65
10           18.3         1.40           30

X = [1, 1.35, 90 ; . . . ; 1, 1.40, 30]
XᵀX = [10, 16.75, 525 ; 16.75, 28.6375, 874.5 ; 525, 874.5, 31,475]
Xᵀy = [170; 282.405; 8887]

Therefore
β₀ = 24.75 ± (2.365)(0.14)·√6.07 = 24.75 ± 0.825
β₁ = −4.1593 ±
β₂ = −0.0149 ±
format short e;
num_points=10; k=2;   % 10 cars, k = 2 regressors (weight, temperature)
x1=[1.35; 1.90; 1.70; 1.80; 1.30; 2.05; 1.60; 1.80; 1.85; 1.40];
x2=[90; 30; 80; 40; 35; 45; 50; 60; 65; 30];
y=[17.9; 16.5; 16.4; 16.8; 18.8; 15.5; 17.5; 16.4; 15.9; 18.3];
X=zeros(num_points,k+1);
for i=1:1:num_points,
X(i,1)=1;
X(i,2)=x1(i);
X(i,3)=x2(i);
end
X
XTX=(X'*X)
XTy=((X')*y)
INV_XTX=inv(XTX);
b=INV_XTX*XTy
Yn=X*b;
S=norm(y-Yn)/(sqrt(num_points-(k+1)))   % standard error S
CII = [ INV_XTX(1,1) ; INV_XTX(2,2) ; INV_XTX(3,3) ]   % diagonal elements C_ii
[Figure: sample value of a quality indicator vs. time for two processes, with the center line (CL), upper control limit (UCL), and lower control limit (LCL) marked in each panel]
Figure 21: Production process in statistical control (a) and out of statistical control (b).
[Figure: ppm NaOH vs. time point of sampling, k, with CL, LCL, UCL and the upper specification limit (USL) marked]
At time point 19, the system is out of statistical quality control, long before it is off-spec. It is better to act when the process is out of statistical control rather than wait until it is off-spec.
Monitoring Means: The lower control limit (LCL) and the upper control limit (UCL) represent the minimum and maximum values that the sample mean X̄ can assume without raising an alarm:
LCL ≤ X̄ ≤ UCL
p = 0.0228 + 0 = 0.0228, with the average number of samples needed to obtain a signal 1/p = 43.86. If samples are taken every hour → 43.86 hours.
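A sketch of the arithmetic, assuming the 0.0228 arises as the single upper-tail area P[Z > 2] (with the other tail contributing ≈ 0, as in the sum above):

p = 1 - normcdf(2)   % ~0.0228
ARL = 1/p            % ~43.9 samples, on average, until a point falls outside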
where R is the sample range and d₂ is taken from tables as a function of n (for n = 4, d₂ = 2.059 and for n = 5, d₂ = 2.326).
µ̂₀ = 0.498, σ̂ = R̄/d₂; given d₂ = 2.059 for n = 4 ⇒ σ̂ = 0.029/2.059 = 0.014
and the 3-sigma bounds are µ̂₀ ± 3σ̂/√n: 0.498 ± 0.021 (are the X̄ⱼ within the limits? What if there is one outside the limits?)
• R Chart (Range)
The theoretical bounds for the range are µ_R ± 3σ_R. We estimate µ̂_R = R̄ and σ̂_R = d₃R̄/d₂, where d₂ is as before (mean chart) and d₃ is given as a function of n (for n = 4, d₃ = 0.880 and for n = 5, d₃ = 0.864).
Using the data from Example 116: µ̂_R = 0.029, d₃ = 0.880, d₂ = 2.059, so σ̂_R = (0.880)(0.029)/2.059 = 0.012, and the UCL, LCL control limits for the range chart are 0.029 ± 3(0.012) = 0.029 ± 0.037. (What if the LCL is negative?)
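A sketch of the range-chart limits, clamping a negative LCL at zero (standard practice when the 3-sigma bound falls below zero):

Rbar = 0.029; d2 = 2.059; d3 = 0.880;   % subgroup size n = 4
sigmaR = d3*Rbar/d2;                    % ~0.012
UCL = Rbar + 3*sigmaR                   % ~0.066
LCL = max(Rbar - 3*sigmaR, 0)           % 3-sigma value is negative -> clamp to 0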
• C Chart (Average Number of Defects): monitors the average number of defects per item produced.