
CHAPTER 1

Philosophy of Experimentation
Every observational undertaking should have a definite objective. Whether we shop for a
new car or measure the gas constant to six significant digits, the experience becomes
fruitless, if not pointless, if the observer has no idea about the potential outcome. Even new
discoveries will go unnoticed unless the observer's eyes are open to see them.

1.1. The Need for an Experimental Objective

Experimentation can represent a "Catch-22" proposition because:

• How do we measure what we set out to measure?

• How do we know our data does not reflect some experimental artifact?

To answer these questions, we must anticipate the physical phenomenon's behavior: postulate
a model, no matter how crude, before we ever run experiments. Without such a belief, we
could never build the apparatus to measure the data, let alone interpret the results.
Therefore, we must have some inkling of what will happen when we change controlled
variables. More simply stated: we need an objective!

1.2. Engineering Experimentation

We conduct engineering experiments for many reasons. Some of the more important
objectives include:

• prediction of how dependent variables respond to changes in independent variables
(modeling),

• verification of a theory (hypothesis testing), and

• screening of variables to identify which ones influence the response (analysis of
variance and factorial design).
Some issues can cloud our experimental objectives. These difficulties include:

• randomness of the physical world (variance),

• confusion of causation with correlation, and

• complexity of effects (interactions and nonlinearity).

The following chemical engineering example demonstrates how engineering experimentation
evolves as our thirst for the truth grows.

Example 1.1. The Evolution of Knowledge
Let us suppose you work alongside a polymer chemist on a product development team. Your
role, however, concerns the development of a process to produce thermoplastic bottles.
Your team suspects that chain-branching (Figure 1.1) and possibly chain length (molecular
weight) represent two polymer properties that may affect the polymer's zero shear
viscosity, and that temperature (Figure 1.2) and shear rate represent two process variables
that may affect the polymer's apparent viscosity (Figure 1.3).

Figure 1.1. Difference Between Linear and Branched Polymers (Flory, 1953)

Figure 1.2. Zero Shear Viscosity Dependence on Temperature (Bird et al, 1987)

In the polymer development stage, your team might want to develop analytical methods to
measure the extent of polymerization. You will need to develop an on-line analyzer when the
process goes into production. You choose to monitor the polymer's zero shear viscosity
because you intend to use its value to predict the polymer's molecular weight as a function
of residence time. Therefore, you develop a viscometer to measure zero shear viscosity at,
say, 160°C.

After reviewing your undergraduate polymer engineering text, you realize you can model the
zero shear viscosity as a function of molecular weight (Figure 1.4) as follows:

\eta_0 = b_1 \left( MW \right)^{b_2} \qquad (1.1)
Figure 1.3. Apparent Viscosity as a Function of Shear Rate (Bird et al, 1987)

Figure 1.4. Zero Shear Viscosity as a Function of Molecular Weight (Ferry, 1980)

You also realize that you can use Eq. (1.1) to determine when you have a useful polymer
because the parameter b2 transitions from a value of 1.0 to 3.4 as polymer chain
entanglements (Fig. 1.5) become important.
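As a minimal sketch of how Eq. (1.1) gets exercised in practice, the following Python snippet (our own illustration, using entirely hypothetical viscosity data) estimates b1 and b2 by linear regression on the logarithms of Eq. (1.1):

    import numpy as np

    # Hypothetical zero-shear-viscosity data at 160 C (MW in g/mol, eta0 in Pa*s).
    MW = np.array([2.0e4, 4.0e4, 8.0e4, 1.6e5, 3.2e5])
    eta0 = np.array([1.1e2, 1.2e3, 1.3e4, 1.4e5, 1.5e6])

    # Taking logs linearizes Eq. (1.1): ln(eta0) = ln(b1) + b2*ln(MW).
    b2, ln_b1 = np.polyfit(np.log(MW), np.log(eta0), 1)
    print(f"b1 = {np.exp(ln_b1):.3g}, b2 = {b2:.2f}")

    # A slope b2 near 3.4 signals an entangled (useful) polymer;
    # a slope near 1.0 signals chains too short to entangle.

For these made-up data the fitted slope comes out near 3.4, which would indicate an entangled resin.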

Figure 1.5. Polymer Chain Entanglements (Ferry, 1980)

With such knowledge, you can now design the reactor and develop the process conditions to
produce a useful polymer resin.

During the development of the polymerization kinetics, your development team discovers that
your catalyst promotes branching at elevated reaction temperatures. This discovery causes
some concern within the team, so you might ask the question, "Does branching significantly
affect the zero shear viscosity?" To test this hypothesis, you ask the research chemist to
synthesize polymers of equal molecular weight and various degrees of branching.
As we develop the operating conditions for the polymerization reactor, we may want to screen
process variables, such as temperature and pressure, to identify which, if any, influence the
molecular weight, molecular weight distribution, or degree of branching.
Once we have the reactor conditions specified to produce our specific polymer resin, we
need to develop the processing conditions. Since the processing shear rates will lie well
beyond the zero shear plateau, we propose that the power law model (Astarita and
Marrucci, 1974) will describe the apparent viscosity over the range of shear rates (Figure
1.3) and processing temperatures.

\eta = K \dot{\gamma}^{\,n} \qquad (1.2)

Therefore, our objective requires an estimate of the flow parameter n at 170°C.

To better understand how your resin processes in the molten state, you wish to determine
the power law model's range of applicability. Therefore, you wish to strain or test the
model at shear rates exceeding 10^5 s^-1 and melt temperatures above 225°C.

The view of experimentation painted above appears rosy until we consider complicating
factors, the biggest and "baddest" of them all being random error. The melt viscosity could
depend on any number of factors: temperature, pressure, polymer molecular weight, molecular
weight distribution, fillers, chemical additives, and impurities, just to name a few. We can
only control or reduce these factors' influence to a certain level.

To further complicate the picture, we often confuse correlation with causation. Huh? We
might measure the polymer viscosity and density (Figure 1.6) as we decrease the melt
temperature and wrongly conclude, in this case, that the viscosity depends on density. The
real situation: the density did change with temperature, but the viscosity actually responds
to the temperature. We simply got confused.

Figure 1.6. Polymer Viscosity as a Function of Polymer Density

In our polymer viscosity studies, we notice that viscosity increases with molecular weight
and decreases with branching. However, we discover a more complex interaction between them,
so the two effects do not add linearly.

The next chapter involves the statistical design of experiments. This valuable engineering
tool allows us to efficiently meet our experimental objectives as well as mitigate the
difficulties we may encounter in our experimental endeavors.

References
1. Flory, P.J., Principles of Polymer Chemistry, Cornell Univ. Press, London (1953).
2. Bird, R.B., R.C. Armstrong, and O. Hassager, Dynamics of Polymeric Liquids, Vol.1, Fluid
Mechanics, 2nd ed., Wiley, New York (1987).
3. Ferry, J.D., Viscoelastic Properties of Polymers, 3rd ed., Wiley, New York (1980).

4. Astarita, G., and G. Marrucci, Principles of Non-Newtonian Fluid Mechanics, McGraw-Hill, London (1974).
CHAPTER 2

Experimental Design
Experimental design represents the planning that must precede any trip into the laboratory,
pilot plant, or chemical plant. Those engineers who pursue careers in manufacturing will
discover experimental design forms the basis for statistical process control (SPC), the key
process improvement tool.

2.1. Experimental Design Definitions

Before discussing the method of experimental design, we must introduce the vocabulary
most commonly used by statisticians who design experiments. Table 2.1 lists the terms that
we shall use extensively in our development of the design method, along with brief
definitions.

Table 2.1. Experimental Design Definitions

Term        Definition
Factor      an independent variable that we can set and control
Level       numerical value of a factor
Treatment   specific combination of levels, one for each factor tested
Response    numerical result of an observation made at a particular treatment
Replicate   repeated treatment used to estimate experimental error

To better understand each term in Table 2.1, we shall consider the following chemical
engineering example.

Example 2.1. Performance of a Polymer Extrusion Process


Suppose you work as a polymer engineer assigned to improve the extrusion characteristics
of polyethylene and suspect that the following three process variables affect the polymer's
surface appearance: melt temperature, screw speed, and filtration level. Identify the range
of the factors, potential responses, some possible treatments and replicated treatments.
Figure 2.1 shows the polymer extrusion process.
The melt temperature, screw speed and filtration represent the process variables or
factors. The process melt temperature might range from polyethylene's melting point
(~135°C) to a temperature where severe thermal degradation occurs, say 250°C. Thus, we
can easily quantify this factor, and 135°C represents a level of the factor temperature.
Likewise, we can identify the range of operable screw speeds from, say, 50 rpm up to a speed
where we would begin to starve the screw, say 200 rpm. However, the filtration factor has
three discrete levels: none, some (stacked metal screens) and a lot (sintered metal). This
represents a more qualitative factor, but we can assign a level of +1 to the sintered metal
filters, 0 for the stacked screens, and -1 for no filtration.
A possible treatment (process condition) you might test: the extruder operating at a melt
temperature of 150°C and 100 rpm using sintered metal filters (+1). The assembly of
treatments we select should result in a balanced design. We will discuss more about balance
later in this chapter.
The extrudate's surface appearance, which we could scale from one (smooth) to five
(extremely rough), represents the response, a qualitative one. Thus, we see qualitative
responses arise any time we require human judgement. This occurs when we sell products
directly to consumers, such as textiles, food and beverages, or automobiles.

We run replicates at selected treatments to estimate the process variability. Our choice
of repeated treatments, however, should maintain the overall design balance.

Figure 2.1. Polymer Extrusion Process

2.2. Experimental Design Method

Figure 2.2 presents a cartoon that outlines the experimental design process. We start with
an objective to describe the true state of nature, a clouded state of affairs, over some
limited range of conditions. This limited range represents our "design filter". The filter,
as good as we can make it, still allows "noise" or random error to creep into our data, so
we must ensure that we can estimate the quality or accuracy of our data.

Before we can effectively design our filter, we must propose a model based on existing data
or theoretical development. This allows us to formulate some idea of what to look for. With
our filter designed and built, we sample nature and evaluate the proposed model. If the
model gives an adequate description of our limited view of nature, we have met our
objective. Otherwise, we need to modify our theory and propose new models, or redesign
our experimental design filter and gather more data. This iterative process continues until
we have met our stated objective.

Figure 2.2. Experimental Design Process

We see that efficient experimental design requires a hypothesis about how the data
behave, i.e., a model. This requirement frustrates naive experimenters because they often
embark on an experimental program without any direction. Alice faced a similar situation
when she met up with the Cheshire Cat (Carroll, 1965) and asked for directions.

"Cheshire Puss," she began, rather timidly, as she did not at all know whether it would like the
name: however, it only grinned a little wider.
"Come, it's pleased so far," thought Alice, and she went on. "Would you tell me, please, which
way I ought to go from here?"

"That depends a good deal on where you want to get to," said the Cat.

"I don't much care where--" said Alice.

"Then it doesn't matter which way you go," said the Cat.

"--so long as I get somewhere," Alice added as an explanation.

"Oh, you're sure to do that," said the Cat, "if you only walk long enough…"

The message here: "Have an objective." You will not only get somewhere, you will get to the
place you want to go!
An objective or model when we speak of Design Of Experiments (DOE) allows us, before we
run any experiments, to answer questions, such as:
• How many treatments should we run and where?

• How many treatments should we repeat and where?

We should, however, prepare to amend our model and design if the data do not behave as
initially suspected.

Example 2.2. Case of the Leaky Lighter

Lone Rock Distributors, a wholesaler of novelty items, experiences a seasonal problem at
their Gila Bend, Arizona warehouse. During the summer when outside temperatures soar to
120°F, the number of defective cigarette lighters returned by local merchants increases by
50 percent.

Since most returned lighters contained no lighter fluid (n-butane), Lone Rock's CEO, Meesa
Hermit, suspects the warehouse temperature as the cause. The Gila Bend manager, Letz
Sweat, reports the warehouse's temperature during the summer varies between 100 and
200°F. The lighters' manufacturer, I. M. Cheep, LLC, told Mr. Hermit that it guarantees the
lighters leak-proof to five atmospheres.

Figure 2.3. Lone Rock's Distribution Warehouse in Gila Bend, Arizona

In Example 2.2, finding the temperature at which the vapor pressure of n-butane exceeds
five atmospheres represents our objective. We shall refer to this example throughout this
book as we develop the notions of experimental design in Chapter 2, experimental error in
Chapter 3, and model building in Chapter 7.

2.2.1. Linear Experimental Designs


Before we can design a suitable set of experiments that satisfies our objective for Example
2.2, we must propose a model. Since we know vapor pressure increases with temperature,
we could propose a simple linear model to describe this behavior
p^{sat} = b_0 + b_1 T \qquad (2.1)

We do not necessarily know how the vapor pressure increases with temperature, just that it
increases. As Figure 2.4 shows, it could increase linearly, concave upwardly, or concave
downwardly. Whatever experiments we decide to run, they must provide enough information
so we can determine if we have a statistically adequate model. A quick rule of thumb says
we should run one or two more levels (actually treatments for multi-factor models) than the
number of model parameters and between six and ten replicates to obtain an adequate
estimate of experimental error.

Figure 2.4. Potential Vapor Pressure Dependence on Temperature

If Eq. (2.1) represents the vapor pressure and we use our design heuristics, the
experimental design for this linear model suggests that, besides the two extremes of 100
and 200°F, we run in the center of the design temperature range, or 150°F. This specific
midpoint gives the design a sense of balance, an idea we shall explore in the next
subsection.

2.2.2. Coding of Factors and a Sense of Balance

Statisticians have introduced the notion of coding to facilitate the design process. The
basic idea involves the development of a neutral or balanced design, i.e., if you added up the
coded levels of the factor in the design, the sum would equal zero. Thus, for a two-level
design, we might have codes of -1 and +1; for a three-level design, -1, 0, and +1; for a
four-level design, -3, -1, +1, and +3; and for a five-level design, -2, -1, 0, +1, and +2.
If we code the temperature factor for a simple linear model using the following formula

\theta_i = \frac{T_i - \text{(midpoint of range)}}{0.5\,\text{(magnitude of range)}} \qquad (2.2)

we find the intermediate point (\theta_0 = 0) for a three-level design as follows:


0 = \frac{T_0 - (200 + 100)/2}{0.5\,(200 - 100)} = \frac{T_0 - 150}{50} \quad \Rightarrow \quad T_0 = 150°F

If we consider our three-level design with the factors having values of -1, 0, and +1, we can
see from Figure 2.5 that a balanced design has its weight (experiments) equally distributed
through the range of interest.

Figure 2.5. Difference Between Balanced and Unbalanced Design

2.2.3. A Word about Randomizing the Design


Whenever possible, we should randomize the design to eliminate any unwanted bias.
Often chemical processes undergo seasonal variations because of ambient temperature and
humidity changes. Therefore, the process engineer should judiciously select when to run
test experiments that account for these variations, i.e., do not compare the "control"
process data taken when the process experiences its natural down cycle to "test" data
taken when the process operates at its peak. The design must also have replicates to
provide an unbiased error estimate. We shall discuss how to use the error estimate in
subsequent chapters.

2.2.4. Sample Experimental Design for Example 2.2

In proposing Eq. (2.1), we used our intuition but ignored a valuable resource—past
experience. If we research the chemical engineering literature, we find numerous
candidate models for the vapor pressure of a pure component (Smith and Van Ness, 1987).
Table 2.2 lists three potential models.

Suppose we proposed a design based on the two-parameter model in Table 2.2. This model
represents a linear model in terms of inverse absolute temperature. Thus, our experimental
design must contain at least three levels of temperature, two for the parameter estimates
and one for a curvature check. But what levels should we run?

Table 2.2. Models that Describe the Temperature Dependence of Pure Component Vapor
Pressure (Smith and Van Ness, 1987)

Vapor Pressure Model                                          Name        No. of Parameters   Eq. No.
\ln p^{sat} = b_0 + b_1/T                                     Clapeyron   2                   (2.3)
\ln p^{sat} = b_0 + b_1/(T + b_2)                             Antoine     3                   (2.4)
\ln p^{sat} = b_0 + b_1/(T + b_2) + b_3 \ln T + b_4 T^6       Riedel      5                   (2.5)

Since we want to model the vapor pressure over the temperature range from 100°F to
200°F, these two extremes seem the most logical choice for two of the levels. To select the
intermediate levels, we must consider how the factor appears in the model. An intermediate
level equidistant from the extremes of our design range, T = 150°F, represented the best
level to run for Eq. (2.1) because the temperature T itself represented the factor. The
factor in Eq. (2.3), however, appears in the model as 1/T. Thus, when we use Eq. (2.2) to
select the intermediate level, a temperature of 147.8°F yields the most balanced design.
Table 2.3 represents a possible design for our example. We chose to run four levels instead
of three and restricted ourselves to just eight experiments. Box, Hunter and Hunter
(1978) suggested the "twenty-five percent" design rule, where the experimenter runs a
quarter of the total experiments first before deciding on the rest of the experiments.
This rule works better for multi-factor experiments than our single-factor example, but we
shall discuss our choice in this light. The first two runs (25%) explore the range of the
factor, temperature. With these runs we can determine if temperature affects vapor
pressure, but we cannot assign any confidence to such a claim. The next two runs check the
reproducibility of the previous experiments and allow us to place some credence in the
temperature effect. If we noticed a large inherent error from the first four runs, we
would suspend all further data gathering and work to reduce this error before proceeding
with any new experiments.

The last four runs check for higher order functionality and offer two more measures of
pure error. Assuming we found the data linear with respect to Eq. (2.3), the extra two
levels we ran for possible higher order effects jump from a prediction function to an error
estimation function; thus, our design would eventually provide six measures of error instead
of just four.

Table 2.3. Experimental Design to Determine the Temperature Dependence of n-Butane's
Vapor Pressure between 100 and 200°F

Run No.   T (°F)
1         100
2         200
3         200
4         100
5         128
6         161
7         128
8         161
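As a small illustrative sketch (not part of the original text), the following Python snippet randomizes the run order of Table 2.3's treatments, as Section 2.2.3 recommends; the seed is arbitrary and fixed only so the example is reproducible.

    import random

    # Treatments from Table 2.3: four temperature levels, each run twice.
    temperatures_F = [100, 200, 200, 100, 128, 161, 128, 161]

    random.seed(42)                # arbitrary seed, for a repeatable example
    run_order = temperatures_F[:]  # copy so the design table stays intact
    random.shuffle(run_order)

    for run_no, T in enumerate(run_order, start=1):
        print(f"Run {run_no}: T = {T} deg F")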

2.2.5. Nonlinear Experimental Designs

When we consider nonlinear models such as


y = b_0 \exp\left( b_1 x_1 \right) \qquad (2.6)

we can not follow the same design methodology as we use for linear models because the
parameters we want to estimate strongly influence where we should run the experiments.
Hence, we feel like a cat chasing its own tail.

Figure 2.6 shows how dramatically the shape of the model generated by Eq. (2.6) changes
when we hold b0 constant at 1.0 and vary b1 from -10 to -1000. Without some judicious
selection of an intermediate temperature, the design may completely miss the "action", i.e.,
where the function changes the most. We must remember this design issue only gets more
clouded if we superimpose error on top of the model.

Figure 2.6. The Effect of a Parameter on a Nonlinear, Exponential Function (Eq. (2.6) with b_0 = 1.0 and b_1 varied from -10 to -1000)

Box and Lucas (1959) developed a method to design experiments for nonlinear models. The
method works as follows:

Consider the nonlinear model

y = f\left( \xi_1, \xi_2, \ldots, \xi_k;\ \beta_1, \beta_2, \ldots, \beta_p \right) = f\left( \xi; \beta \right) \qquad (2.7)

for which we will explore each \xi_k over the range

\xi_k(\min) \le \xi_{k1} \le \cdots \le \xi_{kn} \le \xi_k(\max) \qquad (2.8)

Since the nonlinear design represents an iterative process, we first must guess all the
parameters β1, β2,…, βp to form β ∗ . If we had some preliminary data, we would regress the
data to obtain β ∗ .

Next, we must evaluate the partial derivatives of the function given by Eq. (2.7) with
respect to \beta. If we ran n observations for a one-factor model, we would obtain n
equations with p parameters, or an n × p matrix

F^* = \left\{ f_{ij}^* \right\} \qquad (2.9)

where we define f_{ij}^* as

f_{ij}^* = \left[ \frac{\partial f(\xi_i; \beta)}{\partial \beta_j} \right]_{\beta = \beta^*} = \phi_j(\xi_i) \qquad (2.10)

Now we must determine the design matrix \xi, within the experimental range of interest,
that maximizes the modulus of \det F^*.

We would proceed by the following algorithm.

1. Guess the design matrix \xi_{OLD} and calculate \left| \det F^* \right|_{OLD}.

2. Perturb the guess of the design matrix to \xi_{NEW} and recalculate \left| \det F^* \right|_{NEW}.

3. Compare \left| \det F^* \right|_{NEW} with \left| \det F^* \right|_{OLD}. Retain whichever design matrix gives the larger modulus as the new \xi_{OLD}.

4. Iterate on Steps 2 and 3 until we have maximized \left| \det F^* \right|.

In this procedure, we would run a minimum of p treatments across the k variables or factors
in the model. We should, however, replicate several of the treatments because once we
design the experiment and gather the data, we will estimate the parameters using nonlinear
regression. As we will find out in Chapter 5, regression minimizes the variance between the
data and the model, but we can push such a minimization only so far before we start to fit
random error. Therefore, we will need a good measure of this error.

Actually, we might even perform the nonlinear design sequentially. This would involve
amending the above algorithm. We would start with a guess for \beta^* and determine the
initial design matrix \xi. However, as soon as we run one experiment, we could evaluate one
of the parameters, say \beta_1. That would lead to a new \beta^* and \xi; hence, we build
the model and experimental design simultaneously. This, of course, comes at a massive
computational cost, so variations on this theme may prove more beneficial.

Example 2.3. Nonlinear Design for Vapor Pressure

This example will show how the nonlinear design method works. Suppose we wish to use the
following nonlinear equation to describe the vapor pressure of n-butane over a temperature
range from 310.9 to 366.5 K. For a pure substance, we know \beta_1 > 0 and \beta_2 < 0.

p^{sat} = \beta_1 \exp\left( \frac{\beta_2}{T} \right) \qquad (2.11)

Let us redefine the variables in Eq. (2.11) as follows:

f = p^{sat} \qquad (2.12a)

\xi_1 = \frac{1}{T} \qquad (2.12b)

We have an experimental design range of

2.73 \times 10^{-3} = \xi_1(\min) \le \xi_{11} \le \xi_{12} \le \cdots \le \xi_{1n} \le \xi_1(\max) = 3.22 \times 10^{-3}
We can now form the functions \phi_j(\xi_1):

\phi_1(\xi_1) = \exp\left( \beta_2 \xi_1 \right) \qquad (2.13a)

\phi_2(\xi_1) = \beta_1 \xi_1 \exp\left( \beta_2 \xi_1 \right) \qquad (2.13b)

Now we must determine the design matrix \xi = \{\xi_{1n}\} that maximizes \det F^*. For the
sake of illustration, we decide to run only two experiments, \xi_{11} and \xi_{12}.
Therefore, we have

\det F^* = \phi_1(\xi_{11})\,\phi_2(\xi_{12}) - \phi_1(\xi_{12})\,\phi_2(\xi_{11})
         = \exp(\beta_2 \xi_{11})\,\beta_1 \xi_{12} \exp(\beta_2 \xi_{12}) - \exp(\beta_2 \xi_{12})\,\beta_1 \xi_{11} \exp(\beta_2 \xi_{11})
         = \beta_1 (\xi_{12} - \xi_{11}) \exp\{\beta_2 (\xi_{12} + \xi_{11})\} \qquad (2.14)
Since \beta_2 < 0, we find that \det F^* attains its maximum value when

\xi_{11} = \xi_1(\min), \qquad \xi_{12} = \xi_1(\min) - 1/\beta_2^* \qquad (2.15)

If we guess parameter values of \beta_1 = 2.5 \times 10^6 and \beta_2 = -2.5 \times 10^3 and
apply the optimal design conditions given by Eq. (2.15), we find

\xi_{11} = 2.73 \times 10^{-3} \text{ K}^{-1} \quad \text{or} \quad T = 366.5 \text{ K}

and

\xi_{12} = 2.73 \times 10^{-3} - 1/(-2500) = 3.13 \times 10^{-3} \text{ K}^{-1} \quad \text{or} \quad T = 319.6 \text{ K}

From Eq. (2.15), we can see if -2.04 × 10 3 ≤ β 2 < 0 , the minimum temperature represents the
other level to run.
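As a numerical cross-check (our own sketch, not from the original text), the following Python snippet grid-searches det F* from Eq. (2.14) over the design range using the guessed parameter values; the maximum lands at the analytic optimum of Eq. (2.15).

    import numpy as np

    beta1, beta2 = 2.5e6, -2.5e3          # guessed parameter values (beta*)
    xi_min, xi_max = 2.73e-3, 3.22e-3     # design range for xi1 = 1/T

    def det_F(xi11, xi12):
        # det F* from Eq. (2.14); positive when xi12 > xi11
        return beta1 * (xi12 - xi11) * np.exp(beta2 * (xi12 + xi11))

    grid = np.linspace(xi_min, xi_max, 2001)
    X11, X12 = np.meshgrid(grid, grid)
    D = det_F(X11, X12)
    i, j = np.unravel_index(np.argmax(D), D.shape)

    print(f"xi11 = {X11[i, j]:.3e} 1/K  ->  T = {1 / X11[i, j]:.1f} K")
    print(f"xi12 = {X12[i, j]:.3e} 1/K  ->  T = {1 / X12[i, j]:.1f} K")
    print(f"Eq. (2.15) predicts xi12 = {xi_min - 1 / beta2:.3e} 1/K")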

2.3. Multi-Factor Designs

The previous section considered just single-factor designs; the rest of the chapter will
consider some useful two- and three-factor experimental designs for first- and second-order
models, i.e., linear and quadratic models.

2.3.1. Experimental Designs for First-Order Models


In our discussion of first-order models, we shall build upon our knowledge of the simple
linear model.

y = b_0 + b_1 x_1 \qquad (2.16)

From high school algebra, we know two points determine a straight line, i.e., the slope b1 and
the intercept b0. Therefore, we need two levels of x1 for the parameters, b0 and b1, and an
additional level of x1 to determine if we truly have a linear model.
If we now extend the first-order model to two dimensions, the complete, first-order model
contains four terms, the constant term, two main effects for the two factors x1 and x2, and
a first-order interaction between these two factors.

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 \qquad (2.17)

Table 2.4 shows a possible design to test the complete, two-factor linear model presented
in Eq. (2.17). The design has four treatments that we coded (±1, ±1) and additional
treatments at a center point (0,0). Figure 2.7 graphically depicts the design shown in Table
2.4, where it shows the simplex portion of the design (the corners of the square) in dark
gray circles and the center point as a lighter gray circle.
Table 2.4. Coded Treatments for Two-Factor Linear Model

Treatment x1 x2
1 1 1
2 -1 1
3 1 -1
4 -1 -1
5 0 0
6 0 0
7 0 0

The simplex design allows us to evaluate the four parameters in Eq. (2.17) while the center
point (0,0) allows us to check the adequacy of Eq. (2.17). The adequacy check requires
replicates (a measure of pure error) so we decided to replicate the center point twice.

The projections of the design onto the x1-axis and x2-axis represent a quick and dirty way
to check the design's balance, a characteristic of any good design. Figure 2.8 shows those
projections. From Figure 2.8, we can see the design balances (or has its center of gravity,
if you like that analogy) on the center point, i.e., no region has any special emphasis
associated with it.

To avoid any unwanted bias, we randomly select the order in which we run the treatments
given in Table 2.4. The numbers on the circles in Figure 2.7 show this randomization.

Figure 2.7. 22 Factorial Design (Simplex) with a Center Point

Figure 2.8. Projection of the Design Shown in Figure 2.7 onto the x1- and x2-axes
Table 2.5 represents a workable design to test the adequacy of the complete, three-factor
linear model

y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + b_{123} x_1 x_2 x_3 \qquad (2.18)

The design table, which represents an extension of the design given in Table 2.4 to a third
dimension, has eight treatments to estimate the eight parameters in Eq. (2.18) and
additional treatments at the center point (0,0,0) to test the model's adequacy. Two
replicates at the center point provide two degrees of freedom for the pure error estimate.

Table 2.5. Coded Treatments for Three-Factor Linear Model

Treatment x1 x2 x3
1 1 1 1
2 1 1 -1
3 1 -1 1
4 1 -1 -1
5 -1 1 1
6 -1 1 -1
7 -1 -1 1
8 -1 -1 -1
9 0 0 0
10 0 0 0
11 0 0 0

Figure 2.9 graphically presents Table 2.5's design. When we project the cube of Figure 2.9
onto the x1-x2, x1-x3, and x2-x3 planes, we obtain a result similar to Figure 2.7, except the
corners of the resultant square have two treatments. We can conclude, therefore, that we
have a design balanced on the center of the cube. In Fig. 2.9, we have also included the
order in which we would run the design. Once again, we randomized the order to avoid
unwanted bias.

If we extend the designs given in Tables 2.4 and 2.5 to even higher dimensions, we quickly
realize the number of experiments doubles with each factor we add. This leads to a
costly amount of experimentation. To mitigate the situation, statisticians introduced the
concept of aliasing or confounding effects. Without going into a detailed discussion, all of
the previous designs represent orthogonal designs, i.e., independent ones. Therefore, if we
wish to reduce the number of experiments, then we must run the right experiments, the ones
that maintain orthogonality.

Figure 2.9. 23 Factorial Design (Simplex) with a Center Point

So what ensures orthogonality? The answer: confounding main effects with higher order
interactions. For the three-factor case, we start by aliasing the constant term with the
three-factor interaction. In terms of the statistician's notation, we shall define this
confounding pattern as I = ±123. The "I" represents the constant term b0, and 123 the
three-factor interaction x1x2x3. The sign accounts for the positive or negative half
replicate. We can construct these two half replicates by considering treatments 1-8 in
Table 2.5 and crossing out the x3-column. What do we find? Two halves of the design that
each look like a 2^2 factorial design, i.e., (±1, ±1). If we now multiply the signs in each
row, four rows yield a "plus" sign, hence the positive half replicate. The other four rows
will yield a "negative" sign, or the negative half replicate. Tables 2.6a and 2.6b show the
two half-replicate designs, graphically depicted in Figure 2.10.

If we multiply both sides of the defining relationship I=123 by 1, 2 and 3 and note that
11=22=33=I, we find that

1=23 (2.19a)

2=13 (2.19b)

3=12 (2.19c)

Table 2.6a. Positive Half Replicate of 2^3 Factorial Design where I = 123

Treatment x1 x2 x3
1 1 1 1
2 1 -1 -1
3 -1 1 -1
4 -1 -1 1

Equations (2.19a)-(2.19c) tell us the two-factor interactions have become aliased with the
main effects. What does all this mean? Suppose we ran the design shown in Table 2.6a
and fit the data to the equation

y = b0 + b1 x1 + b2 x2 + b3 x3 (2.20)

or applied an ANalysis Of VAriance (ANOVA) to determine which factors (main effects)
significantly describe the data, and found a significant effect of x2. We assume only the
main effect accounts for the data's description (sum of squares), but we could not
conclusively make that claim because the x1x3 interaction might actually account for some or
all of the description. When we confound (do fewer experiments) or smear together the
contributions of these two terms, we lose the ability to statistically assign accountability
to one or the other.

Table 2.6b. Negative Half Replicate of 2^3 Factorial Design where I = -123

Treatment x1 x2 x3
1 1 1 -1
2 1 -1 1
3 -1 1 1
4 -1 -1 -1
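To make the half-replicate construction concrete, here is a small Python sketch (our own illustration, not from the original text) that splits the full 2^3 factorial into the two half replicates using the defining relation I = ±123:

    from itertools import product

    # Full 2^3 factorial: every sign combination of (x1, x2, x3).
    full = list(product([-1, 1], repeat=3))

    # Defining relation I = +123: keep runs where x1*x2*x3 = +1 (Table 2.6a);
    # the complementary half, where x1*x2*x3 = -1, gives Table 2.6b.
    positive_half = [run for run in full if run[0] * run[1] * run[2] == +1]
    negative_half = [run for run in full if run[0] * run[1] * run[2] == -1]

    print("I = +123:", positive_half)
    print("I = -123:", negative_half)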

Figure 2.10. The Positive and Negative 23-1 Designs Represented Graphically

2.3.2. Experimental Designs for Second-Order Models

As we move from first-order to second-order models, the number of experiments we
need to run escalates even more dramatically. We can see the complete, second-order
model for two factors contains nine parameters.

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{112} x_1^2 x_2 + b_{122} x_1 x_2^2 + b_{1122} x_1^2 x_2^2 \qquad (2.21)

If we consider the last three terms in Eq. (2.21) as third degree or higher, then we can
invoke the second-order approximation or quadratic model to reduce Eq. (2.21)

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2 \qquad (2.22)

The simplex design with a center point (treatments 1-7 in Table 2.7) would determine the
adequacy of the model represented by Eq. (2.17). If we determined the model inadequate,
then we could add a star design (treatments 8-11 in Table 2.7) over the simplex design. In
Figure 2.11, we graphically represent Table 2.7's design. As always, we would randomize the
order in which we ran treatments 1-7 and 8-11.

Table 2.7. Box-Behnken Design for Two-Factor Quadratic Model

Treatment x1 x2
1 1 1
2 1 -1
3 -1 1
4 -1 -1
5 0 0
6 0 0
7 0 0
8 a 0
9 -a 0
10 0 a
11 0 -a

We can see from Figure 2.11 that we actually have a 3 × 3 design rotated by 45° if we let
a = √2. In addition, the design has five levels of x1 and x2, which allows us to potentially
explore a cubic model, and it balances on the center point (0,0).

Figure 2.11. Two-factor Box-Behnken Design Graphically Represented

If we extend the second-order approximation to three dimensions, two useful designs
emerge.

y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + b_{11} x_1^2 + b_{22} x_2^2 + b_{33} x_3^2 \qquad (2.23)

Table 2.8 shows the three-factor Box-Behnken design analogous to the two-factor design
presented in Table 2.7.
We would most likely run the design in a fashion similar to what we discussed for the two-
factor case; i.e., run the simplex design with the center point, which we depict as the gray
spheres in Figure 2.12. This methodology allows us to first check for the significance of
each factor. Obviously, if a factor does not display a main effect (linear), then it cannot
display a quadratic effect, so we would not have to explore that factor any further. The
design would, therefore, collapse into a design such as the one shown in Figure 2.11.
Table 2.8. Coded Treatments for Three-Factor Quadratic Model

Treatment   x1   x2   x3
1            1    1    1
2            1    1   -1
3            1   -1    1
4            1   -1   -1
5           -1    1    1
6           -1    1   -1
7           -1   -1    1
8           -1   -1   -1
9            a    0    0
10          -a    0    0
11           0    a    0
12           0   -a    0
13           0    0    a
14           0    0   -a
15           0    0    0
16           0    0    0
17           0    0    0

If we do determine that all three factors significantly describe the data, then we would run
the treatments displayed as black spheres in Figure 2.12. These spheres, which form a star,
protrude outward from the cube's six faces a distance of ±a from the center of the design.
Again, we randomize the order of the runs.
Normally, we will find that most designs for quadratic models require five levels of each
factor, as seen in Tables 2.7 and 2.8. However, statisticians have developed orthogonal
designs that require only three levels. Table 2.9 gives one such design for a three-factor
quadratic model. Figure 2.13 illustrates the three-dimensional design. By examining Figure
2.13, we discover that the positive and negative planes of the cube have a rotated simplex
design while the three planes that divide the center of the cube result in a normal simplex
design with a center point.

Figure 2.12. Box-Behnken Design for Three-Factor, Quadratic Model

Table 2.9. Coded Treatments for a Three-Factor, Quadratic Model Using Only Three Levels

Treatment   x1   x2   x3
1            0    1    1
2            0    1   -1
3            0   -1    1
4            0   -1   -1
5            1    0    1
6            1    0   -1
7           -1    0    1
8           -1    0   -1
9            1    1    0
10          -1    1    0
11           1   -1    0
12          -1   -1    0
13           0    0    0
14           0    0    0
15           0    0    0

We refer the reader to Box, Hunter, and Hunter (1978) or Box and Draper (1987) for
designing experiments with more than k = 3 factors. These references offer a host of
fractional factorial design tables with worked examples.

Figure 2.13. Three-Factor Design for a Quadratic Model Using Only Three Levels

Appendix 2.A. Useful Coding Formulas

The formulas below should help the experimenter select intermediate design levels. For
two-level (X_i = -1 and +1) and three-level designs (X_i = -1, 0, +1), we have

X_i = \frac{x_i - \text{(midpoint of factor range)}}{0.5\,\text{(magnitude of factor range)}} \qquad (2.A.1)

For four-level designs (X_i = -3, -1, +1, and +3), we have

X_i = \frac{x_i - \text{(midpoint of factor range)}}{0.166\,\text{(magnitude of factor range)}} \qquad (2.A.2)

For five-level designs (X_i = -2, -1, 0, +1, and +2), we have

X_i = \frac{x_i - \text{(midpoint of factor range)}}{0.25\,\text{(magnitude of factor range)}} \qquad (2.A.3)

Example 2.A.1.

Suppose in Example 2.1 we want to run the extruder at four screw speeds. What levels
should we choose? Recall the screw speeds range from 50 to 200 rpm. Obviously, the
coded values of ±3 represent the extremes, 50 and 200 rpm, so we need only to calculate
from Eq. (2.A.2) the levels for ±1. Using Eq. (2.A.2), we find for X_i = ±1

\pm 1 = \frac{x_i - (200 + 50)/2}{0.166\,(200 - 50)} = \frac{x_i - 125}{25} \quad \Rightarrow \quad x_i = 125 \pm 25 \text{ rpm}

Therefore, we would run speeds of 50, 100, 150, and 200 rpm.
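The coding arithmetic is easy to script. The sketch below (a hypothetical illustration, not from the original text) decodes the four-level codes of Eq. (2.A.2) back into screw speeds for Example 2.1:

    # Coding formulas of Appendix 2.A, four-level case (Eq. (2.A.2)),
    # applied to the screw-speed factor of Example 2.1.
    low, high = 50.0, 200.0           # factor range, rpm
    midpoint = (low + high) / 2.0     # 125 rpm
    span = high - low                 # 150 rpm

    def decode(code, scale):
        """Convert a coded level X_i back to an actual setting x_i."""
        return midpoint + code * scale * span

    # Four-level design: codes -3, -1, +1, +3 with scale ~1/6 (the 0.166).
    levels = [decode(c, 1.0 / 6.0) for c in (-3, -1, 1, 3)]
    print([round(x) for x in levels])  # -> [50, 100, 150, 200]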

References

1. Carroll, L., Alice's Adventures in Wonderland, Random House, New York (1965).

2. Box, G.E.P., and H.L. Lucas, "Design of Experiments in Non-Linear Situations," Biometrika, 46, 77 (1959).

3. Smith, J.M., and H.C. Van Ness, Introduction to Chemical Engineering Thermodynamics, 4th ed., McGraw-Hill, New York (1987).

4. Box, G.E.P., W.G. Hunter, and J.S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis and Model Building, Wiley, New York (1978).

5. Box, G.E.P., and N.R. Draper, Empirical Model-Building and Response Surfaces, Wiley, New York (1987).
CHAPTER 3

Experimental Error
When we developed the method of experimental design in Chapter 2, we introduced the
concept of replication. We stated that we run replicates to determine the reproducibility of
our measurement, i.e., to obtain a measure of the data's randomness or experimental error.
We often can quantify some contributors to experimental error, such as measurement,
analysis and sampling. However, when we refer to "error" we do not mean experimenter's
mistakes like using the wrong reagent or procedure. We can always discard those data. We
use error strictly to denote the inherent randomness of the physical world.

Example 3.1. O'Where, O'Where, Can She Be?

Let us suppose we ask a group of eager students at the first meeting of a Monday
afternoon section of Unit Operations Laboratory where they might find a particular female
student on the first Monday in September. This student just recently transferred from an
out-of-state college, so no one in the class knows much about her.

Since a recent student poll listed chemical engineering laboratory as "the best class on
campus," the class would emphatically respond, "in class!" This might represent a very good
guess, except that without multiple observations of the student's attendance such a
prediction seems risky. We have no idea if the student will develop medical problems or lose
interest in the course and haphazardly attend class. Therefore, the prediction would require
not only a measure of where the student will be on the average (location) but also a measure
of how much the student on the average deviates from this average location (standard
deviation).

It just so happens that Labor Day falls on this date, so the laboratory does not even meet.
The moral: statistics will not yield the correct conclusion if you have a poorly designed
experiment.

In this chapter, we shall introduce two methods used to estimate experimental error:
uncertainty analysis (or propagation of error) and replication analysis. An uncertainty
analysis represents a pre-experiment estimate while replication analysis represents a post-
experiment estimate. Because replication analysis deals with hard evidence (physical data),
it represents the preferred method. However, we will see that uncertainty analysis has
some major benefits.

3.1. Uncertainty Analysis
We often calculate a measured response R from a set of intermediate measurements xi
through a functional relationship
R = R(x_1, x_2, \ldots, x_m) \qquad (3.1)

In an uncertainty analysis, we want to determine how errors in the intermediate
measurements propagate to the error in the response (Kline and McClintock, 1953). We
perform this pre-experiment error analysis at a given treatment. We may actually find the
error structure varies greatly from treatment to treatment. For example, a ruler with 0.1 mm
graduations would work perfectly well if we had to measure the thickness of a ream of paper
(500 sheets) but very poorly for a single sheet. Therefore, we must amend our experimental
measurement technique, switching to, say, a micrometer for the thickness measurement.

Figure 3.1. Graphical Representation of How a Measurement Error Propagates into Error in
the Response

For small deviations in the intermediate values, Eq. (3.1) can be expressed as a linear
expansion about the treatment (x1, x2, …)
\delta R = \frac{\partial R}{\partial x_1}\,\delta x_1 + \frac{\partial R}{\partial x_2}\,\delta x_2 + \cdots + \frac{\partial R}{\partial x_m}\,\delta x_m \qquad (3.2)

The measurement x_i deviates from its true value at any instant

\delta x_i = x_i' - x_i \qquad (3.3)

which results in an instantaneous deviation in the response

δ R = R' − R (3.4)

We are, however, more interested in an average uncertainty in the response rather than an
instantaneous one. Therefore, we shall introduce the statistical quantity, standard
deviation

\sigma_R = \sqrt{V(R)} \qquad (3.5)

where V(R) = E\left[ \left( R' - R \right)^2 \right]. The expectation operator E represents a
linear operator that performs a long-term averaging and possesses the following properties:

E  f(x) + g(x)  = E  f(x)  + E  g(x)  (3.6)

and

E  α f(x)  = α E  f(x)  (3.7)

We obtain the variance in the response by squaring Eq. (3.2) and then taking the expected
value of the result
V(R) = E\left[ \left( \frac{\partial R}{\partial x_1}\,\delta x_1 + \frac{\partial R}{\partial x_2}\,\delta x_2 + \cdots + \frac{\partial R}{\partial x_m}\,\delta x_m \right)^2 \right] \qquad (3.8)

The partial derivatives in Eq. (3.8) reduce to constants when we evaluate them at a
particular treatment. The scalar multiplication property given by Eq. (3.7) allows us to bring
the partials outside the expectation operator. The cross-product terms E[\delta x_i\,\delta x_j]
represent the covariance CoV(x_i, x_j), which relates how much the measurement x_i depends
on the measurement x_j. If we assume the intermediate measurements x_i are independent,
then the covariance terms equal zero and Eq. (3.8) reduces to

V(R) = \left( \frac{\partial R}{\partial x_1} \right)^2 V(x_1) + \left( \frac{\partial R}{\partial x_2} \right)^2 V(x_2) + \cdots + \left( \frac{\partial R}{\partial x_m} \right)^2 V(x_m) \qquad (3.9)

Appendix 3.A shows the derivation of the variance equation, Eq. (3.9).

We use Eq. (3.9) to determine how uncertainties in the x_i's propagate to the uncertainty in
the response. We assume we know the variances V(x_i), which we determine by
"guesstimation". In this process, the experimenter must rely on her past experience and
judgement. Example 3.2 shows how we perform an uncertainty analysis.

Example 3.2. Propagation of Error


Suppose the vapor pressure of n-butane between 100 and 200°F is given by:

p^{sat} = 2.04 \times 10^6 \exp\left( -\frac{2700}{T} \right) \qquad (3.10)

where p^{sat} has units of kPa and T has units of K. How accurately would we have to control
the temperature if we wanted the vapor pressure at 100°F (311 K) known to within ±1%?
We must first calculate the uncertainty in the vapor pressure

\sigma_{p^{sat}} = 0.01 \left[ 2.04 \times 10^6 \exp\left( -\frac{2700}{311} \right) \right] = 3.46 \text{ kPa}

Applying Eq. (3.9) to Eq. (3.10), we find the variance in the vapor pressure is

V\left( p^{sat} \right) = \left( \frac{\partial p^{sat}}{\partial T} \right)^2 V(T) \qquad (3.11)

Taking the partial derivative of Eq. (3.10), we find

\frac{\partial p^{sat}}{\partial T} = \left( 2.04 \times 10^6 \right) \left( \frac{2700}{T^2} \right) \exp\left( -\frac{2700}{T} \right) \qquad (3.12)

For T = 311 K, we have

\frac{\partial p^{sat}}{\partial T} = \left( 2.04 \times 10^6 \right) \frac{2700}{(311)^2} \exp\left( -\frac{2700}{311} \right) = 9.66 \text{ kPa/K}

If we rearrange Eq. (3.11) and insert the numerical values, we find the following temperature
variance will meet our experimental objective

V(T) = \frac{(3.46)^2}{(9.66)^2} = 0.13 \text{ K}^2

Therefore, we must hold the temperature to within about ±0.36 K (±0.65°F) of 100°F to
measure the vapor pressure at the desired accuracy.
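We can verify this arithmetic with a short script (our own sketch of the calculation, not part of the original text):

    import numpy as np

    # Propagation of error for Example 3.2, as a numerical cross-check.
    # Model: psat = 2.04e6 * exp(-2700/T) [kPa], with T in K.
    T = 311.0
    psat = lambda T: 2.04e6 * np.exp(-2700.0 / T)

    target_sigma = 0.01 * psat(T)         # +/-1% of psat -> about 3.46 kPa
    dpdT = psat(T) * 2700.0 / T**2        # Eq. (3.12), about 9.66 kPa/K

    # Invert Eq. (3.11): V(T) = V(psat) / (dpsat/dT)^2
    V_T = target_sigma**2 / dpdT**2
    print(f"psat(311 K) = {psat(T):.1f} kPa")
    print(f"V(T) = {V_T:.3f} K^2  ->  sigma_T = {np.sqrt(V_T):.2f} K")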

3.2. Utility of the Uncertainty Analysis

Although propagation of error does not represent the preferred method for estimating
error, it does possess some utility. The experimenter can make an experimental error
estimate before the start of any experimental work if she assumes the uncertainty in each
measurement. By this method, she can identify all known major sources of error by
comparing the magnitude of each term in Eq. (3.9); thus she finds out which measurements
to improve to reduce the overall error. She can also compare the error propagation results
to the error estimated from replicate data to determine if any unidentified sources
contribute to the error. Lastly, this method represents a way to obtain an error estimate in
the absence of replicates.

However, the experimenter should exercise extreme caution before placing a great deal of
confidence in such a calculation. The total calculated error tends to represent an
underestimated error because the experimenter underestimates the uncertainties and
often omits critical sources of error. For example, a stopwatch may register to the nearest
0.01 s but a human's reaction time actually dictates the time measurement's overall
accuracy.
3.3. Replication Analysis

The prediction of the response's average location for a fixed set of conditions represents
the objective in any experimental endeavor. To do this, we sample the population n times
and then average these observations to obtain an estimate of the population's true mean.

\bar{y} = \sum_{j=1}^{n} y_j \Big/ n \qquad (3.13)

If we took an infinitely large sample, then the sample average would become the true mean,
or E[\bar{y}] = \eta.

As well as using data to predict an average location, we also use data to determine the
reproducibility of the measurement, i.e., a measure of the data's randomness or pure error.

We estimate this error by determining how individual observations y_j deviate from the
sample population's estimated location \bar{y}.

s^2 = \sum_{j=1}^{n} \left( y_j - \bar{y} \right)^2 \Big/ (n - 1) \qquad (3.14)

Equation (3.14) represents an estimate of the population's variance, E(s^2) = \sigma^2. In
this equation, we average all n squared deviations, except we divide by n - 1, not n. Why?
Because we used one available degree of freedom to estimate the sample average's location.
If we knew the true mean, the divisor would become n.

Example 3.3. Pure Error Estimate


Estimate the amount of error in the following vapor pressure data taken at 100°F:

Run No.   p^{sat} (kPa)
1         350.0
2         357.8
3         362.9
4         369.3

Since we do not know the true mean, we first must calculate the average at 100°F.

\bar{p}^{sat} = \left( 350.0 + 357.8 + 362.9 + 369.3 \right) / 4 = 360.0 \text{ kPa}

Next we calculate the sample variance using Eq. (3.14) as follows:

s_{100°F}^2 = \left[ (350.0 - 360.0)^2 + (357.8 - 360.0)^2 + (362.9 - 360.0)^2 + (369.3 - 360.0)^2 \right] / 3 = 66.58 \text{ (kPa)}^2

Therefore, the estimated sample standard deviation equals 8.16 kPa.
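The same numbers drop out of Python's standard library (a small sketch, not from the original text):

    import statistics

    # Vapor-pressure replicates at 100 deg F from Example 3.3 (kPa).
    psat = [350.0, 357.8, 362.9, 369.3]

    mean = statistics.mean(psat)       # 360.0 kPa
    var = statistics.variance(psat)    # sample variance (n-1 divisor): 66.58
    print(f"mean = {mean:.1f} kPa")
    print(f"s^2 = {var:.2f} kPa^2, s = {var**0.5:.2f} kPa")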

Since the prediction of a response over a range of controllable factors (model) represents a
major goal of engineering experimentation, we must not only take the data over this range
but also replicate selected treatments, ones that allow balance. The replicated treatments
provide us much needed information about the error structure. We obtain an independent
error estimate at a repeated level by calculating the average yi at the i-th level and
applying Eq. (3.14) to calculate the sample variance.
s_i^2 = \sum_{j=1}^{n} \left( y_{ij} - \bar{y}_i \right)^2 \Big/ \nu_i \qquad (3.15)

The quantity νi represents the number of replicates at the i-th level.


In the model building process that we will discuss in subsequent chapters, we normally
assume a constant error structure or variance over the entire range of data. If this
assumption holds true, we can pool these independent estimates of the sample variance at
each level to obtain a better estimate, one that uses all the error information. We pool by
degrees of freedom

s_p^2 = \frac{\sum_i \nu_i s_i^2}{\sum_i \nu_i} \qquad (3.16)

When we pool by degrees of freedom, we emphasize the levels where we have better
location and error estimates.
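A short sketch of Eq. (3.16) in Python (the variance estimates at the second and third levels are assumed values, included only to illustrate the pooling):

    # Pooling independent variance estimates by degrees of freedom, Eq. (3.16).
    levels = [
        (66.58, 3),  # (s_i^2 in kPa^2, nu_i) at 100 deg F, from Example 3.3
        (72.10, 1),  # hypothetical replicate pair at 150 deg F
        (60.40, 2),  # hypothetical triplicate at 200 deg F
    ]

    num = sum(nu * s2 for s2, nu in levels)
    den = sum(nu for _, nu in levels)
    print(f"pooled s^2 = {num / den:.2f} with {den} degrees of freedom")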
In Chapter 5, we will see how we use the pure error estimate to determine if an assumed
effect (term in the model) significantly describes the data or merely represents another
independent estimate of error. Once we prove that we have a statistically adequate model,
we will make one last refinement to the error estimate, as outlined in Chapter 6.
Appendix 3.A. Derivation of Variance Equation

If we expand Eq. (3.8) termwise, we find

V(R) = E\left[ \left( \frac{\partial R}{\partial x_1} \right)^2 \delta x_1\,\delta x_1 + \left( \frac{\partial R}{\partial x_1} \right)\left( \frac{\partial R}{\partial x_2} \right) \delta x_1\,\delta x_2 + \cdots + \left( \frac{\partial R}{\partial x_{m-1}} \right)\left( \frac{\partial R}{\partial x_m} \right) \delta x_{m-1}\,\delta x_m + \left( \frac{\partial R}{\partial x_m} \right)^2 \delta x_m\,\delta x_m \right] \qquad (3.A.1)

In Eq. (3.A.1), we evaluate each partial derivative at the particular treatment, which results
in a scalar quantity

\left. \frac{\partial R}{\partial x_i} \right|_{x_1, x_2, \ldots, x_m} = \alpha_i \qquad (3.A.2)

Using the scalar multiplication property, Eq. (3.7), we can pull all the partials outside the
expectation operator

V(R) = \left( \frac{\partial R}{\partial x_1} \right)^2 E[\delta x_1\,\delta x_1] + \left( \frac{\partial R}{\partial x_1} \right)\left( \frac{\partial R}{\partial x_2} \right) E[\delta x_1\,\delta x_2] + \cdots + \left( \frac{\partial R}{\partial x_{m-1}} \right)\left( \frac{\partial R}{\partial x_m} \right) E[\delta x_{m-1}\,\delta x_m] + \left( \frac{\partial R}{\partial x_m} \right)^2 E[\delta x_m\,\delta x_m] \qquad (3.A.3)

If we invoke the statistical definitions of the variance and covariance, Eq. (3.A.3) reduces
to

V(R) = \left( \frac{\partial R}{\partial x_1} \right)^2 V(x_1) + \left( \frac{\partial R}{\partial x_1} \right)\left( \frac{\partial R}{\partial x_2} \right) \mathrm{CoV}(x_1, x_2) + \cdots + \left( \frac{\partial R}{\partial x_{m-1}} \right)\left( \frac{\partial R}{\partial x_m} \right) \mathrm{CoV}(x_{m-1}, x_m) + \left( \frac{\partial R}{\partial x_m} \right)^2 V(x_m) \qquad (3.A.4)

Since we assume independence for all measurements, \mathrm{CoV}(x_i, x_j) = 0 and
Eq. (3.A.4) reduces to Eq. (3.9).

References
1. Kline, S. J., and F. A. McClintock, "Describing Uncertainties in Single-Sample
Experiments," Mech. Eng., 75, No.1, 3, January 1953.

CHAPTER 4

Modeling
In engineering science, we concern ourselves with the prediction of how measurable
quantities respond to changes in controlled factors. The prediction comes in the form of a
mathematical model. Therefore, we strive for the "true form" of the model.

y = f(x_1) + \varepsilon(x_2) \qquad (4.1)

Equation (4.1) represents a "statistically adequate" model, one that transforms the data into
"white noise". In such a model we have a set of known or controlled variables x1 and a set of
unknown, and hence uncontrolled, variables x2. The latter variables account for the error or
white noise. We assume this noise does not relate to any known variable and possesses an
individual and independent distribution with zero mean and constant variance, or
IIND(0, \sigma^2).

Our search for an adequate model may involve either an empirical or a mechanistic approach.
The empirical method involves curve fitting, which we usually perform by linear methods.
Such models provide a locally adequate description but often ignore the physicochemical
phenomena under investigation. Therefore, we can use them only for interpolation.

The mechanistic modeling approach considers the physics of the phenomena. In the model
development, differential equations often result. These differential equations have
nonlinear solutions that require iterative, nonlinear regression techniques to evaluate the
model parameters. This nonlinearity, however, poses little or no calculational difficulty
for modern, high-speed computers unless the model contains an unusually large number of
parameters.
The mechanistic model represents a more parsimonious description of the true state of
nature than its empirical counterpart. For example, we know a Taylor series expansion can
represent any analytic function f(x). Suppose we want to describe the following nonlinear
function

f(x) = b_1 \exp\left( b_2 x \right) \qquad (4.2)

If we let b_1 = b_2 = 1, the following Taylor series describes the nonlinear function f(x) = e^x

e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} \quad \text{for all } x \qquad (4.3)

If we wanted to approximate the exponential function to within 0.5 percent, then we would
need only five terms at x = 1 but twenty terms for the same accuracy at x = 10. A twenty-
parameter linear model requires eighteen more levels of the independent variable x than a
two-parameter nonlinear model like Eq. (4.2). We would much rather use all those extra
experiments for the error estimate (replication) than for parameter estimation. A good
rule to follow: the fewer the parameters, the less error propagates to the estimated
response (Box, Hunter and Hunter, 1978).
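A quick numerical check of those term counts (our own sketch, counting terms from the x^0 term onward):

    import math

    def terms_needed(x, rel_tol=0.005):
        """Count the partial-sum terms of sum x^n/n! (n >= 0) needed to
        bring the relative error below rel_tol against math.exp(x)."""
        target = math.exp(x)
        partial, term = 0.0, 1.0          # first term is x^0/0! = 1
        for n in range(1, 200):
            partial += term
            if abs(target - partial) / target < rel_tol:
                return n                  # n terms summed so far
            term *= x / n                 # next term, x^n/n!
        return None

    print(terms_needed(1.0))   # -> 5
    print(terms_needed(10.0))  # -> 20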
Since a mechanistic model draws upon the physical phenomena, we can extrapolate outside
its range of applicability. By straining the model in this fashion, we can suggest regions for
further experimentation and identify weaknesses in the present model. We must, however,
beware of any extrapolation because physical mechanisms may change in regions beyond the
current data.
The development of a truly mechanistic model can challenge even the most gifted intellect.
The time and effort consumed in the pursuit of a theoretical model often makes this
approach improvident. Thus, the empirical description will often prove the more economical
alternative especially when the company's net profit represents the "bottom line".

We shall use the batch heating of a storage tank as an example to demonstrate the
differences in the two approaches. We shall also use that example to show how model
straining can improve mechanistic models.

Example 4.1. Batch Heating of a Mixed Tank

Suppose we have an uninsulated, metal tank with m kg of water as shown in Figure 4.1. We
assume we have a well-mixed tank and its liquid contents initially at room temperature T∞ .
An electric heater with Qs kW of power heats the liquid. At t=0, we turn on the heater and
obtain the heating curve shown in Figure 4.2. Model the water temperature as a function of
time.

Figure 4.1. An Electrically-Heated Mixed Tank

Figure 4.2. Temperature Profile for the Batch Heating of a Mixed Tank

4.1. Empirical Approach
If we followed an empirical approach, we would probably try a second-order polynomial
T = b_0 + b_1 t + b_2 t^2 \qquad (4.4)

If we found the quadratic fit inadequate, we would introduce higher order terms until we
obtained a reasonable fit.
Suppose, however, Eq. (4.4) describes the heating curve in Figure 4.2 and we want to know
the temperature at t = ∞, a time well beyond the data used to fit Eq. (4.4). What
temperature would we find? Since the quadratic term dominates Eq. (4.4) at large t and its
coefficient is negative (reverse curvature), Eq. (4.4) would predict a temperature of
negative infinity, a truly unrealistic result!
4.2. Mechanistic Approach
The alternative approach to empirical modeling involves the development of a mechanistic
model. Since we wish to describe the batch heating, we would start by writing an energy
balance on the tank. If we assume the heat loss occurs only through the tank walls, no
evaporation, and a constant liquid heat capacity, the following differential equation
describes the liquid temperature in the tank
\underbrace{m C_p \frac{dT}{dt}}_{\text{accumulation of energy in liquid}} = \underbrace{Q_s}_{\text{heat source}} - \underbrace{UA\,(T - T_\infty)}_{\text{heat loss through tank walls}} \qquad (4.5)

We shall introduce the change of variable \Theta = (T - T_\infty) in Eq. (4.5) and rearrange
the result to obtain

\frac{d\Theta}{dt} + \frac{UA}{m C_p}\,\Theta = \frac{Q_s}{m C_p} \qquad (4.6a)

subject to the initial condition

\Theta = 0 \text{ at } t = 0 \qquad (4.6b)

Equation (4.6) has a solution of the form


Θ = b0 + b1 exp ( b2 t ) (4.7)

The coefficients in this mechanistic model have physical significance, as they contain
knowledge of the power input, overall heat transfer coefficient, heat transfer area, liquid
heat capacity and mass.

b_0 = \frac{Q_s}{UA} \qquad (4.8a)

b_1 = -\frac{Q_s}{UA} \qquad (4.8b)

b_2 = -\frac{UA}{m C_p} \qquad (4.8c)

Assuming Eq. (4.7) describes the heating curve, we see that as t approaches infinity the
temperature reaches a constant value
T = \frac{Q_s}{UA} + T_\infty \qquad (4.9)

Equation (4.9) offers some useful design criteria. If we wish to increase the steady-state
temperature, Eq. (4.9) suggests two possible pathways:
• increase the heat input Qs, or
• decrease UA by insulating the tank.
An empirical model would never lead us to these criteria because the coefficients in Eq.
(4.4) do not contain information about the physical phenomena.
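To make the model concrete, here is a small simulation sketch of Eqs. (4.7)-(4.9); all of the physical parameter values below are assumed purely for illustration:

    import numpy as np

    # Assumed (illustrative) parameters for the mixed tank.
    m, Cp = 500.0, 4.18e3     # kg and J/(kg K) of liquid water
    UA = 250.0                # W/K, loss coefficient times wall area
    Qs = 10.0e3               # W, heater power
    T_inf = 20.0              # deg C, ambient (and initial) temperature

    b0 = Qs / UA              # Eq. (4.8a): steady-state temperature rise
    b2 = -UA / (m * Cp)       # Eq. (4.8c): negative reciprocal time constant

    t = np.linspace(0.0, 8 * 3600.0, 5)       # seconds
    Theta = b0 * (1.0 - np.exp(b2 * t))       # Eq. (4.7) with b1 = -b0
    for ti, Th in zip(t, Theta):
        print(f"t = {ti / 3600:4.1f} h   T = {T_inf + Th:6.2f} C")
    # As t -> infinity, T -> Qs/UA + T_inf = 60 C, per Eq. (4.9).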
4.3. Model Testing

Let us suppose we wanted a higher steady-state temperature and purchased an electric
heater with a larger power rating. When we heat the contents of the mixed tank, we find
Eq. (4.7) no longer adequately describes the experimental heating curve. Why? Because the
physics must have changed. Did some water evaporate? Possibly, but evaporation probably
represents a second-order effect. A more likely explanation would involve the constant Cp
assumption.

If the heat capacity depended linearly on temperature

C_p = b_0 + b_1 T \qquad (4.10)

the solution to the resultant differential equation becomes


t = b0 + b1 Θ + b2 ln(b3 − Θ) (4.11)

Note we can no longer solve for Θ explicitly.

If we require quadratic temperature dependence, the solution becomes


t = b0 + b1 Θ + b2 Θ 2 + b3 ln(b4 − Θ) (4.12)

Thus, we can see that straining the original model beyond its limits identified the weakness
(the constant heat capacity assumption) in the model formulation. This process resulted in
a sequential improvement in the mechanistic model.

4.4. Model Discrimination

Model discrimination frequently occurs in chemical engineering when we search for chemical
reaction mechanisms. Here, we would hypothesize several mechanisms, and then take the
reaction rate data to determine the correct kinetic expression. Example 4.2 represents
something that might occur.

Example 4.2. Chemical Kinetics

Suppose we have a chemical reaction that consumes two moles of A for every mole of B
produced.
2A → B (4.13)

Reaction Schemes I and II represent two possible mechanisms that might describe the
overall reaction kinetics given by Eq. (4.13).

Scheme I:

A + A → B    (rate constant k_1)    (4.14)

Scheme II:

A ⇄ I    (forward k_1, reverse k_−1)
I + A → B    (rate constant k_2)    (4.15)

In scheme I, two molecules of A react irreversibly to form B. A simple second-order


reaction, therefore, describes the kinetics for the formation of B:

(r_B) = k_1 [A]²    (4.16)

If the reaction occurred by scheme II, first an intermediate I forms reversibly from A
followed by an irreversible reaction of I with a second A to form B. The kinetic expression
for the formation of B becomes:

(r_B) = k_1 k_2 [A]² / (k_−1 + k_2 [A])    (4.17)

We can fit the two kinetic models, Eqs. (4.16) and (4.17), to experimental data to estimate
the rate constants. But how do we decide if either model adequately describes the kinetic
data?

We need a principle by which we can make decisions about the adequacy of a model. In the next
chapter, we will see how to estimate the model's parameters and then decide if we have an
adequate model.
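As a preview of that machinery, the sketch below is purely illustrative: the rate data are hypothetical, generated from Scheme II with assumed constants, and Eq. (4.17) is rewritten with the lumped parameters a = k_1 and b = k_−1/k_2 so the fit is well posed.

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical rate data generated from Scheme II with assumed
# lumped constants a = k1 = 2.0 and b = k_-1/k2 = 2.0 (see Eq. 4.17)
rng = np.random.default_rng(7)
A = np.linspace(0.1, 5.0, 15)                         # [A], mol/L
rB = 2.0*A**2/(2.0 + A)*(1.0 + rng.normal(0.0, 0.03, A.size))

def scheme1(A, k1):                                   # Eq. (4.16)
    return k1*A**2

def scheme2(A, a, b):                                 # Eq. (4.17), lumped form
    return a*A**2/(b + A)

p1, _ = curve_fit(scheme1, A, rB)
p2, _ = curve_fit(scheme2, A, rB, p0=(1.0, 1.0))

# Compare residual sums of squares; Chapter 5 shows how to judge
# adequacy formally against pure error with F-tests
print(np.sum((rB - scheme1(A, *p1))**2))
print(np.sum((rB - scheme2(A, *p2))**2))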


CHAPTER 5

Regression
In this chapter, we develop the statistical methods we use to determine an adequate model.
Once we have such a model, we show how to establish confidence intervals on the model.
5.1. Least Squares

To evaluate a model, we need a principle by which we can determine how to fit a proposed model
to the experimental data. The "principle of least squares" represents the most common
method used by statisticians. This method involves minimizing the variance between the
model and the experimental data. We refer to this variance as "the residual sum of
squares." Figure 5.1 shows a graphical representation of a residual R.

Figure 5.1. The residual represents the difference between the experimental data and the
fitted model.

The regression goal, therefore, becomes one of finding the unique set of parameters b_n that
minimizes the residual sum of squares. We shall use the following example to illustrate
this point graphically.

Example 5.1. Minimizing the Sum of Squares Residual

Suppose we did not know anything about calculating an average, but we did know the residual
represented the difference between an observation (data) and our guess at where the
average (model) resides.

For the data in Example 3.3, we can then guess different averages and construct a plot
similar to Figure 5.2. We calculate the residual sum of squares as follows:

S_R = Σ_{i=1}^{4} (y_i − b_0)²    (5.1)

Figure 5.2. How the Residual Sum of Squares changes with model parameter.

From Figure 5.2, we can see the average (p̄^sat = 360.0 kPa) of the four observations in
Example 3.3 minimizes the residual sum of squares. When we test a model with replicates
at multiple levels, the principle of least squares will minimize the difference between the
calculated average at each level and the model being tested throughout the range.
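A minimal sketch of this grid search, using the four vapor pressures of Example 3.3 (they reappear in Example 5.2 below), confirms the minimum falls at the sample average:

import numpy as np

# The four replicate vapor pressures (kPa) from Example 3.3
y = np.array([350.0, 357.8, 362.9, 369.3])

# Guess different "models" b0 and compute SR per Eq. (5.1)
b0 = np.linspace(340.0, 380.0, 401)
SR = ((y[:, None] - b0[None, :])**2).sum(axis=0)

print(b0[SR.argmin()])    # 360.0, the sample average
print(SR.min())           # 199.74, the minimum residual sum of squares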

5.2. Matrix Representation

Consider the general model that consists of a predictable and an unpredictable part

y_i = f(x_1, x_2, x_3, …, β_1, β_2, β_3, …) + ε_i    (5.2)

We normally assume the unpredictable or error term ε_i to be additive with a zero mean and
constant variance. This assumption may not hold true for all data, but it represents the
distribution normally assumed when we do not know the error structure.
For a set of data, we can write Eq. (5.2) in matrix form as
y = f(X, b) + e    (5.3)

The bold lower case letters y, f, and e represent the response, model, and error vectors.
These column vectors have n rows, i.e., the number of observations. The column vector b
contains the parameter estimates. It has p rows, the number of parameters or terms for a
linear model. The upper case letter X signifies a matrix that has dimensions of n rows and p
columns. It contains the independent variables in the model. For a linear model, Eq. (5.3)
reduces to
y = Xb + e (5.4)

We shall call a model linear if we can express it as:

y = b_1 x_1 + b_2 x_2 + b_3 x_3 + … + b_p x_p    (5.5)

5.2.1. Geometric Interpretation of Least Squares

Figure 5.3 offers a geometrical interpretation of our regression criterion—least squares
error. Here we have decomposed the data vector y (n × 1) into the model vector ŷ, which lies
in the p-dimensional subspace spanned by the columns of X, and the error vector e, which
occupies the remaining n − p dimensions.

Figure 5.3. Geometrical Interpretation of the Least Squares Error

The model vector lies somewhere along the dotted line in Fig. 5.3. As stated earlier, we wish
to find the unique set of parameters b for a given model represented by Eq. (5.4) such that
it minimizes the distance between the data vector and the model vector. An error vector
orthogonal to the model represents the minimum distance. From vector calculus, we know the
length of the data vector satisfies

‖y‖² = ‖ŷ‖² + ‖e‖²    (5.6)

Equation (5.6) shows how we arrive at the idea of least squares error.

5.2.2. Formulation of Linear Regression Equations


We can rearrange Eq. (5.4) to obtain an expression for the error vector

e = y − Xb    (5.7)

If we now square Eq. (5.7), we obtain an expression that represents the variance between
the data and the model, which we call the residual sum of squares.

e^t e = Σ_{i=1}^{n} e_i² = (y − Xb)^t (y − Xb) = y^t y − 2b^t X^t y + b^t X^t Xb    (5.8)

Since we wish to minimize this variance with respect to the model parameters, we must
differentiate Eq. (5.8) with respect to b and set the result equal to zero.


∂(e^t e)/∂b = 2X^t Xb − 2X^t y = 0    (5.9)

To satisfy the resultant "normal equations"


X^t Xb = X^t y    (5.10)

we pre-multiply both sides of Eq. (5.10) by the inverse of X^tX and find

b = (X^tX)^{-1} X^t y    (5.11)

At the end of this chapter, we provide in Appendix 5.A a brief introduction to some important
matrix properties. In Appendix 5.B, we derive Eq. (5.10).
We refer to the matrix (X^tX)^{-1} as the C-matrix. When we multiply the C-matrix by the
sample population variance s², we obtain the variance-covariance matrix

V(b) = Cs²    (5.12)

The diagonal elements contain the parameter variances. When we take the square root of the
diagonal elements of Eq. (5.12), we obtain the standard errors for the estimated
parameters, s(b_i) = √(c_ii) s.
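A short sketch of Eqs. (5.11) and (5.12) in Python, applied to the straight-line model p^sat = b_0 + b_1T with the first experimenter's data from Table 7.1 (this anticipates Chapter 7, where the fit works out to roughly p^sat = −5300 + 18T) and the pure error mean square s² = 109.2225 kPa² from Table 7.3:

import numpy as np

# First experimenter's n-butane data (Table 7.1)
T = np.array([310.9, 366.5, 366.5, 326.4, 310.9, 344.8, 326.4, 344.8])
p = np.array([350.0, 1353.4, 1339.7, 510.6, 357.8, 841.2, 529.3, 824.6])

# Design matrix for p = b0 + b1*T: a column of ones plus the T column
X = np.column_stack([np.ones_like(T), T])

# Normal equations, Eq. (5.11): b = (X'X)^-1 X'y
C = np.linalg.inv(X.T @ X)              # the C-matrix
b = C @ X.T @ p
print(b)                                # roughly [-5300, 18]

# Variance-covariance matrix, Eq. (5.12), with the pure error
# mean square s2 = 109.2225 kPa^2 taken from Table 7.3
V = C * 109.2225
print(np.sqrt(np.diag(V)))              # standard errors s(b0), s(b1)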

Equation (5.11) shows that linear regression involves the simultaneous solution of a set of
linear algebraic equations. Statistical software packages, such as SAS and SPSS, perform
the matrix algebra needed to solve for b, the unbiased parameter estimates. If we have a
nonlinear model, the solution of the system of equations represented by Eq. (5.3) requires an
iterative procedure, which in turn requires an initial guess for b. Appendix B gives sample
SAS programs for both the multiple linear regression procedure PROC REG and the nonlinear
regression procedure PROC NLIN. In Appendix C, we present the SAS output for these
procedures. Tables C.1-C.5 give the linear regression results while Tables C.6-C.8 give the
nonlinear regression results.

Example 5.2. Vapor Pressure of n-butane at 100°F

We know from Example 5.1 the average of the four vapor pressures represents our best
estimate, in terms of the least squares principle, of the vapor pressure at 100°F; however,
we can show the same result using the matrix algebra method we just outlined.
The matrix form of the linear model (constant) is expressed as y = Xb + e, or in indices

y_i = η + ε_i    (5.13)

For the given data, we have

y = [350.0, 357.8, 362.9, 369.3]^t,   X = [1, 1, 1, 1]^t.

The column of "ones" in X corresponds to the constant η. The b vector represents the
scalar quantity η. Since Eq. (5.11) states that the solution vector is b = (X^tX)^{-1}X^ty, we must
first determine X^tX and invert it to find (X^tX)^{-1}. Next we must find X^ty and then multiply it
by (X^tX)^{-1}.
We write X^tX as

X^tX = (1)(1) + (1)(1) + (1)(1) + (1)(1) = 4

The inverse (X^tX)^{-1} is simply (4)^{-1} = 1/4. We write X^ty as

X^ty = (1)(350.0) + (1)(357.8) + (1)(362.9) + (1)(369.3) = 1440.0

Therefore, we find that b_0 = (1/4)(1440.0) = 360.0.

5.3. Sum of Squares and Degrees of Freedom

To interpret the results of a linear (or nonlinear) regression, we shall partition the total
variation into sums of squares and degrees of freedom. The total sum of squares y^ty
decomposes into the model and residual sums of squares

y^t y = b^t X^t y + e^t e    (5.14)

The total number of observations, n, represents the degrees of freedom associated with
the total sum of squares. The number of model parameters, p, represents the degrees of
freedom for the model; thus, the difference between the total and model degrees of
freedom equals the degrees of freedom associated with the residual sum of squares.
ν_R = n − p    (5.15)

Example 5.4. Decomposition of Sum of Squares

We revisit Example 5.2 to show how to calculate the total, model, and residual sums of
squares. We define the total sum of squares as

S = y^t y = Σ_{i=1}^{n} y_i²    (5.16)

Thus, we find for S

S = (350.0)² + (357.8)² + (362.9)² + (369.3)² = 518,599.74
We calculate the model sum of squares knowing S_M = b^t X^t y. From Example 5.2, we found
that b = 360.0 and X^t y = 1440.0. Therefore, we have S_M = (360.0)(1440.0) = 518,400.0.

The residual sum of squares represents the sum of squares not described by the model
S_R = e^t e = S − S_M    (5.17)

Thus, we find S_R = 518,599.74 − 518,400.0 = 199.74.
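A few lines of arithmetic reproduce this bookkeeping (a sketch using the same four observations):

import numpy as np

y = np.array([350.0, 357.8, 362.9, 369.3])    # vapor pressures, kPa
n, p = y.size, 1                               # one parameter: the constant

S = y @ y                                      # total sum of squares, Eq. (5.16)
b0 = y.mean()                                  # least squares estimate, 360.0
SM = b0 * y.sum()                              # model sum of squares, b'X'y
SR = S - SM                                    # residual, Eq. (5.17)

print(S, SM, SR, n - p)    # 518599.74, 518400.0, 199.74, 3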

The residual sum of squares subdivides further into two independent sums of squares, pure
error and "terms-left-out-of-model" (TLO). We will also see TLO sometimes referred to as
"lack-of-fit" (LOF). We calculate the pure error using the procedure outlined in
Chapter 3, Section 3.2. To determine the sum of squares and degrees of freedom for
"terms-left-out," we subtract the pure error quantities from the residual ones.
S_L = S_R − S_E    (5.18)

ν_L = ν_R − ν_E    (5.19)

5.4. Summarizing the Regression: The ANOVA Table

The ANalysis Of VAriance (ANOVA) table summarizes the regression and partitions the
sums of squares and degrees of freedom into their respective sources. Table 5.1 shows a
generic ANOVA table. As we test different models on the same data, the rows MODEL,
RESIDUAL, and TLO will change; however, the rows TOTAL and PURE ERROR will remain
invariant unless we transform the response y_i.

All the entries in the ANOVA table represent positive and (except for the degrees of
freedom) random numbers. This means that if we sample another set of n observations, the
numerical values of the individual sums of squares would not be identical to the first set
because the responses contain randomness, as shown in Eq. (5.2). However, we should draw
the same statistical conclusions about either set if we sampled the data subject to
identical conditions.

We can view the ANOVA table as analogous to an accountant's ledger sheet because if the
individual squares and degrees of freedom do not balance with the TOTAL row we have
miscalculated something. Thus, a negative entry anywhere within the table signifies a
mathematical error.

Table 5.1. Generic ANOVA Table

Source           Sum of Squares    Degrees of Freedom    Mean Square       F-ratio (Calc.)    F-ratio (Tab.)
Total            S = y^t y         n                     ———               ———                ———
Model            S_M = b^t X^t y   ν_M = p               ms_M = S_M/ν_M    ms_M/s²            F(α, ν_M, ν_E)
Residual         S_R = e^t e       ν_R = n − p           ———               ———                ———
Terms-Left-Out   S_L = S_R − S_E   ν_L = ν_R − ν_E       ms_L = S_L/ν_L    ms_L/s²            F(α, ν_L, ν_E)
Pure Error       S_E               ν_E                   s²                ———                ———

5.5. F-Tests for Model Adequacy
After we perform a regression analysis, we would like to determine if we have an adequate
model. We do this by means of F-tests. One test determines the model's significance, and
the other determines if we have omitted terms, yet unnamed, from the model.
The F-test represents a one-sided statistical test where we compare variances. It
determines if the effect or bias (mean square) comes from the same population as pure
error. Generally speaking, a mean square, like any statistical quantity, consists of a fixed
(predictable) and random (unpredictable) part.
ms_effect = σ² + bias    (5.20)

To prove that the mean square does not come from the same population as pure error, the
ratio of the mean square to the pure error estimate must exceed a critical F-ratio.

(σ² + bias)/s² > F_crit    (5.21)

This critical ratio depends on the desired confidence level (α-risk) and the degrees of
freedom going into the numerator's estimate and the mean square pure error. Statisticians
have tabulated F-ratios at specific α-risks, and Table A.3 in Appendix A gives the F-distribution
for the α-risk of 0.05. The symbols ν_1 and ν_2 in these tables represent the degrees of freedom
for the numerator and denominator.

In significance testing, we actually play a game. For the first F-test, we take a small α-risk
to force the effect (term-in-the-model) out of the model (insignificant). For the second
test, we take a large α-risk to force the "terms-left-out" into the model (significant). The
calculated F-ratio for the second test actually must be on the order of unity (0.8 to 1.4) to
prove beyond a shadow of a doubt that we have an adequate model.
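Instead of interpolating in printed tables, we can also compute critical F-ratios directly; a sketch with scipy (note the convention of passing the cumulative probability 1 − α):

from scipy.stats import f

# Critical F-ratio for an alpha-risk of 0.05 with 1 numerator
# and 4 denominator degrees of freedom
alpha, nu1, nu2 = 0.05, 1, 4
print(f.ppf(1.0 - alpha, nu1, nu2))    # about 7.71, matching Table A.3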

5.6. Confidence Intervals on Statistical Quantities

Once we have an adequate model, we often desire to place a confidence interval on the
estimated parameters. This interval represents the probability the true parameter will lie
between an upper and lower bound. We use the t-statistic to establish the interval.
We define the t-statistic as the ratio of the deviation between an estimated parameter y
and its true value E(y) over an estimate of the parameter variability or "standard error"
s(y).

t = (y − E(y)) / s(y)    (5.22)

We use "y" to symbolize any estimated parameter (statistic), not necessarily the response.
We view the t-statistic as an estimate of the unit normal deviate

z = (y − η) / σ    (5.23)

Figure 5.4 shows the t-distribution has a broader shape and a lower peak at the
distribution's center than the normal distribution. It only approaches the normal
distribution for very large sample sizes. We should note, however, that more replicates never
reduce error but only improve the error estimate. The t-value gives the probability that we
have a significant difference between two values, one of which we assume known. Dividing
the difference by the standard deviation normalizes the effect, i.e., we only need one t-
distribution to determine the assigned probability.

Figure 5.4. Shape of the t-Distribution Relative to the Unit Normal Distribution

We have much more interest in bounding the true value than determining if we have a
significant difference between two values. The establishment of a significant difference
becomes especially difficult when we do not know the true value.

We base the confidence interval calculation on an estimated knowledge of the parameter
and its variability. We select a desired confidence level (α-risk); the 95 percent confidence
level, or an α-risk of 0.05, represents the standard choice. Since the interval has two sides,
we place half the risk on the low side and half on the high side. We calculate the two-sided
interval for a parameter y as follows:

P[ y − t(α/2, ν_R) s(y) < E(y) < y + t(α/2, ν_R) s(y) ] = 1 − α    (5.24)
Equation (5.24) sets the probability P that the "true value of y," or the expected value (long-term
average) of the estimated parameter, lies within the interval

y − t(α/2, ν_R) s(y) < E(y) < y + t(α/2, ν_R) s(y)    (5.25)

Example 5.5. Confidence Interval on an Average


For the data given in Example 5.1, calculate the 95% confidence interval on the vapor
pressure at 100°F. From Example 5.1, we have

p̄^sat = 360.0 kPa

s_100°F = 8.16 kPa

We calculate the confidence interval for the true mean of the vapor pressure at 100°F using
Eq. (5.25)

p̄^sat − t s(p̄^sat) < E(p^sat) < p̄^sat + t s(p̄^sat)

The following formula represents the standard error of a sample average

s(ȳ) = s/√n    (5.26)

where s represents the sample standard deviation. From Eq. (5.26), we see that the sample
standard deviation gets reduced by a factor of √n; simply stated, averages vary less than
the individual observations. We offer some words of caution: when establishing a confidence
interval on a parameter, you must ensure you use the standard error associated with that
parameter.
For our example, we have

s(p̄^sat) = 8.16/√4 = 4.08 kPa

For an α-risk of 0.05 and three degrees of freedom associated with the error estimate, we
find from Table A.6 that t(0.05, 3) = 3.182. The t-values given in Table A.6 have half the
risk distributed on the lower end and half on the upper end. NOTE: One should always
exercise caution when using t-tables because not all have two-sided construction; some have
one-sided construction.

Substituting the standard error and t-value into Eq. (5.25), we obtain the confidence
interval and make the statistical statement that "19 times out of 20" the vapor pressure of
n-butane at 100°F will lie between 347.0 and 373.0 kPa.

We could also determine the standard error of the average from the matrix equations. From Eq.
(5.12), we know that

V(b_0) = c_11 s²    (5.27a)

or

s(b_0) = √(c_11) s    (5.27b)

From Example 5.2, we note the matrix (X^tX)^{-1} = 1/4, so c_11 = 1/4 or 1/n. Therefore, we end
up with the same result for s(b_0).
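The whole calculation takes a few lines (a sketch; scipy's t.ppf takes the cumulative probability, so the two-sided interval uses 1 − α/2):

import numpy as np
from scipy.stats import t

y = np.array([350.0, 357.8, 362.9, 369.3])    # kPa
ybar, s = y.mean(), y.std(ddof=1)             # 360.0 and 8.16
se = s / np.sqrt(y.size)                      # standard error, Eq. (5.26)

tval = t.ppf(1.0 - 0.05/2.0, y.size - 1)      # t(0.05, 3) = 3.182, two-sided
print(ybar - tval*se, ybar + tval*se)         # about 347.0 and 373.0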

Appendix 5.A. Some Properties of Matrix Algebra
To understand some of the matrix algebra involved in the derivation and solution of the linear
regression equations, we introduce some elementary properties.

Multiplication
For the sake of illustration, we shall consider the following matrices, A and B.

A = | a_11  a_12  a_13 |        B = | b_11  b_12 |
    | a_21  a_22  a_23 |            | b_21  b_22 |
                                    | b_31  b_32 |

To multiply two matrices such as A and B, the number of columns of A must equal the number
of rows of B. Thus, we have

AB = C    (5.A.1)

where C = | c_11  c_12 |
          | c_21  c_22 |

(a 2 × 3 matrix times a 3 × 2 matrix yields a 2 × 2 matrix). We form the elements of C as
follows

c_ij = Σ_m a_im b_mj    (5.A.2)

For example, we have c_11 = a_11 b_11 + a_12 b_21 + a_13 b_31.
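In numpy notation (a trivial sketch), the same multiplication rule reads:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])            # 3 x 2

C = A @ B                             # 2 x 2, with c_ij = sum_m a_im*b_mj
print(C)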

Special Matrices

A special square matrix (n rows by n columns) called the identity matrix I, which has ones
along its diagonal elements and zeros in all the off-diagonal elements, has the property

AI = A    (5.A.3)

Every nonsingular matrix A has an inverse A^{-1} that, when multiplied by A, satisfies the
relationship

AA^{-1} = A^{-1}A = I    (5.A.4)

Transpose
The matrix transpose operation interchanges rows with columns. For example, A^t, read "the
transpose of A," would yield

A^t = | a_11  a_21 |
      | a_12  a_22 |    (5.A.5)
      | a_13  a_23 |

Some useful identities involve the transpose operation:

(AB)^t = B^t A^t    (5.A.6)

A^t B = B^t A   (valid when the product yields a scalar)    (5.A.7)

Appendix 5.B. Derivation of the Normal Equations
In this appendix, we present the full derivation of Eq. (5.10)—the "normal equations."
We first apply transpose property Eq. (5.A.6) to Eq. (5.7) and expand to obtain

e^t e = (y − Xb)^t (y − Xb) = (y^t − b^t X^t)(y − Xb)
      = y^t y − b^t X^t y − y^t Xb + b^t X^t Xb    (5.B.1)

When we apply transpose properties Eqs. (5.A.6) and (5.A.7) to the third term in Eq. (5.B.1),
we find

y^t Xb = (Xb)^t y = b^t X^t y    (5.B.2)

Therefore, Eq. (5.B.1) simplifies to

e^t e = y^t y − 2b^t X^t y + b^t X^t Xb    (5.B.3)

If we apply transpose property Eq. (5.A.6) to the last term on the R.H.S. of Eq. (5.B.3), we
see that

b^t X^t Xb = b^t (b^t X^t X)^t    (5.B.4)

Now we can minimize the residual sum of squares e^t e with respect to the b vector

∂(e^t e)/∂b = ∂/∂b [ y^t y − 2b^t X^t y + b^t (b^t X^t X)^t ] = 0    (5.B.5)

We should note the b vector has dimension (p × 1), so the differentiation operator reads

∂/∂b = [ ∂/∂b_1, ∂/∂b_2, …, ∂/∂b_p ]^t    (5.B.6)


If we now apply this vector differentiation operator to Eq. (5.B.5) and note that
∂(y^t y)/∂b = 0, we obtain

∂(e^t e)/∂b = −2 [∂b^t/∂b] X^t y + [∂b^t/∂b] (b^t X^t X)^t + b^t [∂b^t/∂b] X^t X
            = −2 [∂b^t/∂b] X^t y + 2 [∂b^t/∂b] (b^t X^t X)^t    (5.B.7)


If we expand the vector derivative ∂b^t/∂b, we find

∂b^t/∂b = [ ∂b_j/∂b_i ] = I    (5.B.8)

since ∂b_j/∂b_i equals one when i = j and zero otherwise.

Applying the above result to Eq. (5.B.7) and setting the derivative to zero, we recover the
"normal equations."

X^t Xb = X^t y    (5.10)


CHAPTER 6

SAS Output
In Appendix B, we include sample SAS multiple linear (PROC REG) and nonlinear (PROC NLIN)
regression procedures and present these programs' output in Appendix C. We also
present an abbreviated and annotated version of PROC REG's output in Table 6.1. In
this chapter, we focus our discussion on Table 6.1 because the procedure's ANOVA table
deserves some words of caution.

6.1. Quick Word about PROC REG

The following SAS statements generate the preformatted output shown in Table 6.1.

PROC REG;
  MODEL Y = X / R CLM;
RUN;

For our example, we allowed a constant term to enter the model so the model statement
actually fits the equation
y = b0 + b1 x (6.1)

When we include the constant term, the SAS procedure PROC REG automatically adds a
column of ones to the design matrix to generate the X matrix

X = | 1  x_1     |
    | 1  x_2     |
    | ⋮  ⋮       |    (6.2)
    | 1  x_{n−1} |
    | 1  x_n     |

The column of ones corresponds to the constant term.

If we want to force the regression through the origin, we need to invoke the no intercept
option NOINT so the MODEL statement becomes

MODEL Y = X / NOINT R CLM;

which fits the data to the model


y = b1 x (6.3)

6.2. Corrected Total and Model Sum of Squares
When the regression includes the constant term, the SAS procedure PROC REG subtracts
out the contribution the mean or constant term makes to the total sum of squares, and a row C
TOTAL, which represents the corrected total sum of squares S_CT, appears in the SAS
output.
The corrected total represents the total sum of squares minus the sum of squares due to
the average (constant term).

S_CT = S − S_A    (6.4)

The program effectively translates the origin of the response axis to the average of all the
observations, ȳ.

Recall the calculation of the average reduces the total number of degrees of freedom by
one; therefore, the degrees of freedom for the corrected total also get reduced by one

ν_CT = n − 1    (6.5)

Since the program removed the effect of the constant term from the model, the row MODEL in
the SAS output represents the model without this term. The model degrees of freedom in
the SAS output similarly decrease by one.

To obtain the sum of squares for the entire model, we must add the sum of squares due to
the average

S_M = S_M' + S_A    (6.6)

S_M' represents the model sum of squares without the average, which we find in the MODEL
row and SUM OF SQUARES column. S_A represents the sum of squares about the average,
and we calculate its value by taking the DEP MEAN (dependent mean or average of all the
responses, ȳ) from the SAS output and substituting it into the following expression

S_A = n(ȳ)² = n(Σy_i / n)² = (Σy_i)² / n    (6.7)

If we specify the no-intercept option NOINT, then the SAS program does not take out the
average, and the SAS output correctly represents the total (or uncorrected total) and model
sums of squares. We would also notice that the row in Table 6.1 labeled "C TOTAL" would read
"U TOTAL".

6.3. Difference Between Residual Error and Pure Error

For the sake of statistical correctness, we wish to point out the difference between residual
error (variance between data and model) and pure error (variance in the data). The ERROR row
in the SAS PROC REG output should actually read RESIDUAL. The pure error estimate
requires replicates, which PROC REG cannot identify. We must perform the pure error
calculation off-stream of the computer program by the method outlined in Section 3.3.
Since the SAS output actually reports the mean square residual rather than the mean square
error, the reported F-ratio represents an incorrect F-ratio.

As outlined in Chapter 5, we calculate the first F-ratio by dividing the model mean square
(ms_M) by the mean square pure error (s² = ms_E), not by the mean square residual (ms_R). Without
the "true mean square error," we cannot perform the second F-test and test for the
significance of "terms-left-out." Therefore, the model's adequacy goes unresolved.

6.4. Updating the Error Estimate


When we finally prove the model adequate, the ms_L and s² represent two independent error
estimates. Therefore, to obtain a "better" estimate of error, we pool these two estimates

s²_new = ms_R = (ν_L ms_L + ν_E ms_E) / (ν_L + ν_E)    (6.8)

We find the residual mean square ms_R in the ERROR row and MEAN SQUARE column of the
SAS output. Therefore, the ms_R represents the proper error estimate only for an adequate
model. Note: the degrees of freedom for the error estimate increase to the value of the
residual degrees of freedom. We use these degrees of freedom for the confidence interval
calculations.

6.5. PROC REG Output

In this section, we shall outline PROC REG's output shown in Table 6.1. We have
numbered the portions of the output that we deem most important. Table 6.1 shows that
the output contains three major sections: Analysis of Variance; Parameter Estimates; and
Prediction and Residuals. The last two sections only have validity when we have found an
adequate model.

6.5.1. Analysis of Variance

The first section contains the information we need to complete the entire ANOVA. The first
entry represents the degrees of freedom associated with the model after the average's
contribution has been subtracted out when we include the constant term. Otherwise, it
represents the number of parameters in the model, p. The second entry represents the
residual degrees of freedom, n − p. The third entry, for our example output, represents the
degrees of freedom for the corrected total, n − 1.
Moving across each row, we first encounter the sums of squares. For our example, we find
S_M' (Entry 4), S_R (Entry 5), and S_CT (Entry 6). Next we encounter the two mean squares:
ms_M', the mean square for the entire model without the average (Entry 7), or in our example the
mean square for the contribution of the linear term, and ms_R, the mean square residual (Entry 8).
Entry 9 represents an incorrectly calculated F-ratio formed by dividing Entry 7 by Entry 8. Recall
we form F-ratios by dividing by the mean square pure error.

The dependent mean, DEP MEAN (Entry 11), represents the only number of interest in the next
cluster of numbers. We use this mean value in Eq. (6.7) to calculate S_A. The square root of
Entry 8 yields ROOT MSE (Entry 10), an incorrect standard error until we have the correct
model.
Next, we encounter that most misused regression statistic, R² (Entry 12). We define R² as

R² = S_M/S = S_M/(S_M + S_R) = S_M/(S_M + S_E + S_L)    (6.9)

Most experimenters respond unfavorably to an R² far removed from unity and conclude the
model poorly fits the data. We must, however, realize the data scatter, so S_R can never reach
zero, which must occur for R² to equal one. The absolute best we could ever anticipate
happening: a ratio of mean square TLO to mean square pure error equal to one. The bottom
line: use the F-ratio criteria rather than R² to determine how well the model fits.
A much more difficult modeling task arises when we encounter very large scatter in the
data, such as shown in Figure 6.1a. The sampled data may have come from the population
described by the straight line in Fig. 6.1b, but the large error masks the linear effect. Thus,
we could only statistically prove that the average (Fig. 6.1c) describes the data. Had we
resorted to R² = 1.0 as our criterion—a condition where the model passes through all the data—
we would end up with the model in Fig. 6.1d, a truly questionable correlation.

Figure 6.1. Correlation of Highly Variable Data

6.5.2. Parameter Estimates and Standard Errors

The second major section contains the parameter estimates for each parameter associated
with the model (Entry 14) and the parameters' standard errors (Entry 15). If we divide
Entry 14 by Entry 15, we obtain Entry 16, a t-value associated with the null hypothesis
H_0: E(y) = 0. Figure 6.2 graphically shows what this statistical test involves.

Recalling the definition of the t-value from Eq. (5.22)

t = (y − E(y)) / s(y)

we see that if the parameter E(y) equaled zero instead of the estimated value, we would have
to move "t" standard errors (Entry 16) before encountering zero. The shaded area indicates
the probability (Entry 17) that E(y) = 0.

Figure 6.2. The One-Sided t-Test for the Null Hypothesis

6.5.3. Prediction, Confidence Interval, and Residuals

The third part of Table 6.1 gets generated when we insert the R and CLM options following
the forward slash "/" in the MODEL statement. Without either of these two options,
PROC REG generates only the ANOVA and parameter estimates sections.

When we invoke either the CLM or R option, PROC REG reports the actual response, DEP
VAR (Column 18), and outputs the model prediction, PREDICT VALUE (Column 19), STD ERR
PREDICT (Column 20), and RESIDUAL (Column 23).

The CLM option generates the confidence interval on the prediction at the 95 percent level,
the LOWER95% MEAN (Column 21) and UPPER95% MEAN (Column 22). Table 6.1 shows all
these values depend on the location of X, which indicates we know the prediction (model)
better in the center of the range; hence the smaller standard errors on prediction (Column
20) in the center of the range of X.

The R option generates the rest of the columns: STD ERR RESIDUAL (Column 24),
STUDENT RESIDUAL (Column 25), the scattergram of the student residuals, and Cook's D.
The student residual represents a normalized residual formed by dividing Column 23 by
Column 24.
Example 6.1. The Calculation of a Confidence Interval on Prediction
We see from Table 6.1 that PROC REG automatically calculates the 95 percent confidence
interval. How would we calculate the 99 percent C.I. at x = 310.9?

We must first assume we have an adequate model so we can use the statistical information
from Table 6.1. Our experimental design replicates each of four levels four times, which yields
twelve pure-error degrees of freedom from a total of sixteen runs. Since we have an adequate
model, we can update the error, which brings the total error degrees of freedom to fourteen.
From Eq. (5.25), we can write the confidence interval for the prediction

ŷ − t(α/2, ν_R) s(ŷ) < E(ŷ) < ŷ + t(α/2, ν_R) s(ŷ)    (6.10)

We find from Table A.6 that t(0.01, 14) = 2.977. From Table 6.1, we find ŷ = 5.8765 and
s(ŷ) = 0.017 at x = 310.9. Substituting these values into Eq. (6.10), we find

5.8259 = 5.8765 − (2.977)(0.017) < E(ŷ) < 5.8765 + (2.977)(0.017) = 5.9271

Therefore, we can conclude that "99 times out of 100" at x = 310.9 the true response y will
lie between 5.8259 and 5.9271.
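We can check these numbers quickly (a sketch; the prediction 5.8765 and its standard error 0.017 come straight from Table 6.1):

from scipy.stats import t

yhat, se, nu = 5.8765, 0.017, 14       # from Table 6.1; error d.f. updated to 14
tval = t.ppf(1.0 - 0.01/2.0, nu)       # two-sided 99% interval, about 2.977
print(yhat - tval*se, yhat + tval*se)  # about 5.826 and 5.927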

Table 6.1. Sample SAS Output for Multiple Linear Regression PROC REG
The SAS System 15:50 Monday, November 6, 2000 1

LINEAR REGRESSION
Dependent Variable: Y

Analysis of Variance

Source     DF          Sum of Squares    Mean Square     F Value          Prob>F
Model      [1]  1      [4] 3.82861       [7] 3.82861     [9] 2384.646     0.0001
Error      [2] 14      [5] 0.02248       [8] 0.00161
C Total    [3] 15      [6] 3.85109

Root MSE   [10] 0.04007      R-square   [12] 0.9942
Dep Mean   [11] 6.51946      Adj R-sq   [13] 0.9937
C.V.             0.61461

Parameter Estimates

                Parameter           Standard           T for H0:
Variable  DF    Estimate [14]       Error [15]         Parameter=0 [16]    Prob > |T| [17]
INTERCEP  1     14.521247           0.16416669         88.454              0.0001
X         1     -2687.657258        55.03791223        -48.833             0.0001

Dep Var Predict Std Err Lower95% Upper95% Std Err Student Cook's
Obs X Y Value Predict Mean Mean Residual Residual Residual -2-1-0 1 2 D

18 19 20 21 22 23 24 25
1 310.9 5.8579 5.8765 0.017 5.8410 5.9120 -0.0185 0.036 -0.508 | *| | 0.027
2 310.9 5.8800 5.8765 0.017 5.8410 5.9120 0.00349 0.036 0.096 | | | 0.001
3 310.9 5.8941 5.8765 0.017 5.8410 5.9120 0.0176 0.036 0.483 | | | 0.024
4 310.9 5.9116 5.8765 0.017 5.8410 5.9120 0.0351 0.036 0.963 | |* | 0.095
5 326.4 6.2356 6.2870 0.011 6.2632 6.3108 -0.0514 0.039 -1.335 | **| | 0.074
6 326.4 6.2716 6.2870 0.011 6.2632 6.3108 -0.0154 0.039 -0.401 | | | 0.007
7 326.4 6.3391 6.2870 0.011 6.2632 6.3108 0.0521 0.039 1.354 | |** | 0.076
8 326.4 6.3026 6.2870 0.011 6.2632 6.3108 0.0156 0.039 0.406 | | | 0.007
9 344.8 6.7348 6.7264 0.011 6.7031 6.7497 0.00841 0.039 0.218 | | | 0.002
10 344.8 6.7149 6.7264 0.011 6.7031 6.7497 -0.0115 0.039 -0.299 | | | 0.004
11 344.8 6.7113 6.7264 0.011 6.7031 6.7497 -0.0152 0.039 -0.393 | | | 0.006
12 344.8 6.6358 6.7264 0.011 6.7031 6.7497 -0.0906 0.039 -2.349 | ****| | 0.220
13 366.5 7.2104 7.1879 0.017 7.1516 7.2243 0.0224 0.036 0.618 | |* | 0.042
14 366.5 7.2002 7.1879 0.017 7.1516 7.2243 0.0123 0.036 0.338 | | | 0.012
15 366.5 7.1567 7.1879 0.017 7.1516 7.2243 -0.0312 0.036 -0.860 | *| | 0.081
16 366.5 7.2547 7.1879 0.017 7.1516 7.2243 0.0668 0.036 1.840 | |*** | 0.370

Sum of Residuals 3.41E-14


Sum of Squared Residuals 0.0225
Predicted Resid SS (Press) 0.0287

TABLE LEGEND
1. Model w/o Average Degrees of Freedom, p − 1
2. Residual Degrees of Freedom, n − p
3. Corrected Total Degrees of Freedom, n − 1
4. Model Sum of Squares w/o Average, S_M'
5. Residual Sum of Squares, S_R
6. Corrected Total Sum of Squares, S_CT
7. Mean Square for Model w/o Average, ms_M'
8. Mean Square Residual, ms_R
9. Incorrect F-Ratio, ms_M'/ms_R
10. Root Mean Square Residual, √ms_R
11. Dependent Mean, ȳ
12. Correlation Coefficient, R²
13. Adjusted R²
14. Parameter Estimates, b_i
15. Standard Errors for Parameters, s(b_i)
16. t-Value for Null Hypothesis
17. Probability that b_i = 0
18. Actual Responses, y_i
19. Predicted Values, ŷ_i
20. Standard Errors on Prediction, s(ŷ)
21. Lower 95% C.I. on Prediction
22. Upper 95% C.I. on Prediction
23. Residuals, R
24. Standard Errors of Residual, s(R)
25. Student Residuals, R/s(R)
13. Adjusted R2

CHAPTER 7

Model Building with SAS


In this chapter, we shall show how to perform the model building procedure. Experience
should always guide us when we propose models to test; however, for the sake of
illustration, we shall abandon this notion and start with an example of empirical model
building.

7.1. Modeling Vapor Pressure with Purely Empirical Models

Two independent experimenters have measured the vapor pressure of n-butane on identical
apparatus. We shall assume that both experimenters have equal ability; therefore, their
experimental error should have no inherent differences. Table 7.1 gives the first
experimenter's data set whereas Table 7.2 gives the data set for the second. Since the
same model should describe both data sets, we shall use the first experimenter's data to
build our vapor pressure model.

Table 7.1 Vapor Pressure Data Set for Experimenter One

Run No. T (K) psat (kPa)


1 310.9 350.0
2 366.5 1353.4
3 366.5 1339.7
4 326.4 510.6
5 310.9 357.8
6 344.8 841.2
7 326.4 529.3
8 344.8 824.6

Since we wish to determine the functional dependence of vapor pressure on temperature,
we need to propose a model. Physical intuition tells us vapor pressure should increase with
temperature, so the simplest model we can propose would include a linear temperature
effect

p^sat = b_0 + b_1 T    (7.1)

We will show during our empirical model building process that a constant term alone does
not adequately describe the data.
Like any good experimenter in the engineering sciences, we should first obtain a pure error
estimate using the method outlined in Chapter 3. The row marked "Pure Error" in Table
7.3 shows the result of our error calculation. This value remains invariant during the model
building procedure unless we transform the response. Next we should plot the data to
determine possible candidate models—mathematical forms such as linear, quadratic, or
cubic.

Table 7.2 Vapor Pressure Data Set for Experimenter Two

Run No. T (K) psat (kPa)


1 310.9 362.9
2 366.5 1282.7
3 366.5 1414.8
4 326.4 566.3
5 310.9 369.3
6 344.8 821.9
7 326.4 546.0
8 344.8 761.9

We can determine the constant term model, p^sat = b_0, from the PROC REG output for the
model p^sat = b_0 + b_1T given in Appendix C, Table C.3. The entry DEP MEAN represents the
average of the eight vapor pressures in Table 7.1, which we find as 763.325; thus, b_0 =
763.325 for the constant term model.
We construct the ANOVA table, Table 7.3, using some selected SAS results given in
Table C.3. We know the residual sum of squares represents the difference between the
total and model sums of squares, S_R = S − S_M, and for the constant model, the sum of squares
about the average, S_A, represents the model sum of squares S_M. We outlined the calculation
of S_A in Chapter 6. We know S − S_A = S_CT; thus, for the constant model and only for this
model, the corrected total sum of squares equals the residual sum of squares, S_CT = S_R. The
entries in the row RESIDUAL in Table 7.3 come from the row C TOTAL in Table C.3. The total
sum of squares and its degrees of freedom result from adding the appropriate columns in the
RESIDUAL and MODEL rows; they also remain invariant throughout the model building
process unless, as stated previously, we transform the responses. We find the terms-left-out
sum of squares by difference.

Table 7.3. ANOVA for Model: psat = b0 Using Data Set No.1

SOURCE SS d.f. ms Fcal Ftab


TOTAL 5805461.945 8
MODEL 4661320.445 1
Constant 4661320.445 1 4661320.445 42677.29 7.71
RESIDUAL 1144141.5 7
Terms-Left-Out 1143704.61 3 381234.87 3490.44 6.59
Pure Error 436.89 4 109.2225

In our model building example, we test the significance of each term as it enters the model.
Table 7.3 shows for the constant term F_cal > F_tab, so we conclude this term significant at
the 95% confidence level. In fact, we will always find the constant term significant if we
include it in the model. Every empirical model should include such a term unless the physics
dictates the data should pass through the origin (no intercept). Table 7.3 also shows the
terms-left-out (TLO) as significant at the 95% confidence level; therefore, we must
proceed with our empirical model building and add a linear term.
Table 7.4 gives the ANOVA for the linear model. We constructed the table from
information in the SAS output, Table C.3. Because we included a constant term in the
model, the SAS procedure REG automatically subtracts the contribution of S_A from the
total sum of squares. Thus, the row MODEL in Table C.3 represents the effect of adding
the linear term. Until we have an adequate model, the row ERROR in Table C.3 actually
represents RESIDUAL in Table 7.4. Again we calculate the Terms-Left-Out row by
difference. The two F-tests in Table 7.4 show both the linear term and TLO as
significant.

Table 7.4. ANOVA for Model: psat = b0 + b1T Using Data Set No. 1

SOURCE SS d.f. ms Fcal Ftab


TOTAL 5805461.945 8
MODEL 5776778.515 2
Constant 4661320.445 1
Linear 1115458.07 1 1115458.07 10212.71 7.71
RESIDUAL 28683.43 6
Terms-Left-Out 28246.54 2 14123.27 129.31 6.94
Pure Error 436.89 4 109.2225
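The bookkeeping behind Tables 7.3 and 7.4 is mechanical. The sketch below recomputes the pure error from the duplicate runs and the terms-left-out entries for the linear model; the numbers reproduce Table 7.4.

import numpy as np

T = np.array([310.9, 366.5, 366.5, 326.4, 310.9, 344.8, 326.4, 344.8])
p = np.array([350.0, 1353.4, 1339.7, 510.6, 357.8, 841.2, 529.3, 824.6])

# Pure error from the duplicate runs at each temperature (Chapter 3)
SE, nuE = 0.0, 0
for level in np.unique(T):
    g = p[T == level]
    SE += ((g - g.mean())**2).sum()
    nuE += g.size - 1                        # totals 436.89 with 4 d.f.

# Fit psat = b0 + b1*T and decompose the sums of squares
X = np.column_stack([np.ones_like(T), T])
b = np.linalg.solve(X.T @ X, X.T @ p)
S, SM = p @ p, b @ (X.T @ p)
SR = S - SM                                  # residual, 6 d.f.
SL, nuL = SR - SE, (T.size - b.size) - nuE   # terms-left-out, Eqs. (5.18)-(5.19)

print(SE, SR, SL)                            # 436.89, 28683.43, 28246.54
print((SL/nuL) / (SE/nuE))                   # F-ratio for TLO, about 129.3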

We can see the inadequacy of the linear model in Figure 7.1. Figure 7.1 shows the data lie
below the model for the midrange temperatures and above it at the two extremes. For a
statistically adequate model, we recall from Chapter 4 that the data must scatter randomly
about the model. This behavior strongly suggests quadratic curvature and that we add such a
term to the model.

p^sat = b_0 + b_1T + b_2T²    (7.2)

A plot of the student residuals (information found in Table C.3 for the linear model and
Table C.4 for the quadratic) as a function of temperature offers a better way to view the
idea of model adequacy—the transformation of the data into "white noise." We plotted the
student residuals in Figure 7.2 for both Eqs. (7.1) and (7.2). We clearly see the error
structure (residual or difference between model and data) for the linear model has some type
of functionality whereas the quadratic model apparently does not. From the residual plot,
we therefore make our final assessment about the appropriateness of the model.
Not only must the model satisfy our second F-test, but its residuals must also contain no
bias or, put another way, no perceptible functionality.

[Figure: p^sat (kPa, 300 to 1500) plotted versus T (K, 310 to 370) with the fitted line p^sat = −5300 + 18T]

Figure 7.1. Linear Fit of First Experimenter's Vapor Pressure Data for n-butane

[Figure: student residuals R/s(R), from −2 to 2, plotted versus T (K) in two panels: LINEAR (left) and QUADRATIC (right)]

Figure 7.2. Student Residuals for Eqs. (7.1) and (7.2) Fit to Data Given in Table 7.1.

We have constructed the ANOVA table for the quadratic model, Table 7.5, using
information provided in Table C.4. The construction of Table 7.5 mimics that of Table 7.4
except the row MODEL in Table C.4 contains the contribution of both the linear and
quadratic terms, so the model degrees of freedom in Table C.4 jump to two. Note:
When we include a particular term in the model, that term always accounts for the same
amount of the sum of squares even as we add more terms to the model. This statement
holds true unless we change the order in which the terms enter the model. However, the
model sum of squares will always total to the same amount regardless of how the terms
entered.

S_M = S_term 1 + S_term 2 + … + S_term n    (7.3)

Therefore, to determine the contribution of just the quadratic term, we must subtract the
linear term's effect.
From Table 7.5, we find the quadratic term significant at the 95% level and TLO
insignificant. In fact, the F-ratio for TLO becomes suspiciously low (0.15). This low F-
ratio alarms us because if we truly transform the data into "white noise," which a
statistically adequate model does, we should have an F-ratio on the order of unity.

Table 7.5. ANOVA for Model: psat = b0 + b1T + b2T2 Using Data Set No. 1

SOURCE SS d.f. ms Fcal Ftab


TOTAL 5805461.945 8
MODEL 5805008.215 3
Constant 4661320.445 1
Linear 1115458.07 1
Quadratic 28229.70 1 28229.70 258.46 7.71
RESIDUAL 453.73 5
Terms-Left-Out 16.84 1 16.84 0.15 7.71
Pure Error 436.89 4 109.2225

If we consider the second experimenter's data set (Table 7.2) and proceed with the
empirical model building procedure through the quadratic term, we find an F-ratio for TLO
of 1.5. Again we find the quadratic model adequate at the 95% confidence level, but the TLO
becomes only marginally insignificant if we take an α-risk of 0.25, where F(0.25, 1, 4) = 1.81.
This fact suggests we may not have an adequate model.

Some incriminating evidence surfaces when we look at the sample variances for the two
experimenters' data. Table 7.6 shows the sample variances for the first experimenter's
data (Table 7.1) while Table 7.7 shows the variances for the second experimenter's data (Table
7.2). We have plotted the variances for both experimenters in Figure 7.3 as a function of
temperature. Figure 7.3 shows the first experimenter's variances fluctuate randomly
around the mean square error of 109.2225. However, Figure 7.3 shows the second
experimenter's variances grow dramatically with increasing temperature, which indicates
the constant variance assumption indeed seems suspect.

If we conclude the quadratic model adequately describes both experimenters' data, we
can then determine the confidence intervals on the true parameters. With the model deemed
adequate, we can use the parameter estimates and standard errors for the first
experimenter's data given in Table C.4. We should note, as described in Chapter 6, the
degrees of freedom for the error estimate have increased from four to five. Table 7.8
shows the parameter estimates and their 95% confidence intervals for both experimenters.
We see that the parameter estimates vary somewhat for the two data sets, and the
confidence intervals for the first experimenter's parameter estimates do not
encompass the parameter estimates of the second experimenter. This results from the
first data set having an unusually low error estimate. On the other hand, the second
experimenter's confidence intervals spread more widely than the first experimenter's and
encompass both the first experimenter's parameter estimates and confidence intervals.
The bottom line: we feel suspicious of the quadratic model and recommend we search
for a better model.

Table 7.6. Sample Averages and Variances for Experimenter One

T (K)     p̄^sat (kPa)    s²_psat (kPa²)    ms_E (kPa²)
310.9     353.9          30.42
326.4     529.95         174.845           109.2225
344.8     832.9          137.780
366.5     1346.55        93.845

Table 7.7. Sample Averages and Variances for Experimenter Two

T (K)     p̄^sat (kPa)    s²_psat (kPa²)    ms_E (kPa²)
310.9     366.1          20.48
326.4     556.15         206.045           2687.9325
344.8     791.9          1800.000
366.5     1348.75        8725.205

[Figure: sample variances s_i² plotted on a log scale (10 to 10⁴) versus T (K); × = Data Set No. 1, • = Data Set No. 2]

Figure 7.3. Sample Variances of Two Experimenters' Data

Table 7.8. Parameter Estimates and 95% Confidence Intervals for Quadratic Model

Coefficient    Experimenter 1      Experimenter 2
Constant       14619 ± 2906        17632 ± 16615
Linear         −100.0 ± 17.2       −117.3 ± 98.3
Quadratic      0.174 ± 0.025       0.199 ± 0.145

7.2. Modeling Vapor Pressure with Semi-empirical Model


We fit the data from Section 7.1 to a purely empirical model. Now we wish to see if one of
the thermodynamic models proposed in Chapter 2 can describe these data. We shall try the
simplest model first

ln p^sat = b_0 + b_1/T    (7.4)

We can evaluate the transformed vapor pressure model using linear regression, and Table
C.5 shows the regression results. We construct the ANOVA table, Table 7.9, as we did in
Section 7.1. Since we transformed the response, we must recalculate the total and pure
error sums of squares. We must also recalculate the sum of squares about the average because
the average of all responses changed to 6.513169. From Table 7.9, we find the model
adequate at the 95% confidence level.

In Figure 7.4, we present the prediction and its 99% confidence intervals. We calculated
the confidence intervals using six degrees of freedom for the error estimate (from Table
A.6, we find t(0.01, 6) = 3.707) and the standard error on the prediction, found at the
bottom of Table C.5. Note the curved confidence intervals on the prediction (model).

Table 7.9. ANOVA for Model: ln psat = b0 + b1/T Using Data Set No. 1

SOURCE SS d.f. ms Fcal Ftab


TOTAL 341.3845999 8
MODEL 341.3820132 2
Constant 339.3709634 1
Linear 2.01104979 1 2.01104979 7055.1 7.71
RESIDUAL 0.002586676 6
Terms-Left-Out 0.001446479 2 0.000723 2.54 6.94
Pure Error 0.0011401969 4 0.000285

Though we have an adequate model at the 95% level, we still feel somewhat suspicious
because the F-ratio (2.54) for TLO does not meet the acid test, i.e., 0.8 < F_cal < 1.4. In fact,
if we take a larger α-risk of 0.25 in an attempt to make the TLO significant, we find that
F(0.25, 2, 4) = 2.00 and TLO becomes significant. Our conclusion: we do not have an adequate
model.
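For reference, a sketch of the transformed fit; it reproduces the coefficients quoted in Figure 7.4:

import numpy as np

T = np.array([310.9, 366.5, 366.5, 326.4, 310.9, 344.8, 326.4, 344.8])
p = np.array([350.0, 1353.4, 1339.7, 510.6, 357.8, 841.2, 529.3, 824.6])

# Linear regression of ln(psat) on 1/T, Eq. (7.4)
slope, intercept = np.polyfit(1.0/T, np.log(p), 1)
print(intercept, slope)    # about 14.7 and -2755 (cf. Figure 7.4)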

[Figure: p^sat (kPa, log scale from 200 to 2000) plotted versus 1/T × 10³ (K⁻¹, 2.7 to 3.3), showing the fitted line ln p^sat = 14.7 − 2755/T with its upper and lower 99% confidence intervals]

Figure 7.4. Semi-Log Vapor Pressure Model for n-butane

7.3. Revision, Revision, Revision

We may have an inadequate model, but we can recommend to Lone Rock Novelties that we
need to further replicate our vapor pressure data. Suppose we repeat the initial design
given in Example 2.2 and collect the data shown in Table 7.2. What would we find? Table
7.10 tells the story. Using both data sets, we have a total of 12 replicate degrees of freedom,
which provides a more than sufficient error estimate. Naturally all the sums of squares get
larger with the increased amount of data, but the major difference to notice in Table 7.10
centers on the calculated F-ratio for TLO. It has dropped to 1.52! Of course, the tabulated
F-ratio for an α-risk of 0.05 dropped as well, to 3.89, because we have a much better error
estimate with 12 degrees of freedom. When we increase our risk to 0.25, we find
F(0.25, 2, 12) = 1.56. Therefore, the semi-log model looks much better from a statistical
standpoint.

Table 7.10. ANOVA for Model: ln p^sat = b_0 + b_1/T Using Data Sets Nos. 1 and 2

SOURCE             SS          d.f.    ms          Fcal      Ftab
TOTAL              683.9048    16
MODEL              683.8823    2
  Constant         680.0537    1
  Linear           3.82861     1       3.82861     2562.0    4.75
RESIDUAL           0.02248     14
  Terms-Left-Out   0.004548    2       0.002274    1.52      3.89
  Pure Error       0.017932    12      0.001494

To solidify our thoughts about the model's adequacy, we have plotted the student residuals
for the revised semi-log model in Figure 7.5. The figure shows the residuals randomly
scattering about the model, which further strengthens the case for the semi-log model.
In addition, we find the data normally distributed about the model because, e.g., 75 percent
of the data lie within one standard deviation and 94 percent within two standard deviations.
For a true normal distribution, 68 percent of the data lie within one standard deviation
and 95 percent within two.

[Figure: student residuals R/s(R), from −2.5 to 2.5, plotted versus T (K, 300 to 380)]

Figure 7.5. Student Residuals Plotted Versus Temperature for the Semi-Log Model Fit to
Data Sets Nos. 1 and 2

7.4. Making Decisions Based on Accuracy of Results

Once we have an adequate model, we desire to put closure on our work by making technical
or economic decisions based on our model's accuracy. This, of course, means we must first
decide how much risk we wish to take. If we gamble on horse racing or professional football
games, we might take large risks and bet on long shots or underdogs for the chance to reap
huge financial payoffs. But if we deal with the design of nuclear power plants, our desired
level of certainty increases astronomically; hence, engineers invoke large safety factors,
which most certainly drive up the facility's capital and operating costs.
Returning to our butane lighter problem (Example 2.2), let us assume we have established
the vapor pressure's temperature dependence as the semi-log model given in Section 7.2.
Using these modeling results, we must now recommend the level of air conditioning the
warehouse requires.

We know the model gives us the best estimate of the temperature where the vapor pressure
will exceed 5 atm or 506 kPa. From Figure 7.4, we actually see a range of temperatures
(confidence interval) that might yield this stated vapor pressure. Therefore, we should select
the worst-case situation, or lowest temperature, to ensure with, in our case, a 99% confidence
level that we do not exceed the pressure limit. From Figure 7.4, we find that we should
air-condition the warehouse to 323.6 K (123°F).

7.5. In Quest of the Correct Model: A Nonlinear Approach
The big assumption we have made in our search for the "correct model" revolves around the
transformation of the data into white noise—an error structure we assume additive with
a zero mean and constant variance. Therefore, if we revert Eq. (7.4) back to its nonlinear
form

p^sat = b_1 exp(b_2/T)    (7.5)

how does the error structure transform, or does this model have an error structure that
constitutes white noise? We might have a difficult time coming up with the answers to
these questions, but we should keep them in mind as we search for the correct model.
Tables C.6-C.8 in Appendix C give the regression results for Eq. (7.5). From the results in
Tables 7.6 and C.7, we constructed Table 7.11, an ANOVA table for Eq. (7.5), so we can
compare the nonlinear results to those for the linear models.

Table 7.11. ANOVA for Model: psat = b1 exp(b2/T) Using Data Set No. 1

SOURCE SS d.f. ms Fcal Ftab


TOTAL 5805461.9400 8
MODEL 5804726.2084 2
RESIDUAL 735.7316 6
Terms-Left-Out 298.8416 2 149.4208 1.37 6.94
Pure Error 436.89 4 109.2225

The F-test for the TLO in Table 7.11 clearly shows the nonlinear model as adequate. In addition,
the value of F_cal falls between 0.8 and 1.4, which satisfies the "acid test" criteria for the
TLO F-ratio. Therefore, we conclude that the nonlinear form describes the data better
than either the quadratic or semi-log model.
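A sketch of the nonlinear fit with scipy, starting from the semi-log estimates (b_1 ≈ e^14.7, b_2 ≈ −2755); it reproduces the constants quoted in Figure 7.6:

import numpy as np
from scipy.optimize import curve_fit

T = np.array([310.9, 366.5, 366.5, 326.4, 310.9, 344.8, 326.4, 344.8])
p = np.array([350.0, 1353.4, 1339.7, 510.6, 357.8, 841.2, 529.3, 824.6])

def model(T, b1, b2):                  # Eq. (7.5)
    return b1*np.exp(b2/T)

b, _ = curve_fit(model, T, p, p0=(np.exp(14.7), -2755.0))
print(b)                               # about 2.72e6 and -2790 (cf. Figure 7.6)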

We have plotted the raw data and prediction results in Fig. 7.6 and the residuals in Fig. 7.7.
These results came from Table C.8 in Appendix C. A close visual inspection of Figs. 7.6 and
7.7 does not reveal any unusual bias between the model and data, which strengthens our
argument that Eq. (7.5) describes the data more adequately than either the quadratic or
semi-log linear models.

[Figure: p^sat (kPa, 0 to 1600) plotted versus T (K, 300 to 380)]

Figure 7.6. Nonlinear Regression Results. The solid black circles represent the data while
the solid curve represents p^sat = 2.72 × 10⁶ exp(−2790/T).

[Figure: residuals R (kPa, −20 to 20) plotted versus T (K, 300 to 380)]

Figure 7.7. Residuals for Nonlinear Regression of Eq. (7.5) Using Data Set No. 1

Appendix A.

Statistical Tables
This appendix contains Tables A.1-A.5, the F-distributions for α-risks of 0.25, 0.10, 0.05, 0.01,
and 0.001, and Table A.6, the t-distribution, with half the risk placed on the lower tail and half
on the upper tail of the distribution.

Table A.1 F-Distribution for an α-risk of 0.25

F-Distribution Upper 25% Points [F(ν1,ν2,0.75)]


Degrees of Freedom for Numerator, ν1
ν2      1     2     3     4     5     6     7     8     9    10    12    15    20    24    30    40    60   120     ∞

1 5.83 7.50 8.20 8.58 8.82 8.98 9.10 9.19 9.26 9.32 9.41 9.49 9.58 9.63 9.67 9.71 9.76 9.80 9.85
2 2.57 3.00 3.15 3.23 3.28 3.31 3.34 3.35 3.37 3.38 3.39 3.41 3.43 3.43 3.44 3.45 3.46 3.47 3.48
3 2.02 2.28 2.36 2.39 2.41 2.42 2.43 2.44 2.44 2.44 2.45 2.46 2.46 2.46 2.47 2.47 2.47 2.47 2.47
4 1.81 2.00 2.05 2.06 2.07 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08

5 1.69 1.85 1.88 1.88 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.88 1.88 1.88 1.88 1.87 1.87 1.87
6 1.62 1.76 1.78 1.79 1.79 1.78 1.78 1.78 1.77 1.77 1.77 1.76 1.76 1.75 1.75 1.75 1.74 1.74 1.74
7 1.57 1.70 1.72 1.72 1.71 1.71 1.70 1.70 1.69 1.69 1.68 1.68 1.67 1.67 1.66 1.66 1.65 1.65 1.65
8 1.54 1.66 1.67 1.66 1.66 1.65 1.64 1.64 1.63 1.63 1.62 1.62 1.61 1.60 1.60 1.59 1.59 1.58 1.58
9 1.51 1.62 1.63 1.63 1.62 1.61 1.60 1.60 1.59 1.59 1.58 1.57 1.56 1.56 1.55 1.54 1.54 1.53 1.53

10 1.49 1.60 1.60 1.59 1.59 1.58 1.57 1.56 1.56 1.55 1.54 1.53 1.52 1.52 1.51 1.51 1.50 1.49 1.48
11 1.47 1.58 1.58 1.57 1.56 1.55 1.54 1.53 1.53 1.52 1.51 1.50 1.49 1.49 1.48 1.47 1.47 1.46 1.45
12 1.46 1.56 1.56 1.55 1.54 1.53 1.52 1.51 1.51 1.50 1.49 1.48 1.47 1.46 1.45 1.45 1.44 1.43 1.42
13 1.45 1.55 1.55 1.53 1.52 1.51 1.50 1.49 1.49 1.48 1.47 1.46 1.45 1.44 1.43 1.42 1.42 1.41 1.40
14 1.44 1.53 1.53 1.52 1.51 1.50 1.49 1.48 1.47 1.46 1.45 1.44 1.43 1.42 1.41 1.41 1.40 1.39 1.38

15 1.43 1.52 1.52 1.51 1.49 1.48 1.47 1.46 1.46 1.45 1.44 1.43 1.41 1.41 1.40 1.39 1.38 1.37 1.36
16 1.42 1.51 1.51 1.50 1.48 1.47 1.46 1.45 1.44 1.44 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.35 1.34
17 1.42 1.51 1.50 1.49 1.47 1.46 1.45 1.44 1.43 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.35 1.34 1.33
18 1.41 1.50 1.49 1.48 1.46 1.45 1.44 1.43 1.42 1.42 1.40 1.39 1.38 1.37 1.36 1.35 1.34 1.33 1.32
19 1.41 1.49 1.49 1.47 1.46 1.44 1.43 1.42 1.41 1.41 1.40 1.38 1.37 1.36 1.35 1.34 1.33 1.32 1.30

20 1.40 1.49 1.48 1.47 1.45 1.44 1.43 1.42 1.41 1.40 1.39 1.37 1.36 1.35 1.34 1.33 1.32 1.31 1.29
21 1.40 1.48 1.48 1.46 1.44 1.43 1.42 1.41 1.40 1.39 1.38 1.37 1.35 1.34 1.33 1.32 1.31 1.30 1.28
22 1.40 1.48 1.47 1.45 1.44 1.42 1.41 1.40 1.39 1.39 1.37 1.36 1.34 1.33 1.32 1.31 1.30 1.29 1.28
23 1.39 1.47 1.47 1.45 1.43 1.42 1.41 1.40 1.39 1.38 1.37 1.35 1.34 1.33 1.32 1.31 1.30 1.28 1.27
24 1.39 1.47 1.46 1.44 1.43 1.41 1.40 1.39 1.38 1.38 1.36 1.35 1.33 1.32 1.31 1.30 1.29 1.28 1.26

25 1.39 1.47 1.46 1.44 1.42 1.41 1.40 1.39 1.38 1.37 1.36 1.34 1.33 1.32 1.31 1.29 1.28 1.27 1.25
26 1.38 1.46 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.37 1.35 1.34 1.32 1.31 1.30 1.29 1.28 1.26 1.25
27 1.38 1.46 1.45 1.43 1.42 1.40 1.39 1.38 1.37 1.36 1.35 1.33 1.32 1.31 1.30 1.28 1.27 1.26 1.24
28 1.38 1.46 1.45 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.34 1.33 1.31 1.30 1.29 1.28 1.27 1.25 1.24
29 1.38 1.45 1.45 1.43 1.41 1.40 1.38 1.37 1.36 1.35 1.34 1.32 1.31 1.30 1.29 1.27 1.26 1.25 1.23

30 1.38 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.36 1.35 1.34 1.32 1.30 1.29 1.28 1.27 1.26 1.24 1.23
40 1.36 1.44 1.42 1.40 1.39 1.37 1.36 1.35 1.34 1.33 1.31 1.30 1.28 1.26 1.25 1.24 1.22 1.21 1.19
60 1.35 1.42 1.41 1.38 1.37 1.35 1.33 1.32 1.31 1.30 1.29 1.27 1.25 1.24 1.22 1.21 1.19 1.17 1.15
120 1.34 1.40 1.39 1.37 1.35 1.33 1.31 1.30 1.29 1.28 1.26 1.24 1.22 1.21 1.19 1.18 1.16 1.13 1.10
∞ 1.32 1.39 1.37 1.35 1.33 1.31 1.29 1.28 1.27 1.25 1.24 1.22 1.19 1.18 1.16 1.14 1.12 1.08 1.00

Table A.2 F-Distribution for an α-risk of 0.10

F-Distribution Upper 10% Points [F(ν1,ν2,0.90)]


Degrees of Freedom for Numerator, ν1
ν2      1     2     3     4     5     6     7     8     9    10    12    15    20    24    30    40    60   120     ∞

1 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 60.19 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.33
2 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76

5 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.29
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16

10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.97
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85
14 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80

15 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72
17 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69
18 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66
19 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63

20 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 1.94 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61
21 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59
22 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57
23 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55
24 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.53

25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52
26 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.50
27 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 1.85 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.49
28 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48
29 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47

30 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46
40 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38
60 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.29
120 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.19
∞ 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00

Table A.3. F-Distribution for an α-risk of 0.05

F-Distribution Upper 5% Points [F(ν1,ν2,0.95)]


Degrees of Freedom for Numerator
ν2 \ ν1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞

1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63

5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71

10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13

15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88

20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73

25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64

30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00
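
As an illustration of how this table is used: the linear regression in Table C.3 reports F = 233.3 with ν1 = 1 (model) and ν2 = 6 (error) degrees of freedom. The tabled value at ν1 = 1, ν2 = 6 is F(1,6,0.95) = 5.99; since 233.3 far exceeds 5.99, the linear temperature dependence is significant at the α = 0.05 level.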

Table A.4. F-Distribution for an α-risk of 0.01

F-Distribution Upper 1% Points [F(ν1,ν2,0.99)]


Degrees of Freedom for Numerator
ν2 \ ν1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞

1 4052 4999.5 5403 5625 5764 5859 5928 5982 6022 6056 6106 6157 6209 6235 6261 6287 6313 6339 6366
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46

5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88
7 12.25 9.55 8.45 7.85 7.45 7.19 6.99 6.84 6.72 6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.11 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31

10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00

15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49

20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21

25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.13
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03

30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38
∞ 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00

Table A.5. F-Distribution for an α-risk of 0.001

F-Distribution Upper 0.1% Points [F(ν1,ν2,0.999)] (entries in the ν2 = 1 row must be multiplied by 100)


Degrees of Freedom for Numerator
ν2 \ ν1 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞

1 4053 5000 5404 5625 5764 5859 5929 5981 6023 6056 6107 6158 6209 6235 6261 6287 6313 6340 6366
2 998.5 999.0 999.2 999.2 999.3 999.3 999.4 999.4 999.4 999.4 999.4 999.4 999.4 999.5 999.5 999.5 999.5 999.5 999.5
3 167.0 148.5 141.1 137.1 134.6 132.8 131.6 130.6 129.9 129.2 128.3 127.4 126.4 125.9 125.4 125.0 124.5 124.0 123.5
4 74.14 61.25 56.18 53.44 51.71 50.53 49.66 49.00 48.47 48.05 47.41 46.76 46.10 45.77 45.43 45.09 44.75 44.40 44.05

5 47.18 37.12 33.20 31.09 29.75 28.84 28.16 27.64 27.24 26.92 26.42 25.91 25.39 25.14 24.87 24.60 24.33 24.06 23.79
6 35.51 27.00 23.70 21.92 20.81 20.03 19.46 19.03 18.69 18.41 17.99 17.56 17.12 16.89 16.67 16.44 16.21 15.99 15.75
7 29.25 21.69 18.77 17.19 16.21 15.52 15.02 14.63 14.33 14.08 13.71 13.32 12.93 12.73 12.53 12.33 12.12 11.91 11.70
8 25.42 18.49 15.83 14.39 13.49 12.86 12.40 12.04 11.77 11.54 11.19 10.84 10.48 10.30 10.11 9.92 9.73 9.53 9.33
9 22.86 16.39 13.90 12.56 11.71 11.13 10.70 10.37 10.11 9.89 9.57 9.24 8.90 8.72 8.55 8.37 8.19 8.00 7.81

10 21.04 14.91 12.55 11.28 10.48 9.92 9.52 9.20 8.96 8.75 8.45 8.13 7.80 7.64 7.47 7.30 7.12 6.94 6.76
11 19.69 13.81 11.56 10.35 9.58 9.05 8.66 8.35 8.12 7.92 7.63 7.32 7.01 6.85 6.68 6.52 6.35 6.17 6.00
12 18.64 12.97 10.80 9.63 8.89 8.38 8.00 7.71 7.48 7.29 7.00 6.71 6.40 6.25 6.09 5.93 5.76 5.59 5.42
13 17.81 12.31 10.21 9.07 8.35 7.86 7.49 7.21 6.98 6.80 6.52 6.23 5.93 5.78 5.63 5.47 5.30 5.14 4.97
14 17.14 11.78 9.73 8.62 7.92 7.43 7.08 6.80 6.58 6.40 6.13 5.85 5.56 5.41 5.25 5.10 4.94 4.77 4.60

15 16.59 11.34 9.34 8.25 7.57 7.09 6.74 6.47 6.26 6.08 5.81 5.54 5.25 5.10 4.95 4.80 4.64 4.47 4.31
16 16.12 10.97 9.00 7.94 7.27 6.81 6.46 6.19 5.98 5.81 5.55 5.27 4.99 4.85 4.70 4.54 4.39 4.23 4.06
17 15.72 10.66 8.73 7.68 7.02 6.56 6.22 5.96 5.75 5.58 5.32 5.05 4.78 4.63 4.48 4.33 4.18 4.02 3.85
18 15.38 10.39 8.49 7.46 6.81 6.35 6.02 5.76 5.56 5.39 5.13 4.87 4.59 4.45 4.30 4.15 4.00 3.84 3.67
19 15.08 10.16 8.28 7.26 6.62 6.18 5.85 5.59 5.39 5.22 4.97 4.70 4.43 4.29 4.14 3.99 3.84 3.68 3.51

20 14.82 9.95 8.10 7.10 6.46 6.02 5.69 5.44 5.24 5.08 4.82 4.56 4.29 4.15 4.00 3.86 3.70 3.54 3.38
21 14.59 9.77 7.94 6.95 6.32 5.88 5.56 5.31 5.11 4.95 4.70 4.44 4.17 4.03 3.88 3.74 3.58 3.42 3.26
22 14.38 9.61 7.80 6.81 6.19 5.76 5.44 5.19 4.99 4.83 4.58 4.33 4.06 3.92 3.78 3.63 3.48 3.32 3.15
23 14.19 9.47 7.67 6.69 6.08 5.65 5.33 5.09 4.89 4.73 4.48 4.23 3.96 3.82 3.68 3.53 3.38 3.22 3.05
24 14.03 9.34 7.55 6.59 5.98 5.55 5.23 4.99 4.80 4.64 4.39 4.14 3.87 3.74 3.59 3.45 3.29 3.14 2.97

25 13.88 9.22 7.45 6.49 5.88 5.46 5.15 4.91 4.71 4.56 4.31 4.06 3.79 3.66 3.52 3.37 3.22 3.06 2.89
26 13.74 9.12 7.36 6.41 5.80 5.38 5.07 4.83 4.64 4.48 4.24 3.99 3.72 3.59 3.44 3.30 3.15 2.99 2.82
27 13.61 9.02 7.27 6.33 5.73 5.31 5.00 4.76 4.57 4.41 4.17 3.92 3.66 3.52 3.38 3.23 3.08 2.92 2.75
28 13.50 8.93 7.19 6.25 5.66 5.24 4.93 4.69 4.50 4.35 4.11 3.86 3.60 3.46 3.32 3.18 3.02 2.86 2.69
29 13.39 8.85 7.12 6.19 5.59 5.18 4.87 4.64 4.45 4.29 4.05 3.80 3.54 3.41 3.27 3.12 2.97 2.81 2.64

30 13.29 8.77 7.05 6.12 5.53 5.12 4.82 4.58 4.39 4.24 4.00 3.75 3.49 3.36 3.22 3.07 2.92 2.76 2.59
40 12.61 8.25 6.60 5.70 5.13 4.73 4.44 4.21 4.02 3.87 3.64 3.40 3.15 3.01 2.87 2.73 2.57 2.41 2.23
60 11.97 7.76 6.17 5.31 4.76 4.37 4.09 3.87 3.69 3.54 3.31 3.08 2.83 2.69 2.55 2.41 2.25 2.08 1.89
120 11.38 7.32 5.79 4.95 4.42 4.04 3.77 3.55 3.38 3.24 3.02 2.78 2.53 2.40 2.26 2.11 1.95 1.76 1.54
∞ 10.83 6.91 5.42 4.62 4.10 3.74 3.47 3.27 3.10 2.96 2.74 2.51 2.27 2.13 1.99 1.84 1.66 1.45 1.00

Table A.6. t-Distribution

"Probability = Area in Two Tails of Distribution Outside ±t-value in Table"


Degrees of
Freedom 0.9 0.7 0.5 0.3 0.2 0.1 0.05 0.02 0.01 0.001

1 0.158 0.510 1.000 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 0.142 0.445 0.816 1.386 1.886 2.920 4.303 6.965 9.925 31.598
3 0.137 0.424 0.765 1.250 1.638 2.353 3.182 4.541 5.841 12.924
4 0.134 0.414 0.741 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.132 0.408 0.727 1.156 1.476 2.015 2.571 3.365 4.032 6.869

6 0.131 0.404 0.718 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.130 0.402 0.711 1.119 1.415 1.895 2.365 2.998 3.499 5.480
8 0.130 0.399 0.706 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.129 0.398 0.703 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.129 0.397 0.700 1.093 1.372 1.812 2.228 2.764 3.169 4.587

11 0.129 0.396 0.697 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.128 0.395 0.695 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.128 0.394 0.694 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.128 0.393 0.692 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.128 0.393 0.691 1.074 1.341 1.753 2.131 2.602 2.947 4.073

16 0.128 0.392 0.690 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.128 0.392 0.689 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.127 0.392 0.688 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.127 0.391 0.688 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.127 0.391 0.687 1.064 1.325 1.725 2.086 2.528 2.845 3.850

21 0.127 0.391 0.686 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.127 0.390 0.686 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.127 0.390 0.685 1.060 1.319 1.714 2.069 2.500 2.807 3.767
24 0.127 0.390 0.685 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.127 0.390 0.684 1.058 1.316 1.708 2.060 2.485 2.787 3.725

26 0.127 0.390 0.684 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.127 0.389 0.684 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.127 0.389 0.683 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.127 0.389 0.683 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.127 0.389 0.683 1.055 1.310 1.697 2.042 2.457 2.750 3.646

40 0.126 0.388 0.681 1.050 1.303 1.684 2.021 2.423 2.701 3.551

60 0.126 0.387 0.679 1.046 1.296 1.671 2.000 2.390 2.660 3.460

120 0.126 0.386 0.677 1.041 1.289 1.658 1.980 2.358 2.617 3.373

∞ 0.126 0.385 0.674 1.036 1.282 1.645 1.960 2.326 2.576 3.291
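
For example, a two-sided 95% confidence interval on a mean estimated from n = 11 observations uses the 0.05 column at 10 degrees of freedom, giving x̄ ± 2.228 s/√11. As the degrees of freedom grow, the tabled values approach the standard normal limits (1.960 in the 0.05 column).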

Appendix B.

Sample SAS Programs


This appendix contains two sample SAS programs. The first program performs the
following tasks:
• reads the raw data TF and PSIA into the data set VAPOR;

• echoes the raw data using PROC PRINT;

• converts the raw data to the SI-unit variables TK and PKP and calculates three new
variables TK2 (T²), INVT (1/T), and LNPKP (ln p^sat);

• echoes these new calculated variables using a second call to PROC PRINT; and

• finally fits three linear models using the PROC REG multiple linear regression
procedure:

p^sat = b0 + b1 T (B-1)

p^sat = b0 + b1 T + b2 T² (B-2)

ln p^sat = b0 + b1 / T (B-3)

The second program fits a nonlinear model using the PROC NLIN nonlinear regression
procedure:

p^sat = b0 exp(b1 / T) (B-4)

It also creates two new variables PKHAT (predicted response) and PKRES (residual) and
outputs the actual and predicted responses along with the residuals using PROC PRINT.
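
As a quick check on the DATA step arithmetic, the first observation (TF = 100.0°F, PSIA = 50.76) converts to TK = (100.0 + 459.67)/1.8 = 310.93 K and PKP = 6.8947573 × 50.76 = 349.98 kPa, which match the first row of Table C.2. Note also that Eq. (B-3) is the log-linearized form of Eq. (B-4): taking logarithms of Eq. (B-4) gives ln p^sat = ln b0 + b1/T, so the intercept of the semilog fit estimates ln b0.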

B.1. SAS Program for Linear Regression

TITLE3 '----------------- VAPOR PRESSURE OF N-BUTANE -----------------';

*------------------------------------------------------------------------
|                                                                        |
|   THE VAPOR PRESSURE OF N-BUTANE IS MODELED AS A FUNCTION OF           |
|   TEMPERATURE. THE RAW DATA HAVE UNITS OF:                             |
|                                                                        |
|        TEMPERATURE - DEGREES FAHRENHEIT                                |
|        PRESSURE    - PSIA                                              |
|                                                                        |
|   THE FOLLOWING LINEAR MODELS ARE TESTED USING PROC REG:               |
|                                                                        |
|        P = B0 + B1 * T                    (1)                          |
|                                                                        |
|        P = B0 + B1 * T + B2 * T ** 2      (2)                          |
|                                                                        |
|        LN(P) = B0 + B1 / T                (3)                          |
|                                                                        |
|   WHERE P = VAPOR PRESSURE (KILOPASCALS)                               |
|         T = ABSOLUTE TEMPERATURE (K)                                   |
|                                                                        |
*------------------------------------------------------------------------;

*************************************************************************
******************** SET-UP DATA FILE NAMED VAPOR *******************
************************************************************************;

DATA VAPOR;

***** INPUT TEMPERATURE (F) AND PRESSURE (PSIA);

INPUT TF PSIA;

***** CALCULATE DATA FROM RAW INPUT DATA. THESE DATA
      WILL ALSO BE CONTAINED IN DATA FILE VAPOR *****;

TK = (TF + 459.67) / 1.8; * CONVERT TEMPERATURE TO KELVIN;

TK2 = TK * TK; * CREATE SQUARE OF ABSOLUTE TEMPERATURE;

PKP = 6.8947573 * PSIA; * CONVERT FROM PSI TO KILOPASCALS;

INVT = 1.0 / TK; * INVERT TEMPERATURE;

LNPKP = LOG(PKP); * LOG TRANSFORM OF SI PRESSURE;

CARDS;

100.0 50.76
100.0 51.90
128.0 74.05
128.0 76.77
161.0 122.0
161.0 119.6
200.0 196.3
200.0 194.3
;

*************************************************************************
************** THIS PROCEDURE ECHOES THE RAW INPUT DATA *************
************************************************************************;

PROC PRINT;

TITLE5 'RAW VAPOR PRESSURE DATA'; * PUT TITLE ON LINE 5 OF SAS OUTPUT;

VAR TF PSIA; * RESTRICTS PROC PRINT TO TF AND PSIA;

*************************************************************************
************* THIS PROCEDURE ECHOES THE CALCULATED DATA *************
************************************************************************;

PROC PRINT;

TITLE5 'CALCULATED DATA'; * PUT NEW TITLE ON LINE 5;

VAR TK TK2 INVT PKP LNPKP; * RESTRICTS PROC PRINT TO CALCULATED DATA;

*************************************************************************
********* THIS PROCEDURE PERFORMS MULTIPLE LINEAR REGRESSION ***********
*************************************************************************
|                                                                       |
|   THE OPTIONS USED ARE R AND CLM. THESE PRODUCE A RESIDUAL AND        |
|   95% CONFIDENCE INTERVAL ON THE PREDICTION. THREE LINEAR MODELS      |
|   ARE TESTED FOR THE DATA SET VAPOR.                                  |
|                                                                       |
*************************************************************************;

PROC REG DATA = VAPOR;

TITLE5 'LINEAR REGRESSION'; * PUTS NEW TITLE ON LINE 5;

ID TK; * IDENTIFIES OBSERVATION BY TEMPERATURE;

EQ1: MODEL PKP = TK / R CLM; * LINEAR TEMP DEPENDENCE MODEL;

EQ2: MODEL PKP = TK TK2 / R CLM; * QUADRATIC TEMP DEPENDENCE MODEL;

EQ3: MODEL LNPKP = INVT / R CLM; * SEMILOG INVERSE TEMP MODEL;
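
If predicted values and residuals are wanted in a data set (for plotting, say), an OUTPUT statement can be added after any MODEL statement above, much as the nonlinear program in Section B.2 does; the statement applies to the MODEL that precedes it. A minimal sketch follows; the data set name PRED and the variable names PKPHAT and PKPRES are illustrative choices, not part of the program above:

***** SAVE PREDICTIONS AND RESIDUALS IN DATA SET PRED *****;

OUTPUT OUT = PRED P = PKPHAT R = PKPRES;

PROC PRINT DATA = PRED; * ECHO THE AUGMENTED DATA SET;

VAR TK PKP PKPHAT PKPRES;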

B.2. SAS Program for Nonlinear Regression

TITLE3 '----------------- VAPOR PRESSURE OF N-BUTANE -----------------';

*------------------------------------------------------------------------
|                                                                        |
|   THE VAPOR PRESSURE OF N-BUTANE IS MODELED AS A FUNCTION OF           |
|   TEMPERATURE. THE RAW DATA HAVE UNITS OF:                             |
|                                                                        |
|        TEMPERATURE - DEGREES FAHRENHEIT                                |
|        PRESSURE    - PSIA                                              |
|                                                                        |
|   THE FOLLOWING NONLINEAR MODEL IS TESTED USING PROC NLIN:             |
|                                                                        |
|        P = B0 * EXP(B1 / T)               (1)                          |
|                                                                        |
|   WHERE P = VAPOR PRESSURE (KILOPASCALS)                               |
|         T = ABSOLUTE TEMPERATURE (K)                                   |
|                                                                        |
*------------------------------------------------------------------------;

*************************************************************************
******************** SET-UP DATA FILE NAMED VAPOR *******************
************************************************************************;

DATA VAPOR;

***** INPUT TEMPERATURE (F) AND PRESSURE (PSIA);

INPUT TF PSIA;

***** CALCULATE DATA FROM RAW INPUT DATA. THESE DATA
      WILL ALSO BE CONTAINED IN DATA FILE VAPOR *****;

TK = (TF + 459.67) / 1.8; * CONVERT TEMPERATURE TO KELVIN;

PKP = 6.8947573 * PSIA; * CONVERT FROM PSI TO KILOPASCALS;

CARDS;

100.0 50.76
100.0 51.90
128.0 74.05
128.0 76.77
161.0 122.0
161.0 119.6
200.0 196.3
200.0 194.3
;

*************************************************************************
*********** THIS PROCEDURE PERFORMS NONLINEAR REGRESSION **************
*************************************************************************
|                                                                       |
|   THE MARQUARDT METHOD IS USED WITH A TIGHT CONVERGENCE CRITERION.    |
|   THE NONLINEAR MODEL (B-4) IS FITTED TO THE DATA SET VAPOR, AND      |
|   THE PREDICTED RESPONSES AND RESIDUALS ARE SAVED FOR PRINTING.       |
|                                                                       |
*************************************************************************;

PROC NLIN DATA = VAPOR

METHOD = MARQUARDT

CONVERGE = 1.0E-10;

***** INITIALIZE STARTING PARAMETERS *****;

PARAMETERS B0 = +2.5E+06

B1 = -2.5E+03;

**** NONLINEAR MODEL ****;

MODEL PKP = B0 * EXP(B1/TK);

**** ANALYTICAL PARTIAL DERIVATIVES WITH RESPECT TO PARAMETERS ****;

DER.B0 = EXP(B1/TK);

DER.B1 = (B0/TK) * EXP(B1/TK);

**** SET UP RESIDUALS FOR PRINT PROCEDURE ****;

OUTPUT PREDICTED = PKHAT

RESIDUAL = PKRES;

**** PRINT RESIDUALS FOR NONLINEAR REGRESSION ****;

PROC PRINT;

TITLE5 ' RESIDUALS FOR NONLINEAR REGRESSION';

VAR PKP PKHAT PKRES;
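
Nonlinear regression can be sensitive to the starting parameter values. If convergence proves difficult, the PARAMETERS statement also accepts a grid of starting values, and PROC NLIN begins iterating from the grid point giving the smallest residual sum of squares. A sketch follows; the grid bounds shown are illustrative only, not values used in this program:

***** GRID SEARCH OVER STARTING VALUES (ILLUSTRATIVE BOUNDS) *****;

PARAMETERS B0 = 1.0E+06 TO 5.0E+06 BY 1.0E+06
           B1 = -3.5E+03 TO -1.5E+03 BY 5.0E+02;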

Appendix C.

SAS Outputs

This appendix contains the SAS outputs for the two programs given in Appendix B. Each
table reproduces the preformatted output of a single SAS procedure call. Tables C.1-C.5
give the output of the linear regression program; Tables C.6-C.8 give the results of
the nonlinear regression program.

C.1. Linear Regression Results

Table C.1. SAS Output for PROC PRINT run on the raw data in data set VAPOR.

THE SAS SYSTEM 08:05 MONDAY, OCTOBER 23, 2000 1

----------------- VAPOR PRESSURE OF N-BUTANE -----------------

RAW VAPOR PRESSURE DATA

OBS TF PSIA

1 100 50.76
2 100 51.90
3 128 74.05
4 128 76.77
5 161 122.00
6 161 119.60
7 200 196.30
8 200 194.30

Table C.2. SAS Output for PROC PRINT run on the calculated data in data set VAPOR.

THE SAS SYSTEM 08:05 Monday, October 23, 2000 2

----------------- VAPOR PRESSURE OF N-BUTANE -----------------

CALCULATED DATA

OBS TK TK2 INVT PKP LNPKP

1 310.928 96676.08 .0032162 349.98 5.85787
2 310.928 96676.08 .0032162 357.84 5.88008
3 326.483 106591.37 .0030629 510.56 6.23550
4 326.483 106591.37 .0030629 529.31 6.27158
5 344.817 118898.53 .0029001 841.16 6.73478
6 344.817 118898.53 .0029001 824.61 6.71491
7 366.483 134310.03 .0027286 1353.44 7.21041
8 366.483 134310.03 .0027286 1339.65 7.20016

Table C.3. SAS Output for PROC REG using Eq. (B-1) (linear model) run on the data shown in Table C.2.

The SAS System 08:05 MONDAY, OCTOBER 23, 2000 3

LINEAR REGRESSION

Model: EQ1
Dependent Variable: PKP

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Prob>F

Model 1 1115458.0732 1115458.0732 233.332 0.0001
Error 6 28683.42175 4780.57029
C Total 7 1144141.495

Root MSE 69.14167 R-square 0.9749
Dep Mean 763.32500 Adj R-sq 0.9708
C.V. 9.05796

Parameter Estimates

Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|

INTERCEP 1 -5299.770072 397.67626616 -13.327 0.0001
TK 1 17.983376 1.17729268 15.275 0.0001

Dep Var Predict Std Err Lower95% Upper95% Std Err Student Cook's
Obs TK PKP Value Predict Mean Mean Residual Residual Residual -2-1-0 1 2 D

1 310.9 350.0 291.3 39.403 194.8 387.7 58.7386 56.815 1.034 | |** | 0.257
2 310.9 357.8 291.3 39.403 194.8 387.7 66.5386 56.815 1.171 | |** | 0.330
3 326.4 510.6 570.0 27.527 502.6 637.4 -59.4037 63.426 -0.937 | *| | 0.083
4 326.4 529.3 570.0 27.527 502.6 637.4 -40.7037 63.426 -0.642 | *| | 0.039
5 344.8 841.2 900.9 26.052 837.2 964.6 -59.6978 64.046 -0.932 | *| | 0.072
6 344.8 824.6 900.9 26.052 837.2 964.6 -76.2978 64.046 -1.191 | **| | 0.117
7 366.5 1353.4 1291.1 42.326 1187.6 1394.7 62.2629 54.672 1.139 | |** | 0.389
8 366.5 1339.7 1291.1 42.326 1187.6 1394.7 48.5629 54.672 0.888 | |* | 0.236

Sum of Residuals 0
Sum of Squared Residuals 28683.4218
Predicted Resid SS (Press) 53298.1310

Table C.4. SAS Output for PROC REG using Eq. (B-2) (quadratic model) run on the data shown
in Table C.2.

The SAS System 08:05 MONDAY, OCTOBER 23, 2000 4

LINEAR REGRESSION

Model: EQ2
Dependent Variable: PKP

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Prob>F

Model 2 1143687.7747 571843.88733 6301.722 0.0001
Error 5 453.72035 90.74407
C Total 7 1144141.495

Root MSE 9.52597 R-square 0.9996
Dep Mean 763.32500 Adj R-sq 0.9994
C.V. 1.24796

Parameter Estimates

Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|

INTERCEP 1 14619 1130.6243832 12.930 0.0001
TK 1 -99.972703 6.68965942 -14.944 0.0001
TK2 1 0.173974 0.00986369 17.638 0.0001

Dep Var Predict Std Err Lower95% Upper95% Std Err Student Cook's
Obs TK PKP Value Predict Mean Mean Residual Residual Residual -2-1-0 1 2 D

1 310.9 350.0 353.1 6.462 336.5 369.7 -3.0811 6.999 -0.440 | | | 0.055
2 310.9 357.8 353.1 6.462 336.5 369.7 4.7189 6.999 0.674 | |* | 0.129
3 326.4 510.6 522.0 4.667 510.0 534.0 -11.4419 8.305 -1.378 | **| | 0.200
4 326.4 529.3 522.0 4.667 510.0 534.0 7.2581 8.305 0.874 | |* | 0.080
5 344.8 841.2 831.1 5.341 817.4 844.9 10.0675 7.888 1.276 | |** | 0.249
6 344.8 824.6 831.1 5.341 817.4 844.9 -6.5325 7.888 -0.828 | *| | 0.105
7 366.5 1353.4 1347.0 6.637 1330.0 1364.1 6.3555 6.833 0.930 | |* | 0.272
8 366.5 1339.7 1347.0 6.637 1330.0 1364.1 -7.3445 6.833 -1.075 | **| | 0.363

Sum of Residuals 3.39E-11
Sum of Squared Residuals 453.7203
Predicted Resid SS (Press) 1089.5608

Table C.5. SAS Output for PROC REG using Eq. (B-3) (semilog model) run on the data shown
in Table C.2.
The SAS System 08:05 MONDAY, OCTOBER 23, 2000 5

LINEAR REGRESSION

Model: EQ3
Dependent Variable: LNPKP

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Prob>F

Model 1 2.01105 2.01105 4664.789 0.0001
Error 6 0.00259 0.00043
C Total 7 2.01364

Root MSE 0.02076 R-square 0.9987
Dep Mean 6.51317 Adj R-sq 0.9985
C.V. 0.31879

Parameter Estimates

Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|

INTERCEP 1 14.714661 0.12030588 122.310 0.0001
INVT 1 -2754.734789 40.33330109 -68.299 0.0001

Dep Var Predict Std Err Lower95% Upper95% Std Err Student Cook's
Obs TK LNPKP Value Predict Mean Mean Residual Residual Residual -2-1-0 1 2 D

1 310.9 5.8579 5.8541 0.012 5.8245 5.8838 0.00379 0.017 0.225 | | | 0.013
2 310.9 5.8800 5.8541 0.012 5.8245 5.8838 0.0258 0.017 1.532 | |*** | 0.607
3 326.4 6.2356 6.2749 0.008 6.2550 6.2948 -0.0393 0.019 -2.058 | ****| | 0.383
4 326.4 6.2716 6.2749 0.008 6.2550 6.2948 -0.00335 0.019 -0.176 | | | 0.003
5 344.8 6.7348 6.7253 0.008 6.7058 6.7448 0.00954 0.019 0.498 | | | 0.021
6 344.8 6.7149 6.7253 0.008 6.7058 6.7448 -0.0104 0.019 -0.542 | *| | 0.025
7 366.5 7.2104 7.1983 0.012 7.1679 7.2287 0.0120 0.017 0.724 | |* | 0.146
8 366.5 7.2002 7.1983 0.012 7.1679 7.2287 0.00187 0.017 0.112 | | | 0.004

Sum of Residuals 0
Sum of Squared Residuals 0.0026
Predicted Resid SS (Press) 0.0044
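
Comparing the three linear fits: the quadratic model (B-2) reduces the residual sum of squares from 28683 to 454 and the PRESS statistic from 53298 to 1090 relative to the straight line (B-1). The semilog model (B-3) cannot be compared with these numbers directly because its response is ln p^sat rather than p^sat, but it achieves an R-square of 0.9987 with only two parameters, which makes the physically motivated semilog form an attractive alternative.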

C.2. Nonlinear Regression Results
Table C.6. SAS Output for PROC NLIN using Eq. (B-4) (exponential model) run on the data
shown in Table C.2.

This table gives the summary of iterations.

The SAS System 16:39 Monday, October 23, 2000 1

----------------- VAPOR PRESSURE OF N-BUTANE -----------------

Non-Linear Least Squares Iterative Phase Dependent Variable PKP Method: Marquardt
Iter B0 B1 Sum of Squares
0 2500000 -2500.000000 6853564
1 1886665 -2596.345573 232859
2 2113365 -2697.323920 2103.800517
3 2395519 -2747.628319 1325.161344
4 2675807 -2786.120284 967.280723
5 2717592 -2789.407585 735.790565
6 2718105 -2789.438810 735.731578
7 2718109 -2789.439353 735.731578
NOTE: Convergence criterion met.

Table C.7. SAS Output for PROC NLIN using Eq. (B-4) (exponential model) run on the data
shown in Table C.2.

This table gives the sums-of-squares summary and the parameter estimates with their
95% confidence intervals.

Non-Linear Least Squares Summary Statistics Dependent Variable PKP

Source DF Sum of Squares Mean Square

Regression 2 5804726.2084 2902363.1042
Residual 6 735.7316 122.6219
Uncorrected Total 8 5805461.9400

(Corrected Total) 7 1144141.4950

                               Asymptotic         Asymptotic 95%
Parameter      Estimate        Std. Error         Confidence Interval
                                                  Lower            Upper
B0 2718109.200 256668.75807 2090062.9206 3346155.4790
B1 -2789.439 33.42429 -2871.2257 -2707.6530

Asymptotic Correlation Matrix

Corr B0 B1
----------------------------------------
B0 1 -0.998814773
B1 -0.998814773 1
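
The asymptotic correlation of -0.9988 between B0 and B1 indicates the two parameters are nearly confounded, a common situation when an Arrhenius-type model is fitted over a narrow temperature range. One standard remedy (not used in this program) is to center the temperature, e.g. fitting p^sat = a0 exp[b1(1/T - 1/Tref)] with Tref chosen near the middle of the data; this reparameterization usually reduces the parameter correlation substantially.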

Table C.8. SAS Output (via PROC PRINT) showing the actual data, the PROC NLIN
predictions (PKHAT), and the residuals (PKRES = PKP - PKHAT).

The SAS System 16:39 Monday, October 23, 2000 2

----------------- VAPOR PRESSURE OF N-BUTANE -----------------

RESIDUALS FOR NONLINEAR REGRESSION

OBS TK PKP PKHAT PKRES

1 310.9 350.0 344.92 5.0830
2 310.9 357.8 344.92 12.8830
3 326.4 510.6 528.14 -17.5449
4 326.4 529.3 528.14 1.1551
5 344.8 841.2 833.33 7.8733
6 344.8 824.6 833.33 -8.7267
7 366.5 1353.4 1345.37 8.0283
8 366.5 1339.7 1345.37 -5.6717
