University of Maryland Baltimore County

ENCH 225: Chemical Engineering Problem Solving and Experimental Design


Course Packet
Dr. Joshua Enszer
Spring 2015
version 1.82, last edit 16 January 2015

This course packet was written expressly for use in ENCH 225 at the University of Maryland Baltimore
County. Please do not reproduce any part of this document for use outside of the chemical engineering
program without express permission from the author.

Several examples in probability and statistics are adapted from examples in Larson and Farber's
Elementary Statistics, 2nd edition. The dimensional analysis comes mostly from Fox and McDonald's
Introduction to Fluid Mechanics, 7th edition. Revised examples are planned for future versions.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported


License.
How to Use this Course Packet

A typical chapter in this course packet will include:

Required Reading/Viewing: These are references to specific chapters or sections in the required text for
ENCH 225 or web links to online tutorials or other videos of interest. Items listed as required should be
viewed before or while working through the content of the chapter.

Recommended Reading for Further Consideration: These are references to sections or chapters of books
that are required texts or references for either ENCH 215 or ENCH 225. Some (probably not all) are available
in the library. In many cases, they present very similar information in another way, so if a particular topic
seems confusing, perhaps the extra perspective will help. Further, many recommendations go more
in depth on topics (including providing many more examples), so if you are especially interested in a
particular topic or tool, this is my way of giving you more information or ideas with which to challenge
yourself. In most cases, the recommendations include several more examples and practice problems.

Learning Objectives associated with that chapter directly reference those from the course syllabus, and
in some cases break those objectives down in more detail.

Sections break down the major topics and tools. Many sections include worked-out examples and more
examples with which to practice.

Explorations finish up several chapters of the course packet and correspond to the computer lab
activities for that week.

Optional chapters and sections include details (and, unfortunately, some currently unfinished
thoughts) that will not be included on exams, but are meant to provide more detail and connections
among the required topics. Most notably, the chapter on probability and combinatorics is linked to statistics,
but you can complete the statistical tests without a very deep understanding of those concepts.

Contents
1 Degree of Freedom Analysis ............................................................................................................ 1
1.1 Degree of Freedom Analysis for Extensive Variables ................................................................. 2
1.1.1 Extensive Variable Material Balances ................................................................................ 2
1.1.2 Extensive Variable Atomic Balances .................................................................................. 3
1.1.3 Extensive Variable Energy Balances .................................................................................. 4
1.1.4 Practice Problems............................................................................................................. 4
1.2 Degree of Freedom Analysis with both Extensive and Intensive Variables................................. 7
2 Systems of Linear Equations ............................................................................................................ 8
2.1 Matrix Operations .................................................................................................................... 9
2.1.1 Practice Problems........................................................................................................... 10
2.2 Matrix Determinant and Rank ................................................................................................ 10
2.2.1 Practice Problems........................................................................................................... 11
2.3 Systems of Linear Equations ................................................................................................... 11
2.3.1 Practice Problems........................................................................................................... 13
2.4 The Condition Number ........................................................................................................... 13
2.5 Chemical Reactions ................................................................................................................ 15
2.5.1 Practice Problems........................................................................................................... 17
2.6 Steady-State Material Balances .............................................................................................. 18
2.6.1 Practice Problems........................................................................................................... 20
2.7 MATLAB Exploration: Linear Algebra ...................................................................................... 24
2.7.1 Practice Problems........................................................................................................... 33
3 Descriptive Statistics and Basic Probability Theory ......................................................................... 35
3.1 Descriptive Statistics .............................................................................................................. 52
3.1.1 Frequency Distributions .................................................................................................. 52
3.1.2 Measures of Central Tendency ....................................................................................... 53
3.1.3 Measures of Variation .................................................................................................... 54
3.1.4 Measures of Position ...................................................................................................... 54
3.1.5 Practice Problems........................................................................................................... 55
3.2 Introduction to Probability ..................................................................................................... 55
3.2.1 Practice Problem ............................................................................................................ 57
3.3 (Optional Section) More on Probability Theory and Combinatorics ......................................... 57
3.4 (Optional Section) Computations with Probability Distributions ............................................. 58
3.4.1 Binomial Probability Distributions................................................................................... 59
3.4.2 Uniform and Normal Probability Distributions ................................................................ 60

3.4.3 Normal Approximation of Binomial Distribution ............................................................. 61
3.4.4 The Central Limit Theorem ............................................................................................. 62
3.4.5 Calculations with Probability Distributions ...................................................................... 62
3.4.6 Probability Examples ...................................................................................................... 63
4 Algorithmic Thinking ...................................................................................................................... 35
4.1 The Six Operations of a Computer .......................................................................................... 35
4.1.1 Practice Problems........................................................................................................... 37
4.2 Flowcharts ............................................................................................................................. 38
4.2.1 Practice Problems........................................................................................................... 40
4.3 MATLAB Exploration: Scripts, Functions, and Visualization ..................................................... 41
4.3.1 Practice Problems........................................................................................................... 51
5 Introduction to Inferential Statistics............................................................................................... 66
5.1 Confidence Intervals .............................................................................................................. 66
5.1.1 Confidence Intervals on the Mean .................................................................................. 67
5.1.2 (Optional) Confidence Intervals on Proportions .............................................................. 69
5.1.3 Confidence Intervals on Variances .................................................................................. 70
5.1.4 Practice Problem ............................................................................................................ 71
5.2 Single Sample Hypothesis Testing ........................................................................................... 71
5.2.1 Method 1: Computing a Standardized Critical Value (σ is known) ................................... 73
5.2.2 Method 2: Computing an Observed Level of Significance (σ unknown) ........................... 74
5.2.3 Statistical Computations for Hypothesis Tests: An Overview ........................................... 74
5.2.4 Hypothesis Testing for a Mean ....................................................................................... 75
5.2.5 (Optional) Hypothesis Testing for a Proportion ............................................................... 76
5.2.6 Hypothesis Testing for a Variance or Standard Deviation ................................................ 77
5.2.7 Hypothesis Testing Summary .......................................................................................... 78
5.2.8 Practice Problems........................................................................................................... 79
5.3 Two-Sample Hypothesis Testing ............................................................................................. 81
5.3.1 Differences Between Means ........................................................................................... 82
5.3.2 (Optional) Differences Between Proportions .................................................................. 84
5.3.3 Differences Between Variances ...................................................................................... 85
5.3.4 Practice Problems........................................................................................................... 86
6 Iterative Programming Applications ............................................................................................... 87
6.1 Root-Finding Methods............................................................................................................ 89
6.1.1 Closed Methods ............................................................................................................. 89
6.1.2 Open Methods ............................................................................................................... 90

6.2 (Optional Section) Optimization Methods .............................................................................. 92
6.3 MATLAB Exploration: Applications of Loops and Conditionals................................................. 93
6.3.1 Practice Problems......................................................................................................... 100
7 Curve Fitting ................................................................................................................................ 102
7.1 Correlation and Causation .................................................................................................... 102
7.1.1 Practice Problem .......................................................................................................... 103
7.2 Hypothesis Testing for the Regression Coefficient ................................................................ 103
7.3 Regression ........................................................................................................................... 104
7.3.1 Linear Regression ......................................................................................................... 105
7.3.2 Confidence Intervals and Hypothesis Testing on Linear Regression Parameters ............ 108
7.3.3 (Optional-ish) Polynomial Regression ........................................................................... 110
7.3.4 (Optional-ish) Multiple Linear Regression ..................................................................... 111
7.3.5 General Linear Least Squares Theory ............................................................................ 112
7.3.6 Practice Problems......................................................................................................... 114
7.4 Interpolation and Splines ..................................................................................................... 119
7.5 Interpolation versus Regression ........................................................................................... 121
7.5.1 Practice Problems......................................................................................................... 122
7.6 Excel and MATLAB Exploration: Statistics and Curve Fitting Tools ......................................... 123
7.6.1 Practice Problem .......................................................................................................... 131
8 Numerical Integration and Differentiation ................................................................................... 132
8.1 Integration Methods ............................................................................................................ 133
8.1.1 Practice Problem .......................................................................................................... 134
8.2 Taylor Series and Numerical Derivatives ............................................................................... 135
8.3 First-Order Ordinary Differential Equations .......................................................................... 137
8.3.1 Euler's Method ............................................. 138
8.3.2 Higher-Order Runge-Kutta Methods ............................................................................. 138
8.4 Systems of First-Order Ordinary Differential Equations......................................................... 139
8.5 Boundary Value Problems .................................................................................................... 140
8.5.1 Shooting Methods ........................................................................................................ 141
8.5.2 Finite Difference Methods ............................................................................................ 142
8.6 MATLAB Exploration: Differential Equations and Advanced Visualization ............................. 143
8.6.1 Practice Problems......................................................................................................... 151
9 Dimensional Analysis and Similarity ............................................................................................. 154
9.1 Rendering Equations Dimensionless ..................................................................................... 155
9.2 The Buckingham Method ..................................................................................................... 157

9.2.1 Practice Problems......................................................................................................... 163
9.3 Similarity .............................................................................................................................. 163
9.3.1 Practice Problems......................................................................................................... 164
10 The Scientific Method and Experimental Design .......................................................................... 166
10.1 The Engineering Method and the Scientific Method ............................................................. 166
10.2 One-Factor Experiments ....................................................................................................... 169
10.3 Analysis of Variance (Multi-Sample Test for Means) ............................................................. 169
10.3.1 Practice Problems......................................................................................................... 173
10.4 Two-Factor Experiments and Two-Way Analysis of Variance ................................................ 174
10.4.1 Practice Problem .......................................................................................................... 177
10.5 Factorial Design.................................................................................................................... 177
10.6 Model Selection ................................................................................................................... 179

1 Degree of Freedom Analysis
Recommended for Further Consideration:

Section 2.4 of Murphy's Introduction to Chemical Processes: Principles, Analysis, and Synthesis, 1st
edition. This section most closely matches the discussion in this chapter of our course packet.

Sections 4.3d and 6.2 of Felder and Rousseau's Elementary Principles of Chemical Processes, 3rd edition.
Note that their version of degree of freedom analysis is significantly different from mine: they only
count items they absolutely don't know as variables and then find appropriate equations to solve for
those unknowns. This works, but if you're not a seasoned problem solver, you can sometimes miss an
equation and not know where to go from there.

Sections 2.7, 10.2, and 13.8 of Smith, Van Ness, and Abbott's Introduction to Chemical Engineering
Thermodynamics, 6th edition. This gets into way more of the calculus of the application, which we'll not
touch in this course.

Chapter 2 of Seborg et al.'s Process Dynamics and Control, 3rd edition. This is a very dense summary of
chemical engineering mathematical models and applications.

Chapter Goals

Determine the number of variables and equations in a chemical engineering problem. This is a helpful
strategy in many chemical engineering courses, because it can provide insight into the way a problem is
solved, but it is especially important in process control and design, where the number of degrees of
freedom is almost always positive, meaning you must recognize where you get to make design decisions.

Identify the types of equations you must write to solve a chemical engineering problem. This will help
in your efforts to write a system of equations. Later in this course we'll talk about how to solve them.

A degree of freedom analysis provides information that is important to designing and/or solving
engineering problems. By performing a degree of freedom analysis, you can determine what type of
engineering problem you are working on. Problems with zero degrees of freedom are typically
encountered in introductory engineering courses, because this is the case in which there is one unique
solution to a problem. In practice, engineering problems actually have a positive number of degrees of
freedom, and each degree of freedom means another design decision!

Degree of freedom analysis differs slightly depending on whether you are focusing on extensive
quantities or also considering intensive quantities. An extensive quantity gives information regarding the
scale of a process. Examples of extensive quantities include mass, moles, and volume, or their
corresponding flow rates. Intensive variables are independent of the scale or size of the engineering
system. Commonly encountered examples of intensive variables are temperature, pressure, and
concentration (e.g. mole fraction).

In a sense, degree of freedom analysis is comparing the number of variables in your system to the
number of constraints (each constraint can be written mathematically as an equation). The number of
degrees of freedom in engineering problems is the number of variables minus the number of
independent equations. Independence is covered more in Chapter 2, but for now, recall that
independent equations cannot be derived by adding or subtracting combinations of other equations.
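
Written as a formula:

degrees of freedom = (number of variables) - (number of independent equations)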

If the number of degrees of freedom for a system is zero, then there is usually a unique solution to your
problem: you have the same number of equations as unknowns.

If the number of degrees of freedom is negative, then you have an overspecified system. Provided you
have done degree of freedom analysis correctly (double-check to make sure you have only counted
independent equations!), there are two possibilities in this situation. If we are lucky, one of our
equations is redundant; that is, its inclusion does not change the solution to the problem. The tricky
thing here is there is no telling which equation is redundant. More likely, one of our equations is
contradictory, and there is no solution to this overspecified problem.

If the number of degrees of freedom is positive, then you have an underspecified system. This is
common in engineering. An underspecified system means that at least one variable can be used as a
design parameter (mathematically speaking, the variable is arbitrary and may be set to any value; once
all arbitrary variables are declared, there is one solution to the problem). This becomes important in
process analysis and process design, because our choice in design variables can change the behavior
(and profitability!) of our chemical system.

The purpose of this chapter is only to identify the number of degrees of freedom and classify the kinds
of equations or relationships that should be used to solve chemical engineering problems. In future
chapters, we will discuss actually solving such systems of equations.

1.1 Degree of Freedom Analysis for Extensive Variables

1.1.1 Extensive Variable Material Balances

In degree of freedom analysis of chemical engineering systems, there are three sources of extensive
variables:

Process stream components. Each stream has a corresponding flowrate for each component in the
stream. This flowrate is an extensive variable: mass, moles, or volume. You should be able to
translate between mass, molar, and volumetric flow rates using molecular weights, specific gravity, or
relevant equations of state, so each component in each stream provides exactly one variable.

Chemical reactions. Each reaction has an extent of reaction: an extensive variable that describes how
much of a reaction has occurred. Extents of reaction are virtually always molar flowrates.

Accumulations of components. If a system is at steady-state, recall that the accumulation of each
component within a system is zero. However, if a system is not at steady state, the accumulation term in
the balance equation is nonzero, introducing a new variable in our system. This new variable usually is
the initial condition of our system; recall from calculus that indefinite integrals require an arbitrary
integration constant (the infamous +C in your calculus homework). This arbitrary constant is (derived
from) our variable!

The types of equations we can write on our extensive variables can be classified as one of three kinds:

Specified streams (often also referred to as a basis or set of basis equations). This is the direct
declaration of the flowrate of one component in one stream.

Other specifications, of which there are two main types:

Stream composition specifications. This is any equation that describes the composition
(concentrations) of an individual stream. Examples of stream composition specifications
are usually mass or mole fractions. The maximum number of independent stream
composition specifications in a stream containing N components is N-1, because if you
know the mole fractions for N-1 components in a single stream, you automatically know
the mole fraction of the last component (it's one minus all the other mole fractions).

System performance specifications. This is any equation that describes a relationship
between one stream component and any other variable in the system. This is kind of a
catch-all category, because it can be related to a number of concepts: conversion,
reaction rate, product yield, reaction selectivity, separation coefficients, etc.

Balance equations. This is the backbone of your introduction to material and energy balance course:

accumulation = input - output + generation - consumption

One such equation can be written for each component in your system. Recall that input and output refer
to material flows across a system boundary (stuff is actually entering or leaving the system), and
generation and consumption refer to material within a system boundary (stuff is actually being made
or used up inside the system). Input and output are effectively flow terms, while generation and
consumption are effectively chemical reaction terms.

One balance equation can be written per component per system. In the case of multiple-unit
operations, multiple systems can be defined, so long as they are independent (for example, you can
write material balances on each individual unit, or on the overall system and all but one unit). We'll
consider the idea of the energy balance later this chapter, but this too counts as one equation.
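
For instance (a count of our own, not from the packet): in a two-unit process where every stream carries
the same three components, you can write three balances on each unit, six independent equations in
total; writing three on the overall system plus three on one of the units gives the same count, since each
overall balance is just the sum of the two corresponding unit balances.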

1.1.2 Extensive Variable Atomic Balances

Some chemical engineering problems are more easily solved by accounting for atoms entering and
leaving a system, rather than accounting for flows of actual chemical components. This is certainly a
valid approach for solving chemical reaction problems, but it slightly changes our degree of freedom
analysis:

There are no variables from reaction rates. This is because, unlike chemical compounds, atoms are not
being generated or consumed; they're basically swapping locations. Extent of reaction does not appear
as an explicit variable out of atomic balances.

Extra stream composition specifications arise from chemical compounds. For example, a stream of water
becomes a stream of hydrogen and oxygen where the mole fraction of hydrogen is 2/3 and the mole
fraction of oxygen is 1/3. These specifications only apply in the case of atomic balances.

1.1.3 Extensive Variable Energy Balances

The introduction of energy considerations to engineering systems does not particularly change our
degree of freedom analysis: generally, including energy on top of a material balance DOF analysis
means just adding one variable (the amount of energy being added to or removed from the system) and
one equation (the general energy balance).

However, it's possible to complicate degree of freedom analyses much more by considering
thermodynamic degrees of freedom, as introduced by Gibbs phase rule. This is considered a little bit in
ENCH 300, and it's even more important in ENCH 442, because degree of freedom analysis is needed to
figure out our options for controlling a system.

1.1.4 Practice Problems

1) A blending process is being developed to dilute a dangerous acid solution (assume the mixture is
binary, acid and water) of unknown mole fraction using a second acid-water stream that is 1%
acid by mass. The system operates at steady state. What should be the flow rate of the 1% acid
solution in this system?
(a) Conduct a degree of freedom analysis on this process to determine how many more details
must be specified in order for this process to be completely characterized.
(b) Propose at least one set of design decisions (specifications) that reduce this process to zero
degrees of freedom in such a way that there is a solution to the problem.
(c) Propose at least one set of design decisions (specifications) that reduce this process to zero
degrees of freedom in such a way that there is no solution to the problem. (You can be as
specific as necessary to demonstrate there is no solution.)
(d) If the system is not assumed to be at steady state, the number of degrees of freedom in this
process increases by two. What new variables are introduced by assuming we are not at steady
state?

For each scenario below, perform an extensive degree of freedom analysis (do not solve). Specify the
sources of each variable and equation. Designate the problem as underspecified, completely specified,
or overspecified. [Some of these are adapted from Murphy's Introduction to Chemical Processes:
Principles, Analysis, and Synthesis, 1st edition.]

2) A gas mixture of hydrogen and nitrogen is fed to a reactor where they react to form ammonia.
The nitrogen flow rate into the reactor is 150 mol/h and the hydrogen is fed at a ratio of 4 moles
H2 per mole of N2. Of the nitrogen fed to the reactor, 30% exits in the outlet stream (the rest
reacts). [DOF = 0]

3) You are designing a mixer that produces 200 kg/day of battery acid. Raw materials are three
streams: one is pure water, one is concentrated sulfuric acid (77% acid by mass in water), and
one is dilute sulfuric acid (4.5% acid by mass in water). The product of the mixer is 18.6% acid in
water. [DOF = 1]

4) A researcher is making LEDs. She puts a 1 cm2 chip of aluminum oxide in a reactor, where
gallium nitride will deposit as the result of a reaction between trimethyl gallium and ammonia.
The reactants are in their stoichiometric ratio and carried in with an inert gas. This inert gas
sweeps out the reaction byproduct, methane. (The reaction goes to completion and there is no
(CH3)3Ga or NH3 exiting.) The researcher wants to estimate the growth of GaN on the chip. [DOF
= 2; this one is tricky! Try writing out the equations to see why.]

5) Salad greens are washed to remove debris before being packaged for sale. One facility
processed 1500 1-pound packages of greens per day. The freshly-picked greens contain 1 lbm
debris per 12 lbm of greens. The greens are mixed with 150 gallons of water per day and washed,
then spun-dry to separate the dirty water from the greens. This process removes 99.9% of the
debris and all the wash-water from the greens. The dirty water is dumped into a nearby river, but
due to environmental constraints, is limited to a maximum of 4 barrels per day, with a maximum
debris content of 1.5% by volume. [DOF = 0; if you got a negative number, why do you think
that happened? Which specifications are actual specifications and which are constraints?]

6) A bioreactor system consists of three units: first, a stream of water is combined with a stream of
solid particles of nutrient in a mixing tank. The amount of nutrient supplied to the plant is 100
grams per day, but the flow rate of water is not automatically prescribed. The nutrient solution
is then fed directly into the bioreactor, preloaded with 100 micrograms of algae, but thanks to
the nutrient, it grows, basically according to the reaction

Algae + Nutrient → 2 Algae + Waste

Exiting the bioreactor, then, is a stream of algae and water (with dissolved nutrients and waste).
This stream is fed into a separator, where 100% of the algae and some of the water are
separated from the rest of the material. After some time, the system will reach steady-state,
where nothing is accumulated in any unit.

7) Today is your first day working as an engineer for the Fictitious Chemical Conglomerate and you
are ready to make your mark on the world! Today also happens to be the day the FCC is
experiencing problems in its Food Science division: the premade smoothie processing units are
not meeting the target set forth by management. Your supervisor has given you the following
specifications to the primary mixer.

Feed streams into mixer (three streams):


Premade fruit and yogurt mixture (35% fruit), fed at a rate of 100 kg/hr
Raw oats
Honey

Output mixture specifications (single stream):


30% fruit
10% oats
2.5% honey

Use degree of freedom analysis to help explain to your supervisor why the target set forth by
management cannot be met.

8) Methane is fed to a burner at a rate of 10 gmol/s, where it undergoes both complete and
incomplete combustion. The stack gas is fed to a condenser, where 1 gmol/s of water exits. The
system is at steady state.

(a) How many more pieces of information are required for this scenario to be completely
characterized, such that all material flow rates are known?
(b) Provide an example of extra information that would reduce this scenario to having zero degrees
of freedom such that you can solve the problem.
(c) Provide an example of extra information that would reduce this scenario to having zero degrees
of freedom such that it is actually impossible to solve the problem (hint: make it so the solution
is physically impossible negative flow rates, mole fractions greater than 1, etc.)

1.2 Degree of Freedom Analysis with both Extensive and Intensive Variables

Recall from the Gibbs phase rule that a stream can be completely specified according to

d.f. = c - π + 2

where the number of thermodynamic degrees of freedom d.f. is a function of the number of chemical
components c and phases π. We have already been partially applying Gibbs phase rule to a stream: in
the extensive variable DOF analysis, we said that a stream required c equations to completely
characterize it on a mass or molar basis. Gibbs phase rule says we actually need 2 - π more equations
than that to completely characterize it on both a mass/molar basis and an energy basis, simultaneously.
These equations are always specifications such as a declaration of the temperature or pressure of the
stream, or an observation that a stream is in equilibrium with another (such as a saturated vapor in
equilibrium with a saturated liquid).
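
As a quick check of the arithmetic (our own illustration): a single-phase binary stream (c = 2, π = 1) has
d.f. = 2 - 1 + 2 = 3, which you might satisfy by specifying the stream's temperature, pressure, and one
mole fraction.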

Intensive DOF analysis will prove most useful when you take a course in process control, because this
analysis will help you to determine how to most effectively design a process such that you are able to
maintain it at the desired temperature, pressure, flow rate, and more.

2 Systems of Linear Equations
Required Reading/Viewing:

Chapter 3 of the free version of the MATLAB Interactive Tutorial published by Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html. You can
check out Chapter 2 if you want, but note that you can't get by just clicking through menus in ENCH 225.

The first two sections of the MATLAB Linear Algebra video at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-
launchpad.html: "Introduction to Linear Algebra" and "Solving Linear Systems" (about 18 minutes).

Recommended for Further Consideration:

Any linear algebra textbook will provide much more detail on this topic.

Chapters 1-2, 3.1, 3.2, and 5.1.1 in Pratap's Getting Started with MATLAB. First skim these sections to
see how much you are comfortable with. If you're comfortable, focus on 1.7, 3.1, 3.2, and 5.1.1. If you're
new to MATLAB or uncomfortable with it, work through 2.1, 2.2, and 2.6 this week.

Chapter 1 and Sections 12.1-12.2.1 in Attaway's MATLAB: A Practical Introduction to Programming and
Problem Solving.

Sections 3.3-3.4 of Murphy's Introduction to Chemical Processes: Principles, Analysis, and Synthesis, 1st
edition. These sections provide more examples of linear algebra as it is applied to chemical engineering
problems.

Chapter Goals

Perform basic matrix arithmetic. Your ability to add, subtract, and multiply matrices will help in your
understanding of MATLAB programming.

Pose a chemical engineering problem as a system of linear equations.

Determine the properties of the matrix equation Ax=b, including rank and condition number. Finding
the rank of a linear system should confirm the results of degree of freedom analysis. The condition
number will give us confidence in our computed numerical solutions to a matrix equation.

Linear equations arise in chemical engineering primarily in two distinct applications. They can be used to
evaluate chemical reactions by numerically determining stoichiometric coefficients. Linear equations
also appear in chemical engineering analysis in the case of single-phase steady-state material balances
(and some special cases of multi-phase steady-state material balances and/or energy balances). In this
chapter we will follow up on our use of degree of freedom analysis, now focusing on the creation and
solution of linear systems of equations.

2.1 Matrix Operations

First, let's establish a few crucial details. A matrix is an array of values. Matrices can be classified by their
dimensions: the number of rows and number of columns in a matrix. A matrix with one row and one
column is a scalar value. Sometimes you will hear about vectors, which can be represented as a matrix
with one dimension of length one (a row vector has just one row; a column vector has just one column).

Matrix arithmetic works a little differently than scalar arithmetic. In order to add and subtract matrices,
they must have the same dimensions. Multiplication is a little more complicated. In order to multiply
matrices, the number of columns of the left-hand matrix must equal the number of rows in the right-
hand matrix. The product of these two matrices will have the same number of rows as the left matrix
and the same number of columns as the right.
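
For example, these rules play out in MATLAB as follows (a minimal sketch; the specific matrices are our
own illustration, not from the packet):

    A = [1 2 3; 4 5 6];       % a 2x3 matrix
    B = [10 20 30; 40 50 60]; % another 2x3 matrix
    S = A + B;                % addition requires identical dimensions; S is 2x3
    C = [1 0; 0 1; 1 1];      % a 3x2 matrix
    P = A*C;                  % (2x3)*(3x2) is a valid product; P is 2x2
    % A*B would raise an error here: the inner dimensions (3 and 2) do not match.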

The dot product of two vectors is the sum of the products of the elements of the two vectors. Keeping
with matrix multiplication rules, the left-hand vector L should be 1 row by n columns, and the right-hand
vector R should be n rows by 1 column. Using Xi to denote the ith element of vector X, the dot product is

L·R = L1R1 + L2R2 + ... + LnRn

Matrix multiplication is basically a bunch of dot product calculations. If two matrices A and B are
multiplied together to make matrix C, the element in the ith row and jth column of C is the dot product
of the ith row of A and the jth column of B.

Let's go to the Video! Click here to see a little more information regarding matrix arithmetic using MATLAB.

In this course we will be concerned with a system of linear equations, which can be rewritten as a single
matrix equation of the form

Ax = b

where A is a matrix of coefficients on the variables contained in the column vector x.

The solution to this system of equations depends on the number of degrees of freedom in our problem.
If our system of equations has zero degrees of freedom, then the solution to this system is represented
by

x = A⁻¹b

which is usually easily computed using programs like MATLAB. If our system has a positive number of
degrees of freedom (the system is underspecified or underdetermined), then additional equations must
be designed to reduce the degrees of freedom to zero. Later in this chapter, when we look at reaction
stoichiometry, we'll see that all balanced chemical reactions have one degree of freedom, which is
addressed by simply picking a value for one stoichiometric coefficient in the reaction.

Let's go to the Video! Click here to see another application of linear algebra, with demonstrations of
matrix equations.

If our system has a negative number of degrees of freedom, which often happens in the laboratory,
where multiple trials are conducted
to find a single piece of information, then we must use more advanced linear algebra to solve this
system. In this chapter, we will address some zero and positive DOF problems; we'll reserve the topic of
overspecified or overdetermined systems for Section 7.3.

2.1.1 Practice Problems

For each of the matrix multiplication problems below,

1) Evaluate the result. If the multiplication is not valid, explain why.


2) Transpose the right-hand matrix and evaluate the result. If the multiplication is not valid, explain
why.

2.2 Matrix Determinant and Rank

There is a lot of terminology specific to linear algebra to describe matrices and associated matrix algebra
problems. Our aim here is to look at just a few of them so that we know just enough to be dangerous:
we simply want to know how we can tell that we have a complete system of linearly independent
equations to solve.

The determinant of a matrix can be used to tell if a corresponding system of linear equations is
independent or not (recall from ENCH 215 that a system of linear equations is independent if there is no
way to write one equation as the sum of multiples of the others). If the linear system is independent,
then the determinant will be nonzero, and if the system is not independent, then the determinant will
be zero.

The determinant of a 2x2 or 3x3 matrix is simple enough that we can write it in equation form. For a 2x2
matrix,

det([a b; c d]) = ad - bc

and for a 3x3 matrix [a b c; d e f; g h i],

det = aei + bfg + cdh - ceg - bdi - afh

There is a more general formulation for an n-by-n matrix, but this isn't a linear algebra course, so for
large systems, we can just use the det command in MATLAB. Because there are a lot of computations
necessary to find the determinant of a large matrix, most computer programs try to find a compromise
between the number of computations (which corresponds to how long it takes to get the result!) and the
accuracy of the result. In most cases, this is fine, but it does mean that we need to take our results with
a grain of salt: if the computer tells us that the determinant of a large matrix is close to zero, it might
actually be zero. Then we need to resort to other means (see the next section).

But, if we know the determinant of a matrix is zero, we then want to know how many linearly
independent equations we actually have. That's where the rank comes in: it is the count of the number
of linearly independent equations! When a square matrix has a rank equal to its number of rows and
columns, then we say the system is of full rank (and corresponds to a system with zero degrees of
freedom). We will rely on MATLAB to compute matrix rank for us also, using the rank command.
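
Both computations are one-liners in MATLAB; a minimal sketch (the matrix is our own example):

    A = [1 2; 2 4];  % the second row is twice the first, so the rows are dependent
    det(A)           % returns 0: the corresponding equations are not independent
    rank(A)          % returns 1: only one linearly independent equation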

2.2.1 Practice Problems

1) Compute the following determinants by hand. Check your work using MATLAB.

2) If a matrix is not square, but has m rows and n columns, what is the largest possible rank this
matrix could have? Explain.

3) If the determinant of a matrix is zero, what can be said about the rank of this matrix?

2.3 Systems of Linear Equations

The reason we care so much about matrices is that any system of linear equations can be expressed as a
single matrix equation. Armed with the right computational tools (certain calculators or computer
programs), it can be much faster to solve a single matrix equation than to solve a system of
simultaneous linear equations.

Let x={x1,x2,x3} be the set of unknown values that satisfies the following system of equations:

a11x1 + a12x2 + a13x3 = b1
a21x1 + a22x2 + a23x3 = b2
a31x1 + a32x2 + a33x3 = b3

This is equivalent to the following single matrix equation:

[a11 a12 a13; a21 a22 a23; a31 a32 a33] [x1; x2; x3] = [b1; b2; b3]

I realize these letters with subscripts may not be the most exciting thing to see, but I do want to point
out that each value aij is specifically subscripted such that i is the row (or equation!) and j is the
column (or corresponding x-value!).

Typically we write matrix equations in shorthand, so for the matrix above, we would define the
following vectors and matrices:

A = [a11 a12 a13; a21 a22 a23; a31 a32 a33], x = [x1; x2; x3], b = [b1; b2; b3]

So the resulting matrix equation can simply be written

Ax = b

The solution to this equation is found using matrix arithmetic. There is no such thing as matrix
"division," only multiplication, but most square matrices A have an inverse, A⁻¹, such that their product is
the identity matrix, I, which is a matrix of all zeroes except ones down the diagonal:

AA⁻¹ = I

The inverse also has the special property that it is one of the few matrices that can be multiplied on
either the left or the right and the result is the same:

A⁻¹A = AA⁻¹ = I

We will not worry about how to compute the inverse of a matrix in this course (this topic is part of any
traditional linear algebra course). However, you will need to know how to use MATLAB to find this
inverse, or if the inverse of a matrix is given, you should be able to put it to good use, such as solving
the equation we have posed here. The solution to Ax=b is

x = A⁻¹b

where A, x, and b are the matrices/vectors as defined previously.
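
In MATLAB, this looks like the sketch below (the numbers are our own illustration). The backslash
operator is also worth knowing: it solves Ax=b directly, without explicitly forming the inverse, and is
generally faster and more accurate.

    A = [2 1; 1 3];
    b = [5; 10];
    x = inv(A)*b  % the formula above: x = [1; 3]
    x = A\b       % same answer via backslash (preferred in practice)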

In the next section, we'll explore another matrix property that can seriously affect the results of
computations.

2.3.1 Practice Problems

1) Write the system of linear equations that is equivalent to this matrix equation:

2) Write the matrix equation that is equivalent to this system of linear equations:

3) Demonstrate that the matrix is the inverse of the matrix by

multiplying them both together, in both directions (left-multiply and right-multiply).

4) Using the information from the previous problem, determine the solution to the matrix
equation below by hand. Check your response using MATLAB.

2.4 The Condition Number

Not all matrix equations have a solution, and not all matrix equations have solutions that are easy to
compute numerically. If you take a course in linear algebra, you will learn in more detail why, but for the
purposes of this course, we will count on computer programs to let us know how reliable our results are.

The condition number of a matrix provides an estimate of how inaccurate the solution x = A⁻¹b is when
substituted back into the problem Ax=b. This may seem especially surprising depending on your
mathematical history: how can the answer to a problem, when substituted back into the problem, not
give us the original result? The condition number depends on the relative scales of the eigenvalues of a
matrix (eigenvalues are the characteristic values that define a matrix; they are not important to
compute in ENCH 225, but we'll need them in ENCH 442). If these eigenvalues differ greatly in
magnitude, then it can be mathematically impossible to solve a matrix equation to any accuracy you
want (it is similar to trying to solve a problem down to the nearest billionth when you are only given
rough integer estimates of the values to begin with).

If a matrix has a small condition number, this means that any measurement error in the values of the
matrix will have a small effect on the solution to the problem. If a matrix has a large condition number,
then an error in the values of the matrix might change a result entirely. In the case of very large
condition numbers, computers can wind up giving completely incorrect answers (usually MATLAB will
recognize this is happening and warn you that the answer may be wrong).
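
MATLAB reports this value through the cond command; a minimal sketch, using a classic
ill-conditioned matrix of our own choosing rather than one from this packet:

    A = hilb(8);  % the 8x8 Hilbert matrix, a standard ill-conditioned example
    cond(A)       % about 1.5e10, so roughly 10 digits of accuracy can be lost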

Example. The matrix A in the matrix equation Ax=b below has a condition number of about 100,000.

The solution to this system is x1=1, x2=-1, and x3=0. If you use a calculator or MATLAB to solve the
system, it should get this result with little effort.

However, if those vector values were the result of some experiment and had a little error in them, how
much do you trust your solution for x? The condition number tells us that we shouldn't trust it much:

The solution to the system

which is the same problem as before except one value has been adjusted by 0.1%, has solution x1=0.443,
x2=-0.39, x3=0.002. We adjusted one value in the entire problem, and the solutions changed by over 50%
from their original values!

Even scarier is adjusting a value inside the matrix itself, even by 0.01%...

The solution to this problem is roughly x1=11.8, x2=-12.8, x3=-0.04. And even rounding the solution here
the way these numbers are presented results in a lot of error. If we were to take our rounded x-values
and recompute the right-hand vector, we would obtain

which is at least 24% away from the original values of that right-hand vector.

Moral of the story? Watch your condition number. Even if you have a nice zero degree of freedom
problem (matrix of full rank), it may have a very sensitive solution. This will come back into play when
we talk about curve fitting in Chapter 7.

We will rely on computer software to compute matrix condition numbers for us. It is not entirely
complicated, but we need to invest our energy elsewhere.

2.5 Chemical Reactions

Linear algebra can prove to be a useful tool in chemistry and chemical engineering for dealing with
chemical reactions. You have certainly spent your share of time balancing chemical reactions in your
chemistry courses, but in cases of numerous or complicated chemical compounds and elements, it can
be time-consuming to get it right.

A balanced chemical reaction is one in which the number of atoms of reactants is the same as the
number of atoms of products. In many cases, reaction stoichiometry is straightforward, and you have
learned strategies to balance them. For example, with organic chemical reactions, you have used rules
of thumb to help balance components in a particular order. Some basic examples of balanced chemical
reactions include

2 H2 + O2 → 2 H2O
CH4 + 2 O2 → CO2 + 2 H2O

The numbers in front of each chemical compound are their stoichiometric coefficients (denoted by the
Greek letter nu, which is written as ν). Note that these coefficients are relative quantities: in each case,
one coefficient may be chosen arbitrarily, and the rest are then determined. For example, the following
two reactions are equivalent:

2 H2 + O2 → 2 H2O
4 H2 + 2 O2 → 4 H2O

We could even write stoichiometric coefficients of 1, ½, and 1, respectively, if we wanted. Here, we will
adopt a convention of a negative stoichiometric coefficient
when a compound is being consumed (is a reactant) and a positive coefficient when a compound is
being produced (is a product). This will be important in setting up our linear algebra problems.

Again, in balancing a chemical reaction, the number of atoms on one side of the reaction must be the
same as the number of atoms on the other side. Adopting our new convention of negative
stoichiometric coefficients for reactants, we can write two atomic balances, one on H and one on O, for
the reaction where hydrogen and oxygen gases react to produce water:

H: 2νH2 + 2νH2O = 0
O: 2νO2 + νH2O = 0

The coefficients in front of the stoichiometric coefficients correspond to the number of atoms of that
particular element in each compound (hydrogen gas and water both have 2 hydrogen atoms; oxygen gas
has 2 oxygen atoms but water only has 1). This is a small system of linear equations that we can solve.
Rewriting as a matrix, we obtain

[2 0 2; 0 2 1] [νH2; νO2; νH2O] = [0; 0]

This system of equations has only two equations but three unknowns. This is an underdetermined (or
underspecified) system, a system with one degree of freedom. That means we are free to specify one
stoichiometric coefficient and the rest are determined. If we choose a value for νH2 (remember, it's a
reactant, so it needs to be negative!) and rearrange our equations slightly, we obtain a two-equation
system in the remaining unknowns, which has solution νH2O = -νH2 and νO2 = νH2/2. We can choose
whatever value we want for νH2 and the other coefficients are appropriately determined.

Let's try a slightly more complicated example: the combustion of methane. We already know the
reaction is

CH4 + 2 O2 → CO2 + 2 H2O

But let's once again pretend we don't know those coefficients and instead write a system of atomic
balances.

C: νCH4 + νCO2 = 0
H: 4νCH4 + 2νH2O = 0
O: 2νO2 + νH2O + 2νCO2 = 0

We have three equations and four unknowns, so we get to design one more equation. Let's say
νCH4 = -1, so our system of linear equations becomes

C: νCO2 = 1
H: 2νH2O = 4
O: 2νO2 + νH2O + 2νCO2 = 0

Or, in terms of matrices,

[0 0 1; 0 2 0; 2 1 2] [νO2; νH2O; νCO2] = [1; 4; 0]

We can pretty easily solve this one by substitution, or we can take the inverse of the left matrix and
multiply it by the column vector on the right. For more complicated problems, it can be helpful to write
the atomic matrix like this and solve using matrix algebra.

Look at the last matrix equation again. Do you notice anything interesting about the coefficients in the
matrix? We know that the three rows correspond to the three atom balances (in this example, in order,
C, H, then O). What do the columns correspond to? Each column corresponds to a different chemical
compound: by matrix multiplication, the first column gets multiplied against the oxygen gas coefficient,
the second column against the water coefficient, and the last column against the carbon dioxide
coefficient. The elements of this square matrix correspond to the number of atoms in that compound.
This means the first row, first column is 0 because there are no carbon atoms in oxygen gas. The second
row, second column is 2 because there are 2 hydrogen atoms in water! The third row counts the atoms
of oxygen in oxygen gas, water, and carbon dioxide, respectively.

Look at the column vector on the right. Notice that there is a 1 in the carbon row, 4 in the hydrogen row,
and 0 in the oxygen row. This corresponds to the atoms in methane, CH4!

This leads us to a general strategy for writing systems of linear equations to balance chemical reactions.

1. List the atomic elements involved in the reaction. There will be one row in your matrix and
column vector for each element.
2. Choose one reactant to serve as a basis. Write the number of atoms in this compound as a
column vector for the right-hand side of the equation. This reactant will have a stoichiometric
coefficient of -1 in our solution.
3. The column vector of stoichiometric coefficients is our unknown, x. List the compounds in the
order they appear in the reaction.
4. Carefully fill in the left-hand matrix by noting the number of atoms in each compound. The row
should correspond to the element and the column should correspond to the compound.
5. Solve for x using matrix algebra.

Let's try one more example to put this idea to work: the complete combustion of carbon disulfide in
oxygen to form carbon dioxide and sulfur dioxide.

First we note the elements involved are C, O, and S. We'll use CS2 as our basis compound, so its atom
counts will appear in the right-hand column vector of our matrix system. The rest of the compounds will
appear as columns in the left-hand matrix:

[0 1 0; 2 2 2; 0 0 1] [νO2; νCO2; νSO2] = [1; 0; 2]

We can solve this system by inverting the left-hand matrix and left-multiplying it against the right-hand
vector to obtain the rest of our stoichiometric coefficients. In this case, we get -3, 1, and 2, so

CS2 + 3 O2 → CO2 + 2 SO2
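
In MATLAB, the whole carbon disulfide calculation takes only a few lines (a sketch of the same system
set up above):

    % Rows are the C, O, and S balances; columns are O2, CO2, and SO2.
    % The right-hand side holds the atom counts of the basis compound, CS2.
    A = [0 1 0; 2 2 2; 0 0 1];
    b = [1; 0; 2];
    nu = A\b  % returns [-3; 1; 2], the remaining stoichiometric coefficients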

2.5.1 Practice Problems

1) Show that the matrix equation for the carbon disulfide combustion reaction can be derived from
the atomic balances on carbon, oxygen, and sulfur.

2) Repeat the analysis for the combustion of methane from this section, except instead of using
one mole of methane as the basis, use one mole of oxygen gas. How does your result differ?

3) Use the analysis of this section to determine the balanced chemical reaction representing the
complete combustion of phenol (C6H5OH).

4) Use the analysis of this section to balance this chemical reaction:

2.6 Steady-State Material Balances

Many chemical engineering problems, especially in the beginning of your academic career, can be
written and solved as a system of linear equations. This is because of the nature of the overall balance
equation:

accumulation = input - output + generation - consumption

The terms of the overall balance equation are often straightforward: either constants, unknowns, or
the product of a constant and an unknown. The sum of these items results in a linear equation.
Accumulation is the term that potentially results in a balance equation being a differential equation
(we'll deal with those in Chapter 8), but when there is no accumulation, the equation is algebraic.

The most straightforward way to write such linear systems is to treat mass or molar flowrates as the
the unknown variables in a problem. Then, simply write one linear equation for each source of
equations determined in degree of freedom analysis. This often results in a lot of linear equations, but
they are a lot of simple linear equations that can easily be solved in some sequential or simultaneous
manner.

In addition to material balances, equations may come from a basis or specified stream (the easiest kind
of equation to write, since it's in the form n=constant), a stream specification (which may wind up being
a small system of equations that relates many mass/molar rates in a single stream), or a system
specification (which relates a mass/molar rate in one stream to mass/molar rates in another stream,
and/or extents of reaction, when applicable).

As a general rule, whenever there is a reaction, it's almost always easier to work with molar flowrates
(and extents of reaction will have the same units as these flowrates). When there is no reaction, either
mass or moles are fine to work with.

The nice thing about treating each individual molar flowrate and each individual reaction's extent as
variables is that, at steady state, our material balance equation becomes

0 = ni,in - ni,out + Σj νij ξj

where νij is the stoichiometric coefficient for chemical compound i in reaction j and ξj is the extent of
reaction j. All of these terms have units of moles, and all of these terms are variables or products of
variables and constants: a linear system!

Example. A process stream with 10% pollutant by mass is diluted with another clean process stream
(with 0% pollutant by mass) before it is discharged. The diluted stream has a flowrate of no more than
2,000 lbm/hr. If the maximum allowable concentration of pollutant in this stream is 0.1%, how much of
the polluted process stream and the clean process stream are consumed at maximum capacity?

[Flow diagram: a polluted stream (? lbm/hr, 10% pollutant) and a clean stream (? lbm/hr, 0% pollutant)
mix to form the diluted stream (2000 lbm/hr, 0.1% pollutant).]

Solution. In this problem, we are given information for overall flowrates and concentrations. There's
certainly more than one way to write a linear system for this problem. Here is one approach:

Treat the important flows as "pollutant" and "other." The polluted and diluted streams have both
pollutant and other. The clean stream is just other. That's five mass flows. We're given one specified
stream (2000 lbm/hr diluted) and two stream specifications (% pollutant in the polluted and diluted
streams). Pair that with two material balances (one for pollutant, one for other), and we have five
equations and five unknowns:

mP,p - mD,p = 0 (pollutant balance)
mP,o + mC,o - mD,o = 0 (other balance)
0.9mP,p = 0.1mP,o (stream specification on polluted stream)
0.999mD,p = 0.001mD,o (stream specification on diluted stream)
mD,p + mD,o = 2000 (specified stream)

We can transform this into a matrix equation:

[ 1    0    0     -1      0     ] [ mP,p ]   [ 0    ]
[ 0    1    1      0     -1     ] [ mP,o ]   [ 0    ]
[ 0.9 -0.1  0      0      0     ] [ mC,o ] = [ 0    ]
[ 0    0    0      0.999 -0.001 ] [ mD,p ]   [ 0    ]
[ 0    0    0      1      1     ] [ mD,o ]   [ 2000 ]

Inverting the left matrix and left-multiplying it against the right column vector, we obtain
mP,p = 2, mP,o = 18, mC,o = 1980, mD,p = 2, and mD,o = 1998. All units are mass per hour.
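
In MATLAB, this solution is a few lines (a minimal sketch; the variable names are my own):

A = [1 0 0 -1 0; 0 1 1 0 -1; 0.9 -0.1 0 0 0; 0 0 0 0.999 -0.001; 0 0 0 1 1];
b = [0; 0; 0; 0; 2000];
m = A\b   % [mP,p; mP,o; mC,o; mD,p; mD,o] = [2; 18; 1980; 2; 1998]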

As you studied in your material and energy balances class, you can apply similar techniques to many
combinations of the four basic mass processes: mixing, separating, splitting, and reacting. The system is
only nonlinear in cases with complicated thermodynamics (such as vapor-liquid equilibrium) or
complicated reaction kinetics (reversible or high-order reactions).

These sorts of problems can get pretty complicated quickly. A system with a reactor and recycle can
easily have over 20 equations in 20 unknowns. In those cases, you could set up an appropriate matrix, or
you could make substitutions where it makes sense to end up with a core system of just a few equations
in a few unknowns. Usually these core equations are the material balances themselves; all other
specifications (basis, system, or stream specifications) can be substituted into the material balances.

2.6.1 Practice Problems

1) You have finished your first week at the Fictitious Chemical Company and are about to leave for
a well-deserved weekend when you get a call from Bob, another new hire to the company. You
are much smarter than Bob, and he knows it, so when he can't get his liquid-liquid extraction
unit working properly, he calls you up to ask for help analyzing it.

The multi-stage liquid-liquid extraction process Bob is running removes an impurity from one
liquid stream by dissolving it in another liquid stream that flows past it counter-currently. The flow
rates in each direction up and down the column are constant, but the concentration of the
impurity is different at each stage. Consider the currently-running three-stage column below:
[Diagram: a three-stage column. Stream F1 enters at the bottom of Stage 1 with mass fraction y0
and flows upward, with y1 exiting Stage 1, y2 exiting Stage 2, and y3 exiting Stage 3 at the top.
Stream F2 enters at the top of Stage 3 with mass fraction x4 and flows downward, with x3 exiting
Stage 3, x2 exiting Stage 2, and x1 exiting Stage 1 at the bottom.]
The convention is for the subscript on the mass fraction to be the number of the stage that
stream is exiting. So the impure stream flows in at a mass flow rate of F1 lb/hr and has a mass
fraction of impurity of y1 lb impurity/lb stream 1 exiting the first stage, a mass fraction of
impurity of y2 lb impurity/lb stream 1 exiting the second stage, and so on. Likewise, the stream
removing the impurity from the original stream has a flow rate of F2 lb/hr and a mass fraction of
impurity of x1 leaving the first stage, x2 leaving the second stage, etc., even though F2 is flowing
in the opposite direction of F1.

With the system at steady-state, there is an equilibrium ratio between the mass fractions xi and
yi that is a constant value across all stages:

a) Write a mass balance on impurity passing through an arbitrary stage j in terms of the flow rates
F1 and F2 and appropriate mass fractions (such as xj, for example).
b) Bob needs to add another stage to this column. A solution is 10% acid by mass (the balance is
water). It is put into a four-stage column at a rate of 400 lb/hr in contact with a stream of pure
n-hexane entering at a flow rate of 1000 lb/hr. The equilibrium constant K for this separation is
2 lb water/lb hexane. Perform a degree of freedom analysis on the four-stage system, then write
a system of linear equations describing the mass flow rates of acid in the system. Then use
MATLAB to solve this system, determining the mass fraction y4 of acid in water in its exit stage,
and the mass fraction x1 of acid in hexane in its exit stage. (Hint: by writing the mass balance on
the impurity at each stage, and substituting in the equilibrium coefficient, you can write the
governing mathematical model in as few as four linear equations if you wish.)

2) When certain solutes crystallize from aqueous solutions, the crystals are hydrated salts: compounds
where H2O molecules are bonded to solute molecules. We can think of this as a sort
of reaction; for example, in the case of magnesium sulfate at room temperature, we would have

MgSO4 + 7 H2O → MgSO4·7H2O

where the product of this reaction is a single molecule.

A feed of aqueous magnesium sulfate at 220°F containing 35 wt% MgSO4 is fed to a cooling
crystallizer that operates at 70°F. At 70°F, the solubility of MgSO4 in water is 0.25 lb MgSO4
per lb of solution. We seek to determine the feed rate of magnesium sulfate and water (in
lbmol/h) needed to produce one ton per hour (2000 lbm/h) of magnesium sulfate heptahydrate.

(a) Conduct a degree of freedom analysis (assume steady state) to show there are zero degrees of
freedom in this process.
(b) You should find that there are six equations in six unknowns in this problem. One way to set
this up: the six unknowns are labeled n1 through n5 and the extent of reaction ξ in the process
flowchart below:

[Flowchart: the feed of n1 lbmol/h MgSO4 (aq) and n2 lbmol/h H2O enters the crystallizer, where
the hydration reaction proceeds with an extent of reaction ξ lbmol/h; the outlet carries n3 lbmol/h
MgSO4 (aq), n4 lbmol/h H2O, and n5 lbmol/h MgSO4·7H2O (s).]

Write the system of equations that relates these six variables in a single matrix equation.

3) FCC's petrochemical division is under review for the way that it separates a portion of the
crude oil that it processes each day. You have been tasked to investigate how the very lightest
components of the hydrocarbons are separated. The four product streams are fuel gas
(primarily hydrogen, methane, and ethane), propane, n-butane, and isobutane. All percentages
in the diagram below are in mole percents. For your analysis, take a basis of 100 lbmol/hr of
feed.

a) Perform a degree of freedom analysis on the following setup to determine whether the
system is overspecified, underspecified, or exactly specified.

b) If the system is overspecified, propose what piece(s) of information to ignore so that there
are zero degrees of freedom. If the system is underspecified, make additional assumption(s)
on the system so that there are zero degrees of freedom.

c) Once you have a system with zero degrees of freedom, set up the system of material
balances (that is, explicitly write out all the equations). Do not solve.
[Diagram: a separation train with a feed and four product streams, with compositions in mole percents:
Feed: 21% fuel gas, 9% propane, 38% n-butane, 32% i-butane
D1: 99.9% fuel gas, 0.01% propane
B1: 5% fuel gas, 95% propane
D2: 1% propane, 99% n-butane
B2: 3% propane, 5% n-butane, 92% i-butane]

4) The environmental division of FCC wants you to analyze the watershed consisting of a series of
five reservoirs as shown in the figure below; each reservoir has volumetric in- and outflows of
water Qi and mass loads (inputs) of chloride Li as labeled. They are interested in solving for the
steady-state distribution of chloride ions in each reservoir. Write out the system of equations
that describes the chloride ion mass balance in each reservoir, then solve this system for the
steady-state masses of chloride ions in each reservoir using MATLAB.

In using MATLAB to get numerical results, it is fine to simply define a matrix A and vector b in
the Command Window and use the appropriate operation to solve the system (but this is the
last time for that; you will be asked to use MATLAB's publish command after today). You can
copy and paste what you type and see in the Command Window into a word processing
document for submission.

[Figure: five connected reservoirs with chloride loads and volumetric flows:
Reservoir A: LA = 180 mg/min, QA = 67 m3/min
Reservoir B: LB = 710 mg/min, QB = 36 m3/min
Reservoir C: LC = 740 mg/min, QC = 161 m3/min
Reservoir D: LD = 3850 mg/min, QD = 182 m3/min
Reservoir E: LE = 740 mg/min, QE = 212 m3/min]

5) A civil engineer involved in a construction project requires a mixture of 4800 lb sand, 5800 lb
gravel, and 5700 lb coarse rock. There are three pits from which to obtain materials: pit #1 is 52%
sand, 30% gravel, and 18% rock by mass; pit #2 is 20% sand, 50% gravel, and 30% rock by mass;
pit #3 is 25% sand, 20% gravel, and 55% rock by mass. The engineer needs to determine how many
pounds of material should be obtained from each pit.

(a) Perform a degree of freedom analysis on this problem to show that there are the same number
of equations as unknowns.
(b) Letting m1, m2, and m3 be the masses of materials from the three pits, write a matrix equation in
the form Am=b that could then be solved to find these values. (You don't need to solve for
them.)

2.7 MATLAB Exploration: Linear Algebra

Reminder: It's in your best interest to watch the following videos before coming to lab.

Chapter 3 of the free version of the MATLAB Interactive Tutorial published by Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html. You can
check out Chapter 2 if you want, but note that you can't get by just clicking through menus in ENCH 225.

The first two sections of the MATLAB Linear Algebra video at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-
launchpad.html: Introduction to Linear Algebra and Solving Linear Systems (about 18 minutes).

It is also helpful to have looked through Chapters 1 and 12.1-12.2.1 in Attaway's MATLAB: A Practical
Introduction to Programming and Problem Solving or Chapters 1 through 3.2 in Pratap's Getting Started
with MATLAB.

By the end of this exploration, you should be able to

Navigate the MATLAB Desktop.
Define scalar, vector, and matrix variables.
Generate random numbers.
Use the MATLAB Command Window as a calculator for both scalar and matrix operations.
Solve matrix algebra equations using MATLAB.
Write simple MATLAB scripts.

Open MATLAB on your computer. In Windows, MATLAB should be on the list of all programs if you use
the start menu. (Be careful on UMBC machines to not just search for MATLAB and hit Enter: Activate
MATLAB comes before MATLAB alphabetically, so you'll run the wrong program.) When MATLAB
starts, you should end up in a program that looks like Figure 1 below:

Figure 1: Default Configuration of the MATLAB Desktop

For most of today, we will devote our attention to the largest window in the center of the MATLAB
Desktop, the Command Window. This is where MATLAB can be used as a calculator, or as a sandbox
for testing out short commands.

Each MATLAB window has the same downward-pointing-triangle-in-a-circle in the upper right corner;
these menus let you customize the layout of windows in MATLAB to be more useful for you personally.
When you undock a MATLAB window, it pops out to be manipulated using Microsoft Windows instead.
The undock option is then replaced with a dock option, which lets you relocate the window back into
MATLAB. If you close a MATLAB window and want it back, you can find it under Layout on the Home tab.

For now, let's just work in the MATLAB Command Window. Confirm that you can enter commands into
this window much like you would expect on a scientific calculator. Try entering each of the following at
the >> prompt and keep an eye on the Command Window, Command History, and Workspace as you go:

2 + 3

4 - 1

3 * 7

4^2

6/3

6/3;

(When you see sections of the document like this, you should enter what you see into MATLAB, and
make note of what you see and/or what you know/learn in the white space provided!)

In the last example, we put a semicolon after our computation. What is the difference between the
results of the last two entries?

MATLAB will perform any computation you enter into the Command Window, but more than that, it will
automatically store the result in its memory as the variable ans, as you can see in the Workspace.
MATLAB also keeps a running log of the commands that you have entered at the prompt in the
Command History. You can drag and drop lines from the Command History back into the Command
Window, or, from a blank prompt in the Command Window, press the up and down arrow keys to cycle
through previous entries.

If we want to retain a result for use later in the Command Window, we must use the = sign, which in
MATLAB is the variable assignment operator. (It literally means "is assigned the value," and NOT "is
equal to.") There are specific rules about how variables and the variable assignment operator work:

Variables are case sensitive. You can store separate values as x and X in MATLAB, so be careful!
Variable names can overwrite MATLAB functions, so be careful! If you save a variable as plot,
then you can no longer use the plot command!
Variable names must start with a letter, but can otherwise contain alphanumeric characters and
underscores. Virtually every other character (!, @, #, $, %, ^, &, brackets, commas, parentheses,
slashes, arithmetic operators, and spaces) will result in an error or behavior that you probably
don't want. Seriously, don't use spaces anywhere in MATLAB unless you really mean it.

The variable assignment operator must always be used in the form

variable_you_want_to_define = what_you_want_to_assign_to_that_variable

The left side is always the name of the variable you are writing to; it can't be a mathematical
expression. The right side can be any valid MATLAB command: you can assign variables to have
the same value as other variables or to have the result of some command or computation.

And some best practices:

Give variables short but meaningful names. It's clearer to say height than h or
vertical_displacement, and to say molefracA instead of x.
Avoid giving variables the same name with just different use of capitalization.
Don't overwrite existing MATLAB functions. This bears repeating. If you have written your own
MATLAB program or know there is an existing MATLAB program with a certain name, do not
assign the same name to a variable!

Try these out, and keep an eye on the Workspace:

x = 2;

y = 4;

z +2 = y ; % this should cause an error!

z = y-2;

z = z+1;

x = 1:5;

Notice that you can use previously defined variables to define new variables and even to overwrite
currently used variables! In the case of z=z+1, you are literally instructing MATLAB to assign to z the
current value of z plus one, which is a valid command. As a mathematical formula, it doesn't make
sense, but the = sign doesn't mean "is equal to" in MATLAB!

The colon is a special operator in MATLAB that is used in a few different ways. One way is to define a
vector of values. Notice when you defined x in that last line, MATLAB automatically created a 1-by-5 row
vector. It would be equivalent to write one of the following:

x=[1 2 3 4 5];
x=[1,2,3,4,5];

In general, there are four ways to define vectors. You should be familiar with all of them. They all work,
but each one is easier to use in certain situations.

Write out each element of the vector and enclose them in square brackets. This method must be used if
there is no mathematical pattern in the creation of the vector. Use commas or spaces to separate
elements of a row vector and semicolons to separate elements of a column vector.
Try it: how would you define a row vector [4 93 7 -2] in MATLAB?

Use the colon operator to create vectors whose elements are linearly increasing or decreasing. By
default, MATLAB assumes you want to make a vector where each element is one greater than the last,
so you just have to type the first value, a colon, and the last value.
Try it: how would you define a row vector [3 4 5 6 7 8 9] in MATLAB?

You can also specify how to increment the vector if you need the values to be changing by some other
value by writing that value between the first and last value; for example, 1:2:9 would count by twos
from 1 to 9, so [1 3 5 7 9].
Try it: how would you define a row vector [4 9 14 19 24] in MATLAB?
Try it: how would you define a row vector [10 9 8 7 6 5 4 3 2 1] in MATLAB?

The linspace command in MATLAB is helpful when you know the first and last value of a vector and
the number of elements you want in that vector. MATLAB will automatically determine how to
increment the vector. Almost every built-in function in MATLAB requires inputs to be given in
parentheses following the function name. In this case, you should write
linspace(first_value,last_value,number_of_elements)
Try it: how would you define a row vector [1 3 5 7 9] using linspace?
Try it: how would you define a row vector starting at 0, ending at 10, with 100 elements?

Define the vector using an already existing vector. If you have already created a row vector, you can use
mathematical operators or functions to act on that vector. There are some rules here, so we'll look at
that in more detail now.

If you already have a vector, but just want a certain element of it, you can use parentheses to call that
element out; for example x(3) gives the third element in x.

Vectors can only be added and subtracted if they are the same size (have the same number of
elements).

Vectors can only be multiplied if one is a row vector (one row and N elements) and the other is a column
vector (N elements and one column). This is the dot product we defined back at the start of this chapter.

If you want to perform element-by-element multiplication, then the vectors must be the same size
and you must use the element-by-element operator in MATLAB, which is the period. Let's try a few
things:

x=linspace(2,10,5);
y=-2:2;

You should be able to add or subtract x and y, or use them together with scalar values:

z=2*x;
w=x+y;

But you wont be able to get the dot product:

x*y % should cause an error

You can get around this by transposing the second vector (turn it from a row vector to a column vector).
In MATLAB the transpose operator is the apostrophe:

y_col=y';
xy=x*y_col;
yx=y_col*x;

Notice that left multiplying and right multiplying are different when using vectors!

Many built-in functions will work element by element on vectors:

sin(y);
exp(y);

To multiply element by element, put a dot in front of the multiplication operator:

x.*y

Notice this is different than the dot product of two vectors!

You also need the dot to perform exponentiation:

x.^2;

There are a few commands that can be helpful in measuring vectors:

length(x)
size(x)

The length command returns the number of elements of a vector. The size command gives the
dimensions of the vector (number of rows and columns). These can be helpful later on when you are
writing more complicated functions and computer programs where you don't know these lengths
outright (or don't want to repeatedly define them). For example, if you wanted to generate a vector that
has the same dimensions as another vector, you can use the length or size commands as follows:

ones(size(x))
zeros(size(x))
rand(size(x))

In each of the above examples, the result from the size(x) evaluation is then used as an input to the
ones, zeros, or rand command. What do those commands do? You might be able to guess based on
their outputs, but anytime you don't know what a function does, MATLAB has two commands for that:
help and doc. The help command gives a summary of the help file that is associated with that
command and displays it in the Command Window. The doc command actually opens a popped-out
help window that usually includes all the information that the help command gives, plus more
details and examples. Try it:

help rand
doc rand

Using information from the help or doc files (including examples on those pages!), can you now
Create a vector of 100 random numbers between 0 and 1?
Create a 5-by-4 matrix of random numbers between 0 and 1?
Create a 5-by-4 matrix of random numbers between 0 and 10?
Create a 5-by-4 matrix of random numbers between 11 and 21?

Matrices are just an extension beyond vectors. They can be created in similar ways as before. To make a
matrix from scratch, simply enter the matrix one row at a time and separate the rows with
semicolons:

A =[1 2 3; 4 5 6; 7 8 9];

If you make a mistake and enter different numbers of elements in each row, MATLAB will flag an error.

The ones, zeros, and eye commands are helpful for automatically generating large matrices of specific
types (all ones, all zeros, or an identity matrix of mostly zeros but ones down the diagonal).
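
Try these in the Command Window to see each one at work:

ones(2,3)   % a 2-by-3 matrix of all ones
zeros(3)    % a single input makes a square matrix: 3-by-3, all zeros
eye(4)      % the 4-by-4 identity matrix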

Once a matrix has been created, the colon operator takes on another important function: it is also used
as the command for "everything." (In programming, we call this "overloading" an operator, which
makes the same character/function do different things in different situations.) MATLAB also recognizes
the end command when dealing with matrices to mean "the last one." We can use these commands
together with parentheses to call out specific elements from a matrix.

Define the matrix A as above, then try these:

x=A(3,2) should assign to x the value in the third row and second column of the matrix A.

x=A(:,3) should assign to x the value of every row in the third column of the matrix A.

x=A(end,end) should assign to x the value of the last row and last column of the matrix A.

A(3,2) = 19 should reassign the value of the third row and second column of A to now be 19.

MATLAB stands for Matrix Laboratory. It is a computer program that was made with the ability to work
with matrices explicitly in mind. All the usual rules of matrix arithmetic are followed in MATLAB: the +, -,
and * operators will perform matrix addition, subtraction, and multiplication on matrices. Remember
that we don't exactly "divide" matrices, but instead multiply them by their inverse.

There are many ways to do this in MATLAB, but here are two:

(1) Use the inv command. If you are solving the matrix equation Ax=b, in MATLAB you can define A
and b and then type x=inv(A)*b to solve for x.
(2) Use the backslash operator. MATLAB will automatically solve the system Ax=b if you type A\b.
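
For example, assuming a matrix A and column vector b are already defined, both of these lines produce
the same solution:

x1 = inv(A)*b;   % explicit inverse, then multiply
x2 = A\b;        % backslash; generally preferred (faster, more accurate)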

MATLAB is also able to easily and quickly compute all the properties of matrices we care about. Recall
from earlier in this chapter that we can learn more about a matrix system and its solution if we know its
rank, determinant, and/or condition number. Those commands are straightforward:

rank(A) % matrix is of full rank if this equals the number of rows/columns
det(A)
cond(A)

What does the evaluation trace(A) do? Consult MATLAB Help.

In addition to operating in the Command Window, MATLAB features an Editor that allows you to write
complete MATLAB programs. The Editor can be accessed in a number of ways, but it's probably easiest
to just click the New Script icon under the Home tab (or hit Ctrl+N on the keyboard). If MATLAB
doesn't do it by default, it's probably convenient to dock the Editor window and drag it to sit above the
Command Window, like in Figure 2 below.

Figure 2: MATLAB Desktop with Editor

The contents of the Editor window are executed in the order they are typed, one after the other, only
when the script is run. In order for the script to be run, it must be saved. MATLAB scripts are saved by
default as files with the extension .m. You do not need to include the extension when you save the file.
The same rules for defining variables apply for naming scripts: they should be unique names and not
the same as another variable or already-existing MATLAB function. No spaces. No punctuation marks.
No spaces. Also, no spaces. Don't put a space in your file name.

To run a script, type the name of the script in the Command Window, or, more easily, click the giant
green triangle marked Run in the top center of the Editor tab.

The Command History can be really helpful here: if you have already entered content in the
Command Window that you know works, you can drag and drop lines from the Command History
directly into the Editor window. Do not put multiple lines from the Command Window on one line in the
Editor. This is important, so to reiterate: the script is executed in the order it is written, one line after
the next. You cannot use a variable until after you have already defined it, for example. We'll look more
closely at scripts next week, but you will need to at least copy and paste commands from your
Command History into a script to successfully complete this week's lab deliverable.

When you write scripts, you should include comments. These are notes to yourself, or to the instructor,
or to other people who might use your computer program (but if it's an assignment, not to other people
in the course, because sharing computer programs is cheating and seriously wrong), and should include
things like units for variables, so you keep track of them, or statements about why you wrote a certain
expression a certain way. To write a comment, type the percent sign, %; the text past it will turn
green, which is MATLAB's way of saying it will ignore that text when it comes to running the script.
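
For example, a short commented script might look like this (a sketch; the numbers are arbitrary):

% idealgas.m
% Computes the moles of an ideal gas from pressure, volume, and temperature.
P = 1.0;       % pressure, atm
V = 22.4;      % volume, L
T = 273.15;    % temperature, K
R = 0.08206;   % gas constant, L*atm/(mol*K)
n = P*V/(R*T)  % moles; no semicolon, so MATLAB displays the result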

So, let's get to the real reason we are here today, and finally put MATLAB to use solving linear systems.

First, let's practice by putting the following set of equations into matrix form:

2x+3y-5z=9

5x+6y+z=-1

x+2y-z=5

Define the left-hand matrix as A and the right-hand vector as b in MATLAB, and solve this system. What
are the values of x, y, and z? How did you figure them out? Double-check your solution by substituting
those values back into the equations (is there a fast way to do this in MATLAB?).
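
If you get stuck, here is one possible setup (but try it yourself first):

A = [2 3 -5; 5 6 1; 1 2 -1];
b = [9; -1; 5];
soln = A\b    % the values of x, y, and z, in order
A*soln        % a fast check: this product should reproduce b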

If you are given a matrix, do you recognize what system of linear equations that matrix represents?

You should now be able to solve any matrix equation of the form Ax=b using MATLAB. You should be
able to create a matrix equation if you are given a system of linear equations, or if you are given a
problem that allows you to develop those linear equations yourself. By way of one last example, we will
revisit a problem from ENCH 215:

In a given plant operation, four streams are mixed to give a single fifth stream with a desired
composition. The four inlet streams to the mixer and the final product stream have the compositions
shown below:

Composition in mass %
Stream Number    H2SO4    HNO3    H2O    Inerts
1                80       0       16     4
2                0        80      20     0
3                30       10      60     0
4                10       10      72     8
5                40       27      31     2

Determine the mass flow rate of each individual stream to make 2000 lbm/hr of final product.

For this problem, let the mass flow rates of streams 1-4 be your four unknowns. There are four material
balances to write for this problem, which should fully define the problem (no need to do a formal
degree of freedom analysis this time; that would involve 20 variables and 20 equations, which is a bit
much). Your four equations should be of the form f(m1,m2,m3,m4)=constant, so you can turn that into a
matrix equation and solve using MATLAB. Seek help in class if you do not figure out how to code this problem!

2.7.1 Practice Problems

1) Below is a chemical engineering problem(1) and complete solution to show the order of
computations necessary to solve the problem. Write a MATLAB script that outlines the solution
to this problem by defining all constants upfront, then solving for each intermediate value in the
given order, and finally displaying the results clearly in the Command Window. Prepare your
MATLAB script as it is required for this class (refer to the Code Standard distributed in class and
on Blackboard).

When you are finished, save the MATLAB script, and, making sure that the Current Directory is
the same as the folder where your script is saved, enter in the Command Window:
publish('filename','pdf')
where filename is replaced with the name of your MATLAB script (do not include the .m
extension of the file when using the publish command).

Problem: A liquid mixture of benzene and toluene contains 65 mol% benzene. The mixture is
partially evaporated to yield a vapor containing 76 mol% benzene and an exiting liquid containing
55 mol% benzene. For a feed rate of 100 mol/h of the initial mixture, determine the molar
flowrates of benzene and toluene in the liquid and vapor streams.

Solution:
Total mole balance: moles in = moles out
n_total = n_V + n_L
Benzene balance:
z_ben n_total = y_ben n_V + x_ben n_L
Substituting,
z_ben n_total = y_ben n_V + x_ben (n_total - n_V)
Solving for n_V,
n_V = n_total (z_ben - x_ben)/(y_ben - x_ben) = 47.6 mol/h
Back-substituting to get n_L,
n_L = n_total - n_V = 52.4 mol/h
Benzene in liquid stream: x_ben n_L = 28.8 mol/h
Benzene in vapor stream: y_ben n_V = 36.2 mol/h
Toluene in liquid stream: (1 - x_ben) n_L = 23.6 mol/h
Toluene in vapor stream: (1 - y_ben) n_V = 11.4 mol/h

(1) Adapted from Felder and Rousseau, Elementary Principles of Chemical Processes, 3rd edition.

2) One way to approximate the flow field in a rectangular channel is to partition it using a grid and
approximate the velocity at each point in the channel based on the velocities of the four points
adjacent to it in the grid.

Consider the grid below, divided into five lines in the x-direction and four lines in the y-direction.
We want to solve for the velocities at the six grid points in the middle (circled).

It is known that the velocity at a given point v(x,y) depends on the velocities at the point to the
left v(x-1,y), the point to the right v(x+1,y), the point above v(x,y+1), and the point below
v(x,y-1) by the relationship

Set up a system of linear equations that relates the velocities at the six circled points to each
other in the form of a matrix equation Av=b, assuming that the velocities on all of the edges are
zero. Define the matrix A and vector b in MATLAB, then solve for the six unknown velocities.

Show your work using a well-commented code in MATLAB, which you will then publish to .pdf
format. Refer to the Coding Standard on the Blackboard site for rules regarding variables,
comments, and cell structure. Refer to MATLAB documentation for help with the publish
command as needed.

3 Algorithmic Thinking
Required Reading/Viewing:

Chapters 4-6, 9-10, and 13 of the free version of the MATLAB Interactive Tutorial from Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html.

Recommended for Further Consideration:

Sections 4.2-4.3 in Pratap's Getting Started with MATLAB. If you are particularly uncomfortable with
MATLAB, work through Sections 2.3, 2.4, 2.5, and 2.7 this week.

Chapters 2-4 in Attaway's MATLAB: A Practical Introduction to Programming and Problem Solving.

Chapter Goals

Convert a flowchart or pseudocode into a working computer code. You must be able to read
flowcharts and pseudocodes to understand the work of others. Being literate in flowcharts and
pseudocodes will aid in your understanding of experimental and numerical methods.

Create an algorithm in terms of the basic operations of a computer. In order to create a successful
computer code from scratch, you should know what a computer can and cannot do, and be able to
think in terms of conditional statements and loops.

Describe an algorithm in a way appropriate to technical communication. In addition to being able to
read flowcharts and pseudocodes, you must be able to write them yourself to clearly communicate your
ideas with others.

3.1 The Six Operations of a Computer

All computer functions can be boiled down to being one or a combination of the six following tasks, the
six basic operations of a computer:

INPUT: Reading information. Examples include loading or opening a file, or prompting the user for some
sort of entry: for example, a button press, some numerical information, or some textual information.
OUTPUT: Writing or displaying information. Examples include playing a tone or sound and displaying
text or numbers.
REMEMBER: Storing information to memory. Examples include saving data to a file or defining a
variable.
COMPUTE: Perform calculations with information. Lots of examples here: adding, subtracting,
multiplying, and dividing are the most basic ones.
DECIDE: Compare a piece of information to another to determine how to proceed. Examples here
include comparison operators like greater than, less than, and equality. In textual computer language,
decisions usually involve some sort of if structure.
REPEAT: Perform a task multiple times (loop through multiple iterations). In textual computer language,
repetitions are typically achieved using a for loop or a while loop.

So far, we have used MATLAB to solve linear algebraic systems. This has involved defining variables
(input and memory), doing arithmetic with those variables (computation), and displaying the results of
those arithmetic operations (output). Section 2.9 of Pratap discusses ways to save and load data.

While we haven't explicitly instructed MATLAB to make decisions or repeat a procedure, the truth is the
innocent-looking backslash operator that we use to solve the system Ax=b is actually a complicated
sequence of computer operations in which the rows and columns of A are manipulated in a particular
order according to the values inside that matrix: a complicated, but not difficult, combination of
computations, decisions, and repetition.

A decision (sometimes called a conditional statement) in computer programming is the result of a
comparison. For example: "If the temperature outside is below 32 degrees Fahrenheit, then falling
precipitation is snow. Otherwise, falling precipitation is rain." When writing out decisions, it is
sometimes helpful to indent your text to more clearly show there are multiple possibilities:

There is precipitation falling from the sky.
IF the temperature is less than 32 degrees,
    THEN it is snowing.
IF the temperature is not less than 32 degrees,
    THEN it is raining.
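
In MATLAB syntax (which we will practice in the next exploration), this decision might look like the
following sketch, assuming a variable tempF already holds the temperature:

if tempF < 32
    disp('It is snowing.')
else
    disp('It is raining.')
end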

A repetition in a computer program is more typically called a loop. There are basically two flavors of
loops: the for loop and the while loop. A for loop is a computer instruction to perform a task a
certain number of times.

FOR the next ten people in line
    Let the person into the store

A while loop is based on a condition (and, in that way, is kind of a combination of a decision and a
repetition):

WHILE there is room in the store
    Let the next person in line into the store

Which kind of loop to use depends on the situation. Notice that the two examples above differ in ways
that are slight but potentially important.
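
For reference, here is how those two loops might look as MATLAB sketches (the store numbers are
invented for illustration):

% for version: let exactly ten people in
for person = 1:10
    disp('Let the person into the store')
end

% while version: keep letting people in while there is room
capacity = 50;             % hypothetical store capacity
inside = 47;               % hypothetical current head count
while inside < capacity
    disp('Let the next person in line into the store')
    inside = inside + 1;   % update the condition so the loop can end
end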

When working with loops in MATLAB, you must conclude the set of commands inside the loop with an
end.

The above examples and indented texts are examples of pseudocode. A pseudocode is a textual
sequence of information that describes an algorithm: basically, a computer's plan of attack for
working with information, or a detailed sequence of instructions for accomplishing a task. The most
detailed computer algorithms provide a sequence of computer operations broken down to the level of
the six fundamental operations. Algorithms are usually represented in one of two ways: as a list of
directions like a pseudocode, or in a graphical representation known as a flowchart.

An example of an algorithm is something we've run through several times in ENCH 215. Let's take the
general procedure for solving material balances, written as a pseudocode:

Choose a basis.
Draw a flow diagram and label all the known and unknown variables.
Express the problem statement in terms of labeled variables.
If multiple kinds of units or quantities are given (i.e., moles and mass), convert them all to one basis.
Perform a degree of freedom analysis.
If the number of degrees of freedom is zero, write equations in an efficient order (avoiding simultaneous equations
when possible).
Solve the equations.
Calculate the quantities requested by the problem statement if they have not already been calculated.

Notice that some steps of this algorithm pretty clearly correspond to one basic operation ("choose a
basis" is just inputting a value) while others are much more complicated ("solve the equations" is a
series of decisions and calculations) and others are explicitly steps with multiple actions ("if this is true,
do that"). These more complicated steps of the problem-solving algorithm are examples of subfunctions
(or subroutines). Step 4, for example, could be broken down into parts to convert every quantity to moles
and written as this pseudocode:

For every quantity given,
INPUT: Check to see if the current quantity is in moles.
DECIDE: If the quantity is already in moles, move on to the next quantity.
DECIDE: If the quantity is not in moles, check to see what the units are.
DECIDE: If the quantity is a mass,
COMPUTE: Divide the mass by the molecular weight.
DECIDE: If the quantity is a volume,
DECIDE: If the quantity is an ideal gas,
COMPUTE: Use the ideal gas law to convert to moles
DECIDE: If the quantity is a liquid or solid,
COMPUTE: Multiply by density and divide by molecular weight.
REPEAT the above until all quantities are converted.

Typical pseudocodes do not explicitly list the six basic operations of a computer, but I have added them
above to emphasize that each step is essentially one of these six operations. When you write your own
pseudocodes, they should at least be broken down into simple enough parts that you can easily
convert them into code suitable for the computer program you are using.
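
As an illustration, here is one way the pseudocode above might be converted into a MATLAB function;
the function and variable names are my own invention, not part of the required material:

function n = toMoles(quantity,units,MW,rho,phase,T,P)
% Convert a quantity to moles, following the pseudocode above.
% MW: molecular weight (g/mol); rho: density (g/L);
% T: temperature (K); P: pressure (atm); units and phase are strings.
if strcmp(units,'moles')
    n = quantity;                % already in moles; nothing to do
elseif strcmp(units,'mass')
    n = quantity/MW;             % mass in g divided by molecular weight
else                             % otherwise, units are a volume in L
    if strcmp(phase,'gas')
        R = 0.08206;             % L*atm/(mol*K)
        n = P*quantity/(R*T);    % ideal gas law
    else
        n = quantity*rho/MW;     % liquid/solid: multiply by density, divide by MW
    end
end
end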

Pseudocodes are sometimes easier for people to use to help write actual computer code, but for others,
a more visual approach to computer codes is necessary. Enter the flowchart! Sadly, as a simple trip to
ilovecharts.tumblr.com will attest, next to no one makes a proper flowchart. So, the next section
explains how it's done.

3.1.1 Practice Problems

For each of the following numerical methods, break the process down into the six basic operations. Note
there are multiple valid responses to each of these.

1) Compute the quotient and remainder for a division problem in which one number does not
perfectly divide into the other.

2) Determine the tip that should be put on a meal at a restaurant, allowing for the tip to be larger
if service was especially good.

3) Determine how many terms of the Taylor series expansion for exp(x) around x=0 must be added
together such that the error between this expansion and the actual value of exp(0.5) is smaller
than some specified tolerance.

3.2 Flowcharts

A major reason we took the time to break down computer algorithms into the six basic operations is
that they easily lend themselves to flowcharts. Flowcharts use differently-shaped blocks to
communicate whether a block is an input/output, decision, or computation. The basic shapes of a
flowchart include the following:

The start and end of a flowchart are marked with an oval:

Input/output (reading/writing) is represented with a parallelogram:

Computations (and sometimes memory) are denoted with a rectangle:

Decisions are marked with a diamond:

The results of decisions, the flow of the algorithm, and the presence of repeating (loops) is shown using
arrows.

Except for diamonds, only one arrow exits from each block in a flowchart. It's possible that multiple
arrows may point to the same block. There are multiple ways to denote multiple arrows into one block
(the arrows may merge before entering the block or not); whatever makes the flowchart clearer in any
given case is preferred.

Step 4 from the problem-solving procedure in the last section can be transformed from pseudocode to
flowchart as shown below:

[Flowchart for Step 4: start → decision "Are there quantities to consider?" If no, end. If yes, read the
current quantity and ask "Is the quantity in moles?" If yes, move to the next quantity and return to the
first decision. If no, ask "What are the dimensions?" If mass, divide by molecular weight. If volume, ask
"What is the phase?" If liquid, multiply by density (then divide by molecular weight); if gas, use the ideal
gas law. In every case, move to the next quantity and return to the first decision.]

The purpose of both pseudocodes and flowcharts is to present information related to a computer
algorithm in a consistent way that is clear to understand. Sometimes, for complicated problems, it is
helpful to envision a computer code in this way. Note that it is not always necessary to break down a
command into its most fundamental computer operations; often, listing more general ideas (or
subfunctions) is sufficient.

When working through computer algorithms, if you are stuck, first try to represent what you want to do
as a flowchart or pseudocode. (If you work with other students, it is fine to share flowcharts and
pseudocodes; it is academic misconduct to share computer files.)

3.2.1 Practice Problems

1) A classic example in computer programming (and one we will code very soon) is the game
where one person is thinking of a number and prompts you to guess it, indicating if your guess is
too high or too low. Describe the best strategy for winning this guessing game. Write this
strategy as a flowchart.

2) The internet is full of examples of flowcharts, but rarely are they formatted correctly. Find a
flowchart online and redraw it to use the proper shapes/text/arrows.

3.3 MATLAB Exploration: Scripts, Functions, and Visualization

Reminder: It's in your best interest to watch the following videos before coming to lab.

Chapters 4-6, 9-10, and 13 of the free version of the MATLAB Interactive Tutorial from Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html.

It is also helpful to have read Chapters 2-4 in Attaway's MATLAB: A Practical Introduction to
Programming and Problem Solving or Sections 4.2-4.3 in Pratap's Getting Started with MATLAB. If you
are particularly uncomfortable with MATLAB, work through Sections 2.3, 2.4, 2.5, and 2.7 this week.

By the end of this exploration, you should be able to

Write MATLAB functions that require one or more direct inputs and produce one or more outputs.
Use commands like disp and plot to display computational results.
Save and load the contents of a MATLAB workspace.
Write a conditional statement in MATLAB.
Create a loop in MATLAB.
Debug MATLAB scripts using built in tools.

Today we will operate primarily using the Editor window in MATLAB. Remember that the Command
Window is best suited for one-line operations, or as a testing ground for a small sequence of actions,
and it is convenient to use because all lines entered are remembered in the Command History. In the
case of more complicated programming techniques, such as input/output, decisions, and repetitions,
the Editor should be used, because many of these techniques require multiple lines to successfully
execute!

Just above the Editor (or Command Window, if the Editor is closed) is your Current Directory. This is
the folder from which MATLAB is currently saving/loading files. You may want to navigate to another
folder as necessary. In order to use commands that operate on MATLAB scripts (like the buttons on
the Editor and Publish tabs), your Current Directory needs to contain the script.

A function is a special kind of MATLAB script that usually requires some sort of input and provides some
kind of output. In order for your script to be recognized as a MATLAB function, the first line in your
script has very specific requirements. Here is the idea:

function [output1,output2] = function_name(input1,input2,input3)

The function script must begin with the word function.

function_name is the name of your function; your script must be saved with the same name.

After a single space, the name(s) of the output variable(s) is (are) given. If there is more than one
output, they must be enclosed in brackets. There can be any number of outputs (even none!).

Immediately after the function name is an open parenthesis, then the input(s). Multiple inputs should
be separated with commas. Close the parenthesis at the end.

So far in this course, we have used functions in passing: MATLAB has hundreds of built-in functions, like
the matrix/vector-creating functions ones and zeros, the mathematical functions sin and exp, and
matrix evaluation ones like det, rank, cond, and trace. Now you will be able to create your own
custom functions to serve your needs.

It is important now to distinguish between a more general script and a function. Here are the major
differences.

Script
Usually written to achieve a specific, limited task.
No specific first line necessary.
No specific last line necessary.
No inputs or outputs absolutely expected in the file. Inputs or outputs may not be required when
the script is run. Everything is assigned directly inside the script.
Generally run directly from the Command Window or using the Run button in the Editor.
Any variables that are assigned during the running of the script will be posted to the Workspace.

Function
Usually written to work for more general tasks, with the idea of needing it again in the future.
Very specific requirements for what must be in the first executable line of your m-file.
It is good programming technique in MATLAB to include the word end at the end of your
function. (Currently, this is not yet mandatory.)
May be written to require any number of inputs and outputs (may be zero, one, or multiple). Inputs
must be specified when running. Outputs must be assigned using the = operator.
Generally run only within other functions or scripts, or in the Command Window.
Any variables that are assigned during the running of a function are used in a private version of the
Workspace and not posted to the Workspace at any time before/during/after running.

Let's look at an example. Say that we want to write a simple script that converts a temperature from
degrees Fahrenheit to degrees Celsius. Enter this script into the Editor and save it, then run it.
% FtoC.m
% Converts a specific temperature from degrees Fahrenheit to degrees
% Celsius.

tempF=77;
tempC=(tempF-32)*5/9;

When the above script is run, notice that the values for tempC and tempF appear in the Workspace. The
script is run by entering its name in the Command Window. Notice that if we wanted to convert another
temperature, we would have to modify the original script itself.

Now enter the following script into a new Editor window and save it (what should you save it as? Refer
to the function declaration description above if you aren't sure!):

function tempC = convertFtoC(tempF)
% Converts an input temperature from degrees Fahrenheit to degrees
% Celsius.

tempC=(tempF-32)*5/9;
end

If you try to run this function using the Run button in the Editor or by typing convertFtoC in the
Command Window, you get an error:

>> convertFtoC
Error using convertFtoC (line 5)
Not enough input arguments.

A function requires an input to be given directly. Try typing this into the Command Window:

convertFtoC(77)

Now the function should have properly run (if not, double-check your m-file to make sure you entered it
correctly and saved it as convertFtoC.m), but notice the differences compared to the FtoC.m script that
you wrote:

There are no values for tempC or tempF in the Workspace, and instead a value for ans, the same
way there was when we did basic math in MATLAB last week. If you want the result of the
function to be stored as another variable, just enter a command like T=convertFtoC(32).
We can change the number in the parentheses after the function name and we get different
correct results immediately, with no need to edit the function file directly.

The function we just wrote has a single input and single output. Let's put together a quick program to
use two inputs and outputs: say, compute the area and perimeter of a rectangle given its length and
width:

function [area,perimeter] = rect(len,wid)
% The inputs are named len and wid so that we do not overwrite
% MATLAB's built-in length function.
area=len*wid;
perimeter=2*(len+wid);
end

Write and save this program, then try entering the following lines into the Command Window. Which
ones run without error? What is the result in those cases?

rect
rect(4)
rect(4,3)
x=rect(4,3)
[a,p]=rect(4,3)

The number of inputs into your custom function must be the same as the number of inputs you
specified in the first line of the function. If a custom function gives more than one output, the outputs
are only collected if you specify you want them all (in this case, just both); only in the last of those five
command lines did MATLAB store both the area and perimeter to the Workspace!

In general, if you call a function that you have written, then the line in MATLAB to execute it is

[outvar1,outvar2]=function_name(invar1,invar2,invar3)

where the number of inputs and outputs should be the same as that in your function m-file.

Now we have explored how MATLAB functions work with inputs and outputs, but before we move on to
more of the basic six computer operations, let's look at two more kinds of output: MATLAB's plot and
disp commands.

The plot command is used to create figures in MATLAB. The most basic call of the plot command first
requires that we define vectors that correspond to the independent and dependent variables. Let's
make a simple plot of y = x^2. You can enter this in a script or in the Command Window:

x=0:10;
y=x.^2; % why do we need a dot in front of the ^?
plot(x,y)

If you enter just one set of vectors, x and y, into the plot command, you get MATLAB's default
decisions of a solid blue line that basically connects the dots. You can dress up the plot command
with lots of other inputs:

plot(x,y,'ko',x,y,':g','LineWidth',3)
xlabel('This is the x-axis')
ylabel('This is the y-axis')
legend('Points','Line','Location','Southeast')

If you had to guess what xlabel, ylabel, and legend do, you'd probably guess correctly. The doc
command reveals legend can be a little more interesting: in addition to labeling different plots in order,
you can use supplemental arguments like 'Location' to tell it where to place the legend on the plot. Try
moving it to the upper left corner by rewriting that last line.

The other issue with MATLAB plots is that the label/axis fonts are tiny, especially when you copy and paste
into Word (use Edit > Copy Figure). Choose View > Property Editor in the Figure Window (wait for it to
load), then click on the graph axis, click the Font tab, and bump that font size up to at least 16 points.

The plot command can be used in a lot of different ways, but the basic idea is that you can plot
multiple data sets at once (as long as the paired vectors are the same length!) by simply entering them
one after the other with the same plot command (assuming you have vectors x1, y1, etc., defined first):

plot(x1,y1,x2,y2,x3,y3)

and so on. In addition, you can specify line type, color, and markers by typing them in single quotation
marks immediately after the y-axis data for a given set (type doc linespec to see the possible line
type, color, and markers MATLAB uses and see a more complicated plot example).

The other option for plotting multiple data sets at once is to use MATLAB's hold command. This
instructs MATLAB to leave the current figure open and to plot on top of it. The above plot command for
three sets of data is equivalent to writing

plot(x1,y1)
hold on
plot(x2,y2)
plot(x3,y3)

The disp command is used to display text directly in the Command Window. This is helpful to show
values in the Command Window while a script or function is running. The entry format is simply

disp('insert text here')

disp also works with variable values, but it will not mix variable types. If you want to make MATLAB
display a sentence announcing the result of a computation, you need to convert that result to text:

x=sind(90);
disp(['The sine of 90 degrees is ',num2str(x)])

Several important things to point out from just this disp line:

Text in MATLAB is specified by putting the contents in single quotation marks.
To convert a numerical variable to a text variable (MATLAB calls it a string), you use the num2str
command. (There is a similar str2double command that converts text to a number if that text
actually is a numeral.)
To display multiple pieces of text with one disp command, you need to make a vector of text, using
square brackets. If you use multiple disp commands, MATLAB will write to multiple lines.

MATLAB has a number of built-in functions related to storing information for use at a later time.
Within MATLAB, there are the save and load commands, which can be used to save/load the contents
of the Workspace (or the private Workspace if used within a function). MATLAB also has the ability to
save/load directly to spreadsheets using the xlsread/xlswrite or csvread/csvwrite commands.
We'll talk a little more about these later, but be sure to use save and load as necessary in your work.
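
For example (the file name here is arbitrary):

save('mydata.mat')       % writes every Workspace variable to mydata.mat
save('mydata.mat','x')   % or, save just the variable x
clear                    % clears the Workspace...
load('mydata.mat')       % ...and load brings the saved variables back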

Another of the six basic computer operations is deciding. You have likely worked with computer
decisions before; this is where the idea of an if/else statement comes into play. If one condition
holds, we want our program to do one thing, and if not, we want it to do another. In MATLAB, these
statements are written as follows:

if condition
    disp('this')
else
    disp('that')
end

The condition is a logical expression that evaluates to either true or false. MATLAB has a variable type
called logical that can only take on one of two values: true or false (in the Workspace, they'll be listed
as 1 or 0).

Try entering the following lines in the Command Window, and keep an eye on the Workspace:

x=2;
x<1;
condition=x>3;
y=true;
z=isprime(5);

Notice that any time a comparison is involved, the result is a variable of the logical type. So really,
what's going on in that if/else/end structure is MATLAB is checking to see if condition is a logical 1 or a
logical 0. If it is a 1, MATLAB will execute, in order, the instructions between if and else (in our
example above, it will display this in the Command Window), then skip the instructions
between else and end. If condition is a 0, MATLAB will skip the stuff between if and else, and
execute the lines in between else and end instead (in our example, it will display that).

There are lots of different conditional operators, and you've been using many of them in math class
probably since before you can remember (<, >, ==, etc.). In addition, MATLAB has a lot of built-in
commands that also produce true/false values, like the isprime command. We can have MATLAB
check to see if a variable is an integer, a numeric value, a character string, and many other things; type
doc is* into the Command Window for the complete list.

The Command Window is a nice testing space for conditional statements, but we need to use the
Editor to execute if/else structures. So, let's try writing a basic MATLAB function called grader that
accepts a numerical value as an input and writes a letter grade as an output (and for practice, what must
the first line of your MATLAB file be? It is incomplete below).

function ___?___ = ___?___(___?___)

if score>=1700
grade='A';
end
end

(There are two ends because the first end is the end of the if structure, and the second end is the
end of the program. If you click on the word end in your m-file, MATLAB will briefly underline it and its
corresponding beginning. Try it: click the first end and see that if gets underlined; click the second
end and see that function is underlined.)

Save your m-file (make sure its name is grader!) and then try using the function in the Command
Window (watch the Workspace as always):

grade1=grader(1750)

What happens if we didn't get an A, though?

grade2=grader(1478)

If we think about this program the way it is written, all it does is assign a text value of A if the input is
1700 or greater. If the input is less than 1700, MATLAB skips over the stuff between if and end, and the
function is finished! Sounds like we need an else.

if score>=1700
grade='A';
else
grade='not A';
end

Now try assigning grade2 again. It at least saves a value, but not A isn't very helpful. It would be nice
to be able to assign the right letter grade in the right condition. MATLAB allows for multiple if
statements at once using the keyword elseif. The last condition still just gets the word else.

if score>=1700
grade='A';
elseif score>=1500
grade='B';
elseif score>=1300
grade='C';
end

Can you finish the if/elseif/else structure by adding more elseif lines to assign letter grades from A to F?

While conditional statements are useful in making certain decisions in computer programming (if you
are running a computer simulation on a reactor at an unrealistic pressure or temperature, you might as
well have a simple if statement to skip over the rest of the program and not waste your time), they are
much more useful when it comes to ending a repetitive task. Repeating, another of the six basic
computer operations, is done in MATLAB by using loops. The general structure is as follows:
while condition
do this
and that
and something to do with the condition
end

The lines in between while and end are executed in order, over and over again, as long as the condition
corresponds to a logical 1. If the condition corresponds to a logical 0 when MATLAB is at the while
line, MATLAB will skip over all the lines from while to end. It's really important to include some
instruction between while and end that can potentially change the condition on the while loop, or
else MATLAB will continue to execute those lines between while and end FOREVER.

If you get stuck in an infinite loop, click in the Command Window and press Ctrl+C on the keyboard.
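
Here is a minimal while loop to try before the bigger example below: it keeps doubling a value until the
value exceeds 1000. The doubling line is the instruction that eventually changes the condition.

value=1;
while value<=1000
    value=value*2;    % this line eventually makes the condition false
end
value                 % displays 1024, the first power of 2 above 1000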

Later in the course, we'll use while loops to run a computational procedure until our answer is
acceptable. A simple example of that is this guessing game, where the computer chooses a number
between 1 and 10 and prompts the user to guess what it is. Try saving this in your Editor:

number=randi(10); % What does this function do? Check the help file!
guessedright=false;
disp('I''m thinking of a number between 1 and 10!')

while ~guessedright % could also write while guessedright==false


userguess=input('Guess what it is: ');
if userguess>number
disp('Nope, it''s lower than that!')
elseif userguess<number
disp('Nope, it''s higher than that!')
else
guessedright=true;
disp('You got it!')
end
end

Lots of stuff going on here. Do you understand these details?

- While (and if) again work by checking if the rest of the line after the word while (or if) is true or
false. Usually you will put an expression there to evaluate (like guessedright==false), but if you have
a variable that you know is true or false, you can just put that variable instead!
- The ~ operator in front of a logical variable means "not," so ~true means false.
- The input command is a quick and dirty way to prompt the user for an input. It's not particularly
elegant (or useful) in complex computer programs, so unless you are writing a program where you
seriously want to prompt the user for an input somewhere in the middle of the program, it's better to
just write a function that requires the input upfront.
- If you want to use the disp command to display an apostrophe, type a double apostrophe (not a
quotation mark).
- It's always a good idea to make sure all your if/elseif/elses cover all possible options. Here, the
guess must be too high, too low, or right: there is no fourth option.
- Notice that even though we changed guessedright to true, the program still displays "You got it!"
afterwards. That's because MATLAB will execute commands in order. It is told to display "You got it!"
before it gets back to the start of the while loop; in other words, the while condition doesn't make
MATLAB immediately break out of the while loop when the condition changes.
In the case that you know exactly how many times you want a loop to be repeated, a for loop is the
way to go. The idea is very similar to the while loop:

for index=first_value:step:last_value
do this
and that
end

In the above general formulation, the index is MATLAB's counter. It starts by setting index equal to
first_value, then it executes the lines between the for and end, then it adds step to the index and
executes the lines again, and continues to do this until the index exceeds last_value.

If you don't put a step on the for line, and instead write

for index=first_value:last_value

MATLAB will automatically assume a step of 1, which is often what you want to do anyway.
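
The step can even be negative if you want to count down. A quick sketch to try:

for k=10:-2:0
    disp(k)    % displays 10, 8, 6, 4, 2, 0 in turn
end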

Another nice thing about for loops is that you can use the value of index directly in your program. For
example, we could write a custom function that reads every value in a vector and displays the
maximum. Try writing and saving this function in a new Editor window:

function m = maximumvalue(vec)

m=-Inf;                    % initialize lower than any possible entry
for index=1:length(vec)
    if vec(index)>m        % found a new largest value so far
        m=vec(index);
    end
end
end

A couple other MATLAB tidbits here:

- MATLAB recognizes infinity (and negative infinity) as a value. This is helpful in initializing a variable
that you need later; for instance, when you are computing the error in an iterative procedure (see
Chapter 6), you probably want to set the initial error to positive infinity, since you never really know how
large the initial error might be!
- You can have MATLAB do some math or evaluate a function directly in the line with the for loop. In this
case, I don't know how many elements might be in the input vector, but I can easily have MATLAB
determine its length.

Try the function out!

>> maximumvalue([1 2 3])


>> maximumvalue([1 50 -12 0 0 0 7])

Okay, one more thing this week. During your MATLAB adventures so far, I hope you have run into some
error messages. Those messages aren't there just to make you angry, confused, or distraught, with their
cold, red text and sometimes confusing language. The truth is, we all make errors in computer
programming, all the time. When you get an error, try to interpret what it means. If there is an error in
one of your MATLAB scripts, do you see that MATLAB points out which line in your code is causing the
mistake? Can you tell why MATLAB stopped working? (Are your variables defined? Did you write a
custom function but save it in the wrong folder? Is there just a typo?)

Sometimes error messages are just too confusing to understand, but MATLAB has another tool to help
you in your quest: Debug Mode. Debug Mode lets you walk through a script, one line at a time, to see
how your code is working. In the Editor, there is a button marked Breakpoints. Once you set a
breakpoint in a script and try to run it, Debug Mode becomes active, and all the options to the right of
the Breakpoints button change. To set a breakpoint, click a line in your code, then click "set/clear
breakpoint" on the Breakpoints menu. A red circle should appear on the left side of that line. Then, if
you click Run, MATLAB will run the code, but stop when it gets to the red circle.

From then on, you must click the step button to give MATLAB permission to go to the next line. Try
running the programs we wrote this week in Debug Mode and watch as MATLAB manipulates values in
the Workspace, one line at a time. You can exit Debug Mode at any time by clicking Quit Debugging.

When you're done, be sure to clear all breakpoints so that MATLAB doesn't go into Debug Mode every
time you use that script. And again, Ctrl+C breaks you out of a function if you don't want to debug it all.

Whew! That was a lot, but if you are patient and diligent, I know you can make all this stuff work for
you. Let's put it to the test with this week's MATLAB problem.

3.3.1 Practice Problems

The randn command in MATLAB generates random numbers according to a normal distribution (the
rand function assumes all numbers are equally likely, which is a uniform distribution, not what we
want for this simulation!). Demonstrate that you can create a vector of numbers that are normally
distributed and that you can visually display them in a histogram.
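
As a quick sketch of the basic idea (using a small made-up example, not the values the problems below
call for):

x=randn(1,1000);   % 1000 values from a standard normal (mean 0, std 1)
hist(x,20)         % histogram of those values using 20 bins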

1) Your boss at the FCC sends you an email this morning, asking you to troubleshoot their chemical
reaction simulation software. In order for the simulation to predict a successful reaction for a
molecule, the activation energy of that molecule must be greater than or equal to 75 kJ/mol, but
at any time the energy of an individual molecule may be much lower or higher than that. In
order to make sure that the current software is working correctly, you need to compare it to a
computer simulation that actually works, so you set off to write one in MATLAB.

a) Use the randn command to generate 10,000 numbers that are normally distributed with a
mean of 75 and standard deviation of 10. (Don't display this; suppress it with a semicolon.
You'll need it for part (b).)
b) Create a histogram of the random numbers that you just generated by partitioning them into
bins of width 1 (i.e., count the number of values between 40 and 41, 41 and 42, 42 and 43, etc.,
then plot them in a bar graph).
[Hint: check out the MATLAB commands hist, histc, and bar; using one or two of these
built-in functions should make your life a lot easier. If you find the hist-type commands
confusing, you can also use a combination of loops/conditionals to achieve the same result,
which might be good practice anyway.]

2) In determining the likelihood of profitability for a new chemical plant design, a series of
complicated random number generations is used to simulate thousands and thousands of
possible scenarios for changes to costs of equipment, raw materials, and processing, as well as
changes to revenue or the amount of down time the plant experiences after starting up.

The Finance division of the FCC has already spotted you as the jack of all trades that you have
quickly become, and has asked you to create a new random number simulation script using
MATLAB. According to one forecast, the Net Present Value of a new business venture has a
mean of 5 million USD with a standard deviation of 4 million USD. They assume the
distribution of actual scenarios is normal.

a) Use the randn command to generate 10,000 numbers that are normally distributed with a
mean of 5 and standard deviation of 4.
b) Create a histogram of the random numbers that you just generated by partitioning them into
bins of width 0.5 (i.e., count the number of values between 0 and 0.5, 0.5 and 1, 1.0 and 1.5,
etc., then plot them in a bar graph). Note that the lower limit of your histogram will probably be
negative. You may find the hist and/or bar commands to be helpful in generating your
solution. Don't forget to use MATLAB to publish your work.
c) Out of your 10,000 numbers, estimate how many of them are negative (which corresponds to a
loss of money). What is the probability that this investment will lead to a loss?

4 Descriptive Statistics and Basic Probability Theory
Recommended for Further Consideration:

Section 5.3 in Pratap's Getting Started with MATLAB or Section 13.1 in Attaway's MATLAB: A Practical
Introduction to Programming and Problem Solving.

Chapters 1 and 3 of Navidi's Principles of Statistics for Engineers and Scientists, 1st edition, or any other
statistics text.

Chapter Goals

Describe a set of data or a specific datum relative to that set of data by using an appropriate
collection of statistics, which may include measures of central tendency, variation, and position. This
helps in your technical communication by clearly and concisely describing a lot of data at once.

Compute the probability of a result for a given mathematical experiment. Your understanding of
probability is crucial to your ability to make statistical claims on an experiment.

Create a probability distribution for a numerical experiment. Your ability to work with probability
distributions is crucial to statistical hypothesis testing, so you should be able to both read and write
probability distributions.

4.1 Descriptive Statistics

You've probably already worked with descriptive statistics in the past. Simply put, this is the
organization and display of data. The other major component of statistics, and the one to which we'll
pay more attention, is inferential statistics, the use of statistical computations to draw conclusions about
a population. The rest of this chapter is organized to discuss frequency distributions, the three classes of
statistical measures, and basic probability theory.

4.1.1 Frequency Distributions

A frequency distribution is a table that shows a count of data corresponding to specific classes or
intervals. The frequency of a class is the number of data entries in that class. The cardinality of a data set
is the total number of entries, or the sum of the frequencies. For example, a list of the number of A's,
B's, C's, D's, and F's received on an exam is a frequency distribution. The classes are the letter grades
and the frequencies are the number of students receiving those grades.

90-100%: 22 students; 80-90%: 15 students; 70-80%: 20 students; 60-70%: 3 students

Sometimes we are interested in percentages, or relative frequencies of classes. These are the frequency
values divided by the cardinality. In the list above, the cardinality is 60, and the relative frequency of the
90-100 class is 22/60 ≈ 0.37. Obviously we expect the sum of relative frequencies to be one.

The main graphical display of a frequency distribution is called a histogram. In a histogram, the classes
(sometimes called bins) are listed in increasing order on the horizontal axis and the (relative)
frequencies are marked on the vertical axis. A histogram is essentially a bar chart with the added
specification that adjacent bars must touch.

Other types of frequency graphs include stem-and-leaf plots, pie charts, and scatter plots, all of which
do the same thing: present a frequency distribution in graphical form.

4.1.2 Measures of Central Tendency

A measure of central tendency is a single value that represents a typical entry of a data set.

The mean of a data set is the average value of the entries. We represent the mean of a population with
μ and the mean of a sample with x̄. A weighted mean of N values x_1, x_2, …, x_N is given by

\bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}

where w_i is the weight of an entry. A typical use of a weighted mean from chemical engineering is the
molar mass of a mixture of gases, like air. In computing this molar mass, the molar mass of each
component of air is multiplied by its mole fraction (when simplified, this is 0.79 times 28 g/mol for
nitrogen added to 0.21 times 32 g/mol for oxygen, for a weighted mean of 29 g/mol). The mean of a
frequency distribution is given by the very same formula, actually, except now the frequency f of a data
value takes the place of the weight w.

The median of a data set is the middle entry of the set after all values have been arranged in numerical
order. If the cardinality of the data set is even, the median is the mean of the two middle values.

The mode of a data set is the entry that occurs with the greatest frequency. If no entry occurs more than
once, there is no mode for that data set. If two (or three, or more) data entries occur with the greatest
frequency, each entry is a mode and we call the set bimodal (or trimodal, or... you get the idea).
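
MATLAB has built-in functions for all three measures. A quick sketch with a small made-up data set:

data=[2 3 3 5 7 10];
mean(data)      % 5
median(data)    % 4, the mean of the two middle entries (3 and 5)
mode(data)      % 3, the most frequent entry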

An outlier of a data set is an entry that is far from any measure of central tendency. Depending on your
data processing, you may be able to choose to remove such an entry from the set.

One last comment about measures of central tendency: we can compare one measure to another, or
look at a histogram, to add some qualitative comments to a data set. For example, we call a data set
symmetric if a vertical line can be drawn down the middle of a histogram and the halves are roughly
mirror images. If the histogram looks roughly rectangular, that is, the frequency of each class is
approximately the same, then the data is approximately uniform. If the histogram appears to be bell-
shaped, the frequency distribution may be normal; more on that type of distribution later. One final
qualitative description of a frequency distribution is its skew. If the median is greater than the mean, we
say the data is left-skewed, or negatively skewed. If the median is less than the mean, the data is right-
or positively skewed. (Chances are, in a histogram, that the data set will have a tail in the direction in
which it is skewed.)

4.1.3 Measures of Variation

The range of a data set is the difference between the maximum and minimum values of a set.

The deviation of a single entry x in a population data set is the difference between its value and the
mean, that is, x − μ. If you sum the deviations of all entries in a set, you will get zero (and if this is not
obvious, take a moment to convince yourself). That's part of the reason we square deviations in order to
get usable measures of variation.

The variance of a data set is computed differently, depending on whether you are working with the
population or a sample. The variance of a population is denoted σ² and computed as

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2

where N is the number of values in the entire population. The variance of a sample is denoted s² and is
calculated similarly, but the denominator is n − 1 instead of N, where n represents the sample size.

One notable disadvantage to working with variances is that they are not very intuitive. Take a look at the
units on σ². What's a square dollar? Does a square inch here carry the same meaning as an area? Not
exactly. That's why we work more with standard deviations than variances. The standard deviation of a
data set is simply the square root of the variance. It's denoted s for a sample and σ for a population.
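
MATLAB computes both measures directly. One detail worth a sketch: by default, var and std use the
sample formulas (dividing by n − 1), and a second argument of 1 switches to the population formulas
(dividing by n).

data=[2 4 4 4 5 5 7 9];
var(data)       % sample variance, about 4.571 (divides by n-1)
var(data,1)     % population variance, exactly 4 (divides by n)
std(data)       % sample standard deviation, about 2.138
std(data,1)     % population standard deviation, exactly 2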

For a normal distribution, there is an empirical rule, often called the 68-95-99.7 Rule, which says that
roughly 68% of data lies within one standard deviation of the mean, 95% within two standard
deviations, and 99.7% within three standard deviations. (Note again another example of the heuristic
"for all practical purposes, four is infinity"!) So the question you may have is, when do we know we're
working with a normal distribution? We'll discuss this in Section 4.4.

For any distribution of data, the Russian mathematician Pafnuty Chebyshev proved that the fraction of
data lying within k standard deviations of the mean is at least 1 − 1/k², provided k > 1. (For example, at
least 1 − 1/2² = 75% of any data set lies within two standard deviations of its mean.) This applies to all
distributions, but when you know more about your specific probability distribution, you can be more
specific than this.

4.1.4 Measures of Position

While central tendencies and variations are more useful for statistical computations and inference,
there is a third type of statistical measure called a measure of position.

A fractile is one of a set of values that partition the data set into groups of roughly equal size. Most
commonly used are quartiles, which divide the data set into four parts. The second quartile is the
median of the data set. The first quartile is the median of the data between the minimum and the
median. The third quartile is the median of the data between the median and maximum. We may
denote the ith quartile as Qi.

The interquartile range of a data set is the difference between the third and first quartiles, that is,
Q3 − Q1.

The five-number summary of a data set is the minimum, three quartiles, and maximum. You may have
used these numbers in the past to create what is called a box-and-whiskers plot, which I won't discuss
here, so don't worry if you don't know what that is.

Finally, the most important measure of position is the z-score, sometimes called the standard score. The
z-score of a data entry x is the number of standard deviations that x is away from the mean. The formula
is

z = \frac{x - \mu}{\sigma}

and in many cases the population mean and population standard deviation can be replaced by their
respective sample mean and sample standard deviation.

4.1.5 Practice Problems

1) Find the mean, median, mode, and standard deviation of the set of integers between 1 and 20.

2) A data set has a mean of 7 and a standard deviation of 1.2. What is the z-score of the value 9?

3) A data set has a mean of 4 and a standard deviation of 0.5. What value will have a z-score of -3?

4.2 Introduction to Probability

Much like how I recommend for you to take an actual linear algebra class in the chapter of this packet
on linear algebra, I'm also going to suggest taking engineering statistics (or ENCH 459, our department's
upper-level course in statistics) if you want more background on probability. For the purposes of ENCH
225, I'm going to try to give you just what you need to know to do the experimental analysis required in
our core curriculum. Virtually everything we touch on here will matter in the physical lab component of
this course, of your senior chemical engineering lab, or both. I have included two optional sections
past this one to hit on concepts of probability that are related but not required for this course.

As we'll see in the next chapter, the probability computations we care about are all embedded within
inferential statistics. In rough terms, we want to be able to quantify how certain we are of a given result.
One way to do this (and what you've probably done up to now) is report results as a mean and standard
deviation. Perhaps you have computed more accurate error bars (plus or minus one standard
deviation is sometimes useful, sometimes you need more than that); we'll hit that topic in the next
chapter. Another way to do this is to compare your results to literature (we'll call this a one-sample
test, because you are collecting one set of data and comparing it to an already established value), or to
compare one set of results to another (a two-sample test). We want to be able to answer the
question, "Based on our experiment and data, what is the probability that our result is different from the
literature (or different from another related result)?"

In order to answer this question, we must become somewhat skilled at using probability distributions.
By the end of this course, we will have looked at four different kinds of probability distributions. The one
you've probably seen before is the normal distribution, also known as the Gaussian distribution, or the
bell curve. There is a separate handout on our course website that gives a table of values from this
distribution.

The formula for the probability density function of the normal distribution is given by

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

where μ and σ are the mean and standard deviation of the distribution. For a normal distribution, the
mean, median, and mode are equal. Further, the normal distribution is symmetric. Essentially, the mean
determines the location of the distribution's peak, and the standard deviation determines how
spread out the distribution appears.

As is typical in engineering, we'd like to standardize and nondimensionalize the normal distribution so
that we only have a standard normal distribution to worry about. So, we replace the datum x with its z-
score z, and the formula for the standard normal distribution becomes

f(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right)

If a data set is normally distributed, then the most likely observation (single piece of data) you would
select, if you were to choose at random, is the mean and the median. 68% of all data lies within one
standard deviation of the mean (do you see why reporting just plus or minus one standard deviation is
only sometimes useful?), 95% of all data lies within two standard deviations of the mean, and 99.7% lies
within three standard deviations. Also, if data is normally distributed with known mean and standard
deviation, we can determine the exact percentile of any given piece of data, that is, what percentage of
the data has values that are less than or equal to that particular point.

For a probability density function fP(x), the probability that x lies between xL and xU is

P(x_L \le x \le x_U) = \int_{x_L}^{x_U} f_P(x)\,dx

For a normal distribution, we have a scary looking function to integrate, and one we can't do
analytically. However, that function has proven important for a number of applications, so it's been well
studied, and statistics books include numerical tables of integrals of the standard normal distribution.
Computer programs like Excel and MATLAB also have built-in functions that provide these numerical
values. You can check the course files in Blackboard for my table (this is what I will give you for exams),
or consult another statistics book or (for once) a website like Wikipedia.
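
To spot-check a table value in MATLAB, the base function erf can be used to build the standard normal
integral from -Inf to z; if you have the Statistics Toolbox, normcdf does the same thing directly.

z=1.5;
0.5*(1+erf(z/sqrt(2)))   % about 0.9332, matching the z-table entry for 1.5
% normcdf(1.5)           % same value, if the Statistics Toolbox is installed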

A brief note on reading your typical z-table. You've definitely encountered tables like the z-table before.
Table B.3 in Felder and Rousseau does the same "read the number off the left and the decimal off the
top" thing to help display the vapor pressure of water for a wide range of temperatures. Along the
leftmost column is a range of z-values from 0.0 to 3.4, and along the topmost row is a range from .00 to
.09. The idea is that for any z-score from 0.00 to 3.49, you find the appropriate row for the tenths place in
the z-
score, then trace to the right to the appropriate column for the hundredths place. The number in this
position is the integral of the standard normal distribution from z_L = −∞ to the z-score of interest.
Further, because the normal distribution is symmetric, if you need the value for a z-score between −3.49
and 0, you simply subtract the table value for the positive z-value from 1. For most practical purposes,
the integral for values greater than 3.49 is 1 (and for values less than −3.49 is 1 − 1 = 0).

4.2.1 Practice Problem

Determine the area of the normal distribution (a) between z = −∞ and z = −2, (b) between z = −1 and
z = 3, and (c) for the region greater than z = 1.5.

4.3 (Optional Section) More on Probability Theory and Combinatorics

An experiment is an action that produces a specific result, where this result is not necessarily the same
every time. This result is called an outcome. The set of all possible outcomes is called the sample space.
An experiment that produces a single outcome is called simple. A subset of the sample space (that is, a
collection of some of the outcomes) is called an event.

The theoretical probability (or classical probability) of an event E, written P(E), assuming all outcomes
are equally likely, is the number of outcomes in E divided by the total number of outcomes. (Example:
the probability of flipping a fair coin twice and getting heads exactly once is 2/4.)

The statistical probability (or observed or empirical probability) of an event E is the frequency at which E
is observed divided by the total number of observations. (Example: if I flipped a fair coin 100 times and
got heads exactly 49 times, the probability is 49/100.)

The Law of Large Numbers basically claims that if an experiment is repeated enough, the statistical
probability will approach the theoretical probability.

The complement of an event E is the subset of all outcomes that are not outcomes in E. In set notation,
if we write the sample space as S and the complement as E′, then

E′ = S \ E.

Use of the complement is helpful when it's easier to compute the probability of E′ than that of E,
because P(E) = 1 − P(E′).

A conditional probability, written P(B|A) and read "the probability of B given A," is the probability of an
event (B) occurring provided that another event (A) has already occurred. The probability of two events
A and B occurring is P(A&B) = P(A)P(B|A).

Two events are called independent if the occurrence of one does not affect the probability of the
occurrence of the other. If A and B are independent, then P(B|A) = P(B) and P(A|B) = P(A). Using this
definition, it should be straightforward that the probability of two independent events occurring is
simply the product of their probabilities.

Two events are called mutually exclusive if they cannot occur simultaneously.

Combinatorics is the mathematics of counting. This is where you'd learn how many different ways you
could hand out nine pieces of candy to four people, or figure out the number of socks you need to draw
from your drawer in the dark to make sure you have a match. Why do we need to waste our time
learning how to count? Because we need to know how to find the total number of outcomes for a
probabilistic event, of course!

Fundamentally, if event A can occur in j ways and event B can occur in k ways, then the number of ways
events A and B can occur sequentially is j times k.

The factorial of a number n, denoted n!, is the product of all integers from 1 to n. The factorial is also
defined for n = 0 such that 0! = 1. The factorial is used to compute permutations, or ordered
arrangements of objects. The number of different permutations of n objects is n!, and by the definition
of factorials, the number of ways we can arrange zero things is one.

Sometimes we wish to know the number of permutations on a set where we are selecting a subset of all
possible items. For example, how many seven-digit numbers can we write without repeating a single
digit? Since there are ten possible digits to choose from and we are selecting seven, we can write this
permutation computation as 10P7 and compute it according to the following formula, given for the
number of permutations on n objects taken r at a time:

{}_{n}P_{r} = \frac{n!}{(n-r)!}
Sometimes the items we want to arrange are indistinguishable. For instance, what if we wanted to count
how many seven-digit numbers we could write while allowing for repeated digits? If a digit is repeated,
is it different the first time it is used compared to the second time? (Is 1234566 different than 1234566?
But I switched the 6s! Nope, it's the same.) So, we can compute the number of distinguishable
permutations on n objects (where n_1 are of one type, n_2 are of another type, …, and n_k are of
another type, with n = n_1 + n_2 + … + n_k) as

\frac{n!}{n_1!\, n_2! \cdots n_k!}

When order doesn't matter, we are computing combinations, which are numerically equivalent to
binomial coefficients (more on those soon). The number of combinations of n objects taken r at a time,
stated "n choose r" for short, is written and computed as

{}_{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}
Binomials are covered more in the next section.
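
MATLAB has built-in functions for these counts. A quick sketch checking the seven-digit example above:

factorial(10)/factorial(3)   % 10P7 = 604800 ordered seven-digit selections
nchoosek(10,7)               % 10C7 = 120, when order doesn't matter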

4.4 (Optional Section) Computations with Probability Distributions

Probability distributions are analogous to frequency distributions (and actually, if you consider the
definitions in prior sections, you could say that a relative frequency distribution is an observed
probability distribution).

A random variable represents a number that in turn represents the outcome of a probability
experiment. Such a variable is discrete if there is a finite number of possible outcomes and continuous if
there is an infinite number of possible outcomes. (More often than not, but not always, a discrete
variable is limited to integers, while a continuous variable can represent any real number.)

A probability distribution displays the probabilities of all potential values that a random variable can
assume. A probability distribution is discrete if the variable in question is discrete, and continuous if the
variable is continuous. There are two conditions that all probability distributions satisfy. First, each value
of the random variable must have a probability P such that 0 ≤ P ≤ 1. Second, the sum of all discrete
probabilities (or the integral over all continuous probabilities) must be one. If either of these conditions
is violated, then the distribution is not a probability distribution.

4.4.1 Binomial Probability Distributions

A discrete probability distribution can be visualized as a relative frequency histogram. They are
effectively the same thing. In fact, we have a lot of definitions that are analogous to descriptive statistics
definitions, but they are stated here for completeness. The mean of a discrete random variable is
analogous to the weighted mean of a data set. That is,

\mu = \sum_{x} x\,P(x)

where P(x) is the probability that the result x occurs. The mean of a random variable is also referred to
as its expected value (denoted as E[X]) or its first moment.

The variance of a discrete random variable is computed by

\sigma^2 = \sum_{x} (x-\mu)^2\, P(x)

and the standard deviation is the positive square root of the variance. The variance of a random
variable is sometimes also called its second central moment (denoted as E[(X − μ)²]).

We will devote our attention to one discrete probability distribution in particular: the binomial
distribution. This is the probability distribution of a binomial experiment. In this case, the same
experiment is repeated for a fixed number n of trials. In each trial, there are two outcomes, which are
traditionally labeled as success S and failure F, but it can be heads/tails, 0/1, on/off, whatever. Each
trial is identical; that is, the probability P(S) is the same in each trial. The random variable x is then the
count of successful trials.

Example: Flipping a fair coin 10 times. The outcomes are either heads (H) or tails (T). In this case,
P(H)=P(T). The variable x represents the number of times heads turns up, and can take on any whole
number value from 0 to 10.
Example 2: Drawing a playing card out of a standard deck of 52 cards and returning the draw to the deck
and shuffling between trials, repeating 6 times. Denote two outcomes, say, hearts (H) or not hearts (N).
In this case, P(H) = 0.25, P(N) = 0.75, and x is a whole number between 0 and 6.

In a binomial experiment with n trials, probability of success p, and probability of failure q = 1 − p, the
probability of having exactly x successes is

P(x) = {}_{n}C_{x}\, p^{x} q^{n-x}
The binomial distribution is simply the collection of all probabilities of all possible outcomes x. We could
plot them all together in a histogram to get a good visual idea of the probability distribution. You may
notice for certain values of p and n that the distribution can look symmetric, or even bell-shaped! More
on that in a moment

Using the definitions for mean, variance, and standard deviation of a probability distribution, we can
compute these values for a binomial distribution. If this were a math class, I'd have you show this result,
but we're engineers, so you can have the result for free! The equations become μ = np, σ² = npq, and
σ = √(npq).
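
To see the shape of a binomial distribution for yourself, here is a sketch that computes the distribution
for n = 10 and p = 0.7 (the same numbers as the reactor example in Section 4.4.6) and plots it as a bar
chart:

n=10; p=0.7; q=1-p;
P=zeros(1,n+1);                        % probabilities for x = 0 through n
for x=0:n
    P(x+1)=nchoosek(n,x)*p^x*q^(n-x);  % binomial probability of exactly x successes
end
bar(0:n,P)                             % histogram-style view of the distribution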

4.4.2 Uniform and Normal Probability Distributions

There are five kinds of continuous probability distributions that we will use this semester. Three of
them, the t-distribution, the chi-square distribution, and the F-distribution, are used nearly exclusively
for confidence intervals and hypothesis testing, so we'll talk about them in those sections. The other
two, the uniform and normal distributions, can be used directly for probabilistic calculations, so they are
discussed here.

Much like the concept of Riemann sums from calculus, a continuous probability distribution is essentially
the limit of a discrete distribution as the number of discrete random variables tends to infinity. In the
same way, you can imagine a histogram (discrete probability distribution) being divided into thinner and
thinner bars on the number line until we have a continuous probability distribution, often called a
probability density function (PDF). Remember that there are two properties of a probability distribution
that are still valid here: the PDF never takes on a negative value, and the integral of the PDF over all
random variables is equal to one.

One major difference between discrete and continuous probability distributions is how the probability
of a single value behaves. For example, consider the experiment of flipping a coin five times, where x is
the number of heads. For every valid value of x (0, 1, 2, 3, 4, or 5), the width of the bar in the histogram
is 1 unit, and the probability corresponding to x (height of bar times width of bar) is nonzero. The
probabilities for all other values of x are zero (it's impossible to have 6 heads, or -1 heads, or 2.3 heads!).
In continuous distributions, probabilities correspond to ranges of values. Now consider a second
experiment where a computer program generates a real number between 0 and 5 inclusive. Every real
number from 0 to 5 is a valid possible result, but the probability corresponding to each value (still the
height of bar times width of bar) is actually zero (because the width of the "bar" is now zero). So the
probability that the computer program chooses exactly 1.5 is zero, but the probability that the computer
program chooses a number between 1 and 2 would be nonzero.

Was that last bit confusing? It will make sense with some examples; I've tucked them all into Section
4.4.6.

The uniform distribution assumes that all random variables are equally likely. The PDF is a horizontal
line, and its height depends on the range of random variables, since the integral (area of the rectangle,
effectively) is equal to one. A uniform distribution has a mean equal to its median, which is the value
halfway between the maximum and minimum, and a variance equal to the square of the range divided
by 12.

The normal distribution is the familiar bell curve. The formula for the PDF of the normal distribution is
given by

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

where μ and σ are the mean and standard deviation of the distribution. For a normal distribution, the
mean, median, and mode are equal. Further, the normal distribution is symmetric. Essentially, the mean
determines the location of the distribution's peak, and the standard deviation determines how
spread out the distribution appears.

As is typical in engineering, we'd like to standardize and nondimensionalize the normal distribution so
that we only have a standard normal distribution to worry about. So, we replace the random variable x
with its z-score z, and the formula for the standard normal distribution becomes

f(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right)

4.4.3 Normal Approximation of Binomial Distribution

In the case of a large enough sample size n and a particular success rate p, the binomial distribution
(which, as you will recall, is discrete) can be approximated as a normal distribution (which is
continuous). The main advantage of making this approximation is that we can extend ideas that involve
the normal distribution to discrete problems! As a general rule of thumb, the approximation is valid
when np ≥ 5 and n(1 − p) ≥ 5, but the larger np and n(1 − p) are, the better.

To convert a binomial probability distribution into a normal distribution, there are four main steps:
(1) Confirm that np ≥ 5 and n(1 − p) ≥ 5. If you can't figure out what n, p, and q = 1 − p are for your
particular problem, you might not have a binomial distribution!
(2) Compute the mean and standard deviation for your binomial distribution. Remember, μ = np
and σ = √(npq).
(3) Choose the value of x necessary to answer the posed question. This is probably the trickiest part
because we have to keep in mind the difference between discrete and continuous probabilities.
In a binomial distribution where x only takes on integer values, we can directly compute the
probability P(x = 7). For a continuous distribution, we have to select a probability on a range of x-
values, say P(6.5 ≤ x ≤ 7.5).
(4) Convert the x-value you chose to a z-value. This is straightforward (see Section 4.1.4).

4.4.4 The Central Limit Theorem

We have established that discrete and continuous distributions are related through changing the size of
the discretization of a probability distribution, but we haven't made this particularly useful yet. We need
to mention just a few more definitions, and then we can establish a widely used statistical theorem.

When samples of size n are repeatedly taken from a population probability distribution, and a specific
statistic of each sample is calculated, we call this a sample statistic. If we compute the probability
distribution of a sample statistic, we call this a sampling distribution of the sample statistic.

The central limit theorem, which we will not prove, but we will certainly use, is arguably one of the most
important theorems in statistics. It states that
- If samples of size n ≥ 30 are drawn at random from any population with mean μ and standard
deviation σ, then the sampling distribution of sample means is approximately a normal
distribution. The larger the number n, the better the approximation.
- If a population is normally distributed, then the sampling distribution of sample means is
normally distributed regardless of n.

In either case, if we take samples of size n from a population probability distribution, the mean of the
sample means, denoted μ_x̄, will be equal to the population mean μ. The standard deviation of the
sample means, denoted σ_x̄, is equal to the population standard deviation σ divided by √n. The standard
deviation of the sampling distribution of the sample means (whew!) is called the standard error.

What is so important about this theorem, you may ask. Most of the time, we can't know the true mean
and standard deviation of a population, because the population is too large for this to be practical. What
the central limit theorem tells us is that if we randomly sample the population and compute the mean
and standard deviation of that sample, we can approximate the mean and standard deviation of the
entire population! We'll look at examples in the next section.

4.4.5 Calculations with Probability Distributions

So we can graph probability distributions. So what? What are they good for? Well, now we can use
these distributions in order to perform probabilistic calculations.

Often we are interested in computing the probability that the random variable x is greater than a certain
value. (Alternatively, we may want the probability that x is less than a certain value, or even that x is
between two values.) Probability distributions make it easy to find this information.

In terms of a discrete random variable, phrases like "at least," "at most," "exactly," and "no more than"
are important to determining our inequality. Say we are performing a binomial experiment 10 times.
- "At least 4" means {4, 5, 6, 7, 8, 9, 10}.
- "More than 4" means {5, 6, 7, 8, 9, 10}.
- "No more than 4" means {0, 1, 2, 3, 4}.
- "At most 4" means {0, 1, 2, 3, 4}.
- "Exactly 4" means {4}.
Notice that sometimes the value 4 was included in our set of interest, and other times it wasn't. This
distinction can be important in discrete probabilities. On the other hand, these distinctions don't matter
in a continuous distribution, because theoretically, every single value of a continuous variable has a zero
probability of occurring. It is only when we consider ranges of continuous variables that we have a
non-zero probability.

4.4.6 Probability Examples

Here are some examples to finally put all this information to use.

Example 1. A chemical reactor is operational 70% of the time. The project manager chooses to observe
the reactor at random times over the course of the day. If he checks on the reactor 10 times in one day,
what is the probability that the reactor will be operational at least 7 times?

Solution 1. This is clearly a binomial distribution. Either the reactor is operational or not. By the phrasing
of the question, we want to know the probability that the reactor will be operational 7, 8, 9, or 10 times.
Those four probabilities can be computed separately, then added together.
P(7) = 10C7 (0.7^7)(0.3^3) = 0.2668
P(8) = 10C8 (0.7^8)(0.3^2) = 0.2335
P(9) = 10C9 (0.7^9)(0.3^1) = 0.1211
P(10) = 10C10 (0.7^10)(0.3^0) = 0.0282
P(x ≥ 7) = P(7) + P(8) + P(9) + P(10) = 0.6496
So the probability is about 65%.

Remark. Note that if we were asked for the probability that the reactor is operational exactly 7 times,
we computed that already too; it's about 27%.
Remark. Note that we cannot use the normal approximation in this example because n = 10, p = 0.7, and
q = 0.3: nq = 3 is less than 5.
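
If you want MATLAB to check this arithmetic, a short loop using only base functions does it:

total=0;
for x=7:10
    total=total+nchoosek(10,x)*0.7^x*0.3^(10-x);   % add P(x) for x = 7,...,10
end
total    % about 0.6496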

Example 2. A chemical reactor is operational 70% of the time. The project manager chooses to observe
the reactor at random times over the course of a month. If he checks on the reactor 100 times in one
month, what is the probability that the reactor will be operational at least 70 times?

Solution 2. This is a binomial distribution with a large number of trials (n = 100). Since p = 0.7 and q =
0.3, both np and nq are greater than 5, so we can normally approximate this. We compute μ = np = 70
and σ = (npq)^(1/2) = 4.583, and choose x = 69.5 since we want to include all numbers greater than or
equal to 70. Then our z-score is z = (69.5 − 70)/4.583 = −0.11. We look up the value for z = 0.11, which is
0.5438. This is our answer, 54.4%.
Remark 2. Actually, we wanted the value for z = −0.11, right? So we should take 1 − 0.5438 to get
0.4562. However, this is the probability that z < −0.11, and we want the probability that z > −0.11, which
means we need 1 − 0.4562, which is 0.5438 again. (Remember that the z-table gives us the integral from
z = −∞ up to the z we care about.)

Example 3. A random number generator produces some real number between 1 and 10 such that the
probability of any number being generated is the same. What is the probability that the number is
less than 5?

Solution 3. This is a uniform continuous distribution. The range of values is 10 − 1 = 9, and the integral
from 1 to 10 must be 1, so the PDF is fP(x) = 1/9. Then the probability that x is less than five is
(1/9)(5 − 1) = 4/9.

Remark 3. Even though fP(x) = 1/9, the probability of a single value x occurring is zero, by the same
application of the formula: (1/9)(x − x) = 0.

Example 4. A deck of cards with face cards and aces removed is shuffled and a card is drawn at random.
What is the probability that the card has a value greater than 8?

Solution 4. This is a single event, not requiring a probability distribution. But we can still easily obtain the
answer. Our possible results are discrete, and they are 2, 3, 4, 5, 6, 7, 8, 9, and 10. Of these, only 9 and
10 are greater than 8. Since 9 and 10 are two of the nine options, the probability is 2/9.

Remark 4. Look for the signs that a probability distribution is necessary! If it's not explicitly mentioned,
or an experiment isn't repeated, or there isn't an infinite number of possible results, we may not need a
distribution calculation at all.

Example 5. The life expectancy of a certain brand of automobile tire is normally distributed with a
population mean of 30000 miles and a population standard deviation of 3000 miles. What is the
probability that a tire will last less than 25000 miles? More than 33000 miles?

Solution 5. The z-score for 25000 is (25000 − 30000)/3000 = −1.66. From the z-table, the probability that
z ≤ 1.66 is 0.9515. So the probability that z ≤ −1.66 is 1 − 0.9515 = 0.0485, or about 4.9%. The z-score for
33000 is (33000 − 30000)/3000 = 1. From the z-table, the probability that z ≤ 1 is 0.8413. So the
probability that z ≥ 1 is 1 − 0.8413 = 0.1587, or about 16%.

Remark 5a. Another way to solve this one: remember the 68-95-99.7 rule of thumb. We know that the
area under the normal distribution from z = −1 to z = 1 is 0.68, so the area from z = 0 to z = 1 is 0.34. The
area from z = −∞ to z = 0 is 0.5, since the distribution is symmetric about z = 0. So the area from z = −∞
to z = 1 is 0.5 + 0.34 = 0.84 (using the table, we found that it's 0.8413). Using this rule of thumb can save
time at the cost of precision if you're comfortable with it.

Remark 5b. We can also compute the probability that the tire will last between 25000 and 33000 miles
by taking the p-value for z = 1 and subtracting the p-value for z = −1.66.

Example 6. The life of a particular brand of batteries can be represented with a population normal
distribution with mean 10 hours and standard deviation 1 hour. What battery life marks the 75th
percentile of all batteries? (What value will only 25% of batteries manage to exceed?)

Solution 6. The p-value we care about is 0.75, which falls between 0.7486 and 0.7517, the p-values
corresponding to z = 0.76 and z = 0.77. So let's say z = 0.765. Then x = zσ + μ = 0.765(1) + 10 = 10.765
hours.

Remark 6. When we know the probability P and need the z-score, we usually have to linearly interpolate
it from the z-table.

Example 7. Samples of twenty-five of the batteries described in the prior problem are taken. What are
the mean and standard deviation of the mean battery life of such a sample?

Solution 7. By the central limit theorem, the sample means of a normal distribution are normally
distributed no matter what n is. In this case, n = 25. The mean is equal to the population mean, or 10
hours. The standard deviation is 1/25^(1/2) = 0.2 hours.

Remark 7. We could answer further questions about probability of this sample the same way as in
Examples 5 and 6.

Example 8. One hundred samples of the concentration of a process reactor are taken. These samples
have a mean concentration of 2 moles per liter and a standard deviation of 0.01 moles per liter. What is
the mean and standard deviation of the entire reactor?

Solution 8. By the central limit theorem, the mean is 2 moles per liter. The standard deviation of the
sample means is σ/n^(1/2) = 0.01, so σ = 0.01(100)^(1/2) = 0.1 moles per liter.

5 Introduction to Inferential Statistics
Recommended for Further Consideration:

Sections 4.3, 5.4, 6.1, 6.2, 6.4, 7.3, and 7.4 of Navidis Principles of Statistics for Engineers and Scientists,
1st edition, or any other statistics text.

Chapter Goals

Compute confidence intervals (error bars) for a given experimental result or statistic. Error analysis is
critical to virtually all experimental settings. It also communicates your understanding of experimental
results.

Create a hypothesis for a given experimental objective.

Make statistical claims related to a set of data using formal hypothesis tests. All scientific and technical
communication about experimental findings requires statistical analysis. You need to understand what
makes a claim statistically significant.

Descriptive statistics are used to characterize existing data. Inferential statistics make use of those
characterizations to make statements regarding a population based on just a sample from that
population. You've run into inferential statistics in a variety of ways, inside and outside of science and
engineering. Opinion polls that express a margin of error are using similar computational methods to
scientists who are printing error bars on graphs.

If you've done any lab work or seen any experimental reports, you've probably encountered error bars
on graphs. Error bars denote the probable maximum and minimum value some computed data could
actually assume if an experiment were repeated several times. The more times a researcher successfully
completes an experiment, the smaller these error bars can become. Usually, the report will declare a
point-value estimate of a data computation and include these error bars as an interval-value estimate.
An interval is just another word for a range of values.

5.1 Confidence Intervals

Margins of error and error bars are examples of confidence intervals. A confidence interval is a range of
numbers that is supposed to contain a population parameter (or, more generally, any calculated
quantity) with some predetermined accuracy. The level of confidence 1 − α is the probability that the
confidence interval contains the population parameter. Typical values for 1 − α are 0.90, 0.95, and 0.99
(meaning α is 0.1, 0.05, and 0.01, respectively, in these cases).

Typically, when reporting experimental results, we are interested in giving a confidence interval on the
mean value of a particular finding. Depending on the experiment, it may also be important to compute
confidence intervals on other statistical measures; sometimes, we wish to provide a range of values for
a variance or standard deviation, and in binomial experiments, we may try to bound a proportion. For
the rest of this section, we will run through the math necessary for each of those situations.

5.1.1 Confidence Intervals on the Mean

Often when you hear that something is "statistically significant," it means some specific mathematical
analysis was conducted. Such analysis can include comparing confidence intervals or running a
hypothesis test (more on the latter in Section 5.2 and beyond). We can make statistical claims on
virtually anything. The simplest place to start, though, is the population mean.

It's important to note here that these procedures make one very important assumption: either the
variable of interest is known to be normally distributed or the number of samples taken is greater than
30. If neither of these statements is true, we cannot use these techniques. If one or the other is true,
then we can.

For Large Samples or Known Population Standard Deviation

The critical value x_c of a confidence interval is the value of a probability distribution such that
P(−x_c ≤ x ≤ x_c) = 1 − α, where x is the standardized measure that we're interested in (this
will either be z, from the normal distribution, or t, from the distribution described in the next
subsection). The only distribution we've discussed so far for which this applies is the standard normal
distribution, where the critical value is denoted z_c, since the horizontal axis of this distribution is the
z-score.

The width of a confidence interval is determined by the maximum error of an estimate. For a given α,
the maximum error E is the largest difference between the point-value estimate and the actual value of
a population parameter:

E = z_c \frac{\sigma}{\sqrt{n}}

When n ≥ 30 we can use the sample standard deviation s instead of the population standard deviation σ,
per the Central Limit Theorem. That's helpful, since the population standard deviation is virtually never
known.

Using this formula for E, if you have some idea of the mean and standard deviation of your data, you
should be able to estimate n for a desired maximum error. This would tell you how many samples you
need to take (or how many experiments you need to run) in order for your error bar to have a specified
width.

Since the typical values for 1 − α are 0.9, 0.95, and 0.99, it's helpful to remember the critical values z_c
for these three values. They are 1.645, 1.96, and 2.575, respectively.

The c-confidence interval for the population mean is

\bar{x} - E \le \mu \le \bar{x} + E

where E is the maximum error of the estimate. We can write confidence intervals using a bracket
notation like [x̄ − E, x̄ + E] or as a central value plus or minus the error, like x̄ ± E. The probability that μ
is contained within this interval is 1 − α = c.

Notice that for the population mean, the confidence interval is symmetric. This is because the normal
distribution is symmetric. What we are doing is finding the interval such that the area under the
distribution for this interval is 1 − α and the area under the distribution on either side outside the
interval is α/2.

The confidence interval is a range of values that should bound a specific measure of a data set, not the
entire data set itself. In the case of the confidence interval on the population mean, we are attempting
to bound the mean of the data set. It's easy to forget this sometimes!

Example. Ten samples of the conversion of a reactor with known population standard deviation of 0.02
were taken. The conversions were 0.88, 0.89, 0.9, 0.87, 0.89, 0.88, 0.91, 0.85, 0.88, 0.91. What is the
95% confidence interval for the population mean conversion of this reactor?

Solution. The mean x̄ of these ten values is 0.886. Since σ is known to be 0.02, we can compute the
maximum error E directly: E = 1.96(0.02)/(10)^(1/2) = 0.012. So we can say with 95% confidence that the
population mean conversion is x̄ ± E = 0.886 ± 0.012 = [0.874, 0.898].
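
Here is the same computation as a MATLAB sketch (with the critical value 1.96 typed in directly):

conv=[0.88 0.89 0.9 0.87 0.89 0.88 0.91 0.85 0.88 0.91];
xbar=mean(conv);                   % 0.886
E=1.96*0.02/sqrt(length(conv));    % zc = 1.96, sigma = 0.02, n = 10
[xbar-E, xbar+E]                   % about [0.874, 0.898]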

For Small Samples and Unknown Population Standard Deviation

So far, we have stated that σ is known or n ≥ 30 so that the above definitions and calculations are valid.
Of course, we need to account for the case when neither of those assumptions is true, and therefore,
we need to introduce a new probability distribution.

The t-distribution (sometimes referred to as Student's t-distribution) is the sampling distribution of
sample means x̄. Analogous to the normal distribution, we can compute the t-score for a mean x̄ by

t = \frac{\bar{x} - \mu}{s/\sqrt{n}}
The t-distribution has many of the same properties as the standard normal distribution: it is symmetric,
has a mean, median, and mode equal to zero, and the area under the distribution is one. However,
there is a different t-distribution for every value of the sample size n. In order to use the correct t-
distribution at any given time, you must know the statistical degrees of freedom of your data. The
degrees of freedom for the t-distribution is simply n − 1 in this situation.

Note that statistical degrees of freedom is not exactly the same as the process degrees of freedom we
considered back in Chapter 1 (though they are related in spirit). Mathematically, the degrees of freedom
is the number of elements that must be specified for a vector to be completely defined. In chemical
engineering analysis, having zero degrees of freedom makes a problem completely defined, and nonzero
degrees of freedom gives us some room to define values (design processes). In statistics, having zero
degrees of freedom means you have as many data points as you do statistical values to classify those
data points (which kind of defeats the purpose of statistics, since it aims to characterize data using as
few measures as possible).

The formula for the maximum error E is analogous to the normal case also; just replace z_c with t_c
(and σ with s).

Example. Ten samples of the conversion of a reactor with unknown population standard deviation were
taken. The distribution of the conversions is known to be normal. The conversions were 0.88, 0.89, 0.9,
0.87, 0.89, 0.88, 0.91, 0.85, 0.88, 0.91. What is the 95% confidence interval for the population mean
conversion of this reactor?

Solution. The mean x̄ of these ten values is 0.886. Since σ is unknown, we must compute s as well; this
turns out to be 0.0184. Since we have 10 samples, we have 9 degrees of freedom. For 95% confidence,
t_c = 2.262. So now we can compute the maximum error: E = 2.262(0.0184)/10^(1/2) = 0.013. So we can
say with 95% confidence that the population mean conversion is x̄ ± E = 0.886 ± 0.013 = [0.873, 0.899].
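
And the MATLAB version of this one (tinv requires the Statistics Toolbox, so the critical value is also
typed in directly as a fallback):

conv=[0.88 0.89 0.9 0.87 0.89 0.88 0.91 0.85 0.88 0.91];
n=length(conv);
s=std(conv);                   % sample standard deviation, about 0.0184
tc=2.262;                      % t critical value, 9 degrees of freedom, 95%
% tc=tinv(0.975,n-1)           % same value, if the Statistics Toolbox is installed
E=tc*s/sqrt(n);
[mean(conv)-E, mean(conv)+E]   % about [0.873, 0.899]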

5.1.2 (Optional) Confidence Intervals on Proportions

In addition to means, confidence intervals can be computed on proportions (and therefore any
measurable item where fractions come into play, like mass or mole fractions). A proportion is a
measure of the amount of a set that displays a specific characteristic; typically, either an element of
that set has the characteristic or it doesn't. This on/off, yes/no, true/false type of characteristic means
we have a binomial distribution to deal with! In order to compute a confidence interval, we will have to
use a continuous approximation on a binomial distribution.

A proportion can then be thought of as the fraction of a data set that exhibits a certain characteristic. To
align this idea with the binomial distribution, we could say that "success" is exhibition of the
characteristic and "failure" is the lack of the characteristic. The terms success and failure are used in
the same way they were in a binomial distribution. We (unfortunately?) denote a population proportion
as p (hopefully it will be clear when p is a proportion and when it is a probability. You'd think the first
people to develop this theory would have been more careful) and the sample proportion as p̂ = x/n,
where now x is the number of observed successes in a sample. We also define the variable q̂ = 1 − p̂.

After we have applied the normal approximation to a binomial distribution, we can compute the
maximum error in much the same way that we did for confidence intervals on the mean. A confidence
interval on a population proportion p is then

p̂ − E < p < p̂ + E

where the error E is computed from the standard deviation of the sampling distribution of p̂, so

E = zc √(p̂q̂/n)

Example. In a recent Gallup poll of 1024 U. S. adults, 287 said that their favorite sport was football. Find
the 95% confidence interval for the population proportion of American adults who say their favorite
sport is football.

Solution. First we note that this is a binomial distribution: either football's your favorite sport, or it's not.
Then we compute p̂ = 287/1024 = 0.28 and q̂ = 1 − p̂ = 0.72. Remember that for a normal approximation
to be valid, np̂ and nq̂ must be greater than 5. In this case, np̂ = x = 287 and nq̂ = n − x = 1024 − 287 =
737, so we're good. For 95% confidence, zc = 1.96, so E = 1.96√[(0.28)(0.72)/1024] = 0.028. So our
confidence interval is p̂ ± E = [0.252, 0.308]. This means we can say with 95% confidence that the
proportion of all American adults who say football is their favorite sport is somewhere between 0.252
and 0.308.

5.1.3 Confidence Intervals on Variances

Control and measurement of variance is important in manufacturing. A manufacturer wants to be able
to produce a large number of the same thing with few to no defects or variations among them. (Imagine
getting a box of screws with significantly different lengths when they're all supposed to be the same, or
having one box of cereal taste entirely different than the next!) We won't discuss how to control
variation, since that's outside the scope of statistics (perhaps something you will discuss in process
control?), but we will discuss how to measure this variation.

We would like to construct the confidence interval for a variance. In order to do this, we need to
become familiar with yet another continuous probability distribution. As was the case for confidence
intervals for means, these calculations only work when a specific assumption is valid. In this case, the
measured variable x must be normally distributed.

The chi-square distribution is used to measure population variation compared to sample variation. Here,

χ² = (n − 1)s²/σ²

The chi-square distribution has some of the same qualities as the normal and t-distributions: namely,
the integral of the entire distribution must be one, and the degrees of freedom is equal to n − 1.
However, being a squared value, χ² is obviously always positive, and therefore the distribution is not
symmetric, but rather right-skewed.

Because the χ²-distribution is not symmetric, the confidence interval for variances is not symmetric and
we can't simply compute one error value to add and subtract from the sample variance.

The c-confidence interval for population variance σ² is given as

(n − 1)s²/χ²R < σ² < (n − 1)s²/χ²L

where the two chi-square values, right and left, correspond to a given degrees of freedom n − 1 and
confidence level c = 1 − α. The probability that the population variance σ² is within this interval is c.

Looking up values of χ² is a little more complicated than it was for z or t due to the lack of symmetry in
the distribution, but the idea is still the same: we want the area under the distribution within the
interval to be equal to 1 − α and we want the area on either side of the distribution in the tails to be
α/2.

The same procedure is used to determine confidence intervals for standard deviations. Since the
standard deviation is just the square root of the variance, the endpoints of the confidence interval for
the standard deviation are just the square roots of the endpoints of the confidence interval for the
variance.

Example. We randomly select and weigh thirty samples of a pain medication. The sample standard
deviation is 1.2 mg. Assuming the product weights are distributed normally, what is the 99% confidence
interval for the population variance and standard deviation?

Solution. For 99% confidence, the area from χ²L to χ²R is 0.99. The area in each tail is half of 1 − 0.99, or
0.005. So we need to be looking in the 0.995 and 0.005 columns of the table. We have 30 − 1 = 29
degrees of freedom, so χ²L = 13.121 and χ²R = 52.336. We compute the left endpoint to be
(30 − 1)(1.2)²/52.336 = 0.798 and the right endpoint is (30 − 1)(1.2)²/13.121 = 3.183. So 0.798 < σ² < 3.183,
and taking square roots, 0.89 < σ < 1.78. We can say with 99% confidence that σ² and σ satisfy these
inequalities.
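
If you have MATLAB's Statistics Toolbox available, chi2inv will look up the chi-square values for you; here is a sketch of this example:

% 99% confidence interval on a variance (and standard deviation)
n = 30; s = 1.2; alpha = 0.01;
chiL = chi2inv(alpha/2, n - 1);       % left critical value, 13.121
chiR = chi2inv(1 - alpha/2, n - 1);   % right critical value, 52.336
varCI = [(n-1)*s^2/chiR, (n-1)*s^2/chiL]  % about [0.798, 3.183] for sigma^2
stdCI = sqrt(varCI)                       % about [0.89, 1.78] for sigma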

5.1.4 Practice Problem

Since we have not begun physical experiments yet, we will use our ability to generate random numbers
to perform a simulation instead. Today we want to get a better idea of what exactly a confidence
interval, and a level of confidence, is, so here is a demonstration.

Use MATLAB to do the following one hundred times (Hint: what kind of loop should you use?):

Use the randn command to generate 200 random numbers of mean 5 and standard
deviation 2. Use the built-in MATLAB functions to compute the mean and standard
deviation of this set of 200 numbers, and then compute the 90% confidence interval on
the mean.

A 90% confidence interval should signify that if you performed this experiment an infinite number of
times, the confidence interval you compute will contain the true mean of the population (in this
numerical experiment, that mean is 5) exactly 90% of the time.

Of the 100 times you performed this experiment, how many times did the confidence interval contain
the true mean of 5? (What do you figure is the most probable number of times the confidence interval
contains the true mean? Because of the nature of random number generation, your answer may not be
exactly that value, and that's okay.)
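
If you get stuck, here is one possible outline of the experiment (1.645 is the z-value for 90% confidence; only peek at this after you've tried it yourself):

% Simulation: how often does a 90% confidence interval contain the true mean?
count = 0;                    % intervals that contain the true mean
for trial = 1:100
    x = 5 + 2*randn(200,1);   % 200 random numbers with mean 5, std. dev. 2
    xbar = mean(x);
    s = std(x);
    E = 1.645*s/sqrt(200);    % maximum error for 90% confidence
    if (xbar - E <= 5) && (5 <= xbar + E)
        count = count + 1;
    end
end
count                         % most probably somewhere near 90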

5.2 Single Sample Hypothesis Testing

Hypothesis testing is a crucial component of both inferential statistics and experimental design.
Hypothesis testing is used to make statistically significant quantitative claims about population
parameters. The idea behind hypothesis testing is that we want to use statistical data from a sample of
the population to make statistical statements about the entire population. First we will consider
hypothesis testing on one sample, where we compare a mean or variance to a given value. Then we will
discuss hypothesis testing on two samples, where we compare the means or variances of each sample
and make a claim about them. In Chapter 10, we'll extend the idea to multiple-sample testing.

A hypothesis test is a process that uses statistical data to make claims about population parameters. For
instance, we may show statistically that the mean of a data set is greater than a certain value, or that the
variances of two different data sets are equal.

A statistical hypothesis is a claim about a population parameter. In a hypothesis test, we must explicitly
declare two hypotheses: a claim and its complement. This way, exactly one of the two hypotheses must
be true and the other false.

The null hypothesis H0 is the statement about a population parameter that includes an equality. For a
population parameter μ and a test value k, possible null hypotheses are μ ≤ k, μ = k, and μ ≥ k.

The alternative hypothesis Ha (some textbooks write H1) is the complement of the null hypothesis, and
therefore the claim that does not include an equality. For the three possible null hypotheses listed
above, the respective alternative hypotheses are μ > k, μ ≠ k, and μ < k.

We read H0 as "H-naught" or "H-sub-zero" and Ha as "H-sub-a."

The claim we actually care about may be either the null or alternative hypothesis. Declaration of which
statement is the null one completely depends on where you can write an equals sign. The idea is
illustrated in a couple examples below.

Example 1. A university claims that its four-year graduation rate is 82%. What are the null and
alternative hypotheses for this situation?

Solution 1. Either the rate is 82% or it's not. So H0 is p = 0.82, which is the claim. Then Ha is p ≠ 0.82.

Example 2. Nutritional labels in the United States currently allow the reported amount of trans fats to be
listed as zero grams per serving if the actual amount is less than 0.5 grams. If a food makes this claim,
then, what are its null and alternative hypotheses?

Solution 2. The claim is that the amount is less than 0.5 grams. This is our alternative hypothesis, since it
does not include an equality. That makes our null hypothesis that the food has an amount greater than
or equal to 0.5 grams.

As we will see later, these two examples follow the rules for statistical tests, but ultimately do not lead
to useful results. This is because the point of hypothesis testing is that we will try to refute the null
hypothesis (we call this "rejecting the null hypothesis"). If we can, then the alternative hypothesis is
supported by this data. Do note that I am not saying that we are proving anything. If we cannot reject
the null hypothesis (we literally call this "failing to reject the null hypothesis"), then we cannot reach any
particular conclusion.

If you've never done this kind of statistical analysis before, this can seem a little weird. You may ask
yourself, if you can't reject the null hypothesis, haven't you just proven it? The truth is, you haven't.
Imagine that you've lost your keys and you're convinced your friend slipped them in your backpack as a
prank and now they are just lying about it. If you open the backpack and find your keys, then you know
they were there. But if you don't find them, you don't know that they're not there (maybe they were
cleverly concealed somewhere, or you rushed and overlooked them). Hypothesis testing is a little like
that. You are looking for evidence to prove someone or something is wrong. If you find that evidence,
then you've proven the thing wrong. However, if you didn't find the evidence, that doesn't make the
thing true; it just means you can't prove the thing wrong. If you find evidence, you can reject the null
hypothesis. If you fail to find the evidence, you fail to reject the null hypothesis.

And statisticians have come up with names for the types of mistakes you can make. Say that you were
searching your backpack for your keys and found them, but your friend wasn't lying: they really didn't
put them there as a prank. But your finding the keys exposed them as a liar! Or say that you couldn't
find the keys when they were really there and you should have been able to find them! Both of these
mistakes could happen. We call them errors in statistics:

A type I error occurs when the null hypothesis is rejected and it actually shouldn't be rejected. By the
formulation of statistical calculations, the maximum probability of this happening is α. We can design
statistical tests to force this value to be a specific probability (typically 0.1, 0.05, or 0.01), or, thanks to
computers, can determine precisely the probability of a type I error.

A type II error occurs when the null hypothesis fails to be rejected and it actually should be rejected. The
probability of this occurring is denoted β, and its computation is too complicated for even semester-long
statistics courses, let alone this one.

The value α is officially called the level of significance when it's used in hypothesis testing. As you
decrease α, you generally increase β (but not linearly).

We're going to explore hypothesis tests on three population parameters: means, proportions, and
variances. For each case, hypothesis testing works in roughly the same way. Here are the basic steps,
presented in two different ways using the same examples to show how they are related.

5.2.1 Method 1: Computing a Standardized Critical Value (α is known)

This method of statistical analysis is probably what you learn if you take a basic statistics class (think for
business majors instead of engineers). Computing the exact level of confidence or significance is
complicated, so you simply assume one (typically α = 0.01, 0.05, or 0.10) and perform these four steps:

1) The null hypothesis and alternative hypothesis are formulated. One is noted as the claim.
Usually you are aiming for the alternate hypothesis to be the claim (because again, the point is
to refute the null hypothesis, so if you can refute it, that supports the alternate).
2) The standardized critical value of the test is determined based on the critical level of
significance. The critical value is the most extreme value that the test statistic could take on in
order to not reject the null hypothesis; it effectively separates the "rejection region" of a
distribution from the "fail to reject" region. Then the test statistic that will assess the validity of
the null hypothesis is calculated (if it is not already given). For means, the test statistic is x̄; for
proportions, it is p̂; and for variances, it is s². Convert this value to its standardized test statistic,
which is just the z-value (or t-value, or χ²-value) of the test statistic.
3) The standardized test statistic and standardized critical value are compared. If the test statistic is
beyond the critical value (that is, it is in the rejection region), then we can statistically reject the
null hypothesis (and declare that the data supports the alternative hypothesis) for that level of
significance.

4) Interpret the result: conclude whether we reject the null hypothesis or fail to reject it, and
explain what that means for the given level of significance.

5.2.2 Method 2: Computing an Observed Level of Significance (α unknown)

If you take a course in statistics designed for STEM (science, technology, engineering, math) majors, you
will typically take this approach in hypothesis testing. It can be a little more challenging to compute the
observed α, but it allows for a more precise conclusion: effectively, the probability that your claim is
supported, rather than "yes it is" or "no it isn't."

1) The null hypothesis and alternative hypothesis are formulated. One is noted as the claim.
2) The critical level of significance of the test is announced. This is the maximum acceptable
probability of making a type I error with our test, separating the "rejection region" of a
distribution from the "fail to reject" region. The test statistic that will assess the validity of the
null hypothesis is calculated (or given). For means, the test statistic is x̄ (for proportions, it is p̂),
and for variances, it is s². This is converted to the standardized test statistic, which is just the z-
value (or t-value, or χ²-value) of the test statistic. Finally, the test statistic is used to compute the
observed level of significance, either by estimating from tables or by using computer software to
compute it directly.
3) The critical and observed levels of significance are compared. If the observed level of
significance is smaller than the critical level (that is, the test statistic is in the rejection region),
then we can statistically reject the null hypothesis (and declare the data supports the alternative hypothesis)
for the given critical level of significance.
4) Interpret the result: conclude whether we reject the null hypothesis or fail to reject it, and
explain what that means for the given level of significance.

5.2.3 Statistical Computations for Hypothesis Tests: An Overview

Hopefully, from the above lists, the only completely new and potentially complicated part is step 2. So
let's try and spell out exactly what's going on.

The critical value depends on the level of significance of the hypothesis and the degrees of freedom in
the appropriate probability distribution. Remember that basically our null hypothesis will include
a ≤, =, or ≥ and that the result of our test will be to either reject or fail to reject this hypothesis. In all
three cases, we need to decide how to compute the critical value. Once we know where this value is
coming from, then we just compare this value to the test statistic.

If H0 contains the ≥ symbol, the probability value comes from a left-tailed test. In this case, values of the
test statistic that are less than the critical value correspond to a rejection of the null hypothesis (in other
words, the rejection region is the area to the left of the critical value on a number line).

If H0 contains the ≤ symbol, the probability value comes from a right-tailed test. Values of the test
statistic that are greater than the critical value result in a rejection of the null hypothesis (the rejection
region is the area to the right of the critical value).

If H0 contains the = symbol, the probability value comes from a two-tailed test, which means values of
the test statistic that are greater than the larger critical value or less than the smaller critical value
result in null hypothesis rejection (there are rejection regions on either end of the probability
distribution).

So, how do we compute the critical value? We simply find the value on the appropriate probability
distribution (z-distribution or t-distribution for means, z-distribution for proportions, chi-square
distribution for variances) where the area of the rejection region is equal to α.

There is an immediate connection between this idea and the idea of confidence intervals. A confidence
interval computes the "fail to reject" region for the two-tailed hypothesis test!

The final step is key. You can crunch numbers and reject all the null hypotheses you want, but if you
don't conclude your test by explicitly stating what you've just done, then there's really no point.

Ok, this will make much more sense after a few examples, but we have to take it a case at a time.

5.2.4 Hypothesis Testing for a Mean

Let's start with an example and just make comments along the way. After different examples for
different types of tests, we'll compare them all and reiterate the general procedure.

Example 1. A study claims that the average college student sleeps fewer hours than the general public. It
states that the national average for sleep is 56 hours per week with a standard deviation of 4 hours. A
sample of 36 students has an average sleep time of 54 hours per week. Test this study's claim for α =
0.01.

Solution 1. Let's walk through the steps:

First off, we state the hypotheses. The claim is that the mean sleep time is less than 56. This does not
include an equality, so it must be the alternative hypothesis. So,
H0: μ ≥ 56
Ha: μ < 56 (claim)

Secondly, we compute the critical value, which is zc in this case. We use the standard normal distribution
(z-distribution) because we are dealing with a calculation about the mean with a sample size greater
than 30. This is a left-tailed test and α = 0.01, which corresponds to a z-value of zc = −2.33. This is the z-
value where the area under the curve to the left of this point is equal to 0.01.

This time, our test statistic was given to us, x̄ = 54. But we need to convert this to a z-score, which is z =
(54 − 56)/(4/√36) = −3.

The next step is to compare to the critical value. In this case, z < zc, so z lies in the rejection region. We
reject the null hypothesis.

Finally, we interpret the result. We have sufficient evidence to conclude with 99% confidence that, on
average, college students sleep less than the general population and the study's claim is valid.
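
The arithmetic of this test is easy to script; here is a minimal MATLAB sketch of Example 1:

% Left-tailed z-test for a mean
xbar = 54; mu0 = 56; sigma = 4; n = 36;
z  = (xbar - mu0)/(sigma/sqrt(n));   % standardized test statistic, -3
zc = -2.33;                          % left-tailed critical value for alpha = 0.01
if z < zc
    disp('Reject the null hypothesis.')
else
    disp('Fail to reject the null hypothesis.')
end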

We'll quickly discover that the only difference among all these different hypothesis tests is how to
determine the critical value. For hypothesis tests for the mean, the value is found by using the z-
distribution (for large samples) or by using the t-distribution (for small samples). Remember that the t-
test is used when the population is normal, but σ is unknown and n < 30.

Here's another example of a mean test before moving on to other tests.

Example 2. A certain chemical reactor is first order with a relatively low rate constant. A certain
company suggests that use of their catalyst will keep the reaction first order, but with a rate constant of
0.34 per second. You run sixteen trials in a batch reactor using this catalyst and find that the mean rate
constant is 0.33 with a sample standard deviation s = 0.01. Run hypothesis testing to see if this data
supports the company's claim with α = 0.05.

Solution 2. First, we state the hypotheses. There is no mention of something being bigger or smaller, so
our null hypothesis must just be an equality:
H0: μ = 0.34 (claim)
Ha: μ ≠ 0.34

Second, we find our critical value. We have a two-tailed test with α = 0.05 and d.f. = 15, so tc = 2.131 in
the right tail and tc = −2.131 in the left tail. (You can just say that |tc| = 2.131.)

Also, we standardize our test statistic: t = (0.33 − 0.34)/(0.01/√16) = −4.

Third, we compare and find |−4| > |2.131|, so t is in the rejection region. We reject the null
hypothesis.

Fourth, we state our results. We have shown that the mean rate constant is statistically different from
0.34 with 95% confidence, which contradicts the company's claim.
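
And a matching MATLAB sketch of this two-tailed t-test (the 2.131 still comes from a t-table for 15 degrees of freedom):

% Two-tailed t-test for a mean
xbar = 0.33; mu0 = 0.34; s = 0.01; n = 16;
t  = (xbar - mu0)/(s/sqrt(n));   % standardized test statistic, -4
tc = 2.131;                      % two-tailed critical value, d.f. = 15
if abs(t) > tc
    disp('Reject the null hypothesis.')
else
    disp('Fail to reject the null hypothesis.')
end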

Note that the claim may either be the null or alternative hypothesis, but the meaning of your result can
be quite different. Again, the null hypothesis includes the equal sign and the alternative hypothesis does
not. When you list your hypotheses, mark one as the claim.

Also, the reason we used the t-table for this latter example is the sample size in our experiment. If the
sample size is less than 30 (and σ is unknown), we use the t-distribution; if it's 30 or more, we use the z-distribution.

5.2.5 (Optional) Hypothesis Testing for a Proportion

Recall that we computed probabilities for proportions by approximating a binomial distribution as a
normal distribution. When we computed confidence intervals for proportions, we used the normal
distribution. So, for hypothesis testing with proportions, we'll use the normal distribution again.

Example. A pharmaceutical R&D technician claims that less than 20% of adults will be allergic to a new
medication in testing. In a random sample of 100 adults, it is found that 15 are allergic to the
medication. Test the technician's claim for α = 0.01.

Solution. As before with proportions, we need to first make sure that we can use the normal
approximation for binomial distributions. So we compute np = 100(0.2) = 20 > 5 and nq = 100(0.8) = 80 >
5. This means we can use the z-table.

First we state our hypotheses.
H0: p ≥ 0.2
Ha: p < 0.2 (claim)

Second, this is a left-tailed test with α = 0.01, so the critical value is zc = −2.33.

Third, we compute the standardized test statistic: z = (p̂ − p)/√(pq/n) = (0.15 − 0.2)/√[(0.2)(0.8)/100]
= −1.25.

Fourth, we note that z > zc, so z is not in the rejection region. We fail to reject the null hypothesis.

Fifth, we conclude that there is not enough evidence at the 99% confidence level to support the claim
that less than 20% of adults will be allergic to the medication.

That's how proportion tests are done. We're following the same five steps repeatedly. Tests for means
and proportions are virtually identical, but we need to remember to check that the products np and nq
are each greater than 5 before we begin.

5.2.6 Hypothesis Testing for a Variance or Standard Deviation

The hypothesis testing procedure is the same for variances and standard deviations, and the connection
between confidence intervals and hypothesis tests continues to hold. We used the chi-square
distribution to produce confidence intervals for variances, and we use the same distribution to conduct
hypothesis tests for variances.

Unlike the tests for means and proportions, where we used symmetric probability distributions, the
test for variances uses a probability distribution that is not symmetric. If we are conducting a right-
tailed test, then our rejection region is in the right tail, and we look in the column for the given degrees
of freedom and α. If the test is left-tailed, we look in the column for (1 − α) instead of α. If the test is
two-tailed, then the area in each of the two tails is α/2, so we need to look in the columns for both α/2
and (1 − α/2).

I hope that the procedure of hypothesis testing is becoming clear, so instead of doing an entire example,
we'll just focus on finding the critical values.

Example 1. Find the critical value and rejection region for a left-tailed test where n = 15 and α = 0.05.

Solution 1. We have d.f. = n − 1 = 14 and, for the left-tailed test, we look in the column for 1 − α = 0.95. So
the critical value is χ²L = 6.571. We reject the null hypothesis if our test statistic turns out to be less than this
value.

Example 2. Find the critical values and rejection region for a two-tailed test where n = 51 and α = 0.01.

Solution 2. We have d.f. = 50 and need to look in two columns for the two-tailed test. We split α in half
and look in the column for 0.005 and also 1 − 0.005 = 0.995. So χ²L = 27.991 and χ²R = 79.490. We reject
the null hypothesis if the test statistic is either less than 27.991 or greater than 79.490.

Recall that we can do confidence intervals for standard deviations by making one for variances and just
taking the square roots of the endpoints. The same thing is true for hypothesis testing. Simply take the
square root of your test statistic and critical values if you want (but note that you don't really have to,
since the inequalities will be the same either way).

5.2.7 Hypothesis Testing Summary

You should now understand how to do hypothesis testing for one sample for a mean, proportion,
variance, and standard deviation. Here's that five-step procedure again, now clustering all our options
together.

Step 1. Write the null and alternative hypotheses and identify which is the claim. The hypotheses will
always be about some population parameter (μ, p, σ², or σ). The null hypothesis always includes the
≤ (right-tailed), = (two-tailed), or ≥ (left-tailed) symbol.

Step 2. Find the critical value. Make sure you know what α and n are. If we are testing for means where
the population is normal, but n < 30 and σ is unknown, we are looking for tc with d.f. = n − 1. If n ≥ 30
(for any distribution) or the population is normally distributed with known σ, we are looking for zc. If we
are testing for proportions and np and nq are both greater than five, we are looking for zc. If we are
testing for variances or standard deviations and the population is normally distributed, we are looking
for χ².

For left-tailed tests, zc or tc will be negative, and our chi-square value of interest is χ²L (found in a column
close to 1). For right-tailed tests, zc or tc will be positive, and the chi-square value of interest is χ²R (found
in a column close to 0). For a two-tailed test, the two critical values for z- and t-tests have opposite signs
and each corresponds to α/2 instead of α. In the case of the χ²-test, the critical values will be located in
the columns for (1 − α/2) and α/2. In all cases, the tails in question are the rejection regions.

Compute the standardized test statistic. The nonstandardized test statistic may be given, or it may need
to be computed as well. There are four possibilities for standardized test statistics. For t-tests, z-tests for
a mean, z-tests for a proportion, and χ²-tests, respectively, the standardized test statistics are

t = (x̄ − μ)/(s/√n),   z = (x̄ − μ)/(σ/√n),   z = (p̂ − p)/√(pq/n),   χ² = (n − 1)s²/σ²

and these should look familiar; they're the same computations we used for confidence intervals.

Step 3. Compare the critical value(s) and standardized test statistic. If the standardized test statistic is in
the rejection region (in the tail of the test), then reject the null hypothesis. Otherwise, fail to reject the
null hypothesis.

Step 4. Interpret the result. Either there is sufficient evidence to say the null hypothesis is false, or there
is not. If there is, then the alternative hypothesis is probably true. Say what this means in terms of the
test, and whether it supports the claim or not.

5.2.8 Practice Problems

1) For each of the following scenarios, identify the null and alternate hypotheses for each if put to
a formal statistical test, and explain what conclusion you could reach if you were able to reject
the null hypothesis.

a) The public water company states that the water contains less than 11 parts per billion of lead.

b) A bottle of vinegar is labeled to contain 7.4% acetic acid by volume.

c) Boxes of nails are tested to confirm that the standard deviation in length is no more than 0.01
inches.

d) The mean yield of a novel chemical reaction pathway must be at least 0.25 in order for
investment in its development to be merited.

e) A chemistry lab student claims that the standard deviation in their experimental data is exactly
0.1 moles per liter.

2) In order for a food product to be labeled as having no trans fats, it actually only must have less
than 0.50 grams of trans fats per serving. The FCC's brand of oatmeal cookies has been sampled
10 times and found to have an average trans fat content of 0.45 grams per serving with a
standard deviation of 0.1 grams per serving. With what level of confidence can it be said that a
typical serving has less than 0.50 grams of trans fats?

3) The pH of an acid solution used to etch aluminum varies from batch to batch. In a sample of 20
batches, the mean pH is 2.6 with a standard deviation of 0.3. Determine the probability that the
true mean pH of all such acid solutions is greater than 2.5.

4) A group of 40 students were asked to report the number of hours per week they spent
preparing for a chemical engineering course and comparison against course performance
showed that on average, exam scores went up two points per hour of preparation. However, the
standard deviation of this analysis is eight points per hour of preparation.

a) Conduct all steps of a formal hypothesis test with α = 0.10 to determine whether we can claim
exam scores go up at all (that is, more than zero points) per hour of preparation.

b) Determine the smallest possible level of significance with which we can conclude the increase
in exam score per hour of study is greater than zero.

5) A community center reports that the chlorine level in its pool has a standard deviation of 0.46
parts per million (ppm). A sampling of the pool's chlorine levels at 25 random times during a
month yields a standard deviation of 0.61 ppm:

[Histogram omitted: frequency of measured chlorine level (ppm), spanning roughly 1.0 to 3.2 ppm]

Conduct a complete hypothesis test to determine the level of confidence with which we can
refute the community center report.

For each of the following scenarios, do the following things as you practice formal hypothesis tests:

a) Write the null and alternative hypotheses (as mathematical equalities/inequalities) and mark
which one is the claim.
b) Sketch or describe the rejection region for this statistical hypothesis on a probability
distribution.
c) Find the standardized test statistic and denote its position on the probability distribution.
d) Decide whether to reject or fail to reject the null hypothesis.
e) Interpret the decision in the context of the original claim.

6) The marketing department for FCC wants to make statistically valid claims for a new commercial
campaign for their brand of automotive batteries. They want to guarantee that the mean
reserve capacity of their SuperLasting Battery is greater than 1.5 hours. To test the claim, you
sent your lab technician to randomly sample 50 batteries and found the mean reserve capacity
to be 1.55 hours with a standard deviation of 0.32 hours. What can you conclude with 90%
confidence?

7) Recent FDA regulations require that the caffeine content of sodas be clearly labeled on each
can. Naturally, Fictitious Chemical Company has their brand of FC Cola, and in addition to
desperately needing a better name for their product, they need to confirm their caffeine
content labeling is accurate. The labeling on a can of FC Cola currently claims a caffeine content
of 40 milligrams per 12 ounces. The quality control manager for this cola sampled 30 12-ounce
cans of cola, whose caffeine content averages 39.2 milligrams with a standard deviation of 7.5
milligrams. What can you conclude with 99% confidence? (Caution: Unlike the first problem, no
one has made a claim yet! What claim is actionable? That is, what should be the claim so that
rejecting the null hypothesis means one thing, while failing to reject it means another?)

8) The FCC's new line of paper towels just might absorb certain liquids faster than the
competition's current rate of 10.0 mL/sec under a standardized industrial test. The scientists
have already run a set of 40 initial tests and found that the FCC paper towels absorb at an
average rate of 10.4 mL/sec with a standard deviation of 0.6 mL/sec. What can you conclude
with 90% confidence?

9) FCC management would really like to brag that these paper towels are made with no more than
10% new raw materials; the rest is post-consumer recycled material. Their brilliant scientists
have somehow come up with a way to measure the amount of new raw material used in a
product and find that, among 50 samples, on average the material is 9.5% new material, with a
standard deviation of 2%. Management says they are cleared to add the label if they can prove
their claim with 99% confidence. What can you conclude? (While this should technically be
treated as a test on proportions, you may do a test on the mean percentage, instead.)

5.3 Two-Sample Hypothesis Testing

So far we have used hypothesis testing to compare a sample statistic to a numerical value to make a
claim about a population parameter compared to this numerical value. However, we can also run
hypothesis tests to compare two sample statistics to one another to make claims about population
parameters. For instance, can we statistically claim that one population has a higher mean salary than
another? Can we statistically claim that one product has less variance between batches than another?
Can we say that statistically two population parameters are the same?

Hypothesis tests work exactly the same as before, in general. The only significant difference is in that
most complicated step, the computation of the standardized test statistic. The hypotheses and claims
are also a little different.

In the case of two-sample hypothesis testing, the null hypothesis is sometimes also called the "no
change" or "no difference" hypothesis. This is because, as before, the null hypothesis includes the
equals sign.

5.3.1 Differences Between Means

We are given two samples of data. Population 1 provides Sample 1, which has parameters μ1 and σ1,
while Population 2 provides Sample 2, which has parameters μ2 and σ2. We would like to make a
statistical claim about the difference between μ1 and μ2. The null hypothesis, then, will either be μ1 ≤ μ2,
μ1 = μ2, or μ1 ≥ μ2. The alternative hypothesis is the opposite statement. Again, either may be the claim.

First we'll consider large samples (n ≥ 30) or samples with known normal distribution and known
standard deviation. The samples must be randomly selected and independent. (The procedure for
dependent samples is different and not covered here.) In this case, we still find the critical value based
on α, but the test statistic is computed with the following formula:

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

Our null hypothesis (the "no change" hypothesis) always includes the means being the same, so the
expression (μ1 − μ2) = 0. If n is large, s may be used in place of σ in the above formula.

Example 1. A scientist claims that there is a difference in the pH of tap water in two different buildings
on campus. The results of a random survey of 100 taps in each building are below. The samples are
independent. Do the results support the scientist's claim if α = 0.05?
Building 1: {x̄ = 6.09, s = 1.2, n = 100}
Building 2: {x̄ = 6.43, s = 1.5, n = 100}

Solution 1. First we state the hypotheses, as always.
H0: μ1 = μ2
Ha: μ1 ≠ μ2 (claim)

Second we compute our critical values. This is a two-tailed test with a z-distribution and α = 0.05, so the
rejection regions are z > 1.96 and z < −1.96.

Also, we compute the new test statistic. Since both samples are large, we substitute s for σ in the above
equation, and since the null hypothesis always includes μ1 = μ2, we have μ1 − μ2 = 0 and

z = (6.09 − 6.43)/√(1.2²/100 + 1.5²/100) ≈ −1.77.

Third, we compare and see that the test statistic is not in the rejection region, so we fail to reject H0.

Finally, we interpret this result. There is not enough evidence at the 95% level to statistically claim that
the pH of the water is different in the two buildings.

As you may expect from the symmetry of the last few sections of this chapter, the hypothesis test for
means of small sample sizes is slightly different and involves the t-table instead of the normal
distribution table. Indeed, there is a test for small sample sizes, which we use when either or both
samples have n < 30. Recall that the degrees of freedom in a t-test depends on the size of the sample.
With two samples, we have d.f. = n1 + n2 − 2 if the population variances are equal. If not, then d.f. =
min{n1 − 1, n2 − 1}.

The test statistic will also depend on whether the population variances are equal. If they are, then

t = [(x̄1 − x̄2) − (μ1 − μ2)] / [ŝ √(1/n1 + 1/n2)]

where ŝ = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] is the pooled estimate of the standard deviation.
If the population variances are not equal, then the computation looks a lot like that for a large sample:

t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

(We will discuss how to compare population variances in Section 5.3.3.)

Let's look at another example. Note that the five steps of the hypothesis test are followed as usual; the
only real difference between testing on one sample and testing on two samples is that the test statistic
calculation is a little more complicated.

Example 2. A manufacturer says the cost of its control system installation is less than that of its
competition. You are able to obtain records of the most recent installations for two different companies,
Company 1 and Company 2. Can you support the manufacturer's claim (α = 0.05) if he comes from
Company 2? Assume the variances of the samples are actually equal.
Company 1: {x̄ = 127500, s = 4500, n = 14}
Company 2: {x̄ = 125000, s = 3000, n = 16}

Solution 2. The hypotheses are
H0: μ1 ≤ μ2
Ha: μ1 > μ2 (claim).

We'll assume the variances are equal, so d.f. = n1 + n2 − 2 = 28, and with α = 0.05, we have a right-tailed
test with tc = 1.701, so the rejection region is t > 1.701.

Our test statistic uses the pooled standard deviation ŝ = √[(13(4500)² + 15(3000)²)/28] ≈ 3771, so

t = (127500 − 125000)/(3771 √(1/14 + 1/16)) ≈ 1.81,

which is in the rejection region. So we reject the null hypothesis. We have enough evidence to conclude
that Company 2's mean installation cost is less than that of Company 1.
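
Since the pooled computation is the most involved formula so far, here is a MATLAB sketch of Example 2 to check the arithmetic:

% Two-sample t-test with pooled variance (right-tailed)
x1 = 127500; s1 = 4500; n1 = 14;     % Company 1
x2 = 125000; s2 = 3000; n2 = 16;     % Company 2
sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2));  % pooled std. dev.
t  = (x1 - x2)/(sp*sqrt(1/n1 + 1/n2))  % about 1.81
tc = 1.701;                            % right-tailed critical value, d.f. = 28
t > tc                                 % 1 (true), so reject the null hypothesis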

5.3.2 (Optional) Differences Between Proportions

Much like the case of the previous two-sample tests, the test for the difference between two
proportions is largely similar to the test on one proportion. Once again, the only step that has changed a
little is the computation of the test statistic. To use this hypothesis test on two proportions, we must
first check to make sure that the normal distribution approximation on a binomial distribution is valid;
that is, n1p1, n1q1, n2p2, and n2q2 are all greater than 5. The new formulation for the test statistic z is

z = (p̂1 − p̂2)/√[p̄q̄(1/n1 + 1/n2)]

where the average proportion is p̄ = (x1 + x2)/(n1 + n2) and q̄ = 1 − p̄ (recall that for a single proportion,
p̂ = x/n).

Example. The Fictitious Pharmaceutical Company is conducting a study to test the effectiveness of a new
migraine medication. Of the 4700 subjects taking the medication (Group 1), 301 reported having a
migraine within the week of the study. A control group (Group 2) of 4300 subjects took a placebo, and
357 reported having a migraine within the week of the study. At 99% confidence, can we conclude that
the medication reduced the proportion of migraines?

Solution. Recalling that x = np, we can easily tell that all values of concern are greater than 5, so we can
use the normal approximation to a binomial distribution. We compute the sample proportions for both
groups: p̂1 = 301/4700 = 0.064 and p̂2 = 357/4300 = 0.083.

The hypotheses are
H0: p1 ≥ p2
Ha: p1 < p2 (claim).
This is a left-tailed test with α = 0.01, so the rejection region is z < zc = −2.33. We'll compute p̄ = (301 +
357)/(4700 + 4300) = 0.073 so we can find z using the above equation:

z = (0.064 − 0.083)/√[(0.073)(0.927)(1/4700 + 1/4300)] ≈ −3.46,

which is in the rejection region. So we reject H0 and conclude that the data supports the claim that the
medication reduced the proportion of migraines.

5.3.3 Differences Between Variances

In the means section, we mentioned that the t-test changes depending on the assumption of equality of
the variances of the samples. As you can guess, there are hypothesis tests for differences between
variances. Unfortunately, the similarities between the one- and two-sample tests for means and
proportions do not carry over to two-sample tests for variances. We use a different probability
distribution entirely to investigate variances. (The chi-square distribution is used for another set of
hypothesis tests called goodness of fit tests, but that particular test is outside the scope of this
course.)

The F-distribution is the sampling distribution of ratios of variances. If σ1² and σ2² are the variances of
two populations, the F-distribution represents the distribution of their ratio. In mathematical
formula, this means F = s1²/s2². F-distributions, like other probability distributions, have a total integral of
1. Further, they are right-skewed, always positive, and have a mean value of roughly 1. Like the related
chi-square distribution, F-distributions depend on the degrees of freedom of a sample; however, since
there are two samples involved, we have two different degrees of freedom, which we denote d.f.N, the
degrees of freedom of the numerator (sample 1), and d.f.D, the degrees of freedom of the denominator
(sample 2).

F-distributions are essentially presented as a three-dimensional table, since there are three values of
importance: α, d.f.N, and d.f.D. You'll see that F-distribution tables are printed on multiple pages, with
each page corresponding to a different α.

While the probability distribution used for two-sample variance tests is new and different, it makes up
for that by simplifying the hypothesis test procedure just slightly. Again, the formula for the F-value (the
test statistic for this hypothesis test) is simply

F = s1²/s2²

where s1² > s2² (if not, relabel your samples so that this is true) and therefore F > 1.

Example. A new PID controller has been installed on a heat exchanger such that the time necessary to
reach a desired temperature has a smaller variance. With the old system, the variance of 10 sample runs
was 144 square minutes. With the new system, the variance of 21 sample runs is 100 square minutes. If
we use = 0.10, can we conclude that the new system statistically reduces the variance of the system?

Solution. We choose s1² = 144 since it's larger than 100. So the old system is System 1 and the new
system is System 2. Our hypotheses are
H0: σ1² ≤ σ2²
Ha: σ1² > σ2² (claim).
We have α = 0.10, d.f.N = 10 − 1 = 9, and d.f.D = 21 − 1 = 20. Looking up this value in the F-table, we
obtain a critical value of Fc = 1.96. The rejection region is F > Fc. Our test statistic is F = s1²/s2² = 144/100 =
1.44, which is not in the rejection region. So we fail to reject the null hypothesis. There is not enough
evidence at the 90% confidence level to conclude that the new system reduces the variance in times.
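
With MATLAB's Statistics Toolbox, finv will fetch the F critical value so you don't have to page through tables; a sketch of this example:

% F-test for comparing two variances (right-tailed)
s1sq = 144; n1 = 10;   % old system (the larger variance goes on top)
s2sq = 100; n2 = 21;   % new system
F  = s1sq/s2sq                        % 1.44
Fc = finv(1 - 0.10, n1 - 1, n2 - 1)   % about 1.96
F > Fc                 % 0 (false), so fail to reject the null hypothesis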

5.3.4 Practice Problems

For each of the following scenarios, do the following things as you practice two-sample hypothesis tests:

a) Write the null and alternative hypotheses (as mathematical equalities/inequalities) and mark
which one is the claim.
b) Sketch or describe the rejection region for this statistical hypothesis on a probability
distribution.
c) Find the standardized test statistic and denote its position on the probability distribution.
d) Decide whether to reject or fail to reject the null hypothesis.
e) Interpret the decision in the context of the original claim.

1) In 2000, the EPA claimed that the average American generates at least 4.0 pounds of waste per
day. In a random survey of 10 people, you found the mean waste generated per day is 4.3
pounds with a standard deviation of 1.2 pounds. Can you support the EPA's claim with 95%
confidence?

2) As part of a water quality survey, a technician tested the water hardness in 19 randomly
selected streams in the greater Fictional area and found a standard deviation of 15 grains per
gallon. Can you report that the standard deviation is statistically equal (with 95% confidence) to
16 grains per gallon? What is the 95% confidence interval on the population standard deviation?
How are confidence intervals and hypothesis tests related?

3) The new plastic used in the FCCs sandwich baggies must have the same density as the previous
plastic in order for consumers to not detect a change in quality. Twelve samples of the old
plastic were found to have a mean density of 0.93 g/cm3 with a standard deviation of 0.1 g/cm3,
while ten samples of the new plastic have a mean density of 0.92 g/cm3 with a standard
deviation of 0.08 g/cm3. With what level of confidence can it be said that the two plastics have
different densities? (You can assume equal variances to save some time.)

4) Six plants had their carbohydrate concentrations (in wt%) measured in both the shoot and the
root. The following results were obtained. What is the probability that there is a difference in
mean concentration between the shoot and the root?

Plant A B C D E F
Shoot Carb Concentration (wt%) 4.42 5.81 4.65 4.77 5.25 4.75
Root Carb Concentration (wt%) 3.66 5.51 3.91 4.47 4.69 3.93

(Technically we should probably use the hypothesis test on proportions for this problem, but I
am holding true to those sections of the notes being optional. You will reach the same
conclusion if you use the hypothesis test on means, which is definitely not optional.)

6 Iterative Programming Applications
Required Reading/Viewing:

The MATLAB Data Fitting and Working with Nonlinear Equations video at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-
launchpad.html . Solving Nonlinear Equations (about 19 minutes).

Recommended for Further Consideration:

Skim Sections 4.3 and 5.6 in Pratap's Getting Started with MATLAB.

Appendix A.2 of Felder and Rousseau's Elementary Principles of Chemical Processes, 3rd edition provides
overviews for many different numerical methods for root finding.

Chapters 3, 5-7 and 13-14 of Chapra and Canale's Numerical Methods for Engineers, 5th or 6th edition,
provide a much more detailed description of both root finding and optimization problems.

Chapter Goals

Describe the inherent numerical error involved in computer methods.

Create an iterative method to solve a nonlinear equation.

A lot of computational techniques boil down to finding the solution to a mathematical equation
numerically, and what's more, in many, many cases this mathematical equation can be rearranged to
be of the form f(x)=0 for some scalar or vector function f and variable x. While you are certainly well
versed in obtaining analytical solutions to such algebraic equations, sometimes they can be quite
difficult or even impossible to find, and so techniques have been developed to harness the power of a
computer to do the tedious work for us.

This chapter is broken down into a few main parts: first we explore ways to solve the system f(x)=0 in
general, then we extend the idea to optimization problems (where, often, we are also searching for the
solution to the equation f(x)=0, though sometimes that function is now the derivative of another
function), and then we see what MATLAB tools are at our disposal.

The common thread for both of these classes of techniques is that they are generally iterative; that is,
they go through an algorithm that incorporates conditionals and loops. In most cases, these techniques
are performed repeatedly until a specific error tolerance is met.

The true error of a numerical result is the difference between the numerical solution and analytical
solution of a specific problem.

Of course, determining such an error is impractical: if you went through all of the work finding the
analytical solution to a problem, then what are you doing here trying to use a numerical technique to do
the same thing? True error is helpful in investigating numerical techniques, or trying simple test
problems with a known solution to see exactly what is going on. In the few cases where it is possible and
appropriate, we will use the true error, or the relative true error, expressed below, to make judgments
in this chapter.

relative true error = |true solution − numerical approximation| / |true solution|

More often, we will need to use an approximate error in determining the effectiveness of our numerical
procedure. The idea is that our best estimate of the true solution is actually our latest approximation.
In an iterative method, we are repeatedly refining our estimate of this value, so we can define the
approximate error as the difference between our current and previous approximations, and our relative
approximate error as

relative approximate error = |current approximation − previous approximation| / |current approximation|

Notice in our error definitions that things can get dicey if our true solution or current approximation is
near zero. We may need to keep that in mind when writing MATLAB code to solve specific problems.

Typically, the ideas behind iterative methods are similar to those of an informed trial-and-error
procedure. The numerical methods are devised so that they converge to a solution; that is, each
iteration is designed to get us closer and closer to the actual value. We keep track of the relative
approximate error to inform us when to stop repeating our iterative procedure because our solution is
"good enough."

One sample version of pseudocode for most of these iterative methods is as follows:

Start with the problem and an initial guess at the solution to the problem.
Specify a desirable approximate error.
Set the current error to some large number.
While the current error is greater than the desired approximate error,
Label the current guess at the solution as the previous solution.
Use the previous solution in some computation to find a new current solution.
Compute the new approximate error.
Once the current error is less than or equal to the desired approximate error,
Report the current solution as the numerical solution to the problem.
Substitute the solution back into the problem to make sure the problem is truly solved.
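
In MATLAB, that pseudocode might take a shape like the following skeleton (x0, tol, and the update line are placeholders to be filled in for a specific method and problem):

x = x0;      % initial guess at the solution
tol = 1e-6;  % desired relative approximate error
err = Inf;   % set the current error to some large number
while err > tol
    xold = x;                  % label the current guess as the previous solution
    x = update(xold);          % placeholder: the method-specific computation
    err = abs((x - xold)/x);   % relative approximate error
end
x            % report the current solution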

In most cases, only the core computation step (using the previous solution to find a new current
solution) changes from method to method! Certain methods may include additional programming
elements to ensure the iterative method is successfully approaching a solution to the problem. This
may include counting the number of times a "new current solution" is found (if
after hundreds of computations the desired error criterion is not met, it is possible it may never be met
with the current iterative procedure), or even keeping track of multiple error computations to make
sure the error is actually decreasing with each iteration. Since the point of iterative methods is to get
closer and closer to a final answer, if the error is growing with each iteration, then our solution is
diverging (instead of converging to an answer). It might be worthwhile to keep track of the error in each
step of your method.

6.1 Root-Finding Methods

A root-finding method is a computational procedure for solving any equation of the form

f(x) = 0

In general, this equation may be for a scalar or vector value x, and in the case of a vector value x where
the function f is linear, we are solving equations like those from Chapter 2, where the column vector b is
all zeroes.

Here, let's assume x is a scalar value but that f(x) is a nonlinear function in x. As is the case for problems
we want to solve numerically, typically the analytical solution is tedious, difficult, or impossible to
obtain, and so we turn to computers to give us a numerical approximation to the solution of the
problem.

There are basically two classes of root-finding methods: open methods and closed methods. In this
case, a closed method is a procedure that starts with some kind of bound on the solution, effectively
providing a range of values in which to search for the solution. An open method, then, provides no such
bound, but usually requires an initial guess at the answer before seeking a better solution. We will
consider each in turn.

6.1.1 Closed Methods

The basic idea behind a closed method is to start with a range of values and systematically reduce the
width of this range until it is acceptably small. You have probably intuitively already dealt with this kind
of method before, for example playing a high-low game where you are trying to guess a number and
you are told whether the number is higher or lower than your guess.

The simplest version of a closed method is actually very much related to the high-low game. If you are
trying to solve the function f(x)=0, then, in most cases, for some value near the solution to the problem,
f(x) is negative, and for some value on the other side of the solution, f(x) is positive.

Say we are given a minimum and maximum bound on our solution. Suppose that at the minimum
value, f(x) is negative, and at the maximum value, f(x) is positive. (It might be the reverse; this
idea will still hold, just in the opposite way discussed here.) Next we choose a value in between the
minimum and maximum. If f(x) at that value is positive, then we know there must be a place between
the minimum and that value where the function crosses the x-axis, and f(x)=0. Likewise, if f(x) at the
chosen value is negative, then there must be a value between there and the maximum where f(x)=0.
And so our first closed method is revealed.

Choose a value between the current minimum and maximum.
Evaluate the function at this chosen value.
If the sign (+ or -) of the function is the same as the sign of the function at the maximum,
Set the chosen value as the new maximum.
Otherwise, if the sign of the function is different from the sign of the function at the maximum,
Set the chosen value as the new minimum.

The error computation for all closed methods is generally the same: once the range of values (its
maximum minus its minimum) is sufficiently small, the procedure can end.

Now, the only part left to decide is which value between the current minimum and maximum to choose.
The two most popular choices are bisection and linear interpolation.

In bisection, the chosen value is simply the midpoint of the minimum and maximum values. The benefit
of this approach is that it is straightforward (easy to compute), and it is also easy to tell exactly how many
iterations of the procedure are necessary to obtain a solution within a desired error, since the width of
the range is exactly halved every time.
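
A minimal MATLAB sketch of bisection might look like the following (the function and bounds are just an illustration; any f with a sign change on the starting range will do):

% Bisection method for f(x) = 0
f = @(x) x.^3 - 2;           % example function; f(0) < 0 and f(2) > 0
xmin = 0; xmax = 2;
while (xmax - xmin) > 1e-6
    xmid = (xmin + xmax)/2;  % choose the midpoint of the current range
    if sign(f(xmid)) == sign(f(xmax))
        xmax = xmid;         % the sign change is in [xmin, xmid]
    else
        xmin = xmid;         % the sign change is in [xmid, xmax]
    end
end
root = (xmin + xmax)/2       % about 1.2599, the cube root of 2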

In linear interpolation (sometimes also called the false-position method, or for you Latin enthusiasts,
the regula falsi method), the equation of the straight line through the two endpoints is rearranged to
obtain a chosen value of

x = xmax − f(xmax)(xmax − xmin)/(f(xmax) − f(xmin))
The benefit of this approach is that for some functions, it requires fewer iterations than bisection to get
a tight range on the solution to the problem.

Note that one drawback to closed methods of this type is that the function f(x) needs to cross zero (that
is, change sign). In the case of a function like f(x)=(x−3)², these closed methods will fail because f(x) is
non-negative for all values of x. In that case, the numerical equation needs to be redefined or another
root-finding method must be implemented.

6.1.2 Open Methods

The idea behind an open method is to start with an initial guess and use slightly more complicated
mathematics to work toward a solution. In open methods, relative approximate error is used as the
error computation, but the iterative method to find the newest current solution can vary.

One straightforward method is successive substitution, in which case, instead of solving the equation
f(x)=0, you rearrange the function to find a solution to g(x)=x.

The method then simply involves taking your current guess for x, relabeling it as your previous guess for
x, and substituting it into g(x) to get a new current value:

x(new) = g(x(old))

The major drawback for this method is that it will only work if the absolute value of dg/dx is less than 1
near the solution. Oftentimes when solving a complicated problem, there are multiple ways to form g(x),
but it can be difficult or even impossible to find a g(x) that meets this specific criterion.
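
Here is a sketch of successive substitution for one rearrangement that does satisfy the criterion: solving f(x) = x − cos(x) = 0 by rewriting it as g(x) = cos(x):

% Successive substitution for g(x) = x
g = @(x) cos(x);   % |dg/dx| < 1 near the solution, so this converges
x = 1;             % initial guess
err = Inf;
while err > 1e-8
    xold = x;
    x = g(xold);   % substitute the previous guess into g
    err = abs((x - xold)/x);
end
x                  % about 0.7391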

One of the most popular open methods is the Newton-Raphson method, which is actually very similar to
the linear interpolation method from the last section, only instead of using two points to interpolate to a
new value, the derivative of the function is used to extrapolate to that new value instead:

x(new) = x(old) − f(x(old))/f′(x(old))

One immediate drawback to this method is that the function's derivative must be known. It is also possible, depending on the initial guess, that the Newton-Raphson method will not find a solution. Therefore, it's important to build in an extra bit of code to check and make sure the error between iterations is not growing.
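
As a minimal sketch (again with an illustrative function, initial guess, and tolerance), a Newton-Raphson loop with the suggested divergence check might look like this:

f  = @(x) x.^2 - 2;             % example function
df = @(x) 2*x;                  % its known derivative
x = 1;                          % initial guess
err = Inf; olderr = Inf;
while err > 1e-8
    xnew = x - f(x)/df(x);      % Newton-Raphson update
    err = abs((xnew - x)/xnew); % relative approximate error
    if err > olderr
        error('Iterations appear to be diverging; try another initial guess.')
    end
    olderr = err;
    x = xnew;
end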

If the only issue is not knowing a function's derivative, you can compromise and use a numerical approximation of the derivative instead. Using part of the definition of the derivative, we can select some small h and get

f'(x) \approx \frac{f(x+h) - f(x)}{h}

If you just let h be the difference between your previous two guesses, then there is slightly less computation to do for every single iteration, and your formula for the new current value is

x_{new} = x_{old} - \frac{f(x_{old})(x_{old} - x_{prev})}{f(x_{old}) - f(x_{prev})}

This is referred to as the secant method. If you look closely, the formula looks like a rearranged version
of the linear interpolation method from the last section, and in truth, it pretty much is. The main
difference is that the secant method can work even if all function values are of the same sign, where the
linear interpolation method would break down.
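
The secant update drops into the same loop structure as Newton-Raphson; a minimal sketch with two illustrative starting guesses:

f = @(x) x.^2 - 2;
xprev = 1; x = 2;               % two starting guesses; no sign change is required
err = Inf;
while err > 1e-8
    xnew = x - f(x)*(x - xprev)/(f(x) - f(xprev));  % secant update
    err = abs((xnew - x)/xnew); % relative approximate error
    xprev = x;
    x = xnew;
end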

In a few chapters, we will look at more complicated approximations to derivatives, should they be
necessary in a Newton-Raphson style procedure.

6.2 (Optional Section) Optimization Methods

There are basically two types of iterative procedures we can perform to find the maximum or minimum
values of a function: types that are directly related to the root-finding methods of the previous section,
or types that are more directly applicable to multivariable optimization methods.

Using root-finding methods to find the maximum or minimum of a function comes from the idea that at these points, the derivative of f(x) must be zero. So, if that derivative is easily determined, then using the Newton-Raphson method to solve f'(x)=0 is relatively straightforward.

Another option is to use an interpolation method. Much like the linear interpolation method can
approximate f(x) and be solved to find the x value where the interpolation function is zero, a quadratic
interpolation method can approximate f(x) and be solved to find the x value where the interpolation
function reaches its maximum or minimum.

The most basic method for multivariable optimization (aside from some brute-force method like
randomly guessing) uses the gradient of a function. Using an initial guess, compute the gradient of the
function and then take a step in that direction by multiplying the gradient by some value h and adding
this to the initial guess. You can iterate on this procedure until further computations no longer improve
your best guess for the function's maximum or minimum.
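
As a minimal sketch of this gradient method, using an illustrative function f(x,y) = -(x-1)^2 - (y+2)^2 with a known maximum at (1, -2), a hand-computed gradient, and a fixed step size (for a minimum, you would step opposite the gradient instead):

gradf = @(v) [-2*(v(1)-1); -2*(v(2)+2)];  % gradient of f(x,y) = -(x-1)^2 - (y+2)^2
v = [0; 0];                               % initial guess
h = 0.1;                                  % fixed step size; smarter choices exist
for k = 1:100
    v = v + h*gradf(v);                   % step in the direction of the gradient
end
disp(v')                                  % approaches the maximizer (1, -2)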

In ENCH 442, we will spend more time discussing optimization methods, so I will refrain from getting too
carried away with this course packet. Something to look forward to?

6.3 MATLAB Exploration: Applications of Loops and Conditionals

Reminder: It's in your best interest to read the following before coming to lab.

Skim Sections 4.3 and 5.6 in Pratap's Getting Started with MATLAB or Chapters 5-6 in Attaway's
MATLAB: A Practical Introduction to Programming and Problem Solving.

By the end of this exploration, you should be able to

Write an iterative procedure to implement the bisection method with MATLAB.
Write an iterative procedure to implement the successive substitution method with MATLAB.
Write an anonymous function in MATLAB.

This week we will use MATLAB to implement the iterative methods discussed earlier in this chapter of
the notes to solve equations of the form f(x)=0 or g(x)=x.

So far, we have just talked generally about a simple equation where x is a single scalar value. Let's walk
through an example:

The volume of liquid in a spherical tank is given by the formula

V = \frac{\pi h^2 (3r - h)}{3}

where V is the volume, h the height of water in the tank, and r the tank radius. If r = 3 meters, to what height must the tank be filled so that it holds 30 m^3?

An iterative method is sort of like a guess and check procedure: with an initial guess (or two) for h, we can evaluate V, change our guess, and see if we're getting closer or not. In this case, there are some obvious physical limits on h. What are they? (Hint: the tank is a sphere.)

Minimum h:

Maximum h:

And, evaluated at those extreme values, what is the volume of the liquid in the tank?

Minimum V:

Maximum V:

Since 30 is in between those minimum and maximum values, we are in business! We can continue to
guess and check until we find the correct value for h.

Let's first try using the bisection method. If we have a high guess and a low guess that bound our
solution, first we need a formula to find the midpoint between the two:

And then we need to evaluate our function at that midpoint. We will redefine either our high value or our low value with the current midpoint, depending on the value of the function at the midpoint. But which do we do when?

We will set high = midpoint in the case that

We will set low = midpoint in the case that

For the bisection method, we will continue until the range of possible solutions (the difference between high and low) is acceptably small. Say, for instance, that error = high - low and

error<10^-5

So we'll have to remember to build into our loop (for or while?) a condition to check this error size.

Let's try putting it all together. Can you fill in the partially completed MATLAB code below to implement
the bisection method for this problem?

%sphericaltankvolume.m

r=3;
hlow=__?__;
hhigh=__?__;
_____?______; %hint: need to initialize the while loop condition!

while _______?_______
    hguess=_____?______;
    v=pi*hguess^2*(3*r-hguess)/3;
    if _____?_____
        hhigh=hguess;
    else
        hlow=hguess;
    end
    _____?______; %hint: need to reevaluate the while loop condition!
end

disp(['The height of liquid in the tank is ',num2str(hguess),' meters'])

Remember to hit Ctrl+C from the Command Window if you find yourself stuck in an infinite loop. If all
works well, you should get a height around 2.03 meters.

Let's try a more complicated problem that will require user input as it works. Read the following
problem and think about how you might try to find the velocity for the problem below (once we have
velocity, volumetric flow rate is just velocity times cross-sectional area).

The siphon shown is fabricated from 2 in. internal diameter (i.d.) drawn aluminum tubing. The liquid is water at 60 °F, which has a density of 1.94 slug·ft^-3 and kinematic viscosity of 1.2 x 10^-5 ft^2·s^-1. Using MATLAB, compute the volumetric flow rate through the siphon. How sensitive is the solution to the length of the tube?

The velocity (V) at the tube outlet is given by:

V = \left[ \frac{2 g z_1}{1 + K_{ent} + f \left( \frac{L}{D}\Big|_{bend} + \frac{L}{D} \right)} \right]^{0.5}

where g is gravitational acceleration, z1 is the height of the water surface, Kent represents losses due to the tube entrance, f is the friction factor, L/Dbend represents frictional losses due to a 180° bend, and L/D is the ratio of length to tube diameter. The following quantities are also known:

Kent = 0.78
L = 10 ft
L/Dbend = 56
z1 = 8 ft
e/D=0.0003
f is a function of Reynolds number and tubing roughness (e/D) as given in Figure 8.12 on the next
page.

[Figure 8.12: friction factor f as a function of Reynolds number and relative roughness e/D.]
In order to compute the velocity, we need to know the friction factor. In order to find the friction factor,
we need to read it from the graph, and it depends on the Reynolds number. In order to know the
Reynolds number, we need to know the velocity, according to the formula on the x-axis of the chart!

This calls for successive substitution. We need to guess an initial value somewhere, use it to compute
the next value, use that to compute the next value, and so on. The idea is that with each cycle, we will
be closer to the true solution to the problem: a set of values for velocity, Reynolds number, and friction factor that satisfies our problem!

% iterative.m
% MATLAB Iterative Solutions Lab Exercise
%
%% First define all known parameters
% ...
% (you need to fill this part in - watch your units!)
%
%% Guess initial value of Reynolds Number & find "f"
Reguess=input('Input a guess for the Reynolds number: ');
fprintf('Reynolds Number = %8.3e \n', Reguess)
f=input('Input the corresponding f value from Figure 8.12: ');
fprintf('f = %8.4f \n', f)
%
%% Begin while loop to iterate and find vout
error=Inf;
while error>0.001
    % Calculate the velocity at the tube outlet from the current f
    vout=((2*g*z1)/(1+kent+(f*(LoverDbend+(L/D)))))^0.5;
    % Recompute the Reynolds number from that velocity
    Re=D*vout/visc;
    fprintf('Reynolds Number = %4.3e \n', Re)
    % Read the updated friction factor from the chart
    fnew=input('Input the corresponding f value from Figure 8.12: ');
    fprintf('f = %8.4f \n', fnew)
    % Stop when successive f values agree closely
    error=abs(fnew-f);
    f=fnew;
end
Q=vout*A;

%% Display results
fprintf('==== Results ===== \n')
fprintf('Reynolds Number = %4.3e \n', Re)
fprintf('Velocity = %5.3f ft/s \n', vout)
fprintf('Volumetric Flow Rate = %8.3f ft^3/s \n', Q)

Of course, it's much more likely that we would be using a computer to solve a problem that doesn't require our constant input at each iteration. I'll give you some practice problems to address this.
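
One way to eliminate the chart lookup entirely, sketched here as an illustration rather than as the method this problem requires, is an explicit friction-factor correlation. Haaland's equation approximates the same turbulent-flow Colebrook data plotted in Figure 8.12:

% Haaland's explicit approximation to the turbulent friction factor:
% 1/sqrt(f) = -1.8*log10( (e/D / 3.7)^1.11 + 6.9/Re )
haaland = @(Re,eoverD) (-1.8*log10((eoverD/3.7)^1.11 + 6.9/Re))^(-2);
f = haaland(1e5, 0.0003)   % about 0.019, consistent with the chart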

One other item to look at this week is the anonymous function. Sometimes in MATLAB we want to use a
simple custom function repeatedly in a numerical method (like the Newton-Raphson or secant
methods). This function is nothing complicated and can be written in one line of MATLAB code. But
rather than spend the time it takes to open a new editor and the space it takes to save a dinky one-line
function file, you can build that function directly into the Command Window or script you are currently
using. The notation is as follows:

anonymousfunction=@(inputs)function_using_inputs;

For example, in the spherical tank problem, we could have written a simple anonymous function to
compute v:

r=3;
volume=@(h) pi*h^2*(3*r-h)/3

Then, as long as this anonymous function is declared somewhere before the while loop of our bisection method problem, we could evaluate volume(h) directly in MATLAB. The line we wrote before

v=pi*hguess^2*(3*r-hguess)/3;

would simply become

v=volume(hguess);

Used properly, anonymous functions can save a lot of time and prevent unnecessary errors.

It's also possible to write anonymous functions with multiple inputs. Consider this rearrangement of the
ideal gas law:

idealV=@(n,T,P)n.*0.08206.*T./P;

You can then evaluate the volume for any combinations of n, T, and P. Note that you must enter them in
the same order you declared them in the function.

>> idealV(1,273,1)

ans =

22.4024

The nice thing about anonymous functions is that MATLAB has a number of built in functions that accept
them as inputs. While we are talking about root solving, MATLAB has two functions, fzero and fsolve,
that were created specifically to solve equations of the form f(x)=0.

The syntax for fzero requires a function and an initial guess. Like the name implies, MATLAB will return
a value of the independent variable that sets the function equal to zero. Taking one final look at our
spherical tank problem, we first want to transform it into an equation f(x)=0:

\frac{\pi h^2 (3r - h)}{3} - V = 0
Hopefully that wasn't a stretch.

But then we can write this as an anonymous function in h and give MATLAB a guess to solve it:

>> r=3;
>> Vtarget=30;
>> Vfunc=@(h)pi*h^2*(3*r-h)/3-Vtarget;
>> fzero(Vfunc,1)

ans =

2.0269

If you want, instead of putting in a single initial guess, you can bound the problem like we did ourselves.
MATLAB will then know you are looking for an answer between the upper and lower bound:

>> fzero(Vfunc,[0 6])

ans =

2.0269

The fzero command works only for scalar equations. If you have multiple unknowns, then you must pose the problem as a vector problem and use fsolve, part of the Optimization Toolbox. Check the MATLAB documentation for more information.

6.3.1 Practice Problems

1) Write a MATLAB program to solve for the terminal velocity of coal particles (specific gravity 1.8,
diameter = 0.208 mm) in water (density 994.6 kg/m^3, viscosity 0.893 mPa·s) in a centrifugal
separator. When the separator is running, the gravitational acceleration is thirty times the
acceleration due to gravity at sea level. The terminal velocity of a spherical particle is given by

v_t = \sqrt{\frac{8 r g (\rho_p - \rho_f)}{3 C_D \rho_f}}

where v_t is the terminal velocity, r is the particle radius, g the gravitational acceleration, ρ_p and
ρ_f are the densities of the particle and fluid, respectively, and C_D is the dimensionless drag
coefficient. (In your report, please also specify the units for the values you used for these
terms.)

The drag coefficient is a function of the Reynolds number. For a spherical particle at terminal
velocity, Perry's Chemical Engineers' Handbook reports the following piecewise relationship:

C_D = 24/Re                          for Re < 0.1
C_D = (24/Re)(1 + 0.14 Re^{0.70})    for 0.1 < Re < 1,000
C_D = 0.445                          for 1,000 < Re < 350,000
C_D = 0.19 - 8 \times 10^4/Re        for Re > 10^6

The formula for the Reynolds number using the notation established in this problem statement is

Re = \frac{2 r v_t \rho_f}{\mu}

where μ is the viscosity of the fluid (please report the units you used for the value in your report).

For this problem, generate a two-column table (one for iteration number, one for estimate of
velocity) including all the iterations performed in a successive substitution method until you are
able to compute the velocity to the nearest mm/s. In the results section, comment briefly on
your computer code, and in the conclusion, reiterate the problem and its solution.

2) After your brilliant work figuring out the reaction rate for the bioreactor, the division supervisor
asks you to take a look at another reactor they are trying to make run more effectively. The
reactor achieves a particularly low conversion, regardless of what temperature or feed is
supplied. You remember fondly your days in ENCH 215 and propose recycling the unreacted
material.

So impressed with your intuition, the supervisor tasks another engineer with crunching some
numbers to determine the relationship between recycle rate and overall system conversion. This
engineer is not particularly great at communication, so all he writes is

The reactor is achieving a particularly low conversion, so of its outlet a fraction R will be
recycled, mixed with fresh feed, and run through the reactor again. For this particular reactor
the best choice for the recycle fraction is related to the desired outlet conversion X through this
function:

Your supervisor has no idea what to do with this, so you offer to solve the equation for him by
writing a MATLAB script that implements an iterative root finding method, such as bisection,
linear interpolation, or the secant method to solve for R for fractional conversions of X from
0.85 to 0.95 in increments of 0.01. (Tip: not all root-finding methods are created equal! Some
are much faster to run than others.)

Then compare your results to those from using fsolve directly in MATLAB. If your results are
not in agreement (within some reasonable tolerance or numerical error of your choosing),
adjust your stopping criterion in your iterative method until they are.

7 Curve Fitting
Required Reading/Viewing:

Chapter 12 of the free version of the MATLAB Interactive Tutorial published by Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html.

The MATLAB Data Fitting and Working with Nonlinear Equations videos at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-
launchpad.html . Introduction to Data Fitting and Working with Nonlinear Equations and Data-Driven
Models (about 29 minutes).

Recommended for Further Consideration:

Section 5.2 in Prataps Getting Started with MATLAB or Section 15.1 in Attaways MATLAB: A Practical
Introduction to Programming and Problem Solving.

Appendix A.1 of Felder and Rousseau's Elementary Principles of Chemical Processes, 3rd edition provides
a very brief overview for finding the least-squares linear fit to data.

Chapters 17-18 of Chapra and Canale's Numerical Methods for Engineers, 5th or 6th edition, provide a
much more detailed description of the material covered here in Sections 7.3 through 7.4.

Chapters 2 and 8 of Navidi's Principles of Statistics for Engineers and Scientists, 1st edition, supply
further discussion and practice problems for this material.

Chapter Goals

Communicate the meanings of correlation and causation. Understanding these concepts is important
to making valid statements on sets of data.

Construct a regression line for a set of data. Being able to do this for any kind of equation requires an
understanding of more linear algebra concepts, but allows you to create best-fit curves and
understand the error involved.

Construct an interpolating spline for a set of data. Being able to interpolate data is critical to your
ability to read data from tables or estimate unknown values for a sparse data set.

Use regression and interpolation appropriately for a given situation. Regression and interpolation both
have their places in engineering and experimental design. Know when to use which method.

7.1 Correlation and Causation

A correlation is a relationship between two variables x (independent) and y (dependent), which we may
write as ordered pairs (x, y). We say that x and y are positively correlated if y increases with increasing x
and they are negatively correlated if y decreases with increasing x.

The sample correlation coefficient r is a measure of the strength (likeliness) and direction (positive or negative) of a linear relationship between x and y. Its formula is

r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n \sum x^2 - \left(\sum x\right)^2} \, \sqrt{n \sum y^2 - \left(\sum y\right)^2}}

where n is the number of data pairs. The population correlation coefficient is denoted ρ. The correlation coefficient is always between -1 and 1, and a negative sign suggests a negative correlation. The closer |r| is to 1, the stronger the correlation.

The R^2 value provided by trendlines in Excel and MATLAB is the square of the correlation coefficient r. If you've worked with R^2 values in the past, you've probably heard rules of thumb about good R^2 values. Now, however, we can apply hypothesis tests to determine precisely what r (or R^2) value is good enough to claim correlation (which, please note, is different from a good linear fit). But first, one more definition.

Causation (or causality) is a term used to denote a directional relationship between two variables. We could say that the independent variable x causes y. Naturally, it's possible that two variables could be correlated and have no cause-and-effect relationship. For example, the prices of corn and the prices of dairy products are positively correlated and have a cause-and-effect relationship. The average annual price of gasoline and your age in years are positively correlated, also, but there is no cause-and-effect relationship there (gas isn't more expensive because you are older, and you are not older because gas is more expensive). Other correlated data, like CO2 concentration in the atmosphere and the average temperature of the earth, could have a cause-and-effect relationship, but different people have different opinions on the matter.

There are basically four possibilities for correlated data (x, y). Either x causes y, y causes x, some third variable (or combination of a bunch of variables) causes both x and y, or x and y are just coincidentally correlated.

7.1.1 Practice Problem

Brainstorm at least five other examples of pairwise data where one variable is correlated to the other,
but one variable is not caused by the other.

7.2 Hypothesis Testing for the Regression Coefficient

Remember that hypothesis tests deal with the population parameter (here, ρ) based on the sample statistic (r). We will consider just one type of hypothesis test, and it is two-tailed:
H0: ρ = 0 (There is no significant correlation.)
Ha: ρ ≠ 0 (There is a significant correlation.)

Note that by the formulation of this hypothesis test, rejecting the null hypothesis means that it's
extremely likely that the population parameter is not zero. It does not prove that the population
parameter is close to 1, but it allows us to make a statistically significant claim that the variables are
correlated, however weakly or strongly.

For this hypothesis test, we use the t-distribution with n - 2 degrees of freedom (why n-2 and not n-1 like other uses of t-testing? Because linear regression uses two parameters, a slope and an intercept, to characterize the data). The critical value is found the same as always, using α and d.f. = n - 2. The test statistic is

t = r \sqrt{\frac{n - 2}{1 - r^2}}

If the test statistic t is in the rejection region (remember this is a two-tailed test), then we reject H0.

We can also note, then, that the value of r or R^2 that is acceptable to show a correlation depends on the number of data points. Once more, I want to stress that this test does not tell us if the linear fit for the data is good. It only helps to show whether the data is correlated at all or not. A stronger hypothesis test is to declare what the exact value of the slope of a regression line is; this is a detail that requires knowing what the variance of the slope is... so we'll get back to that thought a bit later this chapter.
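
As a minimal MATLAB sketch of this test (the values of r and n are illustrative; tinv is part of the Statistics Toolbox):

r = 0.9; n = 10;                 % illustrative sample correlation and sample size
t = r*sqrt((n-2)/(1-r^2))        % test statistic, about 5.84 here
tc = tinv(1 - 0.05/2, n-2)       % two-tailed critical value at alpha = 0.05, about 2.31
% reject H0 when abs(t) > tc; here we reject and claim a significant correlation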

7.3 Regression

Regression is one of two major classes of curve-fitting techniques (the other, interpolation, is considered
in Section 7.4). The aim of regression is to find a defined continuous function that represents a set of
discrete data. This function may not pass directly through the points in a discrete data set, but it aims to
minimize the error between the function and all points in a data set.

The regression work in this class will be least-squares regression. In a least-squares fit of a function to a data set, we are finding the parameters for a given function that result in the smallest value for the sum of the squares of the residuals, the errors between the function value and the given data set.

[I mean to put a graphic here eventually. I'll draw what I mean in class.]

For a regression function y=f(x), where x may be a scalar or a vector value, the residual e_i for a given data point (x_i, y_i) is the difference between its measured value and its fitted function's value:

e_i = y_i - f(x_i)
And we are seeking the parameters of commonly known functions (for example, coefficients of
polynomials) that result in the smallest sum of squared residuals.

In this section, we'll quickly look at the most basic functions with which to perform regressions: polynomials. However, it's theoretically possible to find a least-squares regression for an infinite variety of different functions, including combinations of polynomials, exponents and logarithms, and trigonometric functions. Typically, though, the more complicated the function, the less often you'll actually find a regression for it.

7.3.1 Linear Regression

The most basic polynomial is that of a line. A line has two parameters: its slope and intercept. The linear
least-squares fit for a set of data is the line that minimizes the sum of the squares of the residuals of the
data. You have almost certainly encountered this kind of curve fitting before: it is exactly what is performed when you add a trendline to data in Microsoft Excel.

We will define our general linear least-squares fit as

y = a_0 + a_1 x
where a0 is effectively the y-intercept of the line and a1 is its slope. (We will use a-values with numerical
subscripts instead of the usual b and m in order to extend this idea to higher-order polynomials in the
next section.)

To minimize the sum of squared residuals (SSR), we need to express this value algebraically. One way to write this sum is

SSR = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2
In order to find the parameters a0 and a1 that minimize the SSR, we need to temporarily think about this
sum as a function of those very parameters. SSR is a quadratic function in both cases, which means it
passes through one critical point (where the derivative of the function with respect to ai is zero),
corresponding to its minimum value. So, effectively, we are solving for the values of a_0 and a_1 where

\frac{\partial SSR}{\partial a_0} = -2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0, \qquad \frac{\partial SSR}{\partial a_1} = -2\sum_{i=1}^{n} x_i (y_i - a_0 - a_1 x_i) = 0
(Notice that we had to use the chain rule to take the derivatives: the x_i outside the parentheses in the second equation is the derivative of a_1 x_i; in both cases, a negative one appears, but we can factor that out of the sum.)

This gives us two equations and two unknowns to solve for! Using a few identities having to do with summations, we can rewrite these two equations as

n a_0 + \left(\sum x_i\right) a_1 = \sum y_i
\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i
These are sometimes referred to as the normal equations. You can enter the appropriate values for the sums of the data values and solve this 2x2 linear system, but this exact problem appears often enough that the solution is already widely known algebraically in multiple forms:

a_1 = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}
There are two main ways to describe the error associated with a regression line. The one you are probably already familiar with is the R-squared value, more formally called the coefficient of determination. It can be computed by squaring the correlation coefficient r given in Section 7.1, which is the form most convenient for computational methods.
Formulas for the coefficient of determination are pretty easy to implement with a spreadsheet or computer code that makes use of built-in sum commands or loops. The major drawback to that formula is that it is specific to linear regressions, so to make it more general, a second way of computing this coefficient is

r^2 = 1 - \frac{SSR}{TSS}
where TSS is the total sum of squares of the dependent variable (y, in this case), that is, the sum of squared differences between each point and the mean:

TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2
If the regression equation is a perfect fit, the value of r^2 is 1, because the SSR is zero. This means that the regression equation explains 100% of the variation in the data. If the regression equation represents zero improvement in describing the data (versus just using the mean value), then r^2 is 0. There is no hard and fast rule for what value of r^2 is good enough; it depends on your specific application. Sometimes any r^2 value that allows us to make a statistical claim about the linear dependence of the data (see the previous section) is good enough. Other times, we may seek specific coefficients of determination, say those that are greater than 0.8, 0.9, 0.95, or even 0.99.

One important note about the coefficient of determination is not to rely on it to make unwarranted claims. Especially as we begin to look at more complicated regression formulas, it should become apparent that there is usually no reason to strive for a certain r^2 value. In fact, seeing r^2 values of exactly 1 may simply be an indication that you have as many data points as you do parameters (a straight line will always connect two points! A quadratic equation will always connect three points!).

Another measure of the goodness of fit for a regression equation is the standard error of the estimate:

s_{y/x} = \sqrt{\frac{SSR}{n - m - 1}}
Here, n is still the number of data points, but m represents the dimension of the independent variable
vector x. Currently, we have only considered the case m+1=2, meaning the regression function has two
parameters (a slope and intercept). The quantity n-m-1 represents the degrees of freedom of the
regression: it is the number of data points minus the number of parameters we are fitting the data to.

The standard error of a regression function is analogous to the standard deviation of a data set. Recall
that the standard deviation is a measure of the spread of the data around the mean. The standard
error, then, is a sort of measure of the spread of the data around the regression function. This value
becomes more important in the case of multiple regression, because we can use the standard error in
part to determine which independent variables are the most significant contributors to the regression.
So, more on that in a little bit.

We'll have to stick to this general way of computing the standard error for generic linear regressions, but for a simple case of one independent variable (x) and one dependent variable (y) related by the expression y = a_0 + a_1 x, we can also approximate the standard deviations on the parameters using the following formulas:

s_{a_1} = \frac{s_{y/x}}{\sqrt{\sum (x_i - \bar{x})^2}}, \qquad s_{a_0} = s_{y/x} \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}}

Before we move on to more complicated regression equations, we should consider the idea that many regression equations can be transformed into linear equations, especially if those regression equations have only two parameters. You have seen some of this already in your work plotting data on semi-log and log-log axes. Some examples of common linearizations in chemical engineering include those in Table 1, below, but you may encounter many more examples.

Table 1: Common Linearizations in Chemical Engineering Systems

Original Equation            Linearized Version
y = a e^{bx}                 ln y = ln a + b x
y = a x^b                    ln y = ln a + b ln x
y = a x / (b + x)            1/y = 1/a + (b/a)(1/x)
7.3.2 Confidence Intervals and Hypothesis Testing on Linear Regression Parameters

In the previous section, I gave you some formulas to compute not just the regression parameters for a
linear regression, but the standard deviation of those regression parameters. That means you can also
compute confidence intervals on regression parameters, the same way you would for confidence
intervals on the mean. The formulas are almost the same as before, except the standard deviation
calculations already have sample size built into them.

The c = 100(1-α)% confidence intervals on a_0 and a_1 are

a_0 \pm t_c s_{a_0} \qquad \text{and} \qquad a_1 \pm t_c s_{a_1}

where the critical value t_c is chosen assuming a two-tailed distribution (so divide α by 2) and n-2 degrees of freedom.

We can also compute the confidence interval on a predicted response, as long as we are not trying to extrapolate out of the range of our data. If we want the c = 100(1-α)% confidence interval on the quantity a_0 + a_1 x, it can be computed with the formula

(a_0 + a_1 x) \pm t_c s_{y/x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}

where

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
When it comes to hypothesis testing, the procedure for regression parameters is exactly the same as it is
for one-sample hypothesis testing on the mean. The only step that is slightly different, once again, is the
calculation of the test statistic:

t = \frac{a_0 - a_{0,0}}{s_{a_0}}

where a_{0,0} is the numerical value you are comparing a_0 to.

Let's check out an example. Consider a problem from back in your ENES 101 and/or ENME 110 days, when you were seeking to find the Hooke's Law constant for a spring or a beam. The object is subject to a weight w and the length of the beam d is measured. The data is in Table 2.

Table 2.: Measured Lengths of a Beam Under a Load

Weight (lbf)  Length (in)   Weight (lbf)  Length (in)   Weight (lbf)  Length (in)   Weight (lbf)  Length (in)
0.0           5.06          1.0           5.16          2.0           5.40          3.0           5.59
0.2           5.01          1.2           5.25          2.2           5.57          3.2           5.61
0.4           5.12          1.4           5.19          2.4           5.47          3.4           5.75
0.6           5.13          1.6           5.24          2.6           5.53          3.6           5.68
0.8           5.14          1.8           5.46          2.8           5.61          3.8           5.80

Assuming d = a_0 + a_1 w and applying the equations from Section 7.3.1 (here n = 20, Σw = 38, Σd = 107.77, Σw^2 = 98.8, and Σwd = 210.206),

a_1 = \frac{20(210.206) - (38)(107.77)}{20(98.8) - (38)^2} = 0.2046 \ \text{in/lbf}, \qquad a_0 = \bar{d} - a_1 \bar{w} = 5.3885 - 0.2046(1.9) = 5.00 \ \text{in}
The standard deviations on a_1 and a_0 are given after we compute the correlation coefficient and the standard error of the estimate:

r = 0.974 \ (r^2 = 0.949), \qquad s_{y/x} = \sqrt{\frac{0.0595}{20 - 2}} = 0.0575 \ \text{in}
Now, those standard deviations on the regression parameters:

s_{a_1} = \frac{0.0575}{\sqrt{26.6}} = 0.0112 \ \text{in/lbf}, \qquad s_{a_0} = 0.0575\sqrt{\frac{98.8}{20(26.6)}} = 0.0248 \ \text{in}
The 95% confidence interval on the spring constant a_1 would be (using t_c = 2.101 with 18 degrees of freedom)

a_1 \pm t_c s_{a_1} = 0.2046 \pm 0.0234 \ \text{in/lbf}
The 95% confidence interval on the length of the beam subject to a load of 1.4 lbf would be

(5.00 + 0.2046(1.4)) \pm 2.101(0.0575)\sqrt{\frac{1}{20} + \frac{(1.4 - 1.9)^2}{26.6}} = 5.29 \pm 0.03 \ \text{in}
Finally, we could conduct a one-sample hypothesis test on a regression parameter, if we so desire. Say a manufacturer claims that the spring constant of this beam is at least 0.215 in/lbf. The test statistic would be

t = \frac{0.2046 - 0.215}{0.0112} = -0.93
If we look at the critical t-value table, we know this is between α = 0.1 and α = 0.25 (or we could use MATLAB or Excel to get the exact value), so it is likely that we fail to reject the null hypothesis and cannot support the manufacturer's claim at a reasonable level of confidence.
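
(These hand calculations are easy to check in MATLAB; a minimal sketch using the Table 2 data:)

w = 0:0.2:3.8;
d = [5.06 5.01 5.12 5.13 5.14 5.16 5.25 5.19 5.24 5.46 ...
     5.40 5.57 5.47 5.53 5.61 5.59 5.61 5.75 5.68 5.80];
p = polyfit(w, d, 1)                                   % about [0.2046 5.00]
s = sqrt(sum((d - polyval(p, w)).^2)/(length(w) - 2))  % standard error, about 0.0575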

7.3.3 (Optional-ish) Polynomial Regression

The reason this section is kind of optional is that in Section 7.3.5 we will apply a more useful theory
that does the work in this section in a better way. I can and will expect you to be able to do
polynomial regression; I just would rather you use the technique from that later section.

The idea behind polynomial regression extends directly from what brought us linear regression. Let's first simply add a quadratic term to our regression equation:

y = a_0 + a_1 x + a_2 x^2
Now our sum of squared residuals is

SSR = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2
and we can once again write normal equations after finding the derivatives of SSR with respect to the three fitting parameters a_i, setting them equal to zero, and rearranging them to a linear system. You may be able to detect a pattern that is starting to emerge:

n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i
\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i
\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i
We have three equations with three unknowns (our fitting parameters). We can set up an appropriate
matrix equation and solve it for these values.

We could keep going, extending this idea to cubic polynomials and beyond, but there are three reasons that we'll stop right here:

1. We seek regression equations of higher-order polynomials much less often than we do for linear equations.
2. More importantly, the matrix that results from our system of linear equations is growing in condition number. Recall from Section 2.4 that solutions to ill-conditioned matrix equations are extremely sensitive to measurement error. It is possible that being off by even a tiny percent in one data point could completely change the least-squares polynomial fit to a set of data.
3. There is a more effective way to summarize all least-squares regression equations that will account for higher-order polynomials and more complicated functions.

The same formulas for the sum of squared residuals, total sum of squares, and standard error of the
estimate apply for polynomial regression. Note that m is equal to the order of the polynomial.

7.3.4 (Optional-ish) Multiple Linear Regression

The reason this section is kind of optional is that in Section 7.3.5 we will apply a more useful theory
that does the work in this section in a better way. I can and will expect you to be able to do
multiple regression; I just would rather you use the technique from that later section.

Multiple regression still seeks to find the best fit equation for a set of data y=f(x), except instead of being a scalar value, x can now be a vector. For a vector of length 2, our regression equation becomes

y = a_0 + a_1 x_1 + a_2 x_2

Now each data point consists of three values: two independent variables x_1 and x_2, plus the dependent variable y. The subscripts in our general problem formulation get a little messy here, because there is a different x_1 and x_2 value for each data point, but the normal equations for this case are

n a_0 + \left(\sum x_1\right) a_1 + \left(\sum x_2\right) a_2 = \sum y
\left(\sum x_1\right) a_0 + \left(\sum x_1^2\right) a_1 + \left(\sum x_1 x_2\right) a_2 = \sum x_1 y
\left(\sum x_2\right) a_0 + \left(\sum x_1 x_2\right) a_1 + \left(\sum x_2^2\right) a_2 = \sum x_2 y
Once again, the same formulas apply for SSR, TSS, and s_{y/x}. Much like polynomial regression, it's possible that these equations will get more ill-conditioned as we allow for larger vectors x. Fortunately, we can apply some linear algebra theory to give us a general guideline to find the least-squares fit for any regression formula, one that is relatively easy to complete using MATLAB, to boot!

7.3.5 General Linear Least Squares Theory

The three kinds of regression equations we have considered so far are all specific cases of this more general regression function:

y = a_0 f_0(x) + a_1 f_1(x) + a_2 f_2(x) + \dots + a_m f_m(x)
where, keeping consistent with this section, m+1 is the number of parameters necessary to describe the regression function. Table 3 shows how our prior three sections fit into this general formulation:

Table 3: Comparison of Least Squares Regression Cases

Regression        m                       Functions
Linear            1                       f_0 = 1, f_1 = x
Polynomial        Order of polynomial     f_0 = 1, f_1 = x, f_2 = x^2, ..., f_m = x^m
Multiple Linear   Length of vector x      f_0 = 1, f_1 = x_1, f_2 = x_2, ..., f_m = x_m

The functions f_i can simply be any functions of your independent variable(s) x that you believe to be important to your fitted curve. Note that x may be a scalar or a vector variable, but y is a scalar. The only requirement of the theory that follows here is that your regression equation be of the form listed at the start of this section, that is, a linear combination of your functions. Then we are simply trying to find the values for the parameters a_i that minimize the sum of squared residuals.

We can generalize our formula for the sum of squared residuals by first writing a matrix equation that captures all of our data. Each set of data (x_i, y_i) provides one equation of the form

y_i = a_0 f_0(x_i) + a_1 f_1(x_i) + \dots + a_m f_m(x_i) + e_i

and if we collect all those equations inside a matrix, we have

Y = Z A + E

so we can write the sum of squared residuals (differences between what's predicted by the least-squares fit and the actual data points) as

SSR = (Y - ZA)^T (Y - ZA)
The solution that minimizes SSR, using Y to represent the vector of y-data, Z to represent the matrix of function evaluations, and A to represent the vector of parameters, is the solution to the normal equations, same as before. Now representing everything as vectors and matrices, the normal equations are

(Z^T Z) A = Z^T Y
where a superscript T represents the transpose of a matrix: the result if you were to write all the columns of Z as rows of Z instead. This equation is in the same form as our examples from linear algebra, so to solve for A in this matrix equation, we just multiply by the inverse of Z^T Z to get the final result:

A = (Z^T Z)^{-1} Z^T Y
Whew! That is a lot of math in a short amount of space, and it can be potentially confusing if this is your first foray into linear algebra. Fortunately it's really easy to make MATLAB do all this hard work for us, so it's only our responsibility to understand the final result.

The matrix (Z^T Z)^{-1} is a very useful matrix for understanding our regression equation and how well it fits the data. The elements on the diagonal of this matrix correspond to the variances on the regression parameters:

s_{a_{i-1}}^2 = \left[(Z^T Z)^{-1}\right]_{ii} \, s_{y/x}^2

where s_{y/x} is the standard error of the estimate from before.

These variances are necessary to perform a hypothesis test or to compute confidence intervals on
regression parameters for regressions more complicated than linear ones. If you need to apply a level of
confidence to the slope or intercept of a line, you are doing a test on data whose sample mean value is
the parameter of interest (i.e., the slope) and the sample variance is computed as above. The degrees of
freedom are n - m - 1.

The off-diagonal elements of (Z^T Z)^{-1} are related to the covariances of the regression parameters. The covariance is a statistic that describes the dependency of one value on another. If the covariance for two parameters is zero, it means that the parameters are completely independent from one another (and it may be important for both parameters to exist in your regression). The larger the covariance, the more dependent the two parameters are on one another. The covariance of parameters a_{i-1} and a_{j-1} is

\text{cov}(a_{i-1}, a_{j-1}) = \left[(Z^T Z)^{-1}\right]_{ij} \, s_{y/x}^2
MATLAB and Excel both have some built-in tools to help with all of these computations, as we'll explore through the upcoming exploration and some practice problems.

7.3.6 Practice Problems

1) While working on the SuperLasting Battery, the researchers at Fictitious Chemical Company
realized they were accidentally electrolyzing water from a solution, which led them to consider
other applications of their work. One piece of information they are interested in obtaining is a
relationship between the temperature of water and the solubility of oxygen in it. So they set out
to do some experiments and obtained the following results:

Temperature (°C) 0 5 10 15 20 25 30
Solubility (mg/L) 14.8 12.9 11.3 10.1 9.0 8.2 7.4

Plot the data to confirm that it looks linear enough, and give them an appropriate regression
formula for the solubility of oxygen in water as a function of temperature. (Show your work so
they can follow it!)

2) For a project, you need to know the maximum solubility of oxygen in salt water at different
chloride concentrations and temperatures. You just so happen to have a table of that data in
front of you, because you were comparing your experimental results from last time to the
literature²:

Dissolved oxygen concentrations in mg/L for various temperatures and chloride concentrations
T, degrees Celsius 0 g/L chloride 10 g/L chloride 20 g/L chloride
0 14.6 12.9 11.4
5 12.8 11.3 10.3
10 11.3 10.1 8.96
15 10.1 9.03 8.08
20 9.09 8.17 7.35
25 8.26 7.46 6.73
30 7.56 6.85 6.20
You should derive a predictive equation for dissolved oxygen concentration in water as a
function of temperature and chloride concentration based on the data table you have.

(a) Write out the regression problem in the form of an overdetermined linear system.
(b) Use MATLAB to solve this linear system with the backslash operator. Print out your work. (The
easiest way is to do this problem entirely in the Command Window and print out your results
from there, but you could also write a brief m-file and publish it.)
(c) Use your regression to estimate the concentration of dissolved oxygen in water with chloride
ion concentration 5 g/L and temperature 17 °C.

² Chapra and Canale, Numerical Methods for Engineers, 5th edition

3) The bioreactor that you helped investigate on your first day of work for Fictitious Chemical is not
working as expected, and the supervisor of the division has come to you for help. After
explaining the situation to you, you realize that the mathematical model they are using to
simulate the growth of algae is nonsensical, so you propose a new model for the reaction rate
constant³:

k = \frac{k_{max} c^2}{c_s + c^2}

In order to verify that this is the correct model, you ask the biochemists over in research and
development to put different concentrations of algae in five test tubes and determine the
reaction rate constant for those samples. They come back to you with the following data:

c (mg/L)     0.5   0.8   1.5   2.5   4
k (day^-1)   1.1   2.4   5.3   7.6   8.9
To see if you have the right model, you'll have to figure out how to determine the coefficient of correlation for this model. But the model isn't even linear, so you'll have to find a way to linearize it first! You are asked to report back to the supervisor with your findings:

(a) What are the values of k_max and c_s that minimize the sum of squared residuals for this data? Don't forget the units!
(b) What is the correlation coefficient for this model once you have processed the data so that the
reaction rate constant is part of a linear equation? Statistically determine whether this
processed data is truly linear with a formal hypothesis test.

4) A child's systolic blood pressure P (in mmHg) and weight w (in lbf) are approximately correlated by the equation

P = \beta_0 + \beta_1 \ln w

For the data below, find the fitting parameters β_0 and β_1, and estimate the systolic blood pressure of a child weighing 100 pounds.

w (lbf)     44   61   81    113   131
P (mmHg)    91   98   103   110   112

³ Chapra and Canale, Numerical Methods for Engineers, 5th edition

5) The FCC Finance division has determined that a good approximation for the manufacturing costs
(less fixed expenses) for a specific product is of the form

where x is the sales level (dimensionless) reported by Marketing. The sales level and
manufacturing costs from different times in the past year are in the table below.

x (dimensionless) 4 6 8 10 12 14 16 18
c (thousands of dollars) 1.58 2.08 2.5 2.8 3.1 3.4 3.8 4.32

a) Write out this regression problem in the form of an overdetermined linear system.
b) Use MATLAB to solve this linear system. Plot a graph of the eight data points together with
the regression curve. Publish your work.

6) A chemical reaction is run 12 times, with the temperature T (in degrees Celsius) and percent
yield y recorded each time. The summary statistics are as follows:

The model of a linear regression equation would be y=a0+a1T. Compute the least-squares
estimate for a0 and a1, the standard error of the estimate sy/T, the 95% confidence intervals on a0
and a1, and the 95% confidence interval on the mean yield at a temperature of 40C.

To do that last part, you'll need the 100(1-α)% confidence interval for the quantity a_0 + a_1 x, given by the formula

(a_0 + a_1 x) \pm t_c s_{y/x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}

where

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

7) In an experiment to estimate the acceleration of an object down a ramp, the object is released
from rest and its distance from the top of the plane (in meters) is measured every 0.1 seconds
from t=0.1 to t=1.0. The mean data are presented in the following table.

t (seconds) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
h (meters) 0.03 0.10 0.27 0.47 0.73 1.07 1.46 1.89 2.39 2.95

It is desired to fit this data to the model h = a_0 + a_1 t + a_2 t^2.
a) Explain the physical relevance of the three fitting parameters in terms of position, velocity,
and acceleration. (Hint: think back to the equations of motion from first-semester physics...)
b) Write the overdetermined matrix system Za = y whose solution vector a contains the fitting parameters.
c) Use MATLAB to find the general linear least squares regression parameters for this
experiment. Find the 95% confidence interval on a2 and interpret this result.

8) It is hypothesized that the thickness of lake ice in millimeters (x) in winter is a function of the
number of days per year of ice cover (t1), the number of days the bottom of the lake has a
temperature below 8 °C (t2), and the average snow depth in millimeters (t3) for a set of lakes in
Minnesota. The data for these lakes is below:

Lake x t1 t2 t3
1 730 152 198 91
2 760 173 201 81
3 850 166 202 69
4 840 161 202 72
5 720 152 198 91
6 730 153 205 91
7 840 166 204 70
8 730 157 204 90
9 650 136 172 47
10 850 142 218 59
11 740 151 207 88
12 720 145 209 60
13 710 147 190 63

(a) Fit the model to the equation x=a0+a1t1+a2t2+a3t3. Explain the units on each parameter ai.
(b) For the fourth fitting parameter a3, find the observed level of significance for testing the null
hypothesis that the parameter is equal to zero. Interpret your results.

9) The weights of eight vehicles and their variability in braking distances when stopping on a wet
surface are presented in the table below.

Weight (lb) 5890 5340 6500 4800 5940 5600 5100 5850
Variability in braking distance (ft) 2.92 2.40 4.09 1.72 2.88 2.53 2.32 2.78

(a) Explicitly use the equations from Section 7.3.1 of the Course Packet to determine the slope, intercept, and R^2 value of the linear regression of this data. Be sure to correctly select the independent and dependent variables.
(b) Confirm your results using either the built-in functions or plotting tools in Microsoft Excel. Be sure that your work can be clearly followed; it's not sufficient to just print out a plot or even a spreadsheet of values (unless this spreadsheet makes it clear how each computation happens).
(c) Conduct the hypothesis test on correlation to determine the level of confidence with which we
can say there is any linear correlation at all between these two variables, no matter how weak it
may be.

10) On average, the surface area A of an individual person is related to the product of their mass M
and height H, each raised to a certain (unknown) power. (This product is also multiplied by some
unknown constant.)

Measurements for several individuals are given in the table below.

H (cm) 182 180 179 187 189 194 195 193 200
M (kg) 74 88 94 78 84 98 76 86 96
A (m2) 1.92 2.11 2.15 2.02 2.09 2.31 2.02 2.16 2.31

(a) Propose the form of the nonlinear regression equation A = f(M, H).
(b) Transform this nonlinear equation into a linear one and apply General Linear Least Squares theory (that is, define the Z matrix and y vector) to solve for the unknown coefficients in your regression equation.

7.4 Interpolation and Splines

The other form of curve fitting is interpolation, which is a technique that finds functions that pass directly through every available data point. You have already used interpolation techniques before: linearly interpolating between entries in the steam tables, for example, to estimate a value at a condition in between points in the table.

Interpolation is a useful skill to have when data is limited but you would like to estimate a value based
on what data you already have. Unlike regression, which aims to find a function to describe an entire
data set as best as possible, interpolation often aims to find a specific data value given the existing data
set.

The most common methods of interpolation involve finding a polynomial that passes through a given set of points. For example, for two data points (x_0, y_0) and (x_1, y_1), the equation of the line that connects these points is

y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0)

So, to interpolate for some value x between x_0 and x_1, this formula provides the corresponding y-value.

The formula for interpolating using a second-order polynomial that passes through three data points (x_0, y_0), (x_1, y_1), and (x_2, y_2) is

y = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0) + \frac{\dfrac{y_2 - y_1}{x_2 - x_1} - \dfrac{y_1 - y_0}{x_1 - x_0}}{x_2 - x_0}(x - x_0)(x - x_1)
Notice that this quadratic interpolation has the same first two terms as the linear interpolation only
the third term is new.

Does the form of those fractions in the interpolation polynomial look familiar? Maybe if instead of writing y, we write f(x)?

\frac{f(x_1) - f(x_0)}{x_1 - x_0}
Ah, yes. That's the approximation for the first derivative of a function (and this interpolating function is starting to look a lot like a Taylor series!). More officially, the fraction above is the first finite divided difference of a function, typically denoted f[x_1, x_0]. The giant fraction in the last term of the quadratic polynomial is the second finite divided difference, which itself is the finite divided difference of two first finite divided differences:

f[x_2, x_1, x_0] = \frac{f[x_2, x_1] - f[x_1, x_0]}{x_2 - x_0}
The pattern continues, and the nth order polynomial that interpolates n+1 points from (x_0, y_0) to (x_n, y_n) is then the sum of a number of terms involving divided differences:

f_n(x) = f(x_0) + f[x_1, x_0](x - x_0) + f[x_2, x_1, x_0](x - x_0)(x - x_1) + \dots + f[x_n, \dots, x_0](x - x_0)(x - x_1)\cdots(x - x_{n-1})
There are some dangers with blindly fitting data to whatever order polynomial you like. It's possible that an interpolation polynomial of high order will predict an unrealistic data value, for one. Also, since an interpolation polynomial is designed to pass through every point, there are two important implications to remember: (1) you cannot have two different y-values for the same x-value, and (2) the coefficient of determination, r^2, is 1 by definition. A nonsense polynomial will still have an r^2 of 1.

The one nice thing about interpolation polynomials is that it is easy to add another data point and increase the power of the polynomial to connect through all data points. Your values x_0, x_1, ..., x_n do not need to be in any particular order; there is exactly one nth order polynomial that passes through n+1 points (the number of parameters is the same as the number of data points, or in other words, the degrees of freedom for an interpolation problem is zero).

Another type of interpolation commonly encountered in engineering is multiple interpolation, usually double interpolation. You've likely already run into this concept using data like the superheated steam tables, where data is presented as a function of both temperature and pressure. If the specific temperature and pressure are not given in your table, you must interpolate in both directions. Mathematically, for a linear interpolation, it makes no difference if you interpolate across temperatures first or across pressures first.

More generally, if we assume we have a set of data points of the form (x_i, y_j, z_ij), where z is a linear function of both x and y, one way to perform double linear interpolation for a value between the points (x_0, y_0, z_00), (x_1, y_0, z_10), (x_0, y_1, z_01), and (x_1, y_1, z_11) is to use the formula

z = \frac{(x_1 - x)(y_1 - y)\, z_{00} + (x - x_0)(y_1 - y)\, z_{10} + (x_1 - x)(y - y_0)\, z_{01} + (x - x_0)(y - y_0)\, z_{11}}{(x_1 - x_0)(y_1 - y_0)}

though this is one of those cases where it's probably easier to just do the interpolation in both directions instead of relying on a clunky formula.

Back to single interpolation now. Instead of using a single polynomial to pass through every data point in an interpolation (can you imagine doing that for a column of values in a steam table? Yikes), it's much more common to interpolate using a spline. A spline is a piecewise continuous function that passes through a collection of data (it gets its name from the thin, flexible drafting tool that helps to physically draw smooth curves through points). The most common types of splines are linear and cubic.

A linear spline is basically a series of linear interpolations between every two data points: a series of straight lines that connect the dots. The obvious benefit to constructing a linear spline is that it is simple. If you have a table of data, need a data point between entries, and have no better information to go on, a simple linear interpolation between the two closest points is your solution. One major detriment to linear splines is that they are not continuously differentiable, but instead result in undefined derivatives at nearly every data point. Further, linear splines are not particularly useful for approximating integrals of a function that fits a data set (more on that in Section 8.1).

A cubic spline is a series of cubic interpolations, one on each interval between adjacent data points, but with three extra stipulations about these cubic polynomials: the first derivatives of the spline at each knot (place where one cubic polynomial ends and the next begins) must be the same for both cubic functions; the second derivatives at each knot must be the same for both cubic functions; and the second derivative at the very first point and the very last point must be zero. The major benefit to a cubic spline is that it is a smooth, continuous, uncomplicated curve, easy to compute, and easy to take the derivative or integral of, usually with an acceptable amount of error due to approximation. One major detriment is that cubic splines take a bit of work to compute, but this is easily remedied by using a built-in tool of a computer program like MATLAB.

7.5 Interpolation versus Regression

We have basically considered two types of curve fitting strategies, but each appears to have its own
applications. So when is it best to use which type of curve fitting? I'm not sure that there is always an obvious answer, but there usually is.

The nice thing about regression techniques is they can reduce a large data set to a function and a small
number of parameters. Instead of having to carry around a lot of data points, you just have to carry
around that small set of parameters. Regression techniques are appropriate for problems with large
degrees of freedom. Regression helps to visualize overall trends in data, while potentially smoothing
out any outliers or error in measuring the data to begin with.

Interpolation is convenient when trying to deduce additional data or information within already existing
data. Instead of trying to capture the behavior of the overall data set, you are more interested in
locally predicting a specific data point within that data.

Both interpolation and regression can be sensitive to errors and outliers, though perhaps interpolation is more sensitive because of the requirement that fitted curves pass through every point. Both interpolation and regression can be abused and misused: the commonly computed r^2 value can be misinterpreted for a regression, and the use of an r^2 value in an interpolation is meaningless (because it is always 1!). It can be tempting to increase the order of polynomials in regression and interpolation to seek a better fit to the data, but in both cases, this can result in the polynomial predicting values outside the actual range of the data.

Neither interpolation nor regression should be used to forecast or extrapolate data. In very rare cases, a poorly obtained data set could even miss details of a function within the range of the data. Information that is obtained from curve fitting is still empirical, that is, based on experiment, and should not be treated otherwise.

7.5.1 Practice Problems

Explain whether each set of data should be treated with an interpolation or a regression and why.

1) You have collected data for the vapor pressure of a novel chemical as a function of temperature
and want to determine the Antoine equation coefficients.
2) You have a table for the vapor pressure of water for various temperatures, but need the value of
the vapor pressure for a temperature that is between two temperatures in the table.
3) You need a value of the enthalpy of steam in the steam tables for a temperature and pressure
that lies between entries in the table.
4) You have data for the specific heat capacity of a novel chemical as a function of temperature
and want to include it in Table B.2 of Felder and Rousseau.

7.6 Excel and MATLAB Exploration: Statistics and Curve Fitting Tools

Reminder: It's in your best interest to watch the following videos before coming to lab.

Chapter 12 of the free version of the MATLAB Interactive Tutorial published by Mathworks at
http://www.mathworks.com/academia/student_center/tutorials/mltutorial_launchpad.html.

The MATLAB Data Fitting and Working with Nonlinear Equations videos at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-
launchpad.html . Introduction to Data Fitting and Working with Nonlinear Equations and Data-Driven
Models (about 29 minutes).

Section 5.2 in Pratap's Getting Started with MATLAB or Section 15.1 in Attaway's MATLAB: A Practical
Introduction to Programming and Problem Solving.

By the end of this exploration, you should be able to

Compute descriptive statistics in Excel and MATLAB.
Create histograms using Excel and MATLAB.
Perform inferential statistics tests in Excel and MATLAB.
Perform regression and interpolation in Excel and MATLAB.

This week is a little different, because there are going to be times this semester where Excel can accomplish the same tasks as MATLAB. Rather than ignore that fact, I want you to be aware of your options. The purpose of this course is to be able to perform certain tasks using a computer, and you are welcome to select the software that helps achieve that goal.

That said, Microsoft Excel has limitations compared to MATLAB that may be significant when it comes to large data sets. Versions of Excel before 2007 severely limit the number of rows and columns you can have in a spreadsheet. Most actions in Excel will require being able to highlight selections of cells, and again, for large data sets, this can be tedious.

Let's practice with some smaller data sets to give you a feel for how descriptive statistics work in both
Excel and MATLAB.

First, let's create a single column vector in both Excel and MATLAB and compare the statistical functions
that are built into each program.

A=[1;2;3;4;5];

The table below contains many of the descriptive statistics you can calculate using either tool. Check to
see that you get the same results with each tool!

Recall for Excel that, for the table below, wherever it says "cells," the notation is first cell:last cell. For
the example above, that would be A1:A5.

Function                            MATLAB           Excel
Sum of a data set                   sum(vector)      =sum(cells)
Mean of a data set                  mean(vector)     =average(cells)
Median of a data set                median(vector)   =median(cells)
Mode (check the documentation to    mode(vector)     =mode(cells)
  see how each program deals with
  multiple modes)
Standard deviation of a data set    std(vector)      =stdev(cells)
Z-score (zscore returns a vector    zscore(vector)   Need to enter a formula manually.
  of z-scores, one per element)
Minimum                             min(vector)      =min(cells)
Maximum                             max(vector)      =max(cells)
Range                               range(vector)    Need to subtract min from max.

Where MATLAB and Excel differ is in their treatment of matrices (as opposed to vectors). Create a matrix
with 2 columns and 5 rows in MATLAB and in Excel:

B=[1 2; 3 4; 5 6; 7 8; 9 10];

and try the commands in the above table once more. In Excel, you would now type, for example, A1:B5
to designate all ten cells of this matrix.

What is the difference between using MATLAB and Excel for matrices? What could you do in either
program to get them to give you the same result?

Excel and MATLAB both have tools that allow you to create histograms. The steps necessary to make
them are similar, but just different enough that we'll spell them out separately.

In Excel, you prepare the data by entering it into the spreadsheet, and then in another location on the
spreadsheet, you enter bin values. In Excel, a bin value is the maximum value for a range of data. For
example, if your bin values are 50, 60, 70, 80, and 90, you are preparing Excel to count your data into
six ranges:
Values that are less than or equal to 50,
Values that have not been counted yet that are less than or equal to 60,
Values that have not been counted yet that are less than or equal to 70,
Values that have not been counted yet that are less than or equal to 80,
Values that have not been counted yet that are less than or equal to 90,
Values that have not been counted yet (and therefore must be greater than 90).

In Figure 3 below, the data is in the A column and the bins are in the D column.

Figure 3: Making a Histogram in Excel

To make the histogram in Excel, select Data Analysis from the Data tab, then choose Histogram. Your
input range should hold your data and the bin range will hold the bin values. If you hit OK, a table will
form on another sheet (or, using the listed Output options, you can place it on the same sheet or in a
new Excel file). Figure 4 shows, in columns A and B, the result. Then the data can be plotted using the
Column Chart option from the Insert tab. By default, Excel puts gaps in between each column in the
graph, but you can reduce this Gap Width to zero by right-clicking the data and choosing Format Data
Series. Then, of course, you'll want to remove the unhelpful legend and title from the graph, which was
not yet done in the figure below.

Figure 4: Histogram Plot in Excel

Try recreating the histogram above.

Because one skill we are trying to develop this semester is the ability to adapt to modern tools, try to
generate the same histogram as in this example using MATLAB. Some helpful commands include
hist, histc, and bar. (hist, in particular, is nicely versatile: you can use it both to save the
bins/counts and to plot the results, depending on how it's used.) You should have used these already for
a homework problem last week, so first see if you can generate a histogram from memory, but of
course, if needed, use doc to read up on the necessary commands, and see if you can't create the same
graph as you did in Excel.
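
As a minimal sketch (with hypothetical data; note that hist works with bin centers, while Excel's Histogram tool works with bin maxima):

data = [48 55 62 66 71 74 78 81 87 93];   % hypothetical data set
centers = 45:10:95;                       % bin centers, not Excel-style bin maxima
counts = hist(data,centers);              % count the data into each bin
bar(centers,counts,1)                     % a bar width of 1 removes the gaps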

When it comes to descriptive statistics, Excel and MATLAB both get the job done in their own way.
When it comes to inferential statistics, the story is similar, but MATLAB allows for more customization
than Excel (weirdly, Excel is great for two-sample tests, but not so much for one-sample tests).

The commands for z-tests and t-tests are very similar. In MATLAB, the only difference between
ztest and ttest is whether you want to use the normal distribution or the t-distribution (see the
discussion in section 5.2.4 to remind yourself if you forget the difference). So we'll just look at the
ttest command here:

MATLAB: [h,p]=ttest(x,mean,alpha,tail)        %(one-sample test)
        [h,p]=ttest(x,y,alpha,tail)           %(paired two-sample test)
        [h,p]=ttest2(x,y,alpha,tail,vartype)  %(unpaired two-sample test)
Excel:  =ttest(array1,array2,tails,type)

In MATLAB, the ttest command gives multiple outputs. If you don't tell MATLAB to save them all, you'll
only get the first output, which is a logical value: 0 (fail to reject the null hypothesis) or 1 (reject the null
hypothesis). The second value is the probability of observing the sample mean if the null hypothesis is
true. The inputs are as follows: x is the vector of sample data (y is the other vector of sample data in a
two-sample test), mean is the mean you are testing for (the value in the null hypothesis), alpha is your
level of significance, and tail is a string that describes what kind of tailed test it is ('both', 'left', or
'right'). vartype means "variance type," that is, whether the variances are equal or unequal; you enter
the string 'equal' or 'unequal' in this place.
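
For example, a minimal one-sample sketch with hypothetical data, using the positional calling convention shown above (newer MATLAB releases also accept name-value pairs):

x = [0.48 0.52 0.55 0.47 0.51 0.53];   % hypothetical sample measurements
[h,p] = ttest(x,0.50,0.05,'both')      % H0: mean is 0.50; alpha = 0.05; two-tailed
% h = 0 means fail to reject the null hypothesis; p is the corresponding p-value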

For our course, when two-sample tests are needed, we are almost always doing unpaired tests. Paired
tests are done when there is a direct one-to-one relationship with the data in the two samples (for
instance, checking the temperatures of the same five systems at two different points in time).

Excel is much more limited. The notation is very similar to MATLAB's, but the results are presented
differently. Excel only performs two-sample tests. The two arrays are those samples. Tails is the number
of tails (1 or 2); Excel does not distinguish between left- and right-tailed tests because it just computes
the p-value of a hypothesis test (more on that in a second). Type is either 1 (paired), 2 (unpaired, equal
variance assumption), or 3 (unpaired, unequal variance assumption). Excel gives only one output: the p-
value, much like the p-value in the MATLAB version of the test.

The p-value is used in hypothesis testing slightly differently than the way described earlier in this text.
Recall that we would use the level of confidence to determine a critical test statistic (by looking it up in
a table). Then we would compute the actual test statistic from the sample (such as from the mean of the
sample) and compare the two.

Excel and MATLAB instead figure out the p-value that corresponds to the actual test statistic, which is a
computation that is too complicated for me to expect you to do by hand in this course. But then,
instead of comparing the critical test statistic and the actual test statistic, you just compare the level of
significance and the p-value. Does that make sense? They're analogous; for example,

"reject the null hypothesis if the test statistic is beyond its critical value" is the same as "reject the null hypothesis if p < α."

Where MATLAB also makes that decision for us and returns a 0 or 1, Excel makes you interpret the p-
value yourself.

Testing for variances is similar to testing for means. MATLAB does both one- and two-sample tests; Excel
only does two-sample tests.

MATLAB: [h,p]=vartest(x,variance,alpha,tail)  %(one-sample test)
        [h,p]=vartest2(x,y,alpha)             %(two-sample test)
Excel:  =ftest(array1,array2)

Again, Excel just computes a p-value; MATLAB will give one if prompted for a second output, but
otherwise just gives a 0 (fail to reject the null) or 1 (reject the null). The inputs are the same as they would
be for means.

Try these out using Excel or MATLAB, according to your preference and/or the program's capability:

A group of experimenters has been trying to measure the viscosity of water. The results of ten
experimental trials are below (units of cP):

0.7068, 0.8364, 0.4271, 0.7392, 0.6849, 0.5222, 0.6096, 0.6873, 1.0108, 0.9299

The true viscosity of water is 0.653 cP. Perform a statistical test to determine if the experiment agrees
with this value.
(Hint: you may want to refer to section 5.2.4).

A second group of experimenters has this set of trial results (same units):

0.5180, 0.9565, 0.7255, 0.6467, 0.7245, 0.6325, 0.6406, 0.8020, 0.7939, 0.7947

You are asked to determine if the mean values of the two sets of experiments are statistically
equivalent. First you will have to perform a test to see if the variances are equal or not (why?), and then
you can use those results to test the means.
(Hint: sections 5.3.1 and 5.3.3 have the tests that you're looking for. And the computer will do the math
for you; you just need to pick the right hypotheses and interpret them!)

The other statistical tools we've discussed this week involve interpolation and regression. In Excel, this
isn't particularly difficult: you plot your data as a scatter plot, right-click the data, and tell it to add a
trendline.

Adding a trendline in Excel by default assumes a linear regression. You can customize the trendline to be
a polynomial or logarithm or a few other functions. You can also have Excel display the equation and R2
value on a chart (don't forget that R2 is meaningless if there are zero degrees of freedom: if you have four
points and connect them with a cubic polynomial, you have four constants in the polynomial, so of course
it is going to pass directly through them; you just turned your regression into an interpolation).

You can also add (very important!) error bars to data points in Excel. I strongly advise against using
Excel's default options (some variant of the standard error computed in both directions). When you
choose to format error bars (Chart Layout > Error Bars > More Options), first compute the actual error
for your error bars, then choose "Specify Value" in that menu.

MATLAB gives us a little more room to play and to do error analysis. The polyfit command will give
you the coefficients for a regression polynomial. It takes three inputs: a vector of x-values, a vector of y-
values, and the order of the polynomial:

[p,s]=polyfit(x,y,n);

The vector p is the set of coefficients for the fitted polynomial. The structure s is information
that we can feed back into MATLAB for error analysis using polyval, and it's convenient because we
can use it without having to do all the error analysis by hand:

[polyy,delta]=polyval(p,x,s)

The value delta is the standard error for the data points. If you want a 95% confidence interval on
the estimates at each data point, you can plot polyy+1.96*delta and polyy-1.96*delta against x.
(Why 1.96?)
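
Putting the pieces together, a minimal sketch with hypothetical data:

x = (1:8)';                          % hypothetical x data
y = 2*x.^2 + 3*x + randn(8,1);       % hypothetical noisy y data
[p,s] = polyfit(x,y,2);              % fit a quadratic
[polyy,delta] = polyval(p,x,s);      % fitted values and standard errors
plot(x,y,'o',x,polyy,'-',x,polyy+1.96*delta,'--',x,polyy-1.96*delta,'--')
legend('data','fit','+1.96*delta','-1.96*delta')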

To do polynomial interpolation in Excel or MATLAB, you should have exactly one more point than the
order of the polynomial, then add a trendline or use polyfit as appropriate. Excel is more limited than
MATLAB in the order of the polynomial, but in general you shouldn't need a very high order polynomial.

MATLAB will also compute spline interpolations using the interp1 command:

y_i=interp1(xdata,ydata,x_i,'type')

This command will compute interpolated values y_i corresponding to values x_i (x_i can be either a
scalar or a vector), given actual values xdata and ydata. The string for type should be 'linear' if
you want a linear spline, or 'spline' if you want a cubic spline.
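
For instance, a minimal sketch with a hypothetical table of data:

xdata = 0:10:100;                          % hypothetical tabulated x values
ydata = sqrt(xdata);                       % hypothetical tabulated y values
y_lin = interp1(xdata,ydata,35,'linear')   % linear spline estimate at x_i = 35
y_cub = interp1(xdata,ydata,35,'spline')   % cubic spline estimate at x_i = 35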

Now for some practice:

Use Excel and MATLAB to plot the quadratic regression line and error bars for the following data:

x    0      1      2       3       4       5
y    2.1    7.7    13.6    27.2    40.9    61.1

Your results should look roughly like this:


(Figure: two versions of the plot, one from Excel and one from MATLAB, each showing the data and the quadratic regression line over x = 0 to 5, with y running from about -10 to 70.)

Recall from section 7.3.5 that for more complicated regression problems you should set up a
customized overdetermined linear algebra problem to solve, and then use the backslash operator in
MATLAB. This is more important for non-polynomial functions, since MATLAB has polyfit, after all,
but here is what you would type for the above problem; please make sure you understand it:

x=0:5;                             % independent variable data
y=[2.1 7.7 13.6 27.2 40.9 61.1]';  % dependent data as a column vector
z=[(x.^2)',x',ones(6,1)];          % columns for the x^2, x, and constant terms
p=z\y;                             % least-squares solution for the coefficients

Confirm that this vector p matches the coefficients found using polyfit.

7.6.1 Practice Problem

The following data is from Perry's Handbook for the solubility of sodium chloride in water (g NaCl per
100 g water) as a function of temperature.

Temperature (degrees C)        10    20    30    40    50    60    70    80    90
Solubility (g per 100 g H2O)   35.8  36.0  36.3  36.6  37.0  37.3  37.8  38.4  39.0

a) Determine the slope and intercept of a linear regression that fits this data (T is the independent
variable).
b) Fit a linear regression line to the above data and plot it together with the data and error markers
for a 95% confidence interval at each data point.
c) Fit a cubic spline to the above data and plot it together with the data (but not the other
regression line) as a plot separate from the plot requested in part (b).
d) A group of experimenters tried to determine the solubility of salt at 50 degrees C and in six trials
came up with values of 36.0, 37.0, 37.5, 36.4, 36.8, and 37.2 g per 100 g of water. Perform a
statistical test using a computer function to determine if the experiment is in agreement with
the literature.

8 Numerical Integration and Differentiation
Required Reading/Viewing:

The MATLAB Solving Ordinary Differential Equations videos at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-launchpad.html: Introduction to Solving Ordinary Differential Equations, Numerical Solution to ODEs,
and Solving First-Order Systems (about 36 minutes).

Recommended for Further Consideration:

Section 5.5 in Pratap's Getting Started with MATLAB or Section 15.4 in Attaway's text.

Appendix A.3 of Felder and Rousseau's Elementary Principles of Chemical Processes, 3rd edition,
summarizes the two most basic methods of numerical integration.

Chapters 21, 23, 25, and 26-28 of Chapra and Canale's Numerical Methods for Engineers, 5th or 6th
edition, provide a much more detailed description of the material covered in this section of the course
notes.

Chapter Goals

Numerically determine the integral of a function. Integrals pop up all over chemical engineering in
solutions to material and energy balances, determination of enthalpy differences for a temperature
change, and sizing of chemical process equipment like tanks and reactors, just to name a few.

Numerically determine the derivative of a function. Derivatives are likewise frequent guests in chemical
engineering applications, effectively anywhere dynamics are involved: especially in fluid dynamics,
heat and mass transfer, chemical reactions, and process control. Being able to numerically estimate a
derivative is critical to complex problems, some of which we'll even discuss in this course.

Solve systems of first- and second-order differential equations numerically. Understanding the
capabilities of programs like Excel and MATLAB will be very important to knowing what problems you
can solve. In the case of boundary value problems, you should be aware of approaches that may involve
numerically computing integrals and derivatives and using linear algebra.

There's something about solving differential equations that causes a lot of people to shut down. The
truth is, solving these sorts of problems numerically tends to be a lot easier than solving them
analytically (by the methods you learn in MATH 225, your differential equations course). All the effort you
devote to remembering different techniques (separation of variables, superposition, annihilators,
variation of parameters) is instead replaced with making sure you write your differential equation a
certain way. Then you can entrust that equation to any number of numerical methods; methods that,
once you've coded them, are virtually always the same.

In this section, we review some techniques for computing integrals numerically and for approximating
derivatives for when that is important. We then devote the rest of this chapter to solving systems of
ordinary differential equations.

8.1 Integration Methods

Integration of functions crops up from time to time in chemical engineering applications: in your
transport classes, dealing with multidimensional problems usually means integrating a balance equation
over a certain area or volume; in chemical kinetics, integration of specific functions leads to theoretical
reactor sizes; some process control schemes integrate information over time in order to help predict the
next setting for valves controlling flow processes. In chemical process safety, integrals are used in risk
assessment to estimate the effects of industrial accidents.

In calculus, you have probably already discussed the most common numerical integration techniques:
for example, considering the integral of a function as the area under a curve, we can approximate that
area by breaking it up into a bunch of rectangles using the left-hand, right-hand, or midpoint rules,
where you used equally sized rectangles and just read the value of the function at specific points to
estimate their areas. Here, we'll briefly, but more formally, classify the methods you probably already
know, and throw a couple more into the mix.

Most integration rules are actually based on splines (recall section 7.4). Since splines are typically linear,
polynomial, or trigonometric, and the integrals for such functions are trivial to compute analytically,
most of the work in deriving these formulas is actually from curve fitting!

For a given function f(x) with known values at x0, x1, x2, ..., xn (meaning there are n+1 points where the
function is known), the trapezoid rule approximates f(x) with a linear spline. The area under a linear
spline between two data points is the same as the area of a trapezoid, and so, for each pair of adjacent points,
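
$$\int_{x_{i-1}}^{x_i} f(x)\,dx \approx \frac{f(x_{i-1})+f(x_i)}{2}\,(x_i - x_{i-1}).$$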

Written in terms of sigma notation, we can write the trapezoid rule as
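
$$\int_{x_0}^{x_n} f(x)\,dx \approx \sum_{i=1}^{n} \frac{f(x_{i-1})+f(x_i)}{2}\,(x_i - x_{i-1}).$$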

If our x-values are evenly spaced, then all the values of xi-xi-1 are the same, and we can factor that part
out of the summation, which condenses the above formula to
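
$$\int_{x_0}^{x_n} f(x)\,dx \approx \frac{h}{2}\left[f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\right], \qquad h = x_i - x_{i-1}.$$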

Note that if you are writing your own computer code or using your own spreadsheet to compute this
sum, this can be written as a loop or a series of the same computations copied repeatedly and then
summed at the very end. This concept is even more crucial for the more complicated integration
formulas.

Instead of using linear splines on the function f(x), it's possible to assemble quadratic splines or cubic
splines to fit the same data points. Simpson's 1/3 rule comes from fitting quadratic equations to sets of
three points (as opposed to linear equations to sets of two points) and then integrating under each
parabola. Note that since Simpson's 1/3 rule draws a number of quadratic equations connecting
each set of three points, the value of n needs to be even.

This result, for evenly spaced values of x, is a little more complicated to display:
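
$$\int_{x_0}^{x_n} f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4\sum_{i=1,3,5,\ldots}^{n-1} f(x_i) + 2\sum_{i=2,4,6,\ldots}^{n-2} f(x_i) + f(x_n)\right].$$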

Simpson's 3/8 rule is a similar idea, using every four data points to fit a cubic polynomial and integrating
under that. In this case, n must be a multiple of 3.

8.1.1 Practice Problem

A hippopotamus digests food in two phases: autocatalytically in its stomach, followed by catalytically in
its intestines. Since digestion is a chemical reaction, after all, it's possible to model the intestines as a
tubular reactor, where there is a known relationship between reaction conversion X and volume V:
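
In the standard plug-flow form (taking r as a positive rate of consumption, consistent with the definitions below), this relationship is

$$V = Q\,C_0 \int_{X_0}^{X_f} \frac{dX}{r},$$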

where Q is the volumetric flow rate of material through the reactor, X0 is the reaction conversion at the
start of the reactor (not necessarily zero, especially in the case of reactors in series, and in this case
the intestines are in series with the stomach), Xf is the reaction conversion at the end of the reactor, C0
is the initial concentration of reactant (moles per volume), and r is the reaction rate (moles per volume
per time).

For a typical hippopotamus digestive system, the rate of reaction has been determined to be a function
of conversion[4] as the following:

[4] Adapted from Fogler, Elements of Chemical Reaction Engineering, 3rd edition

This typical hippopotamus has intestines of volume 0.15 m3 and processes 0.13 m3/day of grass.
Assuming that 25% of the digestion takes place in the intestines (so the other 75% took place before
entering them), numerically determine the final reaction conversion of the grass.

(Note: Carefully read this problem. It is not saying that X0=0.75! What is it saying?)

8.2 Taylor Series and Numerical Derivatives

Back in section 6.1.2, we briefly talked about using the definition of the derivative to give us an
approximate value for the derivative of a numerical function:
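
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$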

This is probably the quickest, easiest way to approximate a function's first derivative. Unfortunately, the
error in this approximation is typically about as large as h. We can improve upon this approximation by
making h smaller, or we can look toward more complicated computations for the same h.

Approximations on derivatives are generally determined from the Taylor series of a function:
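
$$f(x+h) = f(x) + f'(x)\,h + \frac{f''(x)}{2!}h^2 + \frac{f'''(x)}{3!}h^3 + \cdots$$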

If we truncate this series after the second term on the right, rearranging it gives us the familiar formula
at the start of this section. Notice also that this function is identical to the first finite divided difference
from section 7.4.

When approximating higher-order derivatives, your first estimate will usually be the appropriate finite
divided difference. Regardless of the order of the derivative, the error for this approximation is always on
the same order as h itself. To have them all in one place, the first through fourth finite divided
differences are
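
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

$$f''(x) \approx \frac{f(x+2h) - 2f(x+h) + f(x)}{h^2}$$

$$f'''(x) \approx \frac{f(x+3h) - 3f(x+2h) + 3f(x+h) - f(x)}{h^3}$$

$$f''''(x) \approx \frac{f(x+4h) - 4f(x+3h) + 6f(x+2h) - 4f(x+h) + f(x)}{h^4}$$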

(The coefficients in front of the function evaluations might look familiar: they're the binomial
coefficients, or the entries of the rows of Pascal's triangle. Isn't math cool?)

These divided differences are forward approximations of the derivative because they use points after
the point where you want the derivative. You could likewise solve for backward approximations using
points before the point where you want the derivative; if you take the average of a forward and a
backward approximation, you obtain a centered approximation. When there is enough information to
do so, using such centered finite divided differences is the best way to approximate a derivative. For
example, here are the first and second centered divided differences:
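
$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}, \qquad f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$$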

The formulas below are also based on centered divided differences, but include more terms and are
more accurate. In each of the following formulas, the error of the approximation is on the order of h4
(and when h is less than 1, this becomes particularly small!):
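
$$f'(x) \approx \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h}$$

$$f''(x) \approx \frac{-f(x+2h) + 16f(x+h) - 30f(x) + 16f(x-h) - f(x-2h)}{12h^2}$$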

When would we ever need to approximate the derivative of a function? In chemical engineering, use of
numerical derivatives often accompanies a more complicated problem, such as solving for a
concentration or temperature profile of a system. For boundary value problems, numerical
approximations of the derivative are often necessary to enforce a boundary condition (it is just as likely
that the heat flux, which is related to dT/dx at a boundary, is set at a fixed value as it is that the
temperature itself is fixed). On a related note, if a temperature profile is already known, differentiation
is necessary to compute heat flux.

It's also possible that a numerical approximation of a partial derivative is necessary, which is not really
that different from a regular derivative. For example, for a function f(x,y),
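
$$\frac{\partial f}{\partial x} \approx \frac{f(x+h,y) - f(x-h,y)}{2h}, \qquad \frac{\partial f}{\partial y} \approx \frac{f(x,y+k) - f(x,y-k)}{2k}.$$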

Mixed partials are just a little more complicated, but it's the same idea as the rest: first you
approximate the derivative in one direction at two different locations, then use those two
locations to approximate the other derivative. In the formula below, x and y are interchangeable; it
(virtually always) doesn't matter which direction you take the derivative in first.
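
$$\frac{\partial^2 f}{\partial x\,\partial y} \approx \frac{f(x+h,y+k) - f(x+h,y-k) - f(x-h,y+k) + f(x-h,y-k)}{4hk}$$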

8.3 First-Order Ordinary Differential Equations

Systems of differential equations arise all over the place in chemical engineering: in the differential
form of a balance equation, whenever a system is not at steady state, the result is a differential
equation. Solving such differential equations gives insight into the transient behavior of a system: the
way a flow rate changes as a tank is drained, the way a chemical reaction proceeds, the way a piece of
chemical engineering equipment performs between the time it is started up and the time it reaches
steady state.

In some cases, it is possible to solve a differential equation analytically; you probably have at least a
little bit of experience with this in calculus, and obviously that's basically the entire point of your
differential equations class. Here, we will not be using any of those methods at all, but rather
implementing numerical methods to find an approximate numerical solution for a system.

A first-order ordinary differential equation is an equation that includes only up to the first derivative of a
function in its formulation. The basic form of this equation is
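
$$\frac{dx}{dt} = f(t,x), \qquad x(t_0) = x_0,$$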

where here x is the dependent variable and t is the independent variable. The solution to this
differential equation depends on the initial value of the dependent variable, x0. We are doing
something a little different here, probably, compared to your previous experiences, so just to reiterate:
x is now the dependent variable that is subject to the independent variable t (usually representing time).

In practice, we will almost always use built-in differential equation solvers to work through problems.
The major work for you will be setting up the differential equation in a way it can be solved, so first,
let's see how it's done in general before diving into how to set up more involved problems.

The class of solution methods discussed here is the Runge-Kutta methods. Each of these methods
works in basically the same way: given a differential equation dx/dt=f(t,x) and a value for x at a certain
time t, dx/dt is used to approximate the trajectory of x at some future time t+dt. The only difference
among these various methods is the way this approximation is computed.

The general pseudocode for the R-K methods is as follows:

Set the initial value for the dependent variable x and independent variable t.
Set the final value for the independent variable t and the time step size h.
Repeat until the final value of t is reached:
    Approximate the derivative dx/dt over the span of ti to ti+h.
    Compute the new value for xi+1 at time ti+h as xi+1=xi+dx/dt*h.
    Set the value of ti equal to ti+h.

Much like in the root-finding chapter, the only major difference between techniques is one part of the
iterated method. First we will look at the most basic way to approximate the derivative, Euler's
method, before sharing some higher-order computations that achieve more accurate results.

8.3.1 Euler's Method

You probably have encountered Euler's method in some of your introductory math or engineering
courses: what better way to approximate the derivative over the range [t, t+h] than by simply
evaluating it at the current values of x and t? Given the current values of our independent and dependent
variables, ti and xi, we can estimate the value for xi+1 (the value of x at ti+1=ti+h) as
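
$$x_{i+1} = x_i + f(t_i, x_i)\,h.$$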

What Euler's method offers in its simplicity is often overshadowed by the small step size necessary to
achieve a reasonable solution. The error in Euler's method is on the order of the step size, h. This means,
of course, that the error can be reduced by reducing the step size, but that also means a lot more
iterations to get to your answer.

Let's go to the Video! Click here for a demonstration of Euler's method using MATLAB. At the
Mathworks website, you can also download a version of this implementation.
8.3.2 Higher-Order Runge-Kutta Methods

The idea behind the higher-order R-K methods is that the derivative dx/dt can be better approximated
by using more computations to get there. The general idea is, given the current value xi at time ti, the next
value xi+1 can be found as
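
$$x_{i+1} = x_i + \phi(t_i, x_i, h)\,h,$$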

where the function φ is an approximation of dx/dt over the entire range [t,t+h]. The order of the
numerical method employed indicates the approximate total numerical error in applying that
method: the first-order method (Euler's method) has error on the order of h (to the first power), while
fourth-order methods have error on the order of h4.

Euler's method uses the most basic function, φ=f(t,x). For higher-order methods, the function takes on
the form
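
$$\phi = a_1 k_1 + a_2 k_2 + \cdots + a_n k_n,$$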

where n is the order of the method, the values ai are typically prescribed constants, and the
subfunctions ki are increasingly complicated expressions using x, t, and h (and usually the k's that come
before them). Note that for Euler's method, a1=1 and k1 is just f(ti,xi).

MATLAB implements anywhere from second-order to fifth-order R-K methods in many of its
built-in ODE solvers; here we will look at just two methods, starting with the modified Euler method,
which is a typical second-order method with the k-functions as follows:
In this method, a1=0 and a2=1. This means that the function k1 is only indirectly used in the computation
of xi+1: notice that k1 is needed to evaluate k2, but then the next x-value directly depends only on k2.

What is happening in the modified Euler method? Basically, instead of just using the estimate of the
derivative directly at xi, this estimate is used to predict a value halfway between x(ti) and x(ti+1).
Then, the derivative of x is estimated at this halfway point, and this is used to approximate the
derivative of the function over the range [ti,ti+1].

The most traditional version of Runge-Kutta methods is the classical fourth-order method, which seems
in most cases to strike a good balance between computational intensity and acceptable numerical error:
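
$$k_1 = f(t_i, x_i)$$

$$k_2 = f\left(t_i + \tfrac{h}{2},\; x_i + \tfrac{h}{2}k_1\right)$$

$$k_3 = f\left(t_i + \tfrac{h}{2},\; x_i + \tfrac{h}{2}k_2\right)$$

$$k_4 = f(t_i + h,\; x_i + h\,k_3)$$

$$x_{i+1} = x_i + \frac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right)$$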

Notice that the first two k-functions are the same as in the modified Euler method.

8.4 Systems of First-Order Ordinary Differential Equations

If we are solving multiple differential equations simultaneously, we can rewrite our dependent variable x
as a vector:
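
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \frac{d\mathbf{x}}{dt} = \mathbf{f}(t, \mathbf{x}).$$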

The R-K methods do not significantly change, but the number of computations necessary to implement
them grows. If we have just two differential equations, k-functions must be developed for both x1 and
x2, and these k-functions depend on both x1 and x2. It's probably easiest to show what's going on by
using the modified Euler method to solve a system of two differential equations:
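
Writing k_{j,m} for the jth k-function of equation m (a notation chosen here just for clarity), one step of the method looks like

$$k_{1,1} = f_1(t_i, x_{1,i}, x_{2,i}), \qquad k_{1,2} = f_2(t_i, x_{1,i}, x_{2,i}),$$

$$k_{2,1} = f_1\left(t_i + \tfrac{h}{2},\; x_{1,i} + \tfrac{h}{2}k_{1,1},\; x_{2,i} + \tfrac{h}{2}k_{1,2}\right), \qquad k_{2,2} = f_2\left(t_i + \tfrac{h}{2},\; x_{1,i} + \tfrac{h}{2}k_{1,1},\; x_{2,i} + \tfrac{h}{2}k_{1,2}\right),$$

$$x_{1,i+1} = x_{1,i} + k_{2,1}\,h, \qquad x_{2,i+1} = x_{2,i} + k_{2,2}\,h.$$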

It is also possible to break down a higher-order differential equation into a system of first-order
differential equations. This is because the general form of an nth-order differential equation is
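
$$\frac{d^n x}{dt^n} = f\left(t, x, \frac{dx}{dt}, \ldots, \frac{d^{n-1}x}{dt^{n-1}}\right)$$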

and can be rewritten as a series of first-order equations by making a series of substitutions starting with
x=z1.
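
$$\frac{dz_1}{dt} = z_2, \qquad \frac{dz_2}{dt} = z_3, \qquad \ldots, \qquad \frac{dz_n}{dt} = f(t, z_1, z_2, \ldots, z_n).$$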

Notice that z2 is by definition dx/dt, so z3 being dz2/dt means it corresponds to d2x/dt2, and so on. Only
the nth equation resembles the high-order differential equation we started with! With these
substitutions, any order of differential equation of this form can be transformed into a system of
first-order differential equations.

8.5 Boundary Value Problems

The Runge-Kutta methods are procedures for solving an initial value problem; that is, given a
differential equation and a value for the dependent variable(s) at a specific value of the independent
variable, we can numerically solve the differential equation over a range of the independent variable.

If the differential equation we are trying to solve is a second-order equation, then in order to solve it as an
initial value problem we would need to know the values of both x and dx/dt at the same value of t,
which turns out to be not very likely. We do require two specific conditions defining the relationship
between t and x, but often we are given x at two different values of t instead. In this case, we
are actually solving a boundary-value problem.

The types of boundary-value problems we will cover here are just second-order differential equations,
but instead of knowing the initial value on both x and dx/dt at t0, like here:
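
$$\frac{d^2x}{dt^2} = f\left(t, x, \frac{dx}{dt}\right), \qquad x(t_0) = x_0, \quad \left.\frac{dx}{dt}\right|_{t_0} = x'_0,$$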

we know the value of x at two different times t, like this:
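
$$\frac{d^2x}{dt^2} = f\left(t, x, \frac{dx}{dt}\right), \qquad x(t_0) = x_0, \quad x(t_f) = x_f.$$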

We only know how to numerically solve first-order differential equations, so we make a substitution, like
z=dx/dt, to turn our one second-order equation into two first-order equations:
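
$$\frac{dx}{dt} = z, \qquad \frac{dz}{dt} = f(t, x, z).$$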

The methods we've covered only let us numerically solve differential equations where we have an initial
value completely defined. Now we have the initial value on x, but not on z, its derivative with respect to
t! How do we figure out a value for z at t=t0?

There are two major approaches for solving boundary value problems, and you should become
somewhat familiar with both. Shooting methods essentially combine trial and error with ODE-solving
methods like the R-K methods to find a solution. Finite difference methods convert a differential equation
into a linear algebraic system. And you are reasonably familiar with root-finding and matrix equations!

8.5.1 Shooting Methods

The term "shooting method" comes from the idea that we will guess an initial condition on dx/dt, then
take a shot to see where the trajectory goes. If our guessed value of the initial condition on dx/dt is
correct, then the trajectory should pass through the point (tf,xf). If not, we guess a different
initial condition and go again; that is, we keep shooting until we hit the target point.

Does this sound vaguely familiar? The shooting method is just a root-finding method in disguise!

Start with the problem and an initial guess at the solution to the problem.
(Now the "problem" is that we don't know what the initial condition on dx/dt is.)
Specify a desirable approximate error.
Set the current error to some large number.
While the current error is greater than the desired approximate error,
    Label the current guess at the solution as the previous solution.
    Use the previous solution in some computation to find a new current solution.
    (Solve our differential equations using our guess on dx/dt, then see what x is at t=tf.)
    Compute the new approximate error.
    (See how much closer our current x at t=tf is compared to the previous x.)
Once the current error is less than or equal to the desired approximate error,
    Report the current solution as the numerical solution to the problem.
    Substitute the solution back into the problem to make sure the problem is truly solved.

If our differential equation is linear (that is, if f(t,x,dx/dt) is a linear combination of t, x, and dx/dt, with
no products among them, exponents, logarithms, trigonometric functions, or other nonlinear terms), then
the shooting method gets even easier. If we guess two different values for the initial condition of dx/dt and
record the values of x(tf) for both guesses, we can linearly interpolate (or even linearly extrapolate) to find
the correct guess on dx/dt.

8.5.2 Finite Difference Methods

A finite difference method converts our differential equation into a linear algebraic system. We begin by
approximating the range of our independent variable t by breaking it into n evenly-sized discrete pieces
of size h:

t0=0        t1=h        t2=2h       ...     tn=nh
x0=known    x1=?        x2=?        ...     xn=known

At each point, the differential equation must be true. Only instead of using the differential equation
outright, we use an approximation based on the second centered divided difference:
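
$$\frac{x_{i+1} - 2x_i + x_{i-1}}{h^2} = f(t_i, x_i).$$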

In this formulation, h is the distance from point to point, and so it can be seen that the value of x at
every point ti depends on the value of x at ti-1 and ti+1. Since we know the value of x at the initial and
final values of t, this means we have a system of equations to solve! For i=2 up to i=n-2, this equation
with three unknowns is true:
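
$$x_{i-1} - 2x_i + x_{i+1} = h^2 f(t_i, x_i).$$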

Technically, this equation is also true for i=1 and i=n-1, but I call those out a bit separately because x0 and
xn are known, so these two equations only have two unknowns:
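
$$-2x_1 + x_2 = h^2 f(t_1, x_1) - x_0, \qquad x_{n-2} - 2x_{n-1} = h^2 f(t_{n-1}, x_{n-1}) - x_n.$$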

Collectively, we have n-1 equations with n-1 unknowns. If f(t,x) is a linear function, then we can
numerically solve this system with a single matrix equation. (But also, if the differential equation is linear,
then we can analytically solve the system; you probably learned how to do this in calc 1 or 2, and
then were reminded about it in the first week of differential equations.) If f(t,x) is a nonlinear function,
then we need to use special techniques to numerically solve this system of n-1 equations and n-1 unknowns.
The MATLAB solver best equipped for this is fsolve, but for some such equations, you may also be able
to get away with an iterative root-finding method.

8.6 MATLAB Exploration: Differential Equations and Advanced Visualization

Reminder: It's in your best interest to watch the following videos before coming to lab.

The MATLAB Solving Ordinary Differential Equations videos at
http://www.mathworks.com/academia/student_center/tutorials/computational-math-tutorial-launchpad.html: Introduction to Solving Ordinary Differential Equations, Numerical Solution to ODEs,
and Solving First-Order Systems (about 36 minutes).

It is also helpful to have read Sections 5.5 and 6.3 in Pratap's Getting Started with MATLAB or Section
15.4 in the Attaway text.

By the end of this exploration, you should be able to

Numerically solve initial value problems of ordinary differential equations.
Numerically solve boundary value problems of ordinary differential equations.
Numerically solve steady-state partial differential equation problems.
Plot in three dimensions in MATLAB.

This week, we will rely on built-in MATLAB programs to solve differential equations for us. Unlike your
differential equations course, we focus here on numerical solutions, which are approached in an entirely
different way than analytical solutions. In many cases, numerical solutions exist where analytical ones
do not!

Let's start with a simple example of a differential equation with one independent variable and one
dependent variable. There are multiple ways to solve this in MATLAB, but the quickest is to take
advantage of anonymous functions, which we used last week. The anonymous function that we write
needs to be specified as a function of both the independent and dependent variable, in that order,
whether both appear in the equation or not, so be careful!

For example, consider the logistic growth equation for bacteria confined to a specific space:
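
$$\frac{dx}{dt} = 0.3\,x\left(1 - \frac{x}{10}\right),$$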

where x is the biomass of bacteria in micrograms and 0.3 is the growth rate of the biomass per hour.
The units on dx/dt are then micrograms per hour. Say we initially have 1 mcg of bacteria at time t=0 and
we wish to use this mathematical model to predict the biomass over 24 hours.

In MATLAB, the most condensed way to solve this equation requires two lines of code:

>> dxdt=@(t,x)0.3*x*(1-x/10);
>> [t,x]=ode45(dxdt,[0 24],1);

The result is now stored in two vectors, t and x, so if you want to see it, you should probably plot it:

>> plot(t,x)

(Plot: the solution x versus t, rising from x = 1 at t = 0 and leveling off near x = 10 by t = 25.)

Now let's break down the ode45 command. The general format is

[indep_var, dep_var]=ode45(function,[indep_initial indep_final],dep_initial)

though I strongly encourage you to doc ode45 to learn about other possible inputs and outputs. The
function may be written as an anonymous function or as an m-file in your Current Folder. Either way,
there is something particular about this function: it must be written as a function of both the
independent and dependent variable (did you just read that above? It is important). This is true even in
cases like the logistic growth example: did you notice that the equation for dx/dt is a function of x only,
but that we still included (t,x) in the function call? That's because we have to.

func=@(indep_var,dep_var)expression;

or as an m-file:

function output = func(indep_var,dep_var)
output=expression; %if you forget this ; you will be sad because MATLAB will
                   %be calling this function up to eleventy kabillion times

Try it yourself: the differential equation modeling the height of an ideal draining cylindrical tank is
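
In the usual Torricelli form (writing Av for the area of the valve opening, a symbol introduced here), this is

$$\frac{dh}{dt} = -\frac{A_v}{A}\sqrt{2gh},$$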

where A is the cross-sectional area, h is the height, and g is the acceleration due to gravity (32.2 ft/s2). A
cylindrical tank of cross-sectional area 3 ft2 is originally filled to a height of 4 feet. A valve below the tank
is opened, resulting in an area of 0.01 ft2 through which fluid may flow. What is the level of the tank as a
function of time? (How long should you integrate dh/dt? You can just guess and check for now.)

Function:

ode45 call:

More often than not, we will have a system of differential equations to solve. For instance, how many
times have you had to write exactly one material balance to solve a problem? If you are dealing with a
problem that is not at steady state, those accumulation terms are not zeroes, but derivatives of mass (or
moles or concentration) with respect to time!
Let's work our way there with another kind of material balance: the population balance.

A predator-prey model tracks the mass of two different species (the predator and the prey), which, in a
closed system, can only be generated (through growth and/or birth) or consumed (through actually
being eaten or some other method of death). If this system is not at steady state, the mathematical
model is accumulation = generation - consumption.

ODE solvers in MATLAB solve systems of differential equations by expressing them as a single vector
differential equation. If this is your first time doing this, it's not obvious. It's also probably pretty
confusing to do it as an anonymous function (but it is definitely doable). Here is this vector equation in
both forms:

function dm = dmdt(t,m)
% m(1) is the prey and m(2) is the predator
dm=[3*m(1)-3*m(1)*m(2);
    -m(2)+m(1)*m(2)];

or as an anonymous function

dmdt=@(t,m)[3*m(1)-3*m(1)*m(2);-m(2)+m(1)*m(2)];

For this example, m is a vector of 2 values, which would be m(1) and m(2) in MATLAB. This idea extends,
as you might expect, to any number of values. Here, the anonymous function isn't too bad, but if we had
more terms, more complicated terms, more equations, or constants that we wanted to define and/or
vary first, it's probably easier to write a short m-file. Whatever you choose to get the job done is fine, so
long as you understand it.

The ode45 call varies slightly; can you guess how? Since the dependent variable is now a vector, you
will have to specify initial values for the entire vector.
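
As a minimal sketch (with placeholder initial values, not the ones used in the exercise below):

m0 = [1; 1];                      % initial conditions as a column vector, one per species
[t,m] = ode45(dmdt,[0 10],m0);    % m comes back with one column per species
plot(t,m(:,1),t,m(:,2))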

Try programming the above system of two equations, starting at initial values m1=1.2 and m2=1.1,
integrating it from t=0 to t=10.

(Plot: m1 and m2 versus t from t = 0 to 10, both oscillating between about 0.8 and 1.4.)

Recall from section 8.4 that there is another way we could encounter a vector differential equation, and
that is when we are trying to solve a higher-order differential equation. Let's consider the temperature
profile of a long, thin wire:

(Diagram: a wire fixed between two walls held at temperatures T1 and T2, surrounded by air at ambient temperature Ta.)

The wire is attached to two surfaces, which have constant temperatures of T1 and T2. In between, the
wire is exposed to the atmosphere, which has ambient temperature Ta=20 degrees C. In ENCH 427,
437L, and 485L (and already this semester in ENCH 225!), you'll look at considerably more complicated
geometries and energy transfer scenarios, but this one is pretty straightforward. With no
generation/consumption terms, there is just flow of energy into or out of the wire. Energy fluxes are
derivatives of temperature, but temperature is changing with location, so we obtain this second-order
differential equation:
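
$$\frac{d^2T}{dx^2} + h\,(T_a - T) = 0,$$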

where T is temperature (we'll use degrees Celsius), x is position along the wire (in meters), and h is a
heat transfer coefficient (in this case, a combination of the thermal conductivity of the wire and the
convective heat transfer coefficient between the wire and the air). We'll use a value of 0.01 for h. What
are the units on h?
are the units on h?

To solve this equation numerically, we need to transform it into a system of first-order differential
equations. Try it yourself (see section 8.4 for the general formulation if you need it) before looking on to
the next page

Our second-order equation can be written as two first-order equations if we define a variable such as
z=dT/dx. Then dz/dx is the same as d2T/dx2, and we have
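
$$\frac{dT}{dx} = z, \qquad \frac{dz}{dx} = h\,(T - T_a).$$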

It might be confusing that we have two variables, T and z, because MATLAB needs to solve this system
simultaneously, as a vector differential equation. This is going to be very much like the two-variable
biomass model from earlier in this exploration. Let's call our vector y, so that y(1)=T and y(2)=z, which
changes our equations from those above to what? I'll do the first one, and you try the second one:
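
$$\frac{dy_1}{dx} = y_2$$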

Since we have two equations, we need two initial conditions: one for T (now y(1)) at x=0, and one for z
(y(2), which represents dT/dx) at x=0. But we don't have that. Instead we have two conditions for T
(y(1)) at two different x's! At x=0, T=T1. At x=L (the length of the wire), T=T2.

So we turn to the shooting method. We know that at x=0, T=T1. We will guess a value for z at x=0, then use
ode45 to solve our equations up to x=L. Then we will look at the value of T at that point. If T=T2, then we
guessed z correctly! If not, we'll have to guess again.

Try it: for a wire of length 10 m (h is 0.01 and Ta=20 C) with two wall temperatures T1=40 C and
T2=200 C, try a guess of z=9 C/m and use ode45 to solve for T as a function of x from x=0 to x=L.

Define your function and constants:

Write the ode45 call:

[x,y]=ode45(

See what T is at x=L by printing the last value of the first column of y:

disp(y(end,1)) % this is the last element of the first column

Your result is less than 200 C. If you plot temperature versus position, you'll see that the temperature is
always increasing, so we need it to increase faster. Try a larger value of y(2) at x=0 and see what
happens.

You could keep guessing z-values at x=0 and keep integrating this function, but let's take a moment to
think about what we're doing. We've written a code that does some computations, and what we care
about is that the value of the temperature at x=L is a certain value.

This is a root-finding problem! The function we are trying to solve is more complicated, but definitely
code-able in MATLAB. Take the lines of code that you used just now to solve this system of equations
and turn them into a short function m-file whose only input is your guess on the initial
condition. Then use one of the root-finding methods from last week (perhaps bisection, though linear
interpolation is fastest in this case) to figure out what correct value of z at x=0 results in T=200 C at
x=10 m.

The other approach to solving boundary value problems is to discretize the independent variable
(usually into equally-sized lengths, squares, or cubes) and approximate the function using the
approximations for derivatives given earlier in this chapter. For the problem we have been considering,
this means rewriting the differential equation
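
$$\frac{d^2T}{dx^2} + h\,(T_a - T) = 0$$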

as an approximation that holds at each discrete point:
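
$$\frac{T_{i+1} - 2T_i + T_{i-1}}{\Delta x^2} + h\,(T_a - T_i) = 0.$$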

Since Δx, h, and Ta are all constant values, this is a linear equation! If we write this for each discrete
value of x, we have a system of linear equations, which means it's time to break out the linear algebra.

Try it: Using Δx=2 m, write out the equations for the above differential equation approximation at
x=2, 4, 6, and 8 m. What would the equations be at x=0 m and x=10 m?

You should be able to write this as a matrix equation and get a result for T(2), T(4), T(6), and T(8). If you
plot this discretized solution against the shooting method solution, you'll find that they mostly agree.
Which solution do you trust more? How could you change one method to make its result look more like
the other's?

Now for our last exploration topic: 3-D visualization. In the case of solving a system with multiple
independent variables and one dependent variable, it can be helpful to plot a system in three
dimensions. You will plot the solution of a two-dimensional partial differential equation for this week's
deliverable, but for practice just plotting, download the matrix of data volumes.mat from the Lab
Documents folder on Blackboard, then load it in MATLAB and plot it for a visualization of the steam
tables that depends on pressure and temperature.

The surf command is useful for these kinds of three-dimensional plots. As you might guess from the
name, it plots a surface if you give it the right inputs.

For the volume data given in the matrix V, the temperatures range from 50 C to 800 C in increments of
50 C. The pressures range from 10 bar to 500 bar in increments of 10 bar.

Once you have created vectors for T and P, you can plot a surface with the correct values on the axes:

surf(P,T,V)

If you just type surf(V), what happens? Can you see the difference?

In the Figure Window, there's an icon with a cube surrounded by a circular arrow; that's the Rotate 3D
icon. This allows you to position your plot in a meaningful way to most clearly display your results. If you
need a 3-D plot for a report, remember to properly label the axes as well!

(Surface plot: specific volume, from 0 to about 0.08, as a function of temperature, 50 to 800 C, and pressure, 10 to 500 bar.)
It's also possible that your plot is a line, but in 3-D. That happens a lot with parametric equations
(remember being able to plot things in polar coordinates in math class? It's a lot like that). For example,
to plot a coil in 3-D, one such system of equations would be

x=cos(t)
y=sin(t)
z=t

for some range of t, say t=0 to t=100.

For every value of t, there is a distinct x, y, and z value. If you want to plot those ordered triplets of
values (x,y,z) for all values of t, then you should use the plot3 command:

plot3(x,y,z)

(3-D line plot: the coil spiraling around the z-axis, with x and y between -1 and 1 and z running from 0 to 100.)

If your plot looks different than mine above, can you figure out why? Talk this through with me or your
teaching assistant.

This should wrap up the majority of the abilities you are expected to have using MATLAB in this course.
Remember, when in doubt, to use MATLAB's help/doc commands to explore functions that are new to
you. If you have the MATLAB help browser open, you can also search for any words if you know what
you want to do but don't know the name of the function that does it! Typically, the MATLAB help
browser contents are also the first thing that pops up if you do an internet search for a certain
technique in MATLAB (Mathworks hosts all this information on their website), so do give MATLAB's help
browser a chance before running off to consult The Google.

8.6.1 Practice Problems

1) Your coworker found out you know a little something about tubular reactors, and so she was
wondering if you could help her with her latest work assignment of modeling a reaction through
a narrow tube. In this model, however, the first-order chemical reaction takes place in a
comparable amount of time to chemical diffusion, the flow of material from high to low
concentration. She has determined a model for the steady-state concentration of reactant c as a
function of distance down the tube x, and remembers something about a shooting method, but
doesn't know what to do from there.
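
For a first-order reaction competing with diffusion, the steady-state balance takes the form

$$D\,\frac{d^2c}{dx^2} - k\,c = 0.$$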

In the above equation, k is the reaction rate constant and D is the diffusivity.

The reactant should enter the tube at a concentration of 0.3 moles per liter, but there should be
zero reactant at the other end of the tube. The tube's length is 2 meters, the diffusivity is
1.5x10-2 m2/s, and the reaction rate constant is 0.05 per second. Show her how to use the
shooting method to find the concentration profile down the tube (concentration as a function of
distance) and visualize it using MATLAB.

2) The steady-state temperature profile in a cylindrical vessel housing a combustion reaction can
be described using the following equation:

where r is the normalized position in the vessel (dimensionless) and T is the normalized
difference in absolute temperature (also dimensionless). A value of r=1 corresponds to the end
of the vessel. A value of T=1 corresponds to a 100% increase in the original absolute
temperature of the vessel contents.

If the ends of the vessel (both r=0 and r=1) are held at a constant T=0, determine the normalized
temperature difference as a function of normalized position using the shooting method. You
may assume that the initial value of dT/dr is a positive value.

(Note: your guess for the value of dT/dr at r=0 is important; there are actually TWO solutions to
this equation! Can you find them both?)

3) Use the finite difference method to resolve the above problem by discretizing the vessel into
positions r={0, 0.2, 0.4, 0.6, 0.8, 1}. Be careful! The equations needed to implement this method
are not linear. You will need to implement a nonlinear equation solver (root finder) for this
problem.

As before, there are two possible solutions to this problem.

4) The dynamics of a bioreactor with two species competing over the same space are described by
the system of equations

a) Using α=1 and an initial condition of x1=x2=5, integrate this system of two equations from
t=0 to t=10, then plot x1 versus t and x2 versus t on the same graph. On a separate graph,
plot x1 versus x2.
b) Using every single combination of initial conditions x1={0, 1, 2, 3, 4, 5, 6} and x2={0, 1, 2, 3, 4,
5, 6} (that's 49 different initial conditions altogether), integrate the above system from t=0
to t=10. Plot only x1 versus x2 for each initial condition, but plot all 49 lines on the same
graph. Explain how what you see in this graph relates to using the Newton-Raphson root-finding
method to find the steady-state solutions to the above system of equations.
c) Repeat part (b) for α=2, 3, and 4. Explain what you observe in the steady-state solutions as α is
increased. Does it make sense if the parameter α describes the amount of space species 1
needs to survive?

5) One of the earliest test problems for the shooting method uses an equation that describes the
confinement of a plasma column under radiation pressure. The differential equation is

where sinh is the hyperbolic sine (the function for this in MATLAB is, unsurprisingly, sinh).

For this problem, it is known that y=0 at x=0 and y=1 at x=1. Use the shooting method to do the
following:

a) Determine the value of dy/dx at x=0 (it is known to be between 0 and 10).
b) Plot y as a function of x from x=0 to x=1. (It will look sort of linear, but it isn't exactly.)

6) Use the finite difference method to approximate the solution to the same differential equation
used in the previous problem for x={0, 0.2, 0.4, 0.6, 0.8, 1}. This is not a linear problem, so follow
these steps:

a) Write (by hand) the four nonlinear equations that relate y(0.2), y(0.4), y(0.6), and y(0.8) to
one another. Write each equation in the form f(x)=0.
b) Define an anonymous vector function in MATLAB that collects your four equations from part
(a).
c) Use the fsolve function in MATLAB to find the solution.
d) Take your plot from the previous problem and add these four points as markers on the
same plot.

7) Use the ode45 command to integrate this system of three chemical species in a closed reactor
system from t=0 to t=1000 s, assuming initial concentrations c1=0.99524 gmol/L, c2=0.003577
gmol/L, and c3=5.028481 gmol/L. If you save the ODE function as a separate m-file, please ALSO
copy it over to the published m-file (and comment it out, so that the file runs as normal but I can
see what you did).

(a) Plot all three concentrations on the same graph with time as the independent variable.
(b) Plot c1 (abscissa) against c2 (ordinate).
(c) Plot all three concentrations against one another in a 3-D line plot.

To make multiple MATLAB plots appear in the same published file, use the figure command
(type the word figure on a separate line just before the new plot) to let MATLAB know you
want all three plots viewable simultaneously.

9 Dimensional Analysis and Similarity
Recommended for Further Exploration:

Chapter 7 of Fox and McDonald's Introduction to Fluid Mechanics, 6th edition or newer. This section of
the course notes is a very concise summary of the material found there.

Virtually any book on transport phenomena has at least a section of a chapter on dimensional analysis.

Chapter Goals:

Explain why rendering equations dimensionless is an important skill to have.

Conduct dimensional analysis for a given system and describe what quantities in that system could be
varied to completely explore that system experimentally.

Dimensional analysis is a tool that crops up in several chemical engineering applications that helps us
uncover trends in theoretical and experimental analysis that would otherwise remain hidden. In many
applications of conservation of mass, energy, and/or momentum, the behavior of a system often
depends on specific ratios of the system properties, and sometimes those ratios are not particularly
intuitive (see Table 4 below). Armed with knowledge from dimensional analysis, you can characterize
otherwise complicated systems by uncovering relevant dimensionless numbers and conducting
experiments that deliberately vary these numbers. When you encounter a dimensionless number with
a person's name attached to it, it's probably because that person is credited with applying dimensional
analysis and conducting experiments related to chemical engineering phenomena in areas like fluid flow,
heat transfer, or mass transfer.

Some dimensionless numbers specific to fluid flow are in Table 4 below. In this course, we will use some
of these numbers to illustrate the method of dimensional analysis so that you are able to perform the
technique itself. Your findings from dimensional analysis can save a lot of time in the laboratory: if, for
instance, you are studying a system where the density, volume, and pressure may vary, it might be
illuminating to vary them in specific ways to test a range of Euler numbers instead (and you may find
that doubling the pressure and density of the system simultaneously yields no change in the system
you are studying!).

Table 4: Examples of Dimensionless Numbers Important to Fluid Dynamics

Dimensionless Number     Formula

Reynolds                 Re = ρVL/μ
Euler                    Eu = ΔP/(ρV²)
Froude                   Fr = V/√(gL)
Weber                    We = ρV²L/σ

In the table above, ρ is density, V is velocity, L is length, μ is viscosity, g is acceleration due to gravity,
and σ is interfacial (surface) tension. Take care in using these groups, because you need to know what
you're taking the density, velocity, length, etc., of. For instance, sometimes length means the diameter
of something spherical or cylindrical. The units may vary, but the important thing is that they all cancel.
What's relevant here is that each of these numbers is a ratio and has no dimension.

It is important to be able to identify any relevant dimensionless numbers that describe a system,
because they allow us to learn about the behavior of one system by studying a related one.

9.1 Rendering Equations Dimensionless

The general procedure for dimensional analysis is as follows:

1) Select the equations relevant to the system you want to model. In fluid mechanics, this is usually
a combination of differential equations and boundary conditions.
2) Select characteristic quantities with which to scale the variables in the equation (for example,
length, pressure, time). The characteristic quantities are constants and they must be
representative of the system.
3) Scale all variables in the governing equations by the characteristic quantities. This will yield
dimensionless equations in which dimensionless groups appear. The values of the dimensionless
groups (numbers) determine the properties of the differential equations.
4) Design and conduct experiments at a reasonable scale to develop data correlations for the
system of interest.
5) (optional) Use the results of data correlations to evaluate existing systems or design new ones.

This technique can be a tricky one to understand and master, and the best way to get better at it is to
work through some examples. So here's one for size:

Let's consider the pressure-driven flow of a Newtonian fluid in a tube for a reasonably high pressure, so
that the fluid flow is turbulent. To simplify the equations we have to work with, we'll assume a system at
steady state and a long tube, and just look at flow in the direction of the tube. Since this tube is
cylindrical, we'll have to work with polar coordinates (r, θ, z) instead of rectangular (x, y, z) ones.

Step 1. Select the relevant equations for the system. In this case, pressure-driven flow of a reasonably-
behaved fluid is dictated by the Navier-Stokes equation, and in a cylindrical tube, all the fun is in the z-
direction only:

ρ(vr ∂vz/∂r + (vθ/r) ∂vz/∂θ + vz ∂vz/∂z) = -∂P/∂z + μ[(1/r) ∂/∂r(r ∂vz/∂r) + (1/r²) ∂²vz/∂θ² + ∂²vz/∂z²] + ρgz

where ρ is density, vi is the fluid velocity in the i direction, P is pressure, μ is the viscosity of the fluid, and gz
is the component of the gravity vector parallel to the z-direction.

Step 2. We will choose characteristic length D and velocity V in order to render our equation
dimensionless. Note that D has dimensions of length, and V has dimensions of length per time, so we
should define a characteristic time as D/V and not some other arbitrary expression. Likewise, we can
create a characteristic pressure with our velocity in mind, and the result is a characteristic pressure of
ρV². (For now, don't worry about how we determined which characteristic quantities should be
expressed how; your ENCH 425 instructor will consider this with you in the fall.)

Step 3. We now reformulate all of our variables in dimensionless form, which we'll note with asterisks:

r* = r/D,  z* = z/D,  v* = v/V,  t* = tV/D,  P* = P/(ρV²)
Substitute these variables into the original equation. This means you have to solve the dimensionless
variable equations for the original variables, then substitute them into the governing equation. In this
case, we get, with a little bit of algebraic rearrangement,

vr* ∂vz*/∂r* + (vθ*/r*) ∂vz*/∂θ* + vz* ∂vz*/∂z* = -∂P*/∂z* + (1/Re)[(1/r*) ∂/∂r*(r* ∂vz*/∂r*) + (1/r*²) ∂²vz*/∂θ*² + ∂²vz*/∂z*²] + (1/Fr²)(gz/g)
This may not look like much of an improvement from our original equation, but notice that two distinct
ratios of constants have clustered together now that the equation is entirely dimensionless! Those two
ratios are given in Table 4: they are the Reynolds number and Froude number for the system. The
Reynolds number is a ratio of the inertial forces of a fluid to its viscous forces, while the Froude number
is a ratio of the inertial forces to gravitational forces.

Step 4. Experiments can be conducted by paying attention only to these two dimensionless groups. If,
for two different systems, the values of Re and Fr are the same, then the two systems are governed by the
same conservation of momentum equation. If on top of that, the dimensionless boundary conditions are
the same, the two systems are mathematically identical. We call that dynamic similarity.

Dynamic similarity can be used in two ways. For one, we can take advantage of dynamic similarity to
scale up or scale down a process: rather than conduct experiments on a very large or very small
apparatus, we can design something of reasonable size and use data from that experiment to make valid
claims about larger or smaller systems. The other important use of dynamic similarity is data correlation.
If we need to know about the operation of an apparatus, we can conduct experiments on it. If we don't
have the apparatus, we can build a scale model and experiment on that, or we can use others' results on
scale models.

Dimensional analysis and dynamic similarity also give us one more very important insight into
experimental design: we can use them to drastically reduce the amount of work necessary to characterize a
system. In the example walked through here, there are a lot of different constants at play: length and
radius of the pipe, velocity, viscosity, gravitational acceleration, and density. That's six different
constants that have a bearing on our system! And some of those constants are really hard to change.
Even if we could change all six of them easily, the number of experiments needed to study a system
would be enormous. Say we wanted to test the effects of each of these variables by setting them to a
mere 4 different values each. That means there are 4×4×4×4×4×4 = 4096 different combinations to try,
and that's only running one trial of each experiment! On the other hand, performing dimensional
analysis reveals there are really only two values that dictate the behavior of our system, the Reynolds
and Froude numbers. Even continuing with the lazy idea of only varying these numbers over 4 different
values, we've reduced our experimental load down to just 4×4 = 16 different kinds of trials.
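To make this concrete, here is a minimal MATLAB sketch with made-up property values: two pipe flows
whose densities and viscosities are completely different can still share the same Reynolds number (and,
because V and D are identical here, the same Froude number), so one experiment speaks for both systems.

% Two hypothetical pipe flows; all property values are made up for illustration
rho1 = 1000; mu1 = 1e-3; V1 = 2; D1 = 0.05;  % water-like fluid
rho2 = 500;  mu2 = 5e-4; V2 = 2; D2 = 0.05;  % half the density and half the viscosity
Re1 = rho1*V1*D1/mu1   % 1e5
Re2 = rho2*V2*D2/mu2   % also 1e5, so the two flows obey the same dimensionless equation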

9.2 The Buckingham Method

When at all possible, it's generally best to render a governing equation dimensionless in order to discern
the dimensionless groups relevant to a particular engineering problem. Unfortunately, not every system
has an obvious system of equations that describes its phenomena, or that system is so complicated that
rendering it dimensionless is the last thing we want to do. There is a more general procedure, known as
the Buckingham method, which can help discern dimensionless groups for much more general
problems. (It can also help in uncovering important characteristic quantities in the case of rendering an
equation dimensionless as in the previous section.)

The Buckingham Pi Theorem states that the relationship among q quantities or variables, whose units
may be broken down into u different fundamental dimensions, may be written as a function of q-u
independent dimensionless ratios, or π parameters. In most cases, these fundamental dimensions are
exactly what you would figure (mass, length, time, etc.), but the theorem also requires that the
quantities q be linearly independent in terms of the fundamental dimensions u. This means, in a few
cases, the q quantities can actually be expressed in terms of fewer fundamental dimensions if posed a
particular way.

Let's see what we're talking about with some examples.

Example 1. Let's consider the flow of an incompressible fluid through a circular pipe with no gravity
effects. From our previous discussion, we mentioned that there are six different values that we could
possibly change to study our system: pipe diameter, pipe length, pressure drop, fluid velocity, fluid
viscosity, and fluid density. So the number of quantities we are working with is 6.

There are 3 fundamental dimensions we are working with: mass M, length L, and time t. The dimensions
of diameter are L, of length L, of pressure drop M/(Lt²), of velocity L/t, of viscosity M/(Lt), and of density
M/L³. The number of dimensionless π parameters is q-u = 6-3 = 3, so there is a function that
relates these three π parameters. One way to write such a relationship is to say that one π parameter is a
function of the other two:

π1 = f(π2, π3)
We then select 3 of the 6 quantities to help us completely characterize the system. The selected
quantities should cover independent combinations of units, so it doesn't make sense to choose both
diameter and length. There are other rules of thumb to consider in this analysis, which we'll get to in a
little bit. So for now, just to finish this example, let's select diameter d, velocity v, and density ρ as our
core quantities that appear in every dimensionless group.

Next, we seek to find dimensionless combinations of the core quantities with each non-core quantity in
turn. The three dimensionless groups are all products of the core quantities raised to some currently-
unknown power. For example, for π1,

π1 = d^a v^b ρ^c ΔP
This expression must be dimensionless, so some combination of a, b, and c must make the dimensions
cancel in all cases. In terms of an equation on the dimensions, we have

M⁰L⁰t⁰ [=] (L)^a (L/t)^b (M/L³)^c (M/(Lt²))
where the equal sign in brackets represents that this is more an evaluation of units or dimensions
instead of an actual equation. So, the exponents on M, L, and t need to sum together to be zero. We can
then write one equation in terms of a, b, and c for each fundamental dimension, M, L, and t:

M:  c + 1 = 0
L:  a + b - 3c - 1 = 0
t:  -b - 2 = 0
By inspection we see that the solution is c=-1, b=-2, and a=0. Substituting these back into our expression
for π1, we get

π1 = ΔP/(ρv²)
which is the Euler number! We can repeat this idea for the other two π parameters, incorporating
length and viscosity. These two equations should be satisfied, for some other values of a, b, and c:

M⁰L⁰t⁰ [=] (L)^a (L/t)^b (M/L³)^c (L)
M⁰L⁰t⁰ [=] (L)^a (L/t)^b (M/L³)^c (M/(Lt))
For π2 we obtain b=c=0, a=-1, which gives us l/d as a dimensionless group. For π3, we get a=b=c=-1,
which results in μ/(ρvd), or 1/Re, the inverse of the Reynolds number. So, we find our relationship summarized in
three dimensionless groups:

ΔP/(ρv²) = f(l/d, Re)
where you may note that I just report the Reynolds number Re instead of 1/Re. This is just because the
Buckingham theorem only points out what ratios drive the behavior of a system, so we don't even really
know how those ratios come into play: they could be inverted, squared, raised to a decimal power,
who knows. The only way to find out is through experiment!
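Incidentally, the three exponent equations are linear, so while inspection works fine here, MATLAB can
solve them as a small matrix problem. A minimal sketch for the π1 balances above (the rows are the M,
L, and t equations, and the right-hand side collects the constants):

% Unknowns are the exponents [a; b; c] on d, v, and rho
A = [0  0  1;    % M:  c + 1 = 0
     1  1 -3;    % L:  a + b - 3c - 1 = 0
     0 -1  0];   % t:  -b - 2 = 0
rhs = [-1; 1; 2];
abc = A\rhs      % returns [0; -2; -1], i.e. pi1 = deltaP/(rho*v^2)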

And through well-designed experiments, it's been shown that, in cases where the length is significantly larger
than the diameter of the pipe, the ratio l/d is a negligible factor in fluid flow experiments. But there's no
way to know that without doing those experiments first! (Or I guess you could consult a book or this
course packet, but where is the fun in that?)

For a general situation, here is the recommended six-step procedure for determining the π parameters:

1) List all q of the dimensional quantities involved. If we neglect to include all the relevant
quantities, we'll still find dimensionless parameters, but they may not completely characterize
our system. If we accidentally include irrelevant quantities, they will eventually drop out either
mathematically, using the Buckingham method, or experimentally, when you are able to show
the quantity has no effect on the system.
2) Select the u fundamental dimensions. Almost always these are mass M, length L, time t, or
temperature T (but just for completeness, the other fundamental dimensions are moles, electric
current, and luminous intensity). There is a small chance, however, that using the fundamental
dimensions will not result in a system of independent equations. In that case, you must swap
out one or more fundamental dimensions with independent combinations of them.
3) Write out the dimensions of all q quantities in terms of the u fundamental dimensions.
4) Choose a set of u dimensional quantities that includes all of the fundamental dimensions. These
quantities will all be combined with the remaining quantities, one at a time, so they are the
core quantities. No core quantity should have dimensions that are the same or merely a
power of another (so don't include multiple lengths, or length and volume, both as core
quantities). The core quantities may appear in every dimensionless group, so do not include
quantities that would be considered the dependent variable in an experiment (for example,
pressure).
5) Set up equations in terms of the dimensions, multiplying the core quantities raised to
unknown powers together with each of the remaining quantities; there will be q-u equations
because there are q-u dimensionless groups. Solve for the exponents on the core quantities for
each equation by making sure all the dimensions cancel out.
6) Check that the groups obtained are indeed dimensionless. Some fluid mechanics books suggest
using force instead of mass as a fundamental dimension in the process of this check step.

The result of the Buckingham method, again, is just a proposed set of dimensionless quantities that are
related. The relationship among these parameters must be determined experimentally.

Step 4 is probably the most frustrating, especially if you are new to using this method. In most fluid
mechanics problems, the result of dimensional analysis has been known for ages, and so the choice in
core quantities is based on what's already known in the field. That's why, for example, viscosity is rarely
(but very occasionally!) a core quantity: it's just known from experience not to be involved in
multiple dimensionless groups for the same system equations. Likewise, density, velocity, and some
length are almost always chosen as core quantities.

Let's see two more examples in action. The first example is very similar to the one we already
considered; see if you can do it using the six steps before reading through the entire worked-out
problem.

Consider the steady flow of an incompressible fluid in a rough pipe. The pressure drop ΔP in this pipe
depends on the pipe length l, diameter d, fluid velocity v, viscosity μ, density ρ, and the average height
of the roughness, e. We seek a set of dimensionless groups that can be used to correlate experimental
data.

Step 1 is taken care of for us in the preceding paragraph: we have seven quantities involved: ΔP, l, d, v, μ, ρ,
and e. Note that we are explicitly stating that pressure depends on the rest.

For step 2 we will select mass M, length L, and time t as the three fundamental dimensions.

For step 3, we list the dimensions:


ΔP [=] M/(Lt²)
l [=] L
d [=] L
v [=] L/t
μ [=] M/(Lt)
ρ [=] M/L³
e [=] L

Step 4 is to select the core quantities. We choose ρ, v, and d, since those are the usual choices in fluid
mechanics.

We expect there to be q-u = 4 dimensionless groups, so we have four equations to write and solve, each
using the non-core quantities once.

d^a v^b ρ^c ΔP [=] M⁰L⁰t⁰

By inspection, the result is c=-1, b=-2, and a=0, and the first dimensionless π parameter is ΔP/(ρv²).

d^a v^b ρ^c μ [=] M⁰L⁰t⁰

All three exponents are -1, so the dimensionless quantity is μ/(ρvd), or 1/Re.

d^a v^b ρ^c l [=] M⁰L⁰t⁰   and   d^a v^b ρ^c e [=] M⁰L⁰t⁰

This is the same equation for dealing with both the length and the roughness height. The result is b=c=0
and a=-1, so our last two dimensionless groups are l/d and e/d.

Step 6. Check by using force, length, and time as the fundamental dimensions, since we used mass,
length, and time to find the dimensionless groups. With F [=] ML/t², the quantities become ΔP [=] F/L²,
μ [=] Ft/L², and ρ [=] Ft²/L⁴, and substituting these into the four groups confirms that every dimension
cancels.
So, this confirms that we have found 4 related dimensionless groups, and we can say the dimensionless
group involving pressure is a function of the rest, since we said ΔP is the dependent variable:

ΔP/(ρv²) = f(Re, l/d, e/d)

In order to figure out what this functional dependence is, we would have to resort to experiment.
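For bookkeeping on larger problems, the entire exponent hunt can be phrased as a single null-space
computation on the dimension matrix. A hedged sketch for this rough-pipe example (columns are the
seven quantities, rows are M, L, and t):

% Columns: rho, v, d, deltaP, mu, l, e; rows: M, L, t
D = [ 1  0  0  1  1  0  0;
     -3  1  1 -1 -1  1  1;
      0 -1  0 -2 -1  0  0];
N = null(D,'r')  % four columns, each a set of exponents for one dimensionless product
% The basis MATLAB picks may not be the familiar one, but products and quotients
% of its columns recover deltaP/(rho*v^2), mu/(rho*v*d), l/d, and e/d.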

This next example is a case where choosing mass, length, and time as fundamental dimensions gets us
into trouble.

Consider the capillary effect: when a small tube is dipped into a pool of liquid, a meniscus forms inside
the tube. The height of this meniscus relative to the surface of the liquid, h, seems to depend on the
tube diameter d, the specific weight γ, and the surface tension σ. We are looking for the dimensionless groups
to help us guide an experiment.

Step 1: our quantities are h, d, γ, and σ.

Step 2: we choose mass, length, and time as the fundamental dimensions.

Step 3: the dimensions for each quantity are


h [=] L
d [=] L
γ [=] M/(L²t²)
σ [=] M/t²

Step 4: we need to choose 3 core quantities, meaning there is only one dimensionless group. Since h is
the dependent variable, the cores are d, σ, and γ.

Step 5: the equation is

d^a σ^b γ^c h [=] M⁰L⁰t⁰
One obvious solution is b=c=0 and a=-1. Unfortunately, that's not the only solution: we could also solve
this system with a=1, c=1, b=-1. So which result is the correct one?

The problem arises because our choice of fundamental dimensions is not independent. If we were to
write a matrix of the dimensions of our core quantities (similar to the way we were doing chemical
compounds back in Section 2.5), we get this matrix:

         d    γ    σ
M   [    0    1    1  ]
L   [    1   -2    0  ]
t   [    0   -2   -2  ]
This is not an independent system of equations: notice that mass and time appear in exactly the same
proportion in two of our dimensional quantities. This matrix is not of full rank. If you were to do a similar analysis
on the core quantities from our previous examples, you would get a matrix of full rank.
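You can confirm this in MATLAB with the rank function; a minimal sketch:

% Columns: d, gamma, sigma; rows: M, L, t
A = [0  1  1;
     1 -2  0;
     0 -2 -2];
rank(A)   % returns 2, not 3: the M and t rows are proportional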

Experience tells us that if mass doesn't work in dimensional analysis, then force probably will. So we
begin this problem again.

Step 1: our quantities are h, d, γ, and σ.

Step 2: we choose force, length, and time as the fundamental dimensions.

Step 3: the dimensions for each quantity are


h [=] L
d [=] L
γ [=] F/L³
σ [=] F/L

Notice that time doesn't come into play now. So we really only have 2 fundamental dimensions.

Step 4: we need to choose 2 core quantities, and there will be 4-2 = 2 dimensionless groups. A length
will definitely come into play, so we choose d, and either γ or σ will work this time; I'll choose γ, but it
might be good practice for you to try it with σ.

Step 5 is to write out the equations:

d^a γ^b h [=] F⁰L⁰

Here b=0 and a=-1. So the dimensionless group is just h/d.

d^a γ^b σ [=] F⁰L⁰

Here a=-2 and b=-1. That means the other dimensionless group is σ/(γd²).

Step 6: Check that the groups are dimensionless. Since we wound up using force in the analysis, let's go
back to using mass as a dimension to check:

h/d [=] L/L, and σ/(γd²) [=] (M/t²) / ((M/(L²t²))·L²) = (M/t²)/(M/t²)

Both reduce to 1, so both groups are indeed dimensionless.
This confirms the two dimensionless groups that govern this system, so it's off to conduct experiments
to find what the actual functionality is for the equation

h/d = f(σ/(γd²))
9.2.1 Practice Problems

1) The fundamental frequency n of a stretched string is a function of the string's length L, its
diameter D, its mass density ρ, and the applied force F. Suggest a set of dimensionless
parameters relating these five variables.

2) The maximum pitching moment cmax (dimensions are ML²/t²) that is developed by the water on a
flying boat as it lands is presumed to depend on the following variables:

α, the angle made by the flight path of the boat with the horizontal
β, the angle defining the attitude of the boat
m, the mass of the boat
L, the length of the hull
ρ, the density of the water
g, acceleration due to gravity
R, the radius of gyration of the boat

The variables α and β are dimensionless already, and represent two dimensionless quantities
that affect cmax. What are the other dimensionless groups that arise when using the Buckingham
method?

3) For unsteady-state heat conduction in a solid, the following variables are proposed to affect the
process: material density ρ, material specific heat capacity cp, length in the dimension of heat
transfer L, time t, material thermal conductivity k, and location within the solid z (same
dimensions as length). Determine the dimensionless groups that relate the variables.

9.3 Similarity

In the motivation and description of rendering equations dimensionless, we mentioned the most
important reasons for the approach are to be able to correlate data and to be able to scale up or
scale down a process to a desired scope. This is because knowledge of dimensionless groups gives us
insight into the underlying mathematical model for a system, and if two systems are mathematically
identical, they give rise to dynamic similarity.

There are two other types of similarity that are important to consider when constructing experimental
prototypes: geometric similarity and kinematic similarity. Two systems are geometrically similar when
they have the same shape and all linear dimensions of the systems are related by the same constant
scale factor. Two flow systems are kinematically similar when the velocities at corresponding points
differ only by a constant scale factor (so the streamline patterns are also related by a constant scale factor). If two flows are
kinematically similar, they are also geometrically similar.

More formally, two flows are dynamically similar when they have force distributions that differ only by a
constant scale factor but are otherwise alike in magnitude and direction. All dynamically similar flows are
kinematically similar (but not necessarily vice versa). If dynamic similarity exists, then data measured
from a model flow may be related quantitatively to conditions in an experimental prototype. To
achieve dynamic similarity, flows must be geometrically similar and have the same values for each
independent dimensionless group.

Example. The drag F on an object is related to the diameter of the object and the fluid density, viscosity,
and velocity, according to the following dimensionless group functionality:

F/(ρv²D²) = f(vD/ν) = f(Re)

The full-scale object we wish to use in the field is 12 inches in diameter and will be subject to seawater
(ν = 1.69×10^-5 ft²/s, ρ = 64.03 lbm/ft³) flowing past it at a rate of 8.44 ft/s. At what air velocity should we
subject a model that is 6 inches in diameter to have dynamic similarity? For air, ν = 1.57×10^-4 ft²/s and
ρ = 0.07657 lbm/ft³.

For dynamic similarity, all functional dimensionless groups need to be the same. We have enough
information here to compare Reynolds numbers, which must be equal:

(vD/ν)model = (vD/ν)object

Solving for the unknown v, we obtain v = 8.44 × (12/6) × (1.57×10^-4 / 1.69×10^-5) ≈ 157 ft/s.

So we subject the model to an air velocity of 157 ft/s and measure its drag to be 5.58 lbf. What will the
drag on the actual object be? Since the flows are similar, the dimensionless groups should be equal, so

[F/(ρv²D²)]object = [F/(ρv²D²)]model

Solving for the unknown force, we get 53.9 lbf.
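Here is a minimal MATLAB sketch of both calculations, using the numbers from this example:

nu_sea = 1.69e-5;  rho_sea = 64.03;    % seawater: ft^2/s, lbm/ft^3
nu_air = 1.57e-4;  rho_air = 0.07657;  % air
D_obj = 1;  D_mod = 0.5;               % diameters, ft
V_obj = 8.44;                          % seawater velocity, ft/s
V_mod = V_obj*(D_obj/D_mod)*(nu_air/nu_sea)  % match Re: about 157 ft/s
F_mod = 5.58;                          % measured model drag, lbf
F_obj = F_mod*(rho_sea/rho_air)*(V_obj/V_mod)^2*(D_obj/D_mod)^2  % about 53.9 lbf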

9.3.1 Practice Problems

1) A 1/6-scale model of a torpedo is tested in a water tunnel to investigate its drag force. Relevant
properties that affect drag force include the torpedo diameter, cross-sectional area, and
velocity, as well as the water density and viscosity. If the actual torpedo is expected to reach a
velocity of 20 knots, what is the equivalent model velocity? If the drag experienced by the model
is 10 lbf, what will be the actual torpedo's drag?

2) You have landed a part-time job doing special effects for a very low-tech independent film and
need a 360:1.0 scale model of a storm on a seashore. In reality, storm waves of amplitude 2.0
meters would occur on the breakwater of the shore at a velocity of 8.0 m/s. You figure that the
variables that will be relevant between the scale model and reality are wave length, amplitude,

velocity, period, and acceleration due to gravity. (Experience with analysis suggests using length
and acceleration as the core quantities for best results.)
What should be the size (amplitude) and speed (velocity) of the waves in your model? If you
need the model for time-lapse footage, and if in reality a tidal period is 12 hours, what is the
tidal period for the model?

3) Fluid under turbulent flow with a velocity v inside a pipe of diameter d is subject to heat transfer
through the pipe wall. The heat transfer coefficient h is a function of the variables d, ρ, μ, cp, k,
and v. Determine the dimensionless groups that predict the behavior of this process. This is a
rare case where you will want viscosity to be a core quantity; thermal conductivity also appears
in multiple dimensionless groups.
Suppose you want to predict the heat transfer behavior of a fluid whose viscosity is twice that of
water. Is there a set of extrinsic properties you can adjust so that the process is mathematically
similar? If so, what are they? If not, what properties would need to be adjusted instead?

10 The Scientific Method and Experimental Design
Recommended for Further Consideration:

Sections 9.1-9.3 of Navidi's Principles of Statistics for Engineers and Scientists, 1st edition, or any other
statistics text.

Chapter Goals

Describe the differences and interconnectedness between the scientific method and the engineering
method.

Design an experiment with specific intentions for manipulating certain inputs and replicating trials.

Conduct statistical analysis on experiments with repeated trials to make meaningful observations and
conclude with useful results.

Armed with considerable ability to perform numerical computation and statistical analysis, we are
nearly ready to put all of our knowledge to work in a series of laboratory-based experiments. The rest of
this course is heavily focused on three different elements: experimental design, technical
communication, and ethics. Here we will motivate the experimental work we will be doing in the coming
weeks, and provide one last set of statistical tools as they relate to your design of experiments.

10.1 The Engineering Method and the Scientific Method

The scientific method is a common staple of science classes, especially those that incorporate science
fair-style projects. It is a formal distillation of the process of scientific inquiry:

Formulate a question (What are you trying to learn?)


Formulate a hypothesis based on that question (Based on current experience, professional literature,
etc., what do you expect to happen?)
Develop an experimental plan (What will you do to explore the hypothesis? What data do you need?)
Predict the result (What results correspond to the hypothesis being correct or incorrect?)
Conduct the experiment (How should data be collected? How much data is enough?)
Compare prediction to experiment (Analyze data)
Form a conclusion (Statistically interpret your data)
Use conclusion to inform new hypotheses (and the cycle starts again!)

Note that the conclusion of a scientific experiment is not that the hypothesis is correct or not, but rather
whether or not the experimental results are consistent with the hypothesis. Very rarely in science do
we stumble upon a new scientific law, but instead we work on developing our understanding with a
working theory. Note also that theory has distinctly different meanings in non-technical and technical
writing: in plain English, a theory is a hunch or an idea; here, we are talking about scientific theory,
an accurate and predictive description of a phenomenon.

Meanwhile, the engineering method is similarly cyclical, but instead of seeking to learn about a scientific
theory, we are trying to address a given problem or need:

Identify and define goals and constraints.


Generate ideas (through research and brainstorming).
Select the best approach (sometimes informed by simulation).
Model the solution (create a prototype).
Evaluate the solution (test the prototype).
Refine the solution.
Use solution to inform the goals and constraints of the problem (and the cycle starts again!)

In both engineering and scientific methods, communication is critical and should be required at every
step of the way. This communication can take on many forms, from technical procedures to laboratory
notebooks to formal reports. In scientific experiments, clear and consistent communication is necessary
for the reliable repetition of an experiment. In engineering design, clear and consistent communication
is critical for interacting with customers or clients, for having detailed records for patents, for clearly and
concisely sharing information with co-workers, technicians, management, and others.

The engineering and scientific methods are related largely by the way engineering methods rely on
science to inform decisions. In other words, the generate ideas step of the engineering method
depends on scientific inquiry. In the majority of your time in the chemical engineering curriculum, you
will spend considerable time on engineering design, but for the remainder of this course we will work on
fostering an appreciation for the scientific method.

With that in mind, let's look at each step of the scientific method in more detail.

Formulate a question. What is the goal of this exploration? What are you trying to learn? This seems like
a relatively simple step, but it can be easily overlooked, and the result is often a disjointed or incoherent
experiment. The question helps you to focus your efforts. With that in mind, it should be a single
question (don't try to ask multiple different things at once), an important question (someone should
care to know the answer), and a novel question (in the case of cutting-edge scientific research). Further,
you should be able to defend your choice of question with a rational argument, not just opinion.

Formulate a hypothesis (or hypotheses) based on that question. Your hypothesis is usually a positive
(as opposed to negative, because you can less often disprove a negative) statement with a clear
direction (indication of causality). The hypothesis needs to be something that makes sense statistically,
so that there is an actual difference between rejecting and failing to reject it. This
hypothesis must be testable: there must be some experiments you can conduct to draw a meaningful
conclusion. It should be a precise and concise statement, much narrower than the question you
posed. Finally, it's perfectly fine for the hypothesis to be wrong.

There are several ways to write a valid hypothesis, but perhaps two that are especially applicable to our
course: hypotheses that center around manipulation and those that focus on observation. A
manipulation type hypothesis is usually a statement about the way a system will change when
something else happens. For example, a plant will grow faster if it is watered more often. An
observation type hypothesis usually seeks to classify a system (or compare multiple systems) when
conditions cannot be changed. For example, there are more pine trees in Michigan than Maryland.

Develop an experimental plan. When considering potential experiments, be sure to consider what data
you will need to evaluate the hypothesis. You should be able to identify which variables are dependent
and independent, and of the independent variables, which ones can be manipulated in your efforts to
explore their effects. Your experimental plan should include careful consideration of equipment needed
for the experiment and the methods required to conduct those experiments. How many different values
of manipulated variables should you test? How many times should you repeat a trial? If you are
investigating a manipulation type hypothesis, do you have a control? How will you know your data is
reliable?

The most important aspect of any experimental plan (or engineering design) is safety. The second most
important aspect of that plan (or design) is also safety. Have you fully identified all hazards associated
with your plan? Have you found Material Safety Data Sheets for all chemicals that will be used in the
lab? What dangers do those chemicals pose, if any? At a bare minimum, basic lab safety is always in
effect: appropriate eye protection and closed-toe shoes are worn, and long hair is tied back. If there are
chemical, temperature, or other hazards possible, action must be taken to mitigate potential risks.

Predict the result. What do you expect to happen when you conduct this experiment? What might the
data look like? In particular, what would data that supports your hypothesis look like? Be specific, be
precise, and be quantitative. Does the experiment as described provide all the relevant data that you
need? Running through this thought experiment (and documenting it!) can be important in revising
your experiment before it is actually run.

Conduct the experiment. Carefully collect your data and document all observations. (See Jeter and
Donnell, Chapter 2.14.) Be sure to collect enough data. Have a plan for proper treatment of outliers: if
you can document what is clearly a mistake, you have a rationale for its exclusion in your report. If no
mistake appears to have been made for a data point, it must be included in your data analysis, outlier or not.
How do you know when you are done with the experiment? Were there deviations from your plan?
What? Why? Make sure you have notes regarding all actions and decisions.

Compare prediction to experiment. What data analysis is appropriate for this experiment? How will you
communicate the data and results for this experiment? There are lots of options as far as plots go
(scatter plot? With or without curve fitting? What kind of curve fitting? How do you display parameters?
Bar chart or histogram? How do you express statistical error?), or perhaps a table is more appropriate.
How do you portray independent/manipulated variables versus dependent ones? What units make
sense? (Does it make sense to do dimensionless quantities? If so, what are the characteristic
quantities?) This is where a lot of the statistical analyses from earlier chapters come into play. It's also
possible that you need more sophisticated statistical tools at your disposal; we will cover a few more in
the rest of this chapter.

Form a conclusion. What can you say about your hypothesis? It is quite possible that there will be
multiple ways that you can interpret your data, so you must decide which way to interpret it. There
must be a rational basis for this choice. How does your data relate to other references? What did you
learn from this experiment and why is it important (why should we care?)?

If you cannot answer these questions, why not? What went wrong? What other information would you
need to be able to answer them? Can you use your findings to inform new hypotheses?

10.2 One-Factor Experiments

Experiments are often conducted to investigate the effect that one or more quantities have on some
outcome. The quantities that are varied in an experiment (the manipulated variables) are called factors.

A one-factor experiment is an experiment where only one system variable is manipulated over a range
of values. The experiment is often repeated several times (these are officially referred to as replicates)
for specific values of that manipulated variable (these are officially called treatments or levels). The
result, the dependent variable, of each trial of an experiment is the response variable or outcome
variable. The question addressed by a one-factor experiment is whether varying the level of the factor
results in a different mean response.

In a completely randomized experiment, we would consider each treatment as a representation of a
population of results with that treatment, and the observed responses as a random sample of that
population. Therefore, we can conduct statistical analysis on these samples, and the question that often
arises concerns the differences in these samples: are the means for the levels actually different? If so,
which ones are different?

10.3 Analysis of Variance (Multi-Sample Test for Means)

We've discussed hypothesis testing for one and two samples, but it's just as likely that we want to
compare more samples than that. Enter Analysis of Variance (abbreviated ANOVA), a technique used to
test claims involving three or more means.

For an ANOVA test, the null and alternative hypotheses are


H0: All population means are equal.
Ha: At least one population mean is different from the others.
If the null hypothesis is rejected, all one-way ANOVA lets us conclude is that not all the means are the
same. We can't specify which means are different without turning to other statistical analysis, which
we'll discuss when we get into experimental design.

In order to use ANOVA, we must be able to assume that each sample is from a normally distributed
population, is independent from every other sample, and has the same variance as every other sample.
The test statistic for the ANOVA test is the ratio of two variances, so we will use the F-distribution again.
However, computing this test statistic is more complicated than before.

First we must compute the mean square between samples and the mean square within samples. The
mean square between is a measure of the variance due to differences among the different samples,
while the mean square within is a measure of variance for an individual sample:

MSB = Σ ni(x̄i - x̿)² / (k - 1)

MSW = Σ (ni - 1)si² / (N - k)
where k is the number of samples (sets of trials), N is the sum of the sizes of all samples (total
number of all trials), si is the standard deviation for sample i, ni is the sample size of sample i, x̄i is the
mean of sample i, and x̿ (x-double-bar) is the grand mean of the data, the mean of all N observations.
The sum of squares between is simply the mean square between before dividing by k-1, while the sum
of squares within is the mean square within before dividing by N-k.

The test statistic F is then the ratio of the mean square between to the mean square within:
F=MSB/MSW.

If there is little to no difference among the means, then MSB will be about the same as MSW, so F will
be close to 1 and we fail to reject the null hypothesis (which, recall, is that the population means are all
equal). However, if one of the means is statistically significantly different from the others,
MSB will be large and we can reject the null hypothesis. By this line of thinking, it's apparent that
ANOVA tests are always right-tailed F-tests. Therefore, if F is greater than a critical value Fc, we reject H0.

Again, since we're using the F-distribution, we need values for d.f.N and d.f.D. The degrees of freedom for the
numerator is k - 1, where k is the number of samples. The degrees of freedom for the
denominator is N - k, where N is the sum of the sizes of all samples.
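Rather than looking the critical value up in a table, you can compute it in MATLAB with the finv
function; a minimal sketch, using the degrees of freedom from the example below:

alpha = 0.05;
dfN = 3;  dfD = 14;              % k - 1 and N - k for the example below
Fc = finv(1 - alpha, dfN, dfD)   % about 3.34; reject H0 if F > Fc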

Example. Four researchers are trying to measure the time it takes a fluid flow experiment to reach
steady state. Each researcher chooses to run the experiment individually, but afterward they consider
putting all their data together. They want to make sure their data at least has the same mean value.
Their data is below.
A: 34, 28, 18, 24
B: 47, 36, 30, 38, 44
C: 40, 30, 41, 29
D: 21, 30, 37, 23, 24
Run an ANOVA test with α = 0.05 to determine whether the means are the same or not.

Solution. We'll follow the same general five steps as before. As mentioned, the hypotheses are
H0: μA = μB = μC = μD
Ha: At least two means are different.
For this test, we have α = 0.05, d.f.N = 4 - 1 = 3, and d.f.D = 18 - 4 = 14. From here, we can look up the
critical value Fc = 3.34, and we'll reject H0 if F > Fc. Then we have to compute the test value. You should
find that MSW = 608/14 = 43.4 and MSB = 549.9/3 = 183.3, so F = 4.22, and we reject H0. There is
enough evidence at the 95% confidence level to suggest that at least one mean is different from the
others. In order to determine which mean is different, we could resort to conducting six different two-
sample tests for the mean, or we could implement one of the strategies given below.
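Because the four samples here have unequal sizes, they cannot be packed into one matrix; MATLAB's
anova1 instead accepts a vector of data plus a group label for every point. A hedged sketch of this
example:

data   = [34 28 18 24, 47 36 30 38 44, 40 30 41 29, 21 30 37 23 24];
groups = [repmat({'A'},1,4), repmat({'B'},1,5), ...
          repmat({'C'},1,4), repmat({'D'},1,5)];
p = anova1(data, groups)   % p comes out below alpha = 0.05, consistent with rejecting H0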

Note that if there are only two samples, you don't really have a factor to test; instead we would just do
a two-sample test on the mean from back in Section 5.3.1.

So, all ANOVA tells us is whether all population means are equal or not. If we reject the null hypothesis,
all we know is that one mean is statistically different from the others. Usually we want to know a little bit
more than that: for instance, which mean is different? There are two tests we'll consider here (and you
only have to conduct one of them).

The Bonferroni method tests C different null hypotheses simultaneously:
H0: μi = μj
for every possible combination of levels i and j.

The critical value tc is from the t-distribution with N-1 degrees of freedom but using α/(2C) instead of just
α.

The test statistics are, for every combination of i and j,

t = (x̄i - x̄j) / √( MSW (1/ni + 1/nj) )

We reject the null hypothesis if |t| > tc. Slightly less formally, we can expect to reject the null
hypothesis at confidence level α in the case that

|x̄i - x̄j| > tc √( MSW (1/ni + 1/nj) )
The Tukey-Kramer method is also popular, but requires that we introduce yet another probability
distribution. The critical value for the hypothesis test comes from the Studentized range distribution,
which requires two degrees of freedom values much like the F-distribution.

The null hypotheses are still posed simultaneously:


H0: μi = μj
for every possible combination of levels i and j.

The critical value qc is from the q-distribution with d.f.N = k and d.f.D = N-k degrees of freedom and confidence
level α.

The test statistics are, for every combination of i and j,

q = (x̄i - x̄j) / √( (MSW/2) (1/ni + 1/nj) )

We reject the null hypothesis if q > qc. Slightly less formally, we can expect to reject the null hypothesis at
confidence level α in the case that

|x̄i - x̄j| > qc √( (MSW/2) (1/ni + 1/nj) )

In general, both methods will arrive at the same conclusion for differences in sample means. The results
of these hypothesis tests allow us to state with statistical certainty that there is a difference between
two sample means (and therefore, two different levels of the one-factor experiment).
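In MATLAB, both follow-up tests are available through the multcompare function, applied to the stats
output of anova1. A hedged sketch, continuing the previous example:

[p, tbl, stats] = anova1(data, groups);     % data and groups as in the earlier sketch
multcompare(stats, 'CType', 'bonferroni');  % Bonferroni pairwise comparisons
% multcompare(stats)                        % the default comparison type is Tukey-Kramer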

Example (from Navidi)

The following MATLAB script runs a one-way ANOVA on an experiment where four different welding
materials are tested for their hardness (using the Brinell scale). Each column corresponds to a different
material (a level of the factor), and the rows represent repeated trials (replicates).

trials=[250 263 257 253;
264 254 279 258;
256 267 269 262;
260 265 273 264;
239 267 277 273];
anova1(trials);

The results in MATLAB are below:

Source SS df MS F Prob>F
-----------------------------------------------
Columns 743.4 3 247.8 3.87 0.0294
Error 1023.6 16 63.975
Total 1767 19

The Columns values in the above table are the between values, and the Error values are the within values. SS
stands for Sum of Squares and MS stands for Mean Square. Notice that MS is just the SS divided by the
degrees of freedom. The F value is the ratio of MSB to MSW, and the Prob value is the p-value: the
probability of seeing an F ratio at least this large if the null hypothesis were true. In this case, since the
probability is so small, we can safely reject the null hypothesis, which means at least one of the four
samples has a different population mean than another one. ANOVA will not tell us which samples are
different, but MATLAB helps to suggest which might be the case with a box and whisker plot:

[Box-and-whisker plot of the hardness data for the four materials (1 through 4), as produced by anova1]

Next we would have to use the Bonferroni method to conclude that samples 1 and 3 differ.

10.3.1 Practice Problems

1) The Fictitious Chemical Company is testing a new procedure for extracting algae from a water
bath and has chosen to measure its effectiveness by how much light can pass through various
samples of treated water. The procedure can be run in one of three ways (labeled A, B, and C
below) and experimentalists have already run several trials on each of the three techniques.
Their data is below.

Percentage of light blocked by treated water samples


Treatment A Treatment B Treatment C
8 4 6
6 7 5
7 8 6
8 6 6
8 3 7
9 9 6
11 7 5

Using 1-Factor Analysis of Variance, determine the level of confidence with which we can say
that any one treatment is statistically significantly better or worse than the others. If there is
some difference among the treatments, determine which treatments have different results (if
you use MATLAB, it provides a way to do this graphically with little additional effort).

2) Three groups of scientists have been tasked with experimentally determining the absolute
viscosity (dimensions mass per length-time) of a new synthetic fluid for use in heat exchangers.
A sample of the fluid is heated to 40 degrees C and placed inside a tube called a viscometer, and
the time it takes to flow through the tube is recorded. Assume the samples are all independent.
Using one-way analysis of variance, can we conclude that any one group is statistically different
from the others?

You may assume the density of the fluid is 1.000 g/cm³.

Calibration constant, mm²/s²    Group 1 time, s    Group 2 time, s    Group 3 time, s


0.003689 191.3 192.0 191.2
0.003880 179.2 177.7 178.4
0.003650 193.4 192.9 194.1
0.003821 188.8 187.4 187.6
0.003579 206.2 205.3 208.2
0.003559 201.7 200.1 199.1
0.003723 184.0 182.0 182.0

3) The maximum hourly emissions (in g/m³) of SO2 from four power plants are listed below. With
what probability can we conclude the emissions are not all the same among the plants?

Plant W: 438, 619, 732, 638


Plant X: 857, 1014, 1153, 883, 1053
Plant Y: 925, 786, 1179, 786
Plant Z: 893, 891, 917, 695, 675, 595

10.4 Two-Factor Experiments and Two-Way Analysis of Variance

As you might expect, a two-factor experiment is one in which two factors are considered
simultaneously. A two-factor experiment requires collecting data for all combinations of the two factors
(which we are stating here because it's different from performing two separate one-factor experiments,
where all but one factor is held constant throughout both experiments). Conducting a two-factor
experiment may reveal interactions between the two independent variables in an experiment (where
two one-factor experiments would not).

In a two-factor experiment, it's perhaps even more important that all sample sizes for each treatment
combination are the same. It greatly simplifies the math, and we will only consider that case here.

In order to compactly and clearly display the results of a two-factor experiment, experimental results
are collected by row factor and column factor inside a table. There are kr levels of the row factor and kc
levels of the column factor. There are therefore kr×kc different combinations of the two factors, and all
of these combinations must be considered in order to have a complete design.

Armed with a two-factor experiment and its results, we then seek to determine the effects of the
factors. The row effect for each row will indicate the degree to which that row's level tends to produce
outcomes that are larger or smaller than the grand mean. The column effect for each column does the
same for column levels. The interaction of the row and column levels must also be determined. How are
these values computed? With two-way analysis of variance, of course!

In two-way analysis of variance, we have three separate null hypotheses to test: instead of just
concluding that the various means are equal, we more specifically seek to make claims regarding the
row means, column means, and interactions. First, a bunch of computational definitions:

The mean for a specific treatment combination (the ith row and the jth column factors) is denoted μij.
For any level i of the row factor, the average of all means μij in the ith row is denoted μi.. As an equation,

μi. = (1/kc) Σj μij

Likewise, for a level j of the column factor, the average of all means μij in the jth column is μ.j, and as an
equation, it looks like this:

μ.j = (1/kr) Σi μij

The population grand mean is simply denoted μ, and it is the mean of the means.

The row effects, denoted αi, are the differences between the row means and the grand mean. The
column effects are written as βj and are the differences between the column means and the grand
mean. The interaction γij is a slightly more complicated combination of the specific treatment
mean, its row mean, column mean, and grand mean. All three formulas are below.

αi = μi. - μ
βj = μ.j - μ
γij = μij - μi. - μ.j + μ
The row and column effects are sometimes called the main effects to distinguish them from the
interactions. In the case where all interactions are zero, it is said that the additive model applies. Based
on the definitions above, we can now write each treatment mean in terms of the grand mean, main
effects, and interactions:

μij = μ + αi + βj + γij
When the additive model does not apply, the combined effects of a row level and column level cannot
be determined from their individual effects.
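These definitions translate directly into MATLAB. A minimal sketch, using a hypothetical table of
treatment means (implicit expansion, available in R2016b and newer, handles the subtractions):

mu = [86 72 70 68;            % hypothetical treatment means; rows are row-factor levels
      90 80 78 75;
      84 79 74 73];
grand  = mean(mu(:));         % grand mean
rowEff = mean(mu,2) - grand;  % row effects, one per row
colEff = mean(mu,1) - grand;  % column effects, one per column
interact = mu - rowEff - colEff - grand;  % interactions
% Adding grand + rowEff + colEff + interact reproduces mu exactly.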

Two-way analysis of variance addresses three main questions:
Does the additive model hold? (Are interactions zero?)
Is the mean outcome the same for all levels of the row factor? (Are the row effects zero?)
Is the mean outcome the same for all levels of the column factor? (Are the column effects zero?)
These questions should be addressed in order, because if the additive model is rejected, it does not
make sense to consider the main effects.

Example (from Navidi)

In MATLAB, the matrix of experimental values must be set up in a specific way. Each column still
corresponds to a level of the column factor, but blocks of rows correspond to levels of the row factor. In
the experiment below, each combination of the row and column factors was tested four times, so the
first four rows correspond to the first level of the row factor, the middle four rows correspond to
the second level, and the final four rows correspond to the third level.

trials=[86.8 71.9 65.5 63.9
82.4 72.1 72.4 70.4
86.7 80.0 76.6 77.2
83.5 77.4 66.7 81.2
93.4 74.5 66.7 73.7
85.2 87.1 77.1 81.6
94.8 71.9 76.7 84.2
83.1 84.1 86.1 84.9
77.9 87.5 72.7 79.8
89.6 82.7 77.8 75.7
89.9 78.3 83.5 80.5
83.7 90.1 78.8 72.9];
anova2(trials,4);

Notice that in MATLAB, the anova2 function requires a second input, which is the number of repeated
values for each combination of the row and column factors. The result is a table similar to that of one-
way ANOVA, but with additional rows to describe the row factor and the interaction:

Source SS df MS F Prob>F
----------------------------------------------------
Columns 877.56 3 292.521 9.36 0.0001
Rows 327.14 2 163.57 5.23 0.0101
Interaction 156.98 6 26.164 0.84 0.5496
Error 1125.33 36 31.259
Total 2487.02 47

The first part of the table we should look at is the probability value for the interaction. This p-value is
quite large, so we fail to reject this null hypothesis and conclude that the additive model holds
(is plausible); that is, the mean values of the experiment can be deduced from knowing the row and
column effects, with no major contributions from specific combinations of rows and columns.

Next we can look at the row effects and column effects. Again, the probability column gives the p-values,
which are rather small in both cases. Therefore, we can reject the null hypothesis for both the row and
the column effects, which means there is a difference in population means based on certain values of
the column factor, and there is a difference in population means based on certain values of the row factor.

Formal statistical tests to determine which means are statistically different are outside the scope of this
course, but it might be instructive at this point to perform a multiple regression on the data in order to
discern trends (see Section 7.3.5).

10.4.1 Practice Problem

The following is a partial Two-way ANOVA table for the results of an experiment in which a pesticide was
applied to a sample at various concentrations and for different lengths of time and then measured to
determine the amount absorbed by the sample.

Source SS Df MS F Prob>F
Concentration 49.991 2 ?? ?? ??
Duration 19.157 2 ?? ?? ??
Interaction 0.337 4 ?? ?? ??
Error 6.250 27 ??
Total ?? ??

(a) Complete the above table. In order to compute the values for the probability column, use the
fcdf function in MATLAB, as follows:
>> prob=1-fcdf(F_value,df_source,df_error)
You may consider testing out the fcdf function on the ANOVA tables elsewhere in this
chapter to make sure you have it right (keeping in mind there will be rounding error).
(b) Explain whether the additive model is plausible or not. Does either the concentration or the
duration of the application affect the amount absorbed?

10.5 Factorial Design

Many experiments involve varying several factors, not just one or two. That is, there are more than two
independent variables that might affect a single dependent variable. In this case, we quickly start to run
into a problem of combinatorics: we're still going to need replicates, so how many trials of an
experiment are we talking about to determine the effects of more than two variables? If we have three
factors (independent variables), and we only test three different values (levels) for those factors, that is
3×3×3 = 27 trials without even taking replicates into account!

When we are varying p factors in an experiment, generally we restrict ourselves to two levels (low and
high, or -1 and 1) because the number of experimental trials grows exponentially with the number
of factors. Such experiments, appropriately, are referred to as 2^p factorial experiments.

Let's consider an example of investigating the chilling of soda from room temperature. Say you have
designed a 2³ factorial experiment to vary three different factors: the container the soda is in (can
versus bottle), the time you have to cool it (30 versus 60 minutes), and the location you place the soda
(refrigerator versus freezer). The dependent variable is the temperature of the soda after each
experiment. It is hoped that all three factors are independent, but we have to show this statistically.

As in the past couple of sections, we will turn to ANOVA to determine if there are statistically significant
effects on our dependent variable. The MATLAB function for n-way ANOVA is called anovan. It works
mostly like anova1 and anova2, but by default, does not test for the interactions between parameters.
We will change the default settings to allow for this analysis.

First, let's note that we will have to perform eight different trials (that's not counting replicates) to
account for all combinations of container, cooling location, and cooling time. The table below shows all
the combinations, plus the results of three replicates for each combination.

Factor (Independent Variable)                         Response (Dependent Variable)
Time (X1)    Location (X2)    Container (X3)          Temperature
30 Fridge Can 64, 65, 66
60 Fridge Can 60, 61, 62
30 Freezer Can 41, 42, 43
60 Freezer Can 27, 29, 31
30 Fridge Bottle 64, 65, 66
60 Fridge Bottle 60, 61, 62
30 Freezer Bottle 43, 43, 46
60 Freezer Bottle 28, 29, 30

To set this problem up in MATLAB, we have to be careful the same way that we were back when we
were doing multiple linear regression: we can include the data in any order, so long as each column
corresponds to the same experiment. I will use 'L' to represent low (30 minutes, fridge, and can) and
'H' to represent high (60 minutes, freezer, and bottle) values of the three factors.

time = ['L' 'H' 'L' 'H' 'L' 'H' 'L' 'H'];
location = ['L' 'L' 'H' 'H' 'L' 'L' 'H' 'H'];
container = ['L' 'L' 'L' 'L' 'H' 'H' 'H' 'H'];
time=[time time time];
location = [location location location];
container = [container container container];

T=[64 60 41 27 64 60 43 28 65 61 42 29 65 61 43 29 66 62 43 31 66 62 46 30];

anovan(T,{time,location,container},'model','full')

The anovan function needs to be told to run the full model to consider interaction effects. This means
it will perform hypothesis tests for the individual factors and for all interactions among the factors. The result is a table like
the one below:

Source Sum Sq. d.f. Mean Sq. F Prob>F
---------------------------------------------------------
X1 486 1 486 299.08 0
X2 4374 1 4374 2691.69 0
X3 1.5 1 1.5 0.92 0.351
X1*X2 150 1 150 92.31 0
X1*X3 1.5 1 1.5 0.92 0.351
X2*X3 1.5 1 1.5 0.92 0.351
X1*X2*X3 1.5 1 1.5 0.92 0.351
Error 26 16 1.625
Total 5042 23
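
Incidentally, if you would rather see descriptive names than X1, X2, and X3 in this table, anovan accepts
a 'varnames' option (available with the Statistics Toolbox); a short sketch:

% Same analysis, with labeled factors and the p-values captured for reuse.
[pvals, tbl] = anovan(T, {time, location, container}, ...
    'model', 'full', 'varnames', {'Time', 'Location', 'Container'});

The table itself is unchanged apart from the row labels.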

As in the case of two-way ANOVA, the final column gives us the level of significance with which we can
reject each null hypothesis. The above results have two important implications:

- Since we can only reject the third null hypothesis (no container effect) with a significance of 0.351
(i.e., about 65% confidence), we must fail to reject the null. It's plausible that the soda container has
no effect on the final temperature.
- Since we can reject the fourth null hypothesis (no time-location interaction) with a significance of
virtually 0 (i.e., virtually 100% confidence), there is an interaction between the time and location
factors (see the quick check below).
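
To see that interaction numerically (a sketch of my own, reusing the arrays defined earlier), we can
average the temperature over container and replicates for each time-location pair:

% Mean temperature in each time-by-location cell, averaged over
% container and replicates.
for tt = 'LH'
    for ll = 'LH'
        m = mean(T(time == tt & location == ll));
        fprintf('time = %c, location = %c: mean T = %.1f\n', tt, ll, m);
    end
end

The fridge runs drop only about 4 degrees between 30 and 60 minutes (65.0 to 61.0), while the freezer
runs drop about 14 degrees (43.0 to 29.0): the effect of time depends on the location, which is exactly
what the significant X1*X2 term indicates.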

The conclusion is that the model that best matches the above data is likely of the form

T = a0 + a1 t + a2 L + a3 f(t, L)

where t is time, L is the contribution from the location (almost certainly the ambient temperature), f is
some unknown function, and the a-values are fitting parameters. The next steps include conducting
more experiments (to probe more combinations of time and location) and using theory to suggest the
form of the function f. N-way ANOVA automatically assumes each interaction function f is simply the
product of the variables involved.
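
If you want to estimate the a-values from the data above, here is a minimal sketch using base MATLAB's
backslash (least-squares) operator, under two assumptions of my own: the location is coded numerically
(fridge = 0, freezer = 1), and f(t, L) is taken to be the simple product t*L, as anovan assumes:

% Least-squares fit of T = a0 + a1*t + a2*L + a3*t*L to the soda data.
t = (time == 'H')*30 + 30;          % cooling time: 30 or 60 minutes
L = double(location == 'H');        % location code: 0 = fridge, 1 = freezer
X = [ones(24,1) t' L' (t.*L)'];     % columns: intercept, t, L, t*L
a = X \ T'                          % a(1) = a0, a(2) = a1, a(3) = a2, a(4) = a3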

10.6 Model Selection

After running ANOVA, a common follow-up question is: what model or equation best describes the
relationship between the outcome of an experiment and its independent factors? This is not an easy
question, and right away I want to point out four things:

1) When there is little or no known physical theory to support an experiment, several different
functions will fit the experimental data equally well. Sometimes there simply is no way to know
which model is best.
2) There are several accepted methods for choosing a function, and the statistics used to test the
function can contradict one another, meaning that if you run each of the accepted methods, they
may each suggest a different model is best.
3) The proposed factors may not even be related to the outcome at all. ANOVA can give us some
insights, but even then, the experiment should be repeated to test those insights.
4) Choosing a model is governed not by hard rules, but by suggestions.

There are literally entire textbooks devoted to model selection, so this section of the course packet is
only a quick-and-dirty introduction. Let's start with Occam's Razor:

The best scientific model is the simplest model that explains the observed facts.

This leads us to the following suggestion for this course:

The model equation should contain the smallest number of variables/parameters necessary to explain
the data.

But as with all suggestions or heuristics, there are known exceptions. Here are the big ones (a small
illustration follows the list):

- A linear model should always contain both a slope and an intercept, unless there is some physical
theory that suggests there is no intercept.
- If a power x^n is included in a model, then all lower powers (x^(n-1), x^(n-2), ..., x^2, x, plus an
intercept) should also be included, unless there is some physical theory that suggests otherwise.
- If a product xi*xj is included in a model, then the individual variables xi and xj should also be
included, unless there is some physical theory that suggests otherwise.
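
As a tiny illustration of the power-hierarchy bullet (entirely my own example, with made-up numbers),
note that MATLAB's polyfit honors the heuristic automatically: asking for a degree-2 fit returns
coefficients for x^2, x, and the intercept together.

% polyfit returns the full hierarchy of powers: [a2 a1 a0] for degree 2.
x = [1 2 3 4 5];
y = [2.1 4.8 10.2 16.9 26.1];   % made-up data for illustration only
coeffs = polyfit(x, y, 2)       % quadratic fit includes x^2, x, and an intercept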

Hopefully in reading those bullets you noticed some repetition: it is hugely important that we rely on
theory as much as possible to guide the form of our model.

Aside from using ANOVA, there is one other hypothesis test I want to show you that can help determine
whether a variable can be removed from a model. Say that the following model with p+1 parameters is
known to describe an outcome:

yi = b0 + b1 x1,i + b2 x2,i + ... + bp xp,i + ei

where ei is the error (or residual) for the ith data point. Let's refer to this as the full model.

We want to test the null hypothesis

H0: bk+1 = bk+2 = ... = bp = 0

If this null hypothesis is valid, the model will remain correct if we drop the variables from xk+1 to xp,
resulting in the reduced model

yi = b0 + b1 x1,i + ... + bk xk,i + ei

To compute the test statistic for this null hypothesis, you will need to compute the sum of squared
errors (or SSE; the sum of squared residuals, SSR, is the same thing) for both models. Then the test
statistic is

F = [ (SSEreduced - SSEfull) / (p - k) ] / [ SSEfull / (n - p - 1) ]

The critical F-value is found using p - k degrees of freedom in the numerator and n - p - 1 degrees of
freedom in the denominator.
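
Here is a minimal MATLAB sketch of this test; X_full, X_red, and y are hypothetical placeholders for your
full-model design matrix (with an intercept column), your reduced-model design matrix, and the
response vector, and finv requires the Statistics Toolbox:

% Partial F-test: can the variables xk+1 through xp be dropped?
% (X_full, X_red, and y are placeholders; supply your own data.)
SSE = @(X, y) sum((y - X*(X\y)).^2);   % sum of squared errors of a fit
n = length(y);
p = size(X_full, 2) - 1;               % predictors in the full model
k = size(X_red, 2) - 1;                % predictors in the reduced model
F = ((SSE(X_red, y) - SSE(X_full, y))/(p - k)) / (SSE(X_full, y)/(n - p - 1));
Fcrit = finv(0.95, p - k, n - p - 1);  % critical F at the 5% significance level
rejectNull = F > Fcrit                 % true: the dropped variables do matter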

In this course, when you do not have any known theory to apply, ANOVA is usually sufficient for
developing a model (keeping Occam's Razor and the other heuristics above in mind as you pick an
equation).
