
Computational Methods in Chemical Engineering

LECTURE 11
LINEAR AND NONLINEAR REGRESSION
CURVE FITTING

ÖZGE KÜRKÇÜOĞLU-LEVİTAS
Linear Least-Squares Regression

 Derive an approximating function that fits the shape or general trend of the data without necessarily matching the individual points.
 One approach is the least-squares method.
 The simplest case is fitting a straight line:

y = a_0 + a_1 x + e

 a_0: intercept
 a_1: slope
 e: error or residual, e = y − a_0 − a_1 x, the discrepancy between the true value of y and the approximate value a_0 + a_1 x predicted by the linear equation.
 The best line through the data minimizes the sum of the squares of the residuals (errors):

S_r = Σ_{i=1}^{n} (y_i − a_0 − a_1 x_i)^2

 n: total number of points

[Figure: scatter of data points (x_i, y_i) with the fitted line and a residual shown.]
 To determine a_0 and a_1, minimize S_r by setting its partial derivatives to zero:

∂S_r/∂a_0 = −2 Σ (y_i − a_0 − a_1 x_i) = 0
∂S_r/∂a_1 = −2 Σ (y_i − a_0 − a_1 x_i) x_i = 0

 Rearranging gives 2 equations with 2 unknowns, called the normal equations:

n a_0 + (Σ x_i) a_1 = Σ y_i
(Σ x_i) a_0 + (Σ x_i^2) a_1 = Σ x_i y_i

 Solving:

a_1 = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ x_i^2 − (Σ x_i)^2)
a_0 = ȳ − a_1 x̄

where x̄ and ȳ are the mean values of x and y.

Example: fit a straight line to data

y = a_0 + a_1 x

 Calculate the means and the sums of the data, then apply the normal equations. The result:

y = −234.3 + 19.5x

[Figure: data points with the best-fit line y = −234.3 + 19.5x.]
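The same coefficients follow directly from the normal equations; a minimal sketch, using the x, y data of the MATLAB example later in this lecture:

x = [10 20 30 40 50 60 70 80];
y = [25 70 380 550 610 1220 830 1450];
n = length(x); sx = sum(x); sy = sum(y);
a1 = (n*sum(x.*y) - sx*sy)/(n*sum(x.^2) - sx^2)  % slope: 19.4702
a0 = mean(y) - a1*mean(x)                        % intercept: -234.2857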
Quantification of Error of Linear Regression

 Sum of the squares of the residuals: S_r = Σ_{i=1}^{n} (y_i − a_0 − a_1 x_i)^2
 Discrepancy between the data and the mean: S_t = Σ_{i=1}^{n} (y_i − ȳ)^2

 Standard deviation of the data about the mean (spread of the data around the mean):

s_y = sqrt( S_t / (n − 1) )

 Standard deviation for the regression line (standard error of the estimate, spread of the data around the line):

s_{y/x} = sqrt( S_r / (n − 2) )

 If s_{y/x} < s_y, the linear regression model has merit.

[Figure: two fits compared, one with small residual error and one with large residual error.]

 Correlation coefficient r; coefficient of determination r^2:

r^2 = (S_t − S_r) / S_t

 A perfect fit: S_r = 0 and r^2 = 1; the line explains 100% of the variability of the data.
 IMPORTANT! Just because r^2 is close to 1 does not mean that the fit is necessarily good. It is possible to obtain a high r^2 for x and y that are not linearly related.

[Figure: Anscombe's four data sets, all sharing the same best-fit line y = 3 + 0.5x.]
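A minimal sketch of this caveat, with hypothetical data: a straight line fitted to clearly curved (quadratic) data still produces an r^2 close to 1.

x = (1:10)';
y = x.^2;                          % strongly nonlinear (quadratic) data
a  = polyfit(x, y, 1);             % straight-line fit anyway
Sr = sum((y - polyval(a, x)).^2);  % sum of the squares of the residuals
St = sum((y - mean(y)).^2);        % spread of the data around the mean
r2 = 1 - Sr/St                     % about 0.95 despite the obvious curvature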
Linearization of Nonlinear Relationships

 The relationship between x and y is not always linear.
 The 1st step in any regression analysis is to plot and visually inspect the data.
 Three common nonlinear models can be linearized by transformation:

 exponential model: y = a_1 e^{b_1 x}  →  ln y = ln a_1 + b_1 x
 power model: y = a_2 x^{b_2}  →  log y = log a_2 + b_2 log x
 saturation-growth-rate model: y = a_3 x/(b_3 + x)  →  1/y = 1/a_3 + (b_3/a_3)(1/x)
Example:

 Fit the power equation to the data by making a logarithmic transformation (linearization):

log y = log a_2 + b_2 log x

 Calculate the means of log x and log y, fit a straight line to the transformed data, then back-transform to obtain the power equation y = a_2 x^{b_2}.

[Figures: fit of the transformed (log-log) data, and the resulting power equation on the original axes.]
How to solve in MATLAB?

>> x = [10 20 30 40 50 60 70 80];
>> y = [25 70 380 550 610 1220 830 1450];
>> [r,m,b] = regression(x,y)
r =
    0.9383   % also r^2 = 0.8804
m =
   19.4702   % a1
b =
 -234.2857   % a0
>> plotregression(x,y)

y = −234.3 + 19.5x

 OR, fit a linear logarithmic equation by,
>> [r,m,b] = regression(log10(x),log10(y))
r =
    0.9737
m =
    1.9842
b =
   -0.5620
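 Back-transforming the intercept recovers the power equation (since log y = log a_2 + b_2 log x):
>> a2 = 10^b   % a2 = 10^(-0.5620), about 0.2741
% exponent b2 = m = 1.9842, so y is approximately 0.2741*x.^1.9842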
function [a, r2] = linregr(x,y)
% linregr: linear regression curve fitting
% [a, r2] = linregr(x,y): Least squares fit of straight
% line to data by solving the normal equations
% input:
% x = independent variable
% y = dependent variable
% output:
% a = vector of slope, a(1), and intercept, a(2)
% r2 = coefficient of determination
n = length(x);
if length(y)~=n, error('x and y must be same length'); end
x = x(:); y = y(:); % convert to column vectors
sx = sum(x); sy = sum(y);
sx2 = sum(x.*x); sxy = sum(x.*y); sy2 = sum(y.*y);
a(1) = (n*sxy-sx*sy)/(n*sx2-sx^2);
a(2) = sy/n-a(1)*sx/n;
r2 = ((n*sxy-sx*sy)/sqrt(n*sx2-sx^2)/sqrt(n*sy2-sy^2))^2;
% create plot of data and best fit line
xp = linspace(min(x),max(x),2);
yp = a(1)*xp+a(2);
plot(x,y,'o',xp,yp)
grid on

(Chapra, 3rd ed.)
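A quick usage check with the same data as above; the function reproduces the earlier regression coefficients and r^2:

>> x = [10 20 30 40 50 60 70 80];
>> y = [25 70 380 550 610 1220 830 1450];
>> [a, r2] = linregr(x,y)
a =
   19.4702 -234.2857
r2 =
    0.8804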
 Built-in function polyfit fits a least-squares nth order polynomial to data as,
>> p = polyfit(x, y, n)
For our example n = 1, since a straight line is a 1st order polynomial.

>> x = [10 20 30 40 50 60 70 80];
>> y = [25 70 380 550 610 1220 830 1450];
>> a = polyfit(x,y,1)
a =
   19.4702 -234.2857   % y = −234.3 + 19.5x
>> z = polyval(a,45)   % evaluate y at x = 45 using the coefficients in a
z =
  641.8750
Function:
 power: y = bx^m
 exponential: y = be^{mx} or y = b10^{mx}
 logarithmic: y = m ln(x) + b or y = m log(x) + b
 reciprocal: y = 1/(mx + b)

First rewrite the functions in a form that can be fitted with a linear polynomial (n = 1), y = mx + b:

 power: ln(y) = m ln(x) + ln(b)
 exponential: ln(y) = mx + ln(b) or log(y) = mx + log(b)
 reciprocal: 1/y = mx + b

For a given data set it is possible to foresee which of the functions has the potential for providing a good fit. This is done by plotting the data using different combinations of linear and logarithmic axes.
x-axis        y-axis                 Function
linear        linear                 y = mx + b
logarithmic   logarithmic            y = bx^m
linear        logarithmic            y = be^{mx} or y = b10^{mx}
logarithmic   linear                 y = m ln(x) + b or y = m log(x) + b
linear        linear (plot 1/y)      y = 1/(mx + b)
Function                               polyfit call
power         y = bx^m                polyfit(log(x), log(y), 1)
exponential   y = be^{mx}             polyfit(x, log(y), 1)
              y = b10^{mx}            polyfit(x, log10(y), 1)
logarithmic   y = m ln(x) + b         polyfit(log(x), y, 1)
              y = m log(x) + b        polyfit(log10(x), y, 1)
reciprocal    y = 1/(mx + b)          polyfit(x, 1./y, 1)
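A minimal sketch of the visual check (data x, y assumed already defined): replot the same data on the four axis combinations of the table above; the combination that looks straight suggests the model to fit.

subplot(2,2,1), plot(x,y,'o'),     title('linear-linear')
subplot(2,2,2), loglog(x,y,'o'),   title('log-log (power)')
subplot(2,2,3), semilogy(x,y,'o'), title('linear x, log y (exponential)')
subplot(2,2,4), semilogx(x,y,'o'), title('log x, linear y (logarithmic)')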
Other considerations

 Exponential functions cannot pass through the origin.
 Exponential functions can only fit data with all positive y's or all negative y's.
 Logarithmic functions cannot model x = 0 or negative values of x.
 For the power function, y = 0 when x = 0.
 The reciprocal equation cannot model y = 0.
Example:

t   0.0   0.5   1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0
w   6.00  4.83  3.70  3.15  2.41  1.83  1.49  1.21  0.96  0.73  0.64

 Data is first plotted with linear scales on both axes (linear-linear).
 Power function? Logarithmic function? Reciprocal or exponential?

[Figure: w versus t on linear axes; the data decay smoothly from 6 to 0.64.]
power:        w = bt^m          ----  polyfit(log(t), log(w), 1)
logarithmic:  w = m ln(t) + b   ----  polyfit(log(t), w, 1)
              w = m log(t) + b  ----  polyfit(log10(t), w, 1)
reciprocal:   w = 1/(mt + b)    ----  polyfit(t, 1./w, 1)    (fits 1/w = mt + b)
exponential:  w = be^{mt}       ----  polyfit(t, log(w), 1)  (fits ln(w) = mt + ln(b))
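A numerical counterpart to the visual check that follows; a sketch that fits all four candidates and compares the residuals in the original coordinates. The t = 0 point is dropped because log(0) is undefined (see the considerations above); the reciprocal and exponential fits would also work with the full data set.

t = [0.5 1 1.5 2 2.5 3 3.5 4 4.5 5];
w = [4.83 3.70 3.15 2.41 1.83 1.49 1.21 0.96 0.73 0.64];
p = polyfit(log(t), log(w), 1);  SSRpow = sum((w - exp(p(2))*t.^p(1)).^2);
p = polyfit(log(t), w, 1);       SSRlog = sum((w - (p(1)*log(t)+p(2))).^2);
p = polyfit(t, 1./w, 1);         SSRrec = sum((w - 1./(p(1)*t+p(2))).^2);
p = polyfit(t, log(w), 1);       SSRexp = sum((w - exp(p(2))*exp(p(1)*t)).^2);
[SSRpow SSRlog SSRrec SSRexp]    % the smallest value flags the best candidate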
[Figures: the data replotted on two axis combinations. Linear t with logarithmic w: more or less linear, suggesting an exponential model. Linear t with 1/w on the y-axis: not linear, ruling out the reciprocal model.]

>> p = polyfit(t,log(w),1);   % exponential form
>> m = p(1);
>> b = exp(p(2));
>> tc = 0:0.1:5;
>> wc = b*exp(m*tc);
>> plot(t,w,'o',tc,wc)

[Figure: data points and the exponential curve; the fit follows the data well.]
General Linear Least-Squares and Nonlinear Regression

1. Exponential Model:

y = a e^{bx}

 For this case, the sum of the squares of the residuals is,

S_r = Σ_{i=1}^{n} (y_i − a e^{b x_i})^2

 Differentiate S_r with respect to a and b and set the results to zero:

∂S_r/∂a = −2 Σ (y_i − a e^{b x_i}) e^{b x_i} = 0
∂S_r/∂b = −2 Σ (y_i − a e^{b x_i}) a x_i e^{b x_i} = 0

 Rearranging the 1st equation:

a = Σ y_i e^{b x_i} / Σ e^{2 b x_i}

 Substituting this into the 2nd equation leaves a single nonlinear equation in b, which can be solved by numerical methods (such as bisection).
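A minimal sketch of this procedure, with fzero standing in for bisection (both are root-finding methods); data vectors x and y are assumed already defined:

afun = @(b) sum(y.*exp(b*x)) / sum(exp(2*b*x));        % a as a function of b
g    = @(b) sum((y - afun(b)*exp(b*x)).*x.*exp(b*x));  % 2nd equation, = 0 at optimum
b = fzero(g, -0.1);   % -0.1 is an arbitrary initial guess (assumption)
a = afun(b);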
Example:

 Many patients get concerned when a test involves injection of a radioactive material. For example, for scanning a gallbladder, a few drops of the Technetium-99m isotope are used. Half of the Technetium-99m is gone in about 6 hours. It, however, takes about 24 hours for the radiation levels to reach what we are exposed to in day-to-day activities.
 Below is given the relative intensity of radiation as a function of time.
 Use the exponential model γ = A e^{λt}.
 Find: the values of the regression constants A and λ.
 Plot data first:
>> t = [0 1 3 5 7 9];
>> gamma = [1. 0.891 0.708 0.562 0.447 0.355];
>> plot(t,gamma,'o')

 For γ = A e^{λt}: λ is found by solving the nonlinear equation; A is then found from A = Σ γ_i e^{λ t_i} / Σ e^{2λ t_i}.

λ = −0.1151, A = 0.9998

γ = 0.9998 e^{−0.1151t}
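A possible way to obtain these values numerically, following the derivation above (a sketch; fzero stands in for bisection):

t = [0 1 3 5 7 9];
gamma = [1. 0.891 0.708 0.562 0.447 0.355];
Afun = @(lam) sum(gamma.*exp(lam*t)) / sum(exp(2*lam*t));
g = @(lam) sum((gamma - Afun(lam)*exp(lam*t)).*t.*exp(lam*t));
lambda = fzero(g, -0.1);   % lambda = -0.1151 (value from the slide)
A = Afun(lambda);          % A = 0.9998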
>> t = [0 1 3 5 7 9];
>> gamma = [1. 0.891 0.708 0.562 0.447 0.355];
>> plot(t,gamma,'o')
>> hold on
>> x = 0:24;
>> plot(x,0.9998*exp(-0.1151*x))

γ = 0.9998 e^{−0.1151t}

At t = 24 h, γ = 6.31 × 10^{−2}.

 After 24 hours, only 6.3% of the radioactive material is left.
2. Polynomial Regression:

 Suppose that we fit a 2nd order polynomial (quadratic):

y = a_0 + a_1 x + a_2 x^2 + e

 For this case, the sum of the squares of the residuals is,

S_r = Σ_{i=1}^{n} (y_i − a_0 − a_1 x_i − a_2 x_i^2)^2

 To generate the least-squares fit, we take the derivative of S_r with respect to each unknown coefficient of the polynomial, set these equations to zero, and rearrange to obtain the normal equations:

n a_0 + (Σ x_i) a_1 + (Σ x_i^2) a_2 = Σ y_i
(Σ x_i) a_0 + (Σ x_i^2) a_1 + (Σ x_i^3) a_2 = Σ x_i y_i
(Σ x_i^2) a_0 + (Σ x_i^3) a_1 + (Σ x_i^4) a_2 = Σ x_i^2 y_i

 3 equations, 3 unknowns: solve for a_0, a_1 and a_2.
 The same procedure extends to an mth order polynomial, giving m + 1 normal equations.
 Standard error (m + 1 coefficients estimated from the data):

s_{y/x} = sqrt( S_r / (n − (m + 1)) )
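A minimal sketch of assembling and solving the quadratic normal equations directly from data vectors x and y (column vectors assumed); the worked example below does the same with precomputed sums:

n = length(x);
N = [n          sum(x)     sum(x.^2);
     sum(x)     sum(x.^2)  sum(x.^3);
     sum(x.^2)  sum(x.^3)  sum(x.^4)];
r = [sum(y); sum(x.*y); sum(x.^2.*y)];
a = N\r;   % a(1) = a0, a(2) = a1, a(3) = a2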
Example:

 Fit a second-order polynomial to the data (the same x, y data as in the general linear least-squares example below).
 The simultaneous linear (normal) equations, with the sums evaluated from the data:

6 a_0 + 15 a_1 + 55 a_2 = 152.6
15 a_0 + 55 a_1 + 225 a_2 = 585.6
55 a_0 + 225 a_1 + 979 a_2 = 2488.8

 Use MATLAB:
>> N = [6 15 55;15 55 225;55 225 979];
>> r = [152.6 585.6 2488.8]';   % right-hand side as a column vector
>> a = N\r
a =
    2.4786
    2.3593
    1.8607

 The standard error of the estimate and the coefficient of determination are computed in the general linear least-squares example below (s_{y/x} = 1.1175, r^2 = 0.9985).
3. General Linear Least Squares

y = a_0 z_0 + a_1 z_1 + a_2 z_2 + a_3 z_3 + … + a_m z_m + e   (*)

 z_0, z_1, …, z_m: m + 1 basis functions
 For simple linear regression: z_0 = 1, z_1 = x
 For polynomial regression: z_0 = 1, z_1 = x, z_2 = x^2, …, z_m = x^m

 Equation (*) in matrix notation:

{y} = [Z]{a} + {e}

 [Z] is the matrix of the calculated values of the basis functions at the measured values of the independent variables.
 m: number of basis functions beyond z_0; n: number of data points. Since n ≥ m + 1, [Z] is generally not a square matrix.
 {y}: observed values of the dependent variable; {a}: unknown coefficients; {e}: residuals.

 Sum of the squares of the residuals:

S_r = Σ e_i^2 = {e}^T {e} = ({y} − [Z]{a})^T ({y} − [Z]{a})

 Minimize S_r by taking its partial derivative with respect to each of the coefficients and setting the resulting equations equal to zero. This yields the normal equations:

[Z]^T [Z] {a} = [Z]^T {y}

 Coefficient of determination:

r^2 = 1 − S_r / S_t
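A sketch showing that the same machinery works for any basis, not just polynomials. The model here is hypothetical, chosen only for illustration; x and y are assumed to be column vectors:

% Assumed model: y = a0 + a1*sin(x) + a2*exp(-x)
Z = [ones(size(x)) sin(x) exp(-x)];   % basis functions evaluated at the data
a = (Z'*Z)\(Z'*y);                    % solve the normal equations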
Example:

 Fit a second-order polynomial to the data:

y = a_0 + a_1 x + a_2 x^2

 Use MATLAB:
>> x = [0 1 2 3 4 5]';
>> y = [2.1 7.7 13.6 27.2 40.9 61.1]';

% Create the Z matrix; its columns are the basis functions 1, x, x^2
% evaluated at the data, for the model y = a_0 + a_1 x + a_2 x^2
>> Z = [ones(size(x)) x x.^2]
Z =
1 0 0
1 1 1
1 2 4
1 3 9
1 4 16
1 5 25
% Z'*Z results in the coefficient matrix of the normal equations
>> Z'*Z
ans =
6 15 55
15 55 225
55 225 979
% solve for the coefficients of the least-squares fit
>> a = (Z'*Z)\(Z'*y)
a =
    2.4786   % a0
    2.3593   % a1
    1.8607   % a2

y = 2.4786 + 2.3593x + 1.8607x^2

% compute the sum of the squares of the residuals
>> Sr = sum((y-Z*a).^2)
Sr =
    3.7466

% r^2 can be computed
>> r2 = 1-Sr/sum((y-mean(y)).^2)
r2 =
    0.9985

% Sy/x standard error can be computed
>> syx = sqrt(Sr/(length(x)-length(a)))
syx =
    1.1175
 OR
>> x = [0 1 2 3 4 5]';
>> y = [2.1 7.7 13.6 27.2 40.9 61.1]';
>> a = polyfit(x,y,2)
a =
    1.8607    2.3593    2.4786   % coefficients in descending powers of x

 OR
>> Z = [ones(size(x)) x x.^2];
>> a = Z\y   % least-squares solution by left division
a =
    2.4786
    2.3593
    1.8607

y = 2.4786 + 2.3593x + 1.8607x^2
4. Nonlinear Regression:

 There are many cases in engineering and science where nonlinear models must be fit to data. These models have a nonlinear dependence on their parameters, such as,

y = a_0 (1 − e^{−a_1 x}) + e

 This equation cannot be manipulated into a linear form.
 However, the coefficients may be found using optimization techniques to directly determine the least-squares fit.
 An objective function computes the sum of the squares:

f(a_0, a_1) = Σ_{i=1}^{n} ( y_i − a_0 (1 − e^{−a_1 x_i}) )^2

 An optimization routine can be used to determine the a_0 and a_1 that minimize the function.
 MATLAB's fminsearch built-in function can do that optimization.
[x, fval] = fminsearch(fun, x0, options, p1, p2, ...)

 x: vector of the values of the parameters that minimize the function fun
 fval: the value of the function at the minimum
 x0: vector of the initial guesses for the parameters
 options: a structure containing values of the optimization parameters, as created with the optimset function
 p1, p2, etc.: additional arguments passed through to fun
Example:

 Fit the power model y = a_1 x^{a_2} to the data.
 Previously, we fit the power model to these data by linearization using logarithms, and found log y = −0.5620 + 1.9842 log x.
 Now use nonlinear regression.
 Let's create an M-file function to compute the sum of the squares. Call it fSSR.m:
function f = fSSR(a,xm,ym)
% fSSR: sum of the squares of the residuals for the power model y = a(1)*x^a(2)
yp = a(1)*xm.^a(2);   % predicted values
f = sum((ym-yp).^2);
 In the command line,
>> x = [10 20 30 40 50 60 70 80];
>> y = [25 70 380 550 610 1220 830 1450];

 The minimization of the function is then implemented by
>> fminsearch(@fSSR, [1, 1], [], x, y)
ans =
    2.5384    1.4359

 That is, y = 2.5384 x^{1.4359}.

[Figure: data with both the linearized and the nonlinear power fits; it is difficult to tell which model describes the data best.]
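Note: passing extra arguments through fminsearch as p1, p2, ... is an older calling style. A sketch of the equivalent call using an anonymous function, which avoids the pass-through arguments:

>> a = fminsearch(@(a) fSSR(a, x, y), [1, 1])
a =
    2.5384    1.4359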
