Biocontrol - Week 3, Lecture 1: Goals of This Lecture
Goals of this lecture
Background on system identification
- Fitting models
- Selecting models
Suggested readings
- System Identification: Theory for the User, L. Ljung, Prentice-Hall
Model
d = M(θ)
d = {d_1, ..., d_N}: the data
θ = {θ_1, ..., θ_M}: the unknown parameters
- If θ is independent of time: parametric identification
- If θ(t) depends on time, then d = {d(t_1), ..., d(t_N)}
Examples:
- Predict tomorrow's traffic on the 405 based on historical data
- Current average velocity on the 405 between Wilshire and Santa Monica Blvd
- Yesterday's average velocity distribution on the 405
[Block diagram: data enter an ESTIMATOR, which returns a model]
Model class
M = {M(θ, d) : θ ∈ Q}
The choice depends on the questions we want to answer:
- linear or nonlinear
- grey box (first principles) or black box (I/O)
- parametric or non-parametric (functions)
Within the model class, choose the model that best fits the data according to a certain performance measure.
Validation criterion
Depends on the model purpose: convergence, error variance, consistency with respect to new data...
Elisa Franco, Caltech
[Diagram: identification loop linking data, prior information, model selection, model validation, and validation data]
Model validation
Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, single output.
Estimators
Data are noisy
Estimated quantities are therefore random variables.
Desirable properties, illustrated by sketches of the density p(d, θ):
- Unbiased estimates.
- Minimum variance: between two estimators, pick the one that gives estimates with the least variance.
- Convergence in mean square: the more data we add to our set, the smaller the variance of the estimator output should be (variances V_3 < V_2 < V_1 for data set sizes N_3 > N_2 > N_1).
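The mean-square-convergence property can be checked numerically. The sketch below (the distribution, sample sizes, and trial count are my own choices, not from the lecture) estimates the mean of Gaussian data with the sample mean and shows that the variance of the estimator's output shrinks as the data set grows.

```python
import numpy as np

# Run the sample-mean estimator many times for small and large data sets
# and compare the empirical variance of its output.
rng = np.random.default_rng(0)
true_mean, sigma = 2.0, 1.0

def estimator_variance(n_samples, n_trials=2000):
    # Each row is one data set; each row mean is one estimate.
    estimates = rng.normal(true_mean, sigma,
                           size=(n_trials, n_samples)).mean(axis=1)
    return estimates.var()

v_small = estimator_variance(10)    # few data points
v_large = estimator_variance(1000)  # many data points
print(v_small, v_large)
```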
p(d, θ): probability density function of the data, given a certain value of the parameters.
Fisher information matrix:
I^F_{ij}(θ) = E[(∂ ln p(d, θ)/∂θ_i)(∂ ln p(d, θ)/∂θ_j)] = −E[∂² ln p(d, θ)/∂θ_i ∂θ_j]
Cramér-Rao inequality:
Var[θ̂] ≥ (I^F)^{-1}
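As a concrete instance of the bound (a standard textbook case, not worked in the lecture): for N i.i.d. samples from G(θ, σ²) with known σ, the Fisher information is I^F = N/σ², so Var[θ̂] ≥ σ²/N, and the sample mean attains the bound. A quick numerical check:

```python
import numpy as np

# Empirical variance of the sample mean vs. the Cramer-Rao bound
# sigma^2 / N for Gaussian data with known sigma.
rng = np.random.default_rng(1)
theta, sigma, N = 0.5, 2.0, 50
cramer_rao_bound = sigma**2 / N

# 5000 independent data sets of size N; one estimate per data set.
estimates = rng.normal(theta, sigma, size=(5000, N)).mean(axis=1)
empirical_var = estimates.var()
print(empirical_var, cramer_rao_bound)
```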
Given the data y(t), t = 1, ..., N, and for each t the regressors u(t) = [u_1(t), ..., u_M(t)] (typically an overdetermined data set), find θ = [θ_1 ... θ_M]^T such that
y(t) = u(t) θ
Define the error: e(t) = y(t) − u(t) θ.
Model estimation criterion: minimize the cost J = Σ_t e(t)². Stacking the data as Y and U:
∂J/∂θ = 0  ⟹  U^T (Y − U θ) = 0  ⟹  θ̂ = (U^T U)^{-1} U^T Y
where (U^T U)^{-1} U^T is the Moore-Penrose pseudoinverse of U.
Note: the model needs to be linear in the parameters, not necessarily in the independent variables! Example: fitting the function f(x) = θ_0 + θ_1 x + θ_2 x².
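A minimal NumPy sketch of this example (the data and noise level are my own choices): the quadratic is nonlinear in x but linear in θ, so the pseudoinverse solution applies directly.

```python
import numpy as np

# Fit f(x) = theta0 + theta1*x + theta2*x^2 by least squares.
rng = np.random.default_rng(2)
theta_true = np.array([1.0, -2.0, 0.5])

x = np.linspace(-3, 3, 40)
U = np.column_stack([np.ones_like(x), x, x**2])   # regressor matrix
y = U @ theta_true + 0.01 * rng.standard_normal(x.size)

theta_hat = np.linalg.pinv(U) @ y  # Moore-Penrose pseudoinverse
print(theta_hat)
```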
Maximum likelihood: define the likelihood L(θ) = p(d, θ) and select the θ̂ for which
L(θ̂) ≥ L(θ) for all θ
Recall the least-squares example, with added zero-mean Gaussian noise:
y = U θ + v,  v ~ G(0, V)
p(y, θ) = 1/√((2π)^N det(V)) · exp(−½ (y − U θ)^T V^{-1} (y − U θ))
Maximizing L(θ) then amounts to minimizing (y − U θ)^T V^{-1} (y − U θ).
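A short numerical sketch of this equivalence (the regressors and noise covariance are my own choices): for Gaussian noise, the minimizer of (y − Uθ)^T V^{-1} (y − Uθ) has the closed form θ̂ = (U^T V^{-1} U)^{-1} U^T V^{-1} y, i.e. the maximum-likelihood estimate is a weighted least-squares estimate.

```python
import numpy as np

# Weighted least squares = maximum likelihood under Gaussian noise.
rng = np.random.default_rng(3)
theta_true = np.array([2.0, -1.0])

U = rng.standard_normal((100, 2))
noise_var = np.linspace(0.1, 1.0, 100)     # heteroscedastic noise
V_inv = np.diag(1.0 / noise_var)
y = U @ theta_true + np.sqrt(noise_var) * rng.standard_normal(100)

# Solve (U^T V^{-1} U) theta = U^T V^{-1} y.
theta_ml = np.linalg.solve(U.T @ V_inv @ U, U.T @ V_inv @ y)
print(theta_ml)
```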
Abstract problem formulation: given a dynamical system with partially measurable states and zero-mean Gaussian disturbances, we want to find the best linear estimator of x:
ẋ = A x + B u + R_v v
y = C x + R_w w
The system is linear, so the mean will be zero or driven by the input u. We want to minimize the error covariance: with e(t) = x − x̂,
min E[e(t) e^T(t)] = min P(t)
The error covariance is minimized by the Kalman gain L = P C^T R_w^{-1}. The KF can be used to estimate time-varying parameters!
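A minimal discrete-time sketch of the idea (scalar system, all constants mine, not from the lecture): estimating a constant parameter from noisy measurements, the Kalman gain L weights each new measurement by the current error covariance P, and P shrinks as data accumulate.

```python
import numpy as np

# Scalar Kalman filter for x(k+1) = x(k), y(k) = x(k) + w(k).
rng = np.random.default_rng(4)
x_true, meas_var = 3.0, 0.5

x_hat, P = 0.0, 10.0          # initial estimate and error covariance
for _ in range(200):
    y = x_true + rng.normal(0.0, np.sqrt(meas_var))
    L = P / (P + meas_var)    # Kalman gain
    x_hat = x_hat + L * (y - x_hat)
    P = (1.0 - L) * P         # error covariance shrinks over time
print(x_hat, P)
```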
Identification in biology
Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, often single output... but most biological processes are nonlinear! The class of models is uncertain, and only a limited number of quantities is measurable.
In the context of biology, identification almost coincides with
- Off-line parameter estimation
- Model selection
Objectives:
- Gain insight into the system
- Simulation-aided design of experiments
- Bio-molecular programming (good identification allows redesign of pathways)
NLP (nonlinear programming) problems have a global minimum only when the cost functional and constraints are convex! In practical cases, they are not. Numerical methods used to solve NLP problems must carefully handle local minima; simple gradient methods won't work.
Suggested readings
- Convex Optimization, S. Boyd and L. Vandenberghe, Cambridge University Press
- Nonlinear Programming, D. Bertsekas, Athena Scientific
Parameter Estimation in Biochemical Pathways: A Comparison of Global Optimization Methods. Moles CG, Mendes P, Banga JR. Genome Research, 2003 Nov 1; 13(11): 2467-2474.
Clustering methods
1. Sample points in the search domain
2. Transform the sampled points to group them around the local minima
3. Apply a clustering technique to identify groups that (hopefully) represent neighborhoods of local minima
=> This minimizes redundant local searches
Simulated annealing
Cost function = energy landscape E. Repeat:
- Pick a temperature T
- Move in the parameter space
- If ΔE ≤ 0: keep the new parameters
- If ΔE > 0: keep them with probability P(ΔE) = e^{−ΔE/(k_B T)}
T is initially large and decreases gradually for fine tuning; early jumps are allowed to avoid local minima.
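The annealing loop above can be sketched in a few lines (the toy 1-D cost landscape, step size, and cooling schedule are my own choices; k_B is absorbed into T):

```python
import numpy as np

rng = np.random.default_rng(5)

def cost(p):
    # Toy energy landscape with several local minima.
    return (p - 2.0)**2 + 1.5 * np.sin(5.0 * p)

p = -4.0                       # start far from the good minima
E = cost(p)
T = 5.0
while T > 1e-3:
    p_new = p + rng.normal(0.0, 0.5)       # random move
    dE = cost(p_new) - E
    # Always accept downhill moves; accept uphill with prob exp(-dE/T).
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        p, E = p_new, cost(p_new)
    T *= 0.999                 # gradual cooling
print(p, E)
```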
Model selection
x = f (x, ) x = f (x, )
1 2 1 2
uorescence
Which is the best model? Need a tradeo between accuracy and overtting
Kullback-Leibler divergence between the true distribution G and a model M:
I(G, M) = ∫ G(z) ln[G(z)/M(z|θ)] dz
        = ∫ G(z) ln(G(z)) dz − ∫ G(z) ln(M(z|θ)) dz
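A quick numerical illustration (the discrete distributions are mine, using the discrete analogue of the integral): I(G, M) is zero when the model matches the true distribution and grows as the model drifts away from it.

```python
import numpy as np

G = np.array([0.5, 0.3, 0.2])        # "true" distribution
M_good = np.array([0.45, 0.35, 0.2]) # close model
M_bad = np.array([0.1, 0.1, 0.8])    # poor model

def kl(g, m):
    # Discrete Kullback-Leibler divergence: sum g * ln(g/m).
    return float(np.sum(g * np.log(g / m)))

print(kl(G, G), kl(G, M_good), kl(G, M_bad))
```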
S. Kullback and R.A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, Vol. 22, pp. 79-86, 1951.
H. Bozdogan. Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, Vol. 44, 2000.
AIC: application
1. Fit each model's parameters with simulated annealing
2. Select the model with the smallest AIC
[Table: AIC scores for the candidate models]
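A sketch of the two-step recipe on simulated data. The setup is my own: the candidate models are polynomials, and since they are linear in the parameters I use a direct least-squares fit in place of simulated annealing. For Gaussian least-squares fits, the standard form AIC = N ln(RSS/N) + 2k applies, where k is the number of fitted parameters; the 2k term penalizes overfitting.

```python
import numpy as np

rng = np.random.default_rng(6)

x = np.linspace(0.0, 1.0, 60)
y = 1.0 + 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)  # truth: quadratic

def aic(degree):
    coeffs = np.polyfit(x, y, degree)               # least-squares fit
    rss = float(np.sum((y - np.polyval(coeffs, x))**2))
    k = degree + 1                                  # fitted parameters
    return x.size * np.log(rss / x.size) + 2 * k

scores = {d: aic(d) for d in (1, 2, 8)}             # line, quadratic, overfit
print(scores)
```

The line is penalized for its large residual; high-degree fits reduce the residual slightly but pay the 2k complexity penalty.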
Iterative approach
Parameters θ = {θ_1, ..., θ_P}; possible measurements {μ_1, ..., μ_N}
Discretized model: ẋ = A x + B r + C, with r = f(x, θ)
Information matrix J^T W J
Bayesian iteration
The better the model, the smaller the experimental tracking error.
Consider mass-action kinetics up to second order; the model is linearized for controller design or gradient-based optimization.
MDP (model discrimination problem): given a pair of candidate models with the same input and output spaces, find an input, called the disparity certificate, that yields different outputs for all possible disturbances.
MIP (model invalidation problem): given the inputs and outputs for a series of executed experiments, find which candidate model maps the inputs to different outputs for all possible disturbances.