
BioControl - Week 3, Lecture 1

Goals of this lecture
- Background on system identification
- Fitting models
- Selecting models

Suggested readings
- System Identification: Theory for the User, L. Ljung, Prentice-Hall

Elisa Franco, Caltech

Need for sys-id in biology


Example: the MAPK kinase cascade triggered by epidermal growth factor receptors - phosphorylation propagates down the cascade, many different proteins are phosphorylated downstream, and gene expression is stimulated or repressed.

How can we validate a model? What can we measure?

Challenges: stochasticity, unknown dynamics, steady-state vs. dynamic measurements.


System identification perspective


Data d = {d_1, ..., d_N}; model d = M(θ); unknown parameters θ = {θ_1, ..., θ_M}.

- θ independent of time: parametric identification.
- θ(t) depends on time and d = {d(t_1), ..., d(t_N)}. Examples:
  - Prediction (t > t_N): predict tomorrow's traffic on the 405 based on historical data.
  - Filtering (t = t_N): current average velocity on the 405 between Wilshire and Santa Monica Blvd.
  - Smoothing (t_1 < t < t_N): yesterday's average velocity distribution on the 405.

Core elements of system identification


Data: the source of information; partial; noisy; sometimes experiments can be designed.

Model class M = {M(θ, d) : θ ∈ Θ}: the choice depends on the questions we want to answer! Linear or nonlinear; grey box (first principles) or black box (I/O); parametric or non-parametric (functions).

Model selection criterion: within the model class, the estimator chooses the model that best fits the data according to a certain performance measure:

θ̂ = arg min_θ J(θ)

Validation criterion: depends on the model purpose - convergence, error variance, consistency wrt new data...

Core elements of system identification


The identification loop: prior information guides experiment design methods, which produce data from the system; model selection and model parameter estimation yield a candidate model, which is then checked against validation data in the model validation step.

Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, single output.

Estimators
Data are noisy, so estimated quantities are random variables with some density p(d, θ). A good estimator should yield:

- Unbiased estimates.
- Minimum variance: between two estimators, pick the one that gives estimates with the least variance.
- Estimates that converge in mean square (in some sense): the more data we add to our set (N_1 < N_2 < N_3), the smaller the variance of the estimator output should be (V_1 > V_2 > V_3).
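The convergence property is easy to check numerically. A minimal sketch (hypothetical example, assuming the sample-mean estimator of a Gaussian mean): the empirical variance of the estimate shrinks roughly as 1/N as the data set grows.

```python
import numpy as np

rng = np.random.default_rng(5)
# Monte Carlo check of the convergence property: for each data-set size N,
# run many estimation trials and measure the variance of the estimates.
variances = {}
for N in (10, 100, 1000):
    # 5000 independent data sets of N unit-variance Gaussian samples;
    # the estimator is the sample mean of each data set
    estimates = rng.normal(0.0, 1.0, size=(5000, N)).mean(axis=1)
    variances[N] = estimates.var()
```

With more data per set, the spread of the estimator output shrinks: `variances[10] > variances[100] > variances[1000]`.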


Limits to the precision of estimation


The variance of any estimator is limited by the statistical properties of the data source!

p(d, θ): the probability density function of the data, given a certain value of the parameters.

Define the Fisher Information Matrix:

I^F_ij(θ) = -E[ ∂² ln p(d, θ) / ∂θ_i ∂θ_j ] = E[ (∂ ln p(d, θ)/∂θ_i)(∂ ln p(d, θ)/∂θ_j) ]

Could we improve the FIM by experimental design?

It can be proved that for any estimator we pick:

Var[θ̂] ≥ (I^F)⁻¹   (Cramér-Rao inequality)
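A quick numerical check of the bound (hypothetical example: estimating the mean of N i.i.d. Gaussian samples with known σ, for which I^F = N/σ², so the bound is σ²/N; the sample mean is efficient and attains it):

```python
import numpy as np

# For N i.i.d. Gaussian samples with known variance sigma^2 and unknown mean,
# the Fisher information is I_F = N / sigma^2, so Var[mu_hat] >= sigma^2 / N.
sigma, N = 2.0, 50
crlb = sigma**2 / N

rng = np.random.default_rng(4)
# Monte Carlo: estimate the mean 20000 times and measure the variance of the
# estimates; the sample mean attains the Cramer-Rao lower bound.
estimates = rng.normal(0.0, sigma, size=(20000, N)).mean(axis=1)
empirical_var = estimates.var()
```

The empirical variance lands within Monte Carlo error of `crlb`; no estimator could do better.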


Example 1: parametric identification


Least squares: linear regression; it has an analytical solution.

Given the data y(t), t = 1, ..., N, and for each t the regressors u(t) = (u_1(t), ..., u_M(t)) (typically an overdetermined data set), find θ = (θ_1, ..., θ_M)^T such that y(t) ≈ u(t)θ.

Define the error e(t) = y(t) - u(t)θ. Model estimation criterion: minimize the cost

J(θ) = (1/2) Σ_{t=1..N} e²(t) = (1/2)(Y - Uθ)^T (Y - Uθ)

Setting ∂J/∂θ = 0 gives U^T (Y - Uθ) = 0, hence

θ̂ = (U^T U)⁻¹ U^T Y   (Moore-Penrose pseudoinverse)

Note: the model needs to be linear in the parameters, not necessarily in the independent variables! Example: fitting the function f(x) = θ_0 + θ_1 x + θ_2 x².
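The quadratic-fit example can be sketched as follows (hypothetical data; numpy's `pinv` plays the role of the Moore-Penrose pseudoinverse):

```python
import numpy as np

# Hypothetical data: samples of f(x) = 1 + 2x + 0.5x^2 with small noise
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + 0.01 * rng.standard_normal(50)

# Regressor matrix U: each row is (1, x, x^2) -- the model is linear in the
# parameters theta = (theta0, theta1, theta2) even though f is nonlinear in x
U = np.column_stack([np.ones_like(x), x, x**2])

# theta_hat = (U^T U)^{-1} U^T y, computed via the pseudoinverse
theta_hat = np.linalg.pinv(U) @ y
```

With this little noise, `theta_hat` recovers (1.0, 2.0, 0.5) to a few decimal places.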

Example 2: Maximum Likelihood estimators


Given a likelihood function L(θ) = p(d, θ), select θ* for which L(θ*) ≥ L(θ) for any possible θ.

Recall the least squares example, with added zero-mean Gaussian noise:

y = Uθ + v,   v ~ G(0, V)

p(y, θ) = (1 / √((2π)^N det V)) exp( -(1/2)(y - Uθ)^T V⁻¹ (y - Uθ) )

Maximizing the likelihood function is equivalent to maximizing its log, i.e.

min_θ (y - Uθ)^T V⁻¹ (y - Uθ)

We recover the weighted least squares solution...
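A minimal numerical sketch (hypothetical heteroscedastic data): maximizing the Gaussian likelihood reduces to solving the weighted normal equations U^T V⁻¹ U θ = U^T V⁻¹ y.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
U = np.column_stack([np.ones(N), rng.uniform(0, 1, N)])
theta_true = np.array([0.5, 2.0])

# Heteroscedastic zero-mean Gaussian noise: V = diag(sigma_i^2)
sigma = rng.uniform(0.05, 0.5, N)
y = U @ theta_true + sigma * rng.standard_normal(N)

# Maximizing the Gaussian likelihood = minimizing (y - U theta)^T V^{-1} (y - U theta),
# whose minimizer solves the weighted normal equations below
Vinv = np.diag(1.0 / sigma**2)
theta_wls = np.linalg.solve(U.T @ Vinv @ U, U.T @ Vinv @ y)
```

`theta_wls` recovers `theta_true` up to the noise-limited estimation error.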

Example 3: Kalman Filter


Suppose we want to track an object in space, but we can only measure its position, and it's noisy. Can we estimate its velocity? Note: an observability condition is required.

Abstract problem formulation: given a dynamical system with partially measurable states and zero-mean Gaussian disturbances, find the best linear estimator of x:

dx/dt = Ax + Bu + R_v v
y = Cx + R_w w

Prediction and correction:   dx̂/dt = A x̂ + B u + L(y - C x̂),   L = ?

The system is linear, so the mean of the error will be zero or driven by the input u. We want to minimize the error covariance: with e(t) = x - x̂,

min_L E[e(t) e^T(t)] = min_L P(t)

Now, since de/dt = (A - LC)e + R_v v - L R_w w,

dP/dt = (A - LC)P + P(A - LC)^T + R_v V R_v^T + L R_w W R_w^T L^T

The error covariance is minimized by the Kalman gain L = P C^T (R_w W R_w^T)⁻¹. The KF can also be used to estimate time-varying parameters!
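A discrete-time sketch of the position-only tracker (hypothetical constant-velocity model and noise levels; the gain below is the standard discrete-time Kalman update rather than the continuous-time formula above):

```python
import numpy as np

dt = 0.1
A = np.array([[1, dt], [0, 1]])   # state = (position, velocity)
C = np.array([[1.0, 0.0]])        # we measure position only
Q = 1e-4 * np.eye(2)              # process noise covariance
R = np.array([[0.05]])            # measurement noise covariance

rng = np.random.default_rng(2)
x = np.array([0.0, 1.0])          # true state: moving at 1 unit/s
xhat = np.zeros(2)                # estimator starts knowing nothing
P = np.eye(2)
for _ in range(300):
    x = A @ x                                         # true dynamics
    y = C @ x + np.sqrt(R[0, 0]) * rng.standard_normal(1)  # noisy position
    # Prediction
    xhat = A @ xhat
    P = A @ P @ A.T + Q
    # Correction: gain L = P C^T (C P C^T + R)^{-1}
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    xhat = xhat + (L @ (y - C @ xhat)).ravel()
    P = (np.eye(2) - L @ C) @ P
```

Although velocity is never measured, `xhat[1]` converges near the true value of 1.0 — the unmeasured state is reconstructed thanks to observability.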

Identification in biology
Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, often single output... but most biological processes are nonlinear! The class of models is uncertain, and only a limited number of quantities can be measured.

In the context of biology, identification almost coincides with:
- Off-line parameter estimation
- Model selection

Objectives:
- Gain insight into the system
- Simulation-aided design of experiments
- Bio-molecular programming (good identification allows redesign of pathways)

Nonlinear parameter estimation


Given a set of data, calibrate the model to reproduce the experimental results in the best possible way. Most often we fall in the Nonlinear Programming (NLP) class of optimization problems.

Given the data d(t), define the error e(θ, t) = d(t) - y(θ, t). NLP problem:

min_θ J(θ, T) = ∫_0^T e^T(θ, t) W(t) e(θ, t) dt

subject to:
- dx/dt = f(x, y, θ, t), x(t_0) = x_0   (dynamic constraints)
- h(x, y, θ) = 0, g(x, y, θ) ≤ 0   (trajectory constraints)
- θ^L ≤ θ ≤ θ^U   (parameter constraints)

NLP problems have a global minimum only when the cost functional and the constraints are convex! In practical cases they are not, so numerical methods used to solve NLP problems must carefully handle local minima: simple gradient methods won't work.

Readings:
- Convex Optimization, Boyd SP and Vandenberghe L, Cambridge University Press
- Nonlinear Programming, Bertsekas D, Athena Scientific
- Parameter Estimation in Biochemical Pathways: A Comparison of Global Optimization Methods. Moles CG, Mendes P, Banga JR. Genome Research. 2003 Nov 1; 13(11): 2467-2474
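A minimal sketch of this cost for a toy one-parameter model dx/dt = -θx (hypothetical noise-free data, W(t) = 1, Euler integration, and a coarse box-constrained grid search standing in for a real global NLP solver):

```python
import numpy as np

# Toy model dx/dt = -theta * x; "data" generated from theta_true = 0.8
dt, T = 0.01, 5.0
t = np.arange(0.0, T, dt)
theta_true = 0.8
d = np.exp(-theta_true * t)

def J(theta, x0=1.0):
    """Integrated squared error between data and simulated trajectory,
    approximating the cost integral with W(t) = 1."""
    x, traj = x0, []
    for _ in t:
        traj.append(x)
        x += dt * (-theta * x)   # dynamic constraint, forward Euler step
    e = d - np.array(traj)
    return np.sum(e * e) * dt

# Coarse search over the parameter box theta^L = 0.1 <= theta <= theta^U = 2.0
grid = np.linspace(0.1, 2.0, 200)
theta_hat = grid[np.argmin([J(th) for th in grid])]
```

The grid search recovers θ ≈ 0.8 up to discretization error; for realistic multi-parameter models the search space explodes, which is why the global optimization methods on the next slide matter.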

Parameter estimation: global optimization algorithm classes


Adaptive stochastic methods:
1. Treat the independent variables as random variables.
2. Center the distribution of the random variables about the best search point found.
3. Adapt the search steps in that region.

Clustering methods:
1. Sample points in the search domain.
2. Transform the sampled points to group them around the local minima.
3. Apply a clustering technique to identify groups that (hopefully) represent neighborhoods of local minima => minimize redundant local searches.
Genetic algorithms (evolutionary computation):
1. Initialize and evaluate the initial population.
2. Repeat until some convergence criterion is satisfied:
   - perform competitive selection;
   - apply genetic operators to generate new solutions;
   - evaluate the solutions in the population.
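The steps above can be sketched as a minimal evolutionary loop (hypothetical operators: crossover by parent averaging, Gaussian mutation, truncation selection):

```python
import numpy as np

def genetic_minimize(cost, dim, pop_size=40, gens=200, seed=0):
    """Minimal evolutionary-computation sketch following the steps above."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))   # 1. initialize population
    for _ in range(gens):                            # 2. repeat:
        fitness = np.array([cost(p) for p in pop])
        # competitive selection: keep the better half as parents
        parents = pop[np.argsort(fitness)[: pop_size // 2]]
        # genetic operators: crossover (average two parents) + mutation
        i = rng.integers(0, len(parents), size=(pop_size - len(parents), 2))
        children = parents[i].mean(axis=1) + 0.2 * rng.standard_normal((len(i), dim))
        pop = np.vstack([parents, children])         # evaluate next iteration
    fitness = np.array([cost(p) for p in pop])
    return pop[np.argmin(fitness)]

# Toy cost with minimum at theta = (3, 3)
best = genetic_minimize(lambda th: np.sum((th - 3.0)**2), dim=2)
```

On this convex toy cost the population converges near (3, 3); the appeal of GAs is that the same loop runs unchanged on rugged, non-convex landscapes.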

Simulated annealing:
The cost function is treated as an energy landscape E. Repeat: pick a temperature T; move in the parameter space; if ΔE ≤ 0 keep the new parameters, and if ΔE > 0 keep them with probability P(ΔE) = exp(-ΔE / (k_B T)). T is initially large and decreases gradually for fine tuning; uphill jumps are allowed to escape local minima.
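The scheme above, sketched on a toy multimodal cost (hypothetical landscape; k_B absorbed into T, simple linear cooling schedule):

```python
import numpy as np

def simulated_annealing(cost, theta0, steps=5000, T0=1.0, seed=0):
    """Minimal simulated-annealing sketch of the scheme above."""
    rng = np.random.default_rng(seed)
    theta, E = theta0, cost(theta0)
    best, best_E = theta, E
    for k in range(steps):
        T = T0 * (1.0 - k / steps) + 1e-3     # T decreases gradually
        cand = theta + 0.1 * rng.standard_normal(theta.shape)
        dE = cost(cand) - E
        # accept downhill moves always; uphill moves with P = exp(-dE / T)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            theta, E = cand, E + dE
            if E < best_E:
                best, best_E = theta, E
    return best, best_E

# Toy multimodal landscape (hypothetical): quadratic bowl near (1, -2)
# plus a sin^2 ripple that creates many local minima
cost = lambda th: np.sum((th - np.array([1.0, -2.0]))**2) + 0.5 * np.sum(np.sin(5 * th)**2)
theta_best, E_best = simulated_annealing(cost, np.zeros(2))
```

The occasional uphill acceptances let the search hop out of the ripple's local minima and settle near the global basin.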


Model selection

Given two candidate models, dx/dt = f_1(x, θ_1) and dx/dt = f_2(x, θ_2), fit to the same (e.g., fluorescence) data: which is the best model? We need a tradeoff between accuracy and overfitting.

Akaike Information Criterion (AIC)


"We should not strive for the truth, but for reasonable approximations." - L. Ljung

The truth: y = G(z). Approximation (model): M(z|θ). Kullback-Leibler distance:

I(G, M) = ∫ G(z) ln[ G(z) / M(z|θ) ] dz = ∫ G(z) ln G(z) dz - ∫ G(z) ln M(z|θ) dz

Akaike's approximation to the K-L distance, where L(θ̂|y) is the likelihood function and P is the number of parameters:

AIC(G, M) = -2 ln L(θ̂|y) + 2P

Model selection criterion: among candidate models M_i ∈ M, pick the one achieving

min_i AIC(G, M_i) = -2 ln L(θ̂_i|y) + 2P_i

References:
- S. Kullback and R.A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, Vol. 22, pp. 79-86, 1951.
- H. Bozdogan. Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, Vol. 44, 2000.
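For Gaussian errors, -2 ln L reduces (up to a constant) to N ln(RSS/N), giving the commonly used form AIC = N ln(RSS/N) + 2P. A sketch comparing polynomial model classes on hypothetical data generated by a degree-1 truth:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(60)   # truth is degree 1

def aic_poly(deg):
    """AIC for a degree-`deg` polynomial fit, Gaussian-error form:
    AIC = N ln(RSS/N) + 2P, with P = deg + 1 parameters."""
    U = np.vander(x, deg + 1)
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)
    rss = np.sum((y - U @ theta)**2)
    return len(y) * np.log(rss / len(y)) + 2 * (deg + 1)

scores = {deg: aic_poly(deg) for deg in range(0, 5)}
best_deg = min(scores, key=scores.get)
```

The underfit constant model is heavily penalized through its residuals, while the 2P term keeps needlessly high degrees from winning on tiny RSS improvements alone.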

AIC: application
1. Fit each model's parameters with simulated annealing.
2. Select the model with the lowest AIC.

Dunlop MJ, Franco E, Murray RM, ACC 2007.

Model discrimination: let's compare some papers


Gadkar KG, Gunawan R, Doyle FJ. Iterative approach to model identication of biological networks. Bmc Bioinformatics, 6, 2005

Iterative approach
= 1 , ..., P possible measurements = 1 , ..., N

noise cov J= =meas


W

Discretized
x = Ax + Br + C r = f (x, )

= J WJ

max Fisher information matrix Quad Prog

Bayesian iteration

max det(IF ) s.t. max id Exp. Feasible

Model prediction error criterion

Elisa Franco, Caltech

17

Model discrimination through experimental design: Case 1


Stimulus design for model selection and validation in cell signaling. Apgar JF, Toettcher JE, Endy D, White FM, Tidor B. PLoS Comput Biol. 2008 Feb; 4(2): e30.

Design a time-varying, model-based controller to achieve a desired output, and minimize the tracking error: the better the model, the smaller the experimental tracking error.

The models considered are mass-action kinetics up to second order; the model is linearized for controller design or gradient-based optimization.

ANY FLAW IN THIS METHOD?

Model discrimination through experimental design: Case 2


Model Discrimination of Polynomial Systems via Stochastic Inputs. Georgiev D and Klavins E, CDC 2008. Discrete-time models; polynomial state transition functions.

MDP (model discrimination problem): given a pair of candidate models with the same input and output spaces, find an input, called the disparity certificate, that yields different outputs for all possible disturbances.

MIP (model invalidation problem): given the inputs and outputs for a series of executed experiments, find which candidate model maps the inputs to different outputs for all possible disturbances.