
BioControl - Week 3, Lecture 1

Goals of this lecture
- Background on system identification
- Fitting models
- Selecting models

Suggested readings
- System Identification: Theory for the User, L. Ljung, Prentice-Hall

Elisa Franco, Caltech

Need for sys-id in biology


Example: the MAPK kinase cascade triggered by epidermal growth factor receptors - phosphorylation propagates down the cascade, many different proteins are phosphorylated downstream, and gene expression is stimulated or repressed.

How can we validate a model? What can we measure?

Challenges: stochasticity, unknown dynamics, steady-state vs. dynamic measurements.


System identification perspective


Data d = {d_1, ..., d_N}; model d = M(θ); unknown parameters θ = {θ_1, ..., θ_M}.

- θ independent of time: parametric identification.
- θ(t) depends on time and d = {d(t_1), ..., d(t_N)}. Examples:
  - Prediction (t > t_N): predict tomorrow's traffic on the 405 based on historical data.
  - Filtering (t = t_N): current average velocity on the 405 between Wilshire and Santa Monica Blvd.
  - Smoothing (t_1 < t < t_N): yesterday's average velocity distribution on the 405.

Core elements of system identification


Data: the source of information; partial; noisy; sometimes experiments can be designed.

Model class M = {M(θ, d) : θ ∈ Θ}: the choice depends on the questions we want to answer! Linear or nonlinear; grey box (first principles) or black box (I/O); parametric or non-parametric (functions).

Model selection criterion: within the model class, the estimator chooses the model that best fits the data according to a certain performance measure:

θ̂ = arg min_θ J(θ)

Validation criterion: depends on the model purpose - convergence, error variance, consistency wrt new data...

Core elements of system identification


The identification loop: prior information guides experiment design methods, which produce data from the system; model selection and model parameter estimation yield a candidate model, which is then checked against validation data in the model validation step.

Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, single output.

Estimators
Data are noisy, so estimated quantities are random variables with some density p(d, θ). A good estimator should yield:

- Unbiased estimates.
- Minimum variance: between two estimators, pick the one that gives estimates with the least variance.
- Estimates that converge in mean square (in some sense): the more data we add to our set (N_1 < N_2 < N_3), the smaller the variance of the estimator output should be (V_1 > V_2 > V_3).
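The convergence property is easy to check numerically. A minimal sketch (hypothetical example, assuming the sample-mean estimator of a Gaussian mean): the empirical variance of the estimate shrinks roughly as 1/N as the data set grows.

```python
import numpy as np

rng = np.random.default_rng(5)
# Monte Carlo check of the convergence property: for each data-set size N,
# run many estimation trials and measure the variance of the estimates.
variances = {}
for N in (10, 100, 1000):
    # 5000 independent data sets of N unit-variance Gaussian samples;
    # the estimator is the sample mean of each data set
    estimates = rng.normal(0.0, 1.0, size=(5000, N)).mean(axis=1)
    variances[N] = estimates.var()
```

With more data per set, the spread of the estimator output shrinks: `variances[10] > variances[100] > variances[1000]`.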


Limits to the precision of estimation


The variance of any estimator is limited by the statistical properties of the data source!

p(d, θ): the probability density function of the data, given a certain value of the parameters.

Define the Fisher Information Matrix:

I^F_ij(θ) = -E[ ∂² ln p(d, θ) / ∂θ_i ∂θ_j ] = E[ (∂ ln p(d, θ)/∂θ_i)(∂ ln p(d, θ)/∂θ_j) ]

Could we improve the FIM by experimental design?

It can be proved that for any estimator we pick:

Var[θ̂] ≥ (I^F)⁻¹   (Cramér-Rao inequality)
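A quick numerical check of the bound (hypothetical example: estimating the mean of N i.i.d. Gaussian samples with known σ, for which I^F = N/σ², so the bound is σ²/N; the sample mean is efficient and attains it):

```python
import numpy as np

# For N i.i.d. Gaussian samples with known variance sigma^2 and unknown mean,
# the Fisher information is I_F = N / sigma^2, so Var[mu_hat] >= sigma^2 / N.
sigma, N = 2.0, 50
crlb = sigma**2 / N

rng = np.random.default_rng(4)
# Monte Carlo: estimate the mean 20000 times and measure the variance of the
# estimates; the sample mean attains the Cramer-Rao lower bound.
estimates = rng.normal(0.0, sigma, size=(20000, N)).mean(axis=1)
empirical_var = estimates.var()
```

The empirical variance lands within Monte Carlo error of `crlb`; no estimator could do better.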


Example 1: parametric identification


Least squares: linear regression; it has an analytical solution.

Given the data y(t), t = 1, ..., N, and for each t the regressors u(t) = (u_1(t), ..., u_M(t)) (typically an overdetermined data set), find θ = (θ_1, ..., θ_M)^T such that y(t) ≈ u(t)θ.

Define the error e(t) = y(t) - u(t)θ. Model estimation criterion: minimize the cost

J(θ) = (1/2) Σ_{t=1..N} e²(t) = (1/2)(Y - Uθ)^T (Y - Uθ)

Setting ∂J/∂θ = 0 gives U^T (Y - Uθ) = 0, hence

θ̂ = (U^T U)⁻¹ U^T Y   (Moore-Penrose pseudoinverse)

Note: the model needs to be linear in the parameters, not necessarily in the independent variables! Example: fitting the function f(x) = θ_0 + θ_1 x + θ_2 x².
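The quadratic-fit example can be sketched as follows (hypothetical data; numpy's `pinv` plays the role of the Moore-Penrose pseudoinverse):

```python
import numpy as np

# Hypothetical data: samples of f(x) = 1 + 2x + 0.5x^2 with small noise
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + 0.01 * rng.standard_normal(50)

# Regressor matrix U: each row is (1, x, x^2) -- the model is linear in the
# parameters theta = (theta0, theta1, theta2) even though f is nonlinear in x
U = np.column_stack([np.ones_like(x), x, x**2])

# theta_hat = (U^T U)^{-1} U^T y, computed via the pseudoinverse
theta_hat = np.linalg.pinv(U) @ y
```

With this little noise, `theta_hat` recovers (1.0, 2.0, 0.5) to a few decimal places.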

Example 2: Maximum Likelihood estimators


Given a likelihood function L(θ) = p(d, θ), select θ* for which L(θ*) ≥ L(θ) for any possible θ.

Recall the least squares example, with added zero-mean Gaussian noise:

y = Uθ + v,   v ~ G(0, V)

p(y, θ) = (1 / √((2π)^N det V)) exp( -(1/2)(y - Uθ)^T V⁻¹ (y - Uθ) )

Maximizing the likelihood function is equivalent to maximizing its log, i.e.

min_θ (y - Uθ)^T V⁻¹ (y - Uθ)

We recover the weighted least squares solution...
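A minimal numerical sketch (hypothetical heteroscedastic data): maximizing the Gaussian likelihood reduces to solving the weighted normal equations U^T V⁻¹ U θ = U^T V⁻¹ y.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
U = np.column_stack([np.ones(N), rng.uniform(0, 1, N)])
theta_true = np.array([0.5, 2.0])

# Heteroscedastic zero-mean Gaussian noise: V = diag(sigma_i^2)
sigma = rng.uniform(0.05, 0.5, N)
y = U @ theta_true + sigma * rng.standard_normal(N)

# Maximizing the Gaussian likelihood = minimizing (y - U theta)^T V^{-1} (y - U theta),
# whose minimizer solves the weighted normal equations below
Vinv = np.diag(1.0 / sigma**2)
theta_wls = np.linalg.solve(U.T @ Vinv @ U, U.T @ Vinv @ y)
```

`theta_wls` recovers `theta_true` up to the noise-limited estimation error.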

Example 3: Kalman Filter


Suppose we want to track an object in space, but we can only measure its position, and it's noisy. Can we estimate its velocity? Note: an observability condition is required.

Abstract problem formulation: given a dynamical system with partially measurable states and zero-mean Gaussian disturbances, find the best linear estimator of x:

dx/dt = Ax + Bu + R_v v
y = Cx + R_w w

Prediction and correction:   dx̂/dt = A x̂ + B u + L(y - C x̂),   L = ?

The system is linear, so the mean of the error will be zero or driven by the input u. We want to minimize the error covariance: with e(t) = x - x̂,

min_L E[e(t) e^T(t)] = min_L P(t)

Now, since de/dt = (A - LC)e + R_v v - L R_w w,

dP/dt = (A - LC)P + P(A - LC)^T + R_v V R_v^T + L R_w W R_w^T L^T

The error covariance is minimized by the Kalman gain L = P C^T (R_w W R_w^T)⁻¹. The KF can also be used to estimate time-varying parameters!
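A discrete-time sketch of the position-only tracker (hypothetical constant-velocity model and noise levels; the gain below is the standard discrete-time Kalman update rather than the continuous-time formula above):

```python
import numpy as np

dt = 0.1
A = np.array([[1, dt], [0, 1]])   # state = (position, velocity)
C = np.array([[1.0, 0.0]])        # we measure position only
Q = 1e-4 * np.eye(2)              # process noise covariance
R = np.array([[0.05]])            # measurement noise covariance

rng = np.random.default_rng(2)
x = np.array([0.0, 1.0])          # true state: moving at 1 unit/s
xhat = np.zeros(2)                # estimator starts knowing nothing
P = np.eye(2)
for _ in range(300):
    x = A @ x                                         # true dynamics
    y = C @ x + np.sqrt(R[0, 0]) * rng.standard_normal(1)  # noisy position
    # Prediction
    xhat = A @ xhat
    P = A @ P @ A.T + Q
    # Correction: gain L = P C^T (C P C^T + R)^{-1}
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    xhat = xhat + (L @ (y - C @ xhat)).ravel()
    P = (np.eye(2) - L @ C) @ P
```

Although velocity is never measured, `xhat[1]` converges near the true value of 1.0 — the unmeasured state is reconstructed thanks to observability.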

Identification in biology
Most identification procedures consider a class of models that are linear, discrete time, lumped parameters, often single output... but most biological processes are nonlinear! The class of models is uncertain, and only a limited number of quantities can be measured.

In the context of biology, identification almost coincides with:
- Off-line parameter estimation
- Model selection

Objectives:
- Gain insight into the system
- Simulation-aided design of experiments
- Bio-molecular programming (good identification allows redesign of pathways)

Nonlinear parameter estimation


Given a set of data, calibrate the model to reproduce the experimental results in the best possible way. Most often we fall in the Nonlinear Programming (NLP) class of optimization problems.

Given the data d(t), define the error e(θ, t) = d(t) - y(θ, t). NLP problem:

min_θ J(θ, T) = ∫_0^T e^T(θ, t) W(t) e(θ, t) dt

subject to:
- dx/dt = f(x, y, θ, t), x(t_0) = x_0   (dynamic constraints)
- h(x, y, θ) = 0, g(x, y, θ) ≤ 0   (trajectory constraints)
- θ^L ≤ θ ≤ θ^U   (parameter constraints)

NLP problems have a global minimum only when the cost functional and the constraints are convex! In practical cases they are not, so numerical methods used to solve NLP problems must carefully handle local minima: simple gradient methods won't work.

Readings:
- Convex Optimization, Boyd SP and Vandenberghe L, Cambridge University Press
- Nonlinear Programming, Bertsekas D, Athena Scientific
- Parameter Estimation in Biochemical Pathways: A Comparison of Global Optimization Methods. Moles CG, Mendes P, Banga JR. Genome Research. 2003 Nov 1; 13(11): 2467-2474
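A minimal sketch of this cost for a toy one-parameter model dx/dt = -θx (hypothetical noise-free data, W(t) = 1, Euler integration, and a coarse box-constrained grid search standing in for a real global NLP solver):

```python
import numpy as np

# Toy model dx/dt = -theta * x; "data" generated from theta_true = 0.8
dt, T = 0.01, 5.0
t = np.arange(0.0, T, dt)
theta_true = 0.8
d = np.exp(-theta_true * t)

def J(theta, x0=1.0):
    """Integrated squared error between data and simulated trajectory,
    approximating the cost integral with W(t) = 1."""
    x, traj = x0, []
    for _ in t:
        traj.append(x)
        x += dt * (-theta * x)   # dynamic constraint, forward Euler step
    e = d - np.array(traj)
    return np.sum(e * e) * dt

# Coarse search over the parameter box theta^L = 0.1 <= theta <= theta^U = 2.0
grid = np.linspace(0.1, 2.0, 200)
theta_hat = grid[np.argmin([J(th) for th in grid])]
```

The grid search recovers θ ≈ 0.8 up to discretization error; for realistic multi-parameter models the search space explodes, which is why the global optimization methods on the next slide matter.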

Parameter estimation: global optimization algorithm classes


Adaptive stochastic methods:
1. Treat the independent variables as random variables.
2. Center the distribution of the random variables about the best search point found.
3. Adapt the search steps in that region.

Clustering methods:
1. Sample points in the search domain.
2. Transform the sampled points to group them around the local minima.
3. Apply a clustering technique to identify groups that (hopefully) represent neighborhoods of local minima => minimize redundant local searches.
Genetic algorithms (evolutionary computation):
1. Initialize and evaluate the initial population.
2. Repeat until some convergence criterion is satisfied:
   - perform competitive selection;
   - apply genetic operators to generate new solutions;
   - evaluate the solutions in the population.
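The steps above can be sketched as a minimal evolutionary loop (hypothetical operators: crossover by parent averaging, Gaussian mutation, truncation selection):

```python
import numpy as np

def genetic_minimize(cost, dim, pop_size=40, gens=200, seed=0):
    """Minimal evolutionary-computation sketch following the steps above."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))   # 1. initialize population
    for _ in range(gens):                            # 2. repeat:
        fitness = np.array([cost(p) for p in pop])
        # competitive selection: keep the better half as parents
        parents = pop[np.argsort(fitness)[: pop_size // 2]]
        # genetic operators: crossover (average two parents) + mutation
        i = rng.integers(0, len(parents), size=(pop_size - len(parents), 2))
        children = parents[i].mean(axis=1) + 0.2 * rng.standard_normal((len(i), dim))
        pop = np.vstack([parents, children])         # evaluate next iteration
    fitness = np.array([cost(p) for p in pop])
    return pop[np.argmin(fitness)]

# Toy cost with minimum at theta = (3, 3)
best = genetic_minimize(lambda th: np.sum((th - 3.0)**2), dim=2)
```

On this convex toy cost the population converges near (3, 3); the appeal of GAs is that the same loop runs unchanged on rugged, non-convex landscapes.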

Simulated annealing:
The cost function is treated as an energy landscape E. Repeat: pick a temperature T; move in the parameter space; if ΔE ≤ 0 keep the new parameters, and if ΔE > 0 keep them with probability P(ΔE) = exp(-ΔE / (k_B T)). T is initially large and decreases gradually for fine tuning; uphill jumps are allowed to escape local minima.
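The scheme above, sketched on a toy multimodal cost (hypothetical landscape; k_B absorbed into T, simple linear cooling schedule):

```python
import numpy as np

def simulated_annealing(cost, theta0, steps=5000, T0=1.0, seed=0):
    """Minimal simulated-annealing sketch of the scheme above."""
    rng = np.random.default_rng(seed)
    theta, E = theta0, cost(theta0)
    best, best_E = theta, E
    for k in range(steps):
        T = T0 * (1.0 - k / steps) + 1e-3     # T decreases gradually
        cand = theta + 0.1 * rng.standard_normal(theta.shape)
        dE = cost(cand) - E
        # accept downhill moves always; uphill moves with P = exp(-dE / T)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            theta, E = cand, E + dE
            if E < best_E:
                best, best_E = theta, E
    return best, best_E

# Toy multimodal landscape (hypothetical): quadratic bowl near (1, -2)
# plus a sin^2 ripple that creates many local minima
cost = lambda th: np.sum((th - np.array([1.0, -2.0]))**2) + 0.5 * np.sum(np.sin(5 * th)**2)
theta_best, E_best = simulated_annealing(cost, np.zeros(2))
```

The occasional uphill acceptances let the search hop out of the ripple's local minima and settle near the global basin.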


Model selection

Given two candidate models, dx/dt = f_1(x, θ_1) and dx/dt = f_2(x, θ_2), fit to the same (e.g., fluorescence) data: which is the best model? We need a tradeoff between accuracy and overfitting.

Akaike Information Criterion (AIC)


"We should not strive for the truth, but for reasonable approximations." - L. Ljung

The truth: y = G(z). Approximation (model): M(z|θ). Kullback-Leibler distance:

I(G, M) = ∫ G(z) ln[ G(z) / M(z|θ) ] dz = ∫ G(z) ln G(z) dz - ∫ G(z) ln M(z|θ) dz

Akaike's approximation to the K-L distance, where L(θ̂|y) is the likelihood function and P is the number of parameters:

AIC(G, M) = -2 ln L(θ̂|y) + 2P

Model selection criterion: among candidate models M_i ∈ M, pick the one achieving

min_i AIC(G, M_i) = -2 ln L(θ̂_i|y) + 2P_i

References:
- S. Kullback and R.A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, Vol. 22, pp. 79-86, 1951.
- H. Bozdogan. Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology, Vol. 44, 2000.
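For Gaussian errors, -2 ln L reduces (up to a constant) to N ln(RSS/N), giving the commonly used form AIC = N ln(RSS/N) + 2P. A sketch comparing polynomial model classes on hypothetical data generated by a degree-1 truth:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(60)   # truth is degree 1

def aic_poly(deg):
    """AIC for a degree-`deg` polynomial fit, Gaussian-error form:
    AIC = N ln(RSS/N) + 2P, with P = deg + 1 parameters."""
    U = np.vander(x, deg + 1)
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)
    rss = np.sum((y - U @ theta)**2)
    return len(y) * np.log(rss / len(y)) + 2 * (deg + 1)

scores = {deg: aic_poly(deg) for deg in range(0, 5)}
best_deg = min(scores, key=scores.get)
```

The underfit constant model is heavily penalized through its residuals, while the 2P term keeps needlessly high degrees from winning on tiny RSS improvements alone.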

AIC: application
1. Fit each model's parameters with simulated annealing.
2. Select the model with the lowest AIC.

Dunlop MJ, Franco E, Murray RM, ACC 2007.

Model discrimination: let's compare some papers


Gadkar KG, Gunawan R, Doyle FJ. Iterative approach to model identication of biological networks. Bmc Bioinformatics, 6, 2005

Iterative approach
= 1 , ..., P possible measurements = 1 , ..., N

noise cov J= =meas


W

Discretized
x = Ax + Br + C r = f (x, )

= J WJ

max Fisher information matrix Quad Prog

Bayesian iteration

max det(IF ) s.t. max id Exp. Feasible

Model prediction error criterion

Elisa Franco, Caltech

17

Model discrimination through experimental design: Case 1


Stimulus design for model selection and validation in cell signaling. Apgar JF, Toettcher JE, Endy D, White FM, Tidor B. PLoS Comput Biol. 2008 Feb; 4(2): e30.

Design a time-varying, model-based controller to achieve a desired output, and minimize the tracking error: the better the model, the smaller the experimental tracking error.

The models considered are mass-action kinetics up to second order; the model is linearized for controller design or gradient-based optimization.

ANY FLAW IN THIS METHOD?

Model discrimination through experimental design: Case 2


Model Discrimination of Polynomial Systems via Stochastic Inputs. Georgiev D and Klavins E, CDC 2008. Discrete-time models; polynomial state transition functions.

MDP (model discrimination problem): given a pair of candidate models with the same input and output spaces, find an input, called the disparity certificate, that yields different outputs for all possible disturbances.

MIP (model invalidation problem): given the inputs and outputs for a series of executed experiments, find which candidate model maps the inputs to different outputs for all possible disturbances.