Professional Documents
Culture Documents
2014 Article 6684
2014 Article 6684
Abstract
Background: Parameter estimation for differential equation models of intracellular processes is a highly relevant bu
challenging task. The available experimental data do not usually contain enough information to identify all parameters
uniquely, resulting in ill-posed estimation problems with often highly correlated parameters. Sampling-based Bayesian
statistical approaches are appropriate for tackling this problem. The samples are typically generated via Markov chain
Monte Carlo, however such methods are computationally expensive and their convergence may be slow, especially if
there are strong correlations between parameters. Monte Carlo methods based on Euclidean or Riemannian
Hamiltonian dynamics have been shown to outperform other samplers by making proposal moves that take the local
sensitivities of the system’s states into account and accepting these moves with high probability. However, the high
computational cost involved with calculating the Hamiltonian trajectories prevents their widespread use for all but
the smallest differential equation models. The further development of efficient sampling algorithms is therefore an
important step towards improving the statistical analysis of predictive models of intracellular processes.
Results: We show how state of the art Hamiltonian Monte Carlo methods may be significantly improved for steady
state dynamical models. We present a novel approach for efficiently calculating the required geometric quantities by
tracking steady states across the Hamiltonian trajectories using a Newton-Raphson method and employing local
sensitivity information. Using our approach, we compare both Euclidean and Riemannian versions of Hamiltonian
Monte Carlo on three models for intracellular processes with real data and demonstrate at least an order of
magnitude improvement in the effective sampling speed. We further demonstrate the wider applicability of our
approach to other gradient based MCMC methods, such as those based on Langevin diffusions.
Conclusion: Our approach is strictly benefitial in all test cases. The Matlab sources implementing our MCMC
methodology is available from https://github.com/a-kramer/ode_rmhmc.
Keywords: MCMC methods, Parameter estimation, Hybrid monte carlo, Steady state data, Systems biology
© 2014 Kramer et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication
waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise
stated.
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 2 of 11
http://www.biomedcentral.com/1471-2105/15/253
these settings as they offer an insight into the surround- very quickly but at the expense of requiring a much larger
ings of local objective function minima and facilitate the number of samples to obtain low variance Monte Carlo
uncertainty analysis of predictions. Consequently, statisti- estimates. Of course, these larger samples also need to be
cal methods are being used more and more frequently for stored and empirical estimates of posterior predictive dis-
parameter estimation in systems biology (see e.g. [1,2]). tributions using a larger number of sampled points will be
In the Bayesian approach, parameters are considered ran- computationally slower.
dom variables and inferences are formulated in terms of The statistical superiority of HMC has already been
probability densities. demonstrated on a variety of examples, e.g. using a 100-
A major hindrance is the time required to generate sam- dimensional multivariate Gaussian distribution as target
ples from the resulting posterior distributions. Evaluation density (see Ch.5 in [8]); it seems to be a very promis-
of the posterior can be computationally expensive; this is ing approach, however HMC has so far rarely been used
particularly the case for ODE models, where each sampling to infer parameters for nonlinear ODE models. A recent
step requires numerical calculation of model trajectories. simple example is given in [1], where it is shown that the
Metropolis-Hastings [3,4] algorithms are among the most sample quality and convergence of HMC to the poste-
popular choices for this task (see e.g. [2,5]). Variants of rior can be improved even further by using an appropriate
these core algorithms are also widely available in parame- metric to define a Riemannian geometry over the param-
ter estimation software packages, e.g. GNU MCSIM [6], or eter space. This approach employs the local sensitivity of
the MCMCSTAT M ATLAB toolbox [7]. the model at each step to inform the proposal distribu-
The speed of convergence of the Markov chain is cru- tion, leading to more efficient sampling in cases where the
cial for the efficiency of the MCMC approach. Finding a posterior has a complicated covariance structure.
good tuning of the Markov chain’s transition distance can The major difficulty that arises generally for HMC type
be time consuming and difficult [1,8,9], and it has been algorithms when dealing with ODE models is that the
recognized that high correlations between the parame- model outputs and their sensitivities have to be simulated
ters can substantially slow down the convergence speed at every point along trajectories in parameter space. We
of standard MCMC algorithms [10]. In these cases the address this computational issue by proposing an exten-
step size must often be chosen to be very small in order sion to HMC algorithms especially designed to sample
to obtain a reasonable acceptance rate, resulting in highly efficiently from models with steady state data under mul-
auto-correlated samples. A larger number of samples are tiple perturbations. In particular, we use sensitivity analy-
then required to obtain low variance estimates of the sis within a Newton-Raphson approach to efficiently track
inferred quantities. the steady states across each of the Hamiltonian trajecto-
Various strategies are employed to increase the dis- ries, instead of calculating them explicitly, thus drastically
tance of transitions in the sampling space, yet at the same reducing this additional cost.
time maintain a high acceptance rate. Several adaptive Steady state observations are typically not as informa-
versions of the Metropolis-Hastings algorithm have been tive as time series data, but can be obtained at com-
suggested in this context (see e.g. Ch.4 in [8]). These adap- paratively low cost. They can be used in the first cycle
tation processes however only make proposals based on of modeling where a qualitative model is validated via
a global Gaussian approximation of the posterior, which parameter fitting. If the model is not invalidated, dynamic
can be a disadvantage when the posterior density has a data should be gathered for further analysis to narrow
complex lower scale structure. down model parameters further. The posterior parame-
Hybrid or Hamiltonian Monte Carlo (HMC) algorithms ter density from previous analysis cycles may be employed
([11]), can dramatically increase the acceptance rate while to inform the prior density of future experiments, and
still providing samples with low auto-correlation. This is herein lies one of the benefits of the Bayesian approach.
accomplished by an elaborate transition step that requires The properties of steady states offer the possibility to
the integration of a secondary ODE system describing use analytical calculations to obtain output sensitivi-
dynamics in parameter space that depend on the gradient ties. This is a lot faster than the numerical integration
of the target distribution. In this way, the HMC algorithm of dynamic sensitivities and improves sampling perfor-
uses the structure of the ODE model more exhaustively. mance. We describe this further in Section ‘Efficient cal-
At first glance this approach increases the computational culation of geometry’. Typically, the steady state data will
costs, but this is often compensated by the improved not be sufficient to uniquely identify all parameters; how-
sample quality compared to the simpler algorithms. The ever issues of unidentifiability may also occur for dynamic
advantage of low auto-correlations in typical HMC sam- time series data. Bayesian model analysis is designed
ples is that the sample size need not be as large to compute to deal with precisely this case and allows full charac-
summaries to a satisfactory precision. A wasteful algo- terisation of the uncertainties that arise when making
rithm might provide a sample with high auto-correlations predictions.
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 3 of 11
http://www.biomedcentral.com/1471-2105/15/253
We evaluate our approach on three different steady state calculated, but are both initialized with a numerical solu-
models with real data: a model for Erk phosphorylation in tion to the initial value problem. The parameters ρ are
the MAPK signaling pathway, and two alternative models always positive in chemical reaction networks. In order
for the phosphorylation of insulin receptor substrate IRS to avoid dealing with borders of the positive orthant it is
after insulin stimulation. We also provide standard HMC advantageous to sample in logarithmic space θ = log(ρ).
sampling speeds as reference. Since we operate on θ we will drop ρ from all subsequent
expressions and consider it implicitly understood that the
Approach model functions f and h will perform the transformation
Our modeling framework is motivated by real biological (the same applies to symbolic expressions like x̄(θ , u)).
experiments: measurements on systems subject to long Measurements are obscured by noise δij , that is, the
term or permanent perturbations, such as gene expres- observed data
sion regulation via different promoters, silencing of genes
D = yij , σij : i = 1, . . . , ν; j = 1, . . . , nu (3)
or the stable transfection with variants of the system’s
enzymes with altered reactive properties. These types of relates to the measurement model in this way,
experiment, among others, provide steady state responses
!
to perturbations. In the following, we assume data is accu- yij = zij + δij , δij ∼ N 0, σij2 . (4)
mulated by observing the steady state under different
conditions u. Gaussian noise is important for a vast amount of bio-
logical applications. However, the noise model is not of
Model structure crucial importance to the sampling algorithms and can
Intracellular processes are often described as biochemical be changed as needed [13]. We assume that variances
reaction networks, i.e. ODE models with vector fields that σij2 are available, e.g. from empirical variance estimates of
are based on chemical reaction kinetics. Here we consider different experimental replicates.
systems of the following form:
Bayesian parameter estimation
ẋ = f (x, ρ, uj ) j = 1, . . . , nE , (1) Equipped with our model and a data set D, we define
the likelihood function p(D|θ ), as a measure of plausibil-
where the state x ∈ Rn+ describes molecular concen- ity for each θ . In the case of a Gaussian error model, the
trations, ρ ∈ Rm + denotes the model parameters, uj ∈ likelihood given all experimental measurements follows as
Rυ+ is an input vector which describes experimental con- ⎛ ⎞
nE ν
ditions such as the suppression of certain reactions or 1 yij − hi (x̄, θ , uj ) 2
p(D|θ ) ∝ exp ⎝− ⎠.
the introduction of mutants, and nE denotes the num- 2 σij
j=1 i=1
ber of performed experiments, i.e. the number of different
input vectors. We consider constant inputs, the initial We note that we often employ the log-likelihood in
conditions x(0) = x0 are assumed to be known and unpa- many calculations, which we denote LD (θ ) = log p(D|θ ).
rameterised and the function f shall be smooth. If the The likelihood of a set of parameters therefore requires
model exhibits complex bifurcations, special care has to the steady state solution x̄(θ , uj ) of the ODE model, which
be taken when dealing with initial conditions and basins is usually calculated by numerical integration of (1) until
of attraction. We do not address these issues here. convergence is achieved. In a Bayesian approach, prior
In this study we assume that the system converges to a knowledge regarding the plausible values of p(θ ) is incor-
steady state characterised by f (x̄, ρ, w) = 0 for any avail- porated via Bayes’ theorem:
able input w, which is the generic behaviour of such mod- p(D|θ )p(θ )
els [12]. Measurements are taken only after the system has p(θ |D) = , (5)
p(D)
reached steady state:
where p(θ ) and p(θ |D) are the prior and posterior dis-
x̄(ρ, uj ) = lim ϕ(t, x0 , ρ, uj) , (2) tributions, respectively. In this work we assume Gaussian
t→∞
priors N (μ, ). The evidence p(D) is a normalization
zij = hi (x̄, ρ, uj ) = C x̄(ρ, uj ), C ∈ Rν×n , constant that is independent of θ and is not needed
during Markov chain Monte Carlo sampling, since this
where z ∈ Rν×nE is the observable model output when
term cancels in the calculation of the Metropolis-Hastings
input uj is applied, and x̄(ρ, uj ) is the steady state of
acceptance ratio. Expected values of a function F(θ ) with
the respective trajectory ϕ(t, x0 , ρ, uj). We assume linear
respect to the posterior,
output functions characterized by the real matrix C.
Standard HMC methods and the proposed Newton-
Ep(θ|D) [ F(θ )] = F(θ )p(θ |D) dθ , (6)
Raphson variants differ in the way the steady states are
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 4 of 11
http://www.biomedcentral.com/1471-2105/15/253
may be estimated using Monte Carlo integration given logarithm. One of the advantages of employing Rieman-
posterior samples, nian geometry is that the calculation of Gy (θ ) requires
N only first order sensitivities Sl r = ∂ x̄l /∂θr . For steady
1
F̂ = F(θi ), θi ∼ p(θ |D) . (7) state ODE models, we can calculate a general expression
N for the expected Fisher Information based on a likelihood
i=1
derived from Gaussian measurement errors with a linear
Riemannian structure of parameter space observation model, as defined in (1) and (2):
Exploration of the posterior distribution for models
defined by systems of ODEs is often severely hampered by
the strong correlation structure present in the parameter ∂ ∂
Gy (θ)rs = Ep(y|θ ) LD (θ) LD (θ)
space, which makes it difficult to propose MCMC moves ∂θr ∂θs
⎡⎛ ⎞
that are far from the current point and accepted with high
yi1 j1 − l Ci1 l x̄l (θ, uj1 )
probability. Recent advances in MCMC attempt to cir- = Ep(y|θ ) ⎣⎝ Ci1 l Sl r ⎠
σi21 j1
cumvent these issues by utilising the underlying geometric i 1 j1 l
⎛ ⎞⎤
structure induced by the sensitivity of a statistical model
yi2 j2 − k Ci2 k x̄k (θ, uj2 )
[1]. In addition to exploiting gradient information, we ×⎝ Ci2 k Sks ⎠⎦
σi22 j2
can construct MCMC algorithms based on higher order i 2 j2 k
geometric structure by considering the expected Fisher (yij − l Cil x̄l (θ, uj ))2
Information, which [14] noted satisfies all the mathemat- = Cil Sl r
ij
σij2 σij2 l
ical properties of a metric tensor and hence induces a
Riemannian geometry on the parameter space. The use × Cik Sks p(yικ |θ)dyικ + 0
of this geometry allows us to define a measure of dis- k ικ asymmetric terms: i1 =i2 ,j1 =j2
tance between sets of parameters in terms of the change in
= Cil Sl r
Cik Sks
posterior probability, rather than changes in the values of ij l k
the parameters themselves. In other words, MCMC algo- 2
rithms based on Riemannian geometry make local moves yij − l Cil x̄l θ, uj
× p(yij |θ)dyij
according to a local coordinate system that automatically σij2 σij2
adapts based on the local sensitivity of the model, tak- σij2
ing small steps in directions of high sensitivity and bigger = Cil Sl r (θ, uj ) Cik Sks (θ, uj ) .
ij
σij2 σij2
steps in directions of lower sensitivity, while also taking l k
where (θ ) = G(θ ). In other words, the dynamics now Jacobian is invertiblea . Similarly, we can write the follow-
take into account the second order sensitivities of the ing equation for the second order sensitivity,
statistical model of interest, since the covariance of the
momentum variable is based on the expected Fisher Infor- d ∂fi (x̄, θ, w) ∂ x̄l (θ, w) ∂fi (x̄, θ, w)
0= +
mation at each point. dθk ∂ x̄l ∂θj ∂θj
l
This Hamiltonian system is however harder to integrate
n n
numerically, since its equations of motion are now defined ∂ 2 fi (x̄, θ, w) ∂ x̄r ∂ 2 fi (x̄, θ, w) ∂ x̄l (θ, w)
= +
implicitly due to the dependence of the momentum on the ∂ x̄r ∂ x̄l ∂θk ∂θk ∂ x̄l ∂θj
r=1 l=1
position parameter θ . The generalised Leapfrog integra-
tor is the simplest such algorithm for this task, however n
∂fi (x̄, θ, w) ∂ 2 x̄l (θ, w)
its use may significantly add to the computational expense +
∂ x̄l ∂θk ∂θj
l=1
of simulating proposals, since a much larger number of
likelihood evaluations are generally necessary to solve the n
∂ 2 fi (x̄, θ, w) ∂ x̄r ∂ 2 fi (x̄, θ, w)
equations of motion compared to standard HMC [1,15]. + + ,
∂ x̄r ∂θj ∂θk ∂θk ∂θj
r=1
In addition to the state sensitivities, we require the gra-
dients of G(θ ) for solving the Hamiltonian system, which (18)
necessitates the calculation of second order sensitivities, leading to a linear equation for the second order sensitivity
denoted from now on by S. This provides the motivation jk j
Si = ∂θk Si ,
for the next subsection, in which we propose a computa-
tionally efficient approach to evaluate the required geo- n n
n
∂ 2 fi (x̄, θ , w) k ∂Jil (x̄, θ , w)
Ji l
j jk
metric quantities for enabling faster inference in steady 0= Sr + Sl + Sl
state ordinary differential equation models. ∂ x̄r ∂ x̄l ∂θk
r=1 l=1 l=1
n
Efficient calculation of geometry ∂ 2 fi (x̄, θ , w) k ∂ 2 fi (x̄, θ , w)
+ Sr + .
Practical implementation of Hamiltonian Monte Carlo ∂ x̄r ∂θj ∂θk ∂θj
r=1
samplers depends critically on the ability to efficiently cal-
(19)
culate the required geometric quantities. Here we show
how this may be done for steady state systems. Again, the existence of a solution depends on the
Output sensitivities dhi (x̄, θ , w)/dθk (for any given input invertibility of the Jacobian Jf (x̄, θ , w). We note that the
w = u[1,nE ] ) occur in several places in the RMHMC algo- same LU-decomposition of Jf can be used for the first
rithm. For ODE models, they are needed to construct the and second order sensitivities. Usually all derivatives of
metric tensor G and the gradient of VD ({x̄}, θ , u). In the f appearing in (16) and (19) can be calculated analyti-
case of steady state data, the sensitivities can be obtained cally; for large systems, a symbolic calculation package
from the steady state condition: will do the job, e.g. GiNaC, GNU Octave[forge]’s symbolic
package or MATLAB’s symbolic toolbox. A particularly
! convenient way of storing the models and doing the sym-
0 = fi (x̄(θ , w), θ , w)
bolic calculations is VFGEN [17] which provides MATLAB
! df i (x̄(θ , w), θ , w) output.
0= (16)
dθj Equipped with these instructions for sensitivity calcu-
lations we can easily calculate the metric G(θ ) and the
n
∂fi (x̄, θ , w) ∂ x̄l (θ , w) ∂fi (x̄, θ , w) gradient of the log-likelihood for a given value of θ ,
= + .
∂ x̄l ∂θj ∂θj
l=1 nE ν
dVD (x̄, θ , u) 1 d
− =−
We will drop the arguments of x̄ to shorten notation. dθk 2 dθk
j=1 i=1
In (16) we see that the steady state sensitivity is obtained
yij − hi (x̄(θ , uj ), θ , uj ) 2 d log(p(θ ))
by solving the following linear algebraic equation, × − ,
σij dθk
nE ν
0 = Jf (x̄, θ , w)S(θ , w) + K (x̄, θ , w) , (17) yij − l Cil x̄l (θ , uj )
=
−1
σij2
⇒ S(x̄, θ , w) = −Jf (x̄, θ , w) K (x̄, θ , w) , j=1 i=1
d log(p(θ ))
× Cil Sl k (x̄, θ , uj ) − .
j dθk
where K (x̄, θ , w)i = ∂θj fi (x̄, θ , w). We denote the solu- l
tion to (17) as S(x̄, θ , w), which is easy to obtain when the (20)
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 7 of 11
http://www.biomedcentral.com/1471-2105/15/253
Although HMC algorithms make large transitions and reuses appropriate fluxes, nevertheless the integra-
across the parameter space, the transitions are con- tor used by the toolbox has to integrate an initial value
structed using multiple small integration steps of the problem with n + nm + nmm state variables (number
Hamiltonian dynamics (12). Since these parameter of states, number of first order sensitivities, number of
changes in the integration are gradual, the sensitivity can second order sensitivities).
be used directly to estimate the steady state x̄(θ + , w) The models are stored as symbolic variables for the
after each small parameter step , Newton-Raphson type algorithms, symbolic calculations
m of the required derivatives of f then yield a standard
j
x̄i (θ + , w) = x̄i (θ , w) + Si j , (21) matlab function for the Jacobian and the sensitivities.
j=1 All linear equations are solved using MATLAB’s backslash
operator. We provide MATLAB code for easy reproduction
For this reason, it is very convenient and efficient to recal-
of all of the results in this section.
culate the steady states using a Newton Raphson method,
Since the methods we used are not restricted to
where we begin with x̄0 at the estimate (21),
the RMHMC algorithm, we also tested the simplified
x̄r+1 = x̄r − Jf (x̄r , θ , w)−1 f (x̄, θ , w) , (22) Metropolis-adjusted Langevin algorithm (SMMALA) [1].
and for any input w, (22) is iterated until satisfactory pre- We applied the same modifications to SMMALA and
cision is reached, otherwise we proceed exactly as in the measured the relative sampling speed, which can be
original HMC algorithm. We call these variants NR-HMC inspected in Table 1.
and NR-RMHMC.
Example: Erk phosphorylation in MAPK signaling
Sampling efficiency We consider a modification of the model for Erk phos-
For the estimation of the auto-correlation we employed phorylation in the MAPK signaling cascade introduced in
the M ATLAB script UWerr documented in [18]. By mea- [20], who investigated robustness of steady state phospho-
suring the execution time tE of the sampling we can rylation of Erk with respect to changes in the total Erk
calculate an effective sampling speed, corrected for auto- amount ErkT = u:
correlations: u
N ẋ1 = ρ1 − (2ρ1 g(u) + ρ2 )x1 + (ρ2 − ρ1 g(u))x2 ,
v= , (23) 1+u
2τint.,L tE ẋ2 = ρ1 g(u)x1 − ρ2 x2 ,
where N is the sample size and τint.,L the integrated auto-
1
correlation length with respect to the estimation of the g(u) =
mean log-likelihood. 1+u
Large auto-correlations reduce the effective speed v: the (24)
sampled points strongly depend on their predecessors and where x = ([pErk] , [ppErk])T are the once and twice
many Markov chain moves are required to obtain inde- phosphorylated modifications of Erk.
pendent information about the posterior distribution. A According to the conclusions in [20] the robustness of
sampling method with increased cost per Markov chain this system with respect to varying ErkT is due to negative
transition might outperform a simple-but-cheap method feedback from Erk to Raf and MEK; this was not inves-
if the returned points are sufficiently less correlated. The tigated by the authors using an ODE model but directly
effective speed v takes the purpose of sampling into in experiment and by modifying the steady state solu-
account. When comparing algorithms we also list a rela- tion of the open loop system. To account for this negative
tive speed vr , where we normalize each vi using the lowest feedback we modified the phosphorylation rate of the
observed speed.
Table 1 Effective sampling speed measurements for
Results and discussion
SMMALA and the modified NR-SMMALA
In this section we apply the Bayesian parameter estima-
tion framework to three examples from literature, which Problem size NR-SMMALA SMMALA
feature the type of data and modeling approach described v in s−1 90 ± 2 71 ± 2
2×2
in Section ‘Methods’. A comparison of the performance of τint.,L 0.88 ± 0.02 0.88 ± 2
all algorithms is provided in Additional file 1: Figure S1. v in s−1 62 ± 1 32 ± 1
All simulations were done in MATLAB. We used the 3×6
τint.,L 1.20 ± 0.03 1.52 ± 0.04
SBPOP Toolbox bundle [19] to store the models and solve
v in s−1 0.30 ± 0.07 0.031 ± 0.006
the ODE (x̄) for RMHMC. The implementation makes use 6 × 14
τint.,L 290 ± 67 166 ± 31
of the symmetry of the second order sensitivity: Si =
jk kj
Si ,
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 8 of 11
http://www.biomedcentral.com/1471-2105/15/253
Figure 1 MAPK posterior comparisons fltr. (1) posterior inferred from RMHMC sample. (2) Newton Raphson driven RMHMC posterior inferred
from sample. (3) posterior inferred from NR-HMC sample (4) exact posterior on a grid. We used a kernel density estimator (kde) to infer densities from
samples.
original model by multiplying it with u/(1 + u). This Example: insulin dose response
modification of the model does provide negative feed- A larger example is provided in [21], in which the authors
back, though we do not suggest any specific mechanism analyze the insulin pathway in eukaryotic cells (primary
or interpretation for it, but rather we aim at illustrating human adipocytes). Different models were tested in their
how a modeling hypothesis can be tested. The two dif- ability to describe phosphorylation of the insulin recep-
ferent Erk variants can be knocked out individually, but tor substrate (IRS) after insulin stimulation. The data sets
have a very similar function. This enables the experimen- provided in their supplementary material consist of dose
talists to reduce the amount of Erk to several intermediate response curves, which we interpret as steady state data,
levels. For more details on the biological system we refer as well as dynamic responses, which we disregard here.
to [20]. From the set of models provided, we chose one of the
The normalized data provided in the supplementary least complex models (shorthand label: Mma) with m =
material of [20] contains 10 steady state measurements of 6 parameters, which was nevertheless sufficient to fit the
x2 obtained with western blots under different perturba- steady state data, as well as the best fitting model (Mifa).
tion conditions in which Erk1 and/or Erk2 were knocked The interaction graph of Mma is shown in Figure 2. The
down, resulting in different ErkT concentrations. The data model comprises 5 molecular species: the insulin receptor
point belonging to the experiment in which no knock- (IR and IR∗ ), phosphorylated IR (IRP), IR substrate (IRS),
down was performed serves as control experiment with and phosphorylated IRS (IRSP). These reactants form two
u = 1 in normalized units. groups with built in conservation relations. Since the sums
Unfortunately these measurements are not acompanied
by information about standard deviations in the publi- d
cation. For the purposes of an illustrative example we [IR] + [IRP] + IR∗ = 0 ,
dt
suggest that the value σij = 0.2 seems reasonable. The (26)
d
corresponding measurement model reads ([IRS] + [IRSP]) = 0 ,
dt
h(x̄, ρ, uj ) = C x̄(ρ, uj ) ,
(25)
C = (0, 1) , σij = 0.2 ∀ i, j .
Figure 3 Comparison of five different fits for the insulin dose resonse data. The data is shown as error bars (last in each group), the fits are
shown in boxplot style in the order: (i) Mifa model and NR-RMHMC sampler, (ii) Mifa model NR-HMC sampler, (iii) Mma model NR-RMHMC sampler,
(iv) Mma model NR-HMC sampler, (v) Mma model and unchanged RMHMC sampler with numerical ODE solutions using SBToolBox2[mex]. All
algorithms succeeded in fitting the data.
do not change over time, we only require n = 3 indepen- Mifa model on our desktop machine using the unmodified
dent state variables to write down the ODE model: RMHMC algorithm.
As shown in Additional file 1: Figure S1, the effec-
x1 = [IR] , x2 = [IRP] , x3 = [IRSP] ,
tive speed of NR-RMHMC was about 23 times higher
u = [ins] % the insulin concentration , than that of standard RMHMC for the 3 × 6 Mma
T
x0 = 10 0 0 ,
which defines the initial value problem (ivp) for
⎛ ⎞
−1 −1 1 0 0 0
ẋ = ⎝ 1 1 0 −1 0 0 ⎠ diag(ρ)φ(x) ,
0 0 0 0 1 −1
⎛ ⎞
x1 u
⎜ x1 ⎟
⎜ ⎟
⎜ 10 − x1 − x2 ⎟
φ(x) = ⎜⎜
⎟,
⎟
⎜ x2 ⎟
⎝ x2 (10 − x3 ) ⎠
x3
with measurement model
h({x̄ρ,u }, ρ, uj ) = C x̄(ρ, uj ) , C = (0, 0, 98) . (27)
We have used the value for the output parameter C1,3
reported in the publication and will not estimate it during
sampling.
Figure 4 Relative speed for the Simplified Manifold
The larger Mifa model, which comprises 6 independent
Metropolis-Adjusted Langevin Algorithm (SMMALA) using the
state variables and 14 parameters and is capable of fit- same modifications as in the RMHMC code. We calculate the
ting dynamic responses as well, is treated similarly and sensitivities using systems of linear equations, we use the sensitivities
included as SB model in the supplement. to obtain the metric and the starting point for the Newton Raphson
Figure 3 shows a comparison of fits using different mod- method which is used to calculate steady states. We tested the relative
speed of both the modified and unmodified SMMALA on all three
els and samplers, illustrated as box plots. The boxes fit
example models with uninformative priors. In all cases, the modified
the observed experimental data well (error bars). It was version of SMMALA was faster than the reference implementation.
not computationally feasible to sample the 14 parameter
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 10 of 11
http://www.biomedcentral.com/1471-2105/15/253
model, indicating a significant speed up. The largest parameter estimation for this special class of ODE mod-
observed correlation coefficient of the posterior was els and improves scalability of statistical sampling-based
56 (Mma) ≈ 0.8. Since the posterior does not exhibit approaches for these models in general. The success of
strong enough correlations between the parameters, this approach motivates further development of HMC
the flat (Euclidean) metric used in NR-HMC was of no algorithms for the more general case of dynamic time
disadvantage and hence NR-HMC performed best. With series data, which would broaden its utility.
the larger Mifa model RMHMC came to its limits, while
the modified NR-RMHMC was still able to generate sam- Endnote
a
ples in an acceptable time. The use of a Newton-Raphson We note that this is not always the case: whenever
approach for this model resulted in two orders of mag- conservation relations are present, for example, the
nitude improvement in sampling performance for the Jacaobian is not invertible anywhere. However, in such
NR-HMC algorithm. Although NR-HMC was superior cases it is sufficient to use these conservation relations to
in cases with uninformative priors, the advantage of NR- reduce the number of state variables, as we do in the
RMHMC becomes evident in the case of an informative examples.
prior. The informative prior was built by assigning smaller
variances to some of the parameters θi while keeping
the rest vague (large variances). See the setup files in the Additional file
supplement for details.
Additional file 1: Figure S1. Performance analysis for the original RMHMC
Conclusion for ODE models and two steady state data adapted HMC algorithms.
Figure 4 shows that there are significant albeit less dra- Stuttgart, Germany. 2 Department of Mathematics, Imperial College London,
SW7 2AZ London, UK.
matic benefits for the introduced techniques for this algo-
rithm as well. Once again, we can use the comparatively Received: 15 January 2014 Accepted: 7 July 2014
inexpensive steady state sensitivity analysis to obtain the Published: 28 July 2014
metric tensor G as well as the Newton-Raphson method References
for the calculation of steady states to great effect. 1. Girolami M, Calderhead B: Riemann manifold langevin and
There remains the question of whether to employ NR- hamiltonian monte carlo methods. J R Stat Soc: Series B (Stat Methodol)
2011, 73(2):123–214. doi:10.1111/j.1467-9868.2010.00765.x.
HMC or the more complicated NR-RMHMC algorithm 2. Wilkinson DJ: Stochastic Modelling for Systems Biology Mathematical and
for performing inference over steady state systems. In Computational Biology, vol. 11. London, UK: Chapman & Hall/CRC; 2006.
most cases, steady state data is not sufficient to uniquely 3. Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis, Texts in
Statistical Science. 2edn. London, UK: Chapman & Hall, CRC; 2004.
fit the parameters to specific values, and we are unable to 4. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E:
know a priori whether the parameters in the system will Equation of state calculations by fast computing machines. J Chem
exhibit strong correlation structure. In practice, one might Phys 1953, 21(6):1087–1092. doi:10.1063/1.1699114.
5. Kaderali L, Dazert E, Zeuge U, Frese M, Bartenschlager R: Reconstructing
therefore start with the simpler NR-HMC scheme, and signaling pathways from RNAi data using probabilistic Boolean
resort to the Riemannian version of it if the need arises. threshold networks. Bioinformatics 2009, 25(17):2229–2235.
In cases where the prior may be well specified, the differ- 6. Bois FY: GNU MCSim: Bayesian statistical inference for SBML-coded
systems biology models. Bioinformatics 2009, 25(11):1453–1454.
ences in scale between different parameters may require
doi:10.1093/bioinformatics/btp162.
NR-RMHMC for improved efficiency, as we observed in 7. Haario H, Laine M, Mira A, Saksman E: DRAM: Efficient adaptive MCMC.
the example section. In Statistics and Computing. Volume 16. Switzerland: Springer;
2006:339–354.
We conclude that the use of Newton-Raphson meth-
8. Brooks S, Gelman A, Jones G. L, Meng X-L (Eds): Handbook of Markov Chain
ods for calculating the steady states within HMC algo- Monte Carlo. Handbooks of Modern Statistical Methods. London, UK:
rithms is a valuable contribution to the field of numerical Chapman & Hall/CRC; 2011.
Kramer et al. BMC Bioinformatics 2014, 15:253 Page 11 of 11
http://www.biomedcentral.com/1471-2105/15/253
doi:10.1186/1471-2105-15-253
Cite this article as: Kramer et al.: Hamiltonian Monte Carlo methods for
efficient parameter estimation in steady state dynamical systems. BMC
Bioinformatics 2014 15:253.