
Dynamic Density Estimation of Market Microstructure Variables via Auxiliary Particle Filtering

DANIEL NEHREN, DAVID FELLAH, JESUS RUIZ-MATA, AND YICHEN QIN

DANIEL NEHREN is the global head of the Equity Quantitative Solutions groups at J.P. Morgan in New York, NY.

DAVID FELLAH runs the Americas Quantitative Solutions' Algorithmic Trading group at J.P. Morgan in New York, NY.

JESUS RUIZ-MATA runs the Quantitative Solutions' Portfolio Analytics group at J.P. Morgan in New York, NY.

YICHEN QIN is a Ph.D. student at the Department of Applied Math and Statistics, Johns Hopkins University in Baltimore, Maryland.

In the past decade, we have witnessed an explosion in the amount of research on financial market microstructure. This is partially because the subject is interesting, but also because it has become more and more important to capital markets in terms of price formation and regulation. In this article, we establish a framework in which the distribution of market microstructure variables can be modeled sequentially, and under which inferential tasks can be carried out easily. The distribution of market microstructure variables can be used in many ways, for example, to help develop better trading strategies, to build more robust risk management tools, or to provide more informative market signals. However, modeling distributions is unquestionably harder than modeling the market microstructure itself, since we only have one observation at each time point. The method presented in this article is our attempt in this direction. Positive results have been found, but we believe more exciting results will be forthcoming.

GOAL AND BACKGROUND

Our ultimate goal is to estimate the distribution of any market microstructure variable y_t at any time t based on the past information y_{1:t}, so that we can evaluate the status of the market and take appropriate actions. When assuming a parametric distribution family p(y_t; θ) for the market microstructure variable, density estimation is essentially parameter estimation. However, density (parameter) estimation usually uses cross-sectional (or i.i.d.) data as opposed to time series data. Even with time series data, ergodicity is often assumed to make the data easier to analyze. In our case, we do not have ergodicity, as market microstructure is influenced by many events happening across the day. Meanwhile, we only have a single path of the time series from which to estimate a whole series of distributions. Traditional methods become inadequate in the face of these difficulties.

Our proposed method is designed to deal with this situation. Instead of directly estimating the parameters, we give them a prior distribution p_t(θ). At each time point, parameter estimation becomes updating the posterior distribution p_t(θ|y_t). Moreover, the posterior distribution is taken as the prior distribution for the next time point (i.e., p_{t+1}(θ) = p_t(θ|y_t)). Iterating this Bayesian updating yields a series of posterior distributions of the parameter. With the parametric distribution of the market microstructure variable conditional on the parameters, p(·|θ), we obtain the marginal distribution of the market microstructure variable by integrating over the parameter space, that is,

$$p(y) = \int p(y \mid \theta) \, p_t(\theta \mid y_t) \, d\theta$$

Another benefit coming from the prior distribution is the flexibility of the market microstructure variable distribution.



With different prior distributions, the market microstructure variable distribution can take shapes that traditional parametric families cannot. For example, if the prior distribution is a probability mass function, the marginal distribution of the microstructure variable is a mixture of distributions whose weights are the probability masses defined in the prior. More concretely, if we assume p(θ={0,1}) = 0.9 and p(θ={5,10}) = 0.1, and y follows a normal distribution with parameters θ, then the marginal distribution of y is 0.9φ(0,1) + 0.1φ(5,10), a mixture of normal distributions. This kind of distribution can be used for modeling outliers. When assuming a continuous prior distribution, the marginal distribution can take an arbitrary shape, which grants us a great deal of flexibility.
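To make this mixture example concrete, here is a minimal sketch (our illustration, not from the original article) of sampling from the marginal p(y) = 0.9φ(0,1) + 0.1φ(5,10); we read φ(m, v) as a normal density with mean m and variance v, a convention the article leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior mass function over two parameter settings theta = (mean, variance).
thetas = np.array([[0.0, 1.0], [5.0, 10.0]])
prior = np.array([0.9, 0.1])

# Sample from the marginal p(y) = 0.9*phi(0,1) + 0.1*phi(5,10):
# first draw theta from the prior, then draw y | theta.
idx = rng.choice(2, size=100_000, p=prior)
y = rng.normal(thetas[idx, 0], np.sqrt(thetas[idx, 1]))

print(y.mean())  # approx. 0.9*0 + 0.1*5 = 0.5
```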
In this article, we use the bid–ask spread as an example of market microstructure variables, but please keep in mind that our methodology is generally applicable to any market microstructure measure. Given that the bid–ask spread can only be positive, we assume it follows a gamma distribution. We choose the gamma distribution because of its flexibility: it can accommodate the skewness and kurtosis of the data, and its parameter estimation is relatively easy.

The spread follows a process with a particular pattern across the day. Right after the opening of the market, the spread tends to stay high and have large volatility, which reflects the turbulence of supply and demand and the information accumulated since the closing price on the previous day. During the middle of the day, the spread usually remains low because price formation is stable and practitioners have better information about the market. Toward the end of the day, the spread goes up again, but with less magnitude than in the morning.

STATE SPACE MODELING

Let us briefly describe the state space model as the basis of our framework. The state space model is a mathematical model of a dynamic system that is driven by an unobserved underlying process and has outputs generated by the underlying process at every time point. It is defined as follows:

$$x_t = f(x_{t-1}, \varepsilon_t) \quad (1)$$

$$y_t = g(x_t, \eta_t) \quad (2)$$

There are two sets of variables: state variables x_t and observable variables y_t. x_t represents the status of the system; it evolves following the process defined by f. y_t represents the measurement of the status of the system; it is a measurement of x_t at time t through g. ε_t and η_t are i.i.d. noise sequences, independent of each other: ε_t represents the randomness of the dynamic system, while η_t represents the measurement error. Our goal is to make statistical inference on the state variables at each time point through the observable variables.

In our analysis, we choose the bid–ask spread as a particular example of a market microstructure variable. We assume that, at each time point, the market microstructure variable y_t (e.g., the bid–ask spread) follows a gamma distribution with parameters α_t and β_t, which in turn follow a vector autoregressive (VAR) model of order 1. We can only observe y_t, not α_t and β_t (i.e., x_t = {α_t, β_t}). Hence, y_t is the observable variable, and α_t and β_t are the state variables. Since the measurement function is a gamma distribution, this is a nonlinear state space model.

The VAR process takes values on the entire real line, but the parameters of the gamma distribution can only be positive. To make the VAR suitable for our analysis, we build the model on log α_t and log β_t instead, since the log function maps the positive numbers onto the entire real line.

The nonlinear state space model is summarized as follows:

$$y_t \sim \mathrm{gamma}(\alpha_t, \beta_t), \qquad \alpha_t, \beta_t > 0 \quad (3)$$

$$\begin{bmatrix} \log \alpha_t \\ \log \beta_t \end{bmatrix} = \rho \begin{bmatrix} \log \alpha_{t-1} \\ \log \beta_{t-1} \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_t^{11} & \Sigma_t^{12} \\ \Sigma_t^{21} & \Sigma_t^{22} \end{bmatrix} \right) \quad (4)$$

Aiming at simplicity, we take ρ to be the identity matrix, which means the VAR is a random walk. We also let Σ_t^{12} = Σ_t^{21} = 0, which means there is no interaction between α_t and β_t. The state space model becomes

$$y_t \sim \mathrm{gamma}(\alpha_t, \beta_t) \quad (5)$$

$$\log \alpha_t = \log \alpha_{t-1} + \varepsilon_t^1, \qquad \varepsilon_t^1 \sim N(0, \Sigma_t^{11}) \quad (6)$$

$$\log \beta_t = \log \beta_{t-1} + \varepsilon_t^2, \qquad \varepsilon_t^2 \sim N(0, \Sigma_t^{22}) \quad (7)$$
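As a small sanity check, the model in Equations (5)-(7) can be simulated in a few lines. The sketch below is our illustration, not the article's code: we read gamma(α, β) as shape α and rate β (so the mean is α/β), a convention the article does not spell out, and the starting values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 390                    # e.g., one observation per minute of a trading day
sigma11 = sigma22 = 0.1    # constant hyper parameters (variances of the log increments)

log_a, log_b = np.log(2.0), np.log(100.0)   # illustrative starting values
y = np.empty(T)
for t in range(T):
    # Equations (6)-(7): independent Gaussian random walks on the log scale,
    # which keeps alpha_t and beta_t positive.
    log_a += rng.normal(0.0, np.sqrt(sigma11))
    log_b += rng.normal(0.0, np.sqrt(sigma22))
    # Equation (5): spread ~ gamma(alpha_t, beta_t), read here as shape/rate.
    y[t] = rng.gamma(shape=np.exp(log_a), scale=1.0 / np.exp(log_b))
```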

Note that there is a subscript t on Σ_t^{11} and Σ_t^{22}, which means these hyper parameters change across time. The time varying assumption is realistic because there will be structural changes in the underlying processes across the day as news and other important information become available, and there is also a significant difference in trading activity between the beginning of the day and the rest of the day. However, estimation of these parameters can be very difficult. We propose an advanced sequential estimation methodology, which will be explained in the rest of the article.

PARTICLE FILTERING

There is a whole spectrum of ways to fit the state space model. Having prior distributions on the parameters, we think a sequential Monte Carlo method, namely the particle filter, is an appropriate choice. The particle filter sequentially generates samples (i.e., particles) to approximate distributions that are dynamically changing. It has an obvious advantage over other methods: distributional flexibility. Using samples to approximate distributions is much less constrained than a parametric form in terms of a distribution's properties, such as multi-modality and fat tails. Given that computational power has become less expensive, the particle filter has also become more accessible. Because of the temporal association that we are after, we choose the auxiliary particle filter (APF) for this analysis. In this section, we introduce the basics of the auxiliary particle filter and then make a few changes to adapt it to our data.

Particle Approximation and Importance Sampling

The foundation of the particle filter is to use a set of particles with associated weights to represent a distribution. A distribution p(x) can be approximated by particles via p̂(x) = Σ_{i=1}^N w^i δ_{x^i}(x), where δ_{x^i}(x) is the Dirac function centered at x^i, {x^i, w^i}_{i=1}^N denotes the particles and their weights, and Σ_{i=1}^N w^i = 1. With these particles, any estimate from the distribution can be approximated easily; for example, the mean and variance can be approximated using summation instead of integration. The more particles we have, the more accurate our approximation is.

Usually, the original distribution p(x) is very hard to draw samples from but can be evaluated pointwise. To approximate p(x), we generate a sample {x^i}_{i=1}^N following another distribution q(x), called the importance function. The approximation of p(x) is given by

$$\hat{p}(x) = \sum_{i=1}^{N} w^i \delta_{x^i}(x), \quad \text{where } w^i \propto \frac{p(x^i)}{q(x^i)} \text{ and } \sum_{i=1}^{N} w^i = 1 \quad (8)$$

This is called importance sampling. A proof of the equivalence can be found in Ross [2006].
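A minimal numerical illustration of Equation (8), ours rather than the article's: we approximate the mean of a target density p, which we can evaluate pointwise, using draws from an easy-to-sample proposal q and self-normalized weights.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Target p: a gamma(shape=3, rate=2) density we can evaluate but pretend we
# cannot sample from; proposal q: an exponential that is easy to sample and
# covers the same support.
p = stats.gamma(a=3, scale=1 / 2)
q = stats.expon(scale=2.0)

x = q.rvs(size=50_000, random_state=rng)
w = p.pdf(x) / q.pdf(x)   # w_i proportional to p(x_i) / q(x_i), Equation (8)
w /= w.sum()              # self-normalize so the weights sum to one

print(np.sum(w * x))      # approximates E_p[X] = 3/2
```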
Filtering and Bayesian Updating

Filtering is about how to sequentially obtain p(x_t|y_{1:t}), the posterior distribution of the parameters. All filtering methods consist of two steps: prediction and updating (filtering). By applying Bayes' rule, we obtain a basic relationship that connects prediction and updating. In our particular example of the bid–ask spread, the particle x_t contains α_t and β_t, the parameters of the gamma distribution, so the relationship can be written in terms of α_t and β_t as follows:

$$\underbrace{p(\alpha_t, \beta_t \mid y_{1:t})}_{\text{filtering at } t} \propto \underbrace{p(y_t \mid \alpha_t, \beta_t)}_{\text{likelihood}} \; \underbrace{p(\alpha_t, \beta_t \mid y_{1:t-1})}_{\text{prediction}} \quad (9)$$

$$\underbrace{p(\alpha_t, \beta_t \mid y_{1:t-1})}_{\text{prediction}} = \int\!\!\int \underbrace{p(\alpha_t, \beta_t \mid \alpha_{t-1}, \beta_{t-1})}_{\text{underlying process}} \; \underbrace{p(\alpha_{t-1}, \beta_{t-1} \mid y_{1:t-1})}_{\text{filtering at } t-1} \, d\alpha_{t-1} \, d\beta_{t-1} \quad (10)$$

In practice, these functions are so complicated that the integration in Equation (10) is infeasible. Instead, we use particles {α^i_{t-1}, β^i_{t-1}}_{i=1}^N to approximate the filtering distribution at time t−1 by p(α_{t-1}, β_{t-1}|y_{1:t-1}) = Σ_{i=1}^N w^i_{t-1} δ_{(α^i_{t-1}, β^i_{t-1})}(α_{t-1}, β_{t-1}). Hence, obtaining the filtering distribution at time t essentially becomes generating a new set of particles to approximate the posterior distribution p(α_t, β_t|y_{1:t}). We know the direct approximation of p(α_t, β_t|y_{1:t}) using {α^i_{t-1}, β^i_{t-1}}_{i=1}^N, which is the following:
$$p(\alpha_t, \beta_t \mid y_{1:t}) \propto p(y_t \mid \alpha_t, \beta_t) \int\!\!\int p(\alpha_t, \beta_t \mid \alpha_{t-1}, \beta_{t-1}) \, p(\alpha_{t-1}, \beta_{t-1} \mid y_{1:t-1}) \, d\alpha_{t-1} \, d\beta_{t-1} \quad (11)$$

$$= \underbrace{p(y_t \mid \alpha_t, \beta_t)}_{\text{target function}} \; \underbrace{\sum_{i=1}^{N} p(\alpha_t, \beta_t \mid \alpha_{t-1}^i, \beta_{t-1}^i) \, w_{t-1}^i}_{\text{importance function (mixture model)}} \quad (12)$$

Our problem comes down to how to generate a sample {α^i_t, β^i_t}_{i=1}^N from this p(α_t, β_t|y_{1:t}). An easy approach can be found by noting that the second part of Equation (12) is a mixture model, from which samples are easy to generate. So we take this mixture model as our importance function and adjust the weights to be w^i_t ∝ p(y_t|α_t, β_t), which is the ratio of the target function to the importance function.

Auxiliary Particle Filter

However, the previous approach often wastes a lot of computation on generating particles {α^i_t, β^i_t} that have small weights p(y_t|α^i_t, β^i_t). This is because, when generating particles from the mixture model, we ignore the new observation y_t, so many particles end up in regions where p(y_t|α^i_t, β^i_t) is low (i.e., unlikely scenarios). To overcome this problem, we simultaneously draw the old and new particles as a pair, where the new particle propagates from the old one conditional on the observations up to y_t. In this way, an old particle that is likely to propagate well is also more likely to be drawn, and hence the new particle has a larger weight. Because the old particles are already generated, we only need to draw the index of the old particle along with the new particle. After drawing the pair, we throw away the index and keep the new particle.

The distribution from which we want to draw the new particles and the indices is

$$p(\alpha_t, \beta_t, i \mid y_{1:t}) \propto p(y_t \mid \alpha_t, \beta_t) \, p(\alpha_t, \beta_t, i \mid y_{1:t-1}) \quad (13)$$

$$= p(y_t \mid \alpha_t, \beta_t) \, p(\alpha_t, \beta_t \mid \alpha_{t-1}^i, \beta_{t-1}^i) \, p(i \mid y_{1:t-1}) \quad (14)$$

$$= p(y_t \mid \alpha_t, \beta_t) \, p(\alpha_t, \beta_t \mid \alpha_{t-1}^i, \beta_{t-1}^i) \, w_{t-1}^i \quad (15)$$

We use importance sampling to obtain the new particles. The importance function is defined as

$$q(\alpha_t, \beta_t, i \mid y_{1:t}) \propto p(y_t \mid u_t^i, v_t^i) \, p(\alpha_t, \beta_t \mid \alpha_{t-1}^i, \beta_{t-1}^i) \, w_{t-1}^i \quad (16)$$

where u^i_t and v^i_t are the expectations of α_t and β_t conditional on α^i_{t-1} and β^i_{t-1}. p(y_t|u^i_t, v^i_t) can be considered a proxy for p(y_t|α_t, β_t). We can also write

$$q(\alpha_t, \beta_t, i \mid y_{1:t}) = q(\alpha_t, \beta_t \mid i, y_{1:t}) \, q(i \mid y_{1:t}) \quad (17)$$

By defining q(α_t, β_t|i, y_{1:t}) = p(α_t, β_t|α^i_{t-1}, β^i_{t-1}), we have q(i|y_{1:t}) ∝ p(y_t|u^i_t, v^i_t) w^i_{t-1}. With this partition of q, we can draw the pair step by step. First, we draw the index based on q(i|y_{1:t}) ∝ p(y_t|u^i_t, v^i_t) w^i_{t-1}, and then we draw the new particle based on q(α_t, β_t|i, y_{1:t}) = p(α_t, β_t|α^i_{t-1}, β^i_{t-1}). After we get the particles {α^j_t, β^j_t}_{j=1}^N, we throw away the indices. Finally, we adjust the weights according to the ratio

$$w_t^i = \frac{p(\alpha_t, \beta_t, i \mid y_{1:t})}{q(\alpha_t, \beta_t, i \mid y_{1:t})} = \frac{p(y_t \mid \alpha_t, \beta_t)}{p(y_t \mid u_t^i, v_t^i)} \quad (18)$$

A general algorithm for the APF is described in Exhibit 1.

EXHIBIT 1
The General Algorithm of the Auxiliary Particle Filter
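One iteration of the APF for our gamma model might look like the following sketch. This is our reading of Equations (13)-(18), not the article's code: the proxies (u^i_t, v^i_t) are taken to be the previous particle values, a common simplification since the exact conditional means under the log random walk differ only by a lognormal correction factor, and gamma(α, β) is again read as shape/rate.

```python
import numpy as np
from scipy import stats

def apf_step(alpha, beta, w, y_t, sigma11, sigma22, rng):
    """One auxiliary-particle-filter update for the gamma state space model.

    alpha, beta: particles at time t-1; w: their normalized weights.
    """
    N = len(alpha)
    u, v = alpha, beta  # proxies for E[alpha_t], E[beta_t] given the old particle

    # First stage: draw indices with probability prop. to p(y_t | u_i, v_i) * w_i.
    first = stats.gamma.pdf(y_t, a=u, scale=1.0 / v) * w
    first /= first.sum()
    idx = rng.choice(N, size=N, p=first)

    # Second stage: propagate the selected particles through Equations (6)-(7).
    new_alpha = np.exp(np.log(alpha[idx]) + rng.normal(0, np.sqrt(sigma11), N))
    new_beta = np.exp(np.log(beta[idx]) + rng.normal(0, np.sqrt(sigma22), N))

    # Adjust weights by the ratio in Equation (18): p(y|alpha,beta) / p(y|u,v).
    w_new = (stats.gamma.pdf(y_t, a=new_alpha, scale=1.0 / new_beta)
             / stats.gamma.pdf(y_t, a=u[idx], scale=1.0 / v[idx]))
    w_new /= w_new.sum()
    return new_alpha, new_beta, w_new
```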
ESTIMATION OF TIME VARYING HYPER PARAMETERS IN THE STATE SPACE MODEL

As mentioned previously, assuming constant hyper parameters is not realistic in the real world; we need time varying hyper parameters. However, even though estimating constant hyper parameters can be done with classical inferential methods (Liu and West [2001]), little has been developed for time varying parameters. Our estimation procedure is designed for this challenge. It is defined as

$$\begin{bmatrix} \Sigma_t^{11} \\ \Sigma_t^{22} \end{bmatrix} = \arg\max_{\Sigma_t^{11}, \Sigma_t^{22}} \underbrace{p(y_t \mid y_{1:t-1})}_{\text{likelihood at time } t} \quad (19)$$

Here p(y_t|y_{1:t-1}) stands for the likelihood of observing the current observation y_t given the hyper parameters Σ_t^{11} and Σ_t^{22}. Maximizing it is essentially maximum likelihood estimation.

This estimation is a two-dimensional optimization problem that is computationally expensive and has many local maxima. We regularize the parameter space and transform the estimation into a one-dimensional optimization problem by setting the constraint Σ*_t ≡ Σ_t^{11} = 10 Σ_t^{22}.
In other words, the standard deviation of log α_t is about three times larger than the standard deviation of log β_t. This assumption is based on empirical study. The regularization of the parameter space is essentially a bias-variance trade-off. By introducing the regularization, we bring more bias into the estimation, but our estimation variance will be smaller than in the case of estimating two free parameters at the same time. Combining the bias with the variance, we are actually better off adding the regularization. However, more elegant forms of regularization are needed in the future. From now on, we assume only one hyper parameter Σ*_t. Our estimation becomes

$$\Sigma_t^* = \arg\max_{\Sigma_t^*} p(y_t \mid y_{1:t-1}) \quad (20)$$

$$= \arg\max_{\Sigma_t^*} \int\!\!\int\!\!\int\!\!\int p(y_t \mid \alpha_t, \beta_t) \, p_{\Sigma_t^*}(\alpha_t, \beta_t \mid \alpha_{t-1}, \beta_{t-1}) \, p(\alpha_{t-1}, \beta_{t-1} \mid y_{1:t-1}) \, d\alpha_{t-1} \, d\beta_{t-1} \, d\alpha_t \, d\beta_t \quad (21)$$

To solve the optimization problem, note that we have a good approximation of p(α_{t-1}, β_{t-1}|y_{1:t-1}) using the particles at time t−1; p(y_t|α_t, β_t) is the gamma density by assumption; and the underlying process p_{Σ*_t}(α_t, β_t|α_{t-1}, β_{t-1}) is the only part that contains the hyper parameter. To evaluate p(y_t|y_{1:t-1}) at each candidate value of Σ*_t, we first take draws from p(α_{t-1}, β_{t-1}|y_{1:t-1}), then draw new particles according to the underlying process p_{Σ*_t}(α_t, β_t|α_{t-1}, β_{t-1}), and finally take draws from the gamma distribution p(y_t|α_t, β_t). By doing so, we obtain new samples of y_t and hence an empirical distribution p̂(y|y_{1:t-1}). We then plug in the observed y_t and take p̂(y_t|y_{1:t-1}) as our approximation of the likelihood.
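The simulation-based evaluation just described can be sketched as follows (our illustration). For each candidate Σ*_t on a grid, we push the time-(t−1) particles through the underlying process and the gamma measurement, smooth the simulated y_t values, and score the observed y_t; the Gaussian kernel density estimate is our own smoothing choice, since the article does not say how the empirical distribution is evaluated at the observed point.

```python
import numpy as np
from scipy import stats

def profile_likelihood(alpha, beta, w, y_obs, sigma_grid, rng, m=5000):
    """Approximate Sigma*_t = argmax p(y_t | y_{1:t-1}) over a 1-D grid.

    Uses the article's constraint Sigma* = Sigma11 = 10 * Sigma22;
    the Gaussian KDE used to score y_obs is our own smoothing choice.
    """
    scores = []
    for s in sigma_grid:
        idx = rng.choice(len(alpha), size=m, p=w)   # draws from p(alpha,beta | y_{1:t-1})
        a = np.exp(np.log(alpha[idx]) + rng.normal(0, np.sqrt(s), m))
        b = np.exp(np.log(beta[idx]) + rng.normal(0, np.sqrt(s / 10.0), m))
        y_sim = rng.gamma(shape=a, scale=1.0 / b)   # draws from p(y_t | y_{1:t-1})
        scores.append(stats.gaussian_kde(y_sim)(y_obs)[0])
    return sigma_grid[int(np.argmax(scores))]
```

For example, a grid such as np.logspace(-4, 0, 20) would scan candidate values between 0.0001 and 1, roughly bracketing the constant settings (0.001, 0.1, 1) used elsewhere in the article.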

Efficient Estimation of Hyper Parameters

Although we are able to estimate the hyper parameter at each time point, it is neither practical nor efficient to do so. An estimate obtained in this way has a large variance and is computationally expensive, meaning that we spend a lot of time estimating a quantity that is changing all the time, and on top of that, our estimate is not even that accurate. The hyper parameter tunes the algorithm's sensitivity to a new observation. We need to find a balance that trades off accuracy (sensitivity) against computation speed.

Given this dilemma, we propose an estimation alternative that takes advantage of the asymmetric loss associated with over- or underestimating the hyper parameter (Σ̂*_t > Σ*_t or Σ̂*_t < Σ*_t). In the APF, if we underestimate the hyper parameter, particles tend to be less sensitive to new observations, and the estimated distribution moves relatively slowly and smoothly across time. On the other hand, if we overestimate the hyper parameter, our model tends to overreact to new observations, and our estimated distribution tends to be volatile.

Both over- and underestimation hurt us, but on different occasions and with different magnitudes. When the market is volatile, underestimation hurts us: because we anticipate a lot of new information coming to the market, we expect our APF to respond to this information quickly. On the other hand, when the market is stable, overestimation hurts us. However, since the market is stable, new observations tend to coincide with the estimated distribution, so the negative effect is affordable.

With these properties in mind, we develop the following algorithm for estimating the hyper parameter periodically (Exhibit 2).
EXHIBIT 2
The Algorithm for Estimating Time Varying Hyper Parameters

At each time point, we perform a change point detection test. If the test rejects the null hypothesis, then we believe there is a change point and start estimating the hyper parameter at every single time point thereafter, until the estimated hyper parameter drops back below a threshold that we pre-specify. We call this period the "monitoring stage." The monitoring stage can only be set to "on" when the change point detection returns TRUE and is set to "off" when the estimated hyper parameter drops below the threshold.

The threshold for the hyper parameter is decided based on empirical study. More elegant ways to decide the threshold are certainly possible and open for future research.

For the change point detection test, our test statistic S is simply y_t. Under the null hypothesis Σ*_t = Σ*_{t-1}, S follows the distribution

$$S \sim f_S(s) \quad (22)$$

$$= p(y_t = s \mid y_{1:t-1}) \quad (23)$$

$$= \int\!\!\int\!\!\int\!\!\int p(y_t = s \mid \alpha_t, \beta_t) \, p_{\Sigma_t^* = \Sigma_{t-1}^*}(\alpha_t, \beta_t \mid \alpha_{t-1}, \beta_{t-1}) \, p(\alpha_{t-1}, \beta_{t-1} \mid y_{1:t-1}) \, d\alpha_{t-1} \, d\beta_{t-1} \, d\alpha_t \, d\beta_t \quad (24)$$

Again, we use the same approach mentioned in the previous section to obtain an empirical distribution of S, along with its 2.5% and 97.5% quantiles. If the observation y_t falls outside these critical values, we reject the null hypothesis, decide this is a change point, and start to estimate the hyper parameter thereafter.
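Using the same simulation machinery, the change point test reduces to an interval check. The sketch below is our reconstruction: it simulates the null distribution of S = y_t under Σ*_t = Σ*_{t-1}, as in Equations (22)-(24), and flags a change point when the observed spread falls outside the empirical 2.5% and 97.5% quantiles.

```python
import numpy as np

def change_point_test(alpha, beta, w, y_obs, sigma_prev, rng, m=5000):
    """Two-sided 5% test of H0: Sigma*_t = Sigma*_{t-1} (a sketch).

    alpha, beta: particles at time t-1; w: their normalized weights;
    sigma_prev: the previous hyper parameter, under the constraint
    Sigma* = Sigma11 = 10 * Sigma22.
    """
    idx = rng.choice(len(alpha), size=m, p=w)
    a = np.exp(np.log(alpha[idx]) + rng.normal(0, np.sqrt(sigma_prev), m))
    b = np.exp(np.log(beta[idx]) + rng.normal(0, np.sqrt(sigma_prev / 10.0), m))
    s_sim = rng.gamma(shape=a, scale=1.0 / b)       # null distribution of S = y_t
    lo, hi = np.quantile(s_sim, [0.025, 0.975])
    return not (lo <= y_obs <= hi)                  # True => change point detected
```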
RESULTS AND VALIDATION

Simulation Results

We use the state space model (Equations (5)-(7)) introduced previously to generate artificial data with the constant hyper parameter Σ*_t = 0.1 and apply the APF algorithm with the same constant hyper parameter. We plot the data and the parameters (α_t, β_t) against their estimates in Exhibit 3. In Part A of the exhibit, the line with nodes is the actual data, and the upper and lower flat curves present the 90% and 10% percentiles of the estimated distribution. We see that the percentiles cover the real data most of the time, which means the estimated distribution works well. In Parts B and C, the lines with nodes are the actual α_t and β_t, and the flat curves are the estimates α̂_t and β̂_t obtained via the particle filter. In these figures, we can see that the estimated parameters fluctuate around the true values, which means the particle filter is doing a good job.

Real Data Results

We now apply the methodology to real data. The data used are Google's intraday bid–ask spread on April 11, 2011. The average number of observations per second is 1.64, and the average number of observations per minute is 97.4. We plot the raw data in Exhibit 4. As we can see, there are obvious patterns driving the observable variable. The spread is large at the beginning of the day and stays low through the rest of the day. There are some spikes during the middle of the day, which indicate structural changes of the underlying process.

We take a piece of the data and apply our APF algorithm. We also use the APF algorithm with constant hyper parameters for comparison. The results are presented in Exhibit 5. In the exhibit, the line with nodes is the actual spread. The solid line, on top of the actual spread, is the tracking of the spread using the time varying hyper parameter with a threshold of 0.01. The dashed line is the tracking result using the small constant hyper parameter (Σ*_t = 0.001).
EXHIBIT 3
Tracking the Simulated Data

Note: For a color version of this exhibit, please visit The Journal of Trading website at www.iijournals.com/jot.

The dotted line is the tracking result using the large constant hyper parameter (Σ*_t = 1). We can see from the exhibit that the APF generally tracks the spread very well. The model with the small hyper parameter responds to jumps slowly. The model with the large hyper parameter responds to jumps swiftly, but it is very volatile even when nothing is happening in the data. Our time varying hyper parameter combines the merits of these two algorithms: it can both move smoothly when the spread is stable and track jumps very well when the data is volatile. The bottom part of Exhibit 5 indicates when the algorithm is actively estimating the hyper parameter. Combining these two parts, we can see that most of the jumps and spikes are captured by this change point detection test.

We are ultimately interested in the distribution, so we plot the estimated quantiles of the spread for the different methods in Exhibit 6. In the exhibit, Part A uses the small hyper parameter (Σ*_t = 0.001), Part B uses the large hyper parameter (Σ*_t = 1), Part C uses the time varying hyper parameter with a threshold of 0.01, and Part D is the method of moments (MOM) estimate of the distribution based on an exponentially weighted sample window (sample size of 20). In these figures, the thin flat lines indicate the 90%, 50%, and 10% percentiles of the estimated distribution.

EXHIBIT 4
The Bid–Ask Spread of Google, April 11, 2011

EXHIBIT 5
The Tracking Comparison

EXHIBIT 6
The Comparison of Different Methods
The line with nodes indicates the actual spread. As we pointed out earlier, our time varying hyper parameter can track the spread really well, whereas the small hyper parameter falls behind when there is a jump and the large hyper parameter tends to overreact to each new observation. The MOM approach in Part D uses a moving window containing the 20 previous data points; we conduct the MOM on an exponentially weighted sample. When the spread becomes stable, the MOM estimated distribution converges to a point, which is useless for us. Moreover, when there is a structural change in the spread, the MOM always falls behind the other methods in terms of updating the distribution.
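For reference, the MOM benchmark can be sketched as follows (our reconstruction; the article specifies a window of 20 points and exponential weighting but not the decay rate, so the value below is illustrative). For a gamma distribution with shape a and rate b, the mean is a/b and the variance is a/b², so matching moments gives b = mean/var and a = mean²/var.

```python
import numpy as np

def ewma_gamma_mom(window, decay=0.9):
    """Method-of-moments gamma fit on an exponentially weighted window.

    window: e.g., the last 20 observed spreads; decay is illustrative.
    Returns (shape, rate) matched to the weighted mean and variance.
    """
    window = np.asarray(window, dtype=float)
    wts = decay ** np.arange(len(window) - 1, -1, -1)   # newest point gets weight 1
    wts /= wts.sum()
    mean = np.sum(wts * window)
    var = np.sum(wts * (window - mean) ** 2)
    return mean**2 / var, mean / var
```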
CONCLUSION

In this article, we present a novel approach to sequential density estimation for market microstructure variables using the auxiliary particle filter. Results from simulated and real data show excellent performance. The approach we introduce is flexible and can be adopted in many other settings.

Although significant results are found using our proposed model, there are many openings for future research. The first is the regularization of the hyper parameter space; more elegant forms of regularization are needed. We can also incorporate more complicated structures for ρ and Σ_t^{ij}. For example, we can use a non-identity matrix for ρ to introduce a trend in the underlying process, or introduce a negative correlation between α_t and β_t so that the gamma distribution is more robust. Last but not least, instead of modeling a single market microstructure variable, we can build multiple-variable state space models (e.g., trading volume, spread, and so on).

Another separate direction is to build multiple auxiliary particle filters at different frequencies. The rationale is that some information is more significant in the high-frequency data, while other information can be obtained from the low-frequency data. Combining these models will give us a better understanding of the data and hence a more robust estimation of the distribution.

REFERENCES

Adams, R.P., and D.J.C. MacKay. "Bayesian Online Changepoint Detection." University of Cambridge Technical Report, 2007.

Arulampalam, M.S., S. Maskell, N. Gordon, and T. Clapp. "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking." IEEE Transactions on Signal Processing, Vol. 50, No. 2 (2002), pp. 174-188.

Johannes, M., and N. Polson. Handbook of Financial Statistics. Graduate School of Business, Columbia University (2009), pp. 1015-1028.

——. "Particle Filtering and Parameter Learning." Working Paper, Graduate School of Business, Columbia University (2006).

Liu, J., and M. West. "Combined Parameter and State Estimation in Simulation-Based Filtering." In Sequential Monte Carlo Methods in Practice (2001), pp. 197-217.

Pitt, M.K., and N. Shephard. "Filtering via Simulation: Auxiliary Particle Filters." Journal of the American Statistical Association, Vol. 94, No. 446 (1999), pp. 590-599.

Ross, S.M. Simulation, 4th ed. Elsevier Academic Press, 2006.