
Hello everybody

Slide 1 - 2
Implementation of Variational Bayes on fitting a multivariate mixture model
The information is extracted from my thesis, co-supervised by Minh Ngoc,
Robert, and Alex.
I will first walk you through the ideas of variational methods and then
demonstrate the method with one simple example.
Then I will explain the implementation of VB to handle the estimation of the
density for the mixture model,
and show the results on simulated data.
Finally, the conclusion will summarize some points on the evaluation of
the method and further directions in relation to the implementation on the
multivariate mixture model.
Slide 3 - 4:
My thesis focuses on the area of statistical inference by Bayesian analysis.
One important problem in Bayesian analysis is the computation of the posterior
distribution, which often involves complex high-dimensional integrals.
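For reference (standard Bayesian notation, not taken from the slides), the posterior and the marginal likelihood integral that causes the difficulty are:

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, d\theta .
```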

Motivation
Markov chain Monte Carlo (MCMC) is a widely used method for computing these
quantities because of its automatic adaptability and asymptotically exact
property.
However, the biggest problem with MCMC is that it is often very slow to
converge to the true posterior.
When dealing with a massive dataset, MCMC is too computationally
intensive.

Intro:
To overcome this computational obstacle, an alternative method is
Variational Bayes (VB).
It was originally developed in statistical physics to solve functional
optimization problems. In the early 90s, interest was ignited in the computer
science literature by applying variational methods to graphical models.

Basic ideas:
VB uses a more tractable distribution to approximate the intractable integral
form of the posterior distribution.
The marginal likelihood is broken down into the lower bound built on the
approximating distribution and the KL divergence measuring the distributional
difference between the true posterior and the approximation.
This inequality is valid thanks to the non-negativity of the KL divergence.
Now the integral computation problem becomes an optimization problem:
we wish to find an approximating distribution that maximizes the value
of the lower bound (LB).
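Written out in standard notation (my transcription, with q(theta) the approximating density), the decomposition and the resulting inequality are:

```latex
\log p(y)
\;=\;
\underbrace{\int q(\theta)\,\log\frac{p(y,\theta)}{q(\theta)}\,d\theta}_{\text{lower bound } \mathcal{L}(q)}
\;+\;
\underbrace{\mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid y)\right)}_{\ge\, 0}
\;\;\ge\;\; \mathcal{L}(q).
```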

To derive the analytical form of q, one assumption is applied.


Mean Field assumes unconditional independence between the parameters
of the model.
Using this strong assumption, we obtain optimal densities
satisfying condition (1):
each parameter's optimal density is proportional to the exponential of the
expected value of the log joint density of data and parameters, taken with
respect to all parameters except the one of interest.
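As a reference, the standard mean-field result referred to as condition (1) above can be written (in my notation) as:

```latex
q_j^{*}(\theta_j)
\;\propto\;
\exp\!\left\{ \mathbb{E}_{-\theta_j}\!\left[ \log p(y, \theta) \right] \right\},
```

where the expectation is taken under all variational factors except q_j.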
Slide 5 - 8
To see how VB works and its accuracy, we can take a look at the normal distribution.
For the normal distribution, recall that we want to estimate the posterior distribution
of two parameters, the mean and the variance.
The approximation for the mean follows a normal distribution,
and the approximation for the variance follows an inverse-gamma distribution.
We have closed-form expressions for each hyper-parameter and for the lower bound,
using an iterative scheme to update each hyper-parameter sequentially.
We stop when the LB value no longer improves.
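A minimal sketch of this coordinate-ascent scheme in Python (my own illustration, not the thesis code), assuming a N(mu0, sigma0_sq) prior on the mean and an Inverse-Gamma(a0, b0) prior on the variance; for brevity it monitors the change in the hyper-parameters rather than the lower bound:

```python
import numpy as np

def vb_normal(y, mu0=0.0, sigma0_sq=1e6, a0=0.01, b0=0.01,
              max_iter=100, tol=1e-8):
    """Mean-field VB for y_i ~ N(mu, sigma^2) with priors
    mu ~ N(mu0, sigma0_sq) and sigma^2 ~ Inverse-Gamma(a0, b0).
    Returns hyper-parameters of q(mu) = N(mu_n, s_n_sq) and
    q(sigma^2) = IG(a_n, b_n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    a_n = a0 + n / 2.0                     # fixed by its update rule
    e_inv_var = 1.0 / np.var(y)            # initial guess for E[1/sigma^2]
    mu_n, s_n_sq = np.mean(y), np.var(y) / n
    for _ in range(max_iter):
        # update q(mu): precision-weighted combination of prior and data
        prec = n * e_inv_var + 1.0 / sigma0_sq
        mu_new = (e_inv_var * y.sum() + mu0 / sigma0_sq) / prec
        s_new = 1.0 / prec
        # update q(sigma^2): expected residual sum of squares under q(mu)
        b_n = b0 + 0.5 * (np.sum((y - mu_new) ** 2) + n * s_new)
        e_inv_var = a_n / b_n
        converged = abs(mu_new - mu_n) < tol and abs(s_new - s_n_sq) < tol
        mu_n, s_n_sq = mu_new, s_new
        if converged:
            break
    return mu_n, s_n_sq, a_n, b_n
```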
For this simple model, I simulate 20 data points from a normal distribution with
mean 100 and sd = 10.
Priors are set to be very far from the true values and hence non-informative.
As you can see, the convergence of VB is very fast, after only 3 iterations.
I compare the estimates based on VB and Gibbs sampling, a special case of
MCMC.
The two methods give almost identical results, so VB is quite accurate for simple
models.
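Continuing the sketch above, the simulation setting described here (20 points from N(100, 10^2), vague priors) could be reproduced as follows; the specific prior values are my own illustrative choices:

```python
rng = np.random.default_rng(0)
y = rng.normal(loc=100.0, scale=10.0, size=20)
mu_n, s_n_sq, a_n, b_n = vb_normal(y, mu0=0.0, sigma0_sq=1e6, a0=0.01, b0=0.01)
print(mu_n, b_n / (a_n - 1))   # E_q[mu] and E_q[sigma^2]
```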
Slide 9:
VB can actually be applied to a very flexible class of models: mixture models.
Mixture modeling is often used to analyze complicated data with multimodality.
MCMC is claimed to be very slow and unstable in the case of mixture models.
That's our motivation to develop VB methods for this particular class.
You can imagine that the observed data y is generated from a mixture of normal
components. There are K of them, connected by a gating structure pi.
Each normal component has its own mean and variance. The gating, mean, and
variance depend linearly on covariates z, intuitively the predictor variables.
This model allows flexible coefficient structures across different components.
In particular, for this problem of flexible regression density estimation with a
mixture model, we are concerned with a multivariate response y, whose dimension
is bigger than 1.
The indicator variable indicates the component that each response belongs to.
The mean is a linear regression against covariates V,
and the mixing probabilities are modeled by a logistic regression against covariates
W.

So we allow the mean and the mixing probabilities to depend on different sets of
covariates.
And to keep the analysis clear, we assume that the variance/covariance matrix is
constant within a component.
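In symbols (a sketch in my own notation; the exact parameterization in the thesis may differ), the model described above is, for components k = 1, ..., K:

```latex
y_i \mid (s_i = k) \;\sim\; N\!\left(B_k^{\top} v_i,\; \Sigma_k\right),
\qquad
P(s_i = k \mid w_i, \gamma) \;=\; \pi_k(w_i) \;=\;
\frac{\exp\{\gamma_k^{\top} w_i\}}{\sum_{l=1}^{K} \exp\{\gamma_l^{\top} w_i\}},
```

where B_k holds the mean coefficients of component k, Sigma_k is its covariance matrix, and gamma_k the gating coefficients.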
The priors here are all conjugate, following common rules for choosing priors in
multivariate data models.
The coefficients of the mean and gating models have normal priors; the precision
matrix, the inverse of the variance/covariance matrix, has a Wishart prior; and the
indicator has a prior depending on the gamma coefficients of the gating model.
We can set the hyper-parameters freely within some range.
That completes the specification for Bayesian analysis.
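A conjugate prior specification consistent with this description could look as follows; the hyper-parameters mu_0, Lambda_0, Sigma_gamma, nu_0, S_0 are placeholders I introduce for illustration, not the values used in the thesis:

```latex
\mathrm{vec}(B_k) \sim N\!\left(\mu_0,\; \Lambda_0^{-1}\right), \quad
\gamma_k \sim N\!\left(0,\; \Sigma_\gamma\right), \quad
\Sigma_k^{-1} \sim \mathrm{Wishart}\!\left(\nu_0,\; S_0\right), \quad
p(s_i = k \mid \gamma) = \pi_k(w_i).
```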
We applied the mean-field assumption to factorize the approximation q and used
formula (1) to derive the following update rules. The conjugate priors give us
nice closed-form solutions for the mean coefficients and the precision matrix.
However, we have to use the Newton-Raphson method for updating the gating
coefficients, since no standard distribution has a density of this form (a sketch of
this step is shown after the list below).
- A gradient ascent type of algorithm that updates each hyper-parameter
sequentially until the change in the variational lower bound is smaller than a given
tolerance value.
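As an illustration of the Newton-Raphson step mentioned above (my own sketch, not the thesis code), the gating update can be treated as a responsibility-weighted multinomial logistic regression; the prior precision tau and fixing the last class's coefficients at zero are assumptions of mine:

```python
import numpy as np

def newton_step_gating(Gamma, W, R, tau=1e-2):
    """One Newton-Raphson step for the gating coefficients.
    Gamma : (K-1, p) free coefficients (last class fixed at zero)
    W     : (n, p) gating covariates
    R     : (n, K) variational responsibilities E_q[s_i = k]
    tau   : precision of an assumed N(0, I/tau) prior on each gamma_k
    Objective: sum_i sum_k R_ik log pi_ik(w_i) - (tau/2) ||Gamma||^2."""
    n, p = W.shape
    K = R.shape[1]
    # class probabilities under the current coefficients (softmax)
    eta = np.hstack([W @ Gamma.T, np.zeros((n, 1))])   # (n, K)
    eta -= eta.max(axis=1, keepdims=True)              # numerical stability
    P = np.exp(eta)
    P /= P.sum(axis=1, keepdims=True)
    # gradient for the K-1 free classes, flattened to one vector
    G = (R[:, :K-1] - P[:, :K-1]).T @ W - tau * Gamma
    g = G.ravel()
    # Hessian blocks: -sum_i pi_ik (delta_kl - pi_il) w_i w_i^T - tau*delta_kl*I
    H = np.zeros(((K - 1) * p, (K - 1) * p))
    for k in range(K - 1):
        for l in range(K - 1):
            w = P[:, k] * ((k == l) - P[:, l])
            block = -(W * w[:, None]).T @ W
            if k == l:
                block -= tau * np.eye(p)
            H[k*p:(k+1)*p, l*p:(l+1)*p] = block
    # Newton-Raphson update: Gamma_new = Gamma - H^{-1} grad
    return Gamma - np.linalg.solve(H, g).reshape(K - 1, p)
```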
For simulated data with a bivariate response, looking at the results of the simulation,
the update rules that we specified recover the true values of the parameters.
The convergence of the LB happens around the 10th iteration, which is very rapid.
Slide 16.
The above analysis is based on the Mean Field assumption, where we assume
independence in the parameter structure. This keeps tractability in deriving the
analytical solutions for the approximation.
In cases where there are highly correlated parameters, the estimates will no longer
be accurate or valid.
However, when inference speed is a big concern, such as in the analysis of massive
datasets, VB converges very fast to a local optimum and allows us to get
the results quickly.
Further analysis of this mixture model will continue with component
selection and variable selection. It will be tested on real datasets to evaluate
its predictive power compared with other models/methods.
