
Gaussian Process Vine Copulas for Multivariate Dependence

José Miguel Hernández-Lobato¹,² joint work with David López-Paz²,³ and Zoubin Ghahramani¹

1 Department of Engineering, Cambridge University, Cambridge, UK
3 Max Planck Institute for Intelligent Systems, Tübingen, Germany

April 29, 2013

Both authors are equal contributors.



What is a Copula? Informal Definition


A copula is a function that links univariate marginal distributions into a
joint multivariate one.
[Figure: univariate marginal densities linked by a copula into a joint density.]
The copula specifies the dependencies among the random variables.



What is a Copula? Formal Definition


A copula is a distribution function with marginals uniform in [0, 1].
Let U1, ..., Ud be r.v. uniformly distributed in [0, 1] with copula C. Then

    C(u1, ..., ud) = p(U1 ≤ u1, ..., Ud ≤ ud).

Sklar's theorem (connection between joints, marginals and copulas)


Any joint cdf F(x1, ..., xd) with marginal cdfs F1(x1), ..., Fd(xd) satisfies

    F(x1, ..., xd) = C(F1(x1), ..., Fd(xd)),

where C is the copula of F.
It is easy to show that the joint pdf f can be written as

    f(x1, ..., xd) = c(F1(x1), ..., Fd(xd)) ∏_{i=1}^{d} fi(xi),

where c(u1, ..., ud) and f1(x1), ..., fd(xd) are the copula and marginal densities.

Why are Copulas Useful in Machine Learning?


The converse of Sklar's theorem is also true:
Given a copula C : [0, 1]^d → [0, 1] and margins F1(x1), ..., Fd(xd),
C(F1(x1), ..., Fd(xd)) represents a valid joint cdf.
Copulas are a powerful tool for the modeling of multivariate data.
We can easily extend univariate models to the multivariate regime.
Copulas simplify the estimation process for multivariate models (a sketch of the full recipe follows below):
1 - Estimate the marginal distributions.
2 - Map the data to [0, 1]^d using the estimated marginals.
3 - Estimate a copula function given the mapped data.
Learning the marginals: easily done using standard univariate methods.
Learning the copula: difficult; it requires copula models that i) can
represent a broad range of dependencies and ii) are robust to overfitting.
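To make the three-step recipe concrete, here is a minimal Python sketch assuming a bivariate sample and a Gaussian copula model; the estimator that inverts Kendall's tau in step 3 is one standard choice, not necessarily the one used in the talk.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata

# Toy bivariate data with some dependence (stand-in for a real dataset).
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2)) @ np.array([[1.0, 0.0], [0.6, 0.8]])

# Steps 1-2: estimate the marginals with empirical cdfs and map to [0, 1]^2.
u = np.column_stack([rankdata(col) / (len(col) + 1) for col in x.T])

# Step 3: fit the copula. For a Gaussian copula, a simple estimator
# inverts the relation tau = (2 / pi) * arcsin(rho).
tau, _ = kendalltau(u[:, 0], u[:, 1])
rho = np.sin(np.pi * tau / 2)
print("estimated Gaussian-copula parameter rho:", rho)
```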

Parametric Copula Models


There are many parametric 2D copulas. Some examples are the Gaussian,
Clayton, Frank, Student-t, Gumbel and Joe copulas.

[Figure: density plots of these six parametric 2D copulas.]

Usually they depend on a single scalar parameter which is in a one-to-one
relationship with Kendall's tau rank correlation coefficient, defined as

    τ = p[(U1 − U1')(U2 − U2') > 0] − p[(U1 − U1')(U2 − U2') < 0]
      = p[concordance] − p[discordance],

where (U1, U2) and (U1', U2') are independent samples from the copula.
However, in higher dimensions, the number and expressiveness of
parametric copulas are more limited.
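For instance, for the Gaussian copula the one-to-one relationship is τ = (2/π) arcsin(ρ). A quick empirical check of this fact (the sampling setup is our own illustration):

```python
import numpy as np
from scipy.stats import kendalltau, norm

rng = np.random.default_rng(0)
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=5000)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])  # samples from the Gaussian copula

print("empirical tau  :", kendalltau(u, v)[0])
print("theoretical tau:", 2 / np.pi * np.arcsin(rho))
```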

Vine Copulas
They are hierarchical graphical models that factorize c(u1, ..., ud) into
a product of d(d − 1)/2 bivariate conditional copula densities.
We can factorize c(u1, u2, u3) using the product rule of probability as

    c(u1, u2, u3) = f3|12(u3|u1, u2) f2|1(u2|u1)

(the factor for u1 is 1 because the marginals are uniform), and we can
express each factor in terms of bivariate copula functions:

    f2|1(u2|u1) = c12(u1, u2),
    f3|12(u3|u1, u2) = c31|2[F3|2(u3|u2), F1|2(u1|u2)|u2] c23(u2, u3).

Computing Conditional cdfs

Computing c31|2[F3|2(u3|u2), F1|2(u1|u2)|u2] requires evaluating the
conditional marginal cdfs F3|2(u3|u2) and F1|2(u1|u2).
This can be done using the following recursive relationship:

    Fj|A(uj|A) = ∂Cjk|B[Fj|B(uj|B), x|B]/∂x |_{x = Fk|B(uk|B)},

where k ∈ A, A is a set of variables different from uj and B = A \ {k}.
For example,

    F3|2(u3|u2) = ∂C32(u3, x)/∂x |_{x = u2},
    F1|2(u1|u2) = ∂C21(x, u1)/∂x |_{x = u2}.
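As an illustration, for a bivariate Gaussian copula this partial derivative has a well-known closed form, often called the h-function in the vine literature. A minimal sketch (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def h_gaussian(u, v, rho):
    """F(u | v) = dC(u, v)/dv for a bivariate Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return norm.cdf((x - rho * y) / np.sqrt(1.0 - rho ** 2))

# e.g. F3|2(u3 | u2) when the copula C32 is Gaussian with rho = 0.5:
print(h_gaussian(0.9, 0.3, 0.5))
```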

Regular Vines

A regular vine specifies a factorization of c(u1, ..., ud).
It is formed by d − 1 trees T1, ..., Td−1 with node and edge sets Vi and Ei.
Each edge e in any tree has associated three sets of variables C(e), D(e),
N(e) ⊆ {1, ..., d} called the conditioned, conditioning and constraint sets.
V1 = {1, ..., d} and E1 forms a spanning tree over a complete graph G1
over V1. For any e ∈ E1, C(e) = N(e) = e and D(e) = ∅.
For i > 1, Vi = Ei−1 and Ei forms a spanning tree over a graph Gi with
nodes Vi and edges e = {e1, e2} such that e1, e2 ∈ Ei−1 and e1 ∩ e2 ≠ ∅.
For any e = {e1, e2} ∈ Ei, i > 1, we have that C(e) = N(e1) Δ N(e2)
(the symmetric difference), D(e) = N(e1) ∩ N(e2) and N(e) = N(e1) ∪ N(e2).
The copula density then factorizes as

    c(u1, ..., ud) = ∏_{i=1}^{d−1} ∏_{e∈Ei} cC(e)|D(e).

Example of a Regular Vine

[Figure: an example regular vine showing its trees T1, ..., Td−1.]

Using Regular Vines in Practice


Selecting a particular factorization:
There are many possible factorizations, each one determined by the
specific choice of spanning trees T1, ..., Td−1.
In practice, each tree Ti is chosen by assigning a weight to each edge in
Gi and then selecting the corresponding maximum spanning tree.
The weight for the edge e is usually related to the dependence level
between the variables in C(e), often measured in terms of Kendall's tau
(a sketch follows below).
It is common to prune the vine and consider only the first few trees.
Dealing with conditional bivariate copulas:
Use the simplifying assumption: cC(e)|D(e) does not depend on D(e).
Our main contribution: avoid making use of the simplifying assumption.
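A sketch of how the first tree T1 could be selected under this scheme; using SciPy's minimum spanning tree on negated |τ| weights is our assumption, not necessarily the authors' implementation:

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
u = rng.uniform(size=(100, 4))  # placeholder data; use real copula data here

d = u.shape[1]
w = np.zeros((d, d))
for i in range(d):
    for j in range(i + 1, d):
        w[i, j] = abs(kendalltau(u[:, i], u[:, j])[0])

# Negate weights: SciPy builds minimum spanning trees, and it treats
# exact zeros as missing edges (harmless here, where taus are never 0).
mst = minimum_spanning_tree(-w)
edges = np.transpose(mst.nonzero())
print("edges of T1:", [tuple(e) for e in edges])
```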

A Semi-parametric Model for Conditional Copulas

We describe cC(e)|D(e) using a parametric model specified in terms of
Kendall's tau, τ ∈ [−1, 1].

Let z be a vector with the value of the variables in D(e).

Then we assume τ = σ[f(z)], where f is an arbitrary non-linear function
and σ(x) = 2Φ(x) − 1 is a sigmoid function (Φ denotes the standard
Gaussian cdf), so that τ always lies in [−1, 1].
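A minimal sketch of this link and its inverse (the inverse reappears later as the constant mean of the GP prior); the helper names are ours:

```python
from scipy.stats import norm

def sigma(x):
    """Map a real latent value to a Kendall's tau in [-1, 1]."""
    return 2.0 * norm.cdf(x) - 1.0

def sigma_inv(tau):
    """Inverse link; used later to set the constant GP prior mean."""
    return norm.ppf((tau + 1.0) / 2.0)

print(sigma(0.0))      # 0.0: a latent value of zero encodes independence
print(sigma_inv(0.5))  # latent value encoding tau = 0.5
```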

Bayesian Inference on f

We are given a sample DUV = {(Ui, Vi)}_{i=1}^{n} from CC(e)|D(e) with
corresponding values for the variables in D(e) given by Dz = {zi}_{i=1}^{n}.
We want to identify the value of f that was used to generate the data.
We assume that f follows a priori a Gaussian process.

[Figure: illustration of a latent function f over the range of the conditioning variable.]
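To illustrate the prior, the sketch below draws one function from a zero-mean GP with an assumed squared-exponential kernel on a grid of conditioning values, and maps it through σ to a Kendall's tau curve:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
z = np.linspace(-6, 6, 200)[:, None]       # grid of conditioning values

# Squared-exponential covariance with unit length-scale, plus jitter.
K = np.exp(-0.5 * (z - z.T) ** 2) + 1e-8 * np.eye(len(z))

f = rng.multivariate_normal(np.zeros(len(z)), K)  # one draw of f ~ GP prior
tau = 2.0 * norm.cdf(f) - 1.0                     # tau(z) = sigma(f(z))
print(tau.min(), tau.max())                       # stays inside [-1, 1]
```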


Posterior and Predictive Distributions

The posterior distribution for f = (f1, ..., fn)^T, where fi = f(zi), is

    p(f|DUV, Dz) = [∏_{i=1}^{n} c(Ui, Vi|τ = σ[fi])] p(f|Dz) / p(DUV|Dz),

where p(f|Dz) = N(f|m0, K) is the Gaussian process prior on f.

Given zn+1, the predictive distribution for Un+1 and Vn+1 is

    p(un+1, vn+1|zn+1, DUV, Dz) =
        ∫ c(un+1, vn+1|τ = σ[fn+1]) p(fn+1|f, zn+1, Dz) p(f|DUV, Dz) dfn+1 df.

For efficient approximate inference, we use Expectation Propagation.
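Neither the posterior normalizer nor the predictive integral has a closed form. As a sketch, the predictive density can be approximated by simple Monte Carlo, here assuming a Gaussian approximation to the posterior over fn+1 and a Gaussian copula for c, with τ converted to the copula parameter via ρ = sin(πτ/2):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    q = 2 * rho * x * y - rho ** 2 * (x ** 2 + y ** 2)
    return np.exp(q / (2 * (1 - rho ** 2))) / np.sqrt(1 - rho ** 2)

def predictive_density(u, v, mu, s2, n_samples=10000, seed=0):
    rng = np.random.default_rng(seed)
    f = rng.normal(mu, np.sqrt(s2), size=n_samples)   # f_{n+1} ~ N(mu, s2)
    tau = 2 * norm.cdf(f) - 1                         # tau = sigma(f)
    rho = np.sin(np.pi * tau / 2)                     # tau -> rho (Gaussian copula)
    return gaussian_copula_density(u, v, rho).mean()  # Monte Carlo average

print(predictive_density(0.8, 0.7, mu=0.5, s2=0.2))
```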


Expectation Propagation

EP approximates p(f|DUV, Dz) by Q(f) = N(f|m, V), where Q(f) is formed by
replacing each exact likelihood factor qi(fi) = c(Ui, Vi|τ = σ[fi]) with a
Gaussian approximate factor q̃i(fi) with parameters m̃i and ṽi.
EP tunes m̃i and ṽi by minimizing KL[qi(fi) Q(f)[q̃i(fi)]^{−1} || Q(f)]. We use
numerical integration methods for this task (a sketch follows below).
Kernel parameters are fixed by maximizing the EP approximation of p(DUV|Dz).
The total cost is O(n³).
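The numerical integration step can be illustrated as follows: the moments of the "tilted" distribution (one exact likelihood factor times its Gaussian cavity) are computed on a grid. The Gaussian-copula likelihood and the cavity parameters below are placeholder assumptions:

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_density(u, v, rho):
    x, y = norm.ppf(u), norm.ppf(v)
    q = 2 * rho * x * y - rho ** 2 * (x ** 2 + y ** 2)
    return np.exp(q / (2 * (1 - rho ** 2))) / np.sqrt(1 - rho ** 2)

def tilted_moments(u, v, mu_c, s2_c, grid_size=2001):
    """Moments of likelihood x cavity, computed by grid integration."""
    sd = np.sqrt(s2_c)
    f = np.linspace(mu_c - 8 * sd, mu_c + 8 * sd, grid_size)
    step = f[1] - f[0]
    tau = 2 * norm.cdf(f) - 1
    rho = np.clip(np.sin(np.pi * tau / 2), -0.999999, 0.999999)  # avoid saturation
    tilted = gaussian_copula_density(u, v, rho) * norm.pdf(f, mu_c, sd)
    Z = tilted.sum() * step                            # normalizing constant
    mean = (f * tilted).sum() * step / Z               # first moment
    var = ((f - mean) ** 2 * tilted).sum() * step / Z  # second central moment
    return Z, mean, var

print(tilted_moments(0.8, 0.7, mu_c=0.0, s2_c=1.0))
```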

Implementation Details

We choose the following covariance function for the GP prior:

    Cov[f(zi), f(zj)] = σ exp{−(zi − zj)^T diag(λ)(zi − zj)} + σ0.

The mean of the GP prior is constant and equal to Φ^{−1}((τMLE + 1)/2),
where τMLE is the MLE of τ for an unconditional Gaussian copula.
We use the FITC approximation (a sketch follows below):
K is approximated by K' = Q + diag(K − Q), where Q = Knn0 Kn0n0^{−1} Knn0^T.
Kn0n0 is the n0 × n0 covariance matrix for n0 ≪ n pseudo-inputs.
Knn0 contains the covariances between training points and pseudo-inputs.
The cost of EP is now O(n n0²). We choose n0 = 20.
The predictive distribution is approximated using sampling.
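A toy numpy sketch of the FITC construction above, with an assumed one-dimensional squared-exponential kernel:

```python
import numpy as np

def kern(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)  # toy SE kernel

rng = np.random.default_rng(0)
z = rng.uniform(-6, 6, size=200)               # n = 200 training inputs
z0 = np.linspace(-6, 6, 20)                    # n0 = 20 pseudo-inputs

K = kern(z, z)
Knn0 = kern(z, z0)
Kn0n0 = kern(z0, z0) + 1e-8 * np.eye(len(z0))  # jitter for the inverse

Q = Knn0 @ np.linalg.solve(Kn0n0, Knn0.T)      # Q = Knn0 Kn0n0^{-1} Knn0^T
K_fitc = Q + np.diag(np.diag(K - Q))           # exact diagonal, low rank off it
print(np.abs(K_fitc - K).max())                # approximation error
```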

Experiments I

We compare the proposed method GPVINE with two baselines:
1 - SVINE, based on the simplifying assumption.
2 - MLLVINE, based on the maximization of the local likelihood.
    It can only capture dependencies on a single random variable and is
    limited to regular vines with at most two trees.
All the data are mapped to [0, 1]^d using the ecdfs.
Synthetic data: Z uniform in [−6, 6] and (U, V) Gaussian with
correlation (3/4) sin(Z). Data set of size 50 (a sampling sketch follows
the figure below).

[Figure: estimated conditional Kendall's tau τU,V|Z as a function of PZ(Z), comparing GPVINE, MLLVINE and the true function (TRUE).]
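For reference, data of this form can be generated along the following lines (a sketch; the exact code used in the paper may differ):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50
z = rng.uniform(-6, 6, size=n)
rho = 0.75 * np.sin(z)                # correlation varies with Z

# Draw correlated standard normals pair by pair and map through Phi.
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
u, v = norm.cdf(x1), norm.cdf(x2)     # (U, V) on [0, 1]^2, dependence set by rho(z)
```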

Experiments II

Real-world data: UCI datasets, meteorological data, mineral
concentrations and financial data.
Data are split into training and test sets (50 times), each containing half of the data.
Average test log-likelihood when limited to two trees in the vine:

[Table: average test log-likelihoods per dataset.]

Results for More than Two Trees

[Figure: average test log-likelihood for GPVINE and SVINE as additional trees are added to the vine.]


Conditional Dependencies in Weather Data

Conditional Kendall's tau for atmospheric pressure and cloud percentage
cover when conditioned on latitude and longitude near Barcelona
on 11/19/2012 at 8pm.

[Figure: map of the estimated conditional Kendall's tau near Barcelona.]

Summary and Conclusions


Vine copulas are flexible models for multivariate dependencies which
specify a factorization of the copula density into a product of conditional
bivariate copulas.
In practical implementations of vines, some of the conditional
dependencies in the bivariate copulas are usually ignored.
To avoid this, we have proposed a method for the estimation of
fully conditional vines using Gaussian processes (GPVINE).
GPVINE outperforms a baseline that ignores conditional dependencies
(SVINE) and other alternatives based on maximum local-likelihood
methods (MLLVINE).


References

López-Paz, D., Hernández-Lobato, J. M., and Ghahramani, Z. Gaussian
Process Vine Copulas for Multivariate Dependence. International Conference
on Machine Learning (ICML 2013).
Acar, E. F., Craiu, R. V., and Yao, F. Dependence calibration in conditional
copulas: A nonparametric approach. Biometrics, 67(2):445-453, 2011.
Bedford, T. and Cooke, R. M. Vines - a new graphical model for dependent
random variables. The Annals of Statistics, 30(4):1031-1068, 2002.
Minka, T. P. Expectation Propagation for approximate Bayesian inference.
Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence,
pp. 362-369, 2001.
Naish-Guzman, A. and Holden, S. B. The generalized FITC approximation.
In Advances in Neural Information Processing Systems 20, 2007.
Patton, A. J. Modelling asymmetric exchange rate dependence.
International Economic Review, 47(2):527-556, 2006.


Thank you for your attention!

