Copula-Vine Approach


C.1 ‘Copula-vine’ approach to continuous BBN for the aircraft separation time model

We use the protocol presented in [Kurowicka D., Cooke R.M. 2004] to specify the (conditional) rank correlations
to be elicited from experts [see Cooke R.M., 1991]. As already noted, these correlations are assigned
to the directed arcs of the BBN.

First we choose the sampling order 1, 2, 3, 4, 5, 6, 7, 8, 9 for the BBN structure, such that the ancestors
of a node appear before that node in the ordering. This order is not unique; we could have chosen a
different sampling order. For example, in Figure C-1 the node “Prescribed spacing”, numbered 4, has as
ancestors the nodes “Error ATC Supervisor”, “Separation Mode Planner Failure”, and “Wind Prediction”;
they were therefore placed before node 4 in the ordering, as nodes 3, 2 and 1, respectively.
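The ordering requirement can be checked mechanically. The sketch below is only an illustration: the dictionary `PARENTS` is an assumption, read off the dependent sets C2, C4, C7 and C9 stated later in this appendix, and `is_sampling_order` is a hypothetical helper name. It uses Python's standard `graphlib` module (Python 3.9 or later) to produce one admissible order automatically.

```python
from graphlib import TopologicalSorter

# Assumed parent sets (node -> parents), taken from the sets C_K in this appendix.
PARENTS = {1: [], 2: [1], 3: [], 4: [3, 2], 5: [], 6: [],
           7: [6, 5], 8: [], 9: [8, 7, 4]}

def is_sampling_order(order, parents):
    """Return True if every parent of a node appears before that node in `order`.

    If all parents precede their children, all ancestors do as well.
    """
    position = {node: idx for idx, node in enumerate(order)}
    return all(position[p] < position[n] for n in order for p in parents[n])

# graphlib expects a mapping node -> predecessors, which is exactly PARENTS.
auto_order = list(TopologicalSorter(PARENTS).static_order())

print(is_sampling_order(range(1, 10), PARENTS))  # the order chosen in the text
print(is_sampling_order(auto_order, PARENTS))    # an automatically generated order
```

The automatically generated order need not coincide with 1, ..., 9, which illustrates that the sampling order is not unique.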

We write the complete factorization and underscore the nodes which do not have a direct “influence”
on the conditioned variable, i.e., which are not its parents, and hence are not necessary in sampling
it. This factorization is

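$$P(1, 2, \dots, 9) = P(1)\, P(2 \mid 1)\, P(3 \mid \underline{2\,1})\, P(4 \mid 3\,2\,\underline{1})\, P(5 \mid \underline{4\,3\,2\,1})\, P(6 \mid \underline{5\,4\,3\,2\,1})\, P(7 \mid 6\,5\,\underline{4\,3\,2\,1})\, P(8 \mid \underline{7\,6\,5\,4\,3\,2\,1})\, P(9 \mid 8\,7\,4\,\underline{6\,5\,3\,2\,1}) \qquad (1)$$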

If we drop the underscored variables, we obtain the standard factorization for the BBN given as follows
[Pearl J. 1988, Jensen F.V. 1996]:

$$P(X_1, X_2, \dots, X_9) = \prod_{i=1}^{9} P\bigl(X_i \mid pa(X_i)\bigr) \qquad (2)$$

where pa(Xi) denotes the parents of variable Xi.

To sample a distribution specified by a continuous BBN we use the sampling procedure for the D-vine
[Kurowicka D., Cooke R.M. 2005]. For each term of the factorization we build a D-vine on K variables,
denoted by DK = D(K, CK, IK). The ordering of the variables is very important: we start with the variable
K, then the dependent variables CK, and finally the independent variables IK.
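As a rough illustration of how the ordering D(K, CK, IK) is assembled, the sketch below derives CK, IK and the resulting vine ordering for each node. The dictionary `PARENTS` and the function name `vine_ordering` are assumptions made for this example; the parent lists are written in the order in which the appendix lists the dependent sets.

```python
# Assumed parent lists (node -> dependent variables C_K), as used in this appendix.
PARENTS = {1: [], 2: [1], 3: [], 4: [3, 2], 5: [], 6: [],
           7: [6, 5], 8: [], 9: [8, 7, 4]}

def vine_ordering(k, parents):
    """Return (C_K, I_K, ordering) for node k, nodes indexed by sampling order.

    C_K are the parents of k (dependent variables); I_K are the remaining
    predecessors (the underscored variables), listed in descending order.
    """
    c_k = list(parents[k])
    i_k = [j for j in range(k - 1, 0, -1) if j not in c_k]
    return c_k, i_k, [k] + c_k + i_k

for k in sorted(PARENTS):
    c_k, i_k, order = vine_ordering(k, PARENTS)
    print(f"K={k}: C={c_k} I={i_k} D{k}=D{tuple(order)}")
```

For K = 9 this reproduces the ordering D(9, 8, 7, 4, 6, 5, 3, 2, 1) used for the last term of the factorization.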

a) Let us start with the first term of the factorization, P(1). Since variable X1 has neither dependent
nor independent variables, C1 = I1 = ∅. The D-vine for X1 is then trivial; we denote it by D1 = D(1).
To sample X1, we can just sample a uniform random variable,

x1 = u1 . (3)

b) The second part of the factorization is slightly more complicated. We take $P(2 \mid 1)$.

[Figure: D-vine D2 on nodes 2 and 1, joined by an edge labelled r21; C2 = {1}, I2 = ∅]
ATC-WAKE D3_6B APPENDIX, 20/02/2005

Figure C-2: D2 for the BBN for the aircraft separation time with 9 variables

In Figure C-2, we can see the D-vine D2 and the sets of independent and dependent variables for X2.
There are no underscored variables, hence I2 = ∅. The set of dependent variables C2 consists of the
variable X1, so the ordering of D2 is as in Figure C-2. To specify the dependence between X1 and X2, it
is required to assign a rank correlation r12 (equivalently r21, since rank correlation is symmetric) to
the edge between X1 and X2 in D2 and, equivalently, to the corresponding arc in the BBN in Figure C-1.
The graphical representation of the sampling procedure is shown in Figure C-3:

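[Figure: unit square with X2 on the horizontal axis and X1 on the vertical axis; the copula with correlation r12 is conditioned on u1 = x1, and u2 is inverted through F2|1 to give x2]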

Figure C-3: Graphical representation of sampling value of x 2 in D2

We obtain a value of variable X2, say x2, in D2. The horizontal axis represents the random variable X2,
and its parent X1 is placed on the vertical axis. The diagonal band copula (see footnote 1) [Cooke R.M., Waij R.
1986] realizes the correlation r12 between these random variables. The value X1 = x1 is known from the
first term of the factorization; this allows us to calculate the conditional distribution of X2 given
X1 = x1, denoted by F2|1. If we sample a value u2 of the independent uniform variable U2
and invert it with respect to F2|1, then we get the desired value x2. So, the sampled value of variable
X2 is obtained as

$$x_2 = F^{-1}_{2|1:x_1}(u_2) \qquad (4)$$
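Equation 4 can be made concrete with a copula whose conditional distribution inverts in closed form. The sketch below uses Frank's copula, which footnote 1 names as the choice for applications. It is only a sketch under assumptions: the copula is parametrized directly by its parameter `theta` rather than by the rank correlation r21 (the conversion between the two is not shown here), and the function names `frank_h` and `frank_h_inv` are introduced for this example.

```python
import math

def frank_h(v, u, theta):
    """Conditional CDF F_{V|U=u}(v) of Frank's copula with parameter theta."""
    if abs(theta) < 1e-12:              # theta = 0: independence copula
        return v
    a = math.exp(-theta * u)
    b = math.expm1(-theta * v)          # exp(-theta*v) - 1
    g = math.expm1(-theta)              # exp(-theta) - 1
    return a * b / (g + (a - 1.0) * b)

def frank_h_inv(w, u, theta):
    """Inverse of frank_h in v: the value v with F_{V|U=u}(v) = w."""
    if abs(theta) < 1e-12:
        return w
    a = math.exp(-theta * u)
    g = math.expm1(-theta)
    return -math.log1p(w * g / (w + (1.0 - w) * a)) / theta

# Sampling x2 given x1 as in Equation 4; theta_21 is an assumed illustrative value.
theta_21 = 5.0
x1, u2 = 0.3, 0.7
x2 = frank_h_inv(u2, x1, theta_21)
print(x2, frank_h(x2, x1, theta_21))    # the second number recovers u2 up to rounding
```

With `theta = 0` the inverse returns `u2` unchanged, which is the independent case used for the source nodes.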

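The third part of the factorization can now be considered.

c) $P(3 \mid \underline{2\,1})$

[Figure: D-vine D3 on nodes 3, 2, 1, with zero correlation in the left-most part of the vine]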

Figure C-4: D3 for the BBN for the aircraft separation time with 9 variables

Footnote 1: This copula is used in the text only to visualize the sampling procedure, since it can be easily drawn. For
applications, however, we will use Frank’s copula [Frank M.J. 1979], as it does not add much information to the product of margins, enjoys
the zero independence property, and has closed-form conditional and inverse conditional distributions.


For the third part of the factorization K = 3, and variables X1 and X2 are underscored, that is, X1
and X2 are independent of X3. Thus C3 = ∅ and I3 = {2, 1}. Hence, the order of the variables is D3 =
D(3, 2, 1). Variables X1 and X2 were already sampled, so we are now interested only in information
about variable X3, that is, the information in the left-most part of the vine (highlighted area in
Figure C-4). Both r32 and r31|2 are equal to zero because X3 is independent of X1 and X2.

Therefore, to sample random variable X3 we just sample the value of the independent uniform
variable U3, say u3:

x3 = u3 . (5)

We turn to the fourth part of the factorization.

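d) $P(4 \mid 3\,2\,\underline{1})$

[Figure: D-vine D4 on nodes 4, 3, 2, 1; the edge between 4 and 3 is labelled r43, the edge between 3 and 2 is labelled 0, and the second-level edge is labelled r42|3]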

Figure C-5: D4 for the BBN for the aircraft separation time with 9 variables

For the fourth term of the factorization K = 4. The set of dependent variables consists of X2
and X3, hence C4 = {3, 2}; variable X1 is underscored, I4 = {1}, i.e., X1 is
independent of X4 given X2 and X3. We have D4 = D(4, 3, 2, 1). Notice that the order of
the variables stays the same as in D3.
We are only interested in information about variable X4, as
variables X1, X2 and X3 were already sampled. We have r41|32 = 0, due to the independence
between X1 and X4 given X2 and X3. The correlations r43 and r42|3
need to be specified (see footnote 2).

Because the top correlation in D4, r41|32, is equal to zero, the quantile functions F1|32 and F4|32
are independent; hence we can reduce D4 to a vine on three variables, in this case D(4, 3, 2) (circled
area). Whenever some of the highest-order (conditional) correlations in the left-most part of the
vine are equal to zero, the D-vine can be reduced in a similar way. This simplifies the sampling of
variable X4, which does not depend on the value of variable X1. From previous factorizations we know
that the rank correlation r32 is equal to zero. The sampling procedure for the value of X4, say x4, is
shown in Figure C-6.

Footnote 2: Note that we can change the ordering in D4 to 4, 2, 3, 1, which gives another way to specify the conditional rank
correlations, namely r42 and r43|2. Hence, for K = 4, C4 = {3, 2}, I4 = {1}, we have the following two possibilities to specify
(conditional) rank correlations in D4:

$$\begin{pmatrix} r_{43} \\ r_{42|3} \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} r_{42} \\ r_{43|2} \end{pmatrix}$$

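[Figure: two copula panels. In the first, the copula with correlation r34 is conditioned on X3 = x3, giving F4|3; in the second, the copula with correlation r24|3 is conditioned on F2|3(x2), giving F4|23. The uniform sample u4 is inverted through F4|23 and then through F4|3 to obtain x4.]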

Figure C-6: Graphical representation of sampling value of x 4 in D4

Since X2 and X3 were already sampled, the values X3 = x3 and F2|3(x2) are known. We
conditionalize the copulas with correlations r34 and r24|3
on the values X3 = x3 and F2|3(x2), respectively.
We calculate the conditional cumulative distribution functions F4|3 and F4|23 (see Figure C-6). We sample
the value of the independent uniform variable U4, say u4, invert it with respect to F4|23, and get the value
of the quantile F4|3, which leads to x4. Hence, x4 is sampled as follows:

$$x_4 = F^{-1}_{4|3:x_3}\Bigl(F^{-1}_{4|23:x_2}(u_4)\Bigr) \qquad (6)$$
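The nested inversion in Equation 6 follows a pattern that recurs for X7 and X9: start from the uniform sample and invert through the conditional distributions from the highest-order one down to the lowest. The sketch below is one possible way to write this, assuming Frank's copula parametrized by `theta` (not by rank correlation) and hypothetical names `frank_h_inv` and `sample_leftmost`. Because r32 = 0 in this BBN, F2|3(x2) reduces to x2, which is why x2 is passed directly as the conditioning value.

```python
import math

def frank_h_inv(w, u, theta):
    """Inverse conditional CDF of Frank's copula: solves F_{V|U=u}(v) = w for v."""
    if abs(theta) < 1e-12:              # theta = 0: independence copula
        return w
    a = math.exp(-theta * u)
    g = math.expm1(-theta)
    return -math.log1p(w * g / (w + (1.0 - w) * a)) / theta

def sample_leftmost(u, conditioning):
    """Nested inversion for the left-most variable of a (reduced) D-vine.

    `conditioning` lists (theta, value) pairs from the unconditional edge up to
    the highest-order conditional edge; inversion runs in the reverse order.
    """
    t = u
    for theta, value in reversed(conditioning):
        t = frank_h_inv(t, value, theta)
    return t

# Equation 6 with assumed, purely illustrative parameters theta_43 and theta_42_3.
theta_43, theta_42_3 = 4.0, -3.0
x2, x3, u4 = 0.2, 0.8, 0.6
x4 = sample_leftmost(u4, [(theta_43, x3), (theta_42_3, x2)])
print(x4)
```

If the parents were themselves dependent, the conditioning values would have to be computed with the corresponding conditional distribution functions instead of being passed through unchanged.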

Now, we consider the fifth term of Equation 1.

e) $P(5 \mid \underline{4\,3\,2\,1})$

In this term we have K = 5, the set of dependent variables is empty (C5 = ∅), and the rest of the
variables are underscored, I5 = {4, 3, 2, 1}; that is, variable X5 is independent of X1, X2, X3, X4.

We can then use the ordering D5 = D(5, 4, 3, 2, 1), which after incorporating all zero
correlations in the left-most part of the vine simplifies to D(5). We are not required to specify any
(conditional) rank correlation. The value x5 of X5 in D5 is found by simply sampling the value of the
independent uniform random variable U5 = u5:

x5 = u5 . (7)

Similarly, we can get value x6 for the sixth term of the factorization.

f) $P(6 \mid \underline{5\,4\,3\,2\,1})$

We have K = 6, C6 = ∅ and I6 = {5, 4, 3, 2, 1}; that is, variable X6 is independent of X1, X2, X3, X4
and X5. The ordering of D6 is then D6 = D(6, 5, 4, 3, 2, 1), which simplifies to D(6). Hence

x6 = u6 . (8)

We present the seventh term of the factorization.

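g) $P(7 \mid 6\,5\,\underline{4\,3\,2\,1})$

[Figure: reduced D-vine D7 on nodes 7, 6, 5; the edge between 7 and 6 is labelled r76, the edge between 6 and 5 is labelled 0, and the second-level edge is labelled r75|6]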

Figure C-7: D7 for the BBN for the aircraft separation time with 9 variables

This part of the factorization has K = 7. The set of dependent variables consists of the two variables X5
and X6, so C7 = {6, 5}, and there are four underscored variables, I7 = {4, 3, 2, 1}. Hence D7 =
D(7, 6, 5, 4, 3, 2, 1); the order of the variables stays the same as for the previous
vines. So far, we have sampled variables X1, X2, X3, X4, X5 and X6, so we only need to
incorporate the information about variable X7 given in the left-most part of D7. Notice that we have
reduced D7 to D(7, 6, 5), as we did for D4. We must assign the rank correlation r76 to the edge that
connects variables X7 and X6 in D7 and, equivalently, to the corresponding arc in the BBN in Figure
C-1. We must also incorporate information about the conditional dependence of variables X5 and
X7 given variable X6 in the form of the conditional correlation r75|6 (see footnote 3); hence r75|6 is
assigned to the arc between X7 and X5 in the BBN in Figure C-1. From previous factorizations we find
that r65 is equal to zero.

Now the sampling procedure can be represented graphically as

Footnote 3: As mentioned for D4, the variables in D7 can be given in a different order, (7, 5, 6), in which case r75 and r76|5
are needed. Hence, for C7 = {6, 5}, I7 = {4, 3, 2, 1}, we have the following possibilities to specify (conditional) rank
correlations in D7:

$$\begin{pmatrix} r_{76} \\ r_{75|6} \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} r_{75} \\ r_{76|5} \end{pmatrix}$$


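[Figure: two copula panels. In the first, the copula with correlation r76 is conditioned on X6 = x6, giving F7|6; in the second, the copula with correlation r75|6 is conditioned on F5|6(x5), giving F7|65. The uniform sample u7 is inverted through F7|65 and then through F7|6 to obtain x7.]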

Figure C-8: Graphical representation of sampling value of x7 in D (7,6,5).

Figure C-8 shows the sampling of the value x7 in D(7, 6, 5). It is obtained in a way analogous to
the value x4. We get

$$x_7 = F^{-1}_{7|6:x_6}\Bigl(F^{-1}_{7|65:x_5}(u_7)\Bigr) \qquad (9)$$

Now, we explain the eighth part of the factorization.

h) $P(8 \mid \underline{7\,6\,5\,4\,3\,2\,1})$

In this term K = 8, the set of dependent variables is empty, C8 = ∅, and I8 = {7, 6, 5, 4, 3, 2, 1}; that
is, variable X8 is independent of X1, X2, X3, X4, X5, X6 and X7. Hence we use the
ordering D8 = D(8, 7, 6, 5, 4, 3, 2, 1), which reduces to D(8). The value x8 is obtained
by just sampling the independent uniform variable U8, say u8:

x8 = u8 . (10)

Finally, the ninth part of the factorization is shown.

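i) $P(9 \mid 8\,7\,4\,\underline{6\,5\,3\,2\,1})$

[Figure: reduced D-vine D9 on nodes 9, 8, 7, 4; the edge between 9 and 8 is labelled r98, the edges between 8 and 7 and between 7 and 4 are labelled 0; the second level carries r97|8 and 0; the third level carries r94|78]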

Figure C-9: D9 for the BBN for the aircraft separation time with 9 variables

We can see in this term of the factorization that K = 9. The set of dependent variables has three
variables, C9 = {8, 7, 4}, and the underscored variables are I9 = {6, 5, 3, 2, 1}. Hence, the ordering
of the variables is given as D9 = D(9, 8, 7, 4, 6, 5, 3, 2, 1). Following the same procedure as
above, D9 is reduced to a sub-vine on four variables, namely D(9, 8, 7, 4). We are only interested
in the information about variable X9. We assign a rank correlation r98 to the corresponding edge of D9
and, equivalently, to the arc between variables X8 and X9 in the BBN in Figure C-1. We also need to
incorporate the information about the two conditional dependences r97|8 and r94|87 (we know the values of
variables X7 and X8 from D7 and D8, respectively; see Equations 9 and 10; see also footnote 4).

Figure C-10 shows the sampling procedure to realize (conditional) correlations in D9.

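[Figure: three copula panels. The copulas with correlations r98, r97|8 and r94|87 are conditioned on X8 = x8, F7|8(x7) and F4|87(x4), giving F9|8, F9|87 and F9|874. The uniform sample u9 is inverted through F9|874, then F9|87, then F9|8 to obtain x9.]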

Figure C-10: Graphical representation of sampling value of x 9 in D9

Since X4, X7 and X8 were already sampled, the values X8 = x8, F7|8(x7) and F4|87(x4) are
known. We conditionalize the copulas with correlations r98, r97|8 and r94|87 on the values X8 = x8,
F7|8(x7) and F4|87(x4), respectively. We calculate the conditional cumulative distribution functions
F9|8, F9|87 and F9|874 (see Figure C-10). We sample the value of the independent uniform variable U9,
say u9, invert it with respect to F9|874, and get the value of the quantile F9|87, which is used to get
the quantile F9|8, which leads to x9.

The sampling procedure of x 9 in D9 yields,

Footnote 4: Again, if we change the order of the parents, there are several possibilities to specify conditional rank correlations.
For C9 = {8, 7, 4}, I9 = {6, 5, 3, 2, 1}:

$$\begin{pmatrix} r_{98} \\ r_{97|8} \\ r_{94|87} \end{pmatrix},\ \begin{pmatrix} r_{98} \\ r_{94|8} \\ r_{97|84} \end{pmatrix},\ \begin{pmatrix} r_{97} \\ r_{98|7} \\ r_{94|78} \end{pmatrix},\ \begin{pmatrix} r_{97} \\ r_{94|7} \\ r_{98|74} \end{pmatrix},\ \begin{pmatrix} r_{94} \\ r_{98|4} \\ r_{97|48} \end{pmatrix} \ \text{or} \ \begin{pmatrix} r_{94} \\ r_{97|4} \\ r_{98|47} \end{pmatrix}$$


$$x_9 = F^{-1}_{9|8:x_8}\Bigl(F^{-1}_{9|87:x_7}\bigl(F^{-1}_{9|874:x_4}(u_9)\bigr)\Bigr) \qquad (11)$$
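Equations 3 to 11 together give a complete recipe for drawing one joint sample. The sketch below strings them together. It is an illustration under several assumptions: Frank's copula is used for every edge and is parametrized by its parameter theta rather than by the rank correlations themselves; the dictionary keys such as `"42|3"` are hypothetical labels; and the conditioning values F2|3(x2), F5|6(x5), F7|8(x7) and F4|87(x4) are replaced by x2, x5, x7 and x4, which is valid here only because the parents of each node are mutually independent with uniform margins.

```python
import math
import random

def frank_h_inv(w, u, theta):
    """Inverse conditional CDF of Frank's copula: solves F_{V|U=u}(v) = w for v."""
    if abs(theta) < 1e-12:              # theta = 0: independence copula
        return w
    a = math.exp(-theta * u)
    g = math.expm1(-theta)
    return -math.log1p(w * g / (w + (1.0 - w) * a)) / theta

# Hypothetical labels, one per (conditional) correlation that must be specified.
KEYS = ["21", "43", "42|3", "76", "75|6", "98", "97|8", "94|87"]

def sample_bbn(theta, rng):
    """Draw one sample (x1, ..., x9) on uniform margins, following Eqs. 3 to 11."""
    u = {k: rng.random() for k in range(1, 10)}
    x = {}
    x[1] = u[1]                                                   # Eq. 3
    x[2] = frank_h_inv(u[2], x[1], theta["21"])                   # Eq. 4
    x[3] = u[3]                                                   # Eq. 5
    t = frank_h_inv(u[4], x[2], theta["42|3"])                    # inner part of Eq. 6
    x[4] = frank_h_inv(t, x[3], theta["43"])                      # Eq. 6
    x[5], x[6] = u[5], u[6]                                       # Eqs. 7 and 8
    t = frank_h_inv(u[7], x[5], theta["75|6"])                    # inner part of Eq. 9
    x[7] = frank_h_inv(t, x[6], theta["76"])                      # Eq. 9
    x[8] = u[8]                                                   # Eq. 10
    t = frank_h_inv(u[9], x[4], theta["94|87"])                   # innermost part of Eq. 11
    t = frank_h_inv(t, x[7], theta["97|8"])
    x[9] = frank_h_inv(t, x[8], theta["98"])                      # Eq. 11
    return x

# Illustrative parameter values only; they are not taken from the report.
example_theta = dict(zip(KEYS, [5.0, 3.0, -2.0, 4.0, 1.0, -3.0, 2.0, 6.0]))
print(sample_bbn(example_theta, random.Random(0)))
```

Margins other than uniform can be obtained afterwards by applying the inverse marginal distribution function to each sampled value.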

We conclude that the following rank correlations must be specified:

$$\{\, r_{21},\ r_{43},\ r_{42|3},\ r_{76},\ r_{75|6},\ r_{98},\ r_{97|8},\ r_{94|87} \,\} \qquad (12)$$

We have specified eight (conditional) rank correlations for the BBN structure shown in Figure C-1, the same
as the number of arcs in the BBN. Conditional independence properties of the BBN were used to
simplify the sampling procedure in D-vines.

In principle, it is not necessary to draw D -vines to see which (conditional) correlations are necessary
for calculations. One can follow the algorithm presented below:

• Find a sampling ordering, i.e., an ordering such that all ancestors of node i appear before i in the
ordering. A sampling ordering begins with a source node and ends with a sink node.
• Index the nodes according to the sampling order 1, …, n.
• Factorize the joint in the standard way (Equation 2) following the sampling order.
• Underscore those nodes in each condition, which are not parents of the conditioned variable
and thus are not necessary in sampling it.
The underscored nodes could be omitted thereby yielding the familiar factorization of the
BBN as a product of conditional probabilities, with each node conditionalized on its parents
(for source nodes the set of parents is empty).
• For each term i with parents (non-underscored variables) i_1 ... i_{p(i)}, associate the arc i_{p(i)-k} → i
with the conditional rank correlation

$$\begin{cases} r\bigl(i,\ i_{p(i)}\bigr), & k = 0 \\ r\bigl(i,\ i_{p(i)-k} \mid i_{p(i)}, \dots, i_{p(i)-k+1}\bigr), & 1 \le k \le p(i) - 1 \end{cases} \qquad (13)$$

where the assignment is vacuous if {i_1 ... i_{p(i)}} = ∅. Assigning conditional rank correlations
for i = 1, …, n, every arc in the BBN is assigned a conditional rank correlation between parent
and child.
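The assignment rule in Equation 13 can be sketched in a few lines. In the example below, the parent tuples in `PARENTS` are an assumption: they are written as (i_1, ..., i_p(i)) so that the last entry is the parent that receives the unconditional rank correlation, which matches the choice made in Equation 12. The function name `assign_rank_correlations` and the label format are hypothetical.

```python
# Assumed parent tuples (i_1, ..., i_p(i)) for each node of the BBN in this appendix.
PARENTS = {1: (), 2: (1,), 3: (), 4: (2, 3), 5: (), 6: (),
           7: (5, 6), 8: (), 9: (4, 7, 8)}

def assign_rank_correlations(parents):
    """Return one label per arc, following Equation 13.

    For node i and k = 0, ..., p(i)-1 the arc i_{p(i)-k} -> i receives
    r(i, i_{p(i)-k} | i_{p(i)}, ..., i_{p(i)-k+1}); for k = 0 nothing is given.
    """
    labels = []
    for i in sorted(parents):
        pa = parents[i]
        p = len(pa)
        for k in range(p):
            partner = pa[p - 1 - k]          # i_{p(i)-k}
            given = pa[p - k:][::-1]         # i_{p(i)}, ..., i_{p(i)-k+1}
            label = f"r{i}{partner}"
            if given:
                label += "|" + "".join(str(j) for j in given)
            labels.append(label)
    return labels

print(assign_rank_correlations(PARENTS))
```

Reordering a parent tuple, for example writing (3, 2) instead of (2, 3) for node 4, yields the alternative specification r42, r43|2 mentioned in footnote 2.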

The rank correlation specification on a regular vine, together with a copula, determines the whole joint distribution
[Kurowicka D., Cooke R.M. 2005].
