Complex Random Variables
Leigh J. Halliwell
Abstract: Rarely have casualty actuaries needed, much less wanted, to work with complex numbers. One readily
could wisecrack about imaginary dollars and creative accounting. However, complex numbers are well
established in mathematics; they even provided the impetus for abstract algebra. Moreover, they are essential in
several scientific fields, most notably in electromagnetism and quantum mechanics, the two fields to which most
of the sparse material about complex random variables is tied. This paper will introduce complex random
variables to an actuarial audience, arguing that complex random variables will eventually prove useful in the field
of actuarial science. First, it will describe the two ways in which statistical work with complex numbers differs
from that with real numbers, viz., in transjugation versus transposition and in rank versus dimension. Next, it
will introduce the mean and the variance of the complex random vector, and derive the distribution function of
the standard complex normal random vector. Then it will derive the general distribution of the complex normal
multivariate and discuss the behavior and moments of complex lognormal variables, a limiting case of which is
the unit-circle random variable W = e^{iΘ} for real Θ uniformly distributed. Finally, it will suggest several
foreseeable actuarial applications of the preceding theory, especially its application to linear statistical modeling.
Though the paper will be algebraically intense, it will require little knowledge of complex-function theory. But
some of that theory, viz., Cauchy’s theorem and analytic continuation, will arise in an appendix on the complex
moment generating function of a normal random multivariate.
Keywords: Complex numbers, matrices, and random vectors; augmented variance; lognormal and unit-circle
distributions; determinism; Cauchy-Riemann; analytic continuation
______________________________________________________________________________
1. INTRODUCTION
Even though their education has touched on algebra and calculus with complex numbers, most
casualty actuaries would be hard-pressed to cite an actuarial use for numbers of the form x + iy .
Their use in the discrete Fourier transformation (Klugman [1998], §4.7.1) is notable; however, many
would view this as a trick or convenience, rather than as indicating any further usefulness. In this
paper we will develop a probability theory for complex random variables and vectors, arguing that
such a theory will eventually find actuarial uses. The development, lengthy and sometimes arduous,
will take the following steps. Sections 2-4 will base complex matrices in certain real-valued matrices
called “double-real.” This serves the aim of our presentation, namely, to analogize from real-valued
random variables and vectors to complex ones. Transposition and dimension in the real-valued
realm become transjugation and rank in the complex. These differences figure into the standard
quadratic form of Section 5, where also the distribution of the standard complex normal random
vector is derived. Section 6 will elaborate on the variance of a complex random vector, as well as
introduce “augmented variance,” i.e., the variance of a dyad whose second part is the complex
conjugate of the first. Section 7 derives the formula for the distribution of the general complex
normal multivariate. Of special interest to many casualty actuaries should be the treatment of the
complex lognormal random vector in Section 8, an intuition into whose behavior Section 9 provides
on a univariate or scalar level. Even further simplification in the next two sections leads to the unit-
circle random variable, which is the only random variable with widespread deterministic effects. In
Section 12 we adapt the linear statistical model to complex multivariates. Finally, Section 13 lists
foreseeable applications of complex random variables. However, we believe their greatest benefit
resides not in their concrete applications, but rather in their fostering abstractions of thought and
imagination. Three appendices delve into mathematical issues too complicated for the body of the
paper. Those who work on an advanced level with lognormal random variables should read them.
course, X and Y also must be m×n. Since only square matrices have inverses, our purpose here
where $I_n$ is the n×n identity matrix. Because such an inverse must be unique, we may say that
First, define the conjugate of Z as $\bar Z = X - iY$. Since the conjugate of a product equals the product of the conjugates, $\bar Z\,\overline{Z^{-1}} = \overline{ZZ^{-1}} = \bar I_n = I_n$. Therefore, $\bar Z$ too is non-singular, and $\bar Z^{-1} = \overline{Z^{-1}}$. Moreover, if $Z = X + iY$ is non-singular, so too are $\pm Z$, $\pm iZ$, $\pm\bar Z$, and $\pm i\bar Z$, e.g., $-\bar Z = -X + iY$ and $-i\bar Z = -Y - iX$; invertibility is true for all eight or true for none. Invertibility is no respecter of the real
$$I_n = (X + iY)(A + iB) = XA + iYA + iXB + i^2 YB = XA + iYA + iXB - YB = (XA - YB) + i(YA + XB)$$
Matching real and imaginary parts, $XA - YB = I_n$ and $YA + XB = 0$; in double-real form:

$$\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} I_n \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} -B \\ A \end{bmatrix} = \begin{bmatrix} 0 \\ I_n \end{bmatrix}$$

$$\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} A & -B \\ B & A \end{bmatrix} = \begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix} = I_{2n}$$

Therefore, $ZW = I_n$ if and only if $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} A & -B \\ B & A \end{bmatrix} = \begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}$. By a similar expansion of the last equality above, $WZ = I_n$ if and only if $\begin{bmatrix} A & -B \\ B & A \end{bmatrix}\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix} = \begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}$. Hence, we conclude that the n×n complex matrix $Z = X + iY$ is non-singular, or has an inverse, if and only if the 2n×2n real-valued matrix $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ is non-singular. Moreover, if $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}^{-1}$ exists, it will have the form $\begin{bmatrix} A & -B \\ B & A \end{bmatrix}$, and $Z^{-1}$ will equal $A + iB$.
The correspondence between an n×n complex matrix and its 2n×2n real-valued matrix suggests that with complex numbers one somehow gets “two for the price of one.” It even hints of a relation between the general m×n complex matrix $Z = X + iY$ and the 2m×2n real-valued matrix $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$. If X and Y are m×n real matrices, we will call $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ a
double-real matrix. The matrix is double in two senses; first, in that it involves two (same-sized and
real-valued) matrices X and Y, and second, in that its right half is redundant, or reproducible from
its left.
Complex addition is analogous with double-real addition:

$$X_1 + iY_1 + X_2 + iY_2 \;\Leftrightarrow\; \begin{bmatrix} X_1 & -Y_1 \\ Y_1 & X_1 \end{bmatrix} + \begin{bmatrix} X_2 & -Y_2 \\ Y_2 & X_2 \end{bmatrix}$$
And if $Z_1$ is m×n and $Z_2$ is n×p, so that the matrices are conformable to multiplication, then complex multiplication is analogous with double-real multiplication:

$$\begin{bmatrix} X_1 & -Y_1 \\ Y_1 & X_1 \end{bmatrix}\begin{bmatrix} X_2 & -Y_2 \\ Y_2 & X_2 \end{bmatrix} = \begin{bmatrix} X_1X_2 - Y_1Y_2 & -(X_1Y_2 + Y_1X_2) \\ X_1Y_2 + Y_1X_2 & X_1X_2 - Y_1Y_2 \end{bmatrix}$$
Rather trivial is the analogy between the m×n complex zero matrix and the 2m×2n double-real zero matrix $\begin{bmatrix} 0 & -0 \\ 0 & 0 \end{bmatrix}$, as well as that between the n×n complex identity matrix and the 2n×2n double-real identity matrix $\begin{bmatrix} I_n & -0 \\ 0 & I_n \end{bmatrix}$.
The general 2m×2n double-real matrix may itself be decomposed into quasi-real and quasi-imaginary parts: $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix} = \begin{bmatrix} X & 0 \\ 0 & X \end{bmatrix} + \begin{bmatrix} 0 & -Y \\ Y & 0 \end{bmatrix}$. And in the case of square matrices ($m = n$) this extends to the form $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix} = \begin{bmatrix} X & 0 \\ 0 & X \end{bmatrix} + \begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}\begin{bmatrix} Y & 0 \\ 0 & Y \end{bmatrix}$, wherein the double-real matrix $\begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}$ is analogous with the imaginary unit, inasmuch as:

$$\begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}\begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix} = \begin{bmatrix} -I_n & 0 \\ 0 & -I_n \end{bmatrix} = (-1)\begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}$$
Finally, one of the most important theorems of linear algebra is that every m×n complex matrix can be reduced by non-singular matrices to a rank canonical form. In symbols, for every Z there exist non-singular matrices U and V such that:

$$U_{m\times m}\, Z_{m\times n}\, V_{n\times n} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{m\times n}$$
The m×n real matrix on the right side of the equation consists entirely of zeroes except for r
instances of one along its main diagonal. Since invertible matrix operations can reposition the ones,
it is further stipulated that the ones appear as a block in the upper-left corner. Although many
reductions of Z to canonical form exist, the canonical forms themselves must all contain the same
number of ones, r, which is defined as the rank of Z. Providing the matrices with real and imaginary
parts, we have:
$$\begin{bmatrix} P & -Q \\ Q & P \end{bmatrix}\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} R & -S \\ S & R \end{bmatrix} = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}, \quad\text{where } A = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}_{m\times n} \text{ and } B = 0_{m\times n}$$
As shown in the previous section, $\begin{bmatrix} P & -Q \\ Q & P \end{bmatrix}$ is non-singular, or invertible, if and only if $U = P + iQ$ is non-singular; the same is true for $\begin{bmatrix} R & -S \\ S & R \end{bmatrix}$ and $V = R + iS$. Therefore, the rank of the double-real
analogue of a complex matrix is twice the rank of the complex matrix. Moreover, the 2r instances of
one correspond to r quasi-real and r quasi-imaginary instances. It is not possible for the
contribution to the rank of a matrix to be real without its being imaginary, and vice versa.
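The rank-doubling claim can be spot-checked numerically; a small numpy sketch (the matrix sizes and rank are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
# A 5-by-4 complex matrix of rank 2, built as a product of thin factors
U = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
V = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
Z = U @ V
D = np.block([[Z.real, -Z.imag], [Z.imag, Z.real]])  # double-real analogue

assert np.linalg.matrix_rank(Z) == 2
assert np.linalg.matrix_rank(D) == 4   # twice the complex rank
```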
To conclude this section, there are extensive analogies between complex and double-real matrices, analogies so extensive that one who lacked either the confidence or the software to work with complex matrices could work instead with their double-real analogues.

Let Σ represent the variance of the real-valued random vector x. Its jkth element represents the covariance of the jth element of x with the kth element. Since the covariance of two real-valued random variables is
symmetric, Σ must be a symmetric matrix. But a realistic Σ must have one other property, viz., non-
negative definiteness (NND). This means that for every real-valued n×1 vector ξ, ξ ′Σξ ≥ 0 . 3 This
must be true, because ξ ′Σξ is the variance of the real-valued random variable ξ ′x :
$$\mathrm{Var}[\xi'x] = \mathrm E\left[(\xi'x - \xi'\mu)(\xi'x - \xi'\mu)'\right] = \mathrm E\left[\xi'(x-\mu)(x-\mu)'\xi\right] = \xi'\,\mathrm E\left[(x-\mu)(x-\mu)'\right]\xi = \xi'\Sigma\xi$$
Although variances of real-valued random variables may be zero, they must not be negative. Now if
ξ ′Σξ > 0 for all ξ ≠ 0 n×1 , the variance Σ is said to be positive-definite (PD). Every invertible NND
matrix must be PD. Moreover, every NND matrix may be expressed as the product of some real
matrix and its transpose, the most common method for doing this being the Cholesky
² The representation of the complex scalar $z = x + iy$ as the real 2×2 matrix $\begin{bmatrix} x & -y \\ y & x \end{bmatrix}$ is a common theme in modern algebra (e.g., section 7.2 of the Wikipedia article “Complex number”). We have merely extended the representation to complex matrices. Our representation is even more meaningful when expressed in the Kronecker-product form $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix} = \begin{bmatrix} X & 0 \\ 0 & X \end{bmatrix} + \begin{bmatrix} 0 & -Y \\ Y & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\otimes X + \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\otimes Y$. Due to certain properties of the Kronecker product (cf. Judge [1988], Appendix A.15), all the analogies of this section would hold even in the commuted form $X\otimes\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + Y\otimes\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. In practical terms this means that it matters not whether the form is 2×2 of m×n or m×n of 2×2.
³ More accurately, $\xi'\Sigma\xi \ge [0]_{1\times 1}$, since the quadratic form $\xi'\Sigma\xi$ is a 1×1 matrix. The relevant point is that 1×1 real-valued matrices are as orderable as their real-valued elements are. Appendices A.13 and A.14 of Judge [1988] provide introductions to quadratic forms and definiteness that are sufficient to prove the theorems used herein.
decomposition (Healy [1986], §7.2). Accordingly, if Σ is NND, then A′ΣA is NND for any conformable real-valued matrix A. Finally, if Σ is PD and real-valued n×r matrix A is of full column rank, i.e.,
In the remainder of this section we will show how the analogy between $X + iY$ and $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ leads to a proper definition of the variance of a complex random vector. We start by considering $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ as a real-valued variance matrix. In order to be so, first it must be symmetric:

$$\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}' = \begin{bmatrix} X' & Y' \\ -Y' & X' \end{bmatrix} = \begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$$

Hence, $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ is symmetric if and only if $X' = X$ and $Y' = -Y$. In words, X is symmetric and
Y is skew-symmetric. Clearly, the main diagonal of a skew-symmetric matrix must be zero. But of
$$a'Yb = (a'Yb)_{1\times 1} = (a'Yb)' = b'Y'a = b'(-Y)a = -b'Ya$$

Consequently, if $b = a$: $a'Ya = -a'Ya$, so $a'Ya = 0$ (and likewise $b'Yb = 0$).
Next, considering the specifications on X, Y, a, and b, we evaluate the 2×2 quadratic form:

$$\begin{aligned}
\begin{bmatrix} a & -b \\ b & a \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a & -b \\ b & a \end{bmatrix}
&= \begin{bmatrix} a' & b' \\ -b' & a' \end{bmatrix}\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \\
&= \begin{bmatrix} a'X + b'Y & b'X - a'Y \\ -b'X + a'Y & a'X + b'Y \end{bmatrix}\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \\
&= \begin{bmatrix} a'Xa + b'Ya - a'Yb + b'Xb & -a'Xb - b'Yb + b'Xa - a'Ya \\ -b'Xa + a'Ya + a'Xb + b'Yb & a'Xa + b'Ya - a'Yb + b'Xb \end{bmatrix} \\
&= \begin{bmatrix} a'Xa + b'Ya - a'Yb + b'Xb & -a'Xb + b'Xa \\ -b'Xa + a'Xb & a'Xa + b'Ya - a'Yb + b'Xb \end{bmatrix} \\
&= \begin{bmatrix} a'Xa + b'Ya - a'Yb + b'Xb & 0 \\ 0 & a'Xa + b'Ya - a'Yb + b'Xb \end{bmatrix} \\
&= \begin{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} & 0 \\ 0 & \begin{bmatrix} a \\ b \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} \end{bmatrix}
\end{aligned}$$

In the fourth line $a'Ya = b'Yb = 0$ by the skew symmetry of Y; in the fifth, $-a'Xb + b'Xa = 0$ by the symmetry of X. Therefore, $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ is PD [or NND] if and only if $\begin{bmatrix} a \\ b \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix}$ is PD [or NND].
Now the double-real 2n×2 matrix $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ is analogous with the n×1 complex vector $a + ib$. Its transpose $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}' = \begin{bmatrix} a' & b' \\ -b' & a' \end{bmatrix}$ is analogous with the 1×n complex vector $a' - ib'$. Moreover, $a' - ib' = (a - ib)' = \left(\overline{a + ib}\right)' = (a + ib)^*$, where ‘*’ is the combined operation of transposition and conjugation (order irrelevant).⁴ And $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ is analogous with the n×n complex matrix $X + iY$. Accordingly, the complex analogue of the double-real quadratic form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ is $(a + ib)^*(X + iY)(a + ib)$. Moreover, $\Gamma = X + iY$ is the variance matrix of some complex random vector $z_{n\times 1} = x + iy$ if and only if Γ is Hermitian and $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ is non-negative-definite.⁵
Because $(a + ib)^*(X + iY)(a + ib) = \begin{bmatrix} a \\ b \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} + i\cdot 0 = \begin{bmatrix} a \\ b \end{bmatrix}'\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix}$, the definiteness of $\Gamma = X + iY$ is the same as the definiteness of $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$. Therefore, a matrix qualifies as the variance matrix of some complex random vector if and only if it is Hermitian and NND. Just as the variance matrix of a real-valued random vector factors as $\Sigma = A'A_{n\times n}$ for some real-valued A, so too the variance matrix of a complex random vector factors as $\Gamma = A^*A_{n\times n}$ for some complex A.
Likewise, every invertible Hermitian NND matrix must be PD. Due to the skew symmetry of their
⁴ The transposed conjugate is sometimes called the “transjugate,” which in linear algebra is commonly symbolized with the asterisk. Physicists prefer the “dagger” notation A†, though physicist Hermann Weyl [1950, p. 17] called it the “Hermitian conjugate” and symbolized it as Ã.
⁵ It is superfluous to add ‘symmetric’ here. For $\Gamma = X + iY$ is Hermitian if and only if $\begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$ is symmetric.
imaginary parts, the main diagonals of Hermitian matrices must be real-valued. If the matrices are
NND [or PD], all the elements of their main diagonals must be non-negative [or positive].
Let Γ represent the variance of the complex random vector z. Its jkth element represents the covariance of the jth element of z with the kth element; since Γ is Hermitian, the kjth element is its conjugate. In symbols:

$$\Gamma = \mathrm{Var}[z] = \mathrm E\left[(z-\mu)\overline{(z-\mu)}'\right] = \mathrm E\left[(z-\mu)(z-\mu)^*\right]$$

The complex formula is like the real formula except that the second factor in the expectation is transjugated rather than merely transposed. This renders Γ Hermitian:

$$\Gamma^* = \mathrm E\left[(z-\mu)(z-\mu)^*\right]^* = \mathrm E\left[\left\{(z-\mu)(z-\mu)^*\right\}^*\right] = \mathrm E\left[(z-\mu)(z-\mu)^*\right] = \Gamma$$

It also renders Γ NND. For since $(z-\mu)(z-\mu)^*$ is NND, its expectation over the probability measure is NND as well. Of special interest is the standard quadratic form, viz., $(z-\mu)^*\Gamma^{-1}(z-\mu)$, where $\Gamma = \mathrm{Var}[z]$. The expectation of this quadratic form equals n, the
rank of the variance. The following proof uses the trace function. The trace of a matrix is the sum of its main-diagonal elements, and if A and B are conformable, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Moreover, the trace operator commutes with the expectation:

$$\begin{aligned}
\mathrm E\left[(z-\mu)^*\Gamma^{-1}(z-\mu)\right] &= \mathrm E\left[\mathrm{tr}\left((z-\mu)^*\Gamma^{-1}(z-\mu)\right)\right] \\
&= \mathrm E\left[\mathrm{tr}\left(\Gamma^{-1}(z-\mu)(z-\mu)^*\right)\right] \\
&= \mathrm{tr}\left(\mathrm E\left[\Gamma^{-1}(z-\mu)(z-\mu)^*\right]\right) \\
&= \mathrm{tr}\left(\Gamma^{-1}\,\mathrm E\left[(z-\mu)(z-\mu)^*\right]\right) \\
&= \mathrm{tr}\left(\Gamma^{-1}\Gamma\right) = \mathrm{tr}(I_n) = n
\end{aligned}$$
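The expectation can be verified by simulation. A minimal numpy sketch, assuming a zero-mean proper complex normal vector built from a standard one (the matrix A and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 200_000
# z = A w, where w is standard complex normal (real and imaginary
# parts independent with variance one half), so Var[z] = A A* = Gamma
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Gamma = A @ A.conj().T
w = (rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))) / np.sqrt(2)
z = A @ w

# quadratic form z* Gamma^{-1} z for each of the m draws
Ginv = np.linalg.inv(Gamma)
q = np.sum(z.conj() * (Ginv @ z), axis=0).real

assert abs(q.mean() - n) < 0.05   # expectation is n, not 2n
```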
The analogies above between complex and double-real matrices might suggest the result to be 2n. However, for real-valued random variables $\mathrm E\left[(x-\mu)'\Sigma^{-1}(x-\mu)\right] = n$, and the complex case is a superset of the real. So by extension, the complex case must be the same.
But an insight is available into why the value is n, rather than 2n. Let x and y be n×1 real-valued random vectors. Assume their means to be zero, and their variances to be identity matrices (so zero covariance):

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim \left(\mu = 0_{2n\times 1},\; \Sigma = \begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}\right)$$

Then:

$$\begin{bmatrix} x' & y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \end{bmatrix} = x'x + y'y = \sum_{j=1}^n \left(\frac{x_j^2}{1} + \frac{y_j^2}{1}\right)$$

$$\mathrm E\left[\begin{bmatrix} x' & y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x \\ y \end{bmatrix}\right] = \sum_{j=1}^n \left(\frac{\mathrm E\big[x_j^2\big]}{1} + \frac{\mathrm E\big[y_j^2\big]}{1}\right) = \sum_{j=1}^n \left(\frac{1}{1} + \frac{1}{1}\right) = 2n$$
Now let z be the n×1 complex random vector $x + iy$. Since $\mathrm E[x + iy] = 0_{n\times 1}$, the variance of z is:

$$\begin{aligned}
\Gamma = \mathrm{Var}[z] &= \mathrm{Var}[x + iy] \\
&= \mathrm{Var}\!\left(\begin{bmatrix} I_n & iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\right) \\
&= \mathrm E\!\left[\begin{bmatrix} I_n & iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\left(\begin{bmatrix} I_n & iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\right)^{\!*}\right] \\
&= \begin{bmatrix} I_n & iI_n \end{bmatrix}\mathrm{Var}\begin{bmatrix} x \\ y \end{bmatrix}\begin{bmatrix} I_n \\ -iI_n \end{bmatrix} \\
&= \begin{bmatrix} I_n & iI_n \end{bmatrix}\begin{bmatrix} I_n & 0 \\ 0 & I_n \end{bmatrix}\begin{bmatrix} I_n \\ -iI_n \end{bmatrix} \\
&= \begin{bmatrix} I_n & iI_n \end{bmatrix}\begin{bmatrix} I_n \\ -iI_n \end{bmatrix} = I_n - i^2 I_n = 2I_n
\end{aligned}$$
Accordingly:

$$z^*\Gamma^{-1}z = z^*(2I_n)^{-1}z = \sum_{j=1}^n \frac{\bar z_j z_j}{2} = \sum_{j=1}^n \frac{(x_j - iy_j)(x_j + iy_j)}{2} = \sum_{j=1}^n \frac{x_j^2 + y_j^2}{1+1} = \frac{1}{2}\begin{bmatrix} x' & y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x \\ y \end{bmatrix}$$

The complex form is half the real-valued form; hence, its expectation equals n. The condensation of the 2n real dimensions into n complex ones inverts the order of operations:

$$\sum_{j=1}^n \left(\frac{x_j^2}{1} + \frac{y_j^2}{1}\right) \;\Rightarrow\; \sum_{j=1}^n \frac{x_j^2 + y_j^2}{1+1}$$
Within the sigma operator, the sum of two quotients becomes the quotient of two sums. A proof
At this point we can derive the standard complex normal distribution. The normal distribution is $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. The standard complex normal random variable is formed from two independent real normal variables whose means equal zero and whose variances equal one half:

$$f_Z(z) = \frac{1}{\sqrt{2\pi(1/2)}}\,e^{-\frac{(x-0)^2}{2(1/2)}}\cdot\frac{1}{\sqrt{2\pi(1/2)}}\,e^{-\frac{(y-0)^2}{2(1/2)}} = \frac{1}{\sqrt{\pi}}\,e^{-x^2}\cdot\frac{1}{\sqrt{\pi}}\,e^{-y^2} = \frac{1}{\pi}\,e^{-(x^2+y^2)} = \frac{1}{\pi}\,e^{-\bar z z}$$
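The factorization is easy to confirm at a point; a small numpy sketch (`f_real` and `f_std_complex` are our own helper names):

```python
import numpy as np

def f_real(u, var):
    """Real normal density with mean zero and the given variance."""
    return np.exp(-u**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def f_std_complex(z):
    """Standard complex normal density: (1/pi) * exp(-conj(z) z)."""
    return np.exp(-(z.conjugate() * z).real) / np.pi

z = 0.3 - 0.7j
# product of the two half-variance real densities equals the complex density
assert np.isclose(f_real(z.real, 0.5) * f_real(z.imag, 0.5), f_std_complex(z))
```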
Likewise, the density of the standard complex normal random vector of dimension n is $\frac{1}{\pi^n}\,e^{-z^*z}$.

$$\Gamma = \mathrm{Var}[z] = \mathrm E\left[(z-\mu)\overline{(z-\mu)}'\right] = \mathrm E\left[(z-\mu)(z-\mu)^*\right]$$
The naïve formula differs from this by one critical symbol (prime versus asterisk):

$$C = \mathrm E\left[(z-\mu)(z-\mu)'\right]$$
⁶ Cf. Appendix C for eigen-decomposition and diagonalization. We believe the insight about commuting sums and quotients to be valuable as an abstraction. But of course, a vector of n independent complex random variables of unit variance translates into a vector of 2n independent real random variables of half-unit variance, and $\sum_{j=1}^{2n}\frac{1}{2} = n$. Because of the half-unit real variance, the formula in the next paragraph for the standard complex normal distribution, lacking any factors of two, is simpler than the formula for the standard real normal distribution.
This naïveté leads many to conclude that $\mathrm{Var}[iz] = i^2\mathrm{Var}[z] = -\mathrm{Var}[z]$, whereas it is actually:⁷

$$\mathrm{Var}[iz] = \mathrm E\left[i(z-\mu)\{i(z-\mu)\}^*\right] = \mathrm E\left[i(z-\mu)(z-\mu)^*\bar i\,\right] = i\bar i\;\mathrm E\left[(z-\mu)(z-\mu)^*\right] = i(-i)\mathrm{Var}[z] = \mathrm{Var}[z]$$
Nevertheless, there is a role for the naïve formula, which reduces to:

$$C = \mathrm E\left[(z-\mu)(z-\mu)'\right] = \mathrm E\left[(z-\mu)\overline{(\bar z-\bar\mu)}'\right] = \mathrm E\left[(z-\mu)(\bar z-\bar\mu)^*\right] = \mathrm{Cov}[z, \bar z]$$
Veeravalli [2006], whose notation we follow, calls C the “relation matrix.” The Wikipedia article “Complex normal distribution” calls it the “pseudocovariance matrix.” Because of the naïveté that leads many to a false conclusion, we prefer the ‘pseudo’ terminology (better, “pseudovariance”) to something as bland as “relation matrix.” However, a useful and non-pejorative concept is what we call the “augmented variance.” The augmented variance is the variance of the complex random vector z augmented with its conjugate $\bar z$, i.e., of the 2n×1 vector $\begin{bmatrix} z \\ \bar z \end{bmatrix}$. Its expectation is $\mathrm E\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} \mathrm E[z] \\ \mathrm E[\bar z] \end{bmatrix} = \begin{bmatrix} \mu \\ \bar\mu \end{bmatrix}$. And its variance is
$$\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} \mathrm{Cov}[z,z] & \mathrm{Cov}[z,\bar z] \\ \mathrm{Cov}[\bar z,z] & \mathrm{Cov}[\bar z,\bar z] \end{bmatrix} = \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$$

In two ways this matrix is redundant: $\mathrm{Cov}[\bar z, \bar z] = \overline{\mathrm{Cov}[z, z]} = \bar\Gamma$, and $\mathrm{Cov}[\bar z, z] = \overline{\mathrm{Cov}[z, \bar z]} = \bar C$; the bottom row is reproducible from the top.
As with any valid variance matrix, the augmented variance must be Hermitian. Hence,

$$\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix} = \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^* = \begin{bmatrix} \bar\Gamma & \bar C \\ C & \Gamma \end{bmatrix}' = \begin{bmatrix} \Gamma^* & C' \\ \bar C' & \bar\Gamma^* \end{bmatrix}$$

from which follow $\Gamma^* = \Gamma$ and $C' = C$.
Moreover, it must be at least NND, if not PD. It is important to note from this that the pseudovariance is an essential part of the augmented z; it is possible for two random variables to have the same variance and to covary differently with their conjugates. How a complex random vector covaries with its conjugate is useful information; it is even a parameter of the general complex normal distribution, which we now derive from the real-valued normal form:

$$\begin{bmatrix} x \\ y \end{bmatrix}_{2n\times 1} \sim N\!\left(\mu_{2n\times 1} = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},\; \Sigma_{2n\times 2n} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right)$$
According to this variance structure, the real and imaginary parts of z may covary, as long as the covariance is symmetric: $\Sigma_{yx} = \Sigma_{xy}'$. The grand Σ matrix must be symmetric and PD. The density function of the real-valued normal multivariate is:⁸

$$f_{\begin{bmatrix} x \\ y \end{bmatrix}}(x, y) = \frac{1}{\sqrt{(2\pi)^{2n}|\Sigma|}}\; e^{-\frac12\begin{bmatrix} x'-\mu_x' & y'-\mu_y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}} = \frac{1}{\pi^n\sqrt{2^{2n}|\Sigma|}}\; e^{-\frac12\begin{bmatrix} x'-\mu_x' & y'-\mu_y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}}$$
⁸ As derived briefly by Judge [1988, pp. 49f]. Chapter 4 of Johnson [1992] is thorough. To be precise, the determinant $|\Sigma|$ under the radical should be its absolute value; however, the determinant of a PD matrix must be positive (cf. Judge [1988, A.14(1)]).
Since $z = x + iy$, the augmented vector is $\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} x+iy \\ x-iy \end{bmatrix} = \begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \Xi_n\begin{bmatrix} x \\ y \end{bmatrix}$. We will call $\Xi_n = \begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}$ the augmentation matrix; this linear function of the real-valued vectors produces the augmented complex vector. Note that

$$\Xi_n\Xi_n^* = \begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} I_n & I_n \\ -iI_n & iI_n \end{bmatrix} = \begin{bmatrix} 2I_n & 0 \\ 0 & 2I_n \end{bmatrix} = 2I_{2n}$$
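The identities involving the augmentation matrix are one-liners to verify; a numpy sketch (n = 3 is an arbitrary choice):

```python
import numpy as np

n = 3
I = np.eye(n)
Xi = np.block([[I, 1j * I], [I, -1j * I]])    # augmentation matrix

assert np.allclose(Xi @ Xi.conj().T, 2 * np.eye(2 * n))
assert np.allclose(Xi.conj().T @ Xi, 2 * np.eye(2 * n))
# hence the inverse is half the transjugate: Xi^{-1} = Xi* / 2
assert np.allclose(np.linalg.inv(Xi), Xi.conj().T / 2)
```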
The augmented mean is $\mathrm E\begin{bmatrix} z \\ \bar z \end{bmatrix} = \Xi_n\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} = \begin{bmatrix} \mu_x + i\mu_y \\ \mu_x - i\mu_y \end{bmatrix} = \begin{bmatrix} \mu \\ \bar\mu \end{bmatrix}$. The augmented variance is:
$$\begin{aligned}
\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix} &= \Xi_n\Sigma\,\Xi_n^* \\
&= \begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} I_n & I_n \\ -iI_n & iI_n \end{bmatrix} \\
&= \begin{bmatrix} \Sigma_{xx} + i\Sigma_{yx} & \Sigma_{xy} + i\Sigma_{yy} \\ \Sigma_{xx} - i\Sigma_{yx} & \Sigma_{xy} - i\Sigma_{yy} \end{bmatrix}\begin{bmatrix} I_n & I_n \\ -iI_n & iI_n \end{bmatrix} \\
&= \begin{bmatrix} \Sigma_{xx} + \Sigma_{yy} - i(\Sigma_{xy} - \Sigma_{yx}) & \Sigma_{xx} - \Sigma_{yy} + i(\Sigma_{xy} + \Sigma_{yx}) \\ \Sigma_{xx} - \Sigma_{yy} - i(\Sigma_{xy} + \Sigma_{yx}) & \Sigma_{xx} + \Sigma_{yy} + i(\Sigma_{xy} - \Sigma_{yx}) \end{bmatrix} \\
&= \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}
\end{aligned}$$
And so:
−1
z Γ C
−1
Var = (
= Ξ n ΣΞ *n )
−1
( )
= Ξ *n
−1
Σ −1 (Ξ n )
−1
z C Γ
−1
z −1 Γ C −1
This can be reformulated as Ξ Var Ξ n = Ξ *n
*
Ξn = Σ .
Γ
n
z C
We now work these augmented forms into the probability density function:

$$\begin{aligned}
f_{\begin{bmatrix} x \\ y \end{bmatrix}}(x, y) &= \frac{1}{\pi^n\sqrt{2^{2n}|\Sigma|}}\; e^{-\frac12\begin{bmatrix} x'-\mu_x' & y'-\mu_y' \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}} \\
&= \frac{1}{\pi^n\sqrt{|2I_{2n}|\,|\Sigma|}}\; e^{-\frac12\begin{bmatrix} x'-\mu_x' & y'-\mu_y' \end{bmatrix}\Xi_n^*\,\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}^{-1}\Xi_n\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}} \\
&= \frac{1}{\pi^n\sqrt{|\Xi_n\Xi_n^*|\,|\Sigma|}}\; e^{-\frac12\left(\Xi_n\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}\right)^{\!*}\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}^{-1}\left(\Xi_n\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}\right)} \\
&= \frac{1}{\pi^n\sqrt{\left|\Xi_n\Sigma\,\Xi_n^*\right|}}\; e^{-\frac12\begin{bmatrix} \bar z'-\bar\mu' & z'-\mu' \end{bmatrix}\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}^{-1}\begin{bmatrix} z-\mu \\ \bar z-\bar\mu \end{bmatrix}} \\
&= \frac{1}{\pi^n\sqrt{\left|\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}\right|}}\; e^{-\frac12\begin{bmatrix} \bar z'-\bar\mu' & z'-\mu' \end{bmatrix}\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}^{-1}\begin{bmatrix} z-\mu \\ \bar z-\bar\mu \end{bmatrix}}
\end{aligned}$$
However, this is not quite the density function of z, since the differential volume has not been considered. The correct formula is $f_z(z)\,dV_z = f_{\begin{bmatrix} x \\ y \end{bmatrix}}(x, y)\,dV_{xy}$. The differential volume in the xy coordinates is $dV_{xy} = \prod_{j=1}^n dx_j\,dy_j$. A change of $dx_j$ entails an equal change in the real part of $dz_j$, even as a change of $dy_j$ entails an equal change in the imaginary part of $dz_j$. Accordingly, $dV_z = \prod_{j=1}^n dx_j\,(i\cdot dy_j) = i^n\prod_{j=1}^n dx_j\,dy_j$. It so happens that the relevant magnitude is $|i^n| = 1$, so $dV_z = 1\cdot\prod_{j=1}^n dx_j\,dy_j = dV_{xy}$.
So finally, the probability density function of the complex random vector z is:⁹

⁹ This will be abstruse to some actuaries. However, the integration theory is implicit in the change-of-variables technique outlined in Hogg [1984, pp 42-46]. That the n×n determinant represents “the volume function of an n-dimensional parallelepiped” is beautifully explained in Chapter 4 of Schneider [1973].

$$\begin{aligned}
f_z(z) = f_z(z)\cdot 1 = f_z(z)\,\frac{dV_z}{dV_{xy}} &= f_{\begin{bmatrix} x \\ y \end{bmatrix}}(x, y) \\
&= \frac{1}{\pi^n\sqrt{\left|\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}\right|}}\; e^{-\frac12\begin{bmatrix} \bar z'-\bar\mu' & z'-\mu' \end{bmatrix}\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}^{-1}\begin{bmatrix} z-\mu \\ \bar z-\bar\mu \end{bmatrix}} \\
&= \frac{1}{\pi^n\sqrt{\left|\begin{matrix} \Gamma & C \\ \bar C & \bar\Gamma \end{matrix}\right|}}\; e^{-\frac12\begin{bmatrix} \bar z'-\bar\mu' & z'-\mu' \end{bmatrix}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^{-1}\begin{bmatrix} z-\mu \\ \bar z-\bar\mu \end{bmatrix}}
\end{aligned}$$
This formula is equivalent to the one found in the Wikipedia article “Complex normal distribution.” Although $|\Gamma|\left|\bar\Gamma - \bar C\Gamma^{-1}C\right|$ appears within the radical of that article’s formula, it can be shown that

$$\left|\begin{matrix} \Gamma & C \\ \bar C & \bar\Gamma \end{matrix}\right| = |\Gamma|\left|\bar\Gamma - \bar C\Gamma^{-1}C\right|$$

As far as allowable parameters are concerned, µ may be any complex vector. $\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$ is allowed if and only if $\Xi_n^*\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}\Xi_n = 4\Sigma$ is real-valued and PD.
Veeravalli [2006] defines a “proper” complex variable as one whose pseudo[co]variance matrix is $0_{n\times n}$. Inserting zero for C into the formula, we derive the probability density function of a proper complex normal random vector:

$$\begin{aligned}
f_z(z) &= \frac{1}{\pi^n\sqrt{\left|\begin{matrix} \Gamma & 0 \\ 0 & \bar\Gamma \end{matrix}\right|}}\; e^{-\frac12\begin{bmatrix} \bar z'-\bar\mu' & z'-\mu' \end{bmatrix}\begin{bmatrix} \Gamma & 0 \\ 0 & \bar\Gamma \end{bmatrix}^{-1}\begin{bmatrix} z-\mu \\ \bar z-\bar\mu \end{bmatrix}} \\
&= \frac{1}{\pi^n\sqrt{|\Gamma|\,|\bar\Gamma|}}\; e^{-\frac12\left\{(\bar z'-\bar\mu')\Gamma^{-1}(z-\mu) + (z'-\mu')\bar\Gamma^{-1}(\bar z-\bar\mu)\right\}} \\
&= \frac{1}{\pi^n\sqrt{|\Gamma|\,|\bar\Gamma|}}\; e^{-\frac12\left\{(\bar z'-\bar\mu')\Gamma^{-1}(z-\mu) + \overline{(\bar z'-\bar\mu')\Gamma^{-1}(z-\mu)}\right\}} \\
&= \frac{1}{\pi^n\sqrt{|\Gamma|\,|\bar\Gamma|}}\; e^{-\frac12\left\{(\bar z'-\bar\mu')\Gamma^{-1}(z-\mu) + (\bar z'-\bar\mu')\Gamma^{-1}(z-\mu)\right\}} \\
&= \frac{1}{\pi^n|\Gamma|}\; e^{-(\bar z'-\bar\mu')\Gamma^{-1}(z-\mu)} \\
&= \frac{1}{\pi^n|\Gamma|}\; e^{-(z-\mu)^*\Gamma^{-1}(z-\mu)}
\end{aligned}$$

The transformations in the last several lines rely on the fact that Γ is Hermitian and PD: the quadratic form $(z-\mu)^*\Gamma^{-1}(z-\mu)$ equals its own conjugate, hence is real-valued, and $|\bar\Gamma| = |\Gamma| > 0$. Now the standard complex random vector is a proper complex random vector with mean zero and variance $I_n$. Therefore, in confirmation of Section 5, its density function is $\frac{1}{\pi^n}\,e^{-z^*z}$.
random vector: $w_{n\times 1} = e^{z_{n\times 1}}$. Its conjugate also is lognormal, since $\bar w = \overline{e^z} = e^{\bar z}$. Deriving the density of w is complicated by the fact that the complex exponential function is not one-to-one. Specifically, $e^z = w = e^{z + i(2\pi k)}$ for any integer k. So unlike the real-valued lognormal random variable, whose density function can be found in Klugman [1998, §A.4.11], an analytic form for the complex lognormal density is not available. However, even for the real-valued lognormal the density function is of little value; its moments are commonly derived from the moment generating function of the normal variable on which it is based. So too, the moment generating function of the complex normal random vector is available for deriving the lognormal moments.
We hereby define the moment generating function of the complex n×1 random vector z as $M_z(s_{n\times 1}, t_{n\times 1}) = \mathrm E\left[e^{s'z + t'\bar z}\right]$. Since this definition may differ from other definitions in the sparse literature, we should justify it. First, because we will take derivatives of this function with respect to s and t, the function must be differentiable. This demands simple transposition in the linear combination, i.e., $s'z + t'\bar z$ rather than the transjugation $s^*z + t^*\bar z$. For transjugation would involve derivatives of the form $\frac{d\bar s}{ds}$, which do not exist, as they violate the Cauchy-Riemann condition.¹⁰ Second, even though moments of $\bar z$ are conjugates of moments of z, we will need second-order moments involving both z and $\bar z$. For this reason both terms must be in the exponent of the moment generating function.
¹⁰ Cf. Appendix D.1.3 of Havil [2003]. Express $f(z = x + iy)$ in terms of real-valued functions, i.e., as $u(x,y) + i\cdot v(x,y)$. The derivative is based on the matrix of real-valued partial derivatives $\begin{bmatrix} \partial u/\partial x & \partial v/\partial x \\ \partial u/\partial y & \partial v/\partial y \end{bmatrix}$. For the derivative to be the same in both directions, the Cauchy-Riemann condition must hold, viz., that $\partial u/\partial x = \partial v/\partial y$ and $\partial v/\partial x = -\partial u/\partial y$. But for $f(z) = \bar z = x - iy$, the partial-derivative matrix is $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$; hence $\partial u/\partial x \ne \partial v/\partial y$. The Cauchy-Riemann condition becomes intuitive when one regards a valid complex derivative as a double-real 2×2 matrix (Section 3). Compare this with $f(z) = z = x + iy$, whose matrix is $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, which represents the complex number 1.
We start with terminology from Section 7, viz., that $\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} x+iy \\ x-iy \end{bmatrix} = \begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \Xi_n\begin{bmatrix} x \\ y \end{bmatrix}$ and that $\begin{bmatrix} x \\ y \end{bmatrix} \sim N\!\left(\mu_{2n\times 1} = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},\; \Sigma_{2n\times 2n} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right)$. According to Appendix A, the moment generating function of the real-valued normal random vector $\begin{bmatrix} x \\ y \end{bmatrix}$ is:

$$M_{\begin{bmatrix} x \\ y \end{bmatrix}}\!\begin{bmatrix} a \\ b \end{bmatrix} = \mathrm E\!\left[e^{\begin{bmatrix} a' & b' \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}}\right] = e^{\begin{bmatrix} a' & b' \end{bmatrix}\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} + \frac12\begin{bmatrix} a' & b' \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix}}$$
Consequently:

$$\begin{aligned}
M_z(s, t) = \mathrm E\!\left[e^{s'z + t'\bar z}\right]
&= \mathrm E\!\left[e^{\begin{bmatrix} s' & t' \end{bmatrix}\begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}}\right] \\
&= \mathrm E\!\left[e^{\begin{bmatrix} s'+t' & i(s'-t') \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}}\right] \\
&= M_{\begin{bmatrix} x \\ y \end{bmatrix}}\!\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix} \\
&= e^{\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} + \frac12\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix}}
\end{aligned}$$
It is so that we could invoke it here that Appendix B went to the trouble of proving that complex arguments are legitimate in the real-valued moment generating function. Two simplifications are now possible. First, as to the mean term:

$$\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} = (s+t)'\mu_x + i(s-t)'\mu_y = s'(\mu_x + i\mu_y) + t'(\mu_x - i\mu_y) = s'\mu_z + t'\bar\mu_z$$
And second, again from Section 7, $\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix} = \Xi_n\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\Xi_n^*$, or equivalently, $\dfrac{\Xi_n^*}{2}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}\dfrac{\Xi_n}{2} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}$. Hence:

$$\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix} = \begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\frac{\Xi_n^*}{2}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}\frac{\Xi_n}{2}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix}$$

On the right side, $\dfrac{\Xi_n}{2}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix} = \dfrac12\begin{bmatrix} I_n & iI_n \\ I_n & -iI_n \end{bmatrix}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix} = \dfrac12\begin{bmatrix} (s+t) - (s-t) \\ (s+t) + (s-t) \end{bmatrix} = \begin{bmatrix} t \\ s \end{bmatrix}$. And on the left side, $\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\dfrac{\Xi_n^*}{2} = \begin{bmatrix} s' & t' \end{bmatrix}$. So:

$$\begin{bmatrix} (s+t)' & i(s-t)' \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} s+t \\ i(s-t) \end{bmatrix} = \begin{bmatrix} s' & t' \end{bmatrix}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}\begin{bmatrix} t \\ s \end{bmatrix}$$
And the simplified expression, based on the mean and the augmented variance of z, is:

$$M_z(s, t) = \mathrm E\!\left[e^{s'z + t'\bar z}\right] = e^{s'\mu_z + t'\bar\mu_z + \frac12\begin{bmatrix} s' & t' \end{bmatrix}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}\begin{bmatrix} t \\ s \end{bmatrix}} = e^{s'\mu + t'\bar\mu + \frac12\left(s'\Gamma t + s'Cs + t'\bar Ct + t'\bar\Gamma s\right)}$$
As in Appendix A, let $e_j$ denote the jth unit vector of $\Re^n$, or even better, of $\mathbb C^n$. Then:

$$\mathrm E\!\left[e^{z_j}\right] = M_z(e_j, 0) = e^{\mu_j + \frac12 C_{jj}}$$

$$\mathrm E\!\left[e^{\bar z_j}\right] = \mathrm E\!\left[\overline{e^{z_j}}\right] = M_z(0, e_j) = e^{\bar\mu_j + \frac12 \bar C_{jj}} = \overline{\mathrm E\!\left[e^{z_j}\right]}$$
Moreover:

$$\mathrm E\!\left[e^{z_j}e^{z_k}\right] = \mathrm E\!\left[e^{z_j + z_k}\right] = M_z(e_j + e_k, 0) = e^{\mu_j + \mu_k + \frac12\left(C_{jj} + C_{jk} + C_{kj} + C_{kk}\right)}$$

$$= e^{\mu_j + \frac12 C_{jj}}\; e^{\mu_k + \frac12 C_{kk}}\; e^{\frac12 C_{jk} + \frac12 C_{kj}} = \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{z_k}\right]\cdot e^{C_{jk}}$$

since $C_{kj} = C_{jk}$ by the symmetry of C.
Hence, mindful of the transjugation (*) in the definition of complex covariance, we have:

$$\mathrm{Cov}\!\left[e^{z_j}, e^{\bar z_k}\right] = \mathrm E\!\left[e^{z_j}\overline{e^{\bar z_k}}\right] - \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[\overline{e^{\bar z_k}}\right] = \mathrm E\!\left[e^{z_j}e^{z_k}\right] - \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{z_k}\right] = \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{z_k}\right]\left(e^{C_{jk}} - 1\right)$$

This translates as $\mathrm{Cov}[w, \bar w] = \mathrm E[w]\,\mathrm E[w]' \circ \left(e^{C} - 1_{n\times n}\right)$, where ‘∘’ and the matrix exponential are elementwise.¹¹ By conjugation, $\mathrm{Cov}\!\left[e^{\bar z_j}, e^{z_k}\right] = \overline{\mathrm{Cov}\!\left[e^{z_j}, e^{\bar z_k}\right]} = \overline{\mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{z_k}\right]}\left(e^{\bar C_{jk}} - 1\right)$, which translates as $\mathrm{Cov}[\bar w, w] = \overline{\mathrm E[w]\,\mathrm E[w]'} \circ \left(e^{\bar C} - 1_{n\times n}\right)$.
Similarly:

$$\mathrm E\!\left[e^{z_j}e^{\bar z_k}\right] = M_z(e_j, e_k) = e^{\mu_j + \bar\mu_k + \frac12\left(\Gamma_{jk} + C_{jj} + \bar C_{kk} + \bar\Gamma_{kj}\right)} = e^{\mu_j + \frac12 C_{jj}}\; e^{\bar\mu_k + \frac12 \bar C_{kk}}\; e^{\frac12\Gamma_{jk} + \frac12\bar\Gamma_{kj}} = \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{\bar z_k}\right]\cdot e^{\Gamma_{jk}}$$

since $\bar\Gamma_{kj} = \Gamma_{jk}$ by the Hermitian symmetry of Γ. Therefore, $\mathrm{Cov}\!\left[e^{z_j}, e^{z_k}\right] = \mathrm E\!\left[e^{z_j}e^{\bar z_k}\right] - \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{\bar z_k}\right] = \mathrm E\!\left[e^{z_j}\right]\mathrm E\!\left[e^{\bar z_k}\right]\left(e^{\Gamma_{jk}} - 1\right)$, which translates as

$$\mathrm{Cov}[w, w] = \mathrm E[w]\,\mathrm E[w]^* \circ \left(e^{\Gamma} - 1_{n\times n}\right)$$

By conjugation, $\mathrm{Cov}\!\left[e^{\bar z_j}, e^{\bar z_k}\right] = \overline{\mathrm E\!\left[e^{z_j}\right]}\,\mathrm E\!\left[e^{z_k}\right]\left(e^{\bar\Gamma_{jk}} - 1\right)$, which translates as $\mathrm{Cov}[\bar w, \bar w] = \mathrm E[\bar w]\,\mathrm E[\bar w]^* \circ \left(e^{\bar\Gamma} - 1_{n\times n}\right)$.
11Elementwise multiplication is formally known as the Hadamard, or Hadamard-Schur, product, of which we will make
use in Appendices A and C. Cf. Million [2007].
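The scalar case of these lognormal mean and covariance formulas can be checked by simulation. A hedged numpy sketch (the parameter values σ, τ, ρ and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
m = 1_000_000
sigma, tau, rho = 0.4, 0.3, 0.5

cov = np.array([[sigma**2, rho * sigma * tau],
                [rho * sigma * tau, tau**2]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=m)
z = xy[:, 0] + 1j * xy[:, 1]
w = np.exp(z)                                     # complex lognormal draws

Gamma = sigma**2 + tau**2                         # Var[Z]
C = sigma**2 - tau**2 + 2j * rho * sigma * tau    # pseudovariance
Ew = np.exp(C / 2)                                # E[W], since mu = 0

assert abs(w.mean() - Ew) < 0.01
# variance:        |E[W]|^2 (e^Gamma - 1)
var_w = np.mean(np.abs(w - w.mean())**2)
assert abs(var_w - abs(Ew)**2 * (np.exp(Gamma) - 1)) < 0.02
# pseudovariance:  E[W]^2 (e^C - 1)
pseudo_w = np.mean((w - w.mean())**2)
assert abs(pseudo_w - Ew**2 * (np.exp(C) - 1)) < 0.02
```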
We conclude this section by expressing it all in terms of $w_{n\times 1} = e^{z_{n\times 1}}$. Let z be complex normal with mean µ and augmented variance $\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix} = \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$. And let D be the n×1 vector consisting of the main diagonal of C. Then $\bar D$ is the vectorization of the diagonal of $\bar C$. So the augmented mean of w is $\mathrm E\begin{bmatrix} w \\ \bar w \end{bmatrix} = \begin{bmatrix} e^{\mu + D/2} \\ e^{\bar\mu + \bar D/2} \end{bmatrix}$. And the augmented variance of w is:

$$\begin{aligned}
\mathrm{Var}\begin{bmatrix} w \\ \bar w \end{bmatrix} &= \begin{bmatrix} \mathrm{Cov}[w,w] & \mathrm{Cov}[w,\bar w] \\ \mathrm{Cov}[\bar w,w] & \mathrm{Cov}[\bar w,\bar w] \end{bmatrix} \\
&= \begin{bmatrix} \mathrm E[w]\mathrm E[w]^* \circ \left(e^{\Gamma} - 1_{n\times n}\right) & \mathrm E[w]\mathrm E[w]' \circ \left(e^{C} - 1_{n\times n}\right) \\ \overline{\mathrm E[w]\mathrm E[w]'} \circ \left(e^{\bar C} - 1_{n\times n}\right) & \mathrm E[\bar w]\mathrm E[\bar w]^* \circ \left(e^{\bar\Gamma} - 1_{n\times n}\right) \end{bmatrix} \\
&= \mathrm E\begin{bmatrix} w \\ \bar w \end{bmatrix}\mathrm E\begin{bmatrix} w \\ \bar w \end{bmatrix}^* \circ \left(e^{\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}} - 1_{2n\times 2n}\right)
\end{aligned}$$
Scaling all the lognormal means to unity (or setting $\mu = -D/2$), we can say that the coefficient-of-lognormal-augmented-variation matrix equals $e^{\mathrm{Var}\begin{bmatrix} z \\ \bar z \end{bmatrix}} - 1_{2n\times 2n}$, which is analogous with the well-known coefficient of variation of the real-valued lognormal, $\sqrt{e^{\sigma^2} - 1}$.

Since the multivariate treatment of the complex lognormal is abstract, this section provides some intuition into it. The complex lognormal random variable, or scalar, derives from the real-valued normal bivariate $\begin{bmatrix} X \\ Y \end{bmatrix} \sim N\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{bmatrix}\right)$. Zero is not much of a restriction; since $e^{CN(\mu, V)} = e^{\mu + CN(0, V)} = e^{\mu}e^{CN(0, V)}$, the normal mean affects only the scale of the lognormal. The variance is written in correlation form, where $-1 \le \rho \le 1$. As usual, $0 < \sigma, \tau < \infty$.
Define $\begin{bmatrix} Z \\ \bar Z \end{bmatrix} = \Xi_1\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 1 & i \\ 1 & -i \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} X + iY \\ X - iY \end{bmatrix}$. Its mean is zero, and according to Section 7 its variance is:

$$\begin{aligned}
\mathrm{Var}\begin{bmatrix} Z \\ \bar Z \end{bmatrix} &= \begin{bmatrix} 1 & i \\ 1 & -i \end{bmatrix}\begin{bmatrix} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix} \\
&= \begin{bmatrix} \sigma^2 + \tau^2 - i(\rho\sigma\tau - \rho\sigma\tau) & \sigma^2 - \tau^2 + i(\rho\sigma\tau + \rho\sigma\tau) \\ \sigma^2 - \tau^2 - i(\rho\sigma\tau + \rho\sigma\tau) & \sigma^2 + \tau^2 + i(\rho\sigma\tau - \rho\sigma\tau) \end{bmatrix} \\
&= \begin{bmatrix} \sigma^2 + \tau^2 & \sigma^2 - \tau^2 + 2i\rho\sigma\tau \\ \sigma^2 - \tau^2 - 2i\rho\sigma\tau & \sigma^2 + \tau^2 \end{bmatrix}
\end{aligned}$$
We will say little about non-zero correlation ($\rho \ne 0$); but at this point a digression on complex correlation is apt. The coefficient of correlation between Z and its conjugate is:

$$\rho_{Z\bar Z} = \frac{\sigma^2 - \tau^2 + 2i\rho\sigma\tau}{\sigma^2 + \tau^2}, \qquad \rho_{\bar Z Z} = \frac{\sigma^2 - \tau^2 - 2i\rho\sigma\tau}{\sigma^2 + \tau^2} = \overline{\rho_{Z\bar Z}}$$
$$0 \le \rho_{Z\bar Z}\,\overline{\rho_{Z\bar Z}} = \rho_{Z\bar Z}\,\rho_{\bar Z Z} = \frac{\left(\sigma^2 - \tau^2\right)^2 + 4\rho^2\sigma^2\tau^2}{\left(\sigma^2 + \tau^2\right)^2} \le \frac{\left(\sigma^2 - \tau^2\right)^2 + 4(1)\sigma^2\tau^2}{\left(\sigma^2 + \tau^2\right)^2} = \frac{\left(\sigma^2 + \tau^2\right)^2}{\left(\sigma^2 + \tau^2\right)^2} = 1$$
So, the magnitude of complex correlation is not greater than unity. The imaginary part of the correlation is zero unless some correlation exists between the real and imaginary parts of the underlying normal bivariate. Two limiting cases are of interest: $\tau^2 \to 0^+$ and $\sigma^2 \to 0^+$. In the first case, $\bar Z \to Z$ in a statistical sense, and the correlation approaches one. In the second case, $\bar Z \to -Z$, and the correlation approaches negative one.
Applying the lognormal formulas of Section 8, and noting that $\mathrm E[W] = e^{C/2} = e^{(\sigma^2-\tau^2)/2}\,e^{i\rho\sigma\tau}$:

$$\begin{aligned}
\mathrm{Var}\begin{bmatrix} W \\ \bar W \end{bmatrix} &= \mathrm E\begin{bmatrix} W \\ \bar W \end{bmatrix}\mathrm E\begin{bmatrix} W \\ \bar W \end{bmatrix}^* \circ \left(e^{\mathrm{Var}\begin{bmatrix} Z \\ \bar Z \end{bmatrix}} - 1_{2\times 2}\right) \\
&= e^{\sigma^2-\tau^2}\begin{bmatrix} 1 & e^{2i\rho\sigma\tau} \\ e^{-2i\rho\sigma\tau} & 1 \end{bmatrix} \circ \begin{bmatrix} e^{\sigma^2+\tau^2} - 1 & e^{\sigma^2-\tau^2+2i\rho\sigma\tau} - 1 \\ e^{\sigma^2-\tau^2-2i\rho\sigma\tau} - 1 & e^{\sigma^2+\tau^2} - 1 \end{bmatrix} \\
&= \begin{bmatrix} e^{2\sigma^2} - e^{\sigma^2-\tau^2} & e^{2\sigma^2-2\tau^2+4i\rho\sigma\tau} - e^{\sigma^2-\tau^2}e^{2i\rho\sigma\tau} \\ e^{2\sigma^2-2\tau^2-4i\rho\sigma\tau} - e^{\sigma^2-\tau^2}e^{-2i\rho\sigma\tau} & e^{2\sigma^2} - e^{\sigma^2-\tau^2} \end{bmatrix}
\end{aligned}$$
In the first case above, as $\tau^2 \to 0^+$, $\mathrm E\begin{bmatrix} W \\ \bar W \end{bmatrix} \to e^{\sigma^2/2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathrm{Var}\begin{bmatrix} W \\ \bar W \end{bmatrix} \to e^{\sigma^2}\left(e^{\sigma^2} - 1\right)\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Since the imaginary part Y becomes probability-limited to its mean of zero, the complex lognormal degenerates to the real-valued $W = e^X$. The limiting result is oblivious to the underlying correlation ρ, since $\bar W \to W$.
In the second case, as $\sigma^2 \to 0^+$, $E\begin{bmatrix} W \\ \bar W \end{bmatrix} \to e^{-\tau^2/2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\operatorname{Var}\begin{bmatrix} W \\ \bar W \end{bmatrix} \to e^{-\tau^2}\begin{bmatrix} e^{\tau^2} - 1 & e^{-\tau^2} - 1 \\ e^{-\tau^2} - 1 & e^{\tau^2} - 1 \end{bmatrix}$. As in the first case, both $E[W] = E[\bar W]$ and the underlying correlation ρ has disappeared. Nevertheless, the variance shows W and its conjugate to differ; in fact, their correlation is the real-valued
\[
\rho_{W\bar W} = \frac{e^{-\tau^2} - 1}{e^{\tau^2} - 1} = -e^{-\tau^2} = -E[W]^2.
\]
Since $\tau^2 > 0$, $-1 < \rho_{W\bar W} < 0$ and $0 < E[W] = E[\bar W] < 1$.
Both these cases are understandable from the “geometry” of $W = e^Z = e^{X+iY} = e^X e^{iY}$. The complex exponential function is the basis of polar coordinates; $e^X$ is the magnitude of W, and Y is the angle of W in radians counterclockwise from the real axis of the complex plane. Imagine a cannon whose angle and range can be set. In the first case, the angle is fixed at zero, but the range is variable. This makes for a lognormal distribution along the positive real axis. In the second case, the cannon’s angle varies, but its range is fixed at $e^0 = 1$. This makes all the shots land on the complex unit circle; hence, their mean lies within the circle, i.e., $|E[W]| < 1$. Moreover, the symmetry of Y as $N(0, \tau^2)$-distributed guarantees $E[W]$ to fall on the real axis, or $-1 < E[W] < 1$. Furthermore, since the normal density function strictly decreases in both directions from the mean, more shots land to the right of the imaginary axis than to the left, so $0 < E[W] = e^{-\tau^2/2} < 1$. A “right-handed” cannon, or a cannon whose angle is measured clockwise from the real axis, fires $\bar W = e^X e^{-iY}$ shots.
A shot from an unrestricted cannon will “almost surely” not land on the real axis. 12 If we desire negative values from the complex lognormal random variable, as a practical matter we must extract them from its real or imaginary parts, e.g., $U = \operatorname{Re}(W)$. One can see in the second case that as $\tau^2$ grows larger, so too grows the probability that $U < 0$. As $\tau^2 \to \infty$, the probability approaches one half. In the limit, the shots are uniformly distributed around the complex unit circle, and U follows the arcsine density
\[
f_U(u) = \frac{1}{\pi\sqrt{1 - u^2}}, \quad \text{for } -1 \le u \le 1. \text{ 13}
\]
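The arcsine limit is easy to check by simulation. For Θ uniform on $[0, 2\pi)$, $U = \cos\Theta$ has distribution function $F_U(u) = 1 - \arccos(u)/\pi$, whose derivative is the arcsine density above; the sketch below (sample size is an arbitrary choice) compares the empirical distribution with this closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2*np.pi, size=1_000_000)
U = np.cos(theta)                      # real part of a uniform unit-circle shot

# About half of the shots land left of the imaginary axis
assert abs(np.mean(U < 0) - 0.5) < 0.005

# Empirical CDF vs the arcsine CDF F(u) = 1 - arccos(u)/pi
for u in (-0.9, -0.5, 0.0, 0.5, 0.9):
    F = 1 - np.arccos(u)/np.pi
    assert abs(np.mean(U <= u) - F) < 0.005
```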
This suggests a third case, in which $\tau^2 \to \infty$ while $\sigma^2$ remains at some positive amount. An intriguing feature of complex variables is that infinite variance in Y leads to a uniform distribution of $e^{iY}$ over the complex unit circle. 14 In this case:
\[
E\begin{bmatrix} W \\ \bar W \end{bmatrix}
= \lim_{\tau^2\to\infty}\begin{bmatrix} e^{(\sigma^2 - \tau^2 + 2i\rho\sigma\tau)/2} \\ e^{(\sigma^2 - \tau^2 - 2i\rho\sigma\tau)/2} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
\[
\operatorname{Var}\begin{bmatrix} W \\ \bar W \end{bmatrix}
= \lim_{\tau^2\to\infty}\begin{bmatrix}
e^{2\sigma^2} - e^{\sigma^2 - \tau^2} & e^{2\sigma^2 - 2\tau^2 + 4i\rho\sigma\tau} - e^{\sigma^2 - \tau^2}e^{2i\rho\sigma\tau} \\
e^{2\sigma^2 - 2\tau^2 - 4i\rho\sigma\tau} - e^{\sigma^2 - \tau^2}e^{-2i\rho\sigma\tau} & e^{2\sigma^2} - e^{\sigma^2 - \tau^2}
\end{bmatrix}
= \begin{bmatrix} e^{2\sigma^2} & 0 \\ 0 & e^{2\sigma^2} \end{bmatrix}
\]
Again, ρ has disappeared from the limiting distribution; but in this case $\rho_{W\bar W} = 0$.
12 For an event almost surely to happen means that its probability is unity; for an event almost surely not to happen
means that its probability is zero. The latter case means not that the event will not happen, but rather that the event has
zero probability mass. For example, if X ~ Uniform[0, 1], Prob[X=½] = 0. So X almost surely does not equal ½, even
though ½ is as possible as any other number in the interval.
13 For more on this bimodal Arcsine(-1, 1) distribution see Wikipedia, “Arcsine distribution.”
14 The next section expands on this important subject. “Infinite variance in Y” means “as the variance of Y approaches infinity.” It does not mean that $e^{iY}$ is uniform for a variable Y whose variance is infinite, e.g., for a Pareto random variable whose shape parameter is less than or equal to two.
15 Cf. Halliwell [2013] for a discussion on the right tails of the lognormal and other loss distributions.
In practical work with $U = \operatorname{Re}(W)$, 16 the angular part $e^{iY}$ will be more important than the lognormal range $e^X$. For example, one who wanted the tendency for the larger magnitudes of $U = \operatorname{Re}(W)$ to be positive might set the mean of Y at $-\pi/2$ and the correlation ρ to some positive value. Thus, greater than average values of Y, angling off into quadrants 4 and 1 of the complex plane, would correlate with larger than average values of X and hence of $e^X$. Of course, negative values of U would still occur, but they could be made tolerably rare. Equivalently, one could set the mean of Y at $\pi/2$ and the correlation ρ to some negative value. As a second example, one who wanted negative values of U to be less frequent than positive might set both the mean of Y and ρ to zero, and set the variance of Y so that $\operatorname{Prob}[|Y| > \pi/2]$ is desirably small. Some distributions of U for $\tau^2 \gg \sigma^2$ are bimodal, as in the specialized case $\sigma^2 \to 0^+$ and $\tau^2 \to \infty$. But less extreme parameters would result in unimodal distributions.
In the previous section we claimed that as the variance τ 2 of the normal random variable Y
approaches infinity, e iY approaches a uniform distribution over the complex unit circle. The
explanation and justification of this claim in this section prepare for an important implication in the
next.
[ ]
Let real-valued random variable Y be distributed as N μ, σ 2 , and let W = e iY . According to the
[ ]
moment-generating-formula of Section 8, M Y (it ) = E e itY = e itμ + (it ) σ
2 2
2
= e itμ −t
2
σ2 2
. Although the
16 In the absence of an analytic distribution, practical work with the complex lognormal would seem to require
simulating its values from the underlying normal distribution.
formula applies to complex values of t, here we’ll restrict it to real values. With $t \in \Re$, $M_Y(it)$ is the characteristic function of Y, whose limit as $\sigma^2 \to \infty$ is:
\[
\lim_{\sigma^2\to\infty} M_Y(it)
= \lim_{\sigma^2\to\infty} e^{it\mu - t^2\sigma^2/2}
= e^{it\mu}\lim_{\sigma^2\to\infty} e^{-t^2\sigma^2/2}
= \delta_{t0}
= \begin{cases} 1 & \text{if } t = 0 \\ 0 & \text{if } t \neq 0 \end{cases}
\]
It is noteworthy, and indicative of a uniformity of some sort, that µ drops out of the result.
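The collapse of the characteristic function for $t \neq 0$ is visible in a direct computation (the values of µ and t are arbitrary illustrative choices):

```python
import numpy as np

def cf_normal(t, mu, sigma2):
    """Characteristic function M_Y(it) = exp(i*t*mu - t^2*sigma2/2) of N(mu, sigma2)."""
    return np.exp(1j*t*mu - t**2*sigma2/2)

mu = 2.0
mags = [abs(cf_normal(0.5, mu, s2)) for s2 in (1.0, 10.0, 100.0)]

assert cf_normal(0.0, mu, 100.0) == 1.0   # at t = 0 the value is always unity
assert mags[0] > mags[1] > mags[2]        # strictly shrinking in sigma^2
assert mags[2] < 1e-5                     # e^{-12.5}, effectively zero
```

Note that the magnitude $e^{-t^2\sigma^2/2}$ is independent of µ, which only rotates the phase.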
Next, let real-valued random variable Θ be uniformly distributed over $[a, a + 2\pi n]$, where n is a positive integer. Then:
\[
M_\Theta(it) = E[e^{it\Theta}]
= \int_{\theta=a}^{a+2\pi n} e^{it\theta}\,\frac{d\theta}{2\pi n}
= \left.\frac{e^{it\theta}}{2\pi i t n}\right|_{a}^{a+2\pi n}
= e^{ita}\,\frac{e^{2\pi i t n} - 1}{2\pi i t n}
= \begin{cases}
1 & \text{if } t = 0 \\
0 & \text{if } tn = \pm 1, \pm 2, \ldots \\
\neq 0 & \text{if } tn \text{ not integral}
\end{cases}
\]
\[
\lim_{n\to\infty} M_\Theta(it)
= e^{ita}\lim_{n\to\infty}\frac{e^{2\pi i t n} - 1}{2\pi i t n}
= \delta_{t0}
= \begin{cases} 1 & \text{if } t = 0 \\ 0 & \text{if } t \neq 0 \end{cases}
\]
Hence, $\lim_{n\to\infty} M_\Theta(it) = \lim_{\sigma^2\to\infty} M_Y(it) = \delta_{t0}$. The equality of the limits of the characteristic functions of the random variables implies the identity of the limits of their distributions; hence, the diffuse distributions are “the same.” 17 Indeed, for the limit to be $\delta_{t0}$ it is not required that n be an integer. But for $\Theta \sim U[a, a + 2\pi n]$, the jth integral moment of $W = e^{i\Theta}$ is:
\[
E[W^j] = E[e^{ij\Theta}]
= \begin{cases}
1 & \text{if } j = 0 \\
0 & \text{if } jn = \pm 1, \pm 2, \ldots \\
\neq 0 & \text{if } jn \text{ not integral}
\end{cases}
\]
17 Quotes are around ‘the same’ because the limiting distributions are not proper distributions. The notion of diffuse distributions comes from Venter [1996, pp. 406-410], who shows there how different diffuse distributions result in
So if n is an integer, jn will be an integer, and all the integral moments of W will be zero, except for the zeroth. Therefore, the integral moments of $W = e^{i\Theta}$ are invariant to n, as long as the n in $2\pi n$, the width of the interval of Θ, is a whole number. Hence, although we hereby define the unit-circle random variable as $W = e^{i\Theta}$ for $\Theta \sim U[0, 2\pi]$, this is out of convenience rather than out of necessity. The probability for $e^{i\Theta}$ to be in an arc of this circle of length l equals $l/2\pi$.
Moreover, $E[\bar W^j] = \overline{E[W^j]} = \overline{\delta_{j0}} = \delta_{j0} = E[W^j]$. Alternatively, $E[\bar W^j] = E[e^{-ij\Theta}] = \delta_{(-j)0} = \delta_{j0}$. And
\[
\operatorname{Var}\begin{bmatrix} W \\ \bar W \end{bmatrix}
= E\left[\begin{bmatrix} W \\ \bar W \end{bmatrix}\begin{bmatrix} \bar W & W \end{bmatrix}\right]
= \begin{bmatrix} E[W\bar W] & E[WW] \\ E[\bar W\bar W] & E[\bar W W] \end{bmatrix}
= \begin{bmatrix} \delta_{11} & \delta_{20} \\ \delta_{02} & \delta_{11} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2
\]
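A simulation sketch (the sample size is our own choice) confirms that the mixed moments $E[W^j \bar W^k]$ of the unit-circle variable behave as the Kronecker delta $\delta_{jk}$, and hence that the augmented variance is $I_2$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2*np.pi, size=1_000_000)
W = np.exp(1j*theta)

# Mixed moments E[W^j conj(W)^k] = E[e^{i(j-k)Theta}] = delta_{jk}
for j in range(3):
    for k in range(3):
        moment = np.mean(W**j * np.conj(W)**k)
        target = 1.0 if j == k else 0.0
        assert abs(moment - target) < 0.005

# Zero mean and (nearly) identity augmented variance
aug = np.array([[np.mean(W*np.conj(W)),          np.mean(W*W)],
                [np.mean(np.conj(W)*np.conj(W)), np.mean(np.conj(W)*W)]])
assert np.allclose(aug, np.eye(2), atol=0.005)
```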
Hence, $W = e^{i\Theta}$ for $\Theta \sim U[0, 2\pi]$ is not just a unit-circle random variable; having zero mean and an identity augmented variance, it is a standardized complex random variable.

17 (continued) different Bayesian estimates. But here every continuous random variable Y diffuses through the periodicity of $e^{iY}$ into the same limiting distribution, viz., the Kronecker $\delta_{t0}$ (note 31).
Multiplying W by a complex constant $\alpha \neq 0$ affects the radius of the random variable, whose jkth mixed moment is:
\[
E\left[(\alpha W)^j\left(\overline{\alpha W}\right)^k\right]
= \alpha^j\bar\alpha^k E[W^j\bar W^k]
= \alpha^j\bar\alpha^k\delta_{jk}
= \begin{cases} (\alpha\bar\alpha)^j & \text{if } j = k \\ 0 & \text{if } j \neq k \end{cases}
\]
αW
The augmented variance is Var = α α Var [W ] = α α I 2 . One may consider α as an instance of a
αW
complex random variable Α. Due to the independence of Α from W, the jkth mixed moment of
ΑW is [
E ( ΑW ) ΑW
j
( ) ] = E [Α
k j k
Α E W jW][ k
] = E [Α j k
]
Α δ jk = E Α Α [( ) ]δj
jk . Its augmented
AW
variance is Var [ ] [ ]
= E ΑΑ Var [W ] = {Var [A] + E [Α]E Α }Var [W ]. Unlike the one-dimensional
AW
W, ΑW can cover the whole complex plane. However, like W, it too possesses the desirable
[
property that E ( ΑW ) = δ j 0 . 18
j
]
Deterministic estimates of mean values would satisfy many users of actuarial analyses. Stochastic advances in actuarial science over the last few decades notwithstanding, much actuarial work remains deterministic, in the sense that it assumes the expectation of a function of a random variable equals the function of the expectation of the random variable; in symbols, $E[f(X)] = f(E[X])$.
Advances in computing hardware and software, as well as increased technical sophistication, have
made determinism more avoidable and less acceptable. However, the complex unit-circular random
variable provides a habitat for the survival of determinism. To see this, let f be analytic over the
domain of complex random variable Z. From Cauchy’s Integral Formula (Havil [2003, Appendix
D.8 and D.9]) it follows that within the domain of Z, f can be expressed as a convergent series
$f(z) = a_0 + a_1 z + \cdots = a_0 + \sum_{j=1}^{\infty} a_j z^j$. Taking the expectation, we have:
\[
E[f(Z)] = a_0 + \sum_{j=1}^{\infty} a_j E[Z^j]
\]
But if for every positive integer j, $E[Z^j] = E[Z]^j$, then:
\[
E[f(Z)] = a_0 + \sum_{j=1}^{\infty} a_j E[Z^j] = a_0 + \sum_{j=1}^{\infty} a_j E[Z]^j = f(E[Z])
\]
Therefore, determinism conveniently works for analytic functions of random variables whose integral moments are the corresponding powers of their means. Now a real-valued random variable whose moments are powers of its mean would have the characteristic function:
\[
M_X(it) = E[e^{itX}]
= 1 + \sum_{j=1}^{\infty}\frac{(it)^j}{j!}E[X^j]
= 1 + \sum_{j=1}^{\infty}\frac{(it)^j}{j!}E[X]^j
= e^{itE[X]} = M_{E[X]}(it)
\]
This is the characteristic function of the “deterministic” random variable, i.e., the random variable
whose probability is massed at one point, its mean. So determinism with real-valued random
variables requires “deterministic” random variables. But some complex random variables, such as
the unit-circle, have the property $E[Z^j] = E[Z]^j$ without being deterministic.
In fact, when $E[Z^j] = E[Z]^j$, not only is $E[f(Z)] = f(E[Z])$. For positive integer k, $f^k(z)$ is as analytic as f itself; hence, $E[f^k(Z)] = f^k(E[Z])$. So the determinism with these complex random variables extends to the integral moments of $f(Z)$.
In Section 10 we saw that for the unit-circle random variable $W = e^{i\Theta}$ and for $j = \pm 1, \pm 2, \ldots$, $E[W^{-j}] = E[W^j] = 0 = E[W]^j$. Can determinism extend to non-analytic functions which involve the negative moments? For example, let $g(z) = 1/(\eta - z)$, for some complex $\eta \neq 0$. The function is singular at $z = \eta$; but within the disc $\{z : |z/\eta| < 1\} = \{z : |z| < |\eta|\}$ the function equals the convergent series:
\[
g(z) = \frac{1}{\eta - z}
= \frac{1}{\eta}\cdot\frac{1}{1 - \dfrac{z}{\eta}}
= \frac{1}{\eta}\left(1 + \frac{z}{\eta} + \frac{z^2}{\eta^2} + \cdots\right)
= \frac{1}{\eta} + \frac{z}{\eta^2} + \frac{z^2}{\eta^3} + \cdots
\]
Outside the disc, or for $\{z : |z/\eta| > 1\} = \{z : |z| > |\eta|\}$, another convergent series represents the function:
\[
g(z) = \frac{1}{\eta - z}
= -\frac{1}{z}\cdot\frac{1}{1 - \dfrac{\eta}{z}}
= -\frac{1}{z}\left(1 + \frac{\eta}{z} + \frac{\eta^2}{z^2} + \cdots\right)
= -\frac{1}{z} - \frac{\eta}{z^2} - \frac{\eta^2}{z^3} - \cdots
\]
According to the first series:
\[
E[g(W)] = E\left[\frac{1}{\eta} + \frac{W}{\eta^2} + \frac{W^2}{\eta^3} + \cdots\right]
= \frac{1}{\eta} + \frac{E[W]}{\eta^2} + \frac{E[W^2]}{\eta^3} + \cdots
= \frac{1}{\eta} + \frac{0}{\eta^2} + \frac{0}{\eta^3} + \cdots
= \frac{1}{\eta}
\]
But according to the second series:
\[
E[g(W)] = E\left[-\frac{1}{W} - \frac{\eta}{W^2} - \frac{\eta^2}{W^3} - \cdots\right]
= -E[W^{-1}] - \eta E[W^{-2}] - \eta^2 E[W^{-3}] - \cdots
= -0 - \eta\cdot 0 - \eta^2\cdot 0 - \cdots
= 0
\]
Both answers are correct, each within its series’ region of convergence; however, only the first satisfies the deterministic equation $E[g(W)] = g(E[W]) = g(0) = 1/\eta$.
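A simulation makes the two answers concrete: for η outside the unit circle the sample mean of $1/(\eta - W)$ settles at $1/\eta = g(E[W])$, while for η inside it settles at zero. The values of η below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
W = np.exp(1j * rng.uniform(0.0, 2*np.pi, size=1_000_000))

def E_g(eta):
    """Monte Carlo estimate of E[1/(eta - W)] for the unit-circle W."""
    return np.mean(1.0/(eta - W))

eta_out = 2.0 + 1.0j          # |eta| > 1: deterministic equation holds
assert abs(E_g(eta_out) - 1.0/eta_out) < 1e-3

eta_in = 0.3 - 0.2j           # |eta| < 1: the expectation is zero
assert abs(E_g(eta_in)) < 5e-3
```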
To understand why the answer depends on whether η is inside or outside the complex unit circle, let us evaluate E[g(W)] directly:
\[
E[g(W)] = E\left[\frac{1}{\eta - W}\right] = E\left[\frac{1}{\eta - e^{i\Theta}}\right]
= \int_{\theta=0}^{2\pi}\frac{1}{\eta - e^{i\theta}}\,\frac{d\theta}{2\pi}
\]
The next step is to transform from θ into $z = e^{i\theta}$. So $dz = ie^{i\theta}d\theta = iz\,d\theta$, and the line integral runs counterclockwise around the complex unit circle C:
\[
\begin{aligned}
E[g(W)] &= \int_{\theta=0}^{2\pi}\frac{1}{\eta - e^{i\theta}}\,\frac{d\theta}{2\pi}
= \oint_C \frac{1}{\eta - z}\,\frac{dz}{iz\cdot 2\pi}
= \frac{1}{2\pi i}\oint_C \frac{dz}{z(\eta - z)} \\
&= \frac{1}{2\pi i}\oint_C\left(\frac{1}{z} + \frac{1}{\eta - z}\right)\frac{dz}{\eta}
= \frac{1}{\eta}\left(\frac{1}{2\pi i}\oint_C\frac{dz}{z - 0} - \frac{1}{2\pi i}\oint_C\frac{dz}{z - \eta}\right)
\end{aligned}
\]
Now the value of each of these integrals is one if its singularity is within the unit circle C, and zero if it is not. 19 Of course, the singularity of the first integral at $z = 0$ lies within C; hence, its value is one. The second integral’s singularity at $z = \eta$ lies within C if and only if $|\eta| < 1$. Therefore:
\[
E[g(W)] = \frac{1}{\eta}\left(\frac{1}{2\pi i}\oint_C\frac{dz}{z - 0} - \frac{1}{2\pi i}\oint_C\frac{dz}{z - \eta}\right)
= g(E[W])\cdot\begin{cases} 1 & \text{if } |\eta| > 1 \\ 0 & \text{if } |\eta| < 1 \end{cases}
\]
So the deterministic equation will hold for one of the Laurent series according to which the domain of the non-analytic function is divided into regions of convergence. Fascinating enough is how the function
\[
\varphi(\eta) = \frac{1}{2\pi i}\oint_C\frac{dz}{z - \eta}
= \begin{cases} 1 & \text{if } |\eta| < 1 \\ 0 & \text{if } |\eta| > 1 \end{cases}
\]
serves as the indicator of a state, viz., the state of being inside the unit circle.
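The indicator can be computed numerically: parameterizing C by $z = e^{i\theta}$, with $dz = iz\,d\theta$, reduces $\frac{1}{2\pi i}\oint_C \frac{dz}{z-\eta}$ to the average of $z/(z - \eta)$ over a uniform grid of angles, which the periodic rectangle rule evaluates almost exactly (grid size is an arbitrary choice):

```python
import numpy as np

N = 4096
theta = (np.arange(N) + 0.5) * 2*np.pi / N    # midpoint grid on [0, 2*pi)
z = np.exp(1j*theta)                          # points on the unit circle C

def phi(eta):
    # (1/(2*pi*i)) * contour integral of dz/(z - eta); with dz = i*z*dtheta
    # the integral reduces to the average of z/(z - eta) over the angle grid.
    return np.mean(z/(z - eta))

assert abs(phi(0.5 + 0.2j) - 1.0) < 1e-9      # eta inside C: indicator is one
assert abs(phi(1.8 - 0.6j)) < 1e-9            # eta outside C: indicator is zero
```

For a smooth periodic integrand the rectangle rule converges spectrally, which is why so coarse a grid already gives essentially exact zeros and ones.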
19 Technically, the integral has no value if the singularity lies on C; but there are some practical advantages for “splitting the difference” in that case.
complex numbers. A general real-valued form of such models is presented and derived in Halliwell [1997]:
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\beta + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix},
\qquad
\operatorname{Var}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
\]
The subscript ‘1’ denotes observations, ‘2’ denotes predictions. Vector y1 is observed; the whole
design matrix X is hypothesized, as well as the fourfold ‘Σ’ variance structure. Although the
variance structure may be non-negative-definite (NND), the variance of the observations Σ11 must
be positive-definite (PD). Also, the observation design X 1 must be of full column rank. The last
two requirements ensure the existence of the inverses $\Sigma_{11}^{-1}$ and $\left(X_1'\Sigma_{11}^{-1}X_1\right)^{-1}$. The best linear unbiased predictor of $y_2$ is $\hat y_2 = X_2\hat\beta + \Sigma_{21}\Sigma_{11}^{-1}\left(y_1 - X_1\hat\beta\right)$. The variance of prediction error $y_2 - \hat y_2$ is
\[
\operatorname{Var}[y_2 - \hat y_2]
= \left(X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1\right)\operatorname{Var}[\hat\beta]\left(X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1\right)'
+ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.
\]
Embedded in these formulas is the estimator
\[
\hat\beta = \left(X_1'\Sigma_{11}^{-1}X_1\right)^{-1}X_1'\Sigma_{11}^{-1}y_1
= \operatorname{Var}[\hat\beta]\cdot X_1'\Sigma_{11}^{-1}y_1.
\]
For the purpose of introducing complex numbers into the linear statistical model we will concern ourselves here only with the estimation of the parameter β. So we drop the subscripts ‘1’ and ‘2’ and simplify the observation as $y = X\beta + e$, where $\operatorname{Var}[e] = \Gamma$. Again, X must be of full column rank and Γ must be Hermitian PD. According to Section 4, transjugation is to complex matrices what transposition is to real-valued matrices. Therefore, the short answer for a complex model is:
\[
\hat\beta = \left(X^*\Gamma^{-1}X\right)^{-1}X^*\Gamma^{-1}y
= \operatorname{Var}[\hat\beta]\cdot X^*\Gamma^{-1}y.
\]
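The short answer is an ordinary generalized-least-squares formula with transjugation in place of transposition, and it can be sketched directly in a numerical library. All data below are simulated stand-ins, and spherical errors ($\Gamma = I$) are assumed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(4)
t, k = 200, 3
X = rng.normal(size=(t, k)) + 1j*rng.normal(size=(t, k))
beta = np.array([1.0 + 2.0j, -0.5j, 3.0])
e = (rng.normal(size=t) + 1j*rng.normal(size=t))/np.sqrt(2)
y = X @ beta + e

Gamma = np.eye(t)                  # assume spherical errors for this sketch
Gi = np.linalg.inv(Gamma)

# beta-hat = (X* Gamma^-1 X)^-1 X* Gamma^-1 y
beta_hat = np.linalg.solve(X.conj().T @ Gi @ X, X.conj().T @ Gi @ y)

# With Gamma = I this reduces to ordinary complex least squares
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_ls)
assert np.linalg.norm(beta_hat - beta) < 0.5   # recovers beta up to noise
```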
However, deriving the solution from the double-real representation in Section 3 will deepen the understanding:
\[
\begin{bmatrix} y_r & -y_i \\ y_i & y_r \end{bmatrix}
= \begin{bmatrix} X_r & -X_i \\ X_i & X_r \end{bmatrix}
\begin{bmatrix} \beta_r & -\beta_i \\ \beta_i & \beta_r \end{bmatrix}
+ \begin{bmatrix} e_r & -e_i \\ e_i & e_r \end{bmatrix}
\]
All the vectors and matrices in this form are real-valued. The subscripts ‘r’ and ‘i’ denote the real and imaginary parts of y, X, β, and e. Due to the redundancy of the double-real representation, we may retain just the left columns:
\[
\begin{bmatrix} y_r \\ y_i \end{bmatrix}
= \begin{bmatrix} X_r & -X_i \\ X_i & X_r \end{bmatrix}
\begin{bmatrix} \beta_r \\ \beta_i \end{bmatrix}
+ \begin{bmatrix} e_r \\ e_i \end{bmatrix}
\]
Note that if X is real-valued, then $X_i = 0$, and $y_r$ and $y_i$ become two “data panels,” each with its own parameter vector.
Now let $\Xi_t = \begin{bmatrix} I_t & iI_t \\ I_t & -iI_t \end{bmatrix}$, the augmentation matrix of Section 7, where t is the number of observations. Since the augmentation matrix is non-singular, premultiplying the left-column form by it loses no information:
\[
\begin{bmatrix} I_t & iI_t \\ I_t & -iI_t \end{bmatrix}\begin{bmatrix} y_r \\ y_i \end{bmatrix}
= \begin{bmatrix} I_t & iI_t \\ I_t & -iI_t \end{bmatrix}\begin{bmatrix} X_r & -X_i \\ X_i & X_r \end{bmatrix}\begin{bmatrix} \beta_r \\ \beta_i \end{bmatrix}
+ \begin{bmatrix} I_t & iI_t \\ I_t & -iI_t \end{bmatrix}\begin{bmatrix} e_r \\ e_i \end{bmatrix}
\]
\[
\begin{bmatrix} y_r + iy_i \\ y_r - iy_i \end{bmatrix}
= \begin{bmatrix} X_r + iX_i & -X_i + iX_r \\ X_r - iX_i & -X_i - iX_r \end{bmatrix}\begin{bmatrix} \beta_r \\ \beta_i \end{bmatrix}
+ \begin{bmatrix} e_r + ie_i \\ e_r - ie_i \end{bmatrix}
\]
\[
\begin{bmatrix} y \\ \bar y \end{bmatrix}
= \begin{bmatrix} X & iX \\ \bar X & -i\bar X \end{bmatrix}\begin{bmatrix} \beta_r \\ \beta_i \end{bmatrix} + \begin{bmatrix} e \\ \bar e \end{bmatrix}
= \begin{bmatrix} X\beta_r + iX\beta_i \\ \bar X\beta_r - i\bar X\beta_i \end{bmatrix} + \begin{bmatrix} e \\ \bar e \end{bmatrix}
= \begin{bmatrix} X\beta \\ \bar X\bar\beta \end{bmatrix} + \begin{bmatrix} e \\ \bar e \end{bmatrix}
= \begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}\begin{bmatrix} \beta \\ \bar\beta \end{bmatrix} + \begin{bmatrix} e \\ \bar e \end{bmatrix}
\]
e
insight is that Var is an augmented variance, whose general form according to Section 6 is
e
e Γ C
Var = . Therefore, the general form of the observation of a complex linear model is
e C Γ
y X 0 β e e Γ C
y = 0 X β + e , where Var e = C Γ . Not only is y as observable as y, but also β
e
complex linear statistical model does not require to be “proper complex,” as defined in Section
e
7.
Since X is of full column rank, so too must be $\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}$. And since Γ is Hermitian PD, both it and its conjugate $\bar\Gamma$ are invertible. But the general form of the observation requires $\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$ to be Hermitian PD, hence invertible. A consequence is that both the “determinant” forms $\Gamma - C\bar\Gamma^{-1}\bar C$ and $\bar\Gamma - \bar C\Gamma^{-1}C$ are Hermitian PD and invertible. With this background it can be shown, and the reader should verify, that
\[
\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^{-1}
= \begin{bmatrix} \mathrm{H} & \mathrm{K} \\ \bar{\mathrm{K}} & \bar{\mathrm{H}} \end{bmatrix},
\qquad \text{where } \mathrm{H} = \left(\Gamma - C\bar\Gamma^{-1}\bar C\right)^{-1} \text{ and } \mathrm{K} = -\Gamma^{-1}C\bar{\mathrm{H}}.
\]
The solution of the complex linear model $\begin{bmatrix} y \\ \bar y \end{bmatrix} = \begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}\begin{bmatrix} \beta \\ \bar\beta \end{bmatrix} + \begin{bmatrix} e \\ \bar e \end{bmatrix}$, where $\operatorname{Var}\begin{bmatrix} e \\ \bar e \end{bmatrix} = \begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$, is:
\[
\begin{aligned}
\begin{bmatrix} \hat\beta \\ \hat{\bar\beta} \end{bmatrix}
&= \left(\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}^{*}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^{-1}\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}\right)^{-1}
\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}^{*}\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^{-1}\begin{bmatrix} y \\ \bar y \end{bmatrix} \\
&= \left(\begin{bmatrix} X^{*} & 0 \\ 0 & X' \end{bmatrix}\begin{bmatrix} \mathrm{H} & \mathrm{K} \\ \bar{\mathrm{K}} & \bar{\mathrm{H}} \end{bmatrix}\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}\right)^{-1}
\begin{bmatrix} X^{*} & 0 \\ 0 & X' \end{bmatrix}\begin{bmatrix} \mathrm{H} & \mathrm{K} \\ \bar{\mathrm{K}} & \bar{\mathrm{H}} \end{bmatrix}\begin{bmatrix} y \\ \bar y \end{bmatrix} \\
&= \begin{bmatrix} X^{*}\mathrm{H}X & X^{*}\mathrm{K}\bar X \\ X'\bar{\mathrm{K}}X & X'\bar{\mathrm{H}}\bar X \end{bmatrix}^{-1}
\begin{bmatrix} X^{*}\mathrm{H}y + X^{*}\mathrm{K}\bar y \\ X'\bar{\mathrm{K}}y + X'\bar{\mathrm{H}}\bar y \end{bmatrix}
\end{aligned}
\]
And $\operatorname{Var}\begin{bmatrix} \hat\beta \\ \hat{\bar\beta} \end{bmatrix} = \begin{bmatrix} X^{*}\mathrm{H}X & X^{*}\mathrm{K}\bar X \\ X'\bar{\mathrm{K}}X & X'\bar{\mathrm{H}}\bar X \end{bmatrix}^{-1}$, which must exist since it is a quadratic form based on the Hermitian PD $\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}^{-1}$ and the full-column-rank $\begin{bmatrix} X & 0 \\ 0 & \bar X \end{bmatrix}$. Reformulate this as the pair of normal equations:
\[
\begin{bmatrix} X^{*}\mathrm{H}X & X^{*}\mathrm{K}\bar X \\ X'\bar{\mathrm{K}}X & X'\bar{\mathrm{H}}\bar X \end{bmatrix}
\begin{bmatrix} \hat\beta \\ \hat{\bar\beta} \end{bmatrix}
= \begin{bmatrix} X^{*}\mathrm{H}y + X^{*}\mathrm{K}\bar y \\ X'\bar{\mathrm{K}}y + X'\bar{\mathrm{H}}\bar y \end{bmatrix}
\]
The conjugates of the two equations in $\hat\beta$ and $\hat{\bar\beta}$ are the same equations in $\overline{\hat{\bar\beta}}$ and $\overline{\hat\beta}$. Therefore, $\hat{\bar\beta} = \overline{\hat\beta}$. It is well known that the estimator of a linear function of a random variable is the linear function of the estimator of the random variable. But conjugation is not a linear function. Nevertheless, we have just proven that the estimator of the conjugate is the conjugate of the estimator.
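The conjugate-estimator property can itself be verified numerically. Below we manufacture an augmented variance with the required $\begin{bmatrix} \Gamma & C \\ \bar C & \bar\Gamma \end{bmatrix}$ pattern by augmenting an arbitrary real symmetric PD matrix, solve the augmented normal equations, and check that the second half of the solution is the conjugate of the first. All inputs are simulated stand-ins:

```python
import numpy as np

rng = np.random.default_rng(5)
t, k = 60, 2
It = np.eye(t)
Xi = np.block([[It, 1j*It], [It, -1j*It]])      # augmentation matrix of Section 7

# Augmenting a real symmetric PD matrix yields the [[Gamma, C], [C-bar, Gamma-bar]] pattern
A = rng.normal(size=(2*t, 2*t))
M = Xi @ (A @ A.T + np.eye(2*t)) @ Xi.conj().T

X = rng.normal(size=(t, k)) + 1j*rng.normal(size=(t, k))
y = rng.normal(size=t) + 1j*rng.normal(size=t)

D = np.block([[X, np.zeros((t, k))],
              [np.zeros((t, k)), np.conj(X)]])   # block-diagonal design
ya = np.concatenate([y, np.conj(y)])             # augmented observation

Mi = np.linalg.inv(M)
b = np.linalg.solve(D.conj().T @ Mi @ D, D.conj().T @ Mi @ ya)

beta_hat, beta_bar_hat = b[:k], b[k:]
assert np.allclose(beta_bar_hat, np.conj(beta_hat))
```

The check succeeds for any such y and M of the conjugate-patterned form, since conjugating the normal equations merely swaps the two halves of the solution.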
Because the domain of complex random variables is a plane, rather than a line, their obvious application is bivariate. An example is a random variable whose real part is loss and whose imaginary part is LAE. Another application might pertain to copulas. According to Venter [2002, p. 69], “copulas are joint distributions of unit random variables.” One could translate these joint distributions into distributions of complex variables whose support is the complex unit square, i.e., the square whose vertices are the points $z = 0, 1, 1+i, i$. However, for now it seems that real-valued bivariates suffice.
Actuaries who have applied log-linear models to triangles with paid increments have been frustrated
applying them to incurred triangles. The problem is that incurred increments are often negative, and
the logarithm of a negative number is not real-valued. This has led Glenn Meyers [2013] to seek
modified lognormal distributions whose support includes the negative real numbers. The persistent
intractability of the log-linear problem was a major reason for our attention to the lognormal
random vector $w_{n\times 1} = e^{z_{n\times 1}}$ in Section 8. But to model an incurred loss as the exponential function of a complex number suffers from two drawbacks. First, to model a real-valued loss as $e^z = e^x\cdot e^{iy}$ requires y to be an integral multiple of π. The mixed random variable $e^X$ with probability p and $-e^X$ with probability $1-p$ is not lognormal. No more suitable are such “denatured” random variables as $\operatorname{Re}(e^Z)$. Second, one still cannot model the eminently practical value of zero, because for all z, $e^z \neq 0$. 21 At present it does not appear that complex random variables will give birth to useful distributions of real-valued random variables. Even the unit-circle and indicator random variables of Sections 10 and 11, as interesting as they are in the theory of analytic functions, most likely will remain theoretical.
The complex version of the linear model in Section 12 showed us that conjugates of observations
are themselves observations and that conjugates of estimators are estimators of conjugates.
Moreover, there we found a use for augmented variance. Nonetheless we are still fairly bound to
our conclusion to Section 3, that one who lacked either the confidence or the software to work with complex numbers could work instead with double-real representations.
So how can actuarial science benefit from complex random variables? The great benefit will come
from new ways of thinking. The first step will be to overcome the habit of picturing a complex
number as half real and half imaginary. Historically, it was only after numbers had expanded from
rational to irrational that the whole set was called “real.” Numbers ultimately are sets; zero is just
the empty set. How real are sets? Regardless of their mathematical reality, they are not physically
real. Complex numbers were deemed “real” because mathematicians needed them for the solution
of polynomial equations. In the nineteenth century this spurred the development of abstract
algebra. At first new ways of thinking amount to differences in degree; at some point many develop
21 If $e^a = 0$ for some a, then for all z, $e^z = e^{z-a+a} = e^{z-a}e^a = e^{z-a}\cdot 0 = 0$. One who sees that $\lim_{x\to-\infty} e^x\cdot e^{iy} = 0\cdot e^{iy} = 0$ might propose to add the ordinate $\operatorname{Re}(z) = x = -\infty$ to the complex plane. But not only is this proposal artificial; it also militates against the standard theory of complex variables, according to which all points infinitely far from zero constitute one and the same point at infinity.
into differences in kind. One might argue, “Why study Euclidean geometry? It all derives from a
few axioms.” True, but great theorems (e.g., that the sum of the angles of a triangle is the sum of
two right angles) can be a long way from their axioms. A theorem means more than the course of
its proof; often there are many proofs of a theorem. Furthermore, mathematicians often work
backwards from accepted or desired truths to efficient and elegant sets of axioms. Perhaps the
most wonderful thing about mathematics is its “unreasonable effectiveness in the natural sciences,”
to quote physicist Eugene Wigner. The causality between pure and applied mathematics works in
both directions. Therefore, it is likely that complex random variables and vectors will find their way
into actuarial science. But it will take years, even decades, and technology and education will have to lead the way.
14. CONCLUSION
Just as physics divides into different areas, e.g., theoretical, experimental, and applied, so too
actuarial science, though perhaps more concentrated on business application, justifiably has and
needs a theoretical component. Theory and application cross-fertilize each other. In this paper we
have proposed to add complex numbers to the probability and statistics of actuarial theory. With
patience, the technically inclined actuary should be able to understand the theory of complex
random variables delineated herein. In fact, our multivariate approach may be even more difficult to
understand than the complex-function theory; but both belong together. Although all complex
matrices and operations were formed from double-real counterparts, we believe that the “sum is
greater than the parts,” i.e., that the assimilation of this theory will lead to higher-order thinking and
creativity. In the sixteenth century the “fiction” of $i = \sqrt{-1}$ allowed mathematicians to solve more
equations. Although at first complex solutions were deemed “extraneous roots,” eventually their
practicality became recognized; so that today complex numbers are essential for science and
engineering. Applying complex numbers to probability has lagged; but even now it is part of signal
processing in electrical engineering. Knowing how rapidly science has developed with nuclear
physics, molecular biology, space exploration, and computers, who would dare to bet against the
usefulness of complex random variables to actuarial science by the mid-2030s, when many scientists
and futurists expect nuclear fusion to be harnessed and available for commercial purposes?
REFERENCES
[1.] Halliwell, Leigh J., Conjoint Prediction of Paid and Incurred Losses, CAS Forum, Summer 1997, 241-380,
www.casact.org/pubs/forum/97sforum/97sf1241.pdf.
[2.] Halliwell, Leigh J., “Classifying the Tails of Loss Distributions,” CAS E-Forum, Spring 2013, Volume 2,
www.casact.org/pubs/forum/13spforumv2/Haliwell.pdf.
[3.] Havil, Julian, Gamma: Exploring Euler’s Constant, Princeton University Press, 2003.
[4.] Healy, M. J. R., Matrices for Statistics, Oxford, Clarendon Press, 1986.
[5.] Hogg, Robert V., and Stuart A. Klugman, Loss Distributions, New York, Wiley, 1984.
[6.] Johnson, Richard A., and Dean Wichern, Applied Multivariate Statistical Analysis (Third Edition), Englewood
Cliffs, NJ, Prentice Hall, 1992.
[7.] Judge, George G., Hill, R. C., et al., Introduction to the Theory and Practice of Econometrics (Second Edition), New
York, Wiley, 1988.
[8.] Klugman, Stuart A., et al., Loss Models: From Data to Decisions, New York, Wiley, 1998.
[9.] Meyers, Glenn, “The Skew Normal Distribution and Beyond,” Actuarial Review, May 2013, p. 15,
www.casact.org/newsletter/pdfUpload/ar/AR_May2013_1.pdf.
[10.] Million, Elizabeth, “The Hadamard Product,” course notes, 2007.
[11.] Schneider, Hans, and George Barker, Matrices and Linear Algebra, New York, Dover, 1973.
[12.] Veeravalli, V. V., Proper Complex Random Variables and Vectors, 2006, http://courses.engr.
illinois.edu/ece461/handouts/notes6.pdf.
[13.] Venter, G.G., "Credibility," Foundations of Casualty Actuarial Science (Third Edition), Casualty Actuarial Society,
1996.
[14.] Venter, G.G., "Tails of Copulas," PCAS LXXXIX, 2002, 68-113, www.casact.org/pubs/proceed/
proceed02/02068.pdf.
[15.] Weyl, Hermann, The Theory of Groups and Quantum Mechanics, New York, Dover, 1950 (ET by H. P. Robertson
from second German Edition 1930).
[17.] Wikipedia contributors, “Complex normal distribution,” Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/wiki/Complex_normal_distribution (accessed October 2014).
APPENDIX A
Section 8 relies on the moment generating function of the normal multivariate, so we have decided to prepare for it in Appendices A and B. According to Section 7, the probability density function of real-valued n×1 normal random vector x with mean µ and variance Σ is:
\[
f_x(x) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}
\]
Therefore, $\int_{x\in\Re^n} f_x(x)\,dV = 1$. The single integral over $\Re^n$ represents an n-multiple integral over each $x_j$ from $-\infty$ to $+\infty$; $dV = dx_1\cdots dx_n$.
The moment generating function of x is $M_x(t) = E[e^{t'x}] = E\left[e^{\sum_j t_j x_j}\right]$ for a suitable n×1 vector t. 22 Partial derivatives of the moment generating function evaluated at $t = 0_{n\times 1}$ equal moments of x, since:
\[
\left.\frac{\partial^{k_1+\cdots+k_n} M_x(t)}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}}\right|_{t=0}
= \left. E\left[x_1^{k_1}\cdots x_n^{k_n}e^{t'x}\right]\right|_{t=0}
= E\left[x_1^{k_1}\cdots x_n^{k_n}\right]
\]
But lognormal moments are values of the function itself. For example, if $t = e_j$, the jth unit vector, then $M_x(e_j) = E[e^{e_j'x}] = E[e^{X_j}]$. Likewise, $M_x(e_j + e_k) = E[e^{X_j}e^{X_k}]$. The moment generating function of x, if it exists, is the key to the moments of $e^x$.
22 All real-valued t vectors are suitable; Appendix B will extend the suitability to complex t.
\[
\begin{aligned}
M_x(t) = E[e^{t'x}]
&= \int_{x\in\Re^n}\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}\,e^{t'x}\,dV \\
&= \int_{x\in\Re^n}\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}\left\{(x-\mu)'\Sigma^{-1}(x-\mu) - 2t'x\right\}}\,dV
\end{aligned}
\]
The exponent can be rearranged by completing the square:
\[
(x-\mu)'\Sigma^{-1}(x-\mu) - 2t'x = (x-[\mu+\Sigma t])'\Sigma^{-1}(x-[\mu+\Sigma t]) - 2t'\mu - t'\Sigma t
\]
We leave it for the reader to verify. By substitution, we have:
\[
\begin{aligned}
M_x(t)
&= \int_{x\in\Re^n}\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}\left\{(x-\mu)'\Sigma^{-1}(x-\mu) - 2t'x\right\}}\,dV \\
&= \int_{x\in\Re^n}\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}\left\{(x-[\mu+\Sigma t])'\Sigma^{-1}(x-[\mu+\Sigma t]) - 2t'\mu - t'\Sigma t\right\}}\,dV \\
&= \int_{x\in\Re^n}\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}(x-[\mu+\Sigma t])'\Sigma^{-1}(x-[\mu+\Sigma t])}\,dV\cdot e^{t'\mu + t'\Sigma t/2} \\
&= 1\cdot e^{t'\mu + t'\Sigma t/2} = e^{t'\mu + t'\Sigma t/2}
\end{aligned}
\]
The reduction of the integral to unity in the second last line is due to the fact that $\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}e^{-\frac{1}{2}(x-[\mu+\Sigma t])'\Sigma^{-1}(x-[\mu+\Sigma t])}$ is the probability density function of the real-valued n×1 normal random vector with mean $\mu + \Sigma t$ and variance Σ. This new mean is valid if it is real-valued, which is true for $t \in \Re^n$. The first two derivatives confirm the mean and the variance of x:
\[
\frac{\partial M_x(t)}{\partial t} = (\mu + \Sigma t)\,e^{t'\mu + t'\Sigma t/2}
\;\Rightarrow\;
E[x] = \left.\frac{\partial M_x(t)}{\partial t}\right|_{t=0} = \mu
\]
\[
\frac{\partial^2 M_x(t)}{\partial t\,\partial t'}
= \frac{\partial(\mu + \Sigma t)\,e^{t'\mu + t'\Sigma t/2}}{\partial t'}
= \left[\Sigma + (\mu + \Sigma t)(\mu + \Sigma t)'\right]e^{t'\mu + t'\Sigma t/2}
\]
\[
\Rightarrow\; E[xx'] = \left.\frac{\partial^2 M_x(t)}{\partial t\,\partial t'}\right|_{t=0} = \Sigma + \mu\mu'
\;\Rightarrow\; \operatorname{Var}[x] = E[xx'] - \mu\mu' = \Sigma
\]
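The moment generating function and its first two moments admit a quick Monte Carlo check (parameters and sample size are arbitrary illustrative choices):

```python
import numpy as np

mu = np.array([0.1, -0.2])
Sigma = np.array([[0.5, 0.2],
                  [0.2, 0.4]])

rng = np.random.default_rng(6)
x = rng.multivariate_normal(mu, Sigma, size=1_000_000)

t = np.array([0.3, -0.5])
mgf_mc = np.mean(np.exp(x @ t))                  # Monte Carlo E[e^{t'x}]
mgf_cf = np.exp(t @ mu + t @ Sigma @ t/2)        # closed form e^{t'mu + t'Sigma t/2}
assert abs(mgf_mc - mgf_cf) < 0.01

assert np.allclose(x.mean(axis=0), mu, atol=0.01)   # E[x] = mu
assert np.allclose(np.cov(x.T), Sigma, atol=0.01)   # Var[x] = Sigma
```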
\[
E[e^{X_j}] = E[e^{e_j'x}] = M_x(e_j) = e^{e_j'\mu + e_j'\Sigma e_j/2} = e^{\mu_j + \Sigma_{jj}/2}
\]
\[
\begin{aligned}
E[e^{X_j}e^{X_k}] = M_x(e_j + e_k)
&= e^{(e_j+e_k)'\mu + (e_j+e_k)'\Sigma(e_j+e_k)/2} \\
&= e^{\mu_j + \mu_k + (\Sigma_{jj} + \Sigma_{jk} + \Sigma_{kj} + \Sigma_{kk})/2} \\
&= e^{\mu_j + \Sigma_{jj}/2}\cdot e^{\mu_k + \Sigma_{kk}/2}\cdot e^{(\Sigma_{jk} + \Sigma_{kj})/2} \\
&= e^{\mu_j + \Sigma_{jj}/2}\cdot e^{\mu_k + \Sigma_{kk}/2}\cdot e^{(\Sigma_{jk} + \Sigma_{jk})/2} \\
&= E[e^{X_j}]\,E[e^{X_k}]\cdot e^{\Sigma_{jk}}
\end{aligned}
\]
The fourth line follows from the symmetry of Σ, i.e., $\Sigma_{kj} = \Sigma_{jk}$.
23 The vector formulation of partial differentiation is explained in Appendix A.17 of Judge [1988].
So, $\operatorname{Cov}[e^{X_j}, e^{X_k}] = E[e^{X_j}e^{X_k}] - E[e^{X_j}]E[e^{X_k}] = E[e^{X_j}]E[e^{X_k}]\left(e^{\Sigma_{jk}} - 1\right)$, which is the multivariate generalization of the familiar lognormal covariance. In matrix form
\[
\operatorname{Var}[e^x] = \operatorname{diag}\left(E[e^x]\right)\left\{e^{\Sigma} - 1_{n\times n}\right\}\operatorname{diag}\left(E[e^x]\right).
\]
Because $\operatorname{diag}(E[e^x])$ is diagonal in positive elements (hence, symmetric and PD), $\operatorname{Var}[e^x]$ is NND [or PD] if and only if $e^{\Sigma} - 1_{n\times n}$ is NND [or PD]. 25 Symmetry is no issue here, because for real-valued matrices, Σ is symmetric if and only if $e^{\Sigma} - 1_{n\times n}$ is symmetric.
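These closed-form lognormal moments are also easy to corroborate numerically; the sketch below checks both the covariance formula and the matrix form of $\operatorname{Var}[e^x]$ (parameters are arbitrary choices):

```python
import numpy as np

mu = np.array([0.2, -0.1])
Sigma = np.array([[0.30, 0.12],
                  [0.12, 0.50]])

m = np.exp(mu + np.diag(Sigma)/2)                  # E[e^{X_j}] = e^{mu_j + Sigma_jj/2}
V = np.diag(m) @ (np.exp(Sigma) - 1) @ np.diag(m)  # Var[e^x]; np.exp is elementwise

# The (j,k) entry matches the covariance formula E E (e^{Sigma_jk} - 1)
assert np.isclose(V[0, 1], m[0]*m[1]*(np.exp(Sigma[0, 1]) - 1))

# Monte Carlo corroboration
rng = np.random.default_rng(7)
w = np.exp(rng.multivariate_normal(mu, Sigma, size=1_000_000))
assert np.allclose(w.mean(axis=0), m, atol=0.01)
assert np.allclose(np.cov(w.T), V, atol=0.02)
```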
The relation between Σ and $\mathrm{T} = e^{\Sigma} - 1_{n\times n}$ merits a discussion whose result will be clear from a consideration of 2×2 matrices. Since $\Sigma_{2\times 2}$ is symmetric, it is defined in terms of three real numbers: $\Sigma = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$. Now Σ is NND if and only if 1) $a \ge 0$, 2) $d \ge 0$, and 3) $ad - b^2 \ge 0$. Σ is PD if and only if these three conditions are strictly greater than zero. If a or d is zero, by the third condition b also must be zero. 26 Since we are not interested in degenerate random variables, which are effectively constants, we will require a and d to be positive. With this requirement, Σ is NND if and only if $b^2 \le ad$, and PD if and only if $b^2 < ad$. Since a and d are positive, so too is ad, as well as its square root $\gamma = \sqrt{ad}$; hence Σ is NND if and only if $-\gamma \le b \le \gamma$, and PD if and only if $-\gamma < b < \gamma$. It is well-known that $\min(a, d) \le \gamma \le \frac{a+d}{2} \le \max(a, d)$ with equality if and only if $a = d$.
Now the same three conditions determine the definiteness of $\mathrm{T} = e^{\Sigma} - 1_{2\times 2} = \begin{bmatrix} e^a - 1 & e^b - 1 \\ e^b - 1 & e^d - 1 \end{bmatrix}$. Since we required a and d to be positive, both $e^a - 1$ and $e^d - 1$ are positive. This leaves the definiteness of T dependent on the relation between $(e^b - 1)(e^b - 1)$ and $(e^a - 1)(e^d - 1)$. We will next examine this relation according to the three cases $b = 0$, $b > 0$, and $b < 0$, all of which must be subject to $-\gamma \le b \le \gamma$.
First, if $b = 0$, then $-\gamma < b < \gamma$ and Σ is PD. Furthermore, $(e^b - 1)(e^b - 1) = 0 < (e^a - 1)(e^d - 1)$. Therefore, in this case, the lognormal transformation $\Sigma \to \mathrm{T} = e^{\Sigma} - 1_{2\times 2}$ is from PD to PD. And zero covariance in the normal pair produces zero covariance in the lognormal pair. In fact, since zero covariance between normal bivariates implies independence (cf. §2.5.7 of Judge [1988]), the lognormal pair is likewise independent.
In the second case, $b > 0$, or more fully, $0 < b \le \gamma$. Σ is PD if and only if $b < \gamma$. Define the function $\varphi(x) = \ln\left((e^x - 1)/x\right)$ for positive real x (or $x \in \Re^+$). A graph will show that the function strictly increases, i.e., $\varphi(x_1) < \varphi(x_2)$ if and only if $x_1 < x_2$. Moreover, the function is concave upward. This means that the line segment between points $(x_1, \varphi(x_1))$ and $(x_2, \varphi(x_2))$ lies above the curve $\varphi(x)$ for intermediate values of x. In particular, $\frac{\varphi(x_1) + \varphi(x_2)}{2} \ge \varphi\!\left(\frac{x_1 + x_2}{2}\right)$. Equivalently, for all $x_1, x_2 \in \Re^+$, $2\varphi\!\left(\frac{x_1 + x_2}{2}\right) \le \varphi(x_1) + \varphi(x_2)$ with equality if and only if $x_1 = x_2$. Therefore, since a and d are positive, $2\varphi\!\left(\frac{a+d}{2}\right) \le \varphi(a) + \varphi(d)$. And since $0 < \gamma = \sqrt{ad} \le \frac{a+d}{2}$, $2\varphi(\gamma) \le 2\varphi\!\left(\frac{a+d}{2}\right) \le \varphi(a) + \varphi(d)$. So $2\varphi(\gamma) \le \varphi(a) + \varphi(d)$ with equality if and only if $a = d$.
Because φ strictly increases and $0 < b \le \gamma$:
\[
2\ln\frac{e^b - 1}{b} = 2\varphi(b) \le \varphi(a) + \varphi(d) = \ln\frac{e^a - 1}{a} + \ln\frac{e^d - 1}{d}
\]
\[
\left(\frac{e^b - 1}{b}\right)^2 \le \frac{e^a - 1}{a}\cdot\frac{e^d - 1}{d}
\]
\[
(e^b - 1)^2 = b^2\left(\frac{e^b - 1}{b}\right)^2
\le \frac{b^2}{ad}(e^a - 1)(e^d - 1)
= \left(\frac{b}{\gamma}\right)^2(e^a - 1)(e^d - 1)
\le 1\cdot(e^a - 1)(e^d - 1)
\]
Therefore, in this case e b − 1 ≤ e a − 1 e d − 1 with equality if and only if a = b = d . This means
that if b > 0 , the lognormal transformation Σ → Τ = e Σ − 12×2 is from NND to NND. But Τ is
( ) (
2
)( )
NND only if e b − 1 = e a − 1 e d − 1 , or only if a = b = d . Otherwise, Τ is PD. So, when
b > 0 , Τ = e Σ − 12×2 is PD except when all four elements of Σ have the same positive value. Even
a ad
the NND matrix Σ = log-transforms into a PD matrix, unless a = d . So all PD and
ad d
most NND normal variances transform into PD lognormal variances. A NND lognormal variance
indicates that at least one element of the normal random vector is duplicated.
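A numeric eigenvalue check illustrates these cases (the entries below are arbitrary): when all four elements of Σ are the same positive number, T keeps a zero eigenvalue and is merely NND; a singular NND Σ with $a \neq d$ nevertheless log-transforms into a PD matrix.

```python
import numpy as np

def T(Sig):
    return np.exp(Sig) - 1            # elementwise (Hadamard) exponential minus ones

# All four elements equal: Sigma is NND, and T retains a zero eigenvalue (merely NND)
c = 0.7
Sig_equal = np.full((2, 2), c)
assert abs(min(np.linalg.eigvalsh(T(Sig_equal)))) < 1e-12

# Singular (NND) Sigma with a != d: T becomes strictly PD
a, d = 0.5, 0.8
b = np.sqrt(a*d)
Sig_sing = np.array([[a, b], [b, d]])
assert abs(np.linalg.det(Sig_sing)) < 1e-12    # Sigma itself is merely NND
assert min(np.linalg.eigvalsh(T(Sig_sing))) > 0
```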
In the third and final case, $b < 0$, or more fully, $-\gamma \le b < 0$. This is equivalent to $0 < -b \le \gamma$, or to the second case with $-b$. In that case, $(e^{-b} - 1)^2 \le (e^a - 1)(e^d - 1)$ with equality if and only if $a = -b = d$. But from this, as well as from the fact that $0 < e^{2b} < e^{2\cdot 0} = 1$, it follows:
\[
(e^b - 1)^2 = e^{2b}(1 - e^{-b})^2 = e^{2b}(e^{-b} - 1)^2
\le e^{2b}(e^a - 1)(e^d - 1)
< 1\cdot(e^a - 1)(e^d - 1)
\]
So in this case the inequality is strict: $(e^b - 1)^2 < (e^a - 1)(e^d - 1)$, and T is PD. Therefore, if $b < 0$, the lognormal transformation is from NND to PD. Hence, even when Σ is not PD, but merely NND, T is almost always PD. Only when Σ is so NND as to conceal a duplicated element will T fail to be PD.
The Hadamard (elementwise) product and Schur’s product theorem allow for an understanding of the general lognormal transformation $\Sigma \to \mathrm{T} = e^{\Sigma} - 1_{n\times n}$. Denoting the elementwise jth power of Σ as $\Sigma^{\circ j} = \underbrace{\Sigma\circ\cdots\circ\Sigma}_{j\text{ factors}}$, we can express elementwise exponentiation as $e^{\Sigma} = \sum_{j=0}^{\infty}\Sigma^{\circ j}/j!$. So $\mathrm{T} = e^{\Sigma} - 1_{n\times n} = \sum_{j=1}^{\infty}\Sigma^{\circ j}/j!$. According to Schur’s theorem (§3 of Million [2007]), the Hadamard product of two NND matrices is NND. 27 Since Σ is NND, its powers $\Sigma^{\circ j}$ are NND, as well as the terms $\Sigma^{\circ j}/j!$. Being the sum of a countable number of NND matrices, T also must be NND. 28 But if just one of the terms of the sum is PD, the sum itself must be PD. Therefore, if Σ is PD, so too is T, since the first term $\Sigma^{\circ 1}/1! = \Sigma$ is PD.
Now the kernel of an m×n matrix A is the set of all x ∈ ℜⁿ such that Ax = 0ₘₓ₁, or ker(A) = {x ∈ ℜⁿ : Ax = 0ₘₓ₁}.
By the Cholesky decomposition the NND matrix U can be factored as Uₙₓₙ = W′W. The quadratic form in U is x′Ux = x′W′Wx = (Wx)′(Wx). If x′Ux = 0, then Wx = 0ₙₓ₁, and Ux = W′Wx = W′0ₙₓ₁ = 0ₙₓ₁. Conversely, if Ux = 0ₙₓ₁, then x′Ux = 0. So the kernel of NND matrix U is precisely the solution set of x′Ux = 0, i.e., x′Ux = 0 if and only if x ∈ ker(U).
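This equivalence between the kernel and the zero set of the quadratic form is easy to illustrate numerically (a sketch of ours; the particular rank-deficient U below is an arbitrary assumption for the example):

```python
# Illustration (ours): for NND U = W'W, x'Ux = 0 exactly when Ux = 0,
# so the kernel of U is the zero set of its quadratic form.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 4))          # 2x4, rank 2
U = W.T @ W                          # 4x4 NND with a 2-dimensional kernel

# A kernel vector: the last right-singular vector of W lies in ker(W).
x_ker = np.linalg.svd(W)[2][-1]
print(np.allclose(U @ x_ker, 0), np.isclose(x_ker @ U @ x_ker, 0))

# A generic vector is not in the kernel and has a positive quadratic form.
x = rng.normal(size=4)
print(x @ U @ x > 0)
```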
Therefore, the kernel of Τ = Σ_{j=1}^∞ Σ^{∘j}/j! is the intersection of the kernels of the Σ^{∘j}, or
$$\ker(\mathrm{T}) = \bigcap_{j=1}^{\infty} \ker\!\left(\Sigma^{\circ j}\right).$$
It is possible for this intersection to be of dimension zero, i.e., for it to equal {0ₙₓ₁}, even though the kernel of no Σ^{∘j} is. Because the intersections accumulate as the partial sums Σ_{j=1}^{k} Σ^{∘j}/j! grow with k, the lognormal transformation of NND matrix Σ tends to be “more PD” than Σ itself.
We showed above that if Σ is PD, then Τ = e^Σ − 1ₙₓₙ is PD. But even if Σ is NND, Τ will be PD, unless Σ contains a 2×2 subvariance
$$\begin{bmatrix} \Sigma_{jj} & \Sigma_{jk} \\ \Sigma_{kj} & \Sigma_{kk} \end{bmatrix}$$
whose four elements are all equal.
APPENDIX B
The moment generating function of x ~ N(μ, Σ), viz., M_x(t) = e^{t′μ + t′Σt/2}, is valid at least for t ∈ ℜⁿ. The validity rests on the identity
$$\int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}\left(x - [\mu + \Sigma t]\right)'\, \Sigma^{-1} \left(x - [\mu + \Sigma t]\right)}\, dV = 1$$
for real-valued ξ = μ + Σt. But in Section 8 we must know the value of
$$\varphi(\xi) = \int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}(x-\xi)'\,\Sigma^{-1}(x-\xi)}\, dV$$
when ξ is a complex n×1 vector. So in this appendix we will prove that φ(ξ) = 1 even then.
The proof begins with diagonalization. Since Σ is symmetric and PD, Σ⁻¹ exists and is symmetric and PD. According to the Cholesky decomposition (Healy [1986, §7.2]), there exists a real-valued n×n matrix W such that W′W = Σ⁻¹. Due to theorems on matrix rank, W must be non-singular, or invertible. Change variables according to y = Wx and ζ = Wξ; the Jacobian of the transformation gives
$$dV_y = \lvert W\rvert\, dV_x = \sqrt{\lvert W\rvert\,\lvert W\rvert}\; dV_x = \sqrt{\lvert W'\rvert\,\lvert W\rvert}\; dV_x = \sqrt{\lvert W'W\rvert}\; dV_x = \sqrt{\lvert \Sigma^{-1}\rvert}\; dV_x = \frac{dV_x}{\sqrt{\lvert\Sigma\rvert}}.$$
Hence:
$$\begin{aligned}
\varphi(\xi) &= \int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}(x-\xi)'\,\Sigma^{-1}(x-\xi)}\, dV_x \\
&= \int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}(x-\xi)'\,W'W(x-\xi)}\, dV_x \\
&= \int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}\left[W(x-\xi)\right]'\left[W(x-\xi)\right]}\, dV_x \\
&= \int_{y\in\Re^n} \frac{1}{\sqrt{(2\pi)^n}}\; e^{-\frac{1}{2}(y-\zeta)'(y-\zeta)}\, dV_y \\
&= \int_{y\in\Re^n} \frac{1}{\sqrt{(2\pi)^n}}\; e^{-\frac{1}{2}\sum_{j=1}^{n}\left(y_j-\zeta_j\right)^2}\, dV \\
&= \int_{y\in\Re^n} \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(y_j-\zeta_j\right)^2}\, dV \\
&= \prod_{j=1}^{n} \int_{y_j=-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(y_j-\zeta_j\right)^2}\, dy_j \\
&= \prod_{j=1}^{n} \psi(\zeta_j)
\end{aligned}$$
In the last line
$$\psi(\zeta) = \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{x=a}^{b} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}(x-\zeta)^2}\, dx.$$
Obviously, if ζ is real-valued, ψ(ζ) = 1. So the issue of the value of a moment generating function of a complex variable resolves into the issue of the “total probability” of a unit-variance normal random variable with a complex mean.29
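Before turning to the contour argument, the claim can already be verified numerically (our own check; the integration grid and the sample values of ζ are arbitrary assumptions):

```python
# Numerical check (ours): the unit-variance normal density with a complex
# mean zeta still integrates to 1 along the real axis.
import numpy as np

def psi(zeta, a=-30.0, b=30.0, n=600001):
    """Trapezoid approximation of int_a^b exp(-(x - zeta)^2/2)/sqrt(2 pi) dx."""
    x = np.linspace(a, b, n)
    f = np.exp(-0.5 * (x - zeta) ** 2) / np.sqrt(2.0 * np.pi)
    dx = x[1] - x[0]
    return (f.sum() - 0.5 * (f[0] + f[-1])) * dx   # trapezoid rule

print(psi(0.7))           # real mean: ordinary total probability
print(psi(1.0 + 0.5j))    # complex "mean": still close to 1 + 0j
```

Along the way the integrand takes complex values, yet the imaginary contributions cancel and the “total probability” remains one.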
To evaluate ψ(ζ) requires some complex analysis with contour integrals. First, consider the standard-normal density function with a complex argument: $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$. By function-composition rules, since both z² and the exponential function are “entire” functions (i.e., analytic over the whole complex plane), so too is f(z). Therefore, by Cauchy’s theorem $\oint_C f(z)\,dz = 0$ for any closed contour C (cf. Appendix D.7 of Havil [2003]). Let C be the parallelogram traced from vertex z = b to vertex z = a, thence to z = a − ζ, thence to z = b − ζ, and back to z = b. Then:
$$0 = \oint_C f(z)\,dz = \int_b^a f(z)\,dz + \int_a^{a-\zeta} f(z)\,dz + \int_{a-\zeta}^{b-\zeta} f(z)\,dz + \int_{b-\zeta}^{b} f(z)\,dz$$
The line segments along which the second and fourth integrals proceed are finite; their common length is L = |ζ|. By the standard estimate, each such integral is bounded in absolute value by the maximum of |f(z)| along its segment times L; for the second and fourth integrals call these bounds M(a)·L and M(b)·L.

29 We deliberately put ‘total probability’ in quotes because the probability density function with complex ζ is not proper; it may produce negative and even complex values for probability densities.
Now, in general:
$$\begin{aligned}
\lvert f(z)\rvert &= \sqrt{f(z)\,\overline{f(z)}} \\
&= \sqrt{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\bar z^2}{2}}} \\
&= \frac{\sqrt{e^{-\frac{z^2}{2}} \cdot e^{-\frac{\bar z^2}{2}}}}{\sqrt{2\pi}} \\
&\propto \sqrt{e^{-\frac{z^2}{2}} \cdot e^{-\frac{\bar z^2}{2}}} \\
&= \sqrt{e^{-\frac{z^2 + \bar z^2}{2}}} \\
&\propto e^{-\operatorname{Re}(z)^2}
\end{aligned}$$
In the last step z² + z̄² = 2 Re(z)² − 2 Im(z)², and on the finite segments in question Im(z) is bounded by |ζ|; so, up to bounded factors and an immaterial constant in the exponent, |f(z)| decays like e^{−Re(z)²}.
Therefore, since a ∈ ℜ:
$$\lim_{a\to-\infty} M(a)\cdot L \;\propto\; \lim_{a^2\to+\infty}\, \sup\left\{ e^{-a^2 \operatorname{Re}\left(z/a\right)^2} : \frac{z}{a} \in \left[1,\ 1-\frac{\zeta}{a}\right] \right\}\cdot L \;\propto\; e^{-\infty\cdot\operatorname{Re}(1)^2}\cdot L \;\propto\; 0$$
(as a → −∞, every point of the segment satisfies z/a → 1, so Re(z/a)² → 1).
Similarly, $\lim_{b\to+\infty} M(b)\cdot L \propto 0$. So in the limit as a → −∞ and b → +∞ on the real axis, the second and fourth integrals vanish:
$$0 = \lim_{\substack{a\to-\infty \\ b\to+\infty}} \{0\} = \lim_{\substack{a\to-\infty \\ b\to+\infty}} \left[ \int_b^a f(z)\,dz + \int_a^{a-\zeta} f(z)\,dz + \int_{a-\zeta}^{b-\zeta} f(z)\,dz + \int_{b-\zeta}^{b} f(z)\,dz \right]$$
And so:
$$\lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{a-\zeta}^{b-\zeta} f(z)\,dz = -\lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_b^a f(z)\,dz = \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_a^b f(z)\,dz = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz = 1$$
So, at length:
$$\begin{aligned}
\psi(\zeta) &= \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{x=a}^{b} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}(x-\zeta)^2}\, dx \\
&= \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{x=a}^{b} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}\left((z+\zeta)-\zeta\right)^2}\, d(z+\zeta) \\
&= \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{z=a-\zeta}^{b-\zeta} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}z^2}\, d(z+\zeta) \\
&= \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{z=a-\zeta}^{b-\zeta} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}z^2}\, dz \\
&= \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{z=a-\zeta}^{b-\zeta} f(z)\,dz \\
&= 1
\end{aligned}$$
So, working backwards, what we proved for one dimension, viz.,
$$\psi(\zeta) = \int_{x=-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}(x-\zeta)^2}\, dx = 1,$$
applies n-dimensionally: for all ξ ∈ Cⁿ,
$$\varphi(\xi) = \int_{x\in\Re^n} \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma\rvert}}\; e^{-\frac{1}{2}(x-\xi)'\,\Sigma^{-1}(x-\xi)}\, dV = 1.$$
Therefore, even for complex t, M_x(t) = e^{t′μ + t′Σt/2}. Complex values are allowable as arguments in the moment generating function of a normal random multivariate.
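This conclusion can be spot-checked by simulation (our sketch; the particular μ, Σ, t, and sample size are arbitrary assumptions):

```python
# Monte-Carlo check (ours): for x ~ N(mu, Sigma), the sample mean of e^{t'x}
# approaches e^{t'mu + t'Sigma t/2} even when t is complex.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0])
A = np.array([[1.0, 0.0], [0.4, 0.8]])
Sigma = A @ A.T                            # a PD variance matrix

t = np.array([0.3 + 0.2j, -0.1 + 0.5j])    # a complex argument

x = rng.multivariate_normal(mu, Sigma, size=1_000_000)
mgf_mc = np.mean(np.exp(x @ t))            # simulated E[e^{t'x}]
mgf_formula = np.exp(t @ mu + 0.5 * t @ Sigma @ t)
print(mgf_mc, mgf_formula)
```

The simulated expectation matches the closed-form value in both its real and imaginary parts, as the analytic continuation argument guarantees.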
Though we believe the contour-integral proof above to be worthwhile for its instructional value, a simpler proof comes from the powerful theorem of analytic continuation (cf. Appendix D.12 of Havil [2003]). This theorem concerns two functions that are analytic within a common domain. If the functions are equal over any smooth curve within the domain, no matter how short,30 then they are equal over all the domain. Now
$$\psi(\zeta) = \lim_{\substack{a\to-\infty \\ b\to+\infty}} \int_{x=a}^{b} \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}(x-\zeta)^2}\, dx$$
is analytic over all the complex plane. And for all real-valued ζ, ψ(ζ) = 1. So ψ(ζ) and f(ζ) = 1 are two functions analytic over the complex plane and identical on the real axis. Therefore, by analytic continuation ψ(ζ) must equal one for all complex ζ. Analytic continuation is analogous to the theorem in real analysis that two real-analytic functions equal over any interval are equal everywhere. Analytic continuation derives from the fact that a complex derivative is the same in all directions. It is mistaken to regard the real and imaginary parts of the derivative as partial derivatives, as if they applied respectively to the real and imaginary axes of the independent variable. Rather, the whole derivative applies in every direction.

30 The length of the curve must be positive; equality at single points, or punctuated equality, does not qualify.
APPENDIX C
Schur’s product theorem states that the Hadamard (elementwise) product of two non-negative definite (NND) matrices is NND. Million [2007] proves it as Theorem 3.4; however, we believe our proof, which rests on the spectral decomposition developed below, to be instructive in its own right.
Let Γ be an n×n Hermitian NND matrix. As explained in Section 4, ‘Hermitian’ means that Γ = Γ*; ‘NND’ means that for every complex n×1 vector z (or z ∈ Cⁿ), z*Γz ≥ 0. Positive definiteness (PD) requires z*Γz > 0 for every non-zero z. An eigenvalue λ and its paired eigenvector ν satisfy Γν = λν, or (Γ − λIₙ)ν = 0ₙₓ₁. Since ν = 0ₙₓ₁ is excluded as a trivial solution, vector ν can be scaled to unity, or ν*ν = 1. But for a non-trivial solution to exist, the determinant of Γ − λIₙ must be zero. Since the determinant is an nth-degree equation (with complex coefficients based on the elements of Γ) in λ, it has n root values of λ, not necessarily distinct. So the determinant is zero for n values of λ, and for each such λ the singular matrix Γ − λIₙ admits non-zero solutions to (Γ − λIₙ)ν = 0ₙₓ₁. So for every eigenvalue there is a non-zero eigenvector, even a unit one.
The first important result is that the eigenvalues of Γ are real-valued and non-negative. Consider the quotient λⱼ = νⱼ*Γνⱼ / νⱼ*νⱼ. Because Γ is Hermitian and NND, the numerator νⱼ*Γνⱼ is real-valued and non-negative; because νⱼ is non-zero, the denominator νⱼ*νⱼ is real-valued and positive. Therefore, their quotient λⱼ is a real-valued and non-negative scalar.
The second important result is that eigenvectors paired with unequal eigenvalues are orthogonal. Let the two unequal eigenvalues be λⱼ ≠ λₖ. Because the eigenvalues are real-valued, λ̄ⱼ = λⱼ. Then:
$$\begin{aligned}
(\lambda_j - \lambda_k)\,\nu_j^*\nu_k &= \lambda_j\, \nu_j^*\nu_k - \lambda_k\, \nu_j^*\nu_k \\
&= \bar\lambda_j\, \nu_j^*\nu_k - \lambda_k\, \nu_j^*\nu_k \\
&= \left(\lambda_j\, \nu_k^*\nu_j\right)^* - \lambda_k\, \nu_j^*\nu_k \\
&= \left(\nu_k^*\Gamma\nu_j\right)^* - \nu_j^*\Gamma\nu_k \\
&= \nu_j^*\Gamma^*\nu_k - \nu_j^*\Gamma\nu_k \\
&= \nu_j^*\Gamma\nu_k - \nu_j^*\Gamma\nu_k \\
&= 0
\end{aligned}$$
Since λⱼ − λₖ ≠ 0, it must be that νⱼ*νₖ = 0. If the n eigenvalues are distinct, the eigenvectors form an orthogonal basis of Cⁿ. But even if not, the kernel of each eigenvalue, or ker(Γ − λⱼIₙ) = {z ∈ Cⁿ : Γz = λⱼz}, is a linear subspace of Cⁿ whose rank or dimension equals the multiplicity of λⱼ as a root of the characteristic equation. This means that the number of mutually orthogonal eigenvectors paired with an eigenvalue equals how many times that eigenvalue is a root of its characteristic equation. Consequently, there exist n mutually orthogonal unit eigenvectors ν₁, …, νₙ, paired with the eigenvalues λ₁, …, λₙ.
Now, define W as the partitioned matrix Wₙₓₙ = [ν₁ ⋯ νₙ]. The jkth element of W*W equals νⱼ*νₖ, which is one when j = k and zero otherwise; hence W*W = Iₙ, and W* is the inverse of W.32 Furthermore, define Λ as the n×n diagonal matrix whose jjth element is λⱼ. Then:
$$\Gamma W = \Gamma\left[\nu_1 \ \cdots \ \nu_n\right] = \left[\Gamma\nu_1 \ \cdots \ \Gamma\nu_n\right] = \left[\lambda_1\nu_1 \ \cdots \ \lambda_n\nu_n\right] = \left[\nu_1 \ \cdots \ \nu_n\right]\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = W\Lambda$$
And so, Γ = ΓIₙ = ΓWW* = WΛW*, and Γ is said to be “diagonalized.” Thus have we shown, assuming the theory of equations,33 the third important result, viz., that every NND Hermitian matrix can be diagonalized. Other matrices can be diagonalized; the NND [or PD] property consists in the fact that all the eigenvalues of this diagonalization are non-negative [or positive].
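Numerical linear-algebra routines realize exactly this diagonalization, which allows a quick confirmation of the first three results (our illustration; the random Γ is an assumption):

```python
# Check (ours): for a Hermitian NND Gamma, numpy's eigh returns real
# non-negative eigenvalues and a unitary W with Gamma = W Lambda W*.
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Gamma = B @ B.conj().T                     # Hermitian and NND by construction

lam, W = np.linalg.eigh(Gamma)             # Gamma W = W Lambda
print(np.all(lam > -1e-10))                                  # eigenvalues real, non-negative
print(np.allclose(W.conj().T @ W, np.eye(4)))                # W*W = I: W unitary
print(np.allclose(W @ np.diag(lam) @ W.conj().T, Gamma))     # Gamma = W Lambda W*

# Spectral decomposition: Gamma = sum_j lam_j v_j v_j*
spectral = sum(lam[j] * np.outer(W[:, j], W[:, j].conj()) for j in range(4))
print(np.allclose(spectral, Gamma))
```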
The fourth and final “eigen” result relies on the identity W*νⱼ = eⱼ, which just extracts the jth column of the identity matrix, i.e., the jth unit vector. Since the diagonal matrix Λ can be written as Λ = Σ_{j=1}^n λⱼ eⱼeⱼ*:
$$\begin{aligned}
\Gamma &= W\Lambda W^* \\
&= W\left(\sum_{j=1}^{n} \lambda_j\, e_j e_j^*\right) W^* \\
&= W \sum_{j=1}^{n} \lambda_j \left(W^*\nu_j\right)\left(W^*\nu_j\right)^* W^* \\
&= W \sum_{j=1}^{n} \lambda_j\, W^*\nu_j \nu_j^* W\, W^* \\
&= WW^* \sum_{j=1}^{n} \lambda_j\, \nu_j \nu_j^*\; WW^* \\
&= I_n \sum_{j=1}^{n} \lambda_j\, \nu_j \nu_j^*\; I_n \\
&= \sum_{j=1}^{n} \lambda_j\, \nu_j \nu_j^*
\end{aligned}$$
The form $\sum_{j=1}^{n} \lambda_j \nu_j \nu_j^*$ is called the “spectral decomposition” of Γ (§7.4 of Healy [1986]), which plays the leading role in the following succinct proof of Schur’s product theorem.
If Σ and Τ are two n×n Hermitian NND matrices, we may spectrally decompose them as $\Sigma = \sum_{j=1}^{n} \lambda_j \nu_j \nu_j^*$ and $\mathrm{T} = \sum_{j=1}^{n} \kappa_j \eta_j \eta_j^*$, where all the λ and κ scalars are non-negative. Accordingly:
$$\begin{aligned}
(\Sigma\circ\mathrm{T})_{jk} &= (\Sigma)_{jk}\,(\mathrm{T})_{jk} \\
&= \left(\sum_{r=1}^{n} \lambda_r\, \nu_r \nu_r^*\right)_{jk} \left(\sum_{s=1}^{n} \kappa_s\, \eta_s \eta_s^*\right)_{jk} \\
&= \sum_{r=1}^{n} \lambda_r\, (\nu_r)_j \overline{(\nu_r)_k}\; \sum_{s=1}^{n} \kappa_s\, (\eta_s)_j \overline{(\eta_s)_k} \\
&= \sum_{r=1}^{n}\sum_{s=1}^{n} \lambda_r \kappa_s\, (\nu_r)_j (\eta_s)_j\; \overline{(\nu_r)_k (\eta_s)_k} \\
&= \sum_{r=1}^{n}\sum_{s=1}^{n} \lambda_r \kappa_s\, (\nu_r\circ\eta_s)_j\; \overline{(\nu_r\circ\eta_s)_k} \\
&= \sum_{r=1}^{n}\sum_{s=1}^{n} \lambda_r \kappa_s \left[(\nu_r\circ\eta_s)(\nu_r\circ\eta_s)^*\right]_{jk} \\
&= \left[\sum_{r=1}^{n}\sum_{s=1}^{n} \lambda_r \kappa_s\, (\nu_r\circ\eta_s)(\nu_r\circ\eta_s)^*\right]_{jk}
\end{aligned}$$
Hence, $\Sigma\circ\mathrm{T} = \sum_{r=1}^{n}\sum_{s=1}^{n} \lambda_r \kappa_s (\nu_r\circ\eta_s)(\nu_r\circ\eta_s)^*$. Since each matrix $(\nu_r\circ\eta_s)(\nu_r\circ\eta_s)^*$ is Hermitian NND, and each scalar λᵣκₛ is non-negative, Σ∘Τ must be Hermitian NND. Therefore, the Hadamard product of two NND matrices is NND, as Schur’s product theorem claims.
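A quick empirical confirmation of the theorem (our sketch; the random matrices are assumptions):

```python
# Empirical check (ours): the Hadamard product of two Hermitian NND matrices
# is again Hermitian NND (all eigenvalues non-negative).
import numpy as np

rng = np.random.default_rng(5)

def random_hermitian_nnd(n):
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return B @ B.conj().T

all_nnd = True
for _ in range(200):
    S = random_hermitian_nnd(5)
    T = random_hermitian_nnd(5)
    hadamard = S * T                       # elementwise (Hadamard) product
    all_nnd = all_nnd and np.linalg.eigvalsh(hadamard).min() >= -1e-8
print(all_nnd)
```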