
QUANTUM NOTES:

matrix analysis – linear algebra on complex scalar product spaces

Ψ
Budapest Semesters in Mathematics / Aquincum Institute of Technology
Preface

The aim of this note is to deal with ortho-projections, self-adjoint and unitary matrices,
operator positivity and with (finite dimensional) operator and trace inequalities. These
are not purely linear algebraic concepts; they belong to the area which some would call
“matrix analysis”. The crucial difference is that apart from a vector space, here we also
need a scalar product.

Since we intend to apply the machinery learned here in Quantum Information Theory, we
shall need to study these concepts over the field of complex numbers. However, according
to my experience, although students attending my course have some familiarity with —
what they call — “dot-products” or “inner-products”, they have never seen them employed
on complex spaces. Thus here we begin with the concept of scalar products on complex
spaces.

Over the years, starting as a couple of pages long handout, the notes kept getting steadily
longer and longer. So let me also use this preface to thank BSM student Ryan Utke, who
gave a hand in writing up this note.

Mihály Weiner, Budapest 2017.


Contents

1 Scalar products on complex spaces 3


1.1 From dot product to complex scalar products . . . . . . . . . . . . . . . . 3
1.2 The Cauchy-Schwarz inequality . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Orthogonality and orthogonal systems . . . . . . . . . . . . . . . . . . . . 6
1.4 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 The adjoint 11
2.1 Definition and elementary properties . . . . . . . . . . . . . . . . . . . . . 11
2.2 Normal operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Spectral and further characterizations . . . . . . . . . . . . . . . . . . . . . 16

3 Operator positivity 21
3.1 Fundamental properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Gram matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Operator and trace inequalities . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 The cone of positives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 The convex body of density operators . . . . . . . . . . . . . . . . . . . . . 28
Chapter 1

Scalar products on complex spaces

1.1 From dot product to complex scalar products


Most students hear about the “dot-product” between vectors in high school, where it is
introduced by the formula “dot-product = product of lengths times the cosine of the angle
between the vectors”. In modern mathematics however, one usually defines new objects by
stating their fundamental properties rather than by giving an actual formula. Moreover,
in an abstract vector space we do not have an a priori meaning of concepts like that of the
angle.
Thinking about it, one may say that the dot-product is some kind of binary operation
resulting in a scalar, which is symmetric, linear in its variables, and has the further property
that v·v ≥ 0 with equality holding if and only if v = 0. The problem is that these properties
are contradictory over a complex space. Indeed, if v · v > 0, then using linearity, we have
that for the vector w = iv, the product with itself is $w \cdot w = (iv)\cdot(iv) = i^2\, v \cdot v = -v \cdot v < 0$.
It turns out that the right generalization of the dot-product — which, following the physicist literature, we
shall call a scalar product — is the following.

Definition 1.1.1. Let F be either R or C and V a vector space over F. A binary operation
$\langle \cdot, \cdot \rangle : V \times V \to F$ satisfying for all $v, x, y \in V$ and $\lambda \in F$

i) $\langle v, x + \lambda y \rangle = \langle v, x \rangle + \lambda \langle v, y \rangle$ (i.e. linearity in the second variable),

ii) $\langle x, y \rangle = \overline{\langle y, x \rangle}$,

iii) $\langle v, v \rangle \ge 0$ with equality holding if and only if $v = 0$

is called a scalar product on V . A vector space with a fixed scalar product is called a
scalar product space.

Note that already by the second listed property it follows that $\langle v, v \rangle$ must be real;
however, it still could be negative — this is why we made one more requirement. Note
also that the first and second listed properties imply that the scalar product is conjugate linear
(rather than just linear) in its first variable; that is, we have that
$$\langle x + \lambda y, v \rangle = \langle x, v \rangle + \overline{\lambda} \langle y, v \rangle.$$

Of course, what is important here is that we require linearity only in one variable. Whether
we list (hence write) this to be the first or the second variable is a question of convention
and typography, similar to choosing between writing f (x) (i.e. the function f applied to x)
or (x)f (the input x is “loaded” into f ). In fact, mathematicians usually follow a different
convention: they would call such an operation an inner product and they would
require linearity in its first, rather than in its second variable. Here we decided to follow
the physicist convention.

Examples. The formula
$$\langle x, y \rangle := \sum_{k=1}^{n} \overline{x_k}\, y_k$$
evidently gives a scalar product on $\mathbb{C}^n$. This is usually referred to as the standard one; when
no other specification is given, this is what we usually mean by scalar product on $\mathbb{C}^n$. Let
us immediately mention 2 infinite dimensional generalizations of the above formula. First,
one might try to consider infinite sequences rather than n-tuples and thus replace n by
the infinity symbol ∞ in the upper limit of the summation. However, in order for the sum
to be well-defined (resulting in a finite number), we then cannot allow just any sequences
to appear. Indeed, for $\langle x, x \rangle = \sum_{k=1}^{\infty} \overline{x_k} x_k = \sum_{k=1}^{\infty} |x_k|^2$ to make sense, x needs to be a
square-absolute-summable sequence; so we should only consider elements of the space
$$l^2(\mathbb{N}) \equiv \{x : \mathbb{N} \to \mathbb{C} \mid \sum_{k=1}^{\infty} |x_k|^2 < \infty\}.$$
The set $l^2(\mathbb{N})$ is clearly closed under multiplication by scalars, but as $|z + c|^2 \le 2|z|^2 + 2|c|^2$
for every $z, c \in \mathbb{C}$, it is also closed under addition. Moreover, as for any $z, c \in \mathbb{C}$ we also
have that $|\overline{z} c| = |z|\,|c| \le \frac{1}{2}|z|^2 + \frac{1}{2}|c|^2$, our formula for the scalar product actually gives a
well-defined finite number for any pair of elements of $l^2(\mathbb{N})$, turning $l^2(\mathbb{N})$ into an infinite
dimensional scalar product space.
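As a quick sanity check — a minimal numpy sketch added here, not part of the original notes — one can verify the three defining properties for the standard scalar product on C^n. The helper `sp` is my own name; `numpy.vdot` conjugates its first argument, which matches the physicist convention used throughout these notes.

```python
import numpy as np

# Standard scalar product on C^n, physicist convention:
# <x, y> = sum_k conj(x_k) * y_k  (linear in the second variable).
def sp(x, y):
    return np.vdot(x, y)  # np.vdot conjugates its first argument

x = np.array([1 + 1j, 2 - 1j])
y = np.array([3j, 1 + 2j])
lam = 0.5 - 2j

# i) linearity in the second variable
assert np.isclose(sp(x, y + lam * x), sp(x, y) + lam * sp(x, x))
# ii) conjugate symmetry: <x, y> = conj(<y, x>)
assert np.isclose(sp(x, y), np.conj(sp(y, x)))
# iii) <x, x> is a nonnegative real
assert np.isclose(sp(x, x).imag, 0) and sp(x, x).real >= 0
```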
Another infinite dimensional generalization can be obtained by replacing the summation by integration
and thus defining the scalar product of scalar-valued functions on some (measure) space
X by the formula
$$\langle f, g \rangle := \int_X \overline{f}\, g.$$
Of course, also here, we need to make sure that the integral is well-defined and results in a
finite number. For example, we may choose the vector space V := C([0, 1], C) to be the set
of continuous functions on [0, 1]; then the formula $\langle f, g \rangle := \int_0^1 \overline{f(x)}\, g(x)\, dx$ will indeed give
a well-defined scalar product on V . Alternatively, like in our previous example, we may
consider the set of square-integrable functions on our space X. However, here some care
is needed, since — unlike in the previous example where we used continuous functions on
[0, 1] — in general $\int_X |f|^2 = 0$ does not imply that f is constant zero. Thus one usually uses
the above integral formula to introduce a scalar product on the space $L^2(X)$ of equivalence
classes of square-integrable functions. So for example, consider two functions f and f̃ on
the real line that are square-integrable (with respect to the usual Lebesgue measure) and
differ only at a single point. Then as functions $f \ne \tilde{f}$, but as elements of $L^2(\mathbb{R})$, we regard
them to be the same since $\int_{\mathbb{R}} |f - \tilde{f}|^2 = 0$.

Having a scalar product on V allows us to introduce the length of a vector v ∈ V by the
formula
$$\|v\| := \sqrt{\langle v, v \rangle}$$
which we shall also call the norm of v. Clearly, $\|v\| \ge 0$ with equality holding if and
only if v = 0, and for any $\lambda \in \mathbb{C}$
$$\|\lambda v\| = \sqrt{\langle \lambda v, \lambda v \rangle} = \sqrt{\overline{\lambda}\lambda \langle v, v \rangle} = |\lambda| \sqrt{\langle v, v \rangle} = |\lambda| \|v\|,$$
which is precisely how we expect length to behave. However, we are also used to
another property; namely, the triangle inequality. Is that also automatic? As we shall see
in the next section, the answer is yes.

1.2 The Cauchy-Schwarz inequality


Since the cosine of an angle is a value between −1 and 1, the absolute value of the “high
school dot-product” of two vectors is always smaller than or equal to the product of the
lengths of the vectors in question. It is an important observation that the same remains
true in our abstract setting.

Proposition 1.2.1 (The Cauchy-Schwarz inequality). Let V be a scalar product space and
u, v ∈ V . Then
$$|\langle u, v \rangle| \le \|u\|\, \|v\|$$
with equality holding if and only if u and v are parallel¹ vectors.

¹ i.e. one of them is a scalar multiple of the other
Proof. Suppose u and v are parallel; say $v = \lambda u$ for some $\lambda \in \mathbb{C}$. Then
$$|\langle u, v \rangle| = |\langle u, \lambda u \rangle| = |\lambda|\, |\langle u, u \rangle| = |\lambda| \|u\|^2 = \|u\|\, \|\lambda u\| = \|u\|\, \|v\|.$$
So let us assume now that u and v are not parallel; then $u + \lambda v \ne 0$ and hence
$$\forall \lambda \in \mathbb{C} : \langle u + \lambda v, u + \lambda v \rangle > 0.$$
Expanding the above scalar product and noting that $\langle u, \lambda v \rangle + \langle \lambda v, u \rangle = 2\mathrm{Re}(\langle u, \lambda v \rangle)$ we
conclude that
$$\forall \lambda \in \mathbb{C} : \|u\|^2 + \|\lambda v\|^2 + 2\mathrm{Re}(\langle u, \lambda v \rangle) > 0.$$
Then taking a real parameter $t \in \mathbb{R}$ and setting $\lambda = t\, \overline{\langle u, v \rangle}$ we find that the real polynomial
$$q(t) = \|u\|^2 + |\langle u, v \rangle|^2 \|v\|^2 t^2 + 2|\langle u, v \rangle|^2 t$$
is positive for every $t \in \mathbb{R}$. Of course, if the scalar product $\langle u, v \rangle$ is zero, we are ready since
the claimed inequality is then clearly satisfied in a strict manner (note that as u and v
are assumed to be non-parallel, in particular they cannot be zero and hence $\|u\|$ and $\|v\|$
are strictly positive). On the other hand, when $\langle u, v \rangle \ne 0$, the polynomial q is of second
degree and its strict positivity implies that its discriminant is negative:
$$\left(2|\langle u, v \rangle|^2\right)^2 - 4|\langle u, v \rangle|^2 \|v\|^2 \|u\|^2 < 0$$
which, after simple reordering, is just the (strict version) of the claimed inequality.

Corollary 1.2.2 (Triangle inequality). $\|u + v\| \le \|u\| + \|v\|$.

Proof. Expanding (the square of) the left side:
$$\|u + v\|^2 = \langle u + v, u + v \rangle = \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle.$$
But $\langle u, v \rangle + \langle v, u \rangle = 2\mathrm{Re}(\langle u, v \rangle) \le 2|\langle u, v \rangle|$ and hence, using the Cauchy-Schwarz inequality,
$$\|u + v\|^2 \le \|u\|^2 + \|v\|^2 + 2|\langle u, v \rangle| \le \|u\|^2 + \|v\|^2 + 2\|u\|\, \|v\| = (\|u\| + \|v\|)^2.$$
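For readers who like to see these inequalities “in action”, here is a small numerical illustration (an added sketch, not from the original notes) checking both the Cauchy-Schwarz and the triangle inequality for random complex vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    u = rng.normal(size=4) + 1j * rng.normal(size=4)
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    sp = np.vdot(u, v)                                   # <u, v>
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    assert abs(sp) <= nu * nv + 1e-12                    # Cauchy-Schwarz
    assert np.linalg.norm(u + v) <= nu + nv + 1e-12      # triangle inequality
```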

In a real scalar product space — which, especially in the finite dimensional case, is
usually called a Euclidean space — the Cauchy-Schwarz inequality implies that one can
introduce the notion of angle between vectors. If u, v are nonzero, then the ratio
$$\frac{\langle u, v \rangle}{\|u\|\, \|v\|}$$
must be a number in [−1, 1] and hence there exists a unique $\alpha \in [0, \pi]$ such that
$$\langle u, v \rangle = \|u\|\, \|v\| \cos(\alpha)$$
which, in accordance with the formula defining the “dot-product”, can be viewed as the
angle between the two nonzero vectors in question.

1.3 Orthogonality and orthogonal systems


Two vectors u, v such that $\langle u, v \rangle = 0$ are said to be orthogonal to each other; in written
notation: u ⊥ v. Note that although in general $\langle u, v \rangle \ne \langle v, u \rangle$, one of these expressions
is zero if and only if so is the other one — hence orthogonality is a symmetric relation.
We shall further say that two collections of vectors S1 and S2 are orthogonal to each other
when all vectors of S1 are orthogonal to all vectors of S2 ; in notation: S1 ⊥ S2 . Also, we
introduce the orthogonal of a collection S of vectors of a scalar product space V as the set
$$S^{\perp} \equiv \{v \in V \mid u \perp v \text{ for all } u \in S\}$$
of all vectors that are orthogonal to every element of S. It is quite evident that

• $S^{\perp}$ is a linear subspace (regardless of whether S was so or not),

• $S^{\perp} = (\mathrm{Span}(S))^{\perp}$,

• if $S_1 \subset S_2$ then $S_1^{\perp} \supset S_2^{\perp}$,

• $S \subset (S^{\perp})^{\perp}$.
Our experience with the 3-dimensional Euclidean space suggests that we should also have
the following two further properties: if S is already a linear subspace, then S and $S^{\perp}$ are
complementary and $(S^{\perp})^{\perp} = S$. However, in infinite dimensions these are actually false
statements! (See exercise E 1.5.) So although we shall only need finite dimensional scalar
product spaces for our study, we do need some justification of the finite dimensional version
of these statements — we cannot just say that these are “evidently” true when in fact they
do not follow from the defining properties of a scalar product.
We shall address these questions in the next section; for the moment, we shall consider
some special collections of vectors $e_1, \ldots, e_n$ having the property that
$$\langle e_j, e_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}$$
Such collections are said to form an ortho-normal system or in short, an ONS. Indeed,
each member of an ONS is “normalized” (as by the above relation it must be of length 1)
and elements of an ONS are pairwise orthogonal.

Lemma 1.3.1. An ONS $e_1, \ldots, e_n$ is automatically a linearly independent set.

Proof. Suppose $c_1 e_1 + \ldots + c_n e_n = 0$ for some scalar coefficients $c_1, \ldots, c_n$. Then considering
the scalar product with $e_k$ for any $k = 1, \ldots, n$ we have that
$$0 = \langle e_k, 0 \rangle = \langle e_k, c_1 e_1 + \ldots + c_n e_n \rangle = c_1 \langle e_k, e_1 \rangle + \ldots + c_k \langle e_k, e_k \rangle + \ldots + c_n \langle e_k, e_n \rangle = c_1 \cdot 0 + \ldots + c_k \cdot 1 + \ldots + c_n \cdot 0 = c_k.$$
Hence each of our scalar coefficients must be zero, which is exactly what we needed to
conclude.
Proposition 1.3.2. Let $e_1, \ldots, e_n$ be an ONS. Then the map
$$P : v \mapsto \sum_{k=1}^{n} \langle e_k, v \rangle\, e_k$$
is a projection, $\mathrm{Im}(P) = \mathrm{Span}\{e_1, \ldots, e_n\}$ and $\mathrm{Ker}(P) = \mathrm{Im}(P)^{\perp}$.

Proof. Since the scalar product is linear in its second variable, P is indeed a linear map.
It is also quite evident that $\mathrm{Im}(P) \subset \mathrm{Span}\{e_1, \ldots, e_n\}$, as by our defining formula P v is
actually given as a linear combination of the e vectors. Moreover, as for every $j = 1, \ldots, n$
$$P e_j = \sum_{k=1}^{n} \langle e_k, e_j \rangle\, e_k = 0\, e_1 + \ldots + 1\, e_j + \ldots + 0\, e_n = e_j$$
we have that $e_1, \ldots, e_n \in \mathrm{Im}(P)$ and so actually $\mathrm{Im}(P) = \mathrm{Span}\{e_1, \ldots, e_n\}$. Then the
previous computation shows that P acts identically on the whole of its image and hence
$P^2 v = P(P v) = P v$, meaning that $P^2 = P$; i.e. P is a projection. Finally, as for the last
claim, consider that $v \in \mathrm{Ker}(P)$ if and only if
$$0 = P v = \sum_{k=1}^{n} \langle e_k, v \rangle\, e_k \quad \Leftrightarrow \quad \forall k : \langle e_k, v \rangle = 0$$
as by our previous lemma the vectors $e_1, \ldots, e_n$ are linearly independent. Thus $\mathrm{Ker}(P) = \{e_1, \ldots, e_n\}^{\perp} = (\mathrm{Span}\{e_1, \ldots, e_n\})^{\perp} = (\mathrm{Im}(P))^{\perp}$.
Corollary 1.3.3. If V is a finite dimensional scalar product space, then there exists an
ONS $e_1, \ldots, e_n \in V$ which is also a basis of V .

Proof. We take a nonzero vector $0 \ne v \in V$ and make the first vector of our ONS to be its
normalized version: $e_1 := \frac{1}{\|v\|} v$. Then, if we find another unit-vector which is orthogonal
to the first element of our ONS, we set that to be our second element $e_2$, and we continue
in this manner: whenever we find a unit-vector which is orthogonal to all elements of our
so-far-obtained ONS, we add that as the next element.

This procedure will surely stop at some point: an ONS is linearly independent, thus
its cardinality cannot exceed the dimension of V . But when exactly will this happen?
Suppose at some moment our ONS $e_1, \ldots, e_k$ is not a spanning set in V : there exists a vector
$v \in V$ such that $v \notin \mathrm{Span}\{e_1, \ldots, e_k\}$.

Consider the projection P introduced in the previous proposition, for which $\mathrm{Im}(P) = \mathrm{Span}\{e_1, \ldots, e_k\}$ and $\mathrm{Ker}(P) = (\mathrm{Im}(P))^{\perp}$. We have that the vector $v - Pv \ne 0$ (as $v \notin \mathrm{Span}\{e_1, \ldots, e_k\}$) and that it is actually in the orthogonal of our ONS (since $P(v - Pv) = 0$
and so it is in $\mathrm{Ker}(P) = (\mathrm{Im}(P))^{\perp}$). So setting $e_{k+1} := \frac{1}{\|v - Pv\|}(v - Pv)$ creates a larger
ONS. Thus the conclusion is that when the procedure stops (and it indeed does stop after
finitely many steps), it gives an ONS which is a spanning set and hence is also a basis.
Such a basis as discussed above is usually called an ortho-normal basis, or in short an
ONB. Note that if $E = (e_1, \ldots, e_n)$ is an ONB in V , then by the formula appearing in the
main proposition of this section, $\sum_{k=1}^{n} \langle e_k, v \rangle\, e_k = v$ for all $v \in V$. Thus we can just read
off the coordinates of v in this basis:
$$(v)_E = \begin{pmatrix} \langle e_1, v \rangle \\ \vdots \\ \langle e_n, v \rangle \end{pmatrix}.$$
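The procedure in the proof of Corollary 1.3.3 is essentially the Gram-Schmidt process. Below is a possible numpy sketch of my own (the helper name `orthonormal_basis` is hypothetical, not from the notes) that builds an ONB from a spanning list and then reads off coordinates as in the formula above.

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-12):
    """Build an ONB of Span(vectors) by repeatedly subtracting the
    projection onto the ONS obtained so far (proof of Cor. 1.3.3)."""
    basis = []
    for v in vectors:
        # P v = sum_k <e_k, v> e_k ; subtract it from v
        w = v - sum(np.vdot(e, v) * e for e in basis)
        if np.linalg.norm(w) > tol:          # w lies outside the current span
            basis.append(w / np.linalg.norm(w))
    return basis

vectors = [np.array([1, 1j, 0]), np.array([1, 0, 1]), np.array([0, 1, 2j])]
E = orthonormal_basis(vectors)
v = np.array([2, 1 - 1j, 3j])
coords = np.array([np.vdot(e, v) for e in E])   # (v)_E as in the formula above
# Since E spans C^3 here, v can be reconstructed from its coordinates:
assert np.allclose(sum(c * e for c, e in zip(coords, E)), v)
```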

1.4 Orthogonal projections


The image and kernel of the projection P appearing in the main proposition of the last
section are orthogonal to each other. Projections with this property are usually called
ortho-projections. In general, to give a projection, one has to give two complementary
subspaces; fixing only the image does not fully determine the projection. With ortho-
projections the situation is different.

Lemma 1.4.1. Let U and W be complementary subspaces. Then U ⊥ W implies that


U ⊥ = W and W ⊥ = U .

Proof. It is clear that it suffices to show one of the relations (the other one follows by
exchanging the role of U and W ). So suppose U ⊥ W ; then of course U ⊥ ⊃ W . For
the other containment, consider a vector v ∈ U ⊥ . Since U and W are complementary,
there exists a decomposition v = u + w in which u ∈ U and w ∈ W . Then using the
orthogonality relations,

kuk2 = hu, ui + 0 = hu, u + wi = hu, vi = 0

showing that u = 0 and hence that v = w ∈ W .

Corollary 1.4.2. Let P be an ortho-projection. Then Ker(P ) is not only orthogonal to, but
is precisely the orthogonal of Im(P ). Hence an ortho-projection is fully determined by
the subspace it projects onto.

Another important consequence of our lemma is that there exists an ortho-projection


onto a subspace U if and only if U and U ⊥ are complementary. Thus, in infinite dimensions
in general one cannot consider the ortho-projection onto an arbitrary subspace! On the
other hand, when U and U ⊥ are complementary, then by our lemma both are the orthogonal
of each other and hence (U ⊥ )⊥ = U .

Proposition 1.4.3. Let U be a finite dimensional subspace of a (possibly infinite dimen-


sional) scalar product space. Then the ortho-projection onto U exists and hence U and U ⊥
are complementary and (U ⊥ )⊥ = U .

Proof. By what was explained in the last chapter, there exists a finite ONB in U which
then can be used to explicitly construct the desired projection (see the main proposition
of the last section). The rest has been discussed just before our claim.

We finish this section by mentioning two more important properties of ortho-projections,
whose proofs are left for the reader (see them among the exercises listed below). First,
that a projection P is an ortho-projection if and only if it does not increase the length of any
vector. Second, that the ortho-projection P onto a subspace U actually assigns to each
vector v the closest point of U to v.

Exercises
E 1.1. If the vectors of a scalar product space v, u ∈ V are orthogonal, then one can easily
show that ku + vk2 = kuk2 + kvk2 . Is the converse true; i.e. does this “Pythagorean”
equality imply that the vectors in question must be orthogonal?

E 1.2. Can there be 4 (nonzero) vectors in a Euclidean space so that the angle between
any two of them is larger than or equal to 120°?

E 1.3. Use the Cauchy-Schwarz inequality to show that for any a, b, c ≥ 0 we have
$$\sqrt{a+b} + \sqrt{a+c} + \sqrt{b+c} \le \sqrt{6(a+b+c)}.$$
When do we have precise equality?

E 1.4. Use the Cauchy-Schwarz inequality to show that
$$\left( \int_2^{\infty} f(x)\, dx \right)^2 \le \frac{1}{24} \int_2^{\infty} x^4 |f(x)|^2\, dx.$$
When do we have precise equality?

E 1.5. Let D ⊂ l2 (N) be the subspace formed by the “finite” sequences; i.e. the ones
having only finitely many nonzero terms. What is D⊥ ? Is it true that (D⊥ )⊥ coincides
with D? Is it true that D and its orthogonal are complementary?

E 1.6. Show that if U, W are linear subspaces in finite dimensional scalar product space,
then U ⊥ ∩ W ⊥ = (U + W )⊥ and U ⊥ + W ⊥ = (U ∩ W )⊥ . What can be said about the
validity of these relations in infinite dimensions?

E 1.7. We have seen that an ortho-normal system of vectors is always linearly independent.
Let us loosen the condition of “strict” orthogonality: show that if the vectors v1 , . . . vn+1
are of unit length with |hvj , vk i| < 1/n for all j 6= k, then they are linearly independent.

E 1.8. Let P ∈ Lin(V ) be a projection. Prove that P is an ortho-projection if and only if


kP vk ≤ kvk for every v ∈ V .

E 1.9. Let P be an ortho-projection. Prove that P v is precisely the point of the subspace
Im(P ) which is the closest to v.

E 1.10. Let P ∈ Lin(V ) be a projection. Prove that P is an ortho-projection if and only


if hv, P wi = hP v, wi for every v, w ∈ V .
Chapter 2

The adjoint

2.1 Definition and elementary properties


Let V be a finite dimensional scalar product space and A ∈ Lin(V ). An A∗ ∈ Lin(V ) such
that for all v, w ∈ V :
hv, Awi = hA∗ v, wi,
is said to be an adjoint of A. Note that although in the above equation A appears on
the right side, whereas A∗ on the left side of the scalar product, there is nothing specific
about that: by conjugating both sides we can swap their role. Thus our definition in simple
words is the following: A∗ is an adjoint of A if it can replace the action of A by acting on
the other side of the scalar product.
In introducing our new mathematical concept, we used the wording “an adjoint”. How-
ever, as we shall immediately see, the definition singles out a unique operator; hence from
now on we shall talk about the adjoint.

Lemma 2.1.1. If A∗1 and A∗2 are both adjoints of A, then A∗1 = A∗2 .

Proof. Using the defining property of the adjoint we find that h(A∗1 − A∗2 )v, wi = 0 for
all v, w ∈ V . In particular, setting w := (A∗1 − A∗2 )v, this implies that (A∗1 − A∗2 )v = 0
for all v ∈ V which means that A∗1 and A∗2 are the same.

After discussing uniqueness, let us turn to the question of existence. Of course here one
could start by simply trying to show that for each A ∈ Lin(V ) there exists a corresponding
adjoint A∗ . However, more than just knowing existence, what we would really prefer is to
be able to compute the adjoint in a concrete manner; i.e. to have a simple formula for A∗ .
For this reason, let us pick an ONB E = (e1 , . . . en ) in V and consider how an operator
X ∈ Lin(V ) “looks like” in this basis. The k th column of the matrix (X)E is simply the
coordinates of Xek in our ONB. Thus, by the remark at the end of section 1.3, the entry
value at the j th row, k th column is

((X)E )j,k = hej , Xek i.


So suppose we have an operator A with an adjoint A∗. Then, by what was explained,
$$((A^*)_E)_{j,k} = \langle e_j, A^* e_k \rangle = \langle A e_j, e_k \rangle = \overline{\langle e_k, A e_j \rangle} = \overline{((A)_E)_{k,j}}.$$
To put it in simple words: in an ortho-normal basis, the matrix of A∗ must be equal to
the conjugate-transpose of the matrix of A. At this point, one can actually take this as a
given formula and easily check that it indeed gives an adjoint.

Corollary 2.1.2. Any operator A of a finite dimensional scalar product space admits a
(unique) adjoint A∗. In any ortho-normal basis, the matrix of A∗ is the transpose-conjugate
of the matrix of A.
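A quick numerical confirmation of Corollary 2.1.2 (an added sketch, not part of the original notes): for a matrix written in an orthonormal basis, the conjugate transpose indeed “moves the operator to the other side” of the scalar product.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A_star = A.conj().T                     # transpose-conjugate of the matrix of A

for _ in range(100):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    w = rng.normal(size=n) + 1j * rng.normal(size=n)
    # <v, A w> == <A* v, w>  (np.vdot conjugates its first argument)
    assert np.isclose(np.vdot(v, A @ w), np.vdot(A_star @ v, w))
```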
We finish this section by listing some elementary, but important properties. We leave
their justification for the reader; but they all follow in a straightforward manner from
definition and / or the formula involving the conjugate-transpose.

Elementary properties of the adjoint. Let V be a finite dimensional scalar product


space, A, B ∈ Lin(V ) and λ a scalar. Then
• $(\lambda A)^* = \overline{\lambda} A^*$,

• (A + B)∗ = A∗ + B ∗ ,

• (AB)∗ = B ∗ A∗ ,

• (A∗ )∗ = A,

• I ∗ = I,

• ∃A−1 ⇔ ∃(A∗ )−1 and in this case (A∗ )−1 = (A−1 )∗ ,

• Ker(A∗ ) and Im(A) are the orthogonal of each other, and hence

• Ker(A) and Im(A∗ ) are also the orthogonal of each other,

• $\mathrm{Tr}(A^*) = \overline{\mathrm{Tr}(A)}$,

• $\det(A^*) = \overline{\det(A)}$.

2.2 Normal operators


We will often use some operators that “behave” in some special manners with respect
to taking adjoints. In particular, we shall need the concept of self-adjoint, unitary and
positive operators.
Definition 2.2.1. An operator
• A such that A∗ = A is said to be self-adjoint,

• U such that U ∗ = U −1 is said to be unitary,


• A such that ∃X : A = X ∗ X is said to be positive; in notations we write A ≥ 0.
We begin our investigation by making some remarks about these classes. First let
us note, that by exercise E 1.10, a projection P is an ortho-projection if and only if it
is also self-adjoint: P = P 2 = P ∗ . Moreover, in this case P = P ∗ P , so actually every
ortho-projection is a positive operator.
Another thing to observe is that if A = X ∗ X, then A∗ = (X ∗ X)∗ = X ∗ (X ∗ )∗ = X ∗ X =
A; thus, positive operators form a subset of the set

S(V ) ≡ {A ∈ Lin(V)| A = A∗ }

of self-adjoint operators. Note that S(V ) is a real-linear subspace of Lin(V): if A, B are


self-adjoint, then so is A + B and tA for any real number t. (However, a complex multiple
of a self-adjoint operator in general will not remain self-adjoint.) On the other hand, while
sums remain inside, multiplication leads out of S(V ): if A = A∗ and B = B ∗ then

(AB)∗ = B ∗ A∗ = BA;

thus, the product remains self-adjoint if and only if A and B commute. This is the exact
opposite of what we have with unitary operators. Indeed, for example both I and −I are
unitary, but I + (−I) = 0 is of course not unitary. Thus the sum of the unitary operators
U1 , U2 need not be unitary; however

(U1 U2 )∗ = U2∗ U1∗ = U2−1 U1−1 = (U1 U2 )−1 ,

showing that the product does remain unitary. Moreover, if U ∗ = U −1 , then

(U −1 )∗ = (U ∗ )∗ = U = (U −1 )−1 ,

so the inverse also remains unitary. Thus, the set of unitary operators on V form a group
under composition.
An easy, but fairly important observation is that both unitary and self-adjoint operators
are such that they commute with their adjoint. This is a common, nontrivial property
(random examples with matrices will quickly convince the reader that in general X and
X∗ do not commute). Since in what follows we shall only use this property, it is worth
introducing a word for it; we shall say that an operator

• N such that $N^* N = N N^*$ is normal.

This extra class does not have such a nice structure as those of the self-adjoint operators
(that form a real linear subspace) or the unitary ones (that form a group under composi-
tion). Actually, when doing quantum physics, we shall not need to consider operators that
are just normal. However, since both unitary and self-adjoint operators are also normal, by
making statements about normal operators we can avoid repeating propositions for the
separate classes that are important for us.

Proposition 2.2.2. Suppose N ∗ N = N N ∗ . Then kN vk = kN ∗ vk for every v ∈ V .


Proof.

kN vk2 = hN v, N vi = hv, N ∗ N vi = hv, N N ∗ vi = hN ∗ v, N ∗ vi = kN ∗ vk2 .

Corollary 2.2.3. Suppose $N^* N = N N^*$ and $N v = \lambda v$ for some $\lambda \in \mathbb{C}$. Then $N^* v = \overline{\lambda} v$.
Thus, every eigenvector of N is also an eigenvector of N∗.

Proof. An easy check shows that if N is normal, then so is $N - \lambda I$. (Indeed, the identity
commutes with every operator.) Thus, if $N v = \lambda v$ for some $\lambda \in \mathbb{C}$, then $(N - \lambda I)v = 0$
and hence
$$\|N^* v - \overline{\lambda} v\| = \|(N - \lambda I)^* v\| = \|(N - \lambda I) v\| = 0,$$
showing that $N^* v = \overline{\lambda} v$.
Proposition 2.2.4. Eigenvectors associated to different eigenvalues of a normal operator
are pairwise orthogonal.

Proof. Suppose N is normal, $N v = \lambda v$ and $N w = \mu w$ for some scalars $\lambda \ne \mu$. Then by
the previous corollary, $N^* w = \overline{\mu} w$ and hence
$$\lambda \langle w, v \rangle = \langle w, \lambda v \rangle = \langle w, N v \rangle = \langle N^* w, v \rangle = \langle \overline{\mu} w, v \rangle = \mu \langle w, v \rangle.$$
Thus, $\lambda \langle w, v \rangle = \mu \langle w, v \rangle$ and hence $(\lambda - \mu)\langle w, v \rangle = 0$, which — since $\lambda - \mu \ne 0$ — implies
that $\langle w, v \rangle$ must be zero.
So far it did not really matter if we considered our statements on real or complex spaces.
However, for what will follow, it is crucial to use the field of complex numbers instead of
that of the real ones.
Proposition 2.2.5. Let V be a finite dimensional complex scalar product space. Then
an operator N on V is normal if and only if there exists an ONB of V consisting of
eigenvectors of N only.
Proof. Let us begin with the “only if” direction; assume that N is a normal operator. An
important observation is that an operator X over a complex vector space of dimension
∈ [1, ∞) must have at least one eigenvalue; this is because, by the fundamental theorem of
algebra, the characteristic polynomial $\lambda \mapsto \det(X - \lambda I)$ must have a root in C. So
also our normal operator N must have an eigenvector that we can normalize (divide by
its length), which shows that we can find some nonempty ortho-normal system of vectors
$e_1, \ldots, e_k$ that can be formed out of eigenvectors of N .

Now suppose this system is actually a maximal one and let w be a vector which is
orthogonal to each member of this ONS. Then
$$\langle e_j, N w \rangle = \langle N^* e_j, w \rangle = \langle \overline{\lambda_j} e_j, w \rangle = \lambda_j \langle e_j, w \rangle = 0$$



where we have used that $N e_j = \lambda_j e_j$ for some scalar $\lambda_j$ (as the vectors $e_1, \ldots, e_k$ are all
eigenvectors) and our earlier corollary by which $N e_j = \lambda_j e_j$ implies that $N^* e_j = \overline{\lambda_j} e_j$.
This shows that the subspace $W := \{e_1, \ldots, e_k\}^{\perp}$ is invariant under N ; i.e. that N maps
vectors of W to vectors of W . Thus, if W was at least one-dimensional, then the restriction
of N to W would still have an eigenvector; say $0 \ne w \in W$. But then $e_1, \ldots, e_k, w/\|w\|$
would be an even larger ONS consisting of eigenvectors of N , contradicting the assumed
maximality. Therefore, W must be zero-dimensional, and hence $e_1, \ldots, e_k$ must actually
form a basis.

For the other direction — for the “if” part of the statement — the argument is simpler
and actually it works regardless of whether our space is complex or real. Indeed, suppose
that there exists an ONB consisting of eigenvectors of N . Then the matrix of N in this
basis is a diagonal one and hence also its transpose-conjugate is a diagonal matrix. But
two diagonal matrices always commute, so N must commute with N∗.
Let us note that the “only if” part of the previous statement becomes false if one
considers real spaces instead of complex ones. Indeed, let $V = \mathbb{R}^2$ with its standard scalar
product and consider the 90° anticlockwise rotation; i.e. the multiplication with the matrix
$$R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
This is easily seen to be unitary (or, as it is usually called in the real case: an orthogonal
transformation) and hence also normal. However, it has not a single eigenvector, so clearly
one will find no ONB of $\mathbb{R}^2$ consisting of eigenvectors of R. In some sense of course R does
have eigenvalues — namely i and −i — but these are not real numbers and hence any
eigenvector of R must contain some non-real entry, too.
The previous proposition can be reformulated in terms of spectral decompositions, too.
By what was established, eigenvectors of a normal operator span the full space, so every
normal operator admits a spectral decomposition.
A spectral projection $P_\lambda$ projects onto the eigenspace associated to the eigenvalue λ
along the sum of the other eigenspaces and, as we proved in this section, eigenvectors
associated to different eigenvalues of a normal operator are orthogonal. Thus, each
spectral projection appearing in the spectral decomposition of a normal operator is an
ortho-projection.
Corollary 2.2.6. Let V be a finite dimensional complex scalar product space. Then an
operator N on V is normal if and only if it admits a spectral decomposition $N = \sum_\lambda \lambda P_\lambda$
and each projection $P_\lambda$ appearing in the spectral decomposition is an orthogonal one.

Proof. We have just discussed the “only if” direction. On the other hand, if $N = \sum_\lambda \lambda P_\lambda$
is a spectral decomposition and each projection $P_\lambda$ is an orthogonal one, then
$$N^* = \left( \sum_\lambda \lambda P_\lambda \right)^* = \sum_\lambda (\lambda P_\lambda)^* = \sum_\lambda \overline{\lambda}\, P_\lambda$$
since every ortho-projection is self-adjoint. Thus N commutes with N∗ since each is a
linear combination of the same collection of commuting projections.
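As an illustration of Prop. 2.2.5 and Cor. 2.2.6 (a sketch I am adding, not part of the notes), one can manufacture a normal operator as N = U D U* with U unitary and D diagonal, and then recover its spectral projections as sums of rank-one ortho-projections onto the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random unitary U: the Q factor from the QR factorization of a random complex matrix
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
eigvals = np.array([2j, 2j, -1.0, 3.0 + 1j])        # not necessarily real: N is only normal
N = Q @ np.diag(eigvals) @ Q.conj().T

assert np.allclose(N @ N.conj().T, N.conj().T @ N)  # N is normal

# Spectral decomposition N = sum_lambda lambda * P_lambda with ortho-projections P_lambda
P = {}
for lam in set(eigvals.tolist()):
    cols = [Q[:, j] for j in range(n) if np.isclose(eigvals[j], lam)]
    P[lam] = sum(np.outer(e, e.conj()) for e in cols)   # sum of |e_j><e_j|
assert np.allclose(sum(lam * P[lam] for lam in P), N)
assert np.allclose(sum(P.values()), np.eye(n))           # the P_lambda sum to the identity
```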

2.3 Spectral and further characterizations


In the previous section we introduced self-adjoint, positive and unitary operators by their
algebraic properties. However, as we shall see now — at least in the complex case — they
admit some other nice characterizations, too. We begin with an observation regarding
eigenvalues of the different classes of operators under consideration.

Proposition 2.3.1. Let V be a finite dimensional (real or complex) scalar product space
and N an operator on V . Then

• if N is self-adjoint, then $\mathrm{Sp}(N) \subset \mathbb{R}$,

• if N is positive, then $\mathrm{Sp}(N) \subset \mathbb{R}^+_0 \equiv [0, \infty)$,

• if N is unitary, then $\mathrm{Sp}(N) \subset \{z \in \mathbb{C} \mid |z| = 1\}$.

Proof. In words, the claim is that eigenvalues of self-adjoint operators are real, eigenvalues
of positive operators are nonnegative and eigenvalues of unitary operators are of unit
absolute value. So let us assume that λ is an eigenvalue of N and $v \ne 0$ is a corresponding
eigenvector. Then
$$\lambda = \frac{\langle v, N v \rangle}{\langle v, v \rangle}$$
where the denominator is a positive number. Thus, if $N = N^*$ then $\langle v, N v \rangle = \langle N v, v \rangle$
and hence
$$\lambda = \frac{\langle v, N v \rangle}{\langle v, v \rangle} = \frac{\langle N v, v \rangle}{\langle v, v \rangle} = \overline{\lambda},$$
showing that λ must be real. If N is positive, then $N = X^* X$ for some operator X and
hence $\langle v, N v \rangle = \langle v, X^* X v \rangle = \langle X v, X v \rangle \ge 0$ which, by our expression for λ, implies the
non-negativity of λ, too. Finally, if N is unitary, then $N^* N = I$ so
$$|\lambda|^2 \|v\|^2 = \|\lambda v\|^2 = \|N v\|^2 = \langle N v, N v \rangle = \langle v, N^* N v \rangle = \langle v, v \rangle = \|v\|^2$$
showing that $|\lambda| = 1$.

Clearly, the above spectral properties alone cannot give in general complete character-
izations. First, because looking at the eigenvalues only does not reveal anything about the
angle between the eigenspaces — and we know that all these types of operators are normal
and hence should have orthogonal eigenspaces. So to have some new characterizations, at
least we should require normality. But there is also another problem. The spectral decom-
position theorem for normal operators that we discussed in the last section holds only in
the complex case. In fact, if we are over the field of real numbers, then of course we cannot
have non-real eigenvalues; however, it is clear that even in the real case, not every normal
operator is self-adjoint. Thus, for turning the above properties into characterizations, we
also need to be over the complex numbers.

Theorem 2.3.2. Let V be a finite dimensional complex scalar product space and N a
normal operator on V . Then

• N is self-adjoint if and only if $\mathrm{Sp}(N) \subset \mathbb{R}$,

• N is positive if and only if $\mathrm{Sp}(N) \subset \mathbb{R}^+_0 \equiv [0, \infty)$,

• N is unitary if and only if $\mathrm{Sp}(N) \subset \{z \in \mathbb{C} \mid |z| = 1\}$.


Proof. For each type we only need to prove the “if” part (as we have already dealt with the
“only if” part in the previous proposition). Since N is normal and we are over a complex
scalar product space, by Prop. 2.2.5 there exists an ONB B in which
$$M = (N)_B = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots) \equiv \begin{pmatrix} \lambda_1 & 0 & 0 & \ldots \\ 0 & \lambda_2 & 0 & \ldots \\ 0 & 0 & \lambda_3 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{pmatrix}$$
is diagonal with $\lambda_1, \lambda_2, \ldots \in \mathrm{Sp}(N)$. Then $(N^*)_B = \overline{M}^{\,t} = \overline{M}$, so if all eigenvalues are real
then $(N^*)_B = M$ and hence $N^* = N$. On the other hand
$$(N N^*)_B = \begin{pmatrix} \lambda_1 & 0 & \ldots \\ 0 & \lambda_2 & \ldots \\ \ldots & \ldots & \ldots \end{pmatrix} \begin{pmatrix} \overline{\lambda_1} & 0 & \ldots \\ 0 & \overline{\lambda_2} & \ldots \\ \ldots & \ldots & \ldots \end{pmatrix} = \begin{pmatrix} |\lambda_1|^2 & 0 & \ldots \\ 0 & |\lambda_2|^2 & \ldots \\ \ldots & \ldots & \ldots \end{pmatrix}.$$
Hence if $|\lambda| = 1$ for every $\lambda \in \mathrm{Sp}(N)$, then $N N^* = I$ showing that N in this case is
unitary. Finally, as N admits a spectral decomposition, if $\mathrm{Sp}(N) \subset \mathbb{R}^+_0$ then one can
apply the root function $\sqrt{\cdot} : \mathbb{R}^+_0 \to \mathbb{R}^+_0$ to N . The resulting operator $\sqrt{N}$ is still normal,
as apart from a re-labeling, its spectral projections coincide with those of N . However,
as $\mathrm{Sp}(\sqrt{N}) = \{\sqrt{\lambda} \mid \lambda \in \mathrm{Sp}(N)\}$ is still a subset of the reals, from what we have already
established, $\sqrt{N}$ is a self-adjoint operator and actually $\sqrt{N}^{\,*}\sqrt{N} = \sqrt{N}\sqrt{N} = \sqrt{N}^{\,2} = N$,
showing that in this case N is a positive operator.
Note that since positive operators form a subset of the self-adjoint ones, which in turn form
a subset of the normal operators, in the positive case the above characterization also yields
the following slightly different one.

Corollary 2.3.3. Let V be a finite dimensional complex scalar product space and A an
operator on V . Then $A \ge 0$ if and only if $A^* = A$ and $\mathrm{Sp}(A) \subset \mathbb{R}^+_0$.

In practice, when dealing with given matrices, it is easier to check self-adjointness than
normality, so often instead of the characterization given by Theorem 2.3.2, we shall use
the one provided by our previous corollary.
Let us now discuss some characterizations based on scalar product values rather than
eigenvalues. Suppose $A = X^* X$ is a positive operator on V . Then $\langle v, Av \rangle = \langle v, X^* X v \rangle = \langle X v, X v \rangle \ge 0$ is a nonnegative real for every $v \in V$. If instead of positivity we only
assume that $A = A^*$ is self-adjoint, then
$$\langle v, A v \rangle = \overline{\langle A v, v \rangle} = \overline{\langle v, A^* v \rangle} = \overline{\langle v, A v \rangle},$$
showing that $\langle v, Av \rangle$ is still at least a real number. This latter property in the real case of
course is of not much use (since in that case every scalar product value is real). However,
it is nice to know that in the complex case it actually gives a characterization.

Theorem 2.3.4. Let V be a finite dimensional complex scalar product space and A an
operator on V . Then $A = A^*$ if and only if $\langle v, Av \rangle$ is real for all $v \in V$.

Proof. By what we have already discussed, we only need to show the “if” direction. So
suppose $\langle v, Av \rangle$ is real for all $v \in V$; then
$$\langle x, Ay \rangle + \langle y, Ax \rangle = \frac{\langle x+y, A(x+y) \rangle - \langle x-y, A(x-y) \rangle}{2}$$
is real. Replacing y by iy, in a similar way we can then also conclude that the expression
$\langle x, Ay \rangle - \langle y, Ax \rangle = -i\left( \langle x, A(iy) \rangle + \langle iy, Ax \rangle \right)$ is purely imaginary. Hence, using that
$$\langle x, Ay \rangle = \frac{\langle x, Ay \rangle + \langle y, Ax \rangle}{2} + \frac{\langle x, Ay \rangle - \langle y, Ax \rangle}{2},$$
we find that
$$\langle Ay, x \rangle = \overline{\langle x, Ay \rangle} = \frac{\langle x, Ay \rangle + \langle y, Ax \rangle}{2} - \frac{\langle x, Ay \rangle - \langle y, Ax \rangle}{2} = \langle y, Ax \rangle.$$
Thus $\langle Ay, x \rangle = \langle y, Ax \rangle$ for every $x, y \in V$, showing that the adjoint of A is A itself.

Corollary 2.3.5. Let V be a finite dimensional complex scalar product space and A an
operator on V . Then A ≥ 0 if and only if hv, Avi is a nonnegative real for all v ∈ V .

Proof. Again, by what we have discussed earlier, we only need to show the “if” direction.
So assume $\langle v, Av \rangle \ge 0$ for all $v \in V$. Then in particular, by the previous theorem, A
is self-adjoint. If $\lambda \in \mathrm{Sp}(A)$, then there is a corresponding eigenvector $v \ne 0$ such that
$Av = \lambda v$, and then by our assumption $\lambda = \frac{\langle v, Av \rangle}{\langle v, v \rangle}$ is nonnegative. Thus the self-adjoint
operator A has nonnegative eigenvalues only and hence by Cor. 2.3.3 it is actually a positive
operator.
We finish this section with two further characterizations of unitary operators. The first
is rather straightforward to show, while the second one is easily deduced from the first one
by noticing that scalar product values can be expressed with norms; in the real case, we
have
$$\langle x, y \rangle = \frac{\|x+y\|^2 - \|x-y\|^2}{4}$$
and in the complex case we have
$$\langle x, y \rangle = \frac{\|x+y\|^2 - \|x-y\|^2 + i\|x - iy\|^2 - i\|x + iy\|^2}{4}.$$
Both of the above equalities (which are called the polarization identities) and the following
two characterizations are left for the reader to be verified.
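The complex polarization identity (in the physicist convention used here) is easy to confirm numerically; the following small check is my own addition, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = rng.normal(size=5) + 1j * rng.normal(size=5)
n = np.linalg.norm

# <x, y> = ( ||x+y||^2 - ||x-y||^2 + i||x-iy||^2 - i||x+iy||^2 ) / 4
rhs = (n(x + y)**2 - n(x - y)**2 + 1j * n(x - 1j*y)**2 - 1j * n(x + 1j*y)**2) / 4
assert np.isclose(np.vdot(x, y), rhs)
```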
Theorem 2.3.6. Let V be a (real or complex) finite dimensional scalar product space and
U an operator on V . Then U is unitary if and only if any (and hence both) of the following
properties hold:

• $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in V$,

• $\|Uv\| = \|v\|$ for all $v \in V$.
In words, what the above theorem says is that U ∈ Lin(V ) is unitary if and only if
it preserves the scalar product or the norm. Thus the unitary elements of Lin(V ) are
precisely the automorphisms of V when it is viewed as scalar product or normed space.

Exercises
E 2.1. Let A and B be self-adjoint operators. Show that AB = 0 if and only if the images
of A and B are orthogonal subspaces.
E 2.2. Let
$$A = \begin{pmatrix} i+1 & 3i-1 & i+1 \\ i-1 & 2 & 1 \\ i-1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 3-i & 0 \\ 3+i & -1 & -2 \\ 0 & -2 & 2 \end{pmatrix}.$$
Find all values of $t \in \mathbb{R}$ for which the equation
$$A + t^2 A^* + tB - X^* X = 0$$
admits a solution, and for each such value present an $X \in M_{3,3}(\mathbb{C})$ satisfying the equation.
E 2.3 (Polar decomposition). Show that for any operator (of a finite dimensional space)
X there exists a positive operator A ≥ 0 and a unitary one U = (U ∗ )−1 such that

X = U A.

Show further that the positive operator A appearing in the above decomposition is uniquely
determined by X, whereas the unitary U is uniquely determined if and only if ∃X −1 .
E 2.4. We proved that if on a complex (finite dimensional) scalar product space an operator
N is normal then there exists an ONB consisting of eigenvectors of N only. Though in the
real case this is false, prove that if in particular N is self-adjoint, then it still follows that
there exists an ONB consisting of eigenvectors of N only.
Chapter 3

Operator positivity

3.1 Fundamental properties


We gave three equivalent characterizations of the positivity of an operator A given on
a finite dimensional complex scalar product space V : an algebraic one (which was our
definition), a spectral one and finally one involving scalar product values. One may ask:
which one is the best? The answer, of course, is that it depends on the situation.
In what follows we state and prove some important properties of positive operators.
The key is to use for each of them a suitable characterization of positivity: when looking
it from the right point of view, the proofs become short and straightforward. For example,
we shall begin by showing that the sum of positive operators is again a positive operator.
However, the algebraic point of view does not seem to help much here. Indeed, given that
A is of the form $A = X^* X$ and B is of the form $B = Y^* Y$, how are we going to find a
Z such that $A + B = Z^* Z$? Neither does the spectral way seem promising: knowing Sp(A) and
Sp(B) in general does not allow one to figure out the eigenvalues of A + B.
Proposition 3.1.1. Sums and nonnegative multiples of positives are positive: if $t \in \mathbb{R}^+_0$,
$A, B \in \mathrm{Lin}(V)$ and $A, B \ge 0$ then $tA + B \ge 0$.

Proof. By our third characterization of positivity, for any $v \in V$ we have that
$$\langle v, (tA + B)v \rangle = t\langle v, Av \rangle + \langle v, Bv \rangle \ge t \cdot 0 + 0 \ge 0$$
showing that $tA + B \ge 0$.
Proposition 3.1.2. For a positive operator A ≥ 0 we have Av = 0 ⇔ hv, Avi = 0.
Proof. The implication to the right is trivial; it does not need any assumption of positivity.
As for the other direction, using the algebraic definition of positivity, we have that A =
X ∗ X for some X and

0 = hv, Avi = hv, X ∗ Xvi = hXv, Xvi

showing that Xv = 0 and Av = X ∗ Xv = X ∗ (Xv) = X ∗ 0 = 0.


If Av = 0 and Bv = 0 then of course (A + B)v = 0, but in general the other direction


is false. However, considering the previous proposition and the third characterization of
positivity, it clearly does follow when A, B are positive operators.
Corollary 3.1.3. If A, B ≥ 0 then Ker(A + B) = Ker(A) ∩ Ker(B) and hence by using
that Im(X)⊥ = Ker(X ∗ ) we also have that Im(A + B) = Im(A) + Im(B).
Consider the matrix M of a positive operator A ≥ 0 in an ONB B = (b1 , . . . bn ), which
we shall just simply call a positive matrix. (Note that M is a positive matrix if and
only if the multiplication by M , when viewed as a linear map on Cn which is considered
with its standard scalar product, is a positive operator.) The value at the (j, k) − th entry
is Mj,k = hbj , Abk i; so first of all Mk,k is a nonnegative real, and second, if it is equal to
zero, then (by the same proposition we just used) Abk = 0 and hence Mj,k = Mk,j = 0 for
every j. So we actually proved the following.
Lemma 3.1.4. Every diagonal element of a positive matrix is a nonnegative real. More-
over, if a diagonal element is zero, then the full column (and hence also row) containing
that element consists of zeros, only.
The previous lemma often helps to “spot” something which is not positive. Suppose
we are given the following five matrices:
$$\begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 3 & 1+i \\ 1-i & 2 \end{pmatrix}, \quad \begin{pmatrix} 1/2 & i \\ -i & 1/2 \end{pmatrix}, \quad \begin{pmatrix} 3 & i \\ i & 2 \end{pmatrix}.$$
Which one could be positive? Our original definition (according to which positives are
of the form $X^* X$) in itself does not seem to give an easy way to answer this question.
However, since we know that positives are also self-adjoint, we can immediately rule out
the last one, whose transpose conjugate is not itself. Moreover, the first one has a negative
number on its diagonal, and the second one has a zero at a diagonal place with a further
non-zero element in the same row; so according to our lemma, we can rule out those as well.
This leaves us with just 2 matrices: the 3rd and 4th ones. They are self-adjoint and have no
“evident” problems, so we might decide to use our spectral characterization and compute
their eigenvalues. The eigenvalues of the 3rd one turn out to be 1 and 4, so it is indeed a
positive matrix. On the other hand, the 4th matrix also has a negative eigenvalue (−1/2),
so it is not a positive matrix.
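The same reasoning can be automated; the following small numpy sketch (an added illustration, with a hypothetical helper name `is_positive`) checks self-adjointness first and then the eigenvalues, exactly as in Cor. 2.3.3.

```python
import numpy as np

def is_positive(M, tol=1e-10):
    """Positivity check via Cor. 2.3.3: M = M* and all eigenvalues >= 0."""
    if not np.allclose(M, M.conj().T, atol=tol):
        return False                       # not even self-adjoint
    return bool(np.min(np.linalg.eigvalsh(M)) >= -tol)

matrices = [
    np.array([[-1, 1], [1, 1]]),           # negative diagonal entry
    np.array([[1, 1], [1, 0]]),            # zero on the diagonal, nonzero row
    np.array([[3, 1 + 1j], [1 - 1j, 2]]),  # eigenvalues 1 and 4
    np.array([[0.5, 1j], [-1j, 0.5]]),     # has eigenvalue -1/2
    np.array([[3, 1j], [1j, 2]]),          # not self-adjoint
]
print([is_positive(M) for M in matrices])  # [False, False, True, False, False]
```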
Apart from such uses, the previous lemma also has an important consequence regarding
trace values. Let us formally state this.
Corollary 3.1.5. For a positive operator A ≥ 0 we have that Tr(A) ≥ 0 with equality
holding if and only if A = 0.
We can also easily establish the non-negativity of the determinant. However, to do so,
this time the best is to use the original definition.
Proposition 3.1.6. For a positive operator A ≥ 0 we have that det(A) ≥ 0.

Proof. If A is of the form $A = X^* X$, then
$$\det(A) = \det(X^* X) = \det(X^*)\det(X) = \overline{\det(X)}\det(X) = |\det(X)|^2 \ge 0.$$

We finish this section with the uniqueness of a positive square-root. Both for estab-
lishing its existence (which we have “accidentally” also done in the last chapter, when we
proved the equivalence of the definition and the spectral characterization) and to conclude
its uniqueness we shall use the spectral decomposition given by the spectral characteriza-
tion.

Proposition 3.1.7. Let A be a positive operator. Then there exists a unique positive
operator B whose square is A.

Proof. Since $A \ge 0$, it is a normal operator and thus has a spectral decomposition. As
$\mathrm{Sp}(A) \subset \mathbb{R}^+_0$, one can apply the function $\sqrt{\cdot} : \mathbb{R}^+_0 \to \mathbb{R}^+_0$ to obtain another normal operator
$\sqrt{A}$. Its spectrum $\mathrm{Sp}(\sqrt{A}) = \{\sqrt{t} \mid t \in \mathrm{Sp}(A)\} \subset \mathbb{R}^+_0$, so it is still a positive operator, and as
$(\sqrt{\cdot})^2$ is the identity on $\mathbb{R}^+_0 \supset \mathrm{Sp}(A)$, we have that $(\sqrt{A})^2 = A$. For similar reasons, if B is
any positive operator, then $\sqrt{B^2} = B$; hence if $B \ge 0$ and $B^2 = A$, then $B = \sqrt{B^2} = \sqrt{A}$,
which proves the uniqueness part of the claim.
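Concretely, the square root of Prop. 3.1.7 can be computed from an eigendecomposition; the following numpy sketch (an added illustration with a hypothetical helper name) applies the scalar square root to the eigenvalues.

```python
import numpy as np

def sqrt_positive(A):
    """Unique positive square root of a positive matrix A (Prop. 3.1.7):
    apply the scalar square root to the eigenvalues in a spectral decomposition."""
    w, U = np.linalg.eigh(A)                 # A = U diag(w) U*, w real
    w = np.clip(w, 0, None)                  # guard against tiny negative round-off
    return U @ np.diag(np.sqrt(w)) @ U.conj().T

X = np.array([[1, 2j], [0, 1 - 1j]])
A = X.conj().T @ X                           # a positive operator A = X* X
B = sqrt_positive(A)
assert np.allclose(B, B.conj().T)                 # B is self-adjoint
assert np.min(np.linalg.eigvalsh(B)) >= -1e-12    # with nonnegative spectrum
assert np.allclose(B @ B, A)                      # and B^2 = A
```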

3.2 Gram matrices

Let V be a scalar product space over F = R or C and $v_1, \ldots, v_n \in V$ a collection (more
precisely: a list) of vectors. The $n \times n$ square matrix G defined by the formula
$$G_{j,k} \equiv \langle v_j, v_k \rangle$$
is called the Gram matrix of the system of vectors $v_1, \ldots, v_n$. Considering $F^n$ with its
standard scalar product, we have that
$$\langle c, Gc \rangle = \sum_{k,l=1}^{n} \overline{c_k}\, G_{k,l}\, c_l = \sum_{k,l=1}^{n} \overline{c_k} \langle v_k, v_l \rangle c_l = \Big\langle \sum_{k=1}^{n} c_k v_k,\ \sum_{l=1}^{n} c_l v_l \Big\rangle = \Big\| \sum_{k=1}^{n} c_k v_k \Big\|^2 \ge 0$$
for any $c \in F^n$. This shows that G is a positive matrix. This also works in the other
direction; if M is an arbitrary positive matrix, then there exists a matrix X such that
$M = X^* X$, and it is straightforward to check that M is actually the Gram matrix of the
system of vectors formed by the columns of X. Thus, every Gram matrix is positive and
every positive matrix is the Gram matrix of a vector collection.
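A short numerical illustration of these two directions (an added sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(4)

# Direction 1: the Gram matrix of any list of vectors is positive.
vectors = [rng.normal(size=3) + 1j * rng.normal(size=3) for _ in range(4)]
G = np.array([[np.vdot(v, w) for w in vectors] for v in vectors])   # G_{j,k} = <v_j, v_k>
assert np.allclose(G, G.conj().T)
assert np.min(np.linalg.eigvalsh(G)) >= -1e-12

# Direction 2: a positive matrix M = X* X is the Gram matrix of the columns of X.
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M = X.conj().T @ X
cols = [X[:, k] for k in range(4)]
G2 = np.array([[np.vdot(v, w) for w in cols] for v in cols])
assert np.allclose(M, G2)
```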
Considering the Gram matrix G of a system of vectors v1 , . . . vn is a simple but powerful
trick which is often the key in solving certain type of problems (see exercise E 3.5 for an
example). The reason for its usefulness lies in the way it encodes some essential properties
of our collection of vectors. In particular, we have the following.
• The trace of the Gram matrix is $\mathrm{Tr}(G) = \sum_{k=1}^{n} G_{k,k} = \sum_{k=1}^{n} \langle v_k, v_k \rangle = \sum_{k=1}^{n} \|v_k\|^2$.

• The trace of the square of the Gram matrix is
$$\mathrm{Tr}(G^2) = \sum_{k,j=1}^{n} G_{k,j} G_{j,k} = \sum_{k,j=1}^{n} \langle v_k, v_j \rangle \langle v_j, v_k \rangle = \sum_{k,j=1}^{n} |\langle v_k, v_j \rangle|^2.$$

• The maximum number of linearly independent columns of G coincides with the max-
imum number of linearly independent elements of $v_1, \ldots, v_n$; that is, the rank of the
Gram matrix is
$$\mathrm{rk}(G) = \dim(\mathrm{Span}\{v_1, \ldots, v_n\}).$$
Indeed, if $v_j$ is expressed as a linear combination $v_j = \sum_{k \ne j} c_k v_k$, then $G_{l,j} = \langle v_l, v_j \rangle = \langle v_l, \sum_{k \ne j} c_k v_k \rangle = \sum_{k \ne j} c_k \langle v_l, v_k \rangle = \sum_{k \ne j} c_k G_{l,k}$ for all $l = 1, \ldots, n$. As
for the other direction, now suppose that $G_{l,j} = \sum_{k \ne j} c_k G_{l,k}$ for all $l = 1, \ldots, n$.
Then, going backward, we get that the difference $v_j - \sum_{k \ne j} c_k v_k$ has a zero scalar
product with all of the vectors $v_l$ for $l = 1, \ldots, n$ and hence it is orthogonal to
the subspace $\mathrm{Span}\{v_1, \ldots, v_n\}$. However, as it is also evidently contained in this
subspace, this difference must be zero and hence we must have that $v_j = \sum_{k \ne j} c_k v_k$.

Finally — as we shall not use this — without proof we also mention here that in the real
case, the determinant of the Gram matrix is the square of the volume of the parallelotope
formed by the vectors $v_1, \ldots, v_n$.

3.3 Operator and trace inequalities


Given two self-adjoint operators A and B we say that they satisfy the operator inequality
A ≥ B if A − B is a positive operator. If A − B ≥ 0 then Tr(A) − Tr(B) = Tr(A − B) ≥ 0,
hence the operator-inequality A ≥ B implies the trace-inequality Tr(A) ≥ Tr(B) (note
the trace of a self-adjoint operator is real, so this is indeed an inequality between two
real values). On the other hand, it is easy to find examples of a self-adjoint operator C
which is not positive but has a positive trace; thus operator-inequality is stronger than
trace-inequality.

Proposition 3.3.1. Operator-inequality is a partial ordering; that is,

• for every $A = A^*$, we have that $A \ge A$,

• $A \ge B$ and $B \ge A$ implies that $A = B$,

• $A \ge B$ and $B \ge C$ implies that $A \ge C$.

Moreover, operator-inequalities can be multiplied by a $t \in \mathbb{R}^+_0$ and can be added; that is,
$A_1 \ge B_1$ and $A_2 \ge B_2$ implies that $A_1 + A_2 \ge B_1 + B_2$ and $tA_1 \ge tB_1$.

Proof. Most of the above affirmations are fairly easy to show and are left to the reader
to check. Here we only comment on one point; namely, that both transitivity and the
claim regarding the sum of operator-inequalities rely on the fact that the sum of positive
operators is positive. Indeed, the operator-inequalities $A \ge B$ and $B \ge C$ mean that $A - B$
and $B - C$ are positive operators and hence that $(A - B) + (B - C) = A - C \ge 0$, too.
What we have learned so far about operator-inequalities makes one think that it is
something similar to inequalities between real numbers. Some differences however must be
pointed out. First, operator-inequality is only a partial ordering. Indeed, if for example the
self-adjoint operator A has both a positive and a negative eigenvalue, then neither A nor
−A is a positive operator. Second, while operator-inequalities can be summed, they
cannot be squared, for example. That is, whereas for two nonnegative reals $a, b \ge 0$ the
inequality $a \ge b$ implies that $a^2 \ge b^2$, for two positive operators $A, B \ge 0$ the inequality
$A \ge B$ does not imply that $A^2 \ge B^2$. In fact, such discrepancies lead to the concept
of operator-monotone functions. A function $f : \mathbb{R} \to \mathbb{R}$ (or $f : \mathbb{R}^+_0 \to \mathbb{R}$) is said to be
operator-monotone on self-adjoint (or on positive) operators, if for any A, B self-adjoint
(or positive) operators
$$A \ge B \;\Rightarrow\; f(A) \ge f(B).$$
If f is operator-monotone then in particular it is monotone on 1 × 1 matrices; i.e. it is also
monotone in the conventional sense. However, the mentioned case of squares shows that
in general the converse is false.

There are several existing characterizations that allow us to find a wide variety of operator-
monotone functions. For example, if $0 < q \le 1$, then the function $x \mapsto x^q$ is operator-monotone
on positive operators. However, such theorems are outside of our present scope and most of them also
require some further tools that we did not discuss here. Nevertheless, even with what we have, as a good
example and as an interesting problem, the reader can try to find an elementary proof
showing that the square root function $\sqrt{\cdot}$ is operator-monotone on positive operators.
Let us now move on to the topic of trace-inequalities. If A, B are self-adjoint then
$(AB)^* = B^* A^* = BA$, hence the product is again self-adjoint if and only if A and B com-
mute. Nevertheless, $\overline{\mathrm{Tr}(AB)} = \mathrm{Tr}((AB)^*) = \mathrm{Tr}(BA) = \mathrm{Tr}(AB)$, showing that although
not necessarily self-adjoint, the product still has a real trace.

Since a positive operator is in particular self-adjoint, by what was just explained, the
product of positive operators cannot in general be a positive operator. However, regarding
trace-values we have the following.
Proposition 3.3.2. Let A, B be positive operators. Then $\mathrm{Tr}(AB) \ge 0$ with equality holding
if and only if $AB = 0$.

Proof. We may assume $A = X^* X$ and $B = Y^* Y$ for some operators X and Y . Then,
setting $Z := X Y^*$, we have that
$$\mathrm{Tr}(AB) = \mathrm{Tr}(X^* X Y^* Y) = \mathrm{Tr}(X Y^* Y X^*) = \mathrm{Tr}(Z^* Z) \ge 0,$$
as the trace of the positive operator $Z^* Z$ is nonnegative. Let us now move on to the part
regarding equality. By corollary 3.1.5, equality implies that $Z^* Z = 0$. It is
interesting on its own to note that $Z^* Z = 0 \Leftrightarrow Z = 0$. Indeed, one implication is trivial,
whereas for the other one consider that
$$Z^* Z v = 0 \;\Rightarrow\; 0 = \langle v, Z^* Z v \rangle = \langle Z v, Z v \rangle \;\Rightarrow\; Z v = 0.$$
Therefore if the trace value in question is zero, then $Z = 0$ and hence $AB = X^* X Y^* Y = X^* Z Y = 0$
(the converse implication, $AB = 0 \Rightarrow \mathrm{Tr}(AB) = 0$, is of course trivial).

As was mentioned, trace-inequality is weaker than operator-inequality. Using our
previous proposition, we can now present a nice example highlighting the difference between
them. Suppose A and B are positive operators and $A \ge B$. Then, although in general the
operator-inequality $A^2 \ge B^2$ does not follow, we have that
$$\mathrm{Tr}(A^2 - B^2) = \mathrm{Tr}((A + B)(A - B) + AB - BA) = \mathrm{Tr}((A + B)(A - B)) \ge 0$$
since both $A + B$ and $A - B$ are positive operators. This shows that $A \ge B \ge 0$ does imply
that $\mathrm{Tr}(A^2) \ge \mathrm{Tr}(B^2)$. We now move on to discuss a simple but important construction.

Proposition 3.3.3 (The Hilbert-Schmidt scalar product). For a finite dimensional scalar
product space V the formula $\langle A, B \rangle := \mathrm{Tr}(A^* B)$ defines a scalar product on $\mathrm{Lin}(V)$.

Proof. It is rather evident that the above formula is linear in B. Moreover,
$$\langle A, B \rangle = \mathrm{Tr}(A^* B) = \mathrm{Tr}((B^* A)^*) = \overline{\mathrm{Tr}(B^* A)} = \overline{\langle B, A \rangle}.$$
Finally, by corollary 3.1.5 and by the remark made in the proof of the previous proposition,
namely that $Z^* Z = 0$ implies $Z = 0$, it follows that
$$\langle A, A \rangle = \mathrm{Tr}(A^* A) \ge 0$$
with equality holding if and only if $A = 0$.

It is easy to check that the Hilbert-Schmidt scalar product satisfies the following two
additional properties:

i) $\langle A, A \rangle = \langle A^*, A^* \rangle$,   ii) $\langle A, XB \rangle = \langle X^* A, B \rangle$.

We finish our discussion by mentioning that the Cauchy-Schwarz inequality, when applied
to the Hilbert-Schmidt scalar product, also gives an important trace-inequality:
$$|\mathrm{Tr}(AB)|^2 = |\langle A^*, B \rangle|^2 \le \langle A^*, A^* \rangle \langle B, B \rangle = \mathrm{Tr}(A^* A)\,\mathrm{Tr}(B^* B).$$
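A numerical check of the Hilbert-Schmidt construction and of the resulting trace-inequality (an added sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

hs = lambda X, Y: np.trace(X.conj().T @ Y)      # <X, Y> := Tr(X* Y)

# Cauchy-Schwarz for the Hilbert-Schmidt scalar product:
# |Tr(AB)|^2 <= Tr(A* A) Tr(B* B)
lhs = abs(np.trace(A @ B))**2
rhs = (hs(A, A) * hs(B, B)).real
assert lhs <= rhs + 1e-9
```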



3.4 The cone of positives


So far we were concerned with relations between “individual” operators. We will now
investigate geometrical properties of various sets of operators. We shall denote by S(V )
and by S + (V ) the set of self-adjoint and the set of positive operators (respectively) on our
finite dimensional complex scalar product space V .
As we have already noted earlier, real-linear combinations of self-adjoint operators
remain self-adjoint; thus S(V ) is a real-linear subspace of Lin(V ). Thinking about the
number of (real) parameters in a matrix which coincides with its transpose-conjugate, one
can easily see that $\dim(S(V)) = d^2$ where $d = \dim(V)$. This — namely that apparently
S(V ) has the same dimension as Lin(V ) and yet clearly $S(V) \subsetneq \mathrm{Lin}(V)$ — is not a
contradiction, since we are comparing real and complex dimensions. In fact, as a real
vector space, the dimension of Lin(V ) would be $2d^2$, which is clearly greater than $d^2$.

For $A, B \in S(V)$ the Hilbert-Schmidt product $\langle A, B \rangle = \mathrm{Tr}(AB) \in \mathbb{R}$; thus S(V ) is
naturally a Euclidean space — we have all our usual geometrical notions, including angle.
So how does the set of positive operators $S^+(V)$ “look” in this space? By Prop. 3.1.1 it is
a cone: it is closed under sums and under multiplication by nonnegative reals. Its apex is at
the origin (the zero operator), and from our geometrical point of view, what Prop. 3.3.2
says is that the angle between any two rays of this cone is acute. Actually, somewhat
more is true: this cone is self-dual. This means that apart from having the property that
$\langle A, B \rangle \ge 0$ for any two elements $A, B \in S^+(V)$ (i.e. that we have an acute angle between
A and B), we have that any operator X which in this sense forms acute angles with all
elements of the cone $S^+(V)$ must already lie in this cone. However, before we turn this
into a proposition regarding trace values, we shall introduce a notation which is often used
in the physicist literature, as it will be quite handy in the proof of this duality property.
For any pair of vectors $v, w \in V$ we define the symbol $|v\rangle\langle w|$ to stand for the map
$$|v\rangle\langle w| : V \ni u \mapsto \langle w, u \rangle\, v \in V.$$
Since the scalar product is linear in its second variable, our map is in fact a linear operator;
$|v\rangle\langle w| \in \mathrm{Lin}(V)$. What makes this formalism nice is that all sorts of equalities that the
symbol seems to suggest turn out to be valid formulas. In particular, the reader can easily
justify the following properties:

• $|v\rangle\langle (w_1 + w_2)| = |v\rangle\langle w_1| + |v\rangle\langle w_2|$ and $|(v_1 + v_2)\rangle\langle w| = |v_1\rangle\langle w| + |v_2\rangle\langle w|$,

• $|v\rangle\langle \lambda w| = \overline{\lambda}\, |v\rangle\langle w|$ and $|\lambda v\rangle\langle w| = \lambda\, |v\rangle\langle w|$,

• $(|v\rangle\langle w|)^* = |w\rangle\langle v|$,

• $|v\rangle\langle w|\, |x\rangle\langle y| = \langle w, x \rangle\, |v\rangle\langle y|$,

• $X\, |v\rangle\langle w| = |Xv\rangle\langle w|$ and $|v\rangle\langle w|\, X = |v\rangle\langle X^* w|$.


Some further things to note are the following:

• $A = |v\rangle\langle v| \ge 0$ since $\langle x, Ax \rangle = |\langle x, v \rangle|^2 \ge 0$ for every $x \in V$,

• if $e_1, \ldots, e_k$ is an ONS then by Prop. 1.3.2, the operator $\sum_{j=1}^{k} |e_j\rangle\langle e_j|$ is precisely the
ortho-projection onto the subspace spanned by our ONS,

• in particular, if $e_1, \ldots, e_n$ is an ONB then $\sum_{j=1}^{n} |e_j\rangle\langle e_j| = I$, and if v is a unit-length
vector then $|v\rangle\langle v|$ is the ortho-projection onto the line given by v.
It is worth considering the case when our vector space is $\mathbb{C}^n$ with its standard scalar
product. Let us think of elements of $\mathbb{C}^n$ as column vectors and interpret the symbol $\langle w|$ to
be the row vector $(\overline{w_1}, \overline{w_2}, \ldots, \overline{w_n})$ and regard $|v\rangle \equiv v$ to be an “ordinary” column vector.
Then
$$\langle w|\, |v\rangle = (\overline{w_1}, \overline{w_2}, \ldots, \overline{w_n}) \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \sum_{k=1}^{n} \overline{w_k}\, v_k$$
is the scalar product between w and v, whereas writing them in the reversed order gives
$$|v\rangle\langle w| = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} (\overline{w_1}, \overline{w_2}, \ldots, \overline{w_n}) = \begin{pmatrix} \overline{w_1} v_1 & \overline{w_2} v_1 & \ldots & \overline{w_n} v_1 \\ \overline{w_1} v_2 & \overline{w_2} v_2 & \ldots & \overline{w_n} v_2 \\ \ldots & \ldots & \ldots & \ldots \\ \overline{w_1} v_n & \overline{w_2} v_n & \ldots & \overline{w_n} v_n \end{pmatrix},$$
an $n \times n$ matrix; it is easy to see that this is exactly (the concrete matrix realization of)
the linear operator $|v\rangle\langle w|$ we introduced in an abstract manner before. Considering the
matrix realization, we can also note one more important property:

• $\mathrm{Tr}(|v\rangle\langle w|) = \langle w, v \rangle$.
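In the concrete C^n realization just described, |v⟩⟨w| is simply the outer product of v with the conjugate of w; a few of the listed properties checked numerically (an added sketch, with a hypothetical helper name `ketbra`):

```python
import numpy as np

def ketbra(v, w):
    """|v><w| as an n x n matrix: (|v><w|)_{j,k} = v_j * conj(w_k)."""
    return np.outer(v, w.conj())

rng = np.random.default_rng(6)
v, w, x, y = (rng.normal(size=3) + 1j * rng.normal(size=3) for _ in range(4))
u = rng.normal(size=3) + 1j * rng.normal(size=3)

assert np.allclose(ketbra(v, w) @ u, np.vdot(w, u) * v)        # |v><w| u = <w, u> v
assert np.allclose(ketbra(v, w).conj().T, ketbra(w, v))        # (|v><w|)* = |w><v|
assert np.allclose(ketbra(v, w) @ ketbra(x, y),
                   np.vdot(w, x) * ketbra(v, y))               # |v><w||x><y| = <w,x>|v><y|
assert np.isclose(np.trace(ketbra(v, w)), np.vdot(w, v))       # Tr(|v><w|) = <w, v>
```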
Let us now return to the question of self-duality.

Proposition 3.4.1. Let V be a finite dimensional scalar product space and $X \in \mathrm{Lin}(V)$.
Then $X \in S^+(V)$ if and only if $\mathrm{Tr}(AX) \ge 0$ for all $A \in S^+(V)$.

Proof. The “only if” part has already been dealt with. So suppose $\mathrm{Tr}(AX) \ge 0$ for all positive
operators A. Then in particular, for the positive operator $A = |v\rangle\langle v|$ we have
$$0 \le \mathrm{Tr}(AX) = \mathrm{Tr}(X\, |v\rangle\langle v|) = \mathrm{Tr}(|Xv\rangle\langle v|) = \langle v, Xv \rangle$$
which — as the vector v was arbitrary — shows that $X \ge 0$ by Cor. 2.3.5.

3.5 The convex body of density operators


Every nonzero positive operator has a strictly positive trace. Thus, every nonzero positive
operator can be uniquely “scaled” to have trace equal to 1; if 0 6= A ≥ 0, then the multiple
tA where t = 1/Tr(A) is a positive operator with trace one. In geometrical terms, this
means that the hyperplane of operators {X ∈ Lin(V )|Tr(X) = 1} intersects every ray of
the cone S + (V ) of positive operators exactly once.

It is worth giving a schematic illustration of the relative position of the cone and the
hyperplane.

[Figure 3.1: The intersection of the cone $S^+$ with the hyperplane {Tr = 1}.]

Note though that the printed figure you find here is indeed just an illustration: the
exact shape of the cone and the set obtained by the intersection is not claimed to coincide
with what you see in our simplistic drawing.

The operators in the intersection are called density operators. That is, by density
operator we mean a positive operator with trace equal to 1. We shall denote the set of
density operators by
$$S^+_1(V) \equiv \{A \in \mathrm{Lin}(V) \mid A \ge 0,\ \mathrm{Tr}(A) = 1\}.$$
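For completeness, a two-line numerical illustration (an added sketch) of how a nonzero positive operator is scaled into a density operator:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = X.conj().T @ X                   # a nonzero positive operator
rho = A / np.trace(A).real           # the unique positive multiple with trace 1
assert np.isclose(np.trace(rho).real, 1.0)
assert np.min(np.linalg.eigvalsh(rho)) >= -1e-12
```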

Exercises
E 3.1. Show by example that if A and B are positive operators such that A ≥ B, then it
does not follow that A2 ≥ B 2 . Show also that if A and B commute, then the implication
in question becomes true.

E 3.2. Let A and B be two self-adjoint operators. Show that Tr((AB)2 ), Tr(A2 B 2 ) are
real and satisfy the inequality

$$|\mathrm{Tr}((AB)^2)| \le \mathrm{Tr}(A^2 B^2).$$

E 3.3. Show that if A, B are positive operators and $A \ge B$, then $\sqrt{A} \ge \sqrt{B}$.

E 3.4. Suppose V is a finite dimensional vector space on which we are given two (possibly
different) scalar products. Show that there exists a basis of V whose members are pairwise
orthogonal with respect to both scalar products.

E 3.5. Let $v_1, \ldots, v_{16}$ be sixteen unit-length vectors of a scalar product space V . Show
that if
$$|\langle v_j, v_k \rangle| < \frac{1}{3}$$
for all $j \ne k$, then $\dim(V) \ge 7$.

E 3.6. Let A and B be two positive operators. Show that there exists a t > 0 such that
tA ≥ B if and only if Im(B) ⊂ Im(A).

E 3.7. Let $|X| := \sqrt{X^* X}$. Show that $|\mathrm{Tr}(X)| \le \mathrm{Tr}(|X|)$.

E 3.8. Let A be a self-adjoint operator with unit trace. Show that the density operator ρ
minimizing the distance $d(\rho, A) = \sqrt{\mathrm{Tr}((\rho - A)^2)}$ must be a function of A.
